// AI security gateway // on-prem only // openai · anthropic · google · azure · vllm · ollama // air-gap capable // PII redaction // prompt-injection blocking // per-request audit log // tokenized streaming
[ 00 ]  intro · v 1.0 · 2026

the security gateway for the ai-native enterprise.

// what it is
In-network gateway between every AI app, agent, copilot and the LLM provider.
// what it does
Redacts PII · blocks injection · enforces policy · ships an audit row.
// where it runs
Your VPC. Your hardware. Air-gap capable. No outbound telemetry.
[ 01 ]  the problem
// every AI integration is a new exfiltration vector

marketing wired a chatbot. sales bought a notetaker. your copilot is logging prompts to somewhere else.

[ option a ]
block llms entirely

Lose the productivity gains. Watch shadow IT route around you anyway.

[ option b ]
trust each app to redact

Marketing's chatbot, sales' notetaker, and engineering's copilot each reinvent the same broken regex.

[ option c ]
use a saas guardrail

Add another vendor to your data flow, your audit scope, and your incident-response playbook.

[ option d ]
tell the auditor "we don't really know"

Watch them ask again next quarter, and the quarter after that.

[ secureprompt — the fourth option ]
one in-network gateway. owned by you. observable to your team.

Every byte of AI-bound traffic in your network funnels through a single inspection point. Every request answered, attributed, and auditable.

[ 02 ]  // the platform
three products. one inspection point.
not multiple vendors, multiple audit trails, multiple surfaces to defend.
[ 02·1 ]

gateway. drop-in openai-compatible api.

//
swap a base url
Set OPENAI_API_BASE_URL to your gateway. One sp_… key works across every provider you've registered.
//
six providers, one shape
OpenAI, Anthropic, Google, Azure, vLLM, Ollama — same request, same audit row, same policy.
//
streaming-safe
Placeholder fragments straddling chunk boundaries are held in a request-scoped vault until they can be safely emitted or restored.
+
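The streaming-safe behavior above can be sketched as a small buffer loop. This is an illustrative sketch, not the shipped implementation: it assumes placeholders look like `{{Person_1}}` (the shape shown in the audit row) and that `vault` maps placeholder tokens back to the original values.

```python
# Sketch: restore {{Placeholder}} tokens in a streamed response, holding any
# fragment that straddles a chunk boundary until it is complete.
def restore_stream(chunks, vault):
    buf = ""
    for chunk in chunks:
        buf += chunk
        out = ""
        while True:
            start = buf.find("{{")
            if start == -1:
                # No fragment pending: emit everything except a possible lone "{".
                safe = len(buf) - 1 if buf.endswith("{") else len(buf)
                out += buf[:safe]
                buf = buf[safe:]
                break
            end = buf.find("}}", start)
            if end == -1:
                # Placeholder may continue in the next chunk: emit the prefix,
                # keep the partial fragment buffered.
                out += buf[:start]
                buf = buf[start:]
                break
            token = buf[start:end + 2]
            out += buf[:start] + vault.get(token, token)
            buf = buf[end + 2:]
        if out:
            yield out
    if buf:
        yield buf  # trailing text (or an unterminated fragment) flushes at end
```

The point of the buffer: a chunk ending in `{{Per` is never emitted raw, so the client sees either the restored value or nothing yet.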
[ 02·2 ]

chat. a chatgpt tied to your idp.

//
SSO from day one
Per-user attribution. Every chat tied to a user and device in the audit log.
//
same pipeline as the gateway
PII tokenized upstream. Restored on return. The model never sees Alice's data.
//
desktop & web
Productivity surface that doesn't push your team back to shadow IT.
+
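The tokenize-upstream, restore-on-return round trip can be sketched in a few lines. This is a toy illustration, not the product's detection pipeline: detection here is a single email regex, and the `{{Email_N}}` token shape mirrors the placeholders shown in the audit row.

```python
import re

# Toy sketch of the round trip: PII swapped for placeholders before the
# request leaves, restored from a per-request vault on the way back.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenize(prompt):
    vault, count = {}, 0
    def swap(match):
        nonlocal count
        count += 1
        token = f"{{{{Email_{count}}}}}"
        vault[token] = match.group(0)   # original value stays in the vault
        return token
    return EMAIL.sub(swap, prompt), vault

def restore(text, vault):
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

The model only ever sees the placeholder; the vault lives and dies with the request.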
[ 02·3 ]

console. the operator surface.

//
answer the auditor's question
What got sent, by whom, to which model, and whether anything sensitive was in it.
//
policy as code
Workspace-scoped rules. Block, redact, flag, allow — configurable per route.
//
analytics that map to spend
Reconciled token counts. Per-user, per-workspace, per-model.
+
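A workspace-scoped, per-route rule set could look like the sketch below. The rule shape and field names are illustrative assumptions, not the console's actual schema; it only shows the block / redact / flag / allow decision described above.

```python
# Hypothetical policy table: first matching rule wins, default is allow.
POLICY = {
    "workspace-strict": [
        {"route": "/v1/chat/completions", "on": "injection", "action": "block"},
        {"route": "/v1/chat/completions", "on": "pii", "action": "redact"},
        {"route": "*", "on": "pii", "action": "flag"},
    ],
}

def decide(workspace, route, finding):
    for rule in POLICY.get(workspace, []):
        if rule["on"] == finding and rule["route"] in (route, "*"):
            return rule["action"]
    return "allow"
```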
the security gateway you'd build yourself
— if you had a quarter to spend on it.
// observed at three customers · 2025–2026
[ 03 ]  mechanics
// honest mechanics. visible numbers. no magic.

we show our work.

[ 03·1 ]

pii redaction that actually works.

//
structural matchers
Card numbers, phone, email, IBAN, SSN — exact, deterministic.
//
multilingual NER
Names, organizations, addresses across English, Spanish, German, French, Russian, Uzbek.
//
positional offsets
Replacement happens on exact byte ranges. Never lazy substring substitution that mangles the rest of your prompt.
+
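Offset-based replacement can be sketched as: collect exact spans from every matcher, number them left to right, then substitute right to left so earlier offsets stay valid. The two regexes are simplified stand-ins for the structural matchers, not the product's patterns.

```python
import re

# Simplified matchers; real ones cover phone, IBAN, SSN, etc.
MATCHERS = {
    "Card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "Email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def redact(text):
    spans = []
    for label, rx in MATCHERS.items():
        for m in rx.finditer(text):
            spans.append((m.start(), m.end(), label))
    spans.sort()
    counts, numbered = {}, []
    for start, end, label in spans:
        counts[label] = counts.get(label, 0) + 1
        numbered.append((start, end, f"{{{{{label}_{counts[label]}}}}}"))
    # Substitute right-to-left: each replacement leaves earlier spans untouched.
    for start, end, token in reversed(numbered):
        text = text[:start] + token + text[end:]
    return text
```

Because spans come from the match objects themselves, the surrounding prompt is never touched — the failure mode of naive substring substitution.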
[ 03·2 ]

injection blocked at ≥ 0.99.

//
realistic threshold
Every meta-prompt looks like injection at low confidence. We block at 0.99. Templating gets through. Real attempts don't.
//
full audit context
Every block records the score and the reason. Reviewers see exactly why.
+
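The decision itself is simple, and a sketch makes the audit contract explicit: every verdict carries the score and a human-readable reason. Field names mirror the audit row shown later; the function shape is an assumption.

```python
# Sketch of the block-at-0.99 decision with full audit context.
INJECTION_THRESHOLD = 0.99

def screen(request_id, score):
    action = "block" if score >= INJECTION_THRESHOLD else "allow"
    reason = (
        f"score {score:.2f} >= threshold {INJECTION_THRESHOLD}"
        if action == "block"
        else f"score {score:.2f} below threshold {INJECTION_THRESHOLD}"
    )
    return {
        "request_id": request_id,
        "injection_score": score,
        "final_action": action,
        "reason": reason,
    }
```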
[ 03·3 ]

three honest numbers for tokens.

//
estimate, charged pre-flight
So concurrent bursts can't slip past the budget.
//
actual, from the provider
The number your bill is based on.
//
reconciled, in the dashboard
What you charged minus what you used. Refunds applied automatically.
// estimate
247
charged pre-flight
// actual
211
from provider
// reconciled
−36
refund applied
+
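The three-number ledger can be sketched as charge-then-reconcile. This is an illustrative model, not the gateway's code: the estimate is charged before the request so concurrent bursts can't overrun the limit, and the provider's actual count settles the difference afterwards.

```python
# Sketch of the pre-flight charge / reconcile cycle for one budget.
class Budget:
    def __init__(self, limit):
        self.limit = limit
        self.charged = 0

    def preflight(self, estimate):
        # Charge before sending, so parallel requests can't all fit "just under".
        if self.charged + estimate > self.limit:
            raise RuntimeError("budget exceeded")
        self.charged += estimate
        return estimate

    def reconcile(self, estimate, actual):
        delta = actual - estimate   # negative delta is a refund
        self.charged += delta
        return delta
```

With the numbers from the panel above: charge 247 pre-flight, the provider reports 211, reconciliation refunds 36.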
[ 03·4 ]

two latency timers, not one.

//
upstream TTFT
Time-to-first-byte from the provider — the metric that matches user experience.
//
gateway overhead
Our pre-flight cost. When chat is slow, you know exactly which one is at fault.
+
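Splitting the two timers is just two timestamps around each phase. A minimal sketch, with the pre-flight work and the upstream call passed in as callables (both names are illustrative):

```python
import time

# Sketch: separate clocks for gateway pre-flight and upstream time-to-first-byte.
def timed_request(preflight, upstream_first_byte):
    t0 = time.perf_counter()
    preflight()                # redaction, injection screen, policy check
    t1 = time.perf_counter()
    upstream_first_byte()      # blocks until the provider's first byte arrives
    t2 = time.perf_counter()
    return {
        "overhead_ms": (t1 - t0) * 1000,
        "ttft_ms": (t2 - t1) * 1000,
    }
```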
[ 03·a ]  audit row
// every request, fully attributed

one row per request. written in order.

request_id       req_8c2f3a91…d44c
workspace        acme-prod
user             alice@acme.com
budget_pre       247 tokens
redactions       2 · {{Person_1}}, {{Email_1}}
injection_score  0.07 — allow
policy_match     workspace-strict
upstream         openai/gpt-4o
ttft_ms          318
overhead_ms      42
budget_actual    211 · reconciled (−36)
final_action     allow
[ 04 ]  on-prem only
// not "an on-prem option" — the only deployment

the cloud version does not exist.

[ 04·1 ]

air-gapped capable.

//
All required AI models pre-bundled or downloaded once at install. After that, runs without internet.
+
[ 04·2 ]

local ai inference.

//
PII detection and injection classification stay inside your cluster. Sensitive data never leaves to be inspected.
+
[ 04·3 ]

minimal attack surface.

//
Production binaries stripped of build tooling, runtimes, and shells. What ships is what runs.
+
[ 04·4 ]

license-gated, fail-closed.

//
Signed license file controls activation. Expired licenses fail closed at the boundary, before any business logic runs.
+
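Fail-closed license gating can be sketched as: verify the signature, then the expiry, and deny on any failure before business logic runs. The HMAC scheme and field names below are assumptions for illustration — the product states only that the license file is signed.

```python
import hmac
import hashlib
import json
import time

# Sketch: every path that isn't a verified, unexpired license returns False.
def check_license(license_blob, signature, key, now=None):
    expected = hmac.new(key, license_blob, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False                               # tampered or wrong key
    claims = json.loads(license_blob)
    now = time.time() if now is None else now
    return now < claims.get("expires_at", 0)       # missing expiry fails closed
```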
[ 04·5 ]

no outbound telemetry. by default.

//
Optional support tunnel — explicit opt-in, time-bounded, fully auditable. Off until you turn it on.
+
[ 05 ]  built for
// if you have more than a handful of LLM integrations

you have this problem.

[ A ]

security & compliance

Teams who need to say yes to AI without inheriting third-party data risk.

[ B ]

platform engineering

Companies that have outgrown "every team picks their own LLM API key."

[ C ]

regulated industries

Healthcare, finance, legal, government — where data sovereignty isn't optional.

[ D ]

on-prem mandates

IP-sensitive engineering, model evaluation, sovereign deployments.

[ E ]

self-hosted models

Teams running vLLM or Ollama who want one policy and audit layer across self-hosted and commercial.

[ 06 ]  // the next step

you'd build it yourself. now you don't have to.