Why PII Detection Is Not Enough

Most AI platforms detect sensitive data and flag it. AOSentry tokenizes it before any model provider sees it. The difference matters more than you think.


The problem with detection

Most AI platforms that claim to protect sensitive data do the same thing. They scan outbound prompts, identify personally identifiable information, and then flag it. Sometimes they redact it. Sometimes they mask it. Sometimes they just log it and let it through anyway with a warning in a dashboard somewhere.

This is detection. It is the bare minimum. And it is not enough.

Detection tells you that a Social Security number was present in a prompt. It does not prevent that number from reaching a third-party model provider’s servers. Redaction strips the data out entirely, which protects privacy but destroys the context that made the prompt useful in the first place. Masking replaces sensitive values with placeholder characters, but a masked value usually keeps its length, format, and position in the sentence, so it can sometimes be partially reconstructed from surrounding context, including by the very large language models the masking is supposed to protect against.
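To make the three behaviors concrete, here is a minimal sketch in Python. The pattern and the function names (detect, redact, mask) are assumptions chosen for illustration; they are not taken from any particular platform.

```python
import re

# Hypothetical pattern for US Social Security numbers written as 123-45-6789.
SSN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")

def detect(prompt: str) -> list[str]:
    """Detection only: report what was found and leave the prompt unchanged."""
    return [m.group(0) for m in SSN.finditer(prompt)]

def redact(prompt: str) -> str:
    """Redaction: remove the value, and with it the context it carried."""
    return SSN.sub("[REDACTED]", prompt)

def mask(prompt: str) -> str:
    """Masking: hide most characters but keep the shape and the last four digits."""
    return SSN.sub(lambda m: "***-**-" + m.group(1), prompt)

prompt = "Verify the applicant with SSN 123-45-6789 before approving."
print(detect(prompt))  # the value is reported, but the prompt still contains it
print(redact(prompt))  # the value is gone, and so is any way to refer to it
print(mask(prompt))    # the format and last four digits remain visible
```

Only the first function leaves the raw number in what would be sent onward. The masked version still exposes the format and the trailing digits, which is the residue the paragraph above describes.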

There is a deeper problem. Even when detection systems catch obvious PII formats like phone numbers or credit card numbers, they routinely miss less structured sensitive data. A patient describing symptoms alongside their date of birth. A financial analyst referencing account details in natural language. An employee pasting an internal document that contains client addresses embedded in free text. Pattern matching has limits, and those limits are exactly where real-world sensitive data lives.
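The limits of pattern matching are easy to reproduce. The sketch below assumes two deliberately simple, hypothetical regular expressions for phone numbers and email addresses; production detectors are more sophisticated, but the failure mode is the same in kind.

```python
import re

# Hypothetical detectors for two well-structured formats.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match for each pattern, keyed by pattern name."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

structured = "Call me at 555-867-5309 or write to jane@example.com."
free_text = "I was born on the third of March 1984 and live at twelve Elm Street."

print(find_pii(structured))  # both values are found
print(find_pii(free_text))   # nothing is found, although the text is sensitive
```

The second sentence contains a date of birth and a home address, yet neither pattern fires, because nothing in it has a fixed format to match.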

What detection misses

Consider what happens when a support agent pastes a customer email into an AI assistant to draft a response. A detection system flags the customer’s email address and phone number. Perhaps it redacts them. The prompt goes to the model provider with those fields stripped out.

Now the model has no idea who the customer is. It cannot reference their account. It cannot personalize the response. The agent gets back a generic template that requires manual editing to be useful. The entire point of using AI in that workflow has been undermined.

This is the fundamental tension. Redaction protects privacy by destroying utility. Detection without redaction protects nothing at all. Most platforms force you to choose one or the other.

Tokenization is different

AOSentry takes a different approach. Instead of detecting and redacting, it tokenizes.

When a prompt passes through AOSentry, every piece of PII is identified and replaced with an encrypted token before it leaves customer infrastructure. The original values are stored locally in a secure vault that never communicates with any external model provider. The token is meaningless to anyone who does not hold the decryption key, which never leaves the customer’s environment.

The model receives the prompt with tokens in place of sensitive data. It processes the prompt, reasons about it, and generates a response that includes those same tokens where the original data would naturally appear. When the response returns through AOSentry, the tokens are decrypted and the original values are restored. The end user sees a natural, coherent response with their actual data. The model provider saw nothing.
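The round trip described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AOSentry's implementation: the Vault class, the token format, the email-only detector, and the fake_model function are all hypothetical, and an in-memory dictionary stands in for a real local vault.

```python
import re
import secrets

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")  # hypothetical detector
TOKEN = re.compile(r"<PII_[0-9a-f]{16}>")            # hypothetical token format

class Vault:
    """In-memory stand-in for a local vault that maps tokens to original values."""

    def __init__(self) -> None:
        self._by_token: dict[str, str] = {}
        self._by_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the token for a repeated value so the model sees one consistent entity.
        if value not in self._by_value:
            token = f"<PII_{secrets.token_hex(8)}>"  # random, unrelated to the value
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str) -> str:
        # Unknown tokens are returned unchanged.
        return self._by_token.get(token, token)

def outbound(prompt: str, vault: Vault) -> str:
    """Replace each detected value with a token before the prompt leaves."""
    return EMAIL.sub(lambda m: vault.tokenize(m.group(0)), prompt)

def inbound(response: str, vault: Vault) -> str:
    """Restore original values in the response on the way back."""
    return TOKEN.sub(lambda m: vault.detokenize(m.group(0)), response)

def fake_model(prompt: str) -> str:
    """Stand-in for a model provider: it only ever sees and returns the token."""
    token = TOKEN.search(prompt).group(0)
    return f"Hello, we have sent a reset link to {token}."

vault = Vault()
sent = outbound("Customer jane@example.com cannot log in.", vault)
reply = inbound(fake_model(sent), vault)
print(sent)   # contains a token, not the address
print(reply)  # contains the address again
```

Reusing one token per value within a vault is a design choice in this sketch: it lets the model treat repeated mentions as the same entity, at the cost of revealing that two mentions are equal.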

This is not masking. A masked value keeps part of the original: its length, its format, sometimes its last few digits. A token keeps none of it. It is a cryptographically generated replacement whose content reveals nothing about the value it stands for. There is nothing in the token to partially reconstruct, and no residual pattern for a model to memorize or a breach to expose.

Granular control by entity type

Not all sensitive data requires the same treatment. A company might need to block Social Security numbers entirely while tokenizing email addresses and allowing anonymized IP addresses through for debugging purposes.

AOSentry supports configurable rules per entity type: Social Security numbers, credit card numbers, email addresses, phone numbers, IP addresses, medical information protected under HIPAA, and financial account numbers. Each category can be independently set to one of three actions: block the prompt entirely, redact the value, or tokenize it. This is not a single toggle. It is a policy engine that reflects how organizations actually think about data classification.

A healthcare company can tokenize patient names and medical record numbers while blocking any prompt that contains a raw Social Security number. A financial services firm can tokenize account numbers while redacting credit card CVVs outright. The rules match the regulatory reality of the organization, not the generic defaults of a platform vendor.
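A per-entity policy of this kind might be modeled as follows. Everything here is a hypothetical sketch: the names Action, Entity, PromptBlocked, POLICY, and apply_policy are invented for illustration, entity detection is assumed to have happened upstream, and the tokenizer is passed in as a plain function.

```python
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    BLOCK = "block"
    REDACT = "redact"
    TOKENIZE = "tokenize"

@dataclass(frozen=True)
class Entity:
    kind: str   # entity type, e.g. "ssn" or "patient_name"
    value: str  # the exact text found in the prompt

class PromptBlocked(Exception):
    """Raised when a prompt contains an entity whose policy is BLOCK."""

# Hypothetical policy for a healthcare organization.
POLICY = {
    "ssn": Action.BLOCK,
    "patient_name": Action.TOKENIZE,
    "mrn": Action.TOKENIZE,
    "cvv": Action.REDACT,
}

def apply_policy(
    prompt: str,
    entities: list[Entity],
    policy: dict[str, Action],
    tokenize: Callable[[Entity], str],
    default: Action = Action.TOKENIZE,
) -> str:
    # Check for blocking entities first so nothing is transformed needlessly.
    for entity in entities:
        if policy.get(entity.kind, default) is Action.BLOCK:
            raise PromptBlocked(f"prompt contains a blocked entity type: {entity.kind}")
    # Then redact or tokenize each remaining entity according to its type.
    for entity in entities:
        action = policy.get(entity.kind, default)
        replacement = "[REDACTED]" if action is Action.REDACT else tokenize(entity)
        prompt = prompt.replace(entity.value, replacement)
    return prompt

def demo_tokenize(entity: Entity) -> str:
    """Placeholder tokenizer; a real one would return a vault-backed token."""
    return f"<{entity.kind.upper()}>"

safe = apply_policy(
    "Patient Ana Ruiz paid with a card, CVV 123.",
    [Entity("patient_name", "Ana Ruiz"), Entity("cvv", "123")],
    POLICY,
    demo_tokenize,
)
print(safe)
```

Defaulting unknown entity types to tokenization is an assumption of this sketch; a stricter deployment might default to blocking instead.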

Audit everything

Tokenization without accountability is incomplete. Every time a PII token is decrypted in AOSentry, the event is logged with the identity of the accessor and the stated reason for decryption. This creates an immutable audit trail that answers the questions regulators actually ask: who accessed this data, when, and why.
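One common way to make such a trail tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below assumes that technique for illustration; the source does not say how AOSentry stores its log, and the AuditLog class and its field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only log in which each entry includes the hash of the previous entry."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict[str, str]] = []

    @staticmethod
    def _digest(entry: dict[str, str]) -> str:
        # Hash every field except the stored hash itself, in a stable key order.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def record(self, token: str, accessor: str, reason: str) -> dict[str, str]:
        """Log one decryption event: which token, who accessed it, when, and why."""
        entry = {
            "token": token,
            "accessor": accessor,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
            "prev": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.record("<PII_ab12>", "agent-42", "customer callback")
log.record("<PII_cd34>", "auditor-7", "quarterly review")
print(log.verify())
```

A hash chain detects after-the-fact edits but does not by itself prevent them; in practice the chain head would also be anchored somewhere the log writer cannot modify.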

This matters for HIPAA, SOC 2, GDPR, and every other compliance framework that requires demonstrable access controls over sensitive data. Detection systems can tell you that PII was present. Tokenization with audit logging can tell you exactly what happened to it at every stage of its lifecycle.

Privacy by architecture, not by policy

Most approaches to AI privacy are policy-based. They rely on model providers honoring their terms of service, on detection systems catching every possible PII format, on employees following data handling procedures. Policies are necessary but insufficient. They describe what should happen. They do not guarantee what does happen.

AOSentry implements privacy by architecture. Sensitive data is physically prevented from leaving customer infrastructure in raw form. The guarantee is structural, not contractual. It does not depend on a model provider’s data retention policies, because the model provider never receives the data. It is also less exposed to gaps in detection: if the classifier misses a novel PII format, the configurable rules can be extended to cover it, so the tokenization layer can be tuned as new data types emerge.

The end user gets natural, useful AI responses with their actual data intact. The model provider processes only encrypted tokens. The compliance team gets a complete audit trail. The security team gets cryptographic guarantees instead of vendor promises.

This is what it means to build privacy into the architecture rather than bolt it on as an afterthought. Detection is a feature. Tokenization is a foundation.
