Detectors

anonymizer ships with detectors organized into tiers by criticality.

P0 — Critical (must be ≥99% recall)

These are the categories where a single miss is unacceptable.

Category | Detector type | Languages
Names (personal) | NER (Natasha / spaCy) | ru, en
Tax IDs (ИНН, СНИЛС, ОГРН) | Regex + checksum | language-agnostic
Bank cards | Regex + Luhn | language-agnostic
IBAN | Regex + checksum | language-agnostic
Emails | Regex | language-agnostic
Phone numbers (RU, E.164) | Regex + format normalization | language-agnostic
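
To illustrate what "Regex + checksum" and "Regex + Luhn" can look like in practice, here is a minimal sketch. It assumes a two-step approach: a permissive regex proposes candidates, then a checksum rejects digit strings that only look like identifiers. The names `CARD_RE`, `INN_RE`, `luhn_valid`, `inn_valid`, `find_cards`, and `find_inns` are hypothetical and are not taken from anonymizer's code; the patterns are deliberately loose.

```python
import re

# Hypothetical candidate patterns (assumptions, not anonymizer's actual regexes).
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")   # 13-19 digits, optional separators
INN_RE = re.compile(r"\b\d{10}\b|\b\d{12}\b")        # 10-digit or 12-digit ИНН


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn check."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Weights for the ИНН control digits.
_INN10 = (2, 4, 10, 3, 5, 9, 4, 6, 8)
_INN12_A = (7, 2, 4, 10, 3, 5, 9, 4, 6, 8)
_INN12_B = (3, 7, 2, 4, 10, 3, 5, 9, 4, 6, 8)


def _control(digits, weights):
    # zip() stops at the shorter sequence, so only the leading digits are weighted.
    return sum(d * w for d, w in zip(digits, weights)) % 11 % 10


def inn_valid(candidate: str) -> bool:
    """Return True if a 10- or 12-digit string has valid ИНН control digits."""
    if not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    if len(digits) == 10:
        return _control(digits, _INN10) == digits[9]
    if len(digits) == 12:
        return (_control(digits, _INN12_A) == digits[10]
                and _control(digits, _INN12_B) == digits[11])
    return False


def find_cards(text: str) -> list[str]:
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]


def find_inns(text: str) -> list[str]:
    return [m.group() for m in INN_RE.finditer(text) if inn_valid(m.group())]
```

The checksum step mainly improves precision: order numbers and other random digit runs rarely pass it. Recall still depends on how permissive the candidate regex is, so a real detector would need broader patterns than the ones assumed here.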

P1 — Important (≥95% recall)

Category | Detector type | Languages
Companies (legal entities) | NER | ru, en
Addresses | NER + postal-code regex | ru, en
Dates | Regex (all dates masked in MVP-0) | language-agnostic
IP / MAC | Regex | language-agnostic
URLs | Regex | language-agnostic
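
As an illustration of the regex-only detectors in this tier, the sketch below masks IPv4 and MAC addresses. It is an assumption about how such a detector could be written, not anonymizer's implementation: the patterns, the `[IP]` and `[MAC]` placeholders, and the function name `mask_network_ids` are all hypothetical. IPv4 candidates are additionally validated with the standard-library `ipaddress` module so that strings such as `999.1.1.1` are left alone.

```python
import ipaddress
import re

# Hypothetical patterns; IPv6 is intentionally out of scope for this sketch.
IPV4_RE = re.compile(r"(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?!\.?[0-9])")
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}(?:[:-][0-9A-Fa-f]{2}){5}\b")


def _mask_ipv4(match: re.Match) -> str:
    """Replace the match only if it parses as a real IPv4 address."""
    try:
        ipaddress.IPv4Address(match.group())
    except ValueError:
        return match.group()  # out-of-range octets etc.: leave untouched
    return "[IP]"


def mask_network_ids(text: str) -> str:
    """Mask MAC addresses first, then validated IPv4 addresses."""
    text = MAC_RE.sub("[MAC]", text)
    return IPV4_RE.sub(_mask_ipv4, text)


if __name__ == "__main__":
    print(mask_network_ids("host 192.168.0.1. mac 00:1A:2b:3C:4d:5E"))
```

The lookahead `(?!\.?[0-9])` lets an address at the end of a sentence match while still rejecting longer dotted sequences such as `1.2.3.4.5`.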

P2 — Warn-only

Categories that surface as warnings but are not replaced automatically in MVP-0. The user can opt in by applying a manual mask.

Currently empty for MVP-0.
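
The tier scheme above can be sketched as a small policy table. This is an illustration only: the `Tier` and `Action` types, the `TIERS` mapping, and the `meets_target` helper are hypothetical names, and the sketch assumes recall is computed as true positives divided by true positives plus false negatives on some labelled evaluation set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLACE = "replace"  # detected spans are replaced automatically
    WARN = "warn"        # detected spans are only surfaced as warnings


@dataclass(frozen=True)
class Tier:
    name: str
    min_recall: Optional[float]  # None means no recall target is stated
    action: Action


# Thresholds taken from the tier headings; the structure itself is an assumption.
TIERS = {
    "P0": Tier("P0", 0.99, Action.REPLACE),
    "P1": Tier("P1", 0.95, Action.REPLACE),
    "P2": Tier("P2", None, Action.WARN),
}


def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN); defined as 1.0 when there is nothing to find."""
    total = true_positives + false_negatives
    return true_positives / total if total else 1.0


def meets_target(tier_name: str, true_positives: int, false_negatives: int) -> bool:
    """Check an evaluation result against the tier's recall target."""
    tier = TIERS[tier_name]
    if tier.min_recall is None:
        return True
    return recall(true_positives, false_negatives) >= tier.min_recall
```

For example, under these assumptions a P0 detector that finds 98 of 100 labelled entities would fail its target, while the same result would pass for P1.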

What is NOT detected

  • Free-form aliases or nicknames not seen in NER training data
  • Handwritten names in scanned PDFs (no OCR in MVP-0)
  • Implicit references like “my client” or “the daughter of”
  • PII inside image attachments or signature blocks

If something important is missed, see Reporting feedback.