PII Detector for LLMs — Governance

After “LLM Part 7 | Governance”, I got some questions about how to build a PII/SPI detector for LLM outputs. In the following video, I use Presidio, spaCy, and Guardrails (though, IMHO, NeMo's sensitive_data_detection is a better option).

# pip install presidio-analyzer presidio-anonymizer -q
# python -m spacy download en_core_web_lg -q
# pip install guardrails-ai
# guardrails hub install hub://guardrails/detect_pii --quiet

from guardrails.hub import DetectPII
from guardrails.types import OnFailAction
import guardrails as gr
from rich import print

# Create Guard object with this validator.
# One can specify either pre-defined set of PII or SPI (Sensitive Personal
# Information) entities by passing in the `pii` or `spi` argument respectively.
# It can be passed either during initialization or later through the metadata
# argument in parse method. One can also pass in a list of entities supported
# by Presidio to the `pii_entities` argument.
pii_guard = gr.Guard().use(DetectPII(pii_entities="pii", on_fail=OnFailAction.FIX))

# Parse the text
pii_text = "My email address is me@chrisshayan.com and my phone number is 1234567890"
pii_output = pii_guard.parse(llm_output=pii_text)
print(pii_output)

# Same flow with the pre-defined SPI entity set
spi_text = "My email address is me@chrisshayan.com, my credit card is 012345678912"
spi_guard = gr.Guard().use_many(DetectPII(pii_entities="spi", on_fail=OnFailAction.FIX))
spi_output = spi_guard.parse(llm_output=spi_text)
print(spi_output)
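
To make the "detect, then fix" idea more concrete, here is a minimal standard-library sketch of what a guard with on_fail=FIX does conceptually: find sensitive spans and swap each one for a placeholder. This is not how Presidio or the DetectPII validator is implemented; real detectors combine NER models, checksums, and context words. The `PATTERNS` table and the `detect` and `redact` helpers are hypothetical names made up for illustration, and the two regexes are deliberately naive assumptions.

```python
import re

# Hypothetical, simplified patterns (an assumption for illustration only).
# Presidio uses NER models, checksums, and context, not bare regexes like these.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{10}\b"),
}


def detect(text):
    """Return (entity_type, start, end) tuples sorted by start offset."""
    spans = []
    for entity, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((entity, match.start(), match.end()))
    return sorted(spans, key=lambda span: span[1])


def redact(text):
    """Replace every detected span with an <ENTITY_TYPE> placeholder."""
    pieces = []
    cursor = 0
    for entity, start, end in detect(text):
        if start < cursor:
            continue  # skip a span that overlaps one already redacted
        pieces.append(text[cursor:start])
        pieces.append(f"<{entity}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "My email address is me@chrisshayan.com and my phone number is 1234567890"
print(redact(text))
# prints: My email address is <EMAIL_ADDRESS> and my phone number is <PHONE_NUMBER>
```

The 12-digit card number from the SPI example would pass through this toy version untouched, since no pattern covers it. That gap is a good reason to rely on a maintained entity set instead of hand-written regexes.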