We’ve written about how to prevent logging sensitive information when making network requests, but that approach only works if you’re dealing with parameters.
What happens when you’re dealing with free text? Filtering the entire string may not be an option if an external API needs to process the value. Think chatbots or LLMs.
You could use a regex to filter sensitive information such as credit card numbers or emails, but not all sensitive information follows a predictable pattern. Names and locations, for example, can't be matched reliably with a regex.
Fortunately, named-entity recognition (NER) can identify and classify real-world objects, such as a person or location. Tools like MITIE Ruby make interfacing with NER models trivial.
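To make the limitation concrete, here is a minimal sketch in plain Ruby. The two patterns and the `regex_filter` helper are hypothetical and simplified for illustration; they are not the patterns Top Secret uses.

```ruby
# Hypothetical, simplified patterns for illustration only.
EMAIL_PATTERN = /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/
CREDIT_CARD_PATTERN = /\b(?:\d[ -]?){13,16}\b/

# Replaces anything matching the patterns above with a generic label.
def regex_filter(text)
  text
    .gsub(EMAIL_PATTERN, "[EMAIL]")
    .gsub(CREDIT_CARD_PATTERN, "[CREDIT_CARD]")
end

puts regex_filter("Ralph lives in Boston. Email ralph@thoughtbot.com, card 4242 4242 4242 4242.")
# => Ralph lives in Boston. Email [EMAIL], card [CREDIT_CARD].
```

The email and card number are caught, but "Ralph" and "Boston" pass straight through, because nothing about their shape distinguishes them from ordinary words.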
By combining regex patterns with NER entities, Top Secret filters sensitive information from free text.
If you want to see Top Secret in action, you might enjoy this live stream. Otherwise, read on for some real-world examples.
Working with LLMs
It’s not uncommon to send user data to chatbots. Since the data might be free-form, we should be diligent about filtering it using the approach mentioned above.
However, it’s likely we’ll want to “restore” the filtered values when returning a response from the chatbot. Top Secret returns a mapping that would allow for this.
You’d likely want to include instructions in the request that tell the model how to handle the filtered values.
instructions = <<~TEXT
I'm going to send filtered information to you in the form of free text.
If you need to refer to the filtered information in a response, just reference it by the filter.
TEXT
The exchange might look something like this.
Caller sends filtered text
result = TopSecret::Text.filter("Ralph lives in Boston.")

# Send this to the API
result.output
# => [PERSON_1] lives in [LOCATION_1].

# Save the mapping to "restore" response
mapping = result.mapping
# => { PERSON_1: "Ralph", LOCATION_1: "Boston" }
API responds with filter
"Hi [PERSON_1]! How is the weather in [LOCATION_1] today?"
Caller can “restore” from the mapping
response = "Hi [PERSON_1]! How is the weather in [LOCATION_1] today?"

# Restore the response from the mapping
result = TopSecret::FilteredText.restore(response, mapping: mapping)
result.output
# => Hi Ralph! How is the weather in Boston today?
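If you're curious what a restore step involves, it can be approximated in a few lines of plain Ruby. This is only a sketch of the idea, not Top Secret's implementation: `restore_placeholders` is a hypothetical helper, and it assumes placeholders look like `[LABEL_N]` and that the mapping uses symbol keys, as in the exchange above.

```ruby
# Hypothetical helper for illustration; not part of Top Secret.
# Swaps each [LABEL_N] placeholder for its original value from the mapping.
# Placeholders that are missing from the mapping are left untouched.
def restore_placeholders(text, mapping)
  text.gsub(/\[([A-Z_]+_\d+)\]/) do |placeholder|
    mapping.fetch(Regexp.last_match(1).to_sym, placeholder)
  end
end

mapping = { PERSON_1: "Ralph", LOCATION_1: "Boston" }
response = "Hi [PERSON_1]! How is the weather in [LOCATION_1] today? Say hi to [PERSON_2]."

puts restore_placeholders(response, mapping)
# => Hi Ralph! How is the weather in Boston today? Say hi to [PERSON_2].
```

Leaving unknown placeholders in place is a choice made for this sketch: if the model invents a label that was never in the mapping, you can spot it in the output rather than have it silently disappear.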
Filtering conversation history
When working with conversation state, you should filter every message before including it in the request. This ensures no sensitive data slips through from previous messages. Here’s what that might look like.
require "openai"
require "top_secret"
openai = OpenAI::Client.new(
  api_key: Rails.application.credentials.openai.api_key!
)

original_messages = [
  "Ralph lives in Boston.",
  "You can reach them at ralph@thoughtbot.com or 877-976-2687"
]
# Filter all messages
result = TopSecret::Text.filter_all(original_messages)
filtered_messages = result.items.map(&:output)
user_messages = filtered_messages.map { {role: "user", content: it} }
# Instruct LLM how to handle filtered messages
instructions = <<~TEXT
I'm going to send filtered information to you in the form of free text.
If you need to refer to the filtered information in a response, just reference it by the filter.
TEXT
messages = [
  {role: "system", content: instructions},
  *user_messages
]
chat_completion = openai.chat.completions.create(messages:, model: :"gpt-5")
response = chat_completion.choices.last.message.content
# Restore the response from the mapping
mapping = result.mapping
restored_response = TopSecret::FilteredText.restore(response, mapping:).output
puts(restored_response)
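The script above restores the response from a single mapping that covers every message. To show why one shared mapping is convenient, here is a rough sketch in plain Ruby. It says nothing about how Top Secret works internally: `filter_messages` is a hypothetical helper, and the email-only pattern is a stand-in for a real filter.

```ruby
# Hypothetical stand-in for a real filter: only matches email addresses.
EMAIL = /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/

# Filters every message and accumulates one mapping shared by all of them,
# so a value that appears more than once gets the same placeholder each time.
def filter_messages(messages)
  mapping = {}
  filtered = messages.map do |message|
    message.gsub(EMAIL) do |email|
      # Reuse the existing label for this value, or mint the next one.
      label = mapping.key(email) || :"EMAIL_#{mapping.size + 1}"
      mapping[label] = email
      "[#{label}]"
    end
  end
  [filtered, mapping]
end

filtered, mapping = filter_messages([
  "Reach me at ralph@thoughtbot.com.",
  "Did you get my note? It came from ralph@thoughtbot.com."
])

puts filtered
# => Reach me at [EMAIL_1].
# => Did you get my note? It came from [EMAIL_1].
```

Because both messages share one mapping in this sketch, a reply that mentions `[EMAIL_1]` can be restored no matter which message the address first appeared in.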
Prevent storing sensitive information with validations
Top Secret can also be used as a validation tool to prevent storing sensitive information in your database.
class Message < ApplicationRecord
  validate :content_cannot_contain_sensitive_information

  private

  def content_cannot_contain_sensitive_information
    result = TopSecret::Text.filter(content)
    return if result.mapping.empty?

    errors.add(:content, "contains the following sensitive information #{result.mapping.values.to_sentence}")
  end
end
If the validation is too strict, you can override or disable any of the filters as needed.
--- a/app/models/message.rb
+++ b/app/models/message.rb
@@ -4,7 +4,7 @@ class Message < ApplicationRecord
   private

   def content_cannot_contain_sensitive_information
-    result = TopSecret::Text.filter(content)
+    result = TopSecret::Text.filter(content, people_filter: nil, location_filter: nil)
     return if result.mapping.empty?

     errors.add(:content, "contains the following sensitive information #{result.mapping.values.to_sentence}")
Wrapping up
It’s our responsibility to protect user data. This is more important than ever given the rise in popularity of chatbots and LLMs. Tools like Top Secret aim to reduce this burden.