---
title: Anonymizing User, Company, and Location Data Using Faker
teaser:
tags: web,rails
author: Adarsh Pandit
published_on: 2012-02-22
---

Often during development it’s useful to have realistic data to get a sense of
how an app would behave in the wild. [Seed
data](http://railscasts.com/episodes/179-seed-data) is one useful method to get
going pre-launch, but production data is always preferable.

However, production data can contain sensitive user information, which is
useful for the dev team but nerve-wracking for those looking to avoid a PR
disaster, say [if sensitive equipment is left lying around in
bars](http://gizmodo.com/5520438/how-apple-lost-the-next-iphone).

One solution we’ve used recently to anonymize client data is to obscure the
relevant content using the [Faker gem](http://faker.rubyforge.org/) and the rand
function.

    genders = ['male', 'female']

    User.all.each do |user|
      user.update_attributes!(
        :born_on => rand(50*365).days.ago,
        # Prefix a random number because Faker's values repeat quickly.
        :email => (rand(1000) + 100).to_s + Faker::Internet.email,
        :first_name => Faker::Name.first_name,
        :gender => genders.sample,
        :last_name => Faker::Name.last_name)
    end

Faker's seed name list is limited and leads to duplicates quickly. You can
further randomize fields by prepending random numbers as above.
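
If a column has a uniqueness constraint, a random prefix alone can still collide. Here is a minimal sketch of one way to guard against that, using only the standard library. The `unique_value` helper and the `sample_emails` pool are hypothetical stand-ins (the pool plays the role of `Faker::Internet.email`); neither is part of Faker.

```ruby
require 'set'

# Hypothetical helper (not part of Faker): prepend a random number to a
# generated value and retry until the result has not been seen before.
def unique_value(seen, max_attempts = 100)
  max_attempts.times do
    candidate = "#{rand(1000) + 100}#{yield}"
    # Set#add? returns nil when the value is already present.
    return candidate if seen.add?(candidate)
  end
  raise "no unique value after #{max_attempts} attempts"
end

# Stand-in for Faker::Internet.email, with a deliberately tiny pool.
sample_emails = ['ann@example.com', 'bob@example.com']

seen = Set.new
emails = Array.new(50) { unique_value(seen) { sample_emails.sample } }
```

In a Rails task the block would presumably call Faker instead of sampling from a fixed array.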

Anonymizing company names is also straightforward. Faker provides fun
catch-phrase generators that produce fake business jargon such as "Inverse 24/7
utilisation":

    Company.all.each do |company|
      company.update_attributes!(
        :description_html => Faker::Company.catch_phrase,
        :name => Faker::Company.name,
        :twitter_username => Faker::Internet.user_name,
        :url => 'http://' + Faker::Internet.domain_name)
    end

Anonymizing location-based data is trickier: fully random latitude/longitude
values would be scattered implausibly across a map view. One method is to keep
each (lat, long) pair together but redistribute the pairs across records by
loading them into an array, shuffling it, then writing them back over the
existing data.

Here we use a modified version of the [previously-mentioned inject
method](https://thoughtbot.com/blog/post/17782192029/derive-inject-for-a-better-understanding).

    location_array = Location.all.inject([]) do |result, location|
      result << [location.lat, location.lng]
    end.shuffle

    Location.all.each do |location|
      lat, lng = location_array.pop
      location.update_attributes!(
        :city => Faker::Address.city,
        :extended_address => Faker::Address.secondary_address,
        :lat => lat,
        :lng => lng,
        :phone => Faker::PhoneNumber.phone_number,
        :postal_code => Faker::Address.zip_code,
        :state => Faker::Address.state_abbr,
        :street_address => Faker::Address.street_address
      )
    end
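
To see why shuffling whole pairs keeps every point on the map plausible, here is a small sketch in plain Ruby with no database. The `Place` struct and its sample coordinates are hypothetical stand-ins for the `Location` model.

```ruby
# Plain-Ruby stand-in for the Location model (hypothetical sample data).
Place = Struct.new(:name, :lat, :lng)

places = [
  Place.new('A', 42.36, -71.06),
  Place.new('B', 37.77, -122.42),
  Place.new('C', 40.71, -74.01)
]

# Collect each (lat, lng) pair as a unit, then shuffle the pairs.
pairs = places.map { |place| [place.lat, place.lng] }
shuffled = pairs.shuffle

# Hand the shuffled pairs back out, one per record.
places.zip(shuffled).each do |place, (lat, lng)|
  place.lat = lat
  place.lng = lng
end
```

The set of coordinates is unchanged, so the map looks the same; only the association between a record and its coordinates moves. Note that a shuffle can leave some records with their original pair, and the coordinates themselves are still real.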

Faker generates phone numbers with prefixes and extensions, such as "+1 (877)
976-2687 x1234". For a strict (XXX-XXX-XXXX) format, use:

    :phone => (rand(900) + 100).to_s + "-" + (rand(900) + 100).to_s + "-" + (rand(9000) + 1000).to_s
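
A strict XXX-XXX-XXXX value has groups of three, three, and four digits. As a sketch, the same idea can be wrapped in a small helper built on `Kernel#format`. The name `strict_phone_number` is hypothetical and not part of Faker.

```ruby
# Hypothetical helper: build a strict XXX-XXX-XXXX string.
# Each group avoids a leading zero by starting at 100 (or 1000).
def strict_phone_number
  format('%d-%d-%d', rand(900) + 100, rand(900) + 100, rand(9000) + 1000)
end

phone = strict_phone_number
```

The result can then be passed as the `:phone` value in the attributes hash.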

The complete code is available in [this Gist](https://gist.github.com/1871104).
