Data Masking using SQL Loader

I use Enterprise Manager Cloud Control to mask Oracle databases so they can be shared with our ERP vendor. I intended to do the same with our legacy Informix database but I found that we did not have the Database Gateway licenses required for using Enterprise Manager Data Masking Pack on Informix. We also had a requirement to replace customer names and

I use Enterprise Manager Cloud Control to mask Oracle databases so they can be shared with our ERP vendor. I intended to do the same with our legacy Informix database but I found that we did not have the Database Gateway licenses required for using Enterprise Manager Data Masking Pack on Informix. We also had a requirement to replace customer names and addresses with a set of twenty arbitrary values. Due to referential integrity, the replacement values had to be deterministic/repeatable – not random.

The Informix DBA sent me flat file exports from the Informix tables containing the customer records. I mounted these as external tables and wrote SQL to select from them and write to Oracle tables. This was so slow that I abandoned that idea and decided to load the files into Oracle tables using SQL Loader. It’s been so long since I have used SQL Loader that I had to relearn how to build control files. I was amazed at how fast the data loaded. I had the idea of using SQL Loader to mask the data during the load.

I decided to use the oracle_hash function inside a case statement in the SQL Loader control file specification for the sensitive columns. This let me replace the names and addresses with new values in a repeatable way. I forced the names to uppercase and then produced a hash value which I associated with the fictional name. That fictional name got loaded into the Oracle table.

I am very satisfied with the results and happy I had the opportunity to re-familiarize myself with SQL Loader.

Here’s the content of the control file:

options
(
direct=true, rows=200000, multithreading=true
)
unrecoverable
load data
infile customers.unl badfile customers.bad discardfile customers.dis
into table customers
fields terminated by ‘|’
(
cust_num,
first_name “case
when (ora_hash(upper(:first_name),20,0) = 0) then ‘Andy’
when (ora_hash(upper(:first_name),20,0) = 1) then ‘Brenda’
when (ora_hash(upper(:first_name),20,0) = 2) then ‘Jim’
when (ora_hash(upper(:first_name),20,0) = 3) then ‘Cathy’
when (ora_hash(upper(:first_name),20,0) = 4) then ‘Bob’
when (ora_hash(upper(:first_name),20,0) = 5) then ‘Sue’
when (ora_hash(upper(:first_name),20,0) = 6) then ‘Joe’
when (ora_hash(upper(:first_name),20,0) = 7) then ‘Carmen’
when (ora_hash(upper(:first_name),20,0) = 8) then ‘Jose’
when (ora_hash(upper(:first_name),20,0) = 9) then ‘Claire’
when (ora_hash(upper(:first_name),20,0) = 10) then ‘Ron’
when (ora_hash(upper(:first_name),20,0) = 11) then ‘Misty’
when (ora_hash(upper(:first_name),20,0) = 12) then ‘Jason’
when (ora_hash(upper(:first_name),20,0) = 13) then ‘Glenda’
when (ora_hash(upper(:first_name),20,0) = 14) then ‘Rick’
when (ora_hash(upper(:first_name),20,0) = 15) then ‘Suzy’
when (ora_hash(upper(:first_name),20,0) = 16) then ‘Peter’
when (ora_hash(upper(:first_name),20,0) = 17) then ‘Sara’
when (ora_hash(upper(:first_name),20,0) = 18) then ‘Isaac’
when (ora_hash(upper(:first_name),20,0) = 19) then ‘Helen’
when (ora_hash(upper(:first_name),20,0) = 20) then ‘Nathan’
else ‘ERROR’ end”,
last_name “case
when (ora_hash(upper(:last_name),20,0) = 0) then ‘Jones’
when (ora_hash(upper(:last_name),20,0) = 1) then ‘Hay’
when (ora_hash(upper(:last_name),20,0) = 2) then ‘Smith’
when (ora_hash(upper(:last_name),20,0) = 3) then ‘Gonzales’
when (ora_hash(upper(:last_name),20,0) = 4) then ‘Wu’
when (ora_hash(upper(:last_name),20,0) = 5) then ‘Cho’
when (ora_hash(upper(:last_name),20,0) = 6) then ‘Einstein’
when (ora_hash(upper(:last_name),20,0) = 7) then ‘Ward’
when (ora_hash(upper(:last_name),20,0) = 8) then ‘Magee’
when (ora_hash(upper(:last_name),20,0) = 9) then ‘Morgan’
when (ora_hash(upper(:last_name),20,0) = 10) then ‘Tenkiller’
when (ora_hash(upper(:last_name),20,0) = 11) then ‘Rice’
when (ora_hash(upper(:last_name),20,0) = 12) then ‘Namath’
when (ora_hash(upper(:last_name),20,0) = 13) then ‘DeMarco’
when (ora_hash(upper(:last_name),20,0) = 14) then ‘Obama’
when (ora_hash(upper(:last_name),20,0) = 15) then ‘Bush’
when (ora_hash(upper(:last_name),20,0) = 16) then ‘Grant’
when (ora_hash(upper(:last_name),20,0) = 17) then ‘Lee’
when (ora_hash(upper(:last_name),20,0) = 18) then ‘Blount’
when (ora_hash(upper(:last_name),20,0) = 19) then ‘Bain’
when (ora_hash(upper(:last_name),20,0) = 20) then ‘Adams’
else ‘ERROR’ end”,
address1 “case
when (ora_hash(upper(:address1),20,0) = 0) then ’10 Main’
when (ora_hash(upper(:address1),20,0) = 1) then ’11 1st’
when (ora_hash(upper(:address1),20,0) = 2) then ’22 2nd’
when (ora_hash(upper(:address1),20,0) = 3) then ’33 3rd’
when (ora_hash(upper(:address1),20,0) = 4) then ’44 4th’
when (ora_hash(upper(:address1),20,0) = 5) then ’55 5th’
when (ora_hash(upper(:address1),20,0) = 6) then ’66 6th’
when (ora_hash(upper(:address1),20,0) = 7) then ’77 7th’
when (ora_hash(upper(:address1),20,0) = 8) then ’88 8th’
when (ora_hash(upper(:address1),20,0) = 9) then ’99 9th’
when (ora_hash(upper(:address1),20,0) = 10) then ‘100 10th’
when (ora_hash(upper(:address1),20,0) = 11) then ‘111 11th’
when (ora_hash(upper(:address1),20,0) = 12) then ‘222 22nd’
when (ora_hash(upper(:address1),20,0) = 13) then ‘333 33rd’
when (ora_hash(upper(:address1),20,0) = 14) then ‘444 44th’
when (ora_hash(upper(:address1),20,0) = 15) then ‘555 55th’
when (ora_hash(upper(:address1),20,0) = 16) then ‘666 66th’
when (ora_hash(upper(:address1),20,0) = 17) then ‘777 77th’
when (ora_hash(upper(:address1),20,0) = 18) then ‘888 88th’
when (ora_hash(upper(:address1),20,0) = 19) then ‘999 99th’
when (ora_hash(upper(:address1),20,0) = 20) then ‘2000 20th’
else ‘ERROR’ end”,
city,
state,
zip,
country_code,
phone “upper(‘8005551212’)”,
email “upper(‘me\@mydomain.com’)” ,
create_date date ‘mm/dd/yyyy’,
)

This article has been kindly republished with permission from Steve Harville’s blog.