Nuix and EDRM republish Enron data set cleansed of more than 10,000 items containing private, health and financial information
A follow-up to an issue recently raised by BeyondRecognition.net and discussed on this blog.
SAN FRANCISCO, CA – May 15, 2013 — Nuix, a worldwide provider of information management technologies, and EDRM, the leading standards organization for the eDiscovery and information governance market, have today republished the EDRM Enron PST Data Set after cleansing it of private, health and personal financial information. Nuix and EDRM have also published the methodology Nuix’s staff used to identify and remove more than 10,000 high-risk items at nuix.com/enron.
The EDRM Enron data set is an industry-standard collection of email data that the legal profession has used for many years for electronic discovery training and testing. It was sourced from the Federal Energy Regulatory Commission’s investigation into collapsed energy firm Enron. In early 2012, the EDRM Enron PST Data Set and the EDRM Enron Data Set v2 became an Amazon Web Services Public Data Set, making them a valuable public resource for researchers across a variety of disciplines.
“Recently, we have been working closely with Nuix to cleanse the data set of private information about the company’s former employees and make the cleansed data set readily available to the community,” said George Socha and Tom Gelbmann, co-founders of EDRM. “These efforts help to protect the privacy of hundreds of individuals and we encourage anyone who finds private data that we did not remove to notify us.”
Using a series of investigative workflows on the EDRM Enron PST Data Set, Nuix consultants Matthew Westwood-Hill and Ady Cassidy identified more than 10,000 items including:
- 60 items containing credit card numbers, including departmental contact lists that each contained hundreds of individual credit card numbers
- 572 containing Social Security or other national identity numbers—thousands of individuals’ identity numbers in total
- 292 containing individuals’ dates of birth
- 532 containing information of a highly personal nature such as medical or legal matters.
Many items contained multiple instances and types of information. This included departmental contact list spreadsheets with dates of birth, credit card numbers, Social Security numbers, home addresses and other private details of dozens of staff members.
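For readers who want a feel for this kind of sweep, the following is a minimal Python sketch of pattern-based detection. It is not the Nuix methodology, which is published separately at nuix.com/enron. The regular expressions, the function names `luhn_valid` and `scan_text`, and the choice to validate card candidates with the Luhn checksum are all assumptions made for illustration. Dates of birth and highly personal content would need different techniques and are not covered here.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Assumed pattern: US Social Security numbers written as NNN-NN-NNNN.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text):
    """Return candidate credit card numbers and SSNs found in a block of text."""
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):  # discard digit runs that cannot be card numbers
            cards.append(digits)
    return {"credit_cards": cards, "ssns": SSN_RE.findall(text)}


if __name__ == "__main__":
    sample = "Card 4111 1111 1111 1111, SSN 123-45-6789"
    print(scan_text(sample))
```

A pattern match only flags a candidate. A real sweep would still need human review, since order numbers, phone numbers and similar digit runs can resemble identity numbers.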
The investigative team also clearly demonstrated that these items did not stay within the Enron firewall. For example, some staff emailed “convenience copies” of documents containing private data to their personal addresses.
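One simple way to check whether a message left the organization is to compare recipient domains against a list of internal domains. The sketch below illustrates that idea with Python's standard `email` package. The `INTERNAL_DOMAINS` set and the `external_recipients` function are hypothetical names, and this is an illustrative assumption rather than a description of how the investigative team did it.

```python
from email import message_from_string
from email.utils import getaddresses

# Assumption: the domains treated as "inside the firewall".
INTERNAL_DOMAINS = {"enron.com"}


def external_recipients(raw_message):
    """Return recipient addresses whose domain is not in INTERNAL_DOMAINS."""
    msg = message_from_string(raw_message)
    # Collect every recipient header; get_all returns the default if a header is absent.
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    external = []
    for _name, addr in getaddresses(fields):
        domain = addr.rpartition("@")[2].lower()
        if addr and domain not in INTERNAL_DOMAINS:
            external.append(addr)
    return external


if __name__ == "__main__":
    raw = (
        "From: a@enron.com\n"
        "To: b@enron.com, Jane <jane@example.net>\n"
        "Subject: contact list\n"
        "\n"
        "See attached.\n"
    )
    print(external_recipients(raw))
```

Combining a check like this with a content scan would show which items holding private data were sent to outside addresses.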
“Nuix and our partners have conducted sweeps for private and credit card data for dozens of corporate customers and we are yet to encounter a data set that did not include some inappropriately stored personal, financial or health information,” said Eddie Sheehy, CEO of Nuix. “The increasing burden of privacy and data breach regulations, combined with the strict requirements of credit card companies, make this an unacceptable business risk.
“Using the methodology we are publishing alongside the cleansed EDRM Enron data, organizations can identify private and financial data, find out if it has been emailed outside the firewall and take immediate steps to remediate the risks involved.”
Nuix is currently applying the same methodology to the EDRM Enron Data Set v2, which it will also republish at nuix.com/enron.
Nuix will host a Twitter chat to discuss the release of the cleansed EDRM Enron PST Data Set on Thursday, May 23, from 2pm to 3pm ET. Nuix experts will describe the process of identifying unsecured financial, health and personally identifiable information in corporate data. Follow the hashtag #NuixChat and send in your questions beforehand to @nuix.
SOURCE: EDRM and Nuix