Implementing a Data De-Identification Framework

Tuesday, January 29, 2013

Rebecca Herold


A growing number of organizations are trying to work out the benefits of anonymizing or, as HIPAA (the only regulation that provides specific legal requirements for such actions) puts it, “de-identifying” personal information. Healthcare organizations see benefits for improving healthcare. Their business associates (BAs) see benefits in the ways in which they can minimize the controls around such data. Of course, marketing organizations salivate at the prospect of doing advanced analysis with such data to discover new trends and marketing possibilities. The government wants to use it for investigations. Historians want to use it for, yes, marking historical events. And the list could go on.

I wrote “6 Good Reasons to De-Identify Data” in March of last year. Recently I was a presenter in a very informative webinar hosted by the IAPP, “The De-Identification of Health Records – Risks and Rewards.” Dr. Khaled El Emam, Founder and CEO, Privacy Analytics, spoke first about the benefits and constraints surrounding health data sharing. As a long-time practicing health informatics researcher, he has insights that are hard to find elsewhere, and his points on de-identification challenges and methods are noteworthy. Deven McGraw, Director, Health Privacy Project, Center for Democracy & Technology (CDT), then discussed the current laws, regulations and government policies governing de-identification, which added much more valuable information. I then had the opportunity to discuss how to practicably establish a de-identification program within an organization, and I provided a framework for implementing and managing de-identification activities. I am providing some of my high-level talking points about that framework here to help others implement their own programs.

You still need safeguards for de-identified data

Information security is necessary to protect all types of data that are valuable to you and your business. De-identified data still carries some probability of re-identification. Your business leaders should also understand that de-identified data is another type of intellectual property, one you’ve created for important purposes, such as gaining a business edge through marketing or enabling your organization to deliver a healthcare breakthrough. Intellectual property requires security controls, and so does de-identified data. So, what are these controls? There are a dozen elements that need to be part of your organization’s de-identification framework.

1. Policies

Establish and document policies and supporting procedures detailing the situations in which de-identification needs to occur (e.g., for lab research, marketing research, consumer stats reports, and so on) and the associated controls that need to be in place. You should specify within the policies the positions or persons responsible for maintaining these policies, for following them, and for enforcing them. Your information security and privacy areas should work together to create such policies. Or, you may be able to amend existing policies. For example, if you have information security policies specific to protecting intellectual property, you may be able to update those. Or, if you are a healthcare provider, you may be able to update any HIPAA policies you already have that address de-identification. Make sure they are written to reflect actual practice within your organization. Too many organizations write policies to sound good, but then don’t actually put them into practice; this not only leaves the associated information at risk, it also leaves your organization vulnerable to non-compliance sanctions.

2. Procedures

The HIPAA Privacy Rule makes two methods available for de-identifying health information:

  1. Remove the 18 specific identifiers listed in the Privacy Rule and determine there is no other information that may identify the individual (Khaled and Deven went into detail about the related issues during the webinar).  Or,
  2. Obtain an opinion from a qualified statistical expert that the risk of identifying an individual is very small under the circumstances; the methods and justification for the opinion should be documented.
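As a rough illustration of the first method, here is a minimal Python sketch of the mechanical part: dropping fields that hold listed identifiers and reducing a date to its year. The field names (`name`, `ssn`, `birth_date`, and so on) are hypothetical, and only a handful of the 18 identifier categories are shown. A step like this does not complete the method by itself; someone still has to determine that nothing left in the data may identify the individual.

```python
# Hypothetical field names mapped to a few of the 18 identifier categories.
# A real record layout, and the full category list, will differ.
DIRECT_IDENTIFIER_FIELDS = {
    "name": "names",
    "street_address": "geographic subdivisions smaller than a state",
    "phone": "telephone numbers",
    "email": "email addresses",
    "ssn": "Social Security numbers",
    "medical_record_number": "medical record numbers",
}


def deidentify(record):
    """Return a copy of the record with listed identifier fields removed.

    Dates are identifiers except for the year, so a YYYY-MM-DD
    'birth_date' (assumed format) is reduced to a 'birth_year'.
    """
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    if "birth_date" in clean:
        clean["birth_year"] = clean.pop("birth_date")[:4]
    return clean


record = {
    "name": "Pat Doe",
    "ssn": "000-00-0000",
    "birth_date": "1970-05-17",
    "diagnosis_code": "E11.9",
}
print(deidentify(record))
```

The original record is left untouched, which makes it easier to keep the identified source under its existing controls while the de-identified copy moves on.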

Your de-identification procedures should provide the details to support which of these methods you will use, the areas within your organization that are qualified to do the actual de-identification, and, if you don’t have in-house expertise, the vendors that have been approved to do such activities. You need to have these details documented so your staff will know the parameters within which they must perform de-identification, and also to help ensure de-identification is done consistently within your organization. You don’t want someone saying, “Hey, Pat has a math degree; that qualifies him as a statistician. Let’s just let him sign off on what we did.”

3. Responsibility

As mentioned in the Policies section, you need to ensure someone in your organization has been assigned primary responsibility for overseeing all de-identification activities, and that they will ensure only authorized de-identification methods are used, that the de-identified data is properly secured, that it is not inappropriately shared, and so on. This should be formally documented, and ideally included in the position’s job description.

4. Training and Awareness

There will be many people interested in accessing and using de-identified data. Marketers may view it as a goldmine of marketing possibilities, and business leaders may view it as a commodity they can sell to others to bring in a new revenue stream; I’ve had such comments made to me at conferences this fall. Researchers may view it as another data set they can combine with other “Big Data” repositories to expand their research activities, sometimes with outside entities. Some people may decide they will just de-identify data on their own. To help prevent bad actions resulting from these types of misunderstandings and a simple lack of knowledge of de-identification requirements, you must provide training that clearly and effectively communicates the de-identification policies and procedures you’ve established for your organization, and then send ongoing awareness communications to remind those interested in de-identification of your organization’s requirements. Folks will not know your de-identification requirements without training; such knowledge is not innate. Provide the training, then follow up with reminders to make sure your workers are doing the appropriate activities. Training and awareness are too often omitted; don’t make this mistake. It could lead to costly consequences and sanctions.

5. Access Controls

What access controls are you going to use to ensure that only those with a business need can access the de-identified data? Who are the ones who actually need access for business purposes? A mistake many organizations make is that they assume everyone, even the public, can access data because there are no personal data items involved. This is a dangerous assumption. Identify the positions and individuals who actually need access for business responsibilities. You need to work with your information security area to determine the options available for then implementing these access controls.
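To make the "business need" idea concrete, here is a minimal Python sketch of a deny-by-default check against a documented list of roles. The role names and the `User` type are hypothetical; in practice your information security area would map this onto whatever access control system you already run.

```python
from dataclasses import dataclass

# Hypothetical roles with a documented business need for the data.
AUTHORIZED_ROLES = frozenset({"research_analyst", "privacy_officer"})


@dataclass(frozen=True)
class User:
    username: str
    role: str


def can_access_deidentified_data(user, authorized_roles=AUTHORIZED_ROLES):
    """Deny by default: only roles on the documented list get access."""
    return user.role in authorized_roles


print(can_access_deidentified_data(User("alee", "research_analyst")))
print(can_access_deidentified_data(User("bkim", "marketing_intern")))
```

The point of the design is that nobody, including the public, gets access simply because the data set contains no direct personal data items; a role has to be placed on the list deliberately.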

6. Logical and Physical Controls

What logical and physical controls are you going to use to safeguard the de-identified data? You need to work with your information security area to determine the options available. They will vary based upon how your de-identified data is created, where your de-identified data is created, and where it is stored.  You need to think about whether the data will be stored on endpoint devices, such as laptops, USB drives and so on. You need to determine if you will allow for access to de-identified data from remote locations. You need to think about this topic carefully, and determine the controls that are best for your environment. Again, work closely with your information security department to identify the best solutions for most appropriately mitigating the associated risks.

7. Retention and Destruction

In speaking with many covered entities (CEs) and BAs about de-identification, I’ve found they have almost always overlooked the retention issue. The longer you retain data, the more likely it is to be used inappropriately as the business changes over time, as staff changes, and so on. Decide how long you need to retain de-identified data after you stop using it for the purposes for which it was created, document this retention time, and establish procedures for destroying the data, in all forms and in all storage areas, when the retention time expires.
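A documented retention time is easy to turn into a routine check. The Python sketch below assumes you record the date a data set was last used for its original purpose and a retention period in days; both the function names and the 365-day figure are illustrative assumptions, not recommendations.

```python
from datetime import date, timedelta


def destruction_due_date(last_used, retention_days):
    """Date on which the documented retention time expires."""
    return last_used + timedelta(days=retention_days)


def is_due_for_destruction(last_used, retention_days, today):
    """True once the retention time has expired as of 'today'."""
    return today >= destruction_due_date(last_used, retention_days)


last_used = date(2013, 1, 29)
print(destruction_due_date(last_used, 365))
print(is_due_for_destruction(last_used, 365, today=date(2013, 6, 1)))
```

A check like this only flags what is due; the destruction procedure itself still has to cover every form and every storage area in which the data exists.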

8. Consent

When you cannot remove all PHI elements, you won’t have de-identified data, but you may have a limited data set, as defined under HIPAA, with whose requirements covered entities (CEs) and BAs must comply. Make sure you are aware of these situations, and establish procedures for obtaining consents from the individuals involved when necessary, for activities such as marketing and research.

9. Risk Assessment

Perform a risk analysis to determine the reasonable likelihood of re-identification. The higher the risk of re-identification, the more security controls you will need to implement and the more restricted access to the data will need to be. The lower the risk, the fewer security controls are needed and the larger the audience that can be given access. And, as noted under Procedures, it is a good idea to get the opinion of an expert statistician.
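One simple way to get a first, rough feel for re-identification risk is to count how many records share the same combination of indirect identifiers, an approach usually associated with k-anonymity. This is my own illustrative assumption of a technique, not a substitute for an expert determination, and the field names in this Python sketch are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same values
    for the given indirect (quasi-) identifier fields."""
    if not records:
        raise ValueError("no records to assess")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def worst_case_risk(records, quasi_identifiers):
    """Worst-case chance of singling out a record: 1 / smallest group."""
    return 1 / smallest_group_size(records, quasi_identifiers)


records = [
    {"birth_year": "1970", "state": "IA"},
    {"birth_year": "1970", "state": "IA"},
    {"birth_year": "1985", "state": "IA"},
    {"birth_year": "1985", "state": "IA"},
]
print(worst_case_risk(records, ["birth_year", "state"]))
```

A smallest group of one means at least one record is unique on those fields, which is a signal to tighten controls, restrict access further, or generalize the data before sharing it.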

10. Contractual Requirements

When sharing de-identified data with business partners, business associates, and other third parties, make sure your contracts include directives and restrictions prohibiting any actions that would re-identify the de-identified data you’ve entrusted to them. Several of the business associates I’ve worked with have had very limited knowledge and understanding of HIPAA and HITECH, and they believed that once they obtained de-identified data they could use it in any way they wanted, since it is no longer considered protected health information (PHI). Some had actually established ways to sell it to marketing organizations to combine with other data with the intent to reveal specific individuals. Make sure you address this issue in your contractual requirements.

11. Third Parties

When you share data with third parties, your responsibility for that data follows it, to varying extents. This is an important topic that deserves extended discussion another day. However, we have seen how breaches at business associates (BAs) have resulted in sanctions against the covered entities that entrusted their data to those BAs. Make sure that the third parties to which you entrust de-identified data have policies that, at a minimum, meet your own organization’s policy requirements. Usually they will need even more, if they are also obtaining data from other CEs.

12. Audits

Do audits. Do privacy impact assessments. Do not assume that the mere existence of policies and procedures is enough due diligence. Their existence is important, but it is not enough on its own. You need to make sure they are actually being put into practice.

Bottom line for all organizations, from the largest to the smallest:  If you are thinking about using de-identified information, be sure to establish a framework around which you will create, use and retain that data, along with the associated safeguards. Remember: De-identified data still needs to have security controls applied, just not at the same levels as personal information.

Additional information about data de-identification

Here are some other great sources of information about de-identification:

This post was written as part of the IBM for Midsize Business (http://goo.gl/S6P7m) program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

Cross-posted from Privacy Professor
