Committee for the Protection of Human Subjects
Policy on the
Security of Research Subjects' Personally Identifiable Data Held by Researchers

1. Policy Statement

People who volunteer to participate as subjects in research do so with the understanding that the researcher(s) will protect their identity and the information that is obtained from them from inadvertent or inappropriate disclosure. The principle that the Committee for the Protection of Human Subjects (CPHS) upholds in assessing the benefits and risks of the research is codified in the Belmont Report as beneficence: the obligation to do no harm and, moreover, to maximize possible benefits and minimize possible harms to human subjects. As benefits and risks may be reflected in protections for privacy and confidentiality, all human subject research protocols must have in place an acceptable, effective and documented procedure for the protection of identifiable and/or confidential information before the protocol will be approved, granted continuing approval, or determined exempt from full review by CPHS.

2. Purpose

This policy exists to reiterate and clarify existing CPHS requirements for researchers to take appropriate data security measures to protect the identity of, and/or confidential information obtained about, living people when they participate as subjects in research.

3. Scope

This policy applies to all human subject research reviewed by CPHS and conducted by or under the auspices of University of California, Berkeley (UCB) faculty, graduate students, or other affiliated researchers (investigators), or conducted using UCB resources. The pertinent information or data containing personally identifiable information may be (or has been) collected or stored in any form such as electronic, digital, paper, audio or video tape. This information or data may be stored on computers or equipment that is privately owned or university-owned or -maintained, or may reside on removable electronic media, in either case located on university premises or elsewhere.

4. Definitions

A research data set constitutes a body of data elements collected in the course of research with living human beings.

Personal identifiers within a data set are any data elements that singly or in combination can uniquely identify an individual, such as a social security number, name, address, demographic information (e.g. combining gender, race, job and location), student identification numbers or other identifiers (e.g. hospital patient numbers).
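The point that data elements can identify an individual "in combination" can be illustrated with a short sketch. The function below is an illustration only, not a CPHS-prescribed test: it lists the combinations of selected fields that occur exactly once in a data set. The function name and the field names used in the example are hypothetical. Uniqueness within one data set is only a rough indicator; a combination that is not unique here may still identify a person when matched against outside information.

```python
from collections import Counter


def unique_combinations(records, fields):
    """Return the value combinations of `fields` that occur exactly once.

    `records` is a list of dicts; `fields` is a list of hypothetical
    field names such as "gender", "job" and "location".
    """
    # Count how many records share each combination of values.
    counts = Counter(tuple(r.get(f) for f in fields) for r in records)
    # A combination seen only once points at a single record.
    return [combo for combo, n in counts.items() if n == 1]


if __name__ == "__main__":
    sample = [
        {"gender": "F", "job": "nurse", "location": "Oakland"},
        {"gender": "F", "job": "nurse", "location": "Oakland"},
        {"gender": "M", "job": "pilot", "location": "Albany"},
    ]
    print(unique_combinations(sample, ["gender", "job", "location"]))
```

Any combination returned by such a check would need to be treated as a personal identifier under the definition above.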

A de-identified data set refers to original data that subsequently has been stripped of all elements (including but not limited to personal identifiers) that might enable a reasonably informed and determined person to deduce the identity of the subject. For research that requires that data elements later be linked to an individual's identity, the original data set may be partitioned into two data sets: a de-identified data set and an identity-only data set. The latter should contain any and all personal identity information absolutely necessary for future conduct of the research. For purposes of later merging the identity information with other research data, a researcher-assigned identity code (typically a randomly generated number) that is associated with and unique to each specific individual may be included in both data sets, and later be used to link identity data elements back to the de-identified data set. This identity code should not offer any clue as to the identity of an individual.
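The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a required procedure: it assumes one record per individual, that the researcher has already decided which fields are personal identifiers, and that a random hexadecimal string is an acceptable identity code. The function names, the "identity_code" field name and the sample fields are all hypothetical.

```python
import secrets


def partition_records(records, identifier_fields):
    """Split records into a de-identified set and an identity-only set.

    Assumes one record per individual. Each individual receives a random
    identity code that appears in both output sets.
    """
    deidentified, identity_only = [], []
    used_codes = set()
    for record in records:
        # Draw a random code that offers no clue about the individual.
        code = secrets.token_hex(8)
        while code in used_codes:  # extremely unlikely, but keep codes unique
            code = secrets.token_hex(8)
        used_codes.add(code)
        identity = {k: v for k, v in record.items() if k in identifier_fields}
        research = {k: v for k, v in record.items() if k not in identifier_fields}
        identity_only.append({"identity_code": code, **identity})
        deidentified.append({"identity_code": code, **research})
    return deidentified, identity_only


def relink(deidentified, identity_only):
    """Merge the two sets back together using the shared identity code."""
    by_code = {row["identity_code"]: row for row in identity_only}
    return [{**by_code[row["identity_code"]], **row} for row in deidentified]


if __name__ == "__main__":
    raw = [
        {"name": "Subject One", "ssn": "000-00-0001", "score": 5},
        {"name": "Subject Two", "ssn": "000-00-0002", "score": 7},
    ]
    research_set, identity_set = partition_records(raw, {"name", "ssn"})
    print(research_set)
    print(identity_set)
```

The code is drawn from a random source rather than derived from the name or any other identifier, so the code by itself reveals nothing about the subject. The identity-only set would then be stored as this policy requires for identity-only data.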

Secure location refers to a place (room, file cabinet, etc.) for storing a removable medium, computer or equipment on which reside data sets with personal identifiers, and to which only the principal (or lead) investigator has access through lock and key (either physical or electronic keys are acceptable). Access may be provided to other parties with a legitimate need, consistent with the policies below and as disclosed in the research protocol.

Secure data encryption refers to the algorithmic transformation of a data set to an unrecognizable form from which the original data set or any part thereof can be recovered only with knowledge of a secret decryption key.

5. Specific Policies

We recognize that not all research data sets can reasonably be de-identified (for example, an audio-recorded interview in which a subject identifies himself or herself). In this case, the original research data set must be considered an identified data set and treated accordingly. Identified and identity-only data sets should always be stored in: a) a secure location, or b) secure data-encrypted form.

5.1 Collect the minimum identity data needed and describe in the research protocol exactly what personally identifiable data elements will be collected, and whether the data set will be de-identified, split into a de-identified data set and an identity-only data set, or neither.

5.2 De-identify data as soon as possible after collection and/or separate identifiable elements (create identity key, destroy raw data).

5.3 Limit access to any identified or identity-only data set and store it in a secure (locked) location separate from the other research data, or store it in encrypted form, or both. Encrypted form is the only acceptable storage for data stored in a computer or removable medium which is not permanently located in a secure location (e.g. a laptop computer or a removable disk which is to be carried in a briefcase) or for transmission across the network (for example as an email attachment).
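The storage rule in 5.3 can be read as a simple decision, sketched below for illustration. This is one interpretation of the text, not an official compliance test, and the function and parameter names are hypothetical.

```python
def storage_is_acceptable(permanently_in_secure_location, encrypted,
                          transmitted_over_network=False):
    """Apply one reading of policy 5.3 to an identified or identity-only data set.

    Data that is transmitted over a network, or held on a device or medium
    that is not permanently kept in a secure location, must be encrypted.
    Otherwise a secure location, encryption, or both is acceptable.
    """
    if transmitted_over_network or not permanently_in_secure_location:
        return encrypted
    return True


if __name__ == "__main__":
    # Laptop carried in a briefcase, unencrypted: not acceptable.
    print(storage_is_acceptable(False, False))
    # Desktop in a locked office, unencrypted, never transmitted: acceptable.
    print(storage_is_acceptable(True, False))
    # Email attachment, encrypted: acceptable.
    print(storage_is_acceptable(True, True, transmitted_over_network=True))
```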

5.4 The investigator shall develop and disclose to CPHS a written plan stating which individuals will have legitimate access to an identified or identity-only data set, either through access to a secure location key or to a decryption key. This plan must include provision for recovery of a lost decryption key, to ensure that a data set cannot be permanently lost.

5.5 When an identified or identity-only data set is stored on a personal or university-owned or -maintained computer, investigators are strongly encouraged to ensure that this computer is professionally administered and managed. If this is not possible, investigators should disclose this and provide CPHS with a plan for how the sensitive data will otherwise be secured.

The opportunity for human error should be reduced through: a) limiting the number of people (both users and administrators) with access to the data and ensuring their expertise and trustworthiness; and/or b) using automatic (embedded) security measures (such as storing data on a non-volatile medium only in secure data-encrypted form) that are professionally installed and administered. If this computer is connected to the campus network or to the public Internet, the professional administrator of the computer shall ensure that it complies with all minimum standards for network and data security listed below.

5.6 For existing research data that is not stored in a manner compliant with the above policies, the lead investigator must take immediate steps to comply with these policies by April 1, 2005.

5.7 All new protocols and continuing renewals submitted as of April 1, 2005 must include for review and approval by CPHS a detailed plan for data security for all affected CPHS protocols.

6. Related Policies

7. Summary of the Acceptable Security Measures for Maintaining Personally Identifiable Information for Research Purposes

The level of security necessary is relative to the risk posed to the subject should personally identifiable data be inadvertently released or released as a result of malfeasance. In an effort to ensure best practice it is always more desirable to have a higher level of security than to risk operating at a minimal standard. CPHS has the authority to decide if the security plan to protect subjects' confidentiality or anonymity is acceptable. For data that retains identifiers, the protocol must describe adequate administrative, physical and technical safeguards. Investigators are encouraged to consult with appropriate information technology and security experts such as their system administrators to develop appropriate data security plans when working with personally identifiable data.

8. Responsible Administrative Officer

Director, Office for the Protection of Human Subjects
2150 Shattuck Avenue, Suite 313
Berkeley, CA 94704-5940
510/642-7461

Last Policy Revision: 3-9-05
Last Copyedit: 10-20-05
