Responsible Data Forum: Discussion on the Risks and Mitigations of Releasing Data
The Responsible Data Forum held a discussion on 26 August 2015 on the risks and mitigations of releasing data.
The discussion focused on how to adopt adequate measures to mitigate risks to people and communities when data is prepared for release or sharing.
The discussion covered the following questions:
- What is risk: risks in releasing development data and personally identifiable information (PII)
- What kinds of risks are there?
- Risk to whom?
- Risks in dealing with PII, discussed by way of several examples
- What is missing from the world?
The starting point is that anyone who creates a dataset is responsible for ensuring that no harm comes to the people connected to it. A balance must be struck between making good use of the data on the one hand and protecting data subjects, sources and managers on the other.
Risk was defined as the “probability of something happening multiplied by the resulting cost or benefit if it does” (Oxford English Dictionary). It therefore rests on three elements: a cost or benefit, a probability, and a subject. All possible risks must be considered and assessed in terms of how much harm would result and how likely that harm is to occur.
An example given in this context was that of the Syrian government, where bakeries were targeted because the bombers knew where they were located, making them easy targets. The example showed that local context is an important consideration in any secure data release mechanism.
Another example of bad practice was the leak of information in the Ashley Madison case, after which several people were reported to have committed suicide.
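The definition of risk as probability multiplied by cost can be sketched in a few lines of Python. This is only an illustration, not a method presented at the discussion: the `Risk` class, the `rank_risks` helper, the impact scale and all of the numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One possible harm from releasing a dataset (all values are illustrative)."""
    description: str
    subject: str        # who bears the harm
    probability: float  # estimated chance of the harm occurring, 0.0 to 1.0
    impact: float       # estimated cost if it occurs, on a scale chosen by the assessor

    def score(self) -> float:
        # Risk as defined in the text: probability multiplied by the resulting cost.
        return self.probability * self.impact


def rank_risks(risks):
    """Return the risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score(), reverse=True)


# Hypothetical entries; the estimates are invented for the example.
risks = [
    Risk("Location data reveals sensitive sites", "data subjects", 0.25, 8.0),
    Risk("Publisher identified as the source", "person releasing the data", 0.5, 2.0),
]
ranked = rank_risks(risks)
for risk in ranked:
    print(f"{risk.score():.2f}  {risk.subject}: {risk.description}")
```

A ranking like this only orders the risks that someone has thought to list, so it complements rather than replaces the step of considering all possible risks and the local context.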
- Kinds of risk:
- Physical harm:
The next point of discussion was the kinds of physical risk that data subjects face when data relating to them is released or shared. Some of them were:
- i. security issues
- ii. hate speech
- iii. voter issues
- iv. police action
Hence the risk around PII runs both ways: some choose to run the risk of being identified through their PII, while others run the risk of being identified as the person who released the information.
- Legal harms: to illustrate the legal harms that releasing or sharing data can pose, an example was discussed of an image-marking exercise on a military camp, in which people joined in, marked military equipment and discovered people who were from that country.
- Reputational harm, primarily to organizations.
- Privacy breach, which can lead to all sorts of harms.
- Risk to whom?
Data subjects, as well as:
- i. Data collectors
- ii. Data processing team
- iii. Person releasing the data
- iv. Person using the data
Also, the likelihood of risk ranges from low through medium to high. At worst, we as a community are at risk.
- PII:
- Any data that can be used to identify a specific individual. Such information includes not only names, addresses or phone numbers but also data sets that do not in themselves identify an individual.
For example, in some places a social security number must be shared for an HIV status check-up; hence, one needs to be aware of the environment of data sets that the data goes into. Similarly, where the population is small and the people of a street, village or town are identified by religion, even such a data set can put them at risk.
Hence, awareness of demographics is important: one should ascertain how many people reside in the place, understand the environment, and decide accordingly what data set should be created.
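The point about small populations can be made concrete with a simple count. The sketch below is not something presented at the discussion; it is a k-anonymity-style check that flags any combination of attributes shared by fewer than `k` records. The function name `small_groups`, the field names, the threshold and the sample records are all hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return attribute combinations shared by fewer than k records.

    A combination held by only a handful of people can single them out
    even when the data set contains no names.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records: place of residence plus religion, with no names at all.
records = [
    {"village": "A", "religion": "X"},
    {"village": "A", "religion": "X"},
    {"village": "A", "religion": "X"},
    {"village": "A", "religion": "Y"},
    {"village": "B", "religion": "X"},
    {"village": "B", "religion": "X"},
    {"village": "B", "religion": "X"},
]
risky = small_groups(records, ["village", "religion"], k=3)
print(risky)
```

Here the single person of religion Y in village A is flagged. Choosing `k` and deciding which fields count as identifying both depend on local knowledge, and passing the check does not show that a release is safe.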
- Another way to mitigate risk when releasing or sharing data is partial release to selected groups only, for example to academics for research purposes or to the data subjects themselves.
- Several examples were discussed to show how the irresponsible release of data has affected data subjects and why work is needed to mitigate the harm caused in such cases.
Example: in the New York City taxi case, data about every taxi ride was released, including pickup and drop-off locations, times and fares. This becomes especially problematic when, for instance, someone has visited a strip club: once riders are re-identified, people need protection against such insinuations.
This shows how data sets can lead to re-identification even when none was intended. Hence, the actors involved must understand their responsibilities when collecting or releasing data and mitigate the associated risks accordingly.
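One common way to lower this kind of re-identification risk is to publish less precise values. The sketch below is an assumption-laden illustration rather than a description of how the taxi data was or should have been handled: the record layout, the field names and the sample values are invented, and `coarsen_ride` is a hypothetical helper.

```python
from datetime import datetime


def coarsen_ride(ride, decimals=2):
    """Return a reduced copy of a ride with less precise location and time.

    Coordinates are rounded, the pickup time is truncated to the hour,
    and fields not listed here (such as the fare) are dropped.
    """
    pickup_time = datetime.fromisoformat(ride["pickup_time"])
    return {
        "pickup_lat": round(ride["pickup_lat"], decimals),
        "pickup_lon": round(ride["pickup_lon"], decimals),
        "pickup_hour": pickup_time.replace(minute=0, second=0, microsecond=0).isoformat(),
    }


# Invented sample record, not taken from any real release.
ride = {
    "pickup_lat": 40.75812,
    "pickup_lon": -73.98552,
    "pickup_time": "2013-01-01T23:47:10",
    "fare": 12.5,
}
coarse = coarsen_ride(ride)
print(coarse)
```

Coarsening reduces the detail available for matching a ride to a person but does not guarantee anonymity, particularly in sparsely populated areas or at unusual hours.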
- A concern was raised over the collection and processing of information about genetic diseases in a small population, since in practice it is not possible to guarantee that the data subjects' information will not be released, exposed or re-identified. Although experts would make their best efforts, realistically no one can guarantee people that they will not be identified. Informing people of such risks is therefore crucial. One suggested way of mitigating risk is to involve the people concerned and let them know. Awareness of the potential impact of a data breach or of identification is very important.
- Another factor for consideration is the context in which the information was collected, which tends to change over time. For example, many human rights funders want information on their websites changed or removed as contexts, circumstances and situations change. Here too, the collection and release of data, and the associated risks, need to be reassessed as the context changes.
- What is missing from the world?
Although risks have been recognized, and that recognition is an ongoing process, what is missing from the world are uniform guidelines, rules or laws. There are no policies on informed consent, nor any means of mitigating risks collectively in a uniform manner. The principles of necessity, proportionality and informed consent must be adopted.