Privacy after Big Data: Compilation of Early Research
Download the Compilation (PDF)
Privacy after Big Data
Evolving data science, technologies, techniques, and practices, including big data, are enabling shifts in how the public and private sectors carry out their functions and responsibilities, deliver services, and facilitate innovative production and service models to emerge. For example, in the public sector, the Indian government has considered replacing the traditional poverty line with targeted subsidies based on individual household income and assets. The my.gov.in platform is aimed to enable participation of the connected citizens, to pull in online public opinion in a structured manner on key governance topics in the country. The 100 Smart Cities Mission looks forwards to leverage big data analytics and techniques to deliver services and govern citizens within city sub-systems. In the private sector, emerging financial technology companies are developing credit scoring models using big, small, social, and fragmented data so that people with no formal credit history can be offered loans. These models promote efficiency and reduction in cost through personalization and are powered by a wide variety of data sources including mobile data, social media data, web usage data, and passively collected data from usages of IoT or connected devices.
These data technologies and solutions are enabling business models that are based on the ideals of ‘less’: cash-less, presence-less, and paper-less. This push towards an economy premised upon a foundational digital ID in a prevailing condition of absent legal frameworks leads to substantive loss of anonymity and privacy of individual citizens and consumers vis-a-vis both the state and the private sector. Indeed, the present use of these techniques run contrary to the notion of the ‘sunlight effect’ - making the individual fully transparent (often without their knowledge) to the state and private sector, while the algorithms and means of reaching a decision are opaque and inaccessible to the individual.
These techniques, characterized by the volume of data processed, the variety of sources data is processed from, and the ability to both contextualize - learning new insights from disconnected data points - and de-contextualize - finding correlation rather than causation - have also increased the value of all forms of data. In some ways, big data has made data exist on an equal playing field as far as monetisation and joining up are concerned. Meta data can be just as valuable to an entity as content data. As data science techniques evolve to find new ways of collecting, processing, and analyzing data - the benefits of the same are clear and tangible, while the harms are less clear, but significantly present.
Is it possible for an algorithm to discriminate? Will incorrect decisions be made based on data collected? Will populations be excluded from necessary services if they do not engage with certain models or do emerging models overlook certain populations? Can such tools be used to surveil individuals at a level of granularity that was formerly not possible and before a crime occurs? Can such tools be used to violate rights – for example target certain types of speech or groups online? And importantly, when these practices are opaque to the individual, how can one seek appropriate and effective remedy.
Traditionally, data protection standards have defined and established protections for certain categories of data. Yet, data science techniques have evolved beyond data protection principles. It is now infinitely harder to obtain informed consent from an individual when data that is collected can be used for multiple purposes by multiple bodies. Providing notice for every use is also more difficult – as is fulfilling requirements of data minimization. Some say privacy is dead in the era of big data. Others say privacy needs to be re-conceptualized, while others say protecting privacy now, more than ever, requires a ‘regulatory sandbox’ that brings together technical design, markets, legislative reforms, self regulation, and innovative regulatory frameworks. It also demands an expanding of the narrative around privacy – one that has largely been focused on harms such as misuse of data or unauthorized collection – to include discrimination, marginalization, and competition harms.
In this compilation we have put together a series of articles that we have developed as we explore the impacts – positive and negative – of big data. This includes looking at India’s data protection regime in the context of big data, reviewing literature on the benefits of harms of big data, studying emerging predictive policing techniques that rely on big data, and analyzing closely the impact of big data on specific privacy principles such as consent. This is a growing body of research that we are exploring and is relevant to multiple areas of our work including privacy and surveillance. Feedback and comments on the compilation are welcome and appreciated.
Elonnai Hickok
Director - Internet Governance