Benefits and Harms of "Big Data"

Posted by Scott Mason at Dec 30, 2015 02:48 AM

Filed under: Internet Governance, Big Data

Today the quantity of data being generated is expanding at an exponential rate. From smartphones and televisions, trains and airplanes, sensor-equipped buildings and even the infrastructures of our cities, data now streams constantly from almost every sector and function of daily life.

Introduction

In 2011 it was estimated that the quantity of data produced globally would surpass 1.8 zettabytes[1]. By 2013 that had grown to 4 zettabytes[2], and with the nascent development of the so-called 'Internet of Things' gathering pace, these trends are likely to continue. This expansion in the volume, velocity, and variety of data available[3], together with the development of innovative forms of statistical analytics, is generally referred to as "Big Data", though there is no single agreed-upon definition of the term. Although still in its initial stages, Big Data promises to provide new insights and solutions across a wide range of sectors, many of which would have been unimaginable even 10 years ago.

Despite enormous optimism about the scope and variety of Big Data's potential applications, however, many remain concerned about its widespread adoption, with some scholars suggesting it could generate as many harms as benefits[4]. Most notably these have included concerns about the inevitable threats to privacy associated with the generation, collection and use of large quantities of data[5]. However, concerns have also been raised regarding, for example, the lack of transparency around the design of algorithms used to process the data, over-reliance on Big Data analytics as opposed to traditional forms of analysis, and the creation of new digital divides, to name just a few.

The existing literature on Big Data is vast; however, many of the benefits and harms identified by researchers tend to relate to sector-specific applications of Big Data analytics, such as predictive policing or targeted marketing. Whilst these examples can be useful in demonstrating the diversity of Big Data's possible applications, it can nevertheless be difficult to gain an overall perspective of the broader impacts of Big Data as a whole. As such, this article will seek to disaggregate the potential benefits and harms of Big Data, organising them into several broad categories which are reflective of the existing scholarly literature.

What are the potential benefits of Big Data?

From politicians to business leaders, recent years have seen Big Data confidently proclaimed as a potential solution to a diverse range of problems, from world hunger and disease to government budget deficits and corruption. But if we look beyond the hyperbole and headlines, what do we really know about the advantages of Big Data? Given the current buzz surrounding it, the existing literature on Big Data is perhaps unsurprisingly vast, providing innumerable examples of the potential applications of Big Data from agriculture to policing. However, rather than try (and fail) to list the many possible applications of Big Data analytics across all sectors and industries, for the purposes of this article we have instead attempted to distil the various advantages of Big Data discussed within the literature into the following five broad categories: Decision-Making, Efficiency & Productivity, Research & Development, Personalisation and Transparency, each of which will be discussed separately below.

Decision-Making

Whilst data analytics have always been used to improve the quality and efficiency of decision-making processes, the advent of Big Data means that the areas of our lives in which data-driven decision-making plays a role are expanding dramatically as businesses and governments become better able to exploit new data flows. Furthermore, the real-time and predictive nature of decision-making made possible by Big Data is increasingly allowing these decisions to be automated. As a result, Big Data is providing governments and businesses with unprecedented opportunities to create new insights and solutions, becoming more responsive to new opportunities and better able to act quickly - and in some cases preemptively - to deal with emerging threats.

This ability of Big Data to speed up and improve decision-making processes can be applied across all sectors from transport to healthcare and is often cited within the literature as one of the key advantages of Big Data. Joh, for example, highlights the increased use of data-driven predictive analysis by police forces to help them to forecast the times and geographical locations in which crimes are most likely to occur. This allows a force to redistribute its officers and resources according to anticipated need, and in certain cities has been highly effective in reducing crime rates[6]. Raghupathi meanwhile cites the case of healthcare, where predictive modelling driven by Big Data is being used to proactively identify patients who could benefit from preventative care or lifestyle changes[7].

One area in particular where the decision-making capabilities of Big Data are having a significant impact is in the field of risk management[8]. For instance, Big Data can allow companies to map their entire data landscape to help detect sensitive information, such as 16-digit numbers (potentially credit card data) which are not being stored according to regulatory requirements, and to intervene accordingly. Similarly, detailed analysis of data held about suppliers and customers can help companies to identify those in financial trouble, allowing them to act quickly to minimize their exposure to any potential default[9].
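To make the 16-digit example more concrete, the following is a minimal sketch of how such a scan might work. It is not drawn from the cited sources: the pattern `CANDIDATE`, the function names and the use of the Luhn checksum to filter out numbers that cannot be card numbers are all assumptions made for illustration, and a real compliance tool would need to handle many more formats.

```python
import re

# Hypothetical pattern: 16 digits, optionally grouped in fours by spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like_numbers(text: str) -> list[str]:
    """Return 16-digit sequences in text that also pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Running `find_card_like_numbers` over free-text fields such as notes or logs would flag candidate records for review. The checksum step is a design choice to reduce false positives from order numbers and similar identifiers; it cannot confirm that a flagged value is a real card number.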

Efficiency and Productivity

In an era when many governments and businesses are facing enormous pressures on their budgets, the desire to reduce waste and inefficiency has never been greater. By providing the information and analysis needed for organisations to better manage and coordinate their operations, Big Data can help to alleviate such problems, leading to the better utilization of scarce resources and a more productive workforce [10].

Within the literature such efficiency savings are most commonly discussed in relation to reductions in energy consumption[11]. For example, a report published by Cisco notes how the city of Oslo has managed to reduce the energy consumption of street lighting by 62 percent through the use of smart solutions driven by Big Data[12]. Increasingly, however, statistical models generated by Big Data analytics are also being utilized to identify potential efficiencies in sourcing, scheduling and routing in a wide range of sectors from agriculture to transport. For example, Newell observes how many local governments are generating large databases of scanned license plates through the use of automated license plate recognition (ALPR) systems, which government agencies can then use to help improve local traffic management and ease congestion[13].

Commonly these efficiency savings are only made possible by the often counter-intuitive insights generated by the Big Data models. For example, whilst a human analyst planning a truck route would always tend to avoid 'drive-bys' - bypassing one stop to reach a third before doubling back - Big Data insights can sometimes show such routes to be more efficient. In such cases efficiency savings of this kind would in all likelihood have gone unrecognised by a human analyst not trained to look for such patterns[14].

Research, Development, and Innovation

Perhaps one of the most intriguing benefits of Big Data is its potential use in the research and development of new products and services. As is highlighted throughout the literature, Big Data can help businesses to gain an understanding of how others perceive their products or identify customer demand and adapt their marketing or indeed the design of their products accordingly[15]. Analysis of social media data, for instance, can provide valuable insights into customers' sentiments towards existing products as well as discover demands for new products and services, allowing businesses to respond more quickly to changes in customer behaviour[16].

In addition to market research, Big Data can also be used during the design and development stage of new products, for example by helping to test thousands of different variations of computer-aided designs in an expedient and cost-effective manner. In doing so, businesses and designers are able to better assess how minor changes to a product's design may affect its cost and performance, thereby improving the cost-effectiveness of the production process and increasing profitability.

Personalisation

For many consumers, perhaps the most familiar application of Big Data is its ability to help tailor products and services to meet their individual preferences. This phenomenon is most immediately noticeable on many online services such as Netflix, where data about users' activities and preferences is collated and analysed to provide a personalised service, for example by suggesting films or television shows the user may enjoy based upon their previous viewing history[17]. By enabling companies to generate in-depth profiles of their customers, Big Data allows businesses to move past the 'one size fits all' approach to product and service design and instead quickly and cost-effectively adapt their services to better meet customer demand.

In addition to service personalisation, similar profiling techniques are increasingly being utilized in sectors such as healthcare. Here data about a patient's medical history, lifestyle, and even their gene expression patterns are collated, generating a detailed medical profile which can then be used to tailor treatments to meet their specific needs[18]. Targeted care of this sort can not only help to reduce costs, for example by helping to avoid over-prescriptions, but may also help to improve the effectiveness of treatments and so ultimately their outcome.

Transparency

If 'knowledge is power', then - so say Big Data enthusiasts - advances in data analytics and the quantity of data available can give consumers and citizens the knowledge to hold governments and businesses to account, as well as make more informed choices about the products and services they use. Nevertheless, data (even lots of it) does not necessarily equal knowledge. In order for citizens and consumers to be able to fully utilize the vast quantities of data available to them, they must first have some way to make sense of it. For some, Big Data analytics provides just such a solution, allowing users to easily search, compare and analyze available data, thereby helping to challenge existing information asymmetries and make business and government more transparent[19].

In the private sector, Big Data enthusiasts have claimed that Big Data holds the potential to ensure complete transparency of supply chains, enabling concerned consumers to trace the source of their products, for example to ensure that they have been sourced ethically[20]. Furthermore, Big Data is now making accessible information which was previously unavailable to average consumers and challenging companies whose business models rely on the maintenance of information asymmetries. The real-estate industry, for example, relies heavily upon its ability to acquire and control proprietary information, such as transaction data, as a competitive asset. In recent years, however, many online services have allowed consumers to effectively bypass agents by providing alternative sources of real-estate data and enabling prospective buyers and sellers to communicate directly with each other[21]. Therefore, by providing consumers with access to large quantities of actionable data, Big Data can help to eliminate established information asymmetries, allowing consumers to make better and more informed decisions about the products they buy and the services they enlist.

This potential to harness the power of Big Data to improve transparency and accountability can also be seen in the public sector, with many scholars suggesting that greater access to government data could help to stem corruption and make politics more accountable. This view was recently endorsed by the UN, which highlighted the potential uses of Big Data to improve policymaking and accountability in a report published by the Independent Expert Advisory Group on the "Data Revolution for Sustainable Development". In the report, experts emphasize the potential of what they term the 'data revolution' to help achieve sustainable development goals by, for example, helping civil society groups and individuals to 'develop data literacy and help communities and individuals to generate and use data, to ensure accountability and make better decisions for themselves'[22].

What are the potential harms of Big Data?

Whilst it is often easy to be seduced by the utopian visions of Big Data evangelists, in order to ensure that Big Data can deliver the types of far-reaching benefits its proponents promise, it is vital that we are also sensitive to its potential harms. Within the existing literature, discussions about the potential harms of Big Data are perhaps understandably dominated by concerns about privacy. Yet as Big Data has begun to play an increasingly central role in our daily lives, a broad range of new threats has begun to emerge, including issues related to security and scientific epistemology, as well as problems of marginalisation, discrimination and transparency, each of which will be discussed separately below.

Privacy

By far the biggest concern raised by researchers in relation to Big Data is its risk to privacy. Given that by its very nature Big Data requires extensive and unprecedented access to large quantities of data, it is hardly surprising that many of the benefits outlined above in one way or another exist in tension with considerations of privacy. Although many scholars have called for a broader debate on the effects of Big Data on ethical best practice[23], a comprehensive exploration of the complex debates surrounding the ethical implications of Big Data goes far beyond the scope of this article. Instead we will simply attempt to highlight some of the major areas of concern expressed in the literature, including its effects on established principles of privacy and the implications of Big Data for the suitability of existing regulatory frameworks governing privacy and data protection.

1. Re-identification

Traditionally many Big Data enthusiasts have used de-identification - the process of anonymising data by removing personally identifiable information (PII) - as a way of justifying mass collection and use of personal data. By claiming that such measures are sufficient to ensure the privacy of users, data brokers, companies and governments have sought to deflect concerns about the privacy implications of Big Data, and suggest that it can be compliant with existing regulatory and legal frameworks on data protection.

However, many scholars remain concerned about the limits of anonymisation. As Tene and Polonetsky observe, 'Once data - such as a clickstream or a cookie number - are linked to an identified individual, they become difficult to disentangle'[24]. They cite the example of University of Texas researchers Narayanan and Shmatikov, who were able to successfully re-identify anonymised Netflix user data by cross-referencing it with data stored in a publicly accessible online database. As Narayanan and Shmatikov themselves explained, 'once any piece of data has been linked to a person's real identity, any association between this data and a virtual identity breaks anonymity of the latter'[25]. The quantity and variety of datasets which Big Data analytics has made associable with individuals is therefore expanding the scope of the types of data that can be considered PII, as well as undermining claims that de-identification alone is sufficient to ensure privacy for users.
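The basic logic of such a cross-referencing (or 'linkage') attack can be illustrated with a toy sketch. This is not the algorithm used by Narayanan and Shmatikov, whose method relies on a more sophisticated statistical scoring; the function name `reidentify`, the `min_overlap` threshold and the data layout are all hypothetical and chosen purely for illustration.

```python
def reidentify(anonymised, public, min_overlap=2):
    """Link pseudonyms to names by counting shared (item, date) records.

    anonymised: dict mapping pseudonym -> set of (item, date) tuples
    public:     dict mapping real name -> set of (item, date) tuples
    A pseudonym is linked only if exactly one name has the highest
    overlap and that overlap reaches min_overlap (an assumed threshold).
    """
    matches = {}
    for pseudonym, anon_records in anonymised.items():
        best_name, best_overlap, tie = None, 0, False
        for name, public_records in public.items():
            overlap = len(anon_records & public_records)
            if overlap > best_overlap:
                best_name, best_overlap, tie = name, overlap, False
            elif overlap == best_overlap and overlap > 0:
                tie = True  # ambiguous: two names fit equally well
        if best_name is not None and best_overlap >= min_overlap and not tie:
            matches[pseudonym] = best_name
    return matches


# Entirely made-up example data.
anonymised = {
    "user_17": {("Film A", "2005-03-01"), ("Film B", "2005-03-04"), ("Film C", "2005-04-10")},
    "user_42": {("Film D", "2005-05-02")},
}
public = {
    "alice": {("Film A", "2005-03-01"), ("Film B", "2005-03-04")},
    "bob": {("Film A", "2005-03-01"), ("Film E", "2005-06-01")},
}
linked = reidentify(anonymised, public)
```

Even in this simplified form, the sketch shows why removing names alone offers limited protection: a handful of records that also appear in a public source can be enough to single out one individual.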

2. Privacy Frameworks Obsolete?

In recent decades privacy and data protection frameworks based upon a number of so-called 'privacy principles' have formed the basis of most attempts to encourage greater consideration of privacy issues online[26]. For many, however, the emergence of Big Data has raised questions about the extent to which these 'principles of privacy' are workable in an era of ubiquitous data collection.

Collection Limitation and Data Minimization: Big Data by its very nature requires the collection and processing of very large and very diverse data sets. Unlike other forms of scientific research and analysis, which utilize various sampling techniques to identify and target the types of data most useful to the research questions, Big Data instead seeks to gather as much data as possible in order to achieve full resolution of the phenomenon being studied, a task made much easier in recent years as a result of the proliferation of internet-enabled devices and the growth of the Internet of Things. This goal of attaining comprehensive coverage exists in tension, however, with the key privacy principles of collection limitation and data minimization, which seek to limit both the quantity and variety of data collected about an individual to the absolute minimum[27].

Purpose Limitation: Since the utility of a given dataset is often not easily identifiable at the time of collection, datasets are increasingly being processed several times for a variety of different purposes. Such practices have significant implications for the principle of purpose limitation, which aims to ensure that organizations are open about their reasons for collecting data, and that they use and process the data for no other purpose than those initially specified [28].

Notice and Consent: The principles of notice and consent have formed the cornerstones of attempts to protect privacy for decades. Nevertheless, in an era of ubiquitous data collection, the notion that an individual must be required to provide their explicit consent to allow for the collection and processing of their data seems increasingly antiquated, a relic of an age when it was possible to keep track of your personal data relationships and transactions. Today, as data streams become more complex, some have begun to question the suitability of consent as a mechanism to protect privacy. In particular, commentators have noted how, given the complexity of data flows in the digital ecosystem, most individuals are not well placed to make truly informed decisions about the management of their data[29]. In one study, researchers demonstrated how, by creating the perception of control, users were more likely to share their personal information, regardless of whether or not the users had actually gained control[30]. As such, for many, the garnering of consent is increasingly becoming a symbolic box-ticking exercise which achieves little more than to irritate and inconvenience customers whilst providing a burden for companies and a hindrance to growth and innovation[31].

Access and Correction: The principle of 'access and correction' refers to the rights of individuals to obtain personal information being held about them as well as the right to erase, rectify, complete or otherwise amend that data. Aside from the well documented problems with privacy self-management, for many the real-time nature of data generation and analysis in an era of Big Data poses a number of structural challenges to this principle of privacy. As x comments, 'a good amount of data is not pre-processed in a similar fashion as traditional data warehouses. This creates a number of potential compliance problems such as difficulty erasing, retrieving or correcting data. A typical big data system is not built for interactivity, but for batch processing. This also makes the application of changes on a (presumably) static data set difficult'[32].

Opt In-Out: The notion that the provision of data should be a matter of personal choice on the part of the individual, and that the individual can, if they choose, decide to 'opt out' of data collection, for example by ceasing use of a particular service, is an important component of privacy and data protection frameworks. The proliferation of internet-enabled devices, their integration into the built environment and the real-time nature of data collection and analysis, however, are beginning to undermine this concept. For many critics of Big Data, the ubiquity of data collection points, as well as the compulsory provision of data as a prerequisite for the access and use of many key online services, is making opting out of data collection not only impractical but in some cases impossible[33].

3. "Chilling Effects"

For many scholars the normalization of large-scale data collection is steadily producing a widespread perception of ubiquitous surveillance amongst users. Drawing upon Foucault's analysis of Jeremy Bentham's panopticon and the disciplinary effects of surveillance, they argue that this perception of permanent visibility can cause users to subconsciously 'discipline' and self-regulate their own behavior, fearful of being targeted or identified as 'abnormal'[34]. As a result, the pervasive nature of Big Data risks generating a 'chilling effect' on user behavior and free speech.

Although the notion of "chilling effects" is quite prevalent throughout the academic literature on surveillance and security, the difficulty of quantifying the perception and effects of surveillance on online behavior and practices means that there have only been a limited number of empirical studies of this phenomenon, and none directly related to the chilling effects of Big Data. One study, conducted by researchers at MIT, however, sought to assess the impact of Edward Snowden's revelations about NSA surveillance programs on Google search trends. Nearly 6,000 participants were asked to individually rate certain keywords for their perceived degree of privacy sensitivity along multiple dimensions. Using Google's own publicly available search data, the researchers then analyzed search patterns for these terms before and after the Snowden revelations. In doing so they were able to demonstrate a reduction of around 2.2% in searches for those terms deemed to be most sensitive in nature. According to the researchers themselves, the results 'suggest that there is a chilling effect on search behaviour from government surveillance on the Internet'[35]. Although this study focussed on the effects of government surveillance, for many privacy advocates the growing pervasiveness of Big Data risks generating similar results[36].

4. Dignitary Harms of Predictive Decision-Making

In addition to its potentially chilling effects on free speech, the automated nature of Big Data analytics also possesses the potential to inflict so-called 'dignitary harms' on individuals, by revealing insights about themselves that they would have preferred to keep private[37].

In an infamous example, following a shopping trip to the retail chain Target, a young girl began to receive mail at her father's house advertising products for babies, including diapers, clothing, and cribs. In response, her father complained to the management of the company, incensed by what he perceived to be the company's attempts to "encourage" pregnancy in teens. A few days later, however, the father was forced to contact the store again to apologise, after his daughter had confessed to him that she was indeed pregnant. It was later revealed that Target regularly analyzed the sale of key products such as supplements or unscented lotions in order to generate "pregnancy prediction" scores, which could be used to assess the likelihood that a customer was pregnant and therefore to target them with relevant offers[38]. Such cases, though anecdotal, illustrate how Big Data, if not adopted sensitively, can lead to potentially embarrassing information about users being made public.

Security

In relation to cybersecurity, Big Data can be viewed to a certain extent as a double-edged sword. On the one hand, the unique capabilities of Big Data analytics can provide organizations with new and innovative methods of enhancing their cybersecurity systems. On the other, however, the sheer quantity and diversity of data emanating from a variety of sources creates its own security risks.

5. "Honey-Pot"

The larger the quantities of confidential information stored by companies on their databases the more attractive those databases may appear to potential hackers.

6. Data Redundancy and Dispersion

Inherent to Big Data systems is the duplication of data to many locations in order to optimize query processing. Data is dispersed across a wide range of data repositories in different servers, in different parts of the world. As a result it may be difficult for organizations to accurately locate and secure all items of personal information.

Epistemological and Methodological Implications

In 2008 Chris Anderson infamously proclaimed the 'end of theory'. Writing for Wired Magazine, Anderson predicted that the coming age of Big Data would create a 'deluge of data' so large that the scientific methods of hypothesis, sampling and testing would be rendered 'obsolete' [39]. 'There is now a better way' Anderson insisted, 'Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot'[40].

In spite of these bold claims, however, many theorists remain skeptical of Big Data's methodological benefits and have expressed concern about its potential implications for conventional scientific epistemologies. For them the increased prominence of Big Data analytics in science does not signal a paradigmatic transition to a more enlightened data-driven age, but a hollowing out of the scientific method and an abandonment of causal knowledge in favor of shallow correlative analysis[41].

7. Obfuscation

Although Big Data analytics can be utilized to study almost any phenomenon where enough data exists, many theorists have warned that simply because Big Data analytics can be used does not necessarily mean that they should be used[42]. Bigger is not always better, and indeed the sheer quantity of data made available to users may in fact act to obscure certain insights. Whereas traditional scientific methods use sampling techniques to identify the most important and relevant data, Big Data by contrast encourages the collection and use of as much data as possible, in an attempt to attain full resolution of the phenomena being studied. However, not all data is equally useful, and simply inputting as much data as possible into an algorithm is unlikely to produce accurate results and may instead obscure key insights.

Indeed, whilst the promise of automation is central to a large part of Big Data's appeal, researchers observe that most Big Data analysis still requires an element of human judgement to filter out the 'good' data from the 'bad', and to decide what aspects of the data are relevant to the research objectives. As Boyd and Crawford observe, 'in the case of social media data, there is a "data cleaning" process: making decisions about what attributes and variables will be counted, and which will be ignored. This process is inherently subjective'[43].

Google's Flu Trends project provides an illustrative example of how Big Data's tendency to try to maximise data inputs can produce misleading results. Designed to accurately track flu outbreaks based upon data collected from Google searches, the project was initially proclaimed to be a great success. Gradually, however, it became apparent that the results being produced were not reflective of the reality on the ground. Later it was discovered that the algorithms used by the project to interpret search terms were insufficiently accurate to filter out anomalies in searches, such as those related to the 2009 H1N1 flu pandemic. As such, despite the great promise of Big Data, scholars insist it remains critical to be mindful of its limitations, remain selective about the types of data included in the analysis and exercise caution and intuition whenever interpreting its results[44].

8. "Apophenia"

In complete contrast to the problem of obfuscation, Boyd and Crawford observe how Big Data may also lead to the practice of 'apophenia', a phenomenon whereby analysts interpret patterns where none exist, 'simply because enormous quantities of data can offer connections that radiate in all directions'[45]. David Leinweber, for example, demonstrated that data mining techniques could show strong but ultimately spurious correlations between changes in the S&P 500 stock index and butter production in Bangladesh[46]. Such spurious correlations between disparate and unconnected phenomena are a common feature of Big Data analytics and risk leading to unfounded conclusions being drawn from the data.

Although Leinweber's primary focus of analysis was the use of Data-Mining technologies, his observations are equally applicable to Big Data. Indeed the tendency amongst Big Data analysts to marginalise the types of domain specific expertise capable of differentiating between relevant and irrelevant correlations in favour of algorithmic automation can in many ways be seen to exacerbate many of the problems Leinweber identified.
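The mechanism behind such spurious findings can be demonstrated with a short simulation. This is a sketch using purely synthetic data, not a reconstruction of Leinweber's analysis: the function names, the use of random walks as stand-ins for economic time series, and the default parameters are all assumptions made for illustration. The idea is simply that if enough unrelated series are searched, one of them will correlate strongly with any chosen target by chance alone.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, length):
    """A synthetic series whose steps are independent Gaussian noise."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


def best_spurious_match(n_candidates=1000, length=30, seed=0):
    """Search unrelated random series for the one most correlated with a target.

    Every series is generated independently, so any strong correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = random_walk(rng, length)
    best = 0.0
    for _ in range(n_candidates):
        r = pearson(target, random_walk(rng, length))
        if abs(r) > abs(best):
            best = r
    return best
```

Calling `best_spurious_match()` returns the strongest correlation found among the candidate series. Because the candidates are unrelated to the target by design, a high value illustrates why domain expertise is needed to judge which correlations are meaningful.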

9. From Causation to Correlation

Closely related to the problem of apophenia is the concern that Big Data's emphasis on correlative analysis risks leading to an abandonment of the pursuit of causal knowledge in favour of shallow descriptive accounts of scientific phenomena[47].

For many Big Data enthusiasts, 'correlation is enough', producing inherently meaningful results interpretable by anyone without the need for pre-existing theory or hypothesis. Proponents of Big Data claim that such an approach allows them to produce objective knowledge by cleansing the data of any kind of philosophical or ideological commitment. For others, however, by neglecting the knowledge of domain experts Big Data risks generating a shallow type of analysis, since it fails to adequately embed observations within a pre-existing body of knowledge.

This commitment to an empiricist epistemology and methodological monism is particularly problematic in the context of studies of human behaviour, where actions cannot be calculated and anticipated using quantifiable data alone. In such instances, a certain degree of qualitative analysis of social, historical and cultural variables may be required in order to make the data meaningful by embedding it within a broader body of knowledge. The abstract and intangible nature of these variables requires a great deal of expert knowledge and interpretive skill to comprehend. It is therefore vital that the knowledge of domain specific experts is properly utilized to help 'evaluate the inputs, guide the process, and evaluate the end products within the context of value and validity'[48].

As such, although Big Data can provide unrivalled accounts of "what" people do, it fundamentally fails to deliver robust explanations of "why" people do it. This problem is especially critical in the case of public policy-making since without any indication of the motivations of individuals, policy-makers can have no basis upon which to intervene to incentivise more positive outcomes.

Digital Divides and Marginalisation

Today data is a highly valuable commodity. The market for data in and of itself has been steadily growing in recent years, with the business models of many online services now formulated around the strategy of harvesting data from users[49]. As with the commodification of anything, however, inequalities can easily emerge between the haves and have-nots. Whilst the quantity of data currently generated on a daily basis is many times greater than at any other point in human history, the vast majority of this data is owned and tightly controlled by a very small number of technology companies and data brokers. Although in some instances limited access to data may be granted to university researchers or to those willing and able to pay a fee, in many cases data remains jealously guarded by data brokers, who view it as an important competitive asset. As a result these data brokers and companies risk becoming the gatekeepers of the Big Data revolution, adjudicating not only over who can benefit from Big Data, but also in what context and under what terms. For many, such inconsistencies and inequalities in access to data raise serious doubts about just how widely distributed the benefits of Big Data will be. Others go even further, claiming that far from helping to alleviate inequalities, the advent of Big Data risks exacerbating existing digital divides as well as creating new ones[50].

10. Anti-Competitive Practices

As a result of the reluctance of large companies to share their data, there increasingly exists a divide in access between small start-up companies and their larger and more established competitors. New entrants to the marketplace may therefore be at a competitive disadvantage in relation to large and well established enterprises, since they are unable to harness the analytical power of the vast quantities of data available to large companies by virtue of their privileged market position. Since the performance of many online services is today often intimately connected with the collation and use of users' data, some researchers have suggested that this inequity in access to data could lead to a reduction in competition in the online marketplace, and ultimately therefore to less innovation and choice for consumers[51].

As a result researchers including Nathan Newman of New York University have called for a reassessment and reorientation of anti-trust investigations and regulatory approaches more generally 'to focus on how control of personal data by corporations can entrench monopoly power and harm consumer welfare in an economy shaped increasingly by the power of "big data"'[52]. Similarly a report produced by the European Data Protection Supervisor concluded that, 'The scope for abuse of market dominance and harm to the consumer through refusal of access to personal information and opaque or misleading privacy policies may justify a new concept of consumer harm for competition enforcement in digital economy'[53].

11. Research

From a research perspective, barriers to access to data caused by proprietary control of datasets are problematic, since certain types of research could become restricted to those privileged enough to be granted access to data. Meanwhile those denied access are left not only incapable of conducting similar research projects, but also unable to test, verify or reproduce the findings of those who do. The existence of such gatekeepers may also lead to reluctance on the part of researchers to undertake research critical of the companies upon whom they rely for access, leading to a chilling effect on the types of research conducted[54].

12. Inequality

Whilst bold claims are regularly made about the potential of Big Data to deliver economic development and generate new innovations, some critics remain concerned about how equally the benefits of Big Data will be distributed and the effects this could have on already established digital divides[55].

Firstly, whilst the power of Big Data is already being utilized effectively by most economically developed nations, the same cannot necessarily be said for many developing countries. A combination of lower levels of connectivity, poor information infrastructure, underinvestment in information technologies and a lack of skills and trained personnel makes it far more difficult for the developing world to fully reap the rewards of Big Data. As a consequence the Big Data revolution risks deepening global economic inequality, as developing countries find themselves unable to compete with data rich nations whose governments can more easily exploit the vast quantities of information generated by their technically literate and connected citizens.

Likewise, to the extent that Big Data analytics is playing a greater role in public policy-making, the capacity of individuals to generate large quantities of data could potentially impact upon the extent to which they can provide inputs into the policy-making process. In a country such as India for example, where there exist high levels of inequality in access to information and communication technologies and the internet, there remain large discrepancies in the quantities of data produced by individuals. As a result there is a risk that those who lack access to the means of producing data will be disenfranchised, as policy-making processes become configured to accommodate the needs and interests of a privileged minority[56].

Discrimination

13. Injudicious or Discriminatory Outcomes

Big Data presents the opportunity for governments, businesses and individuals to make better, more informed decisions at a much faster pace. Whilst this can evidently provide innumerable opportunities to increase efficiency and mitigate risk, by removing human intervention and oversight from the decision-making process Big Data analysts run the risk of becoming blind to unfair or injudicious results generated by skewed or discriminatory programming of the algorithms.

There currently exists a large number of automated decision-making algorithms in operation across a broad range of sectors, including, most notably perhaps, those used to assess an individual's suitability for insurance or credit. In either of these cases faults in the programming or discriminatory assessment criteria can have potentially damaging implications for the individual, who may as a result be unable to obtain credit or insurance. This concern with the potentially discriminatory aspects of Big Data is prevalent throughout the literature, and real life examples have been identified by researchers in a large number of major sectors in which Big Data is currently being used[57].

Yu, for instance, cites the case of the insurance company Progressive, which required its customers to install 'Snapshot' - a small monitoring device - into their cars in order to receive their best rates. The device tracked and reported the customers' driving habits, and offered discounts to those drivers who drove infrequently, braked smoothly, and avoided driving at night - behaviors that correlate with a lower risk of future accidents. Although this form of price differentiation provided incentives for customers to drive more carefully, it also had the unintended consequence of unfairly penalizing late-night shift workers. As Yu observes, 'for late night shift-workers, who are disproportionately poorer and from minority groups, this differential pricing provides no benefit at all. It categorizes them as similar to late-night party-goers, forcing them to carry more of the cost of the intoxicated and other irresponsible driving that happens disproportionately at night'[58].

In another example, it is noted how Big Data is increasingly being used to evaluate applicants for entry-level service jobs. One method of evaluating applicants is by the length of their commute - the rationale being that employees with shorter commutes are statistically more likely to remain in the job longer. However, since most service jobs are typically located in town centers and since poorer neighborhoods tend to be those on the outskirts of town, such criteria can have the effect of unfairly disadvantaging those living in economically deprived areas. Consequently, such evaluation metrics can also unintentionally reinforce existing social inequalities by making it more difficult for economically disadvantaged communities to work their way out of poverty[59].
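The mechanism behind the commute example can be sketched in a few lines of Python. This is a minimal illustration only: the applicant records, the neighbourhood labels and the 30-minute cut-off are all invented for the purpose and are not drawn from the cited report. The sketch applies a screening rule that looks neutral on its face (commute time only) and then compares the pass rates of two groups of applicants.

```python
from collections import defaultdict

# Hypothetical applicants: (neighbourhood, commute in minutes).
# These figures are invented for illustration and are not real data.
APPLICANTS = [
    ("central", 10), ("central", 15), ("central", 20), ("central", 35),
    ("outskirts", 25), ("outskirts", 40), ("outskirts", 45), ("outskirts", 50),
]

# Hypothetical screening cut-off; real systems may weigh commute differently.
MAX_COMMUTE_MINUTES = 30


def pass_rates(applicants, max_commute):
    """Return the share of applicants in each neighbourhood who pass the commute screen."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for neighbourhood, commute in applicants:
        total[neighbourhood] += 1
        if commute <= max_commute:
            passed[neighbourhood] += 1
    return {name: passed[name] / total[name] for name in total}


rates = pass_rates(APPLICANTS, MAX_COMMUTE_MINUTES)
# Ratio of the outskirts pass rate to the central pass rate.
ratio = rates["outskirts"] / rates["central"]
print(rates, ratio)
```

Although the rule never refers to income or neighbourhood, applicants from the outskirts pass far less often in this invented dataset. Comparing pass rates across groups in this way is one simple check an analyst could run before such a criterion is deployed.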

14. Lack of Algorithmic Transparency

If data is indeed the 'oil of the 21st century'[60] then algorithms are very much the engines which are driving innovation and economic development. For many companies the quality of their algorithms is often a crucial factor in providing them with a market advantage over their competitors. Given their importance, the secrets behind the programming of algorithms are often closely guarded by companies, and are typically classified as trade secrets and as such are protected by intellectual property rights. Whilst companies may claim that such secrecy is necessary to encourage market competition and innovation, many scholars are becoming increasingly concerned about the lack of transparency surrounding the design of these most crucial tools.

In particular there is a growing sentiment common amongst many researchers that there currently exists a chronic lack of accountability and transparency in terms of how Big Data algorithms are programmed and what criteria are used to determine outcomes[61]. As Frank Pasquale observed,

'hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy'[62].

As such, without increased transparency in algorithmic design, instances of Big Data discrimination may go unnoticed as analysts are unable to access the information necessary to identify them.

Conclusion

Today Big Data presents us with as many challenges as it does benefits. Whilst Big Data analytics can offer incredible opportunities to reduce inefficiency, improve decision-making, and increase transparency, concerns remain about the effects of these new technologies on issues such as privacy, equality and discrimination. Although the tensions between the competing demands of Big Data advocates and their critics may appear irreconcilable, only by highlighting these points of contestation can we hope to begin to ask the types of important and difficult questions necessary to reconcile them, including: how can we reconcile Big Data's need for massive inputs of personal information with core principles of privacy such as data minimization and collection limitation? What processes and procedures need to be put in place during the design and implementation of Big Data models and algorithms to provide sufficient transparency and accountability so as to avoid instances of discrimination? What measures can be used to help close digital divides and ensure that the benefits of Big Data are shared equitably? Questions such as these are today only just beginning to be addressed; each, however, will require careful consideration and reasoned debate if Big Data is to deliver on its promises and truly fulfil its 'revolutionary' potential.

[1] Gantz, J., & Reinsel, D. Extracting Value from Chaos, IDC, (2011), available at: http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf

[2] Meeker, M. & Yu, L. Internet Trends, Kleiner Perkins Caufield Byers, (2013), http://www.slideshare.net/kleinerperkins/kpcb-internet-trends-2013

[3] Laney, D. "3D Data Management: Controlling Data Volume, Velocity and Variety", Gartner, (2001)

[4] Boyd, D., and Crawford, K. 'Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon', Information, Communication & Society, Vol. 15, Issue 5, (2012) http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878; Tene, O., & Polonetsky, J. Big Data for All: Privacy and User Control in the Age of Analytics, 11 Nw. J. Tech. & Intell. Prop. 239 (2013) http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1

[5] Ibid.

[6] Joh, E. 'Policing by Numbers: Big Data and the Fourth Amendment', Washington Law Review, Vol. 89: 35, (2014) https://digital.law.washington.edu/dspace-law/bitstream/handle/1773.1/1319/89WLR0035.pdf?sequence=1

[7] Raghupathi, W., & Raghupathi, V. Big data analytics in healthcare: promise and potential. Health Information Science and Systems, (2014)

[8] Anderson, R., & Roberts, D. 'Big Data: Strategic Risks and Opportunities', Crowe Horwath Global Risk Consulting Limited, (2012) https://www.crowehorwath.net/uploadedfiles/crowe-horwath-global/tabbed_content/big%20data%20strategic%20risks%20and%20opportunities%20white%20paper_risk13905.pdf

[9] Ibid.

[10] Kshetri, N. 'The Emerging role of Big Data in Key development issues: Opportunities, challenges, and concerns', Big Data & Society, (2014) http://bds.sagepub.com/content/1/2/2053951714564227.abstract

[11] Tene, O., & Polonetsky, J. Big Data for All: Privacy and User Control in the Age of Analytics, 11 Nw. J. Tech. & Intell. Prop. 239 (2013) http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1

[12] Cisco, 'IoE-Driven Smart Street Lighting Project Allows Oslo to Reduce Costs, Save Energy, Provide Better Service', Cisco, (2014) Available at: http://www.cisco.com/c/dam/m/en_us/ioe/public_sector/pdfs/jurisdictions/Oslo_Jurisdiction_Profile_051214REV.pdf

[13] Newell, B, C. Local Law Enforcement Jumps on the Big Data Bandwagon: Automated License Plate Recognition Systems, Information Privacy, and Access to Government Information. University of Washington - the Information School, (2013) http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2341182

[14] Morris, D. Big data could improve supply chain efficiency-if companies would let it, Fortune, August 5 2015, http://fortune.com/2015/08/05/big-data-supply-chain/

[15] Tucker, Darren S., & Wellford, Hill B., Big Mistakes Regarding Big Data, Antitrust Source, American Bar Association, (2014). Available at SSRN: http://ssrn.com/abstract=2549044

[16] Davenport, T., Barth, P., & Bean, R. How 'Big Data' Is Different, MIT Sloan Management Review, Fall (2012), available at: http://sloanreview.mit.edu/article/how-big-data-is-different/

[17] Tucker, Darren S., & Wellford, Hill B., Big Mistakes Regarding Big Data, Antitrust Source, American Bar Association, (2014). Available at SSRN: http://ssrn.com/abstract=2549044

[18] Raghupathi, W., & Raghupathi, V. Big data analytics in healthcare: promise and potential. Health Information Science and Systems, (2014)

[19] Brown, B., Chui, M., Manyika, J. 'Are you Ready for the Era of Big Data?', McKinsey Quarterly, (2011), Available at, http://www.t-systems.com/solutions/download-mckinsey-quarterly-/1148544_1/blobBinary/Study-McKinsey-Big-data.pdf ; Benady, D., 'Radical transparency will be unlocked by technology and big data', Guardian (2014) Available at: http://www.theguardian.com/sustainable-business/radical-transparency-unlocked-technology-big-data

[20] Ibid.

[21] Ibid.

[22] United Nations, A World That Counts: Mobilising the Data Revolution for Sustainable Development, Report prepared at the request of the United Nations Secretary-General, by the Independent Expert Advisory Group on a Data Revolution for Sustainable Development, (2014), pg. 18; see also Hilbert, M. Big Data for Development: From Information- to Knowledge Societies (2013). Available at SSRN: http://ssrn.com/abstract=2205145

[23] Greenleaf, G. Abandon All Hope? Foreword for Issue 37(2) of the UNSW Law Journal on 'Communications Surveillance, Big Data, and the Law', (2014) http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2490425; Boyd, D., and Crawford, K. 'Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon', Information, Communication & Society, Vol. 15, Issue 5, (2012) http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878

[24] Tene, O., & Polonetsky, J. Big Data for All: Privacy and User Control in the Age of Analytics, 11 Nw. J. Tech. & Intell. Prop. 239 (2013) http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1

[25] Narayanan and Shmatikov quoted in Ibid.

[26] OECD, Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, The Organization for Economic Co-Operation and Development, (1999); The European Parliament and the Council of the European Union, EU Data Protection Directive, "Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data," (1995)

[27] Barocas, S., & Selbst, A. D., Big Data's Disparate Impact, California Law Review, Vol. 104, (2015). Available at SSRN: http://ssrn.com/abstract=2477899

[28] Article 29 Working Group., Opinion 03/2013 on purpose limitation, Article 29 Data Protection Working Party, (2013) available at: http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf

[29] Solove, D, J. Privacy Self-Management and the Consent Dilemma, 126 Harv. L. Rev. 1880 (2013), Available at: http://scholarship.law.gwu.edu/cgi/viewcontent.cgi?article=2093&context=faculty_publications

[30] Brandimarte, L., Acquisti, A., & Loewenstein, G., Misplaced Confidences: Privacy and the Control Paradox, Ninth Annual Workshop on the Economics of Information Security (WEIS) June 7-8 2010, Harvard University, Cambridge, MA, (2010), available at: https://fpf.org/wp-content/uploads/2010/07/Misplaced-Confidences-acquisti-FPF.pdf

[31] Solove, D, J., Privacy Self-Management and the Consent Dilemma, 126 Harv. L. Rev. 1880 (2013), Available at: http://scholarship.law.gwu.edu/cgi/viewcontent.cgi?article=2093&context=faculty_publications

[32] Yu, W. E., Data Privacy and Big Data - Compliance Issues and Considerations, ISACA Journal, Vol. 3, (2014), available at: http://www.isaca.org/Journal/archives/2014/Volume-3/Pages/Data-Privacy-and-Big-Data-Compliance-Issues-and-Considerations.aspx

[33] Ramirez, E., Brill, J., Ohlhausen, M., Wright, J., & McSweeny, T., Data Brokers: A Call for Transparency and Accountability, Federal Trade Commission (2014) https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf

[34] Michel Foucault, Discipline and Punish: The Birth of the Prison. Translated by Alan Sheridan, London: Allen Lane, Penguin, (1977)

[35] Marthews, A., & Tucker, C., Government Surveillance and Internet Search Behavior (2015), available at SSRN: http://ssrn.com/abstract=2412564

[36] Boyd, D., and Crawford, K. 'Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon', Information, Communication & Society, Vol. 15, Issue 5, (2012)

[37] Hirsch, D., That's Unfair! Or is it? Big Data, Discrimination and the FTC's Unfairness Authority, Kentucky Law Journal, Vol. 103, available at: http://www.kentuckylawjournal.org/wp-content/uploads/2015/02/103KyLJ345.pdf

[38] Hill, K., How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did, Forbes, 16 February 2012, (2012), available at: http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/

[39] Anderson, C (2008) "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", WIRED, June 23 2008, www.wired.com/2008/06/pb-theory/

[40] Ibid.

[41] Kitchin, R (2014) Big Data, new epistemologies and paradigm shifts, Big Data & Society, April-June 2014: 1-12

[42] Boyd D and Crawford K (2012) Critical questions for big data. Information, Communication and Society 15(5): 662-679

[43] Ibid.

[44] Lazer, D., Kennedy, R., King, G., & Vespignani, A. "The Parable of Google Flu: Traps in Big Data Analysis." Science 343 (2014): 1203-1205. Copy at http://j.mp/1ii4ETo

[45] Boyd, D., and Crawford, K. 'Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon', Information, Communication & Society, Vol. 15, Issue 5, (2012) http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878

[46] Leinweber, D. (2007) 'Stupid data miner tricks: overfitting the S&P 500', The Journal of Investing, vol. 16, no. 1, pp. 15-22. http://m.shookrun.com/documents/stupidmining.pdf

[47] Boyd D and Crawford K (2012) Critical questions for big data. Information, Communication and Society 15(5): 662-679

[48] McCue, C., Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis, Butterworth-Heinemann, (2014)

[49] De Zwart, M. J., Humphreys, S., & Van Dissel, B. Surveillance, big data and democracy: lessons for Australia from the US and UK. Http://www.unswlawjournal.unsw.edu.au/issue/volume-37-No-2. (2014) Retrieved from https://digital.library.adelaide.edu.au/dspace/handle/2440/90048

[50] Boyd, D., and Crawford, K. 'Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon', Information, Communication & Society, Vol. 15, Issue 5, (2012) http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878; Newman, N., Search, Antitrust and the Economics of the Control of User Data, 31 YALE J. ON REG. 401 (2014)

[51] Newman, N., The Cost of Lost Privacy: Search, Antitrust and the Economics of the Control of User Data (2013). Available at SSRN: http://ssrn.com/abstract=2265026; Newman, N., Search, Antitrust and the Economics of the Control of User Data, 31 YALE J. ON REG. 401 (2014)

[52] Ibid.

[53] European Data Protection Supervisor, Privacy and competitiveness in the age of big data: The interplay between data protection, competition law and consumer protection in the Digital Economy, (2014), available at: https://secure.edps.europa.eu/EDPSWEB/webdav/shared/Documents/Consultation/Opinions/2014/14-03-26_competitition_law_big_data_EN.pdf

[54] Boyd, D., and Crawford, K. 'Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon', Information, Communication & Society, Vol. 15, Issue 5, (2012) http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878

[55] Schradie, J., Big Data Not Big Enough? How the Digital Divide Leaves People Out, MediaShift, 31 July 2013, (2013), available at: http://mediashift.org/2013/07/big-data-not-big-enough-how-digital-divide-leaves-people-out/

[56] Crawford, K., The Hidden Biases in Big Data, Harvard Business Review, 1 April 2013 (2013), available at: https://hbr.org/2013/04/the-hidden-biases-in-big-data

[57] Robinson, D., Yu, H., Civil Rights, Big Data, and Our Algorithmic Future, (2014) http://bigdata.fairness.io/introduction/

[58] Ibid.

[59] Ibid.

[60] Rotella, P., Is Data The New Oil? Forbes, 2 April 2012, (2012), available at: http://www.forbes.com/sites/perryrotella/2012/04/02/is-data-the-new-oil/

[61] Barocas, S., & Selbst, A. D., Big Data's Disparate Impact, California Law Review, Vol. 104, (2015). Available at SSRN: http://ssrn.com/abstract=2477899; Kshetri, N. 'The Emerging role of Big Data in Key development issues: Opportunities, challenges, and concerns', Big Data & Society, (2014) http://bds.sagepub.com/content/1/2/2053951714564227.abstract

[62] Pasquale, F., The Black Box Society: The Secret Algorithms That Control Money and Information, Harvard University Press, (2015)
