Demystifying Data Breaches in India

Posted by Pawan Singh at Oct 17, 2022 04:14 PM

Edited by Arindrajit Basu and Saumyaa Naidu


India saw a 62% drop in data breaches in the first quarter of 2022. Yet it ranked fifth on the list of countries most hit by cyberattacks, according to a 2022 report by Surfshark, a Netherlands-based VPN company. Another report on the cost of data breaches, researched by the Ponemon Institute and published by IBM, reveals that the breach of about 29,500 records between March 2021 and March 2022 came with an increase in the average cost of a breach from INR 165 million in 2021 to INR 176 million in 2022, a rise of roughly 6.7%.

These statistics are certainly a cause for concern, especially in the context of India’s rapidly burgeoning digital economy shaped by the pervasive platformization of private and public services such as welfare, banking, finance, health, and shopping among others. Despite the rate at which data breaches occur and are reported in the media, there seems to be little information about how and when they are resolved. This post examines the discourse on data breaches in India with respect to their historical forms, with a focus on how the specific terminology to describe data security incidents has evolved in mainstream news media reportage.

While expert articulations of cybersecurity in general and data breaches in particular tend to dominate the public discourse on data privacy, this post aims to situate broader understandings of data breaches within the historical context of India’s IT revolution and to delve into the specific concepts and terminology that have shaped the broader discourse on data protection. The late 1990s and early 2000s offer a useful point of entry into the genesis of the data security landscape in India.

Data Breaches and their Predecessor Forms

The articulation of data security concerns in the late 1990s and early 2000s does not consistently deploy the phrase “data breach” to signal cybersecurity concerns in India. Terms such as “data/identity theft” and “data leak” figure prominently in the public articulation of concerns about the handling of personal information by IT systems, particularly in the context of business process outsourcing (BPO) and e-commerce activities. Other pertinent terms such as “security breach”, “data security”, and “cyberfraud” also capture the specificity of growing concerns around data outsourced to India. At the time, i.e. around the mid-2000s, regulatory frameworks were still evolving to accommodate and address the complexities arising from a dynamic reconfiguration of the telecommunications and IT landscape in India.

Some of the formative cases that instantiate the usage of the aforementioned terms are instructive for understanding shifts in the reporting of such incidents over time. The earliest case from that period dates to 2002 and concerns the theft of source code by an IIT Kharagpur student, who attempted to sell it to two undercover FBI agents working with the CBI to catch the thief. A straightforward case of data theft was framed by media stories of the time as a cybercrime involving the illegal sale of the source code of a software package, as software theft of intellectual property in the context of outsourcing, and as an instance of industrial espionage in poor nations without laws protecting foreign companies. This case became the basis of the earliest calls for the protection of data privacy and security in the context of the Indian BPO sector. The Indian IT Act, 2000 at the time only covered unauthorized access and data theft from computers and networks, without any provisions for data protection, interception or computer forgery. The BPO boom in India brought with it employment opportunities for India’s English-speaking, educated youth, but in the absence of concrete data privacy legislation, the country was regarded as an unsafe destination for outsourcing, aside from the political ramifications concerning the loss of American jobs.

In a major 2005 incident, employees of the Mphasis BFL call centre in Pune extracted sensitive bank account information of Citibank’s American customers to divert INR 1.90 crore into new accounts set up in India. The media coverage of this incident variously calls it India’s first outsourcing cyberfraud and a well-planned scam, a cybercrime in a globalized world, a case of financial fraud and a scam that required no hacking skills, and a case of data theft and misuse. Within the ambit of cybercrime, media reports of these incidents refer to them as cases of “fraud”, “scam” and “theft”.

Two other incidents in 2005 set the trend for a critical spotlight on data security practices in India. In a June 2005 incident, an employee of a Delhi-based BPO firm, Infinity e-systems, sold the account numbers and passwords of 1,000 bank customers to the British tabloid The Sun. The Indian newspaper, Telegraph India, carried an online story headlined, “BPO Blot in British Backlash: Indian Sells Secret Data,” which reported that the employee, Karan Bahree, 24, was set up by a British journalist, Oliver Harvey. Harvey filmed Bahree accepting wads of cash for the stolen data. Bahree’s theft of sensitive information is described as both a data fraud and a leak in a 2005 BBC story by Soutik Biswas. Another story on the incident calls it a “scam” involving the leakage of credit card information. The term “leak” appears consistently across other media accounts, such as a 2005 story on Karan Bahree in the Times of India and another story in the Economic Times about the Australian Broadcasting Corporation’s (ABC) sting operation, similar to the one in Delhi, which describes the fraudsters’ scam as a leak of the online information of Australians. Another media account describes the incident in more generic terms as an “outsourcing crime”.

The other case concerned four former employees of Parsec Technologies who stole classified information and diverted calls from potential customers, causing a sudden drop in the productivity of call centres managed by the company in November 2005. Another call centre fraud came to light in 2009 through a BBC sting operation in which British reporters went to Delhi and secretly filmed a deal with a man selling credit card and debit card details obtained from Symantec call centres, which sold software made by Norton. The BBC story on this sting uses the term “breach” to refer to the incident.

In the broader framing of these cases generally understood as cybercrime, which received transnational media coverage, the terms “fraud”, “leak”, “scam”, and “theft” appear interchangeably. The term “data breach” does not seem to be a popular or common usage in these media accounts of the BPO-related incidents. A broader sense of breach (of confidentiality, privacy) figures in the media reportage in implicitly racial terms of cultural trust, as a matter of ethics and professionalism and in the language of scandal in some cases.

These early cases typify a specific kind of cybercrime concerning the theft or misappropriation of outsourced personal data belonging to British or American residents. What is remarkable about these cases is the extreme sensitivity of the stolen personal information, including financial details, bank account and credit/debit card numbers, passwords, and in one case, source code. While these cases rang alarm bells about the Indian BPO sector’s data security protocols, they also directed attention to concerns around the training of Indian employees in the ethics of data confidentiality and their vetting through psychometric tests for character assessment. In the wake of these incidents, the National Association of Software and Service Companies (NASSCOM), an Indian non-governmental trade and advocacy group, launched a National Skills Registry for IT professionals in 2006 to enable employers to conduct background checks.

These data theft incidents earned India a global reputation as an unsafe destination for business process outsourcing, seen to be lacking both a culture of maintaining data confidentiality and concrete legislation for data protection at the time. Importantly, the incidents of data theft or misappropriation were also traceable back to a known source, a BPO employee or a group of malefactors, who often sold sensitive data belonging to foreign nationals to others in India.

The phrase “data leak” also caught on in another register in the context of the widespread use of camera-equipped mobile phones in India. The 2004 Delhi MMS case offers an instance of a data leak, recapitulating the language of scandal in moralistic terms.

The Delhi MMS Case

The infamous 2004 incident involved two underage Delhi Public School (DPS) students who recorded themselves in a sexually explicit act on a cellular phone. After a falling-out, the male student passed the low-resolution clip, in which his female friend’s face is visible, on to his friend. The clip, distributed far and wide in India, ended up on the well-known e-shopping and auction website Baazee.com, leading to the arrest of the website’s CEO, Avnish Bajaj, for hosting the listing for sale. Another case in 2004 followed the same mechanics of visual capture through hand-held, MMS-enabled mobile phones: a two-minute MMS of a top South Indian actress taking a shower went viral on the Internet that year, when another MMS of two prominent Bollywood actors kissing had already done the rounds. The Delhi MMS case also marked the onset of a national moral panic around the amateur uses of mobile phone technologies, seen as capable of corrupting young Indian minds under a sneaky regime of new media modernity. Although not strictly a classic data breach, which generally involves non-visual information stored in databases, the MMS case became an iconic case of a data leak, framed in the media as a scandal that shocked the country and prompting calls for the regulation of mobile phone use in schools. The case continued its scandalous afterlife in the 2009 Bollywood film Dev.D and the 2010 film Love Sex Aur Dhokha.

Taken together, the BPO data thefts and frauds and the data leak scandals prefigure the contemporary discourse on data breaches in the second decade of the 21st century, or what may also be called the Decade of Datafication. The launch of the Indian biometric identity project, Aadhaar, in 2009, which linked access to public services and welfare delivery with biometric identification, resulted in large-scale data collection of the scheme’s subscribers. Such linking raised the spectre of state surveillance as alleged by the critics of Aadhaar, marking a watershed moment in the discourse on data privacy and protection.

Aadhaar Data Security and Other Data Breaches

Aadhaar was challenged in the Indian Supreme Court in 2012 when it was made mandatory for welfare and other services such as banking, taxation and mobile telephony. The national debate on the status of privacy as a cultural practice in Indian society and a fundamental right in the Indian Constitution led to two landmark judgments - the 2017 Puttaswamy ruling holding privacy to be a constitutional right subject to limitations and the 2018 Supreme Court judgment holding mandatory Aadhaar to be constitutional only for welfare and taxation but no other service.

While these judgments sought to rein in Aadhaar’s proliferating mandatory uses, biometric verification remained the most common mode of identity authentication with most organizations claiming it to be mandatory for various purposes. During the same period from 2010 onwards, a range of data security events concerning Aadhaar came to light. These included app-based flaws, government websites publishing Aadhaar details of subscribers, third party leaks of demographic data, duplicate and forged Aadhaar cards and other misuses.

In 2015, the Indian government launched its ambitious Digital India Campaign to provide government services to Indian citizens through online platforms. Yet data security incidents continued to increase, particularly those involving the sale and purchase of sensitive financial information related to bank accounts and credit card numbers. The online availability of a rich trove of data, accessible via a simple Google search without any extractive software or hacking skills, within a thriving shadow economy of data buyers and sellers, makes India a particularly vulnerable digital economy, especially in the absence of robust legislation. The lack of awareness around digital crimes and low digital literacy further exacerbate the situation, given that datafication via government portals, e-commerce, and online apps has outpaced the enforcement of legislative frameworks for data protection and cybersecurity.

In the context of Aadhaar data security issues, the term “data leak” seems to have more traction in media stories, followed by the term “security breach”. Given the myriad ways in which Aadhaar data has been breached, terms such as “data leak” and “exposure” (of 11 crore Indian farmers’ sensitive information) add to the specificity of the data security compromise. The term “fraud” also makes a comeback in the context of Aadhaar-related data security incidents. These cases represent a mix of data frauds involving fake identities, theft of thumbprints (for instance, from land registries), and inadvertent data leaks in numerous incidents involving government employees in Jharkhand, voter ID information of Indian citizens in Andhra Pradesh and Telangana, and activist reports of Indian government websites leaking Aadhaar data.

Aadhaar-related data security events parallel the increase in corporate data breaches during the decade of datafication. The term “data leak” again alternates with the term “data breach” in most media accounts while other terms such as “theft” and “scam” all but disappear in the media coverage of corporate data breaches.

From 2016 onwards, incidents of corporate data breaches in India continued to rise. A massive debit card data breach involving YES Bank ATMs and point-of-sale (PoS) machines, compromised through malware between May and July of 2016, resulted in the exposure of ATM PINs and non-personally identifiable information of customers. It went undetected for nearly three months. Another data leak in 2018 concerned a system run by Indane, a state-owned utility company, which allowed anyone to download private information on all Aadhaar holders, including their names, the services they were connected to and the unique 12-digit Aadhaar number. Data breaches continued to be reported in India concurrently with the incidents of data mismanagement related to Aadhaar. Prominent data breaches between 2019 and 2021 included a cyberattack on the systems of airline data service provider SITA resulting in the leak of Air India passenger data, the leakage of the personal details of Common Admission Test (CAT) applicants, the appearance of credit card details and order preferences of Domino’s pizza customers on the dark web, the leakage of COVID-19 patients’ test results by government websites, the sale of user data of Juspay and BigBasket on the dark web, and an SBI data breach, among others.

The media reportage of these data breaches uses the term “cyberattack” to describe the activities of hackers and cybercriminals operating within a shadow economy or the dark web. Recent examples of cyberattacks by hackers who leak user data for sale on the dark web include 8.2 terabytes of sensitive financial data (KYC details, Aadhaar, credit/debit cards and phone numbers) covering 110 million users of the payments app MobiKwik, 180 million Domino’s pizza orders (names, locations, emails, mobile numbers), and the data of users of Flipkart’s Cleartrip. In these incidents again, three terms appear prominently in the media reportage: cyberattack, data breach, and leak. The term “data breach” remains the most frequently used label in the media coverage of lapses in data security. While it alternates with the term “leak” in the body of the stories, “data breach” appears consistently across most headlines.

The exposure of sensitive, personal, and non-personal data by public and private entities in India is certainly a cause for concern, given the ongoing data protection legislative vacuum.

The media coverage of data breaches tends to emphasize the quantum of compromised user data aside from the types of data exposed. The media framing of these breaches in quantitative terms of financial loss as well as the magnitude and the number of breaches certainly highlights the gravity of these incidents but harm to individual users is often not addressed.

Evolving Terminology and the Source of Data Harms

The main difference in the media reportage of the BPO cybersecurity incidents during the early aughts and the contemporary context of datafication is the usage of the term, “data breach”, which figures prominently in contemporary reportage of data security incidents but not so much in the BPO-related cybercrimes.

The BPO incidents of data theft and the attendant fraud must be understood in the context of the anxieties brought on by a globalizing world of Internet-enabled systems and transnational communications. In most of these incidents regarded as cybercrimes, the language of fraud and scam ventures further to attribute the illegal actions of identifiable malefactors to cultural factors such as a lack of ethics and professionalism. The usage of the term “data leak” in these media reports functions more specifically to underscore a broader lapse in data security as well as a lack of robust cybersecurity laws. The broader term “breach” is occasionally used to refer to these incidents, but the term “data breach” doesn’t appear as such.

The term “data breach” gains more prominence in media accounts from 2009 onwards in the context of Aadhaar and the online delivery of goods and services by public and private players. The term “data breach” is often used interchangeably with the term “leak” within the broader ambit of cyberattacks in the corporate sector. The media reportage frames Aadhaar-related security lapses as instances of security/data breaches, data leaks, fraud, and occasionally scam.

In contrast to the handful of data security cases in the BPO sector, data breaches have abounded in the second decade of the twenty-first century. What further differentiates the BPO-related incidents from contemporary data breaches is the source of the data security lapse. Most corporate data breaches remain attributable to the actions of hackers and cybercriminals, while the BPO security lapses were traceable back to ex-employees or insiders with access to sensitive data. We also see, in the coverage of the BPO-related incidents, the attribution of such data security lapses to cultural factors including a lack of ethics and professionalism, often with racial overtones. The media reportage of the BBC and ABC sting operations suggests that the Indian BPOs’ lack of preparedness to handle and maintain the confidentiality of foreigners’ personal data points to the absence of a privacy culture in India. Interestingly, this transnational attribution recurs in a different form in the national debate on Aadhaar, in the claim that Indians don’t care about their privacy.

The question of the harms of data breaches to individuals is also an important one. In the discourse on contemporary data breaches, the actual material harm to an individual user is rarely established in the media reportage and is generally framed as potential harm that could be devastating given the sensitivity of the compromised data. The harm is reported to be predominantly a function of organizational cybersecurity weakness or is attributed to hackers and cybercriminals.

The reporting of harm in collective terms of the number of accounts breached, financial costs of a data breach, the sheer number of breaches and the global rankings of countries with the highest reported cases certainly suggests a problem with cybersecurity and the lack of organizational preparedness. However, this collective framing of a data breach’s impact usually elides an individual user’s experience of harm. Even in the case of Aadhaar-related breaches - a mix of leaking data on government websites and other online portals and breaches - the notion of harm owing to exposed data isn’t clearly established. This is, however, different from the extensively documented cases of Aadhaar-related issues in which welfare benefits have been denied, identities stolen and legitimate beneficiaries erased from the system due to technological errors.

Future Directions of Research

This brief, qualitative foray into the media coverage of data breaches over two decades has aimed to trace the usage of various terms in two different contexts: the Indian BPO-related incidents and the contemporary context of datafication. It would be worth exploring at length the relationship between frequent reports of data breaches and the language used to convey harm in the current absence of concrete data protection legislation. It would be instructive to examine more exhaustively the specific uses of terms such as “fraud”, “leak”, “scam”, “theft” and “breach” in media reporting of such data security incidents. Such analysis would elucidate how media reportage shapes public perception of the safety of user data and the anticipation of attendant harm as data protection legislation continues to evolve.

Especially with Aadhaar, which represents a paradigm shift in identity verification through digital means, it would be useful to conduct a sentiment analysis of how biometric identity related frauds, scams, and leaks are reported by the mainstream news media. A study of user attitudes and behaviours in response to the specific terminology of data security lapses such as the terms “breach”, “leak”, “fraud”, “scam”, “cybercrime”, and “cyberattack” would further contribute to how lay users understand the gravity of a data security lapse. Such research would go beyond expert understandings of data security incidents that tend to dominate media reportage to elucidate the concerns of lay users and further clarify the cultural meanings of data privacy.