The Centre for Internet and Society
http://editors.cis-india.org
CIS Seminar Series
http://editors.cis-india.org/internet-governance/blog/cis-seminar-series
<b>The CIS seminar series will be a venue for researchers to share works-in-progress, exchange ideas, identify avenues for collaboration, and curate research. We also seek to mitigate the impact of Covid-19 on research exchange, and foster collaborations among researchers and academics from diverse geographies. Every quarter we will be hosting a remote seminar with presentations, discussions and debate on a thematic area. </b>
<p style="text-align: justify; ">The first seminar was held on 7th and 8th October on the theme of <a href="https://cis-india.org/internet-governance/blog/cis-seminar-series-information-disorder">‘Information Disorder: Mis-, Dis- and Malinformation’</a>.</p>
<h3 style="text-align: justify; ">Theme for the Second Seminar (to be held online)</h3>
<h3>Moderating Data, Moderating Lives: Debating visions of (automated) content moderation in the contemporary</h3>
<p style="text-align: justify; ">Artificial Intelligence (AI) and Machine Learning (ML) based approaches have become increasingly popular as “solutions” to curb the extent of mis-, dis-, and mal-information, hate speech, online violence and harassment on social media. The pandemic and the ensuing work-from-home policies forced many platforms to shift to automated moderation, which further highlighted the inefficacy of existing models (<a href="https://www.zotero.org/google-docs/?u73Lwx">Gillespie, 2020)</a> in dealing with the surge in misinformation and harassment. These efforts, however, raise a range of interrelated concerns: freedom and regulation of speech in the privately public sphere of social media platforms; algorithmic governance, censorship and surveillance; the relation between virality, hate, algorithmic design and profits; and the social, political and cultural implications of ordering social relations through the computational logics of AI/ML.</p>
<p style="text-align: justify; ">On the one hand, large-scale content moderation approaches (including automated AI/ML-based approaches) have been deemed “necessary” given the enormity of data generated <a href="https://www.zotero.org/google-docs/?JHQ0rF">(Gillespie, 2020)</a>; on the other, they have been regarded as “technological fixtures” offered by Silicon Valley <a href="https://www.zotero.org/google-docs/?YLFnLm">(Morozov, 2013)</a>, or as “tyrannical” insofar as they erode existing democratic measures <a href="https://www.zotero.org/google-docs/?Ia8JYp">(Harari, 2018)</a>. Alternatively, decolonial, feminist and postcolonial approaches insist on designing AI/ML models that centre the voices of those excluded, in order to sustain and further civic spaces on social media (<a href="https://www.zotero.org/google-docs/?1Sa8vf">Siapera, 2022)</a>.</p>
<p style="text-align: justify; ">From the global south perspective, issues around content moderation foreground the hierarchies inbuilt in the existing knowledge infrastructures. First, platforms remain unwilling to moderate content in under-resourced languages of the global south citing technological difficulties. Second, given the scale and reach of social media platforms and inefficient moderation models, the work is outsourced to workers in the global south who are meant to do the dirty work of scavenging content off these platforms for the global north. Such concerns allow us to interrogate the techno-solutionist approaches as well as their critiques situated in the global north. These realities demand that we articulate a different relationship with AI/ML while also being critical of AI/ML as an instrument of social empowerment for those at the “bottom of the pyramid” <a href="https://www.zotero.org/google-docs/?bvx6mV">(Arora, 2016)</a>.</p>
<p style="text-align: justify; ">The seminar invites scholars interested in articulating nuanced responses to content moderation that take into account the harms perpetrated by algorithmic governance of social relations and irresponsible intermediaries, while being cognizant of the harmful effects of mis-, dis-, and mal-information, hate speech, online violence and harassment on social media.</p>
<p style="text-align: justify; ">We invite abstract submissions that respond to these complexities vis-a-vis content moderation models or propose provocations regarding automated moderation models and their in/efficacy in furthering egalitarian relationships on social media, especially in the global south.</p>
<p style="text-align: justify; ">Submissions can reflect on the following themes using legal, policy, social, cultural and political approaches. The list is not exhaustive, and abstracts addressing other ancillary concerns are most welcome:</p>
<ul>
<li>Metaphors of (content) moderation: mediating utopia, dystopia, scepticism surrounding AI/ML approaches to moderation.</li>
<li>From toxic to healthy, from purity to impurity: Interrogating gendered, racist, colonial tropes used to legitimize content moderation </li>
<li>Negotiating the link between content moderation, censorship and surveillance in the global south</li>
<li>Whose values decide what is and is not harmful? </li>
<li>Challenges of building moderation models for under-resourced languages.</li>
<li>Content moderation, algorithmic governance and social relations. </li>
<li>Communicating algorithmic governance on social media to the not so “tech-savvy” among us.</li>
<li>Speculative horizons of content moderation and the future of social relations on the internet. </li>
<li>Scavenging abuse on social media: Immaterial/invisible labour for making for-profit platforms safer to use.</li>
<li>Do different platforms moderate differently? Interrogating content moderation on diverse social media platforms, and multimedia content.</li>
<li>What should and should not be automated? Understanding prevalence of irony, sarcasm, humour, explicit language as counterspeech.</li>
<li>Maybe we should not automate: Alternative, bottom-up approaches to content moderation</li>
</ul>
<h3>Seminar Format</h3>
<p>We are happy to welcome abstracts for one of two tracks:</p>
<p><strong>Working paper presentation</strong></p>
<p style="text-align: justify; ">A working paper presentation would ideally involve a working draft that is presented for about 15 minutes, followed by feedback from seminar participants. Abstracts for this track should be 600-800 words in length, with clear research questions, methodology, and questions for discussion at the seminar. Ideally, for this track, authors should be able to submit a draft paper two weeks before the seminar for circulation to participants.</p>
<p><strong>Coffee-shop conversations</strong></p>
<p style="text-align: justify; ">In contrast to the formal paper presentation format, the point of the coffee-shop conversations is to enable an informal space for presentation and discussion of ideas. Simply put, it is an opportunity for researchers to “think out loud” and get feedback on future research agendas. Provocations for this should be 100-150 words containing a short description of the idea you want to discuss.</p>
<p style="text-align: justify; ">We will try to accommodate as many abstracts as possible given time constraints. We welcome submissions from students and early career researchers, especially those from under-represented communities.</p>
<p>All discussions will be private and conducted under the Chatham House Rule. Drafts will only be circulated among registered participants.</p>
<p>Please send your abstracts to <a href="mailto:workshops@cis-india.org">workshops@cis-india.org</a>.</p>
<h3>Timeline</h3>
<ol>
<li>Abstract Submission Deadline: 18th April</li>
<li>Results of Abstract review: 25th April</li>
<li>Full submissions (of draft papers): 25th May</li>
<li>Seminar date: Tentative 31st May</li>
</ol>
<h3>References</h3>
<p class="MsoNormal" style="text-align:justify; "><a href="https://www.zotero.org/google-docs/?ZHb88g">Arora, P. (2016). Bottom of the Data Pyramid: Big Data and the Global South. <i>International Journal of Communication</i>, <i>10</i>(0), 19.</a></p>
<p class="MsoNormal" style="text-align:justify; "><a href="https://www.zotero.org/google-docs/?ZHb88g">Gillespie, T. (2020). Content moderation, AI, and the question of scale. <i>Big Data & Society</i>, <i>7</i>(2), 2053951720943234. https://doi.org/10.1177/2053951720943234</a></p>
<p class="MsoNormal" style="text-align:justify; "><a href="https://www.zotero.org/google-docs/?ZHb88g">Harari, Y. N. (2018, August 30). <i>Why Technology Favors Tyranny</i>. The Atlantic. https://www.theatlantic.com/magazine/archive/2018/10/yuval-noah-harari-technology-tyranny/568330/</a></p>
<p class="MsoNormal" style="text-align:justify; "><a href="https://www.zotero.org/google-docs/?ZHb88g">Morozov, E. (2013). <i>To save everything, click here: The folly of technological solutionism</i> (First edition). PublicAffairs.</a></p>
<p class="MsoNormal" style="text-align:justify; "><a href="https://www.zotero.org/google-docs/?ZHb88g">Siapera, E. (2022). AI Content Moderation, Racism and (de)Coloniality. <i>International Journal of Bullying Prevention</i>, <i>4</i>(1), 55–65. https://doi.org/10.1007/s42380-021-00105-7</a></p>
<p>
For more details visit <a href='http://editors.cis-india.org/internet-governance/blog/cis-seminar-series'>http://editors.cis-india.org/internet-governance/blog/cis-seminar-series</a>
</p>
No publisher | Cheshta Arora | Internet Governance | Machine Learning | Artificial Intelligence | Event | Seminar | 2022-04-11T15:19:11Z | Blog Entry
CISxScholars Delhi - Harsh Gupta - FAT ML for Lawyers and Lawmakers (June 29, 5:30 pm)
http://editors.cis-india.org/raw/cisxscholars-harsh-gupta-machine-learning-for-lawyers-and-lawmakers-20170629
<b>We are proud to announce that Harsh Gupta will discuss "FAT ML (Fairness, Accountability, and Transparency in Machine Learning) for Lawyers and Lawmakers" at the CIS office in Delhi on Thursday, June 29, at 5:30 pm. This will be a two-and-a-half-hour session: a 45-minute talk, followed by a 15-minute break, another 45-minute talk, and then a discussion session. Please RSVP if you are joining us: <a href="mailto:raw@cis-india.org">raw@cis-india.org</a>.</b>
<p> </p>
<p><em>CISxScholars are informal events organised by CIS for presentation, discussion, and exchange of academic research and policy analysis.</em></p>
<hr />
<h3><strong>FAT ML (Fairness, Accountability, and Transparency in Machine Learning) for Lawyers and Lawmakers</strong></h3>
<p>From tagging people in photos to determining the risk of loan defaults, the use of data-based tools is affecting more and more areas of our lives. In some areas such tools have been applied very successfully; in other areas they have been found not only to reflect the bias and discrimination existing in today's society but also to exaggerate it.</p>
<h3><strong>Harsh Gupta</strong></h3>
<p>Harsh Gupta is a recent graduate of IIT Kharagpur with a B.Sc. and M.Sc. in Mathematics and Computing, and will be joining JPMorgan Chase as a data scientist. He completed his master's thesis on "Discrimination Aware Machine Learning". He was also an intern at The Centre for Internet and Society during the summer of 2016.</p>
<p> </p>
<p>
For more details visit <a href='http://editors.cis-india.org/raw/cisxscholars-harsh-gupta-machine-learning-for-lawyers-and-lawmakers-20170629'>http://editors.cis-india.org/raw/cisxscholars-harsh-gupta-machine-learning-for-lawyers-and-lawmakers-20170629</a>
</p>
No publisher | sumandro | FAT ML | CISxScholars | Big Data | Machine Learning | Researchers at Work | Event | Artificial Intelligence | 2017-06-27T09:16:48Z | Event
New Media, personalisation and the role of algorithms
http://editors.cis-india.org/internet-governance/new-media-personalisation-and-the-role-of-algorithms
<b>In his much acclaimed book, The Filter Bubble, Eli Pariser explains how personalisation of services on the web works and laments that they are creating individual bubbles for each user, which run counter to the idea of the Internet as an inherently open place. While Pariser’s book looks at the practices of various large companies providing online services, he briefly touches upon the role of new media such as search engines and social media portals in news curation. Building upon Pariser’s unexplored argument, this article looks at the impact of algorithmic decision-making and Big Data in the context of news reporting and curation.</b>
<em><br /></em>
<blockquote>
<div>
<div><em>Everything which bars freedom and fullness of communication sets up barriers that divide human beings into sets and cliques, into antagonistic sects and factions, and thereby undermines the democratic way of life. </em>—John Dewey</div>
</div>
</blockquote>
<p>Eli Pariser, in his book, The Filter Bubble,[1] refers to the scholarship of Walter Lippmann and John Dewey as integral to the evolution of the understanding of the democratic and ethical duties of the Fourth Estate. Lippmann was disillusioned by the role of newspapers in propaganda for the First World War. He responded with three books in quick succession — Liberty and the News,[2] Public Opinion[3] and The Phantom Public.[4] Lippmann brought attention to the fact that the process of news-reporting was conducted through privately determined and unexamined standards. The failure of the Fourth Estate to perform its democratic functions was, in the opinion of Lippmann, one of the prime factors responsible for the public not being an informed and rational entity. John Dewey, while rejecting Lippmann’s argument that matters of public policy can only be determined by inside experts with training and education, did acknowledge his critique of the media.</p>
<p>Pariser points to the creation of a wall between editorial decisionmaking and advertiser interests as the eventual result of the Lippmann and Dewey debate. While accepting that this division between the financial and reporting sides of media houses has not always been observed, Pariser emphasises that the fact that the standard exists is important.[5] Unlike traditional media, the new media, which relies on algorithmic decision-making for personalisation, is not subject to the same standards that try to mitigate the influence of commercial interests on editorial decisions, even while performing many of the same functions as the traditional media.[6]</p>
<h3>How personalisation algorithms work</h3>
<p dir="ltr">Kevin Slavin, in his famous talk at the TEDGlobal Conference, characterised algorithms as “maths that computers use to decide stuff”, and observed that they are infiltrating every aspect of our lives.[7] In Slavin’s view, algorithms can be seen as control technologies that constantly shape our world through media and information systems, dynamically modifying content and function through programmed routines. Search engines and social media platforms perpetually rank user-generated content through algorithms.[8]</p>
<p>Personalisation technologies have various advantages. They translate into more relevant content, which for service providers means more clicks and revenue, and for consumers, less time spent finding content.[9] However, they also lead to compromised privacy, lack of control and reduced individual capability.[10] Search engines like Google use the famous PageRank algorithm, which, combined with geographical location and previous searches, yields the most relevant search results.[11] The PageRank algorithm uses various real-time variables dependent on both voluntary and involuntary user inputs. These variables include the number of clicks, the number of occurrences of the key terms, the number of references by other credible pages, and so on. This data in turn determines the order of pages in search results and influences the way we perceive, understand and analyse information.[12] Maps showing real-time traffic information retrieve data from laser and infrared sensors alongside the road and from the devices of users. Once this real-time data is combined with historical trends, these maps recommend routes to every user, hence influencing traffic patterns.[13]</p>
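The link-analysis idea at the core of PageRank can be sketched with a minimal power-iteration routine. This is an illustration only, not Google's production system: the four-page graph below is invented, and, as noted above, real ranking combines link analysis with many other signals.

```python
# Minimal PageRank sketch (power iteration) over a hypothetical link graph.
# The damping factor of 0.85 follows Brin and Page's original formulation.

DAMPING = 0.85


def pagerank(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Each page keeps a small base rank and receives a share of the
        # rank of every page that links to it.
        new_rank = {p: (1.0 - DAMPING) / n for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += DAMPING * share
        rank = new_rank
    return rank


# Hypothetical graph: every other page links to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

Running this, "C" (linked to by every other page) ends up with the highest score, while "D" (with no inbound links) keeps only the base rank; it is this ordering, derived purely from the link structure, that then shapes which results users see first.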
<p>Even though this phenomenon of personalisation may appear to be new, it has been prevalent in society for ages.[14] The history of mass media culture clearly shows that personalisation has always been a method to increase market reach and customer satisfaction.[15] Newspapers have sections dedicated to special topics; radio and TV have channels dedicated to different interest groups, age groups and consumers.[16] These personalised sections in a newspaper and personalised channels on radio and television don’t just provide greater satisfaction to readers, listeners and viewers; they also provide targeted advertisement space for advertisers and content developers. However, digital footprints and mass collection of data have made this phenomenon much more granular and detailed. The geographical location of an individual can tell a lot about their community, their culture and other important traits local to a community.[17] This data further assists in personalisation. Current developments in technology not only enable better collection of data about personal preferences but also enable better personalisation.</p>
<p>Pariser mentions three ways in which the personalisation technologies of this day are different from those of the past. First, for the very first time, individuals are alone in the filter bubble. While in traditional forms of personalisation there were various individuals who shared the same frame of reference, now there is a separate set of filters governing the dissemination of content to each individual.[18] Second, personalisation technologies are now entirely invisible, and there is little that consumers can do to control or modify them.[19] Third, the decision to be subject to these personalisation technologies is often not an informed choice. A good example of this is an individual’s geographical location.[20]</p>
<h3>The neutrality of New Media?</h3>
<p dir="ltr">More and more, we have noticed personalisation technologies having an impact on how we consume news on the Internet. Google News, Facebook’s News Feed, which tries to put together a dynamic feed of both personal and global stories, and Twitter’s trending hashtag feature have established these services as key drivers of an emerging news ecosystem. Initially, this new media was hailed as a natural consequence of the Internet which would enable greater public participation, allow journalists to find more stories, and let them engage with readers directly. An illustration of this could be seen in the way Internet-based news media and social networking websites behaved in the aftermath of Israel’s attacks on a United Nations-run school in the Gaza Strip. While much of the international Internet media covered the story, Israel’s home media did not, the only exception being the liberal Israeli news website Ha’aretz.[21] Network graphs of Twitter for a few days immediately after the incident clearly show the social media manifestation of the event in the personalised cyberspace: while most of the world was re-tweeting news of this heinous act, Israelis hardly re-tweeted it. In fact, they were busy re-tweeting news of rocket attacks on Israel.[22]</p>
<p>The use of social media in newsmaking was hailed by many scholars as symptomatic of the decentralisation characteristic of the Internet. It has been seen as a movement towards greater grassroots participation, negating the ‘gatekeeping’ role traditionally played by editors. Thomas Poell and José van Dijck punch holes in the theory of social media and other online technologies as mere facilitators of user participation and translators of user preferences through Big Data analytics.[23] They quote T. Gillespie’s work, which talks of the narrative of these online services as platforms offering “open, neutral, egalitarian and progressive support for activity.”[24]</p>
<p>Pedro Domingos calls the overwhelming number of choices the defining problem of the information age, and machine learning and data analytics the largest part of its solution.[25] The primary function of algorithmic decision-making in the context of consumption of content is to narrow down the choices. Domingos is more optimistic about the impact of these technologies: “the last step of the decision is usually still for humans to make, but learners intelligently reduce the choices to something a human can manage.”[26] Pariser, on the other hand, is more circumspect about the coercive results of machine learning algorithms. Whichever way we lean, we have to accept that a large part of what personalisation algorithms do is select and prioritise content by categorising it on the basis of relevance and popularity.</p>
<p>Poell and van Dijck call this a new knowledge logic which in effect replaces human judgement (as earlier exercised by editors) with a kind of proxy decisionmaking based on data. Their main thesis is that there is little evidence to suggest that the latter is more democratic than the former, and that it creates new problems of its own. They go on to compare the practices of various services, including Facebook’s News Feed and Twitter’s trending topics, and conclude that they prioritise breaking news stories over other kinds of content.[27] For instance, the algorithm for trending topics depends not on the volume but on the velocity of tweets with a hashtag or term. It could be argued that, given this predilection, the algorithms will rarely prefer complex content. If we go by Lippmann and Dewey’s idea that the role of the Fourth Estate is to inform public debate and ensure accountability of those in positions of power, this aspect of Big Data algorithms does not correspond with that role.</p>
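The volume-versus-velocity distinction can be made concrete with a toy scoring function. This is not Twitter's actual algorithm, and the hourly counts below are invented: it only illustrates how a tag with modest total volume but a sharp recent spike can outrank a far larger but flat one.

```python
# Toy "velocity over volume" trending score: a term is scored by how fast
# its mentions are growing, not by how many mentions it has in total.

def trending_score(hourly_counts):
    """Compare the latest hour's mention count with the hour before it."""
    latest, previous = hourly_counts[-1], hourly_counts[-2]
    return (latest - previous) / (previous + 1)  # +1 avoids division by zero


steady_tag = [900, 905, 910, 908]  # large volume, flat velocity
breaking_tag = [5, 8, 40, 400]     # small volume, sharp spike
```

Despite having a small fraction of the total mentions, the spiking tag scores far higher, which is exactly why a velocity-driven ranking favours breaking stories over slowly building, complex ones.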
<h3>Quantified Audience</h3>
<p dir="ltr">Another aspect of the use of Big Data and algorithms in New Media that requires attention is that the networked infrastructure enables a quantified audience. C. W. Anderson, who has studied newsroom practices in the US, looked at the role played by audience quantification and rationalisation in shifting newswork practices. He concluded that, more and more, journalists are less autonomous in their news decisions and increasingly reliant on audience metrics as a supplement to news judgement.[28] Poell and van Dijck review the practices of some leading publications such as the New York Times, the L.A. Times and the Huffington Post, and the degree to which audience metrics dictate editorial decisions. While the New York Times seems to prioritise content on its social media portals based on expectations of spikes in user traffic, the L.A. Times goes one step further by developing content specifically aimed at promoting greater social participation. Neither of these practices, though, compares to the reliance on SEO and SMO strategies of web-born news providers like the Huffington Post, which has traffic editors who trawl the Internet for trending topics and popular search terms; their feedback dictates content creation.[29]</p>
<h3>Conclusion</h3>
<p dir="ltr">The above factors demonstrate that the idea of New Media leading the Fourth Estate to perform its democratic functions does not take actual practices into account. This idea is based on the erroneous assumption that technology in general, and algorithms in particular, are neutral. While the emergence of New Media might have reduced the gatekeeping role played by editors, its strong prioritisation of content that will be popular reduces the validity of arguments that it leads to more informed public discussion. As Pariser said, traditional media scores over New Media inasmuch as there exists a standard of division between editorial decisionmaking and advertiser interests. While this standard is flouted by media houses all the time, it exists as a metric to aspire to and to measure service providers against. New Media performs many of the same functions, and maybe it is time to evolve principles and ethical standards that take into account the need for it to perform these democratic functions.</p>
<h3>Endnotes </h3>
<p class="normal"><sup><sup>[1]</sup></sup> Eli Pariser, The Filter Bubble: What the Internet is
hiding from you (The Penguin Press, New York, 2011) </p>
<p dir="ltr"><span class="MsoFootnoteReference"><span class="MsoFootnoteReference">[2]</span></span> Walter Lippmann, Liberty and the News (Harcourt, Brace and Howe, New York 1920) available at <a href="https://archive.org/details/libertyandnews01lippgoog">https://archive.org/details/libertyandnews01lippgoog</a></p>
<p class="normal"><sup><sup>[3]</sup></sup> Walter Lippmann, Public Opinion (Harcourt, Brace and Company, New York 1922) available at <a href="http://xroads.virginia.edu/~Hyper2/CDFinal/Lippman/cover.html">http://xroads.virginia.edu/~Hyper2/CDFinal/Lippman/cover.html</a></p>
<p class="normal"><sup><sup>[4]</sup></sup> Walter Lippmann, The Phantom Public (Transaction
Publishers, New York, 1925)</p>
<p class="normal"><sup><sup>[5]</sup></sup> <em>Supra</em> Note
1 at 35.</p>
<p class="normal"><sup><sup>[6]</sup></sup> <em>Supra</em> Note
1 at 36.</p>
<p class="normal"><sup><sup>[7]</sup></sup> <a href="https://www.ted.com/talks/kevin_slavin_how_algorithms_shape_our_world/transcript?language=en">https://www.ted.com/talks/kevin_slavin_how_algorithms_shape_our_world/transcript?language=en</a></p>
<p class="normal"><sup><sup>[8]</sup></sup> Fenwick McKelvey, “Algorithmic Media Need Democratic
Methods: Why Publics Matter”, available at <a href="http://www.fenwickmckelvey.com/wp-content/uploads/2014/11/2746-9231-1-PB.pdf">http://www.fenwickmckelvey.com/wp-content/uploads/2014/11/2746-9231-1-PB.pdf</a>.</p>
<p class="normal"><sup><sup>[9]</sup></sup> <a href="http://mashable.com/2011/06/03/filters-eli-pariser/#9tIHrpa_9Eq1">http://mashable.com/2011/06/03/filters-eli-pariser/#9tIHrpa_9Eq1</a></p>
<p class="normal"><sup><sup>[10]</sup></sup> Helen Ashman, Tim Brailsford, Alexandra Cristea, Quan
Z Sheng, Craig Stewart, Elaine Torns and Vincent Wade, “The ethical and social
implications of personalization technologies for e-learning” available at <a href="http://www.sciencedirect.com/science/article/pii/S0378720614000524">http://www.sciencedirect.com/science/article/pii/S0378720614000524</a>.</p>
<p class="normal"><sup><sup>[11]</sup></sup> Sergey Brin and Lawrence Page, “The Anatomy of a
Large-Scale Hypertextual Web Search Engine” available at <a href="http://infolab.stanford.edu/pub/papers/google.pdf">http://infolab.stanford.edu/pub/papers/google.pdf</a>.</p>
<p class="normal"><sup><sup>[12]</sup></sup> Ian Rogers, “The Google Pagerank Algorithm and How It
Works” available at <a href="http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm">http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm</a>.</p>
<p class="normal"><sup><sup>[13]</sup></sup> Trygve Olson and Terry Nelson, “The Internet’s Impact
on Political Parties and Campaigns”, available at <a href="http://www.kas.de/wf/doc/kas_19706-544-2-30.pdf?100526130942">http://www.kas.de/wf/doc/kas_19706-544-2-30.pdf?100526130942</a>.</p>
<p class="normal"><sup><sup>[14]</sup></sup> Ian Witten, “Bias, privacy and personalisation on the web”, available at <a href="http://www.cs.waikato.ac.nz/~ihw/papers/07-IHW-Bias,privacyonweb.pdf">http://www.cs.waikato.ac.nz/~ihw/papers/07-IHW-Bias,privacyonweb.pdf</a>.</p>
<p class="normal"><sup><sup>[15]</sup></sup> <em>Supra</em> Note
1 at 10.</p>
<p class="normal"><sup><sup>[16]</sup></sup> <a href="https://www.americanpressinstitute.org/publications/reports/survey-research/social-demographic-differences-news-habits-attitudes/">https://www.americanpressinstitute.org/publications/reports/survey-research/social-demographic-differences-news-habits-attitudes/</a></p>
<p class="normal"><sup><sup>[17]</sup></sup> Charles Heatwole, “Culture: A Geographical Perspective”
available at <a href="http://www.p12.nysed.gov/ciai/socst/grade3/geograph.html">http://www.p12.nysed.gov/ciai/socst/grade3/geograph.html</a>.</p>
<p class="normal"><sup><sup>[18]</sup></sup> <em>Supra</em> Note
1 at 10.</p>
<p class="normal"><sup><sup>[19]</sup></sup> <em>Id</em>.</p>
<p class="normal"><sup><sup>[20]</sup></sup> <em>Supra</em> Note
1 at 11.</p>
<p class="normal"><sup><sup>[21]</sup></sup> Paul Mason, “Why Israel is losing the social media
war over Gaza?” available at <a href="http://blogs.channel4.com/paul-mason-blog/impact-social-media-israelgaza-conflict/1182">http://blogs.channel4.com/paul-mason-blog/impact-social-media-israelgaza-conflict/1182</a>.</p>
<p class="normal"><sup><sup>[22]</sup></sup> Gilad Lotan, “Israel, Gaza, War & Data: Social Networks and the Art of Personalizing Propaganda”, available at <a href="http://www.huffingtonpost.com/entry/israel-gaza-war-social-networks-data_b_5658557.html">www.huffingtonpost.com/entry/israel-gaza-war-social-networks-data_b_5658557.html</a></p>
<p class="normal"><sup><sup>[23]</sup></sup> Thomas Poell and José van Dijck, “Social Media and
Journalistic Independence” in Media Independence: Working with Freedom or
Working for Free?, edited by James Bennett & Niki Strange. (Routledge,
London, 2015)</p>
<p class="normal"><sup><sup>[24]</sup></sup> T. Gillespie, “The politics of ‘platforms’”, in New Media & Society (Volume 12, Issue 3).</p>
<p class="normal"><sup><sup>[25]</sup></sup> Pedro Domingos, The Master Algorithm: How the quest
for the ultimate learning machine will re-make the world (Basic Books, New
York, 2015) at 38.</p>
<p class="normal"><sup><sup>[26]</sup></sup> <em>Ibid</em> at 40.</p>
<p class="normal"><sup><sup>[27]</sup></sup> <em>Supra</em> Note
23.</p>
<p class="normal"><sup><sup>[28]</sup></sup> C W Anderson, Between creative and quantified
audiences: Web metrics and changing patterns of newswork in local US newsrooms,
available at <a href="https://www.academia.edu/10937194/Between_Creative_And_Quantified_Audiences_Web_Metrics_and_Changing_Patterns_of_Newswork_in_Local_U.S._Newsrooms">https://www.academia.edu/10937194/Between_Creative_And_Quantified_Audiences_Web_Metrics_and_Changing_Patterns_of_Newswork_in_Local_U.S._Newsrooms</a></p>
<p dir="ltr">
<sup><sup>[29]</sup></sup> <em>Supra </em>Note 23.</p>
<p>
For more details visit <a href='http://editors.cis-india.org/internet-governance/new-media-personalisation-and-the-role-of-algorithms'>http://editors.cis-india.org/internet-governance/new-media-personalisation-and-the-role-of-algorithms</a>
</p>
Big Data in India: Benefits, Harms, and Human Rights - Workshop Report (Blog Entry, 2017-01-16)
http://editors.cis-india.org/internet-governance/big-data-in-india-benefits-harms-and-human-rights-a-report
<b>The Centre for Internet and Society held a one-day workshop on “Big Data in India: Benefits, Harms and Human Rights” at India Habitat Centre, New Delhi on the 1st of October, 2016. This report is a compilation of the issues discussed, ideas exchanged and challenges recognized during the workshop. The objective of the workshop was to discuss aspects of big data technologies in terms of harms, opportunities and human rights. The discussion was designed around an extensive study of current and potential future uses of big data for governance in India, which CIS has undertaken over the last year with support from the MacArthur Foundation.</b>
<p> </p>
<p><strong>Contents</strong></p>
<p><a href="#1"><strong>Big Data: Definitions and Global South Perspectives</strong></a></p>
<p><a href="#2"><strong>Aadhaar as Big Data</strong></a></p>
<p><a href="#3"><strong>Seeding</strong></a></p>
<p><a href="#4"><strong>Aadhaar and Data Security</strong></a></p>
<p><a href="#5"><strong>Aadhaar’s Relational Arrangement with Big Data Scheme</strong></a></p>
<p><a href="#6"><strong>The Myths surrounding Aadhaar</strong></a></p>
<p><a href="#7"><strong>IndiaStack and FinTech Apps</strong></a></p>
<p><a href="#8"><strong>Problems with UID</strong></a></p>
<hr />
<h2 id="1">Big Data: Definitions and Global South Perspectives</h2>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">“Big Data” has been defined by multiple scholars till date. The first consideration at the workshop was to discuss various definitions of big data, and also to understand what could be considered Big Data in terms of governance, especially in the absence of academic consensus. One of the most basic ways to define it, as given by the National Institute of Standards and Technology, USA, is to take it to be the data that is beyond the computational capacity of current systems. This definition has been accepted by the UIDAI of India. Another participant pointed out that Big Data is not only indicative of size, but rather the nature of data which is unstructured, and continuously flowing. The Gartner definition of Big Data relies on the three Vs i.e. Volume (size), Velocity (infinite number of ways in which data is being continuously collected) and Variety (the number of ways in which data can be collected in rows and columns).</p>
<p style="text-align: justify;" dir="ltr">The presentation also looked at ways in which Big Data is different from traditional data. It was pointed out that it can accommodate diverse unstructured datasets, and it is ‘relational’ i.e. it needs the presence of common field(s) across datasets which allows these fields to be conjoined. For e.g., the UID in India is being linked to many different datasets, and they don’t constitute Big Data separately, but do so together. An increasingly popular definition is to define data as “Big Data” based on what can be achieved through it. It has been described by authors as the ability to harness new kinds of insight which can inform decision making. It was pointed out that CIS does not subscribe to any particular definition, and is still in the process of coming up with a comprehensive definition of Big Data.</p>
<p style="text-align: justify;" dir="ltr">Further, discussion touched upon the approach to Big Data in the Global South. It was pointed out that most discussions about Big Data in the Global South are about the kind of value that it can have, the ways in which it can change our society. The Global North, on the other hand, has moved on to discussing the ethics and privacy issues associated with Big Data.</p>
<p style="text-align: justify;" dir="ltr">After this, the presentation focussed on case studies surrounding key Central Government initiatives and projects like Aadhaar, Predictive Policing, and Financial Technology (FinTech).</p>
<h2 id="2">Aadhaar as Big Data</h2>
<p style="text-align: justify;" dir="ltr">In presenting CIS’ case study on Aadhaar, it was pointed out that initially, Aadhaar, with its enrollment dataset was by itself being seen as Big Data. However, upon careful consideration in light of definitions discussed above, it can be seen as something that enables Big Data. The different e-governance projects within Digital India, along with Aadhaar, constitute Big Data. The case study discussed the Big Data implications of Aadhaar, and in particular looked at a ‘cradle to grave’ identity mapping through various e-government projects and the datafication of various transaction generated data.</p>
<h2 id="3">Seeding</h2>
<p style="text-align: justify;" dir="ltr">Any digital identity like Aadhaar typically has three features: 1. Identification i.e. a number or card used to identify yourself; 2. Authentication, which is based on your number or card and any other digital attributes that you might have; 3. Authorisation: As bearers of the digital identity, we can authorise the service providers to take some steps on our behalf. The case study discussed ‘seeding’ which enables the Big Data aspects of Digital India. In the process of seeding, different government databases can be seeded with the UID number using a platform called Ginger. Due to this, other databases can be connected to UIDAI, and through it, data from other databases can be queried by using your Aadhaar identity itself. This is an example of relationality, where fractured data is being brought together. At the moment, it is not clear whether this access by UIDAI means that an actual physical copy of such data from various sources will be transferred to UIDAI’s servers or if they will just access it through internet, but the data remains on the host government agency’s server. An example of even private parties becoming a part of this infrastructure was raised by a participant when it was pointed out that Reliance Jio is now asking for fingerprints. This can then be connected to the relational infrastructure being created by UIDAI. The discussion then focused on how such a structure will function, where it was mentioned that as of now, it cannot be said with certainty that UIDAI will be the agency managing this relational infrastructure in the long run, even though it is the one building it.</p>
<h2 id="4">Aadhaar and Data Security</h2>
<p style="text-align: justify;" dir="ltr">This case study also dealt with the sheer lack of data protection legislation in India except for S.43A of the IT Act. The section does not provide adequate protection as the constitutionality of the rules and regulations under S.43A is ambivalent. More importantly, it only refers to private bodies. Hence, any seeding which is being done by the government is outside the scope of data protection legislation. Thus, at the moment, no legal framework covers the processes and the structures being used for datasets. Due to the inapplicability of S.43A to public bodies, questions were raised as to the existence of a comprehensive data protection policy for government institutions. Participants answered the question in the negative. They pointed out that if any government department starts collecting data, they develop their own privacy policy. There are no set guidelines for such policies and they do not address concerns related to consent, data minimisation and purpose limitation at all. Questions were also raised about the access and control over Big Data with government institutions. A tentative answer from a participant was that such data will remain under the control of the domain specific government ministry or department, for e.g. MNREGA data with the Ministry of Rural Development, because the focus is not on data centralisation but rather on data linking. As long as such fractured data is linked and there is an agency that is responsible to link them, this data can be brought together. Such data is primarily for government agencies. But the government is opening up certain aspects of the data present with it for public consumption for research and entrepreneurial purposes.The UIDAI provides you access to your own data after paying a minimal fee. The procedure for such access is still developing.</p>
<h2 id="5">Aadhaar’s Relational Arrangement with Big Data Scheme</h2>
<p style="text-align: justify;" dir="ltr">The various Digital India schemes brought in by the government were elucidated during the workshop. It was pointed out that these schemes extend to myriad aspects of a citizen’s daily life and cover all the essential public services like health, education etc. This makes Aadhaar imperative even though the Supreme Court has observed that it is not mandatory for every citizen to have a unique identity number. The benefits of such identity mapping and the ecosystem being generated by it was also enumerated during the discourse. But the complete absence of any data ethics or data confidentiality principles make us unaware of the costs at which these benefits are being conferred on us. Apart from surveillance concerns, the knowledge gap being created between the citizens and the government was also flagged. Three main benefits touted to be provided by Aadhaar were then analysed. The first is the efficient delivery of services. This appears to be an overblown claim as the Aadhaar specific digitisation and automation does not affect the way in which employment will be provided to citizens through MNREGA or how wage payment delays will be overcome. These are administrative problems that Aadhaar and associated technologies cannot solve. The second is convenience to the citizens. The fallacies in this assertion were also brought out and identified. Before the Aadhaar scheme was rolled in, ration cards were issued based on certain exclusion and inclusion criteria.. The exclusion and inclusion criteria remain the same while another hurdle in the form of Aadhaar has been created. As India is still lacking in supporting infrastructure such as electricity, server connectivity among other things, Aadhaar is acting as a barrier rather than making it convenient for citizens to enroll in such schemes.The third benefit is fraud management. 
Here, a participant pointed out that this benefit was due to digitisation in the form of GPS chips in food delivery trucks and electronic payment and not the relational nature of Aadhaar. Aadhaar is only concerned with the linking up or relational part. About deduplication, it was pointed out how various government agencies have tackled it quite successfully by using technology different from biometrics which is unreliable at the best of times.</p>
<h2 id="6">The Myths surrounding Aadhaar</h2>
<p style="text-align: justify;" dir="ltr">The discussion also reflected on the fact that Aadhaar is often considered to be a panacea that subsumes all kinds of technologies to tackle leakages. However, this does not take into account the fact that leakages happen in many ways. A system should have been built to tackle those specific kinds of leakages, but the focus is solely on Aadhaar as the cure for all. Notably, participants who have been a part of the government pointed out how this myth is misleading and should instead be seen as the first step towards a more digitally enhanced country which is combining different technologies through one medium.</p>
<h2 id="7">IndiaStack and FinTech Apps</h2>
<h3 id="71">What is India Stack?</h3>
<p style="text-align: justify;" dir="ltr">The focus then shifted to another extremely important Big Data project, India Stack, being conceptualised and developed by a team of private developers called iStack, for the NPCI. It builds on the UID project, Jan Dhan Yojana and mobile services trinity to propagate and develop a cashless, presence-less, paperless and granular consent layer based on UID infrastructure to digitise India.</p>
<p style="text-align: justify;" dir="ltr">A participant pointed out that the idea of India Stack is to use UID as a platform and keep stacking things on it, such that more and more applications are developed. This in turn will help us to move from being a ‘data poor’ country to a ‘data rich’ one. The economic benefits of this data though as evidenced from the TAGUP report - a report about the creation of National Information Utilities to manage the data that is present with the government - is for the corporations and not the common man. The TAGUP report openly talks about privatisation of data.</p>
<h3 id="72">Problems with India Stack</h3>
<p style="text-align: justify;" dir="ltr">The granular consent layer of India Stack hasn’t been developed yet but they have proposed to base it on MIT Media Lab’s OpenPDS system. The idea being that, on the basis of the choices made by the concerned person, access to a person’s personal information may be granted to an agency like a bank. What is more revolutionary is that India Stack might even revoke this access if the concerned person expresses a wish to do so or the surrounding circumstances signal to India Stack that it will be prudent to do so. It should be pointed out that the the technology required for OpenPDS is extremely complex and is not available in India. Moreover, it’s not clear how this system would work. Apart from this, even the paperless layer has its faults and has been criticised by many since its inception, because an actual government signed and stamped paper has been the basis of a claim.. In the paperless system, you are provided a Digilocker in which all your papers are stored electronically, on the basis of your UID number. However, it was brought to light that this doesn’t take into account those who either do not want a Digilocker or UID number or cases where they do not have access to their digital records. How in such cases will people make claims?</p>
<h3 id="73">A Digital Post-Dated Cheque: It’s Ramifications</h3>
<p style="text-align: justify;" dir="ltr">A key change that FinTech apps and the surrounding ecosystem want to make is to create a digital post-dated cheque so as to allow individuals to get loans from their mobiles especially in remote areas. This will potentially cut out the need to construct new banks, thus reducing the capital expenditure , while at the same time allowing the credit services to grow. The direct transfer of money between UID numbers without the involvement of banks is a step to further help this ecosystem grow. Once an individual consents to such a system, however, automatic transfer of money from one’s bank accounts will be affected, regardless of the reason for payment. This is different from auto debt deductions done by banks presently, as in the present system banks have other forms of collateral as well. The automatic deduction now is only affected if these other forms are defaulted upon. There is no knowledge as to whether this consent will be reversible or irreversible. As Jan Dhan Yojana accounts are zero balance accounts, the account holder will be bled dry. The implication of schemes such as “Loan in under 8 minutes” were also discussed. The advantage of such schemes is that transaction costs are reduced.The financial institution can thus grant loans for the minimum amount without any additional enquiries. It was pointed out that this new system is based on living on future income much like the US housing bubble crash. Interestingly, in Public Distribution Systems, biometrics are insisted upon even though it disrupts the system. This can be seen as a part of the larger infrastructure to ensure that digital post-dated cheques become a success.</p>
<h3 id="74">The Role of FinTech Apps</h3>
<p style="text-align: justify;" dir="ltr">FinTech ‘apps’ are being presented with the aim of propagating financial inclusion. The Technology Advisory Group for Unique Projects report stated that as managing such information sources is a big task, just like electricity utilities, a National Information Utilities (NIU) should be set up for data sources. These NIUs as per the report will follow a fee based model where they will be charging for their services for government schemes. The report identified two key NIUs namely the National Payments Corporation of India (NPCI) and the Goods and Services Tax Network (GSTN). The key usage that FinTech applications will serve is credit scoring. The traditional credit scoring data sources only comprised a thin file of records for an individual, but the data that FinTech apps collect - a person’s UID number, mobile number. and bank account number all linked up, allow for a far more comprehensive credit rating. Government departments are willing to share this data with FinTech apps as they are getting analysis in return. Thus, by using UID and the varied data sources that have been linked together by UID, a ‘thick file’ is now being created by FinTech apps. Banking apps have not yet gone down the route of FinTech apps to utilise Big Data for credit scoring purposes.</p>
<p style="text-align: justify;" dir="ltr"> </p>
<p style="text-align: justify;" dir="ltr">The two main problems with such apps is that there is no uniform way of credit scoring. This distorts the rate at which a person has to pay interest. The consent layer adds another layer of complication as refusal to share mobile data with a FinTech app may lead to the app declaring one to be a risky investment thus, subjecting that individual to a higher rate of interest .</p>
<div style="text-align: justify;" dir="ltr"> </div>
<h3 id="75">Regulation of FinTech Apps and the UID Infrastructure</h3>
<p style="text-align: justify;" dir="ltr"> India Stack and the applications that are being built on it, generate a lot of transaction metadata that is very intimate in nature. The privacy aspects of the UID legislation doesn't cover such data. The granular consent layer which has been touted to cover this still has to come into existence. Also, Big Data is based on sharing and linking of data. Here, privacy concerns and Big Data objectives clash. Big Data by its very nature challenges privacy principles like data minimisation and purpose limitation.The need for regulation to cover the various new apps and infrastructure which are being developed was pointed out.</p>
<h2 id="8">Problems with UID</h2>
<p style="text-align: justify;" dir="ltr">It has been observed that any problem present with Aadhaar is usually labelled as a teething problem, it’s claimed that it will be solved in the next 10 years. But, this begs the question - why is the system online right now?</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Aadhaar is essentially a new data condition and a new exclusion or inclusion criteria. Data exclusion modalities as observed in Rajasthan after the introduction of biometric Point of Service (POS) machines at ration shops was found to be 45% of the population availing PDS services. This number also includes those who were excluded from the database by being included in the wrong dataset. There is no information present to tell us how many actual duplicates and how many genuine ration card holders were weeded out/excluded by POS.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">It was also mentioned that any attempt to question Aadhaar is considered to be an attempt to go back to the manual system and this binary thinking needs to change. Big Data has the potential to benefit people, as has been evidenced by the scholarship and pension portals. However, Big Data’s problems arise in systems like PDS, where there is centralised exclusion at the level of the cloud. Moreover, the quantity problem present in the PDS and MNREGA systems persists. There is still the possibility of getting lesser grains and salary even with analysis of biometrics, hence proving that there are better technologies to tackle these problems. Presently, the accountability mechanisms are being weakened as the poor don’t know where to go to for redressal. Moreover, the mechanisms to check whether the people excluded are duplicates or not is not there. At the time of UID enrollment, out of 90 crores, 9 crore were rejected. There was no feedback or follow-up mechanism to figure out why are people being rejected. It was just assumed that they might have been duplicates.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Another problem is the rolling out of software without checking for inefficiencies or problems at a beta testing phase. The control of developers over this software, is so massive that it can be changed so easily without any accountability.. The decision making components of the software are all proprietary like in the the de-duplication algorithm being used by the UIDAI. Thus, this leads to a loss of accountability because the system itself is in flux, none of it is present in public domain and there are no means to analyse it in a transparent fashion..</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">These schemes are also being pushed through due to database politics. On a field study of NPR of citizens, another Big Data scheme, it was found that you are assumed to be an alien if you did not have the documents to prove that you are a citizen. Hence, unless you fulfill certain conditions of a database, you are excluded and are not eligible for the benefits that being on the database afford you.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Why is the private sector pushing for UIDAI and the surrounding ecosystem?</p>
<p style="text-align: justify;" dir="ltr">Financial institutions stand to gain from encouraging the UID as it encourages the credit culture and reduces transaction costs.. Another advantage for the private sector is perhaps the more obvious one, that is allows for efficient marketing of products and services..</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">The above mentioned fears and challenges were actually observed on the ground and the same was shown through the medium of a case study in West Bengal on the smart meters being installed there by the state electricity utility. While the data coming in from these smart meters is being used to ensure that a more efficient system is developed,it is also being used as a surrogate for income mapping on the basis of electricity bills being paid. This helps companies profile neighbourhoods. The technical officer who first receives that data has complete control over it and he can easily misuse the data. This case study again shows that instruments like Aadhaar and India Stack are limited in their application and aren’t the panacea that they are portrayed to be.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">A participant pointed out that in the light of the above discussions, the aim appears to be to get all kinds of data, through any source, and once you have gotten the UID, you link all of this data to the UID number, and then use it in all the corporate schemes that are being started. Most of the problems associated with Big Data are being described as teething problems. The India Stack and FinTech scheme is coming in when we already know about the problems being faced by UID. The same problems will be faced by India Stack as well.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Can you opt out of the Aadhaar system and the surrounding ecosystem?</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">The discussion then turned towards whether there can be voluntary opting out from Aadhaar. It was pointed out that the government has stated that you cannot opt out of Aadhaar. Further, the privacy principles in the UIDAI bill are ambiguously worded where individuals only have recourse for basic things like correction of your personal information. The enforcement mechanism present in the UIDAI Act is also severely deficient. There is no notification procedure if a data breach occurs. . The appellate body ‘Cyber Appellate Tribunal’ has not been set up in three years.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">CCTNS: Big Data and its Predictive Uses</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">What is Predictive Policing?</p>
<p style="text-align: justify;" dir="ltr">The next big Big Data case study was on the Crime and Criminal Tracking Network & Systems (CCTNS). Originally it was supposed to be a digitisation and interconnection scheme where police records would be digitised and police stations across the length and breadth of the country would be interconnected. But, in the last few years some police departments of states like Chandigarh, Delhi and Jharkhand have mooted the idea of moving on to predictive policing techniques. It envisages the use of existing statistical and actuarial techniques along with many other tropes of data to do so. It works in four ways: 1. By predicting the place and time where crimes might occur; 2. To predict potential future offenders; 3. To create profiles of past crimes in order to predict future crimes; 4. Predicting groups of individuals who are likely to be victims of future crimes.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">How is Predictive Policing done?</p>
<p style="text-align: justify;" dir="ltr">To achieve this, the following process is followed: 1. Data collection from various sources which includes structured data like FIRs and unstructured data like call detail records, neighbourhood data, crime seasonal patterns etc. 2. Analysis by using theories like the near repeat theory, regression models on the basis of risk factors etc. 3. Intervention</p>
<div style="text-align: justify;" dir="ltr"> </div>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Flaws in Predictive Policing and questions of bias</p>
<p style="text-align: justify;" dir="ltr">An obvious weak point in the system is that if the initial data going into the system is wrong or biased, the analysis will also be wrong. Efforts are being made to detect such biases. An important way to do so will be by building data collection practices into the system that protect its accuracy. The historical data being entered into the system is carrying on the prejudices inherited from the British Raj and biases based on religion, caste, socio-economic background etc.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">One participant brought about the issue of data digitization in police stations, and the impact of this haphazard, unreliable data on a Big Data system. This coupled with paucity of data is bound to lead to arbitrary results. An effective example was that of black neighbourhoods in the USA. These are considered problematic and thus they are policed more, leading to a higher crime rate as they are arrested for doing things that white people in an affluent neighbourhood get away with. This in turn further perpetuates the crime rate and it becomes a self-fulfilling prophecy. In India, such a phenomenon might easily develop in the case of migrants, de-notified tribes, Muslims etc. A counter-view on bias and discrimination was offered here. One participant pointed out that problems with haphazard or poor quality of data is not a colossal issue as private companies are willing to fill this void and are actually doing so in exchange for access to this raw data. It was also pointed out how bias by itself is being used as an all encompassing term. There are multiplicities of biases and while analysing the data, care should be taken to keep it in mind that one person’s bias and analysis might and usually does differ from another. Even after a computer has analysed the data, the data still falls into human hands for implementation.</p>
<p style="text-align: justify;" dir="ltr">The issue of such databases being used to target particular communities on the basis of religion, race, caste, ethnicity among other parameters was raised. Questions about control and analysis of data were also discussed, i.e. whether it will be top-down with data analysis being done in state capitals or will this analysis be done at village and thana levels as well too. It was discussed as topointed out how this could play a major role in the success and possible persecutory treatment of citizens, as the policemen at both these levels will have different perceptions of what the data is saying. . It was further pointed out, that at the moment, there’s no clarity on the mode of implementation of Big Data policing systems. Police in the USA have been seen to rely on Big Data so much that they have been seen to become ‘data myopic’. For those who are on the bad side of Big Data, in the Indian context, laws like preventive detention can be heavily misused.There’s a very high chance that predictive policing due to the inherent biases in the system and the prejudices and inefficiency of the legal system will further suppress the already targeted sections of the society. A counterpoint was raised and it was suggested that contrary to our fears, CCTNS might lead to changes in our understanding and help us to overcome longstanding biases.</p>
<p style="text-align: justify;" dir="ltr">Open Knowledge Architecture as a solution to Big Data biases?</p>
<p style="text-align: justify;" dir="ltr">The conference then mulled over the use of ‘Open Knowledge’ architecture to see whether it can provide the solution to rid Big Data of its biases and inaccuracies if enough eyes are there. It was pointed out that Open Knowledge itself can’t provide foolproof protection against these biases as the people who make up the eyes themselves are predominantly male belonging to the affluent sections of the society and they themselves suffer from these biases.</p>
<p style="text-align: justify;" dir="ltr">Who exactly is Big Data supposed to serve?</p>
<p style="text-align: justify;" dir="ltr">The discussion also looked at questions such as who is this data for? Janata Information System (JIS), is a concept developed by MKSS where the data collected and generated by the government is taken to be for the common citizens. For e.g. MNREGA data should be used to serve the purposes of the labourers. The raw data as is available at the moment, usually cannot be used by the common man as it is so vast and full of information that is not useful for them at all. It was pointed out that while using Big Data for policy planning purposes, the actual string of information that turned out to be needed was very little but the task of unravelling this data for civil society purposes is humongous. By presenting the data in the right manner, the individual can be empowered. The importance of data presentation was also flagged. It was agreed upon that the content of the data should be for the labourer and not a MNC, as the MNC has the capability to utilise the raw data on it’s own regardless.</p>
<p style="text-align: justify;" dir="ltr">Concerns about Big Data usage</p>
<ol><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">Participants pointed out that privacy concerns are usually brushed under the table due to a belief that the law is sufficient or that the privacy battle has already been lost. </p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">In the absence of domain knowledge and context, Big Data analysis is quite limited. Big Data’s accuracy and potential to solve problems need to be factually backed.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">The narrative of Big Data often rests on the assumption that descriptive statistics supplant inferential statistics, eliminating the need for domain-specific knowledge. The claim is that the data is so big that it will describe everything we need to know.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">Big Data is shifting scientific rigour from a deductive model to an inductive one. In response, a participant suggested that troves of good data allow us to formulate informed questions, on the basis of which a deductive model can then be built. A hybrid approach combining the deductive and the inductive might serve us best.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">The need to collect the right data, in the correct format and in the right place, was also expressed.</p>
</li></ol>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Potential Research Questions & Participants’ Areas of Research</p>
<p style="text-align: justify;" dir="ltr">Following this discussion, participants brainstormed to come up with potential areas of research and research questions. They have been captured below:</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Big Data, Aadhaar and India Stack:</p>
<div style="text-align: justify;" dir="ltr"> </div>
<ol><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">Has Aadhaar been able to tackle illegal ways of claiming services or are local negotiations and other methods still prevalent?</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">Is the consent layer of India Stack being developed in a way that gives the UID user an opportunity to provide informed consent? openPDS and its EU counterpart, the MyData framework, were designed for countries with strong privacy laws. Importantly, they were meant for information shared on social media, not for an individual’s health or credit history. India is using the model in a completely different sphere, without strong data protection laws. What were the granular consent layer structures in the West designed for, and what were they supposed to protect?</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">The question of data ownership needs to be studied, especially in the context of a globalised world where MNCs collect copious amounts of Indian citizens’ data. How do private parties interact in this regard?</p>
</li></ol>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Big Data and Predictive Policing:</p>
<div style="text-align: justify;" dir="ltr"> </div>
<ol><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">How are inequalities created through Big Data systems? Lessons should be drawn from the Western experience with predictive policing and other Big Data techniques, which tend to perpetuate biases already ingrained in the system.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">It was also pointed out that studying these topics, and technology generally, reveals a divide between the computational sciences and the social sciences. This divide needs to be erased if Big Data, or any kind of data, is to be used efficiently; there should be cross-pollination between different groups of academics. The ‘computational social science’ departments that have emerged over the last three to four years are one example.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">Why are so many of the interim promises made by Big Data failing? This phenomenon needs to be studied from a social science perspective, which would allow it to be viewed from a different angle.</p>
</li></ol>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Studying Big Data:</p>
<div style="text-align: justify;" dir="ltr"> </div>
<ol><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">What is the historical context of the terms of reference used for Big Data? The current Big Data debate in India rests on parameters set by the West. For a better understanding, it was suggested that P.C. Mahalanobis’ experience in conducting the Indian census (the Big Data of its time) be examined for a historical perspective. This comparison might surface questions that matter in the Indian context. It was also suggested that, rather than using ‘Big Data’ as a catchphrase for these new technological innovations, we need to be more discerning.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">What ideological aspects must be considered while studying Big Data? What does the dialectical promise of technology mean? It was contended that every shift in technology excites the zeitgeist of its period and generates claims that it will solve everything. This dialectical promise, and the social promise surrounding it, needs to be studied.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">Apart from the legitimate fears that Big Data might lead to exclusion, in what ways might it improve inclusion too?</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">The diminishing barrier between the public and the private self, a tangent to the larger public-private debate, was also mentioned.</p>
</li><li style="list-style-type: decimal;" dir="ltr">
<p style="text-align: justify;" dir="ltr">How does one distinguish between technology failure and process failure while studying Big Data? </p>
</li></ol>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Big Data: A Friend?</p>
<p style="text-align: justify;" dir="ltr">In the concluding session, it was acknowledged that the Big Data moment cannot be wished away. The use of analytics and predictive modelling by the private sector is now commonplace, and India has moved towards a database state through UID and Digital India. A nuanced debate, one that does away with the false binary of being either a Big Data enthusiast or a Luddite, is crucial.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">A participant offered two approaches to solving a Big Data problem. The first was a Big Data due process framework, under which any decision that impacts a citizen’s rights must be open to cross-examination; the efficacy and practicality of such an approach are still unclear. The second, slightly paternalistic, approach would solve Big Data problems at the data science level itself. This resembles the affirmative algorithmic approach: if data for a minority community is missing from a dataset, it should be artificially introduced. It was also suggested that carefully calibrated free market competition could be used to regulate Big Data; a private personal wallet company that charges more but does not share your data at all would be an example of such competition.</p>
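<p style="text-align: justify;" dir="ltr">The discussion left the affirmative algorithmic approach abstract. One minimal way to ‘artificially introduce’ under-represented data is random oversampling, sketched below; the function name and toy dataset are illustrative assumptions, not from the discussion.</p>

```python
import random
from collections import Counter

def oversample_minority(rows, label_key):
    """Illustrative sketch: balance a dataset by duplicating randomly
    chosen rows from under-represented groups until every group is as
    large as the largest one."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_group.values())
    balanced = []
    for group in by_group.values():
        balanced.extend(group)
        # Duplicate random rows to close the gap for smaller groups.
        balanced.extend(random.choices(group, k=target - len(group)))
    return balanced

# Toy dataset in which community "B" is under-represented.
data = ([{"community": "A", "value": i} for i in range(8)]
        + [{"community": "B", "value": i} for i in range(2)])
counts = Counter(r["community"] for r in oversample_minority(data, "community"))
```

<p style="text-align: justify;" dir="ltr">Duplicating records is the crudest form of such correction: it equalises group counts but cannot add information the data never contained, which is one reason the approach was described as paternalistic.</p>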
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">Another important observation was the need to understand Big Data in a Global South context and to account for the unique challenges that arise there. While the convenience of Big Data is promising, its actual manifestation depends on externalities such as connectivity and the availability of accurate, adequate data, which must be studied in the Global South.</p>
<div style="text-align: justify;" dir="ltr"> </div>
<p style="text-align: justify;" dir="ltr">While the promises of Big Data are encouraging, it is also important to examine its impacts and its interaction with people's rights. Regulatory solutions that mitigate the harms of Big Data while reaping its benefits need to evolve.</p>
<p>
For more details visit <a href='http://editors.cis-india.org/internet-governance/big-data-in-india-benefits-harms-and-human-rights-a-report'>http://editors.cis-india.org/internet-governance/big-data-in-india-benefits-harms-and-human-rights-a-report</a>
</p>
Authors: Vidushi Marda, Akash Deep Singh and Geethanjali Jujjavarapu. Published: 2016-11-18. Tags: Human Rights, UID, Big Data, Privacy, Artificial Intelligence, Internet Governance, Machine Learning, Featured, Digital India, Aadhaar, Information Technology, E-Governance.

A.I. Hype Cycles and Artistic Subversions
http://editors.cis-india.org/raw/ai-hype-cycles-and-artistic-subversions
<b>Gene Kogan will give a talk on "A.I. hype cycles and artistic subversions" on Friday, January 22, 2016 at the Centre for Internet and Society office, 6 pm - 8 pm.</b>
<p> </p>
<p><img src="http://www.genekogan.com/images/style-transfer/ml_egypt_crab_maps.jpg" alt="Gene Kogan - Style Transfer - Mona Lisa" width="800" /></p>
<h6>Mona Lisa restyled by Egyptian hieroglyphs, the Crab Nebula, and Google Maps. <a href="http://www.genekogan.com/works/style-transfer.html">Style Transfer</a>. Gene Kogan.</h6>
<p> </p>
<p style="text-align: justify;">Recent years have seen a resurgence of popular interest in machine learning and artificial intelligence, as emerging methods have set new scientific benchmarks and introduced classes of neural networks capable of imitating human behavior, among other impressive feats. More importantly, the study of these algorithms is rapidly crossing over into mainstream culture and industry as AI applications begin to inhabit more of our daily lives. Numerous initiatives have appeared that attempt to demystify these previously obscure research tracks and make them more accessible to the public. Open source software such as Torch, Theano, and TensorFlow has equipped amateurs with the same tools that are achieving state-of-the-art results in industry and academia.</p>
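<p style="text-align: justify;">As a rough illustration of what libraries like Torch, Theano, and TensorFlow automate at scale (this sketch is not from the talk), the core of neural-network training, adjusting a weight by following the gradient of an error, can be written in plain Python:</p>

```python
def fit_weight(xs, ys, lr=0.01, steps=2000):
    """Fit y = w * x by gradient descent on the mean squared error.
    The frameworks named above compute such gradients automatically
    for networks with millions of weights."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # underlying relation: y = 2x
w = fit_weight(xs, ys)     # converges close to 2.0
```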
<p style="text-align: justify;">This talk will examine the most recent wave of artistic projects applying these methods in various cultural contexts, producing troves of machine-hallucinated text, images, sounds, and videos that demonstrate a previously unseen capacity for imitating human style and sensibility. These experimental works attempt to show the capacity of these machines for producing aesthetically meaningful media, while also challenging and subverting them to illuminate their most obscure and counterintuitive properties.</p>
<p>A recent article by the speaker about this: <a href="http://bit.ly/1OhFcQr">From Pixels to Paragraphs: How artistic experiments with deep learning guard us from hype</a>.</p>
<p>Relevant projects by the speaker that will be presented include: <a href="http://bit.ly/1RyUH76">Style Transfer</a>, <a href="http://bit.ly/1QDNxOI">A Book from the Sky 天书</a>, <a href="http://bit.ly/1QDNClo">Learning to Generate Text and Audio</a>, and <a href="http://bit.ly/1QDNG4D">Deepdream Prototypes</a>.</p>
<h2>Gene Kogan</h2>
<p style="text-align: justify;">Gene Kogan is an artist and programmer who is interested in generative systems and applications of emerging technology in artistic and expressive contexts. He writes code for live music, performance, and visual art. He contributes to numerous open-source software projects and frequently gives workshops and demonstrations on topics related to code and art.</p>
<p style="text-align: justify;">He is a contributor to openFrameworks, Processing, and p5.js; an adjunct professor at Bennington College and NYU; a former resident at Eyebeam Art &amp; Technology Center; and a former Fulbright scholar in Bangalore, India (2012-2013).</p>
<p>
For more details visit <a href='http://editors.cis-india.org/raw/ai-hype-cycles-and-artistic-subversions'>http://editors.cis-india.org/raw/ai-hype-cycles-and-artistic-subversions</a>
</p>
Posted by sharath. Published: 2016-01-01. Tags: Generative Art, Art, Practice, Machine Learning, Researchers at Work, Event, Artificial Intelligence.