The Infrastructure Turn in the Humanities

An extended survey of digital initiatives in arts and humanities practices in India was undertaken during the last year. Provocatively called 'mapping digital humanities in India', this enquiry began with the term 'digital humanities' itself, as a 'found' name for which one needs to excavate some meaning, context, and location in India at the present moment. Instead of importing this term to describe practices taking place in this country - especially when the term itself is relatively unstable and undefined even in the Anglo-American context - what I chose to do was to take a few steps back, and outline a few questions/conflicts that the digital practitioners in arts and humanities disciplines are grappling with. The final report of this study will be published serially. This is the fourth among seven sections.

 

Sections

01. Digital Humanities in India?

02. A Question of Digital Humanities

03. Reading from a Distance – Data as Text

04. The Infrastructure Turn in the Humanities

05. Living in the Archival Moment

06. New Modes and Sites of Humanities Practice

07. Digital Humanities in India – Concluding Thoughts


In an article in the Digital Humanities Quarterly describing the emergence of the term cyberinfrastructure, Patrik Svensson speaks of an ‘infrastructure turn’ in the humanities, pointing towards a seemingly newfound interest and investment in resources and tools for humanities research, pedagogy and publication in many universities and other knowledge institutions (Svensson 2011). Though the term has not been significantly used otherwise, it is interesting to note the implications of such a statement in the context of other important ‘turns’ in the history of ideas, such as the linguistic or cultural turn. Particularly in the predominant debates around digital humanities, which are largely Anglo-American, infrastructure is an important and inherent component of any thinking around this area, as the field derives many of its theoretical and practical concerns from a history of humanities computing. A lot of early work in DH was done in the area of digital archives and knowledge repositories, such as The Walt Whitman Archive and the Rossetti and Blake archives (Gold and Groom 2012, Drucker 2012), where digitization and algorithmic querying were important developments in terms of imagining and opening up the archive. From there to seemingly complex projects on data mapping, visualization, distant reading and cultural analytics, which require parsing huge corpora of humanities data, the growth of infrastructure has been a key aspect of these developments, although this may not be emphasized in the early literature about the field. The use of computational methods and the move towards the use of big data in the humanities have been important changes in terms of the objects of enquiry and methodology, and infrastructure is an essential condition of both these changes.

As with other disciplines, the nature of the infrastructure and resources available to the humanities – in the form of galleries, archives, libraries, museums and now online repositories, language laboratories, and bibliographic, writing and editing tools and software – has also in some manner influenced the nature or scope of the questions that could be asked of an object or text. It is therefore useful to explore the influence of infrastructure at a very conceptual level, in terms of what new ways of enquiry have been made possible with digital technologies and the internet. Now, with new tools that can parse many pages of text at one go, or an algorithm that can derive patterns from a data set of images, video or other cultural artifacts, the scope of enquiry seems to have increased exponentially, as much literature around DH suggests (Berry 2011). Indeed this point is also a bone of contention for many traditional humanities scholars, as it not only seems to be a technologically deterministic notion, but also one that takes away from more conventional methods of humanities research, which are based on close reading and interpretation of texts. In the Indian context, however, these possibilities still seem distant owing to several gaps in infrastructure, resources and material. In many institutions, the lack of basic infrastructure and resources in the form of libraries, classroom teaching-learning resources and access to the internet and other digital tools for the humanities remains a problem. Existing institutional infrastructure falls short of what is required, and is mostly outdated.

This conflict over whether new tools and resources for the humanities are taking away from or adding to humanities research is better understood in the light of how the concept of infrastructure has been understood, specifically in the context of communication and research. Brian Larkin (2008) describes infrastructures as “institutionalized networks that facilitate the flow of goods in a wider cultural as well as physical sense”. He talks about both technical infrastructure (such as transport, telecommunications, urban planning, energy and water) and ‘soft’ infrastructure such as the knowledge of a language, or cultural style and religious learnings. He therefore defines infrastructure as “this totality of both technical and cultural systems that create institutionalized structures whereby goods of all sorts circulate, connecting and binding people into collectivities.” This definition opens out the understanding of the term a little more, for it brings within its ambit different kinds of goods – such as knowledge – and proposes that infrastructure has the power to bind people within collectivities, thus emphasizing both its limitations and its potentialities.

The notion of infrastructure as not being neutral to culture is further emphasized when Larkin talks about its mediating capacities, brought about by a layering of new technologies over old ones. "Infrastructures…mediate and shape the nature of economic and cultural flows and the fabric of urban life. One powerful articulation of this mediation is the monumental presence of infrastructures themselves" (Ibid.: 6). Thus the understanding of infrastructures as merely a means for the execution of ideas is one of the obstacles to imagining them as more central to the work of the humanities. Often, the notion of infrastructure has been understood in terms of the institutional infrastructure in place, and not in terms of the smaller networks, tools or resources that build it, which are often located at the level of individuals. Ownership is a key aspect of the problem here, because such infrastructure is largely owned by the state or large corporate entities, and is not something within the ambit of small and private institutions or even individuals, and this often dictates the manner of its use. Indeed in the case of DH, there are certain kinds of technologies and resources that cannot be replicated easily at all; as such, it is something that needs investment from the state and large knowledge institutions such as the university. Another problem, as rightly identified by Svensson, is that research infrastructures have been imagined primarily in terms of the needs of the natural sciences, as a result of which resources, tools and materials for the humanities often end up being inadequate in terms of financial and intellectual investment. Thus there is a challenge not only in terms of the availability of infrastructure, but also with respect to the optimum utilization of what is available.

 

Some of the practitioners and scholars interviewed as part of this mapping have also repeatedly brought up a number of concerns about the infrastructure (or the lack of it) that they have had to use, modify and develop as part of their projects and research. Dr. Indira Chowdhury, historian and Founder-Director of the Centre for Public History (CPH) at the Srishti School of Art, Design and Technology, Bangalore, finds it rather ironic that a city like Bangalore, with so much infrastructure at its disposal, has so little thinking in the humanities. There are of course several reasons for this, she says, and in many places infrastructure development is restricted, as in Kashmir, where the use of the internet and mobile phones is strictly regulated due to security concerns. The key, of course, is to have more of a dialogue between places to ensure that they are not functioning in isolation. She also emphasizes that the problems exist at a more basic level, with transcription for example [1]. The advent of the digital has brought with it several new possibilities, but she also talks about the many misconceptions that seem to be prevalent with regard to the digital, particularly in terms of preservation and storage capacity. The question of format is of great importance and a determining factor in much of the research that mobilizes digital technologies. As part of her work on archiving oral histories, she has often had to emphasize that there are specific formats for a digital oral archive. As she says:

You should not switch to say MP3 just because it’s cheaper, more convenient and a lighter file. I often have people arguing that I just bought a recorder, it gives me a clear recording [in the MP3 format] etc. If you were to archive that file you would find that within a few years you begin to lose data on that file. The digital archive has also made people think a lot more about what they are preserving, in what format. These are things you then teach yourself, you do not archive in certain formats, or rely on an archive of MP3 files, because every time you copy them onto something it would have lost a little bit of its description. So these are things that make the historian more oriented, you think a lot more about what you are doing.

She therefore warns against the presumption that a digital archive will completely resolve problems of space and preservation, as a change in format can easily render data inaccessible and essentially useless. The idea of ‘loss of data’ and lack of space is something easily missed, as there is a notion of the digital being an endless space, but that too comes at a cost. As Jonathan Sterne (2006) explains in his work on the MP3 as a cultural artifact, it is a format that works through compression and the elimination of excess sound, which eventually greatly affects the quality of the sound object itself. The notion of the digital rendering a certain quality of sound, and by implication generating a ‘better’ digital artifact itself, is therefore highly debatable.
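The difference between a format that discards data and one that does not can be illustrated with a toy sketch. This is not the MP3 codec, which relies on a far more sophisticated perceptual model; the coarse rounding step below is only an assumed stand-in for "elimination of excess" detail, and the function names and the synthetic tone are hypothetical, chosen for illustration. The point it shows is simply that a lossless round trip returns exactly what went in, while a lossy one cannot.

```python
import math
import struct
import zlib


def make_samples(n=1000):
    """Synthetic 16-bit 'audio': a 440 Hz sine tone sampled at 8 kHz."""
    return [round(20000 * math.sin(2 * math.pi * 440 * i / 8000)) for i in range(n)]


def lossless_round_trip(samples):
    """Compress and restore with zlib, a lossless general-purpose compressor."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    restored = zlib.decompress(zlib.compress(raw))
    return list(struct.unpack(f"<{len(samples)}h", restored))


def lossy_round_trip(samples, step=256):
    """Toy lossy step (an assumption, not MP3): keep only coarse levels."""
    return [round(s / step) * step for s in samples]


original = make_samples()
lossless = lossless_round_trip(original)
lossy = lossy_round_trip(original)

# Largest difference between an original sample and its lossy version.
max_error = max(abs(a - b) for a, b in zip(original, lossy))

print("lossless copy identical:", lossless == original)
print("lossy copy identical:", lossy == original)
print("largest sample error after lossy step:", max_error)
```

Once the fine detail has been rounded away, nothing in the lossy file records what it was, which is why the choice of archival format has to be made before recording rather than after.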

There are other considerations to bear in mind as well. As Padmini Ray Murray, another faculty member at the CPH, points out, the context of such work in the global south is very different, and the lack of good infrastructure is definitely one of the major problems. There are issues of bandwidth, surveillance, regulation of internet access, and now network neutrality, all of which have implications for possible digital humanities work and specifically for work on digital archives. A significant challenge she sees is that we do not have mechanisms to translate between or from Indian languages. She says that:

It would be amazing to have an archive metadata tool that can work with different Indian languages which at the moment is an impossibility. This is where a place like Bangalore comes into the picture... We need to pull on resources that are being pioneered in places like the IITs, or institutions here working with natural language processing...technologies that we cannot in a humanities context create, but pull those in to use them for humanities research. But the questions that we are asking are necessarily quite different, from what we have in the West.

The problem with Indian languages brings out the problems that are specific to the global south, and therefore the infrastructure needs of humanities research work there. Padmini Ray Murray mentions Bichitra, the online variorum of the works of Rabindranath Tagore developed by the School of Cultural Texts and Records at Jadavpur University, as an effective illustration of the challenges faced by researchers working in languages other than English. She explains: “The very level of creating the code for Bichitra was different, because it had to be done from scratch. Finding a set of reliable Bangla characters is difficult because the ligatures get mixed up, so they created a character set from scratch to create Bichitra, and for Prabhed [the collation software] which works within it.” The lack of standardization for Indic language inputs is therefore an immediate practical concern for archival work in different languages in India [2].
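Why ligatures and variant encodings complicate collation can be seen in a small sketch using Python's standard `unicodedata` module. This is not how Bichitra or Prabhed work (the account above says their developers built a character set from scratch); it only illustrates, under present-day Unicode, that one visible Bangla conjunct is stored as several code points and that the same letter can be stored in more than one way. The helper names `describe` and `same_text` are hypothetical.

```python
import unicodedata


def describe(text):
    """List the code points that make up a piece of text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# A conjunct that is drawn as one ligature but stored as three code points:
# KA + VIRAMA (hasanta) + SSA.
conjunct = "\u0995\u09CD\u09B7"

# Two ways of storing the letter YYA: one precomposed code point,
# or the letter YA followed by a NUKTA sign.
precomposed = "\u09DF"
decomposed = "\u09AF\u09BC"


def same_text(a, b):
    """Compare two strings after Unicode normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for line in describe(conjunct):
    print(line)

print("raw comparison:", precomposed == decomposed)
print("normalized comparison:", same_text(precomposed, decomposed))
```

A collation tool that compares raw code points would treat the two spellings of the same letter as a textual variant, so some agreed normalization step has to come before any comparison of witnesses.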

Indiancine.ma [3], an online archive of Indian film, has similarly been experimenting with different ways of reading and annotating film text, with a focus right now on films that are out of copyright. It uses an open-source platform for media archives named Pandor/a [4], which helps to organise and manage large, decentralized collections of video, to collaboratively create metadata and time-based annotations, and to serve the archive as a desktop-class web application. The editing tool enables a user to pause, cut and annotate a particular scene or sequence in the film according to a time code, thus creating enormous new possibilities in terms of how we engage with the film text at several levels. The different ways of organising content through different filters also help to map content in unique ways and read it. According to Jan Gerber and Sebastian Lutgert, who are part of the team that developed the archive and its predecessor Pad.ma [5], Indiancine.ma is a work in progress, and it will always be, so as to allow new opportunities to present themselves with every change in the software and tools being used. They are particular about the archive being open to a variety of users and uses – that is, it is not only a tool or space of publication for humanities researchers, but is also a software project, a resource for a film fan club, and many other things, as it is open to interpretation. It is meant for people to build together and have conversations across domains and disciplines. In their work with people from both the humanities and sciences, they do see a gap between domains, and reiterate that it is very difficult for people to have a conversation across their disciplinary moorings. Infrastructure development has also become divided along these lines, and suffers from a kind of tunnel vision which often prevents it from being developed in response to the needs of the communities it is meant to address.

As Sebastian recollects the experience of creating Pad.ma, a similar online video archive using the same platform, Pandor/a, he speaks of collaborating with people from a non-technology background at the artists' collective CAMP in Mumbai [6], and how the lack of a hierarchy between technologists and non-technologists only contributed to making these projects better. A lot of the early software projects in India suffered due to this distance between people from technology and non-technology backgrounds, and the lack of a common language for them to communicate. Both Sebastian and Jan themselves come with training and experience in diverse areas, ranging from philosophy and visual arts to software development, and believe that their contribution to this archive is more conceptual than technological. They also see the Free and Open Source Software (FOSS) culture, a rather incipient movement in India when they had just begun work on these projects, as one that can foster more conversations and collaborative work in technology and research in India. When they started out, of course, it was very difficult to convince people to use free and open source software, or even to get filmmakers to release their footage for an open access platform like Pad.ma. CAMP was one of the few spaces then that had this open source culture, and it encouraged people to collaborate extensively across areas of expertise. As Sebastian says: “You deal with a relatively complex informatics system, but you are fully aware that you can modify and change things, and deal with them in a transparent way, which is great.” Both claim that nobody owns Pad.ma or Indiancine.ma, but everybody looks after it in a way, because they all use it differently depending on their interests, and this nurtures and builds the platform in different ways. The availability of this somewhat outside or alternate space for collaboration, and working within the open source context, has been instrumental in the growth of these two online open access archives.

The computational aspects of Pad.ma and Indiancine.ma, and to some extent even Bichitra, may be something to look forward to for researchers interested in exploring the possibilities of such research with these platforms. Given that both are essentially large corpora of material, introducing new algorithmic tools to work with them is not a distant possibility, and this has also been at the core of a lot of DH work in the Anglo-American context. Jan and Sebastian have tried this already with one of their earlier projects, 0xdb [7], which is another online archive of cinema, by running a color recognition algorithm on it. There are also instances of face detection and speech recognition software that could be run on this platform, with interesting results. The existing filters on Indiancine.ma also make it possible to search for images or sequences based on colour and object recognition. For instance, an interesting experiment is to search for ‘telephone’ in the archive, which pulls up images containing telephones from across the entire corpus, outlining an interesting trajectory of the use of the instrument. While such tools are helpful in terms of querying and searching over a large corpus, they also emphasize the need to be able to make sense of the results in a meaningful way. As Jan says: “Most of this software is developed really as a means of control, in the area of surveillance etc., and not for exploring; it is more of a content identifying tool rather than to discover things. Clustering or referencing credits are other possibilities, but it's more statistical analysis of the footage; are they really adding anything qualitative to cinema studies is still an open question”. Given this disjuncture between what these tools are developed for and how they are finally used, a point of concern is whether the research questions are also driven by the possibilities and limitations of the software itself. While that remains a broader question, Sebastian feels that more than a piece of software, this is a new digital ecosystem in itself, and that using these platforms in different ways, even beyond what they were imagined for, will drive the technology in new directions. The limitation of computational tools, as he sees it now, is really speed; given the expense involved, it may not be feasible to implement them and expect results anytime soon.
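The idea of annotating a film by time code and then querying across a corpus can be pictured with a minimal data model. The sketch below is an assumption made for illustration and not the actual data model or API of Pandor/a; the `Annotation` class, the two helper functions and the placeholder film titles are all hypothetical. It also searches text annotations only, which is a simpler technique than the image-based object recognition described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A note attached to a span of a film, addressed by time code in seconds."""
    film: str
    start: float
    end: float
    text: str


def search(annotations, keyword):
    """Return annotations mentioning the keyword, ordered by film and time."""
    needle = keyword.lower()
    hits = [a for a in annotations if needle in a.text.lower()]
    return sorted(hits, key=lambda a: (a.film, a.start))


def at_time(annotations, film, t):
    """Return annotations of one film that cover the moment t."""
    return [a for a in annotations if a.film == film and a.start <= t < a.end]


# Placeholder records, invented purely for this example.
SAMPLE = [
    Annotation("Film A", 62.0, 75.5, "Close-up of a ringing telephone"),
    Annotation("Film B", 10.0, 18.0, "Street scene, tram passing"),
    Annotation("Film A", 300.0, 312.0, "Telephone call interrupted"),
]

for hit in search(SAMPLE, "telephone"):
    print(f"{hit.film} {hit.start:>6.1f}s to {hit.end:>6.1f}s  {hit.text}")
```

Even in this reduced form, the query returns a list of moments rather than an interpretation, which is the gap between identifying content and discovering something about it that Jan points to.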

The platforms discussed above demonstrate a certain ability to read texts both closely and from a distance through the use of algorithmic tools, thus demonstrating the possibilities of analysis afforded by the infrastructure they have been built with. More importantly, they also highlight the limits of such tools and resources due to several challenges posed by the material itself. In the case of Bichitra, the problem of developing code for Bengali characters has posed a number of technological challenges, and points towards one among many problems in archiving materials in Indian languages. Indiancine.ma and Pad.ma are more symptomatic of the context in which new technologies can develop today, given support and space for collaboration and conversations across domains of expertise. The problem of format and technological obsolescence brought up by scholars at CPH is an important one; while colluding with proprietary software is inevitable in some cases, as suggested by the practitioners and researchers behind these platforms, keeping back-ups of material and being able to migrate out of a digital platform at any given point is also essential. Such flexibility of material, and interoperability across domains, formats and socio-cultural contexts, including language, is something that researchers in DH, or for that matter in any field that actively engages with the internet and digital technologies, would look for in the infrastructure that they build for research, scholarship and pedagogy. Infrastructure remains a critical aspect of knowledge production and dissemination, and it is imperative, now more than ever, that it is addressed at the conceptual level of any research intervention involving digital technologies and knowledge production.

 

Notes

[1] See section on Archives for a more detailed discussion on this issue: http://cis-india.org/raw/living-in-the-archival-moment.

[2] See the section on Reading from a Distance – Data as Text for more on this: http://cis-india.org/raw/reading-from-a-distance-data-as-text.

[3] See: http://indiancine.ma/

[4] See: https://pan.do/ra

[5] See: http://pad.ma/

[6] See: http://studio.camp/

[7] See: https://0xdb.org/

 

References

Berry, D.M. "The Computational Turn", Culture Machine. Vol 12, 2011. http://www.culturemachine.net/index.php/cm/article/viewArticle/440.

Drucker, Johanna. "Humanistic Theory and Digital Scholarship". In Debates in the Digital Humanities. Minneapolis: University of Minnesota Press, 2012, http://dhdebates.gc.cuny.edu/debates/text/34.

Gold, Matthew K. and Jim Groom. "Looking for Whitman: A Grand, Aggregated Experiment". In Debates in the Digital Humanities. Minneapolis: University of Minnesota Press, 2012, http://dhdebates.gc.cuny.edu/debates/text/5.

Larkin, Brian. "Introduction". In Signal and Noise: Media, Infrastructure and Urban Culture in Nigeria. London: Duke University Press, 2008

Sterne, Jonathan. "The MP3 as Cultural Artifact", New Media and Society. Vol. 8(5): 825–842, 2006.

Svensson, Patrik. "From Optical Fibre to Conceptual Cyberinfrastructure". In Digital Humanities Quarterly, Vol. 5, No. 1, 2011. http://www.digitalhumanities.org/dhq/vol/5/1/000090/000090.html.

 

Author

Puthiya Purayil Sneha

Sneha is a Programme Manager at CIS, and co-leads the researchers@work programme. She is engaged in a mapping of the emergent field of Digital Humanities in India, and is also interested in questions on the nature of textuality, reading, and writing practices in the digital sphere. She can be reached at sneha[at]cis-india[dot]org.