The Digital Humanities from Father Busa to Edward Snowden

Posted by Puthiya Purayil Sneha on Sep 04, 2017, 10:35 AM
This essay by Dr. Domenico Fiormonte, lecturer in the Sociology of Communication and Culture in the Department of Political Sciences at University Roma Tre, was originally published in the Media Development journal.

 

Cross-posted from Media Development, Vol. LXIV 2/2017. Published on May 13, 2017.


What do Edward Snowden, the whistle-blower behind the NSA surveillance revelations, and Father Roberto Busa, an Italian Jesuit who worked for almost his entire life on Saint Thomas Aquinas, have in common? The simple answer would be: the computer. Things, however, are a bit more complex than that, and the reason for choosing these two people to explain what the Digital Humanities are is that in some sense they represent the origins and the present consequences of a certain way of thinking about computers.

Although it is true that computer science was born from the needs of calculation (i.e. computing), in other cultures and languages the usual term is “informatics”, or the science of information. The difference is not trivial, and in fact the encounter between the computer and words, or rather with language, can be considered a cultural watershed. Father Busa himself was one of the protagonists of this meeting, which came about in 1949 when he visited New York to ask Thomas J. Watson Sr, the president of IBM, for permission to use computers to study the vocabulary of Saint Thomas Aquinas (Jones, 2016). That endeavour is considered by many to have signalled the birth of computer-based “Natural Language Processing”, the inter-disciplinary field behind many of the digital tools we use in our everyday life, from T9 predictive text on our phones to voice recognition and synthesis.

But these tools, although fundamental, are not the most striking (or perhaps disturbing) results of this age of transformation. Through the gesture of entering words into a computer, Busa established the basis of a new concept of hermeneutics, one no longer based solely on subjective interpretation but also on the automatic processing of linguistic data, and hence in some sense “objective”. Busa’s undertaking founded the discipline of Humanities Computing (years later renamed the Digital Humanities), but above all it laid the groundwork for a profound epistemological and cultural transformation. And at the heart of this revolution was the “written document”, the text, understood as an alphanumeric sequence. In an effort to best explain this revolution, I will concentrate on one aspect, the representation of the document, and return to the hermeneutical aspects in the final part.

The epistemological revolution of the digital document

My own association with Digital Humanities (DH), as for many humanists of my generation, came from philology and textual criticism. My first foray into electronic textuality was in 1990, when it became clear that the confluence of informatics and the humanities would revive an inherent, almost arcane dualism: in the beginning was the data… But I was unprepared to tackle the conflict between information retrieval and interface, or between a textual paradigm based on the idea of information (text=data) and a vision of the textual document as a stratified historical-material reality, visualized not only as information, but also as an object (or series of objects), to be ultimately used and enjoyed. This dualism certainly did not only come about as a result of the encounter between informatics and text, but what we can say is that the process of digitization from this point on would “enhance” certain characteristics of the document at the expense of others.

The problem of the digital document in fact cannot be understood unless one first understands what digitization is and how it works: that is to say, the process of translation that those of us who undertake this work call “encoding” or, more generally, “representation”. The pioneers of informatica umanistica (humanities computing) in Italy (Tito Orlandi, Raul Mordenti, Giuseppe Gigliozzi, etc.) taught the students of my generation two key concepts: 1) the passage from the analogue to the digital implies a process that formalizes the object of research (from the single character to the more complex structures of the historical artefact); 2) each act of encoding, or rather each act of representation of a specific “object” via a formal language, involves a selection from a set of possibilities and is therefore an interpretative act (Orlandi, 2010).
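To make the second point concrete, here is a minimal sketch in Python of two possible digital representations of the same (entirely invented) manuscript line: one records only the character sequence, the other also records an editorial judgement, namely that “Dns” abbreviates “Dominus”. Neither is “the” text; each is a selection from a set of possibilities. A real project would use a standard such as TEI rather than ad-hoc data structures, but the logic of choice is the same.

```python
# Two digital representations of the same (invented) manuscript line.
# Representation A keeps only the character sequence.
line_as_string = "Dns vobiscum"

# Representation B also records an editorial interpretation: "Dns" is read
# as an abbreviation of "Dominus". Choosing to encode this is itself an act
# of interpretation, not a neutral transcription.
line_as_structure = {
    "tokens": [
        {"written": "Dns", "expansion": "Dominus", "type": "abbreviation"},
        {"written": "vobiscum", "expansion": None, "type": "word"},
    ]
}

# The two encodings answer different questions: A can be searched only for
# the letters actually on the page, while B can also be searched for the
# expanded word that the editor believes those letters stand for.
print("Dominus" in line_as_string)                                            # False
print(any(t["expansion"] == "Dominus" for t in line_as_structure["tokens"]))  # True
```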

The fundamental difference is that human languages and their writing systems have always been many and various, whereas formal computer languages are based on a codex universalis, an Esperanto derived for the most part from the English language. As George Steiner wrote in After Babel, “the meta-linguistic codes and algorithms of electronic communication are founded on a sub-text, on a linguistic ‘pre-history’, which is fundamentally Anglo-American” (Steiner, 1998: xvii). Digital “standards” always reflect a cultural bias, and the act of encoding is never neutral, but tends to assume (and overlap with) universalizing discourses that on the surface are hard to see.

An important example is ASCII, the American Standard Code for Information Interchange, a standard for character representation created in the 1960s. That technology is continued today by Unicode, an industrial standard which purports to represent the characters of all written languages. Besides the fact that it is directed by the usual mega-corporations (Google, Apple, IBM, Microsoft, etc.), Unicode is underpinned by an alphabet-centric logic that penalizes non-Western systems of writing. Given this weakness, it should come as no surprise that it has attracted criticism on several fronts, including charges of ethnocentrism (Perri, 2009; Pressman, 2014: 151) and of ignoring the difficulties that languages of low commercial value (and therefore at risk of extinction) face in being properly represented. To paraphrase Alexander Galloway, “technical is always [geo]political” (Galloway, 2004: 243).
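A minimal sketch in Python may help to make the scale of the problem visible (the characters and the normalization example are chosen purely for illustration). It shows, first, that ASCII’s 128 code points leave most of the world’s writing systems unrepresentable, and second, that even within Unicode the “same” visible character can be stored in more than one way, so that every act of digital representation already involves a choice.

```python
# A minimal illustration of ASCII's limits and of Unicode's multiple
# representations of the "same" character.
import unicodedata

ASCII_LIMIT = 128  # ASCII is a 7-bit code devised for English: 128 code points in total

# A Latin letter, an accented Latin letter, a Tamil letter, a Devanagari letter.
for ch in ["A", "\u00E9", "\u0BA4", "\u0905"]:
    code_point = ord(ch)
    status = "inside ASCII" if code_point < ASCII_LIMIT else "outside ASCII"
    print(f"{ch}  U+{code_point:04X}  {status}")

# Even within Unicode, "é" can be stored as one precomposed code point (NFC)
# or as "e" plus a combining accent (NFD): two different digital objects that
# look identical on screen. Choosing between them is already an act of encoding.
precomposed = unicodedata.normalize("NFC", "\u00E9")
decomposed = unicodedata.normalize("NFD", "\u00E9")
print(len(precomposed), len(decomposed))  # 1 vs 2 code points
print(precomposed == decomposed)          # False: equal to the eye, not to the machine
```

Complex scripts, with their conjuncts, combining marks and bidirectional text, multiply these choices, and it is precisely there that the alphabet-centric bias mentioned above becomes visible.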

Even if our lack of awareness as humanists might have deceived us into thinking that the translation from the analogue to the digital was a neutral and painless process, we would soon have realized that, as with any change of format, digital representation can change and influence both the life of the original object and its digital future. And we would have discovered the “multiple biases” inherent in the digitization process. So in one respect we have entered a post-Busa phase, where interpretation is not something you can have without defining both the object and the source of your knowledge.

Busa never showed much interest in theoretical questions or in the link between hermeneutics and epistemology (and even less between semiotics and politics), or between the interpretation of the object and the nature of its representation. Perhaps this was because the question “What do I want to represent, and how?” would have provoked a series of more disturbing questions: “What is knowledge? Who produces it, how, and for what purpose?” These questions probably would have threatened to paralyze his pragmatic approach. On the other hand, it cannot have been easy to ignore the problem, since many philosophers, starting with Plato when discussing the transition from orality to writing, kept asking questions about the formats and systems of knowledge representation (Stiegler, 2006).

As humanists we then began to understand that the problems information technology appeared able to resolve soon created new problems, which were not limited to a single discipline like philology or textual criticism. To ignore the epistemological (and also ethical or political) problems generated by the confluence of the humanities and information science was certainly possible: but at what price? The more pragmatic among us would have been content to use machines for what they could immediately offer: the tremendous possibilities and tools for the representation, archiving and automatic analysis of humanistic objects and artefacts. This approach seemed prevalent in the first historical phase of DH, reflected in canonical definitions like “the application of computational methods to humanities research and teaching” or “researching the Humanities through digital perspectives, researching digital technologies from the perspective of the Humanities” [1].

But what are the effects of these methods and technologies? The answer to this question coincides with the new phase that DH finds itself in at the moment, a phase that forces us to consider the costs of all of the above: the ethical, social, and political implications of the instruments, resources and infrastructure, and the cultural biases inherent in their conception and design.

The social and political implications of DH

Fr Busa’s “hermeneutic” approach has been the main focus of the past 20 years of DH, while the methodological and epistemological concerns have been pushed to one side. The reason for this is fairly simple. Since the overwhelming majority of the evidence on which the memory of people is based (particularly in the West) consists of written texts, the computer, a manipulator of alphanumeric symbols, has proved to be a powerful agent of their preservation and management. This need to work through the concept of the “text as data”, as mentioned above, has for the moment pushed aside the question of the interface, that is, the ways in which a text is used and read.

The materiality of written documents (especially holographs and manuscripts), given their incredible linguistic and cultural diversity, their visual and pragmatic dimensions, etc., does not marry all that well with the limited possibilities offered by information science – or at least does not fit with what has been produced by those who have guided its development thus far. Therefore, up until the early 2000s, the Digital Humanities focused especially on the design of tools and resources for the analysis and preservation of written documents. The spread of the Web from the mid-1990s, despite the first rumblings on the theme of user interface development (which Busa always considered a minor problem), ended up confirming this tendency.

There was in my view a precise moment when this concept of “text as data” reached a point of crisis and showed its dark side. As humanists we would probably have preferred to continue our work quietly as if nothing had changed, but at a certain point something monumental happened, an event which has changed our relation with the digital dimension of knowledge, and hence of research. And this moment was the 6th of June 2013, when the Washington Post and the Guardian began publishing the documents supplied by Edward Snowden about mass surveillance by the NSA. The immensity of this event was immediately clear: an internal document of the US National Security Agency and its British twin (GCHQ) showed that in one month alone over 181 million records had been collected, including metadata and content (text, audio and video) (Gellman and Soltani, 2013).

The news that in July 2016 half of Silicon Valley, from Amazon to Google, had been co-opted by the Pentagon (Collins, 2016), together with the dynamics of the last presidential election in the USA, confirmed that the Net has become the field on which the geopolitical balances of the planet are played out. And at the centre of this “new world” is the idea of the “universal archive” where all data (past, present, and future) are stored. It is here that both the hermeneutical and epistemological questions break down. In modern times, knowledge and interpretation depended on history, which we conceived as a linear process, i.e. based on space and time. But the dynamics of digital data seem to escape the logics of space and time, because the digital archive is ubiquitous and eternally present.

In my opinion, the heritage of Busa is reflected in the obsession of government agencies and high-tech multinationals with the control (collection) and analysis (interpretation) of data. Both have committed to the “hermeneutic” vision (although of the bare-bones variety), or rather to the analysis of huge amounts of our data as the basis of their interpretation of the world. Welcome to the fantastic world of Big Data...

The question is no longer what the document is or how it is represented (an epistemological question) or how it is to be interpreted (a hermeneutical question). Even if the best forces of DH have insisted on this point and on the necessity of proceeding in this order (because the interpretation of the object is inseparable from the circumstances of its representation), these “humanistic” scruples suddenly appear irrelevant. The actual question is in fact “who are we really?” Or rather, not us, but the creation through our digital footprint of an alter ego that the algorithms of Google or Facebook decree to be more “true” than the other (which we mistakenly believe still to exist). But who will be able to decipher or take apart these stories (data + algorithms) which we daily write and re-write? And does it still make sense to investigate the instruments of production and preservation of memories and knowledge when we no longer have any control over them?

Geoffrey Rockwell and I recently tried to analyze a commercial surveillance package, Palantir, from the point of view of DH (Rockwell and Fiormonte, 2017). Palantir scans and combines data from “documents, websites, social media and databases, turning that information into people, places, events, things, displaying those connections on your computer screen, and allowing you to probe and analyze the links between them” (Anyadike, 2016). But these kinds of software can also be seen as story-telling tools, because they allow someone to build stories about us and through us. So there seems to be a “literary” and rhetorical side to surveillance software, which the digital humanist seems particularly well-equipped to analyze. After all, the story of Big Data is also our story. There seems to be an “original sin” present in Big Data, i.e. the information retrieval paradigm that treats stories as data and data as a resource to be mined. And this approach is clearly reflected in Busa’s original idea of computational hermeneutics: digitize your texts, get your data, then build an interpretation upon them.
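To give a sense, at its very simplest, of what such “story-telling” software does, here is a toy sketch in Python. The documents, the names and the naive entity matching are invented purely for illustration, and real systems like the one discussed above are of course vastly more sophisticated; but the paradigm is the same: treat texts as data, extract “entities”, and link those that co-occur, so that a narrative about who is connected to whom can be read off the data.

```python
# A toy sketch of the "stories as data" paradigm: invented documents,
# a hand-made entity list, and co-occurrence as a stand-in for real analytics.
from collections import Counter
from itertools import combinations

documents = [
    "Alice met Bob in Rome to discuss the archive.",
    "Bob emailed Carol about the Rome meeting.",
    "Carol has never met Alice.",
]
known_entities = {"Alice", "Bob", "Carol", "Rome"}  # placeholder for real entity extraction

links = Counter()
for doc in documents:
    found = sorted({token.strip(".,") for token in doc.split()} & known_entities)
    for pair in combinations(found, 2):
        links[pair] += 1  # every co-occurrence strengthens a "link" in the story

for (a, b), weight in links.most_common():
    print(f"{a} <-> {b}: {weight}")
```

Even this trivial example produces a ranked story of connections that none of the individual documents states on its own; the interpretive work, in other words, has migrated from the texts to the data and the algorithm.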

A posteriori we can ask ourselves what happened on that distant morning in 1949 in the heads of Thomas J. Watson Sr. and Father Busa. Was the president of IBM conscious of where Father Busa’s vision would lead? And could the Jesuit father ever have expected that his intuition would change not only our means of reading and interpreting history, but also how we construct it? No one can ever know. But history reaffirms once again the great responsibility of science – in this case the responsibility of the “ignorant” humanities. If anyone believes that the humanities do not have a future, it is worth recalling how, 70 years ago, a meeting between Thomas Aquinas and the computer formed the basis of a revolution in digital communication. But from now on, the role and responsibility of the humanist will not only be to preserve and interpret the signs of the past, but to engage critically with, and where necessary unmask, the technological, political and social discourses that are shaping our knowledge, memories, and consciousness.

This article was translated by Desmond Schmidt.

Note

[1] Selected responses to the question “How do you define DH?” from Day of DH 2012. Accessed from http://archive.artsrn.ualberta.ca/Day-of-DH-2012/dh/index.html.

References

Anyadike, Obi (2016). Spies Sans Frontières? How CIA-linked Palantir is gaining ground in the aid industry (and why some humanitarians are worried). IRIN, March 7, 2016. Accessed from https://www.irinnews.org/investigations/2016/03/07/spies-sans-fronti%C3%A8res.

Collins, Terry (2016). Amazon CEO Jeff Bezos joins Pentagon innovation board. CNet, July 28, 2016. Accessed from https://www.cnet.com/news/jeff-bezos-amazon-blue-origin-pentagon-ash-carter-eric-schmidt-google/.

Galloway, Alexander R. (2004). Protocol: How Control Exists after Decentralization. Cambridge (MA), MIT Press.

Gellman, Barton and Soltani, Ashkan (2013). NSA infiltrates links to Yahoo, Google data centers worldwide, Snowden documents say. The Washington Post, October 30, 2013. Accessed from https://www.washingtonpost.com/world/national-security/nsa-infiltrates-links-to-yahoo-google-data-centers-worldwide-snowden-documents-say/2013/10/30/e51d661e-4166-11e3-8b74-d89d714ca4dd_story.html.

Jones, Steven E. (2016). Roberto Busa, S. J., and the Emergence of Humanities Computing. The Priest and the Punched Cards. London, Routledge.

Orlandi, Tito (2010). Informatica testuale. Teoria e prassi. Roma-Bari, Laterza.

Perri, Antonio (2009). Al di là della tecnologia, la scrittura. Il caso Unicode. Annali dell’Università degli Studi Suor Orsola Benincasa, Vol. II, pp. 725-748.

Pressman, Jessica (2014). Digital Modernism: Making It New in New Media. Oxford, Oxford University Press.

Rockwell, Geoffrey and Fiormonte, Domenico (2017). Palantir: Reading the Surveillance Thing. Critical Software Stories as a Way of the Digital Humanities. Paper presented at the AIUCD 2017 Conference, University of La Sapienza, Rome, January 26-28, 2017.

Steiner, George (1998). After Babel. Aspects of Language and Translation. Oxford, Oxford University Press.

Stiegler, Bernard (2006). Anamnesis and Hypomnesis. The Memories of Desire. In Armand, L. and Bradley, A. ed., Technicity. Prague, Litteraria Pragensia, pp. 15-41. Online version. Accessed from http://arsindustrialis.org/anamnesis-and-hypomnesis.

Author

Domenico Fiormonte (PhD University of Edinburgh) is currently a lecturer in the Sociology of Communication and Culture in the Department of Political Sciences at University Roma Tre. In 1996 he created one of the first online resources on textual variation (www.digitalvariants.org). He has edited and co-edited a number of collections of digital humanities texts, and has published books and articles on digital philology, new media writing, text encoding, and cultural criticism of DH. His latest publication is The Digital Humanist. A critical inquiry (Punctum 2015) with Teresa Numerico and Francesca Tomasi. His current research interests are moving towards the creation of new tools and methodologies for promoting interdisciplinary dialogue in the humanities (http://www.newhumanities.org).

 
