Can data ever know who we really are?
Please read the full essay on Deep Dives: Can data ever know who we really are?
Zara Rahman, The Engine Room
If I didn’t define myself for myself, I would be crunched into other people’s fantasies for me and eaten alive.
– Audre Lorde
The proliferation of digital data and the technologies that allow us to gather that data can be used in another way too — to allow us to define for ourselves who we are, and what we are.
Amidst a growing political climate of fear, mistrust and competition for resources, activists and advocates working in areas that are stigmatised within their societies often need data to ‘prove’ that what they are working on matters. One way of doing this is by gathering data through crowdsourcing. Crowdsourced data isn’t ‘representative’, as statisticians say, but gathering data through unofficial means can be a valuable asset for advocates. For example, data collating the experiences of women who have reported incidents of sexual violence to the police in India can then be used to advocate for better police responses, and to inform women of their rights. Deservedly or not, quantifiable data takes precedence over personal histories and lived experience in getting the much-desired currency of attention.
And used right, quantifiable data, whether it’s crowdsourced or not, can also be a powerful tool for advocates. Now, we can use quantifiable data to prove beyond a shadow of a doubt that disabled people, queer people, and people from lower castes face intersecting discrimination, prejudice, and systemic injustices in their lives. It’s an unnecessary repetition in a way, because anybody from those communities could have told reams upon reams of stories about discrimination, all without any need for counting.
Regardless, to play within this increasingly digitised system, we need to repeat what we’ve been saying in a new, digitally-legible way. And to do that, we need to collect data from people who have often only ever been de-humanised as data subjects.
Artist and educator Mimi Onuoha writes about the challenges that arise while collecting such data, from acknowledging the humans behind that collection to understanding that missing data points might tell just as much of a story as the data that has been collected. She outlines how digital data means that we have to (intentionally or not) make certain choices about what we value. And the collection of this data means making human choices solid, and often (though not always) making these choices illegible to others.
We speak of black boxes when it comes to the mystery choices that algorithms make, but the same could be said of the many human decisions that are made in categorising data too, whether that be choosing to limit the gender drop-down field to just ‘male/female’ as with Fitbits, or a variety of apps incorrectly assuming that all people who menstruate also want to know about their ‘fertile window’. In large systems with many humans and machines at work, we have no way of interrogating why a category was merged or not, of understanding why certain anomalies were ignored rather than incorporated, or of questioning why certain assumptions were made.
The only thing we can do is to acknowledge these limitations, and try to use those very systems to our advantage, building our own alternatives or workarounds, collecting our own data, and using the data that is out there to tell the stories that matter to us.