Analysing Wikipedia: An Introduction

Posted by Kiran Jonnalagadda at Mar 25, 2009 10:05 PM
Kiran Jonnalagadda and Hans Varghese Matthews introduce their project, aimed at producing tools that will allow anyone to analyse editing behaviour on Wikipedia. This is the first in a series of posts documenting their work.

Only a few years ago, the typical savvy internet user trying to understand a new concept would start with a Google search. Today, for a growing number of users, the first reference they consult is Wikipedia. It is also usually the last.

Wikipedia’s prominence has grown phenomenally over the last few years, which has made it important to anyone seeking acceptance for their version of the facts. Vandalism is commonplace. Most of it is quickly removed and cleaned up, but some slips through, and the tools for fighting such vandalism remain behind the curve.

We at the Centre for Internet and Society wondered whether there was a way to detect pack editing: a group of users editing together, across pages, to push a shared agenda. Tools for fighting one vandal at a time are steadily improving, but pack editing is harder to deal with. We don’t know yet whether we have a solution, but we thought it was worth trying.

I’m Kiran Jonnalagadda, your correspondent on this blog. I’m working with Hans Varghese Matthews, our resident statistician, who is attempting to build mathematical models of pack behaviour. I write the code that pulls the data Hans needs from Wikipedia’s edit history, and I will later implement his algorithms in a set of tools that anyone can use to analyse Wikipedia.

We started a month ago with some initial experiments that I’ll describe in subsequent posts. Do let us know what you’d like to see come out of this project.