Session 5 Reflections: Harnessing the Power of Big Data and the Mundanities of Archival Research

Week 5 was a mini-showcase of the exciting research currently being done by lecturers at Birkbeck. Our first speaker, Senior Lecturer Dr Dell Zhang, opened his talk, Harnessing the Power of Data, with the quote ‘In God We Trust, All Others Bring Data’, attributed to Dr W. Edwards Deming. Over the next hour, Dell showed everyone present how our daily interactions on the web generate collections of data that can affect us in a variety of ways. Closer to home, Dell discussed how research in Computer Science is changing because of the way data can now be collected. Where researchers used to start with a theory or model and then conduct the necessary fieldwork to collect data to support or disprove their ideas, there is now a trend for Computer Scientists to start their research with large data sets and, by following the data, to extrapolate ideas from them. In this era of big data, Dell is convinced that more is different: with enough data, numbers can speak for themselves. He highlighted how social scientists used to analyse small-scale networks to examine social interaction, whereas with Twitter and Facebook social connections can easily be observed because the interactions are recorded. Hence social scientists can arrive at models that would not have emerged from small data sets.

Dell also highlighted how big data can help solve real-world problems, such as the creation of effective spam filters. He pointed out that while spam on the web has been increasing, users of Google’s mail programme Gmail report a spam rate of less than 1%. He argued that this is because Gmail has access to a large amount of data through the e-mails its users receive. The sheer number of e-mails produces a data set large enough to build a spam filter that works: the programme is able to calculate the probability that particular key words indicate spam and filter out the messages that contain them. As long as there is more data, performance will improve, and since the very act of engaging on the web generates data, the improvement, as Dell pointed out, is limitless.
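To make the idea of ‘calculating the probabilities of key words’ a little more concrete, here is a minimal sketch of a naive Bayes-style filter in Python. This is only an illustration of the general technique, not the method Gmail actually uses and not something shown in Dell’s talk; the function names and the four toy messages are invented for the example.

```python
import math
from collections import Counter

# Hypothetical toy training data (invented for illustration only).
TOY_MESSAGES = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(messages):
    """Count how often each word appears in spam and in non-spam ('ham')."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def spam_probability(text, word_counts, label_counts):
    """Estimate P(spam | words), treating words as independent (naive Bayes)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the prior: the share of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Convert the two log scores back into a probability for 'spam'.
    highest = max(log_scores.values())
    weights = {k: math.exp(v - highest) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


word_counts, label_counts = train(TOY_MESSAGES)
print(spam_probability("free prize", word_counts, label_counts))
print(spam_probability("monday meeting", word_counts, label_counts))
```

With only four messages the estimates are crude, which is rather the point Dell was making: the more e-mails the filter has seen, the better its word probabilities become.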

After Dell’s talk of technology and data, it felt like a step back in time with our second speaker, Dr Jose Bellido, a Lecturer in Law, and his talk ‘Mundane Research Issues - Notes on Legal Archives and Copyright’. In his field, a researcher usually has to visit an archive in person to find the information needed. For Jose, data is not ‘placeless’ but linked to a specific time and place, and often not yet digitised. Instead of search engines, researchers get indexes or cards. Jose’s research might sound almost archaic, but there were clearly some benefits to the lack of ‘technology’, so to speak. He pointed out that while technology gives researchers immediate results, archival research is akin to fishing: you never know what you might get! Misspellings on index cards might lead you onto a different path, and hints dropped by the person in charge of the archive can often open up a new avenue of research. There is an element of chance or luck in this type of research which might not be so readily present if one worked only with large data sets.

As Jose points out, he is more interested in how data emerges.

What was perhaps most striking about Jose’s talk was the point that decisions have to be made before data can be collected, and that if one assumes a certain type of information is not valuable, there is potential for that information to be lost. Hence, archival research is still important today, because what is not recorded can be just as important as what is. Discussions on how a law is enacted might be more important than the enactment of the law itself! Personal archives which contain memorabilia of everyday life can often shed new light on persons or events!

This is not to say that Jose’s area of research shuns technology altogether. His research on copyright is available online, and he pointed out how the UK Supreme Court now allows for the recording of up to 20 hours of the sitting of certain cases, which can be shown for teaching purposes. On a personal level, being able to take digital images of records, so that one can read them at leisure at home, was one of the most direct ways in which technology had affected Jose’s research. Jose’s talk was particularly useful for students who were worried about reproducing images of archives or material from these archives, and here Jose was a wealth of information. Drawing on his experience of doing research in Cuba and Argentina, he was able to offer valuable advice on the standard problems researchers in this field have to consider.

While it might seem that Jose’s talk would have little connection with Dell’s, it quickly became clear that these two methods of conducting research, while seemingly eons apart, shared a remarkable similarity! Dell had remarked earlier in his talk that data in itself is meaningless, a lot of noise, and that what is needed is context to make sense of it. What technology can do today is create algorithms based on key words that filter out the ‘noise’ so that sense can be made of the data. Jose said this has a parallel in archival research, where researchers also have to work out which key words in an archive lead to the relevant information and which words to avoid because they lead to dead ends. What both methods have in common is a need for knowledge in order to make the right choices.

If one of the end goals of conducting research is to add to existing knowledge, it was extremely useful to reflect upon how Dell and Jose go about using different technologies to produce this knowledge, whether drawn from collections of data or from archives.

By Lorraine Lim


