Some methodological musings – wait – come back!

It’s been rather quiet from me on the blog front, so I thought I would pull my head out of the data for a few minutes and write a post. It’s a bit early to produce anything meaningful in terms of results, but I know that Graham was keen that I share some of my methodological stuff with all of you, so you can blame him for what follows!

Now, data can be terrifying. In my previous job in the museums sector I encountered many a hardened professional reduced to quivering jelly when faced with their audience figures. One of the big problems is that, as humans, we usually communicate information by creating narratives, and we’re more used to seeing those in a Word document than an Excel spreadsheet. But fear not! The stories are there. You just need to find them.

Of course, nothing is that simple. Perhaps you, like me, were instructed by your mother at a tender age that it is not alright to ‘tell stories’, used as a gentle euphemism for barefaced lies. Yes, stories are easy to follow, but sometimes that’s because they represent an over-simplified version of more complex truths. So with this project, as with any data analysis, we need to find the right stories to explain what’s going on in the data.

I’ll be going into this in more detail over the coming weeks, I hope, as the analysis progresses. What I want to do in this post is to explore some of the early things that you can do (and that we have done) to make sure you tell a compelling, but also truthful, story with your data.

1. Make sure you have some questions, and that they’re based on a valid theory. While there’s some merit in just taking a great big dataset and then looking at it to find interesting relationships, this is quite a risky strategy for social research. It’s known as ‘data mining’ and it can turn up all sorts of strange results. One famous example uses international data to show that possessing a TV is associated with living longer. There’s no discernible medical reason that this should be the case, of course. What’s actually going on here is that there’s a third, ‘masking’ variable involved (statisticians usually call this a confounding variable): being rich. If you are (internationally-speaking) rich, you are both more likely to have a TV, and more likely to live longer. But you don’t get that from the data; all it tells you is that TV ownership goes alongside a long life. You need the background theory to understand that the masking variable exists, and to do analysis that incorporates it. This means that the story you tell about the data will not just be true in terms of reporting on relationships that exist, but also in terms of explaining them. (We’ll come back to take a proper look at causality another time.)

In this project, we’ve worked with a wide range of library staff, and other support teams at Huddersfield, to get their take on the interesting questions that we should ask based on their reading and professional experience. Don’t underestimate the power of your own knowledge in formulating your research questions!
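For anyone who likes to see this sort of thing in action, here is a minimal sketch of the TV example in Python. To be clear, the data are simulated and every number in it is invented for illustration (the proportions of rich people, TV owners and the lifespans are all assumptions, not real international figures). Wealth is set up to drive both TV ownership and lifespan, while TV ownership itself has no effect at all.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical, simulated people: wealth drives both TV ownership and lifespan.
people = []
for _ in range(10_000):
    rich = random.random() < 0.5
    has_tv = random.random() < (0.9 if rich else 0.2)
    # Lifespan depends on wealth only; TV ownership plays no part.
    lifespan = random.gauss(80 if rich else 60, 5)
    people.append((rich, has_tv, lifespan))


def tv_gap(rows):
    """Mean lifespan of TV owners minus mean lifespan of non-owners."""
    with_tv = [life for _, tv, life in rows if tv]
    without_tv = [life for _, tv, life in rows if not tv]
    return mean(with_tv) - mean(without_tv)


# Ignoring wealth: TV owners appear to live much longer.
naive_gap = tv_gap(people)

# Comparing like with like: the gap shrinks to roughly nothing.
rich_gap = tv_gap([p for p in people if p[0]])
poor_gap = tv_gap([p for p in people if not p[0]])

print(f"Gap ignoring wealth:   {naive_gap:.1f} years")
print(f"Gap among the rich:    {rich_gap:.1f} years")
print(f"Gap among the poor:    {poor_gap:.1f} years")
```

Looking at everyone together, TV owners seem to live many years longer. Split the same people by wealth and the apparent effect of the TV all but disappears, which is exactly what you would expect if wealth is the variable doing the work. Of course, you only know to split by wealth if your theory tells you to look for it.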

2. Understand your research data. It sounds really obvious but this is so important. Make sure you know exactly how the data were collected, as this will have an enormous impact on what they tell you. There’s a great example of leading questioning in Yes, Prime Minister: Bernard the civil servant is shown to be the perfect balanced sample by being simultaneously for and against conscription, depending on how the question’s asked. If there are any big gaps in the data, make sure you understand why they are there, and whether they’re going to affect your analysis in any way.

You might think that leading questions aren’t so important when looking at library usage – and you’d be right – but the method of collection can still affect the findings. We’ve looked at usage of various online resources by the hour, and Dave explained very thoroughly exactly what he meant by that (that the student had used the resource at some point within that hour). If he hadn’t, I might’ve plodded along thinking that the student had used the resource for the full hour, or for the majority of an hour, which would change the findings significantly. We’ve also got some gaps in our UCAS data, which we think might be related to the student’s school, so we need to take that systematic variation into account in the analysis.
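As a rough illustration of how you might check whether gaps are systematic, here is a small Python sketch. It is not the analysis we actually ran: the records, school names and UCAS points below are all made up, and the function name is just something I have invented for the example. The idea is simply to compare the share of missing values across groups.

```python
from collections import defaultdict

# Hypothetical records: (student_id, school, ucas_points); None marks a gap.
records = [
    ("s1", "School A", 320),
    ("s2", "School A", 280),
    ("s3", "School A", None),
    ("s4", "School B", None),
    ("s5", "School B", None),
    ("s6", "School B", 300),
    ("s7", "School C", 360),
    ("s8", "School C", 340),
]


def missing_rate_by_group(rows):
    """Return {school: share of records with no UCAS points}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for _, school, points in rows:
        totals[school] += 1
        if points is None:
            missing[school] += 1
    return {school: missing[school] / totals[school] for school in totals}


rates = missing_rate_by_group(records)
for school, rate in sorted(rates.items()):
    print(f"{school}: {rate:.0%} missing")
```

If the missing rate is about the same everywhere, the gaps are less likely to bias your findings. If one group is missing far more than the others, as in this invented example, then simply dropping the incomplete records would quietly under-represent that group, and the analysis needs to account for it.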

I think that’s probably enough from me for now, but I will look forward to coming back at a later stage and regaling you all with more methodological excitement!