Hypothesis musings.

Since the project began, I’ve been thinking about all the issues surrounding our hypothesis, and the kind of things we’ll need to consider as we go through our data collection and analysis.

For anyone who doesn’t know, the project hypothesis states that:

“There is a statistically significant correlation across a number of universities between library activity data and student attainment”

The first obvious thing here is that we realise there are other factors in attainment!  We do know that the library is only one piece in the jigsaw that makes a difference to the kind of grades students achieve.  However, we do feel we’ll find a correlation in there somewhere (ideally a positive one!).  Once I’d thought about it beyond a basic level of “let’s find out”, more and more extra considerations leapt to mind!

Do we need to look at module level or overall degree?  There are all kinds of things that can happen that are module specific, so students may not be required to produce work that would link into library resources, but still need to submit something for marking.  Some modules may be based purely on their own reflection or creativity.  Would those be significant enough to need noting in overall results?  Probably not, but some degrees may have more of these types of modules than others, so could be worth remembering. 

My next thought was how much library resource usage counts as supportive of attainment.  Depending on the course, students may only need a small amount of material to achieve high grades.  Students on health sciences/medicine courses at Huddersfield are asked to work extensively on evidence-based assignments, which means a lot of searching through university-subscribed electronic resources, whereas a student on a history course might prefer to find primary sources outside of our subscriptions.

On top of these, there are all kinds of confounding factors that may affect how we interpret our results:

  • What happens if a student transfers courses or universities, and we can’t identify that?
  • What if teaching facilities in some buildings are poor and have an impact on student learning/grades?
  • What if a university has facilities other than the library behind the library gates, which would skew footfall statistics?
  • How much usage of the library facilities is for socialising rather than studying?
  • Certain groups of students may have an impact on the data, such as distance learners and placement students, international students, or students with specific personal needs.  For example, some students may be more likely to use one particular kind of resource heavily out of necessity.  Will their numbers be large enough to skew the results?
  • Some student groups are paid to attend courses and may have more incentive to participate in information literacy related elements e.g. nurses, who have information literacy classes with lots of access to e-resources as a compulsory part of their studies.

A key thing emerging here is that heavy resource access doesn’t always mean quality use of materials, critical thinking, or good writing skills…  And even after all this we need to think about sample sizes – our samples are self-selected, and involve universities of varying sizes with various access routes to resources.  Will these differences between institutions be a factor as well?

All we can do for now is take note of these and remember them when we start getting data back, but in the meantime I set to thinking about how I’d revise the hypothesis if we could do it again, with what is admittedly a tiny percentage of these issues considered within it:

“There is a statistically significant correlation between library activity and student attainment at the point of final degree result”
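Testing a hypothesis like this boils down to computing a correlation coefficient between a usage measure and a grade measure, and checking whether it could plausibly have arisen by chance. A minimal sketch in pure Python, using entirely hypothetical numbers (the real data would come from the partner universities, and grade scales and usage measures would need agreeing first):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: library visits per student vs. final grade (%)
visits = [2, 5, 8, 12, 15, 20, 25, 30]
grades = [52, 55, 58, 60, 63, 68, 70, 74]

r = pearson_r(visits, grades)
n = len(visits)
# t-statistic for testing H0 "no correlation"; compare against a
# t-distribution with n - 2 degrees of freedom to get a p-value
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print(r, t)
```

In practice something like Spearman’s rank correlation may suit grade data better than Pearson’s r, since degree classifications are ordinal rather than truly continuous.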

So it considers library usage overall and degree result overall, and leaves a lot of other factors to think about while we work on our data!

3 thoughts on “Hypothesis musings.”

  1. Really interesting stuff. I guess the issue here is how many of these issues and factors you can realistically include within the project timescale, and how many you will just have to note but not chase any further. I imagine that the importance of these issues will increase or decrease depending on how you use the analysed data – if it is used for fairly broad resource provision then maybe some of these don’t have a massive importance, but if the data is used for more fine-grained decisions in the institution then they may start to increase in importance.

    I suppose the experimental nature of these projects does require certain assumptions to be made about the data and the context and it would be useful for projects to state the assumptions they have made. It might be useful to make this a section of the final blog post….

  2. I guess that many of the problems that you discuss are reduced or eliminated if you look at the course/module level. If a course has low marks for some reason independent of library usage, then library usage should still correlate within that course.

    The thing that worries me about the hypothesis is that there is no distinction between someone who enters the library 10 times in the first (or last) week of the course and someone who visits once a week for ten weeks – and similarly for the other types of data.

    I appreciate that that makes the analysis a far larger job, and may be moving more towards the work that Leeds Met is doing in the StarTrak:NG project.
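    The distinction raised above – identical totals, very different patterns – is easy to illustrate: any analysis keyed only to total counts cannot tell the two students apart. A minimal sketch with entirely hypothetical weekly visit counts:

```python
# Two hypothetical students with the same total visit count but very
# different patterns: all visits in week 1 vs. one visit per week.
weekly_visits_a = [10] + [0] * 9   # bursty
weekly_visits_b = [1] * 10         # steady

def usage_profile(weeks):
    """Summarise a weekly visit series: total visits and weeks active."""
    total = sum(weeks)
    active_weeks = sum(1 for w in weeks if w > 0)
    return total, active_weeks

print(usage_profile(weekly_visits_a))  # → (10, 1)
print(usage_profile(weekly_visits_b))  # → (10, 10)
```

    Even a second summary statistic as crude as "number of active weeks" would separate the two cases, though capturing it does depend on the partners being able to supply time-stamped rather than aggregate data.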

  3. Hi Tom

    One of the things we’re planning to do is to generate reports of averages and p-values of usage/grade for each partner at more granular levels (academic school, course, library location, etc.).

    Some of the partners were able to provide the usage data per year (rather than a grand total for 3 years of usage per student), so that might go part way towards the issue of when the usage took place.

    When I took a look at overall averages for Huddersfield’s data, it seemed to indicate that usage increases over time (i.e. Y2 usage is higher than Y1, and Y3 higher than Y2) and that the usage/final grade correlation appears (to a certain extent) in Y1.

    If the project manages to show that Huddersfield’s data is fairly representative of the HE sector, then there’s nothing to stop us doing a deeper look (outside of the actual project) at when the usage actually occurs.
