Category Archives: Wins and Fails

BONUS findings post! Correlations


Hello! You’ve changed the format of the blogs, haven’t you? Yes, we like to mix it up a bit. We thought this might be fun, and not at all derivative.

What are these bonus findings then? Well, it all stems from the fact that this time round we have been given students’ final grades as a percentage, rather than a class. Continuous rather than categorical data. This opens up a whole new world of possibilities in terms of identifying a relationship between usage and grades.

Wait, didn’t you already prove that in Phase 1? We certainly did.

So why are you doing it again? Well, to not-quite-quote a famous mountaineer – because we can. It’s important to be clear that we’re not trying to ‘prove’ or ‘disprove’ results from the previous phase. Those stand alone. We’re simply taking advantage of the possibilities offered by the new data.

And those possibilities are…? Remember Spearman’s correlation coefficient from the last post? Well, we can use that again. As you’ll remember from earlier posts, it’s best to keep continuous data continuous if you can. The first round of the project gave librarians with percentage grades – continuous data – a methodology which required them to convert said grades into classes – categorical data. So we’re outlining this technique for their benefit – it’ll save time AND it’s better!
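For anyone curious what the calculation looks like, here is a minimal sketch in Python. It is not the toolkit method itself, just an illustration under stated assumptions: the `loans` and `grades` figures are made up, and the helper names `average_ranks` and `spearman_rho` are ours. It ranks each variable (tied values share the mean of their ranks) and then takes the ordinary Pearson correlation of the ranks, which is one standard way of computing Spearman's coefficient.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)


# Made-up illustrative data: book loans in a year and final grade as a percentage.
loans = [0, 3, 3, 12, 25, 40]
grades = [48.0, 55.5, 52.0, 61.0, 68.5, 72.0]
rho = spearman_rho(loans, grades)
print(round(rho, 3))
```

The point of the sketch is that the percentage grades go in exactly as they are, with no conversion into degree classes first.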

But if you’ve only got the class-based data? Not a problem! Use the old technique, which is designed for class-based data. This is just about giving people options so that they can choose whatever fits their data best.

Right. Got it. So, what did you find? This might be where you have to take a bit of a back seat, my inquisitive friend.

In fact, we found absolutely nothing to surprise us. The findings echo everything we established in the first phase, and the additional work we’ve done with extra variables in this phase. Figure 1 shows the effect sizes and significance levels for each variable.

As usual, I’ve only reported the statistically significant ones, and they are exactly the same as the ones that were statistically significant in our previous tests. You can see that, again, we’ve found a slight negative correlation between the percentage of e-resource use which happens overnight and the final grade. Once again, I’m inclined to dismiss this as a funny fluke within the data, rather than an indication that overnight usage will harm your grade.

So nothing new to report? Not really. Just a new method (outlined in the toolkit) for those librarians who want to take advantage of their continuous datasets.

A spot of cross fertilization!

We’ve spent an interesting week talking to other JISC projects.

We’ll be working very closely with the Copac Activity Data Project over the next 6 months and had a good meeting with them on Tuesday.

CopacAD will conduct primary research to investigate the following additional use cases:

  • an undergraduate from a teaching and learning institution searching for course related materials
  • academics/teachers using the recommender to support the development of course reading lists
  • librarians using the recommendations to support academics/lecturers and collections development

Huddersfield will be one of the libraries providing data and we’ll also be participating in the focus groups.

We are both undertaking work packages around business case feasibility studies and hope to pool our activity by sending out a joint questionnaire later in the project. We will also be participating in a RLUK/SCONUL workshop in April.

Yesterday the project happened upon another JISC project at Huddersfield, the JISC EBEAM Project.

EBEAM will evaluate the impact of e-assessment and feedback on student satisfaction, retention, progression and attainment as well as on institutional efficiency.

EBEAM is looking at GradeMark and we think there is a real opportunity to link this into the LIDP project in the future.

Watch this space for more developments with both projects over the coming months.

The Final Blog Post

It has been a short but extremely productive 6 months for the Library Impact Data Project Team. Before we report on what we have done and look to the future, we have to say a huge thank you to our partners. We thought we would be taking a lot on at the start of the project in getting eight universities to partner in a six month project; however, it has all gone extremely smoothly and as always everyone has put in far more effort and work than originally agreed. So thanks go to all the partners, in particular:

Phil Adams, Leo Appleton, Iain Baird, Polly Dawes, Regina Ferguson, Pia Krogh, Marie Letzgus, Dominic Marsh, Habby Matharoo, Kate Newell, Sarah Robbins, Paul Stainthorp

Also to Dave Pattern and Bryony Ramsden at Huddersfield.

So did we do what we said we would do?

Is there a statistically significant correlation across a number of universities between library activity data and student attainment?

The answer is YES!

There is a statistically significant relationship between both book loans and e-resources use and student attainment. And this is true across all of the universities in the study that provided data in these areas. In some cases this was more significant than in others, but our statistical testing shows that you can believe what you see when you look at our graphs and charts!

Where we didn’t find statistical significance was in entries to the library. Although it looks like there is a difference between students with a 1st and a 3rd, there is no overall significance. This is not surprising, as many of us have group study facilities, lecture theatres, cafes and student services in the library. A student is therefore just as likely to be entering the library for those reasons as for studying purposes.

We want to stress here again that we realise THIS IS NOT A CAUSAL RELATIONSHIP!  Other factors make a difference to student achievement, and there are always exceptions to the rule, but we have been able to link use of library resources to academic achievement.

So what is our output?

Firstly we have provided all the partners in the project with short library director reports and are in the process of sending out longer in-depth reports. Regrettably, due to the nature of the content of these reports, we cannot share this data; however, we are in the process of anonymising partners’ graphs in order to release charts of averaged results for general consumption.

Furthermore we are also planning to release the raw data from each partner for others to examine. Data will be released on an Open Data licence at

Finally, we have been astonished by how much interest there has been in our project. To date we have two articles ready for publication imminently and have another 2 in the pipeline. In addition by the end of October we will have delivered 11 conference papers on the project. All articles and conference presentations are accessible at:

Next steps

Although this project has had a finite goal in proving or disproving the hypothesis, we would now like to go back to the original project which provided the inspiration. This was to seek to engage low/non users of library resources and to raise student achievement by increasing the use of library resources.
This has certainly been a popular theme in questions at the SCONUL and LIBER conferences, so we feel there is a lot of interest in this in the library community. Some of these ideas have also been discussed at the recent Business Librarians Association Conference

There are a number of ways of doing this, some based on business intelligence and others based on targeting staffing resources. However, we firmly believe that although there is a business intelligence string to what we would like to take forward, the real benefits will be achieved by actively engaging with the students to improve their experience. We think this could be covered in a number of ways.

  • Gender and socio-economic background? This came out in questions from library directors at SCONUL and LIBER. We need to re-visit the data to see whether there are any effects of gender, nationality (UK, other European and international could certainly be investigated) and socio-economic background in use and attainment.
  • We need to look into what types of data are needed by library directors, e.g. for the scenario ‘if budget cuts result in fewer resources, does attainment fall?’ The Balanced Scorecard approach could be used for this?
  • We are keen to see if we add value as a library through better use of resources and we have thought of a number of possible scenarios which we would like to investigate further:
    • Does a student who comes in with high grades leave with high grades? If so why? What do they use that makes them so successful?
    • What if a student comes in with lower grades but achieves a higher grade on graduation after using library resources? What did they do to show this improvement?
    • Quite often students who look to be heading for a 2nd drop to a 3rd in the final part of their course. Why is this so?
    • What about high achievers that don’t use our resources? What are they doing in order to be successful and should we be adopting what they do in our resources/literacy skills sessions?
  • We have not investigated VLE use, and it would be interesting to see if this had an effect
  • We have set up meetings with the University of Wollongong (Australia) and Mary Ellen Davis (executive director of ACRL) to discuss the project further. In addition we have had interest from the Netherlands and Denmark for future work surrounding the improvement of student attainment through increased use of resources

In respect to targeting non/low users we would like to achieve the following:

  • Find out what students on selected ‘non/low use’ courses think to understand why students do not engage
  • To check the amount and type of contact subject teams have had with the specific courses to compare library hours to attainment (poor attainment does not reflect negatively on the library support!)
  • Use data already available to see if there is correlation across all years of the courses. We have some interesting data on course year, some courses have no correlation in year one with final grade, but others do. By delving deeper into this we could target our staffing resources more effectively to help students at the point of demand.
    • To target staffing resources
  • Begin profiling by looking at reading lists
    • To target resource allocation
    • Does use of resources + wider reading lead to better attainment – indeed, is this what high achievers actually do?
  • To flesh out themes from the focus groups to identify areas for improvement
    • To target promotion
    • Tutor awareness
    • Inductions etc.
  • Look for a connection between selected courses and internal survey results/NSS results
  • Create a baseline questionnaire or exercise for new students to establish level of info literacy skills
    • Net Generation students tend to overestimate their own skills and then demonstrate poor critical analysis once they get onto resources.
    • Use to inform use of web 2.0 technologies on different cohorts, e.g. health vs. computing
  • Set up new longitudinal focus groups or re-interview groups from last year to check progress of project
  • Use data collected to make informed decisions on stock relocation and use of space
  • Refine data collected and impact of targeted help
  • Use this information to create a toolkit which will offer best practice to a given profile
    • E.g. scenario based

Ultimately our goal will be to help increase student engagement with the library and its resources, which, as we can now show, is linked to better attainment. This work would also have an impact on library resources, by helping to target our precious staff resources in the right place at the right time and to make sure that we are spending limited funds on the resources most needed to help improve student attainment.

How can others benefit?

There has been a lot of interest from other universities throughout the project. Some universities may want to take our research as proof in itself and just look at their own data; we have provided instructions on how to do this at

We will also make available the recipes written with the Synthesis project in the documentation area of the blog; we will be adding specific recipes for different library management systems in the coming weeks:

For those libraries that want to do their own statistical analysis, this was a complex issue for the project, particularly given the nature of the data we could obtain vs. the nature of the data required to specifically find correlations. As a result, we used the Kruskal Wallis (KW) test, designed to measure whether there are differences between groups of non-normally distributed data. To confirm non-normal distribution, a Kolmogorov-Smirnov test was run. KW unfortunately does not tell us where the differences are, so the Mann Whitney test was used on specific couplings of degree results, selected based on visual data represented in boxplot graphs. The number of Mann Whitney tests has to be limited because the more tests conducted, the stricter the significance threshold each one must meet, so we limited them to three (at a required significance value of 0.0167, i.e. 5% divided by 3). Once the Mann Whitney tests had been conducted, the effect size of the difference was calculated. All tests other than effect size were run in PASW 18; effect size was calculated manually. It should be noted that we are aware the size of the samples we are dealing with could have indicated relationships where they do not exist, but we feel our visual data demonstrates relationships that are confirmed by the analytics, and thus that we have a stable conclusion in our discarding of the null hypothesis that there is no relationship between library use and degree result.
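To make the arithmetic above a little more concrete, here is a rough Python sketch of two of the steps: dividing the 5% level across three tests, and turning a Mann Whitney result into an effect size. Please treat it as illustrative only. The project ran its tests in PASW 18, not in Python. The loan figures are made up, the function names are ours, and the effect size formula r = |z| / sqrt(N) is an assumption on our part (it is a common convention, but the text above does not say which formula was used). The z score uses the plain normal approximation without a tie correction.

```python
from math import sqrt


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold when running n_tests comparisons."""
    return alpha / n_tests


def mann_whitney_z(group_a, group_b):
    """Return U for group_a and its z score (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    # U counts the pairs where the group_a value beats the group_b value; ties count half.
    u = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return u, (u - mean_u) / sd_u


def effect_size_r(z, n_total):
    """Assumed convention: r = |z| / sqrt(N), with N the combined sample size."""
    return abs(z) / sqrt(n_total)


# Made-up illustrative loan counts for two degree classes.
loans_first = [30, 42, 25, 51, 38]
loans_third = [5, 12, 25, 8, 0]

threshold = bonferroni_threshold(0.05, 3)
u_stat, z_score = mann_whitney_z(loans_first, loans_third)
r = effect_size_r(z_score, len(loans_first) + len(loans_third))
print(round(threshold, 4), u_stat, round(z_score, 2), round(r, 2))
```

With real data you would compare each Mann Whitney p-value against `threshold` rather than against 0.05, which is why the number of pairings is worth keeping small.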

Full instructions of how the tests were run will first be made available to partner institutions and disseminated publicly through a toolkit in July/August

Lessons we learned during the project

The three major lessons learned were:

Forward planning for the retention of data. Make sure all your internal systems and people are communicating with each other. Do not delete data without first checking whether other parts of the University require the data. Often deletion appears to be based on arbitrary decisions and not on institutional policy. You can only work with what you’re able to get!

Beware e-resources data. We always made it clear that the data we were collecting for e-resource use was questionable; during the project we have found that much of this data is not collected in the same way across an institution, let alone 8! Athens, Shibboleth and EZProxy data may all be handled differently – some may not be collected at all. If others find that there is no significance between e-resources data and attainment, they should dig deeper into their data before accepting the outcome.

Legal issues. For more details on this lesson, see our earlier blog on the legal stuff

Final thoughts

Although this post is labelled the final blog post, we will be back!

We are adding open data in the next few weeks and during August we will be blogging about the themes that have been brought out in the focus groups.

The intention is then to use this blog to talk about specific issues we come across with data etc. as we carry our findings forward. At our recent final project meeting, it was agreed that all 8 partners would continue to do this via the blog.

Finally a huge thank you to Andy McGregor for his support as Programme Manager and to the JISC for funding us.

Some thoughts from Lincoln

Thanks to Paul Stainthorp at the University of Lincoln for allowing us to cut and paste this blog post. You can see the original at:

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007-, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method – it clearly worked well for at least one of the other LIDP partners – but my PC nearly melted under the strain.)
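For anyone wanting to try the database route, here is a minimal sketch of the general idea using Python's built-in sqlite3 module. It is not Alex's actual code. Apart from circ_tran, which is the Horizon table named above, every table name, column name and value here is hypothetical. The sketch counts each student's activity per source and writes one row per student as CSV.

```python
import csv
import io
import sqlite3

# An in-memory database stands in for the "quick SQL database".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (student_id TEXT PRIMARY KEY, award TEXT);
    CREATE TABLE circ_tran (student_id TEXT, year TEXT);     -- one row per loan
    CREATE TABLE gate_entries (student_id TEXT, year TEXT);  -- one row per visit
    """
)

# Made-up rows purely for illustration.
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("s1", "BA"), ("s2", "BSc"), ("s3", "MA")],
)
conn.executemany(
    "INSERT INTO circ_tran VALUES (?, ?)",
    [("s1", "2009/10"), ("s1", "2009/10"), ("s2", "2008/09")],
)
conn.executemany(
    "INSERT INTO gate_entries VALUES (?, ?)",
    [("s1", "2009/10"), ("s3", "2009/10")],
)

# Correlated subqueries count each activity type separately, so a student with
# two loans and one visit is not double counted the way a three-way join would.
rows = conn.execute(
    """
    SELECT s.student_id,
           s.award,
           (SELECT COUNT(*) FROM circ_tran c
             WHERE c.student_id = s.student_id AND c.year = '2009/10') AS loans_2009_10,
           (SELECT COUNT(*) FROM gate_entries g
             WHERE g.student_id = s.student_id AND g.year = '2009/10') AS visits_2009_10
    FROM students s
    ORDER BY s.student_id
    """
).fetchall()

# Write the collated rows out as CSV text (a file object would work the same way).
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["student_id", "award", "loans_2009_10", "visits_2009_10"])
writer.writerows(rows)
csv_text = output.getvalue()
print(csv_text)
```

Students with no recorded activity still appear, with a count of 0, which matches the way students from campuses without access control show up with zero visits.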

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

Good news everybody…

We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which have been described in previous blogs; however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for others to analyse further.

Initial hurdles – the LJMU experience

Energised by the initial project team meeting, the LIDP team at LJMU set about gathering the required data in order to make our contribution to the project. Having already had a few discussions we were fairly confident that we would be able to gather the required data. We had access to student records and numbers, Athens data, library usage data from Aleph and we were aware that our security system (gate entry into the library) kept a historic record of each individual’s entry and exit into the buildings which are serviced through the swipe card entry system. We just needed to pull all this together through the unique student number.

Getting this particular bit of data from our Academic Planning department was straightforward. An anonymised list of all 2010 graduating students, along with their programme of study and degree classification, and most importantly their unique id number was duly provided.

Hurdle one – Our security system does indeed log and track student movement through the use of the swipe card entry system, but we are unable to get the system to report on this. All data from the system is archived by the system supplier and is subsequently not readily available to us. This means that entry into the Learning Resource Centres is not going to be something we can report upon on this occasion.

Hurdle two – Our Network team systematically delete student network accounts upon graduation, which means the record that links an individual’s unique student ID number, Athens account, security number and Library barcode is not available for the students whose library usage we wished to analyse!

There were about 4,500 students who graduated from LJMU in 2010 with undergraduate degrees, but unfortunately, by the time I got to speak to the network manager, 3,000 of these had been deleted, as is our institutional practice and policy.

The upshot of all this is that we are only going to be able to provide data for a third of the potential students that we could have provided data for if we had thought to ask these questions earlier on. But at least we are still able to contribute.

Focus Groups – I am hoping that the organisation and co-ordination of some student focus groups will be more fruitful, but early indicators suggest that the timing of this is not particularly good as we are now in a reading week which will be followed by end of semester exams and coursework submissions, along with an Easter Bank Holiday weekend and Royal Wedding to be squeezed in. In effect, this is the busiest time of the year for our students. However, we have a great relationship with our student union and they are normally very helpful and responsive so I am hoping we will have something organised very soon.

What would we do differently? – the lessons learnt in this instance are to do with internal partnerships and communication. When first approached about the project we thought that we had asked the right questions of the right people within the University. However, it is obvious to us now that we should have made sure that we discussed our plans in more detail with the Head of Networks and the Head of Security as they are our means of access to two of the key systems that we require in order for us to obtain the required data. Discussions with key stakeholders are of the utmost importance as they highlight local practices and procedures as well as potential difficulties with systems and contracts (as is the case with our security system).

On a positive note all our stakeholders are excited to be involved in the project and do wish that we could provide more data. Our networks manager has already indicated that he would be happy to delay future network account deletions if we wanted to obtain similar data for our 2011 graduates.

To sum up, an interesting couple of weeks at LJMU in our quest to get the LIDP data, and I hope that this post brings with it a few words to the wise……

Hypothesis musings.

Since the project began, I’ve been thinking about all the issues surrounding our hypothesis, and the kind of things we’ll need to consider as we go through our data collection and analysis.

For anyone who doesn’t know, the project hypothesis states that:

“There is a statistically significant correlation across a number of universities between library activity data and student attainment”

The first obvious thing here is that we realise there are other factors in attainment!  We do know that the library is only one piece in the jigsaw that makes a difference to what kind of grades students achieve.  However, we do feel we’ll find a correlation in there somewhere (ideally a positive one!).  Having thought about it beyond a basic level of “let’s find out”, the more I pondered, the more extra considerations leapt to mind!

Do we need to look at module level or overall degree?  There are all kinds of things that can happen that are module specific, so students may not be required to produce work that would link into library resources, but still need to submit something for marking.  Some modules may be based purely on their own reflection or creativity.  Would those be significant enough to need noting in overall results?  Probably not, but some degrees may have more of these types of modules than others, so could be worth remembering. 

My next thought was how much library resource usage counts as supportive for attainment.  Depending on the course, students may only need a small amount of material to achieve high grades.  Students on health sciences/medicine courses at Huddersfield are asked to work a lot at evidence based assignments, which would mean a lot of searching through university subscribed electronic resources, whereas a student on a history course might prefer to find primary sources outside of our subscriptions. 

On top of these, there are all kinds of confounding factors that may play with how we interpret our results:

  • What happens if a student transfers courses or universities, and we can’t identify that?
  • What if teaching facilities in some buildings are poor and have an impact on student learning/grades?
  • Maybe a university has facilities other than the library through the library gates and so skews footfall statistics?
  • How much usage of the library facilities is for socialising rather than studying?
  • Certain groups of students may have an impact on data, such as distance learners and placement students, international students, or students with any personal specific needs.  For example some students may be more likely to use one specific kind of resource a lot out of necessity.  Will they be of a large enough number to skew results?
  • Some student groups are paid to attend courses and may have more incentive to participate in information literacy related elements e.g. nurses, who have information literacy classes with lots of access to e-resources as a compulsory part of their studies.

A key thing emerging here is that lots of resource access doesn’t always mean quality use of materials, critical thinking, good writing skills…  And even after all this we need to think about sample sizes – our samples are self-selected, and involve varying sizes of universities with various access routes to resources.  Will these differences between institutions be a factor as well?

All we can do for now is take note of these and remember them when we start getting data back. In the meantime I set to thinking about how I’d revise the hypothesis if we could do it again, with what is admittedly a tiny percentage of these issues considered within it:

“There is a statistically significant correlation between library activity and student attainment at the point of final degree result”

So it considers library usage overall, degree result overall, and a lot of other factors to think about while we work on our data!

Notes from the first meeting 11.03.11

The group had its very first meeting on Friday the 11th, and it was a full house – almost all the group members managed to make it to Huddersfield, and were greeted with hot cross buns and biscuits a-plenty.  

Introductions were made, and the meeting kicked off with Dave Pattern providing an overview to the background of the project.  The germ of an idea began when the library started investigating the kind of people who were using the library, looking at an overall picture rather than something specifically course based.  However, it became obvious that there were certain courses who used the library a lot, and some who barely entered, if at all.  Creating a non/low usage group within the library at Huddersfield gave the team a chance to focus on targeting specific groups to examine use in more detail, but never created a statistically sound basis to make assumptions, and so the LIDP was conceived! 

Graham Stone, the project manager, went through the project documentation and how information is to be disseminated via the blog (with comments welcome from all project members), and reminded members that we don’t consider a positive correlation between library use and attainment to be a causal relationship!  The group is very aware of other factors that come into attainment and is by no means suggesting that library use is the only element of importance!  Data protection and ethical issues were considered, keeping in mind pending information from Huddersfield’s legal advisor.

Graham asked for volunteers to join a project steering group based at Huddersfield (taking travel distance into consideration!), and it was agreed that Salford would have a representative join the group (a blog post dedicated to the steering group is coming soon).

Bryony Ramsden, the project research assistant, talked about issues that might disrupt the hypothesis (see the main hypothesis blog post), and introduced the idea of running focus groups.  Some qualitative data would help explain exactly why some people use the library a huge amount, and some don’t, and help discover why discrepancies between courses might develop.  Samples would ideally be a mixture of student types, covering the main groups of undergraduates and postgraduates both full and part time across various schools/bodies.  Groups will need to run soon to ensure students aren’t disrupted too much before exams and assignment due dates begin to take up their time, and having found term differences between institutions already the plan was modified from running groups in April and May to over March and April!  Data collection could end up running a little tight here, but a move forward could actually be beneficial to all parties if the data is ready earlier than planned.

Dave talked about data collection and emphasised that he realises not all institutions will be able to provide all the same sets of data types.  He talked through different routes of accessing data to maximise what could be available with a minimum of difficulty.  He offered a number of options for passing the data back to him (SQL, Excel, or he can provide coding to help if required), with at least data from the academic year of 2009/10.  Concerns were expressed that because of variations in graduation dates data may not cover a full academic year, but if these courses are flagged up there may be potential for comparison between like courses.  Dave said he’ll create a document detailing the systems of each institution so that he can offer advice easily on data gathering, and reminded everyone that if they have any other data they think might be useful, he’ll welcome suggestions.  Data encryption issues were discussed to emphasise the data protection issues raised in the exchange process.  Data should be submitted to Dave by 23rd April.

Having discussed all the core important elements to get things moving, the group went their separate ways, some to trains and car journeys, others to the pub (the Head of Steam, right on the train platform for convenience…).