Category Archives: Data Capture

LIDP Toolkit: Phase 2

We are starting to wrap up the loose ends of LIDP 2. You will have seen some bonus blogs from us today, and we have more about reading lists and focus groups to come – plus more surprises!

Here is something we said we would do from the outset – a second version of the toolkit to reflect the work we have done in Phase 2 and to build on the Phase 1 Toolkit:

Stone, Graham and Collins, Ellen (2012) Library Impact Data Project Toolkit: Phase 2. Manual. University of Huddersfield, Huddersfield.

The second phase of the Library Impact Data Project set out to explore a number of relationships between undergraduate library usage, attainment and demographic factors. There were six main work packages:

  1. Demographic factors and library usage: testing to see whether there is a relationship between demographic variables (gender, ethnicity, disability, discipline etc.) and all measures of library usage;
  2. Retention vs non-retention: testing to see whether there is a relationship between patterns of library usage and retention;
  3. Value added: using UCAS entry data and library usage data to establish whether use of library services has improved outcomes for students;
  4. VLE usage and outcome: testing to see whether there is a relationship between VLE usage and outcome (subject to data availability);
  5. MyReading and Lemon Tree: planning tests to see whether participation in these social media library services had a relationship with library usage;
  6. Predicting final grade: using demographic and library usage data to try and build a model for predicting a student’s final grade.

This toolkit explains how we reached our conclusions in work packages 1, 2 and 6 (the conclusions themselves are outlined on the project blog). Our aim is to help other universities replicate our findings. Data were not available for work package 4, but should these data become available they can be tested in the same way as in the first phase of the project, or in the same way as the correlations outlined below. Work package 6 was also a challenge in terms of data; we made some progress, but not enough to present full results.

The toolkit aims to give general guidelines about:

1. Data Requirements
2. Legal Issues
3. Analysis of the Data
4. Focus Groups
5. Suggestions for Further Analysis
6. Release of the Data

Further analysis

We are pulling out much more detailed data for LIDP2, including the time of day a resource was accessed. Some initial graphs are now available here; however, please note that this is just raw data and needs to be cleaned up.

There do seem to be some interesting trends. For example, the graph below implies that students who achieve a 1st class degree not only use more e-resources, but also use them more during ‘office hours’. Students who receive a lower degree are the highest users overnight – this appears to tally with other research that shows that students who log in to a university’s VLE are more at risk of dropping out.

The next graph appears to show that students who obtained a first use e-resources most between 2am and 7am – perhaps chasing deadlines? Students who get a 1st peak between 10am and 11am.

Please note that we haven’t cleaned this data up yet, e.g. we have not excluded overseas users.
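As a rough illustration of the time-of-day breakdown described above, here is a minimal sketch in Python. It assumes each e-resource access has already been reduced to a (degree class, timestamp) pair; the function name, the input layout and the sample rows are all made up for illustration and are not the project's actual data or code. Shares are calculated within each degree class, so that a class that simply uses more e-resources overall does not swamp the comparison.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_share_by_class(events):
    """Return {degree_class: [share of that class's accesses in hour 0..23]}.

    `events` is an iterable of (degree_class, ISO 8601 timestamp) pairs.
    This layout is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for degree_class, stamp in events:
        hour = datetime.fromisoformat(stamp).hour
        counts[degree_class][hour] += 1

    shares = {}
    for degree_class, per_hour in counts.items():
        total = sum(per_hour.values())
        # A Counter returns 0 for hours with no accesses
        shares[degree_class] = [per_hour[h] / total for h in range(24)]
    return shares


# Invented sample rows, for illustration only
sample = [
    ("1", "2010-03-01T10:15:00"),
    ("1", "2010-03-01T10:45:00"),
    ("1", "2010-03-02T14:05:00"),
    ("3", "2010-03-01T02:30:00"),
    ("3", "2010-03-01T23:10:00"),
]

shares = hourly_share_by_class(sample)
# Hour of day with the largest share of accesses for each degree class
peak_hour = {c: max(range(24), key=lambda h: s[h]) for c, s in shares.items()}
```

Cleaning steps such as excluding overseas users would need to happen before the events reach this function.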

Focus group analysis

The focus group analysis has just been released to each individual collaborating institution.  The groups were designed to pull out additional advising data on usage of library resources and facilities, asking students how much they used library facilities and resources, where they chose to use the resources, any difficulties they experienced, and whether the library satisfied their information and learning space requirements.

Students volunteered with a small reimbursement for their time and involvement, with varying success at each institute (if you’ve been following the blog, you’ll have already seen De Montfort’s focus group discussion), but resulting in a huge amount of data to analyse!

The coding process involved reading through transcripts to bring out broad themes, and refining the themes into smaller groups where applicable.  Transcripts were then re-read for the analysis itself, with the aim not just of coding them, but of using thematic clues to develop and elaborate on what students discussed.  For example, a student discussing problems they had encountered using a resource may simultaneously be indicating non-verbally that their student group could benefit from more in-depth information literacy training, or that there could be improved subscription options for that subject area.

Analysis was also based around frequency of mentions: the more often a code or theme was discussed, the more important an element it represented in student library use/non-use.  This method can be problematic in that it doesn’t always demonstrate emphasis and enthusiasm materialising in the group discussion, or indeed can be heavily influenced by current issues the students are experiencing, but it does still demonstrate what is important to the participant at that time and thus what is meaningful to them.  Additionally, when used in combination with other codes and the analysis technique above, it can result in a revealing image of student experiences and usage, and provide material to lead further research at a later date if appropriate.
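The frequency-of-mentions idea can be sketched in a few lines of Python. This is only an illustration under assumptions: it supposes the coded transcripts have been reduced to (focus group, code) pairs, and the code labels and sample data are invented. Alongside the raw count it also reports how many separate groups raised each code, which is one simple way to spot a theme that is frequent only because a single group was preoccupied with a current issue.

```python
from collections import Counter, defaultdict


def code_frequencies(coded_segments):
    """Return [(code, mentions, groups_mentioning)] sorted by mentions, descending.

    `coded_segments` is an iterable of (focus_group, code) pairs, one per
    coded mention. This layout is an assumption made for the sketch.
    """
    mentions = Counter()
    groups_seen = defaultdict(set)
    for group, code in coded_segments:
        mentions[code] += 1
        groups_seen[code].add(group)

    # Most-mentioned codes first; ties broken alphabetically for a stable order
    ordered = sorted(mentions.items(), key=lambda item: (-item[1], item[0]))
    return [(code, n, len(groups_seen[code])) for code, n in ordered]


# Invented sample coding, for illustration only
sample = [
    ("group A", "opening hours"),
    ("group A", "opening hours"),
    ("group B", "opening hours"),
    ("group A", "e-resource access problems"),
    ("group B", "study space"),
    ("group B", "study space"),
]

summary = code_frequencies(sample)
```

A count like this says nothing about emphasis or enthusiasm, so it complements the re-reading of transcripts rather than replacing it.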

Some thoughts from Lincoln

Thanks to Paul Stainthorp at the University of Lincoln for allowing us to cut and paste this blog post. You can see the original at:

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007-, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method—it clearly worked well for at least one of the other LIDP partners—but my PC nearly melted under the strain.)
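For anyone wanting to try the quick-SQL-database route, here is a minimal sketch using Python's built-in sqlite3 and csv modules. It is not Alex's actual code: the table names, column names and sample rows are hypothetical stand-ins (the real Horizon, Sentry and Athens exports will look different), and it counts activity for a single academic year to keep things short.

```python
import csv
import io
import sqlite3

# In-memory database with hypothetical, simplified tables: one row per event
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id TEXT PRIMARY KEY, award TEXT);
    CREATE TABLE loans    (student_id TEXT, year TEXT);
    CREATE TABLE visits   (student_id TEXT, year TEXT);
    CREATE TABLE logins   (student_id TEXT, year TEXT);
""")

# Invented sample rows, for illustration only
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [("a1", "BA"), ("b2", "BSc")])
conn.executemany("INSERT INTO loans VALUES (?, ?)",
                 [("a1", "2008/09"), ("a1", "2009/10"),
                  ("a1", "2009/10"), ("b2", "2009/10")])
conn.executemany("INSERT INTO visits VALUES (?, ?)", [("a1", "2009/10")])
conn.executemany("INSERT INTO logins VALUES (?, ?)",
                 [("b2", "2009/10"), ("b2", "2009/10")])

# One output row per student. Correlated subqueries keep the three counts
# independent, and students with no activity get 0 rather than dropping out.
QUERY = """
SELECT s.student_id,
       s.award,
       (SELECT COUNT(*) FROM loans  l
         WHERE l.student_id = s.student_id AND l.year = :year) AS loan_count,
       (SELECT COUNT(*) FROM visits v
         WHERE v.student_id = s.student_id AND v.year = :year) AS visit_count,
       (SELECT COUNT(*) FROM logins e
         WHERE e.student_id = s.student_id AND e.year = :year) AS login_count
FROM students s
ORDER BY s.student_id
"""
rows = conn.execute(QUERY, {"year": "2009/10"}).fetchall()

# Write the collated rows out as CSV (to a string here; a file works the same way)
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["student_id", "award", "loan_count", "visit_count", "login_count"])
writer.writerows(rows)
csv_text = out.getvalue()
```

The same counts could be produced with COUNTIF in a spreadsheet, but the database does the matching by student ID in one pass, which is presumably why it copes better with thousands of students.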

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

Good news everybody…

We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which have been described in previous blogs; however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for others to analyse further.

5 years of book loans and grades at Huddersfield

I’m just starting to pull our data out for the JISC Library Impact Data Project and I thought it might be interesting to look at 5 years of grades and book loans. Unfortunately, our e-resource usage data and our library visits data only go back as far as 2005, but our book loan data goes back to the mid 1990s, so we can look at a full 3 years of loans for each graduating student.

The following graph shows the average number of books borrowed by undergrad students who graduated with a specific honour (1, 2:1, 2:2 or 3) in that particular academic year…


…and, to try and tease out any trends, here’s a line graph version….


Just a couple of general comments:

  • the usage & grade correlation (see original blog post) for books seems to be fairly consistent over the last 5 years, although there is a widening between usage by the lowest & highest grades
  • the usage by 2:2 and 3 students seems to be in gradual decline, whilst usage by those who gain the highest grade (1) seems to be on the increase
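
For anyone wanting to produce the same kind of figures from their own data, the averages behind graphs like these can be sketched as below. This is a minimal illustration, assuming one record per graduating student holding graduation year, grade and total loans; the function name and the sample numbers are invented and are not Huddersfield's data.

```python
from collections import defaultdict
from statistics import mean


def mean_loans_by_year_and_grade(records):
    """Return {(graduation_year, grade): mean loans per student}.

    `records` is an iterable of (graduation_year, grade, total_loans)
    tuples, one per student. This layout is an assumption for the sketch.
    """
    grouped = defaultdict(list)
    for year, grade, loans in records:
        grouped[(year, grade)].append(loans)
    return {key: mean(values) for key, values in grouped.items()}


# Invented sample records, for illustration only
sample = [
    ("2008/09", "1", 60), ("2008/09", "1", 40), ("2008/09", "3", 20),
    ("2009/10", "1", 70), ("2009/10", "3", 10), ("2009/10", "3", 20),
]

averages = mean_loans_by_year_and_grade(sample)

# Gap between the highest and lowest grades in each year, to see whether it widens
gap = {
    year: averages[(year, "1")] - averages[(year, "3")]
    for year in ("2008/09", "2009/10")
}
```

Plotting `averages` as bars or lines per grade gives the two chart styles mentioned above, and `gap` puts a number on the widening between the highest and lowest grades.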