Lemontree

In work package 6 (Data Analysis) we said we would investigate some in-house projects:

Lemontree is designed to be a fun, innovative, low-input way of engaging students through new technologies, increasing their use of library resources and, in turn, their final degree awards. The project aims to increase usage of library resources through a custom social, game-based eLearning platform designed by Running in the Halls. It builds on previous ideas, such as those developed at Manchester Metropolitan University to support inductions and information literacy, and uses reward systems similar to those found in location-based social networks such as Foursquare.

Stone, Graham and Pattern, David (2012) Knowing me…Knowing You: the role of technology in enabling collaboration. In: Collaboration in Libraries and Learning Environments. Facet, London. ISBN 978-1-85604-858-3.

When registering for Lemontree, students sign terms and conditions that allow their student number to be passed to Computing and Library Services (CLS). This allows CLS to track usage of library resources by Lemontree gamers versus students who do not take part. As part of LIDP 2, we wanted to see if we could analyse the preliminary results of Lemontree to investigate whether engagement with the game makes a difference to student attainment, by comparing the usage and attainment of those using Lemontree with those who are not across equivalent groups in future years (we only planned to produce a proof of concept in Phase 2).

Andrew Walsh, who is project managing Lemontree at Huddersfield, reports:

Lemontree has slowly grown its user base over the first year of operation, finishing the academic year with 628 users (22nd May 2012), with large numbers registering within the first few weeks of the academic year 2012-2013 (over 850 users registered by 5th October 2012). This gives us a solid base from which we can identify active Lemontree users who will be attending university for the full academic year 2012-2013.

Lemontree currently offers points and awards for entering the library, borrowing and returning books and using online resources as well as additional social learning rewards, such as leaving reviews on items borrowed. The rewards are deliberately in line with the types of data we analysed in the first phase of LIDP.

We have seen healthy engagement with Lemontree, with an average of 74 “events” per user in the first year, an event being an action that triggers the award of points.

At the end of this academic year, we will identify those users registered for the full year and extract usage statistics for those students. Those who registered in their second or third years of studies will have their usage statistics compared to their first year of study, to see if engagement with Lemontree impacted on their expected levels of library usage. For those students registered at the start of their first year, we will investigate whether active engagement with the game layer has an impact compared to similar groups of students (matched by course, UCAS points, etc.), to see if early intervention using gamification can have an impact throughout a student’s academic course.

Survey on Analytics Services for Business Intelligence

Both LIDP and CopacAD undertook to investigate the feasibility of a national shared service for data analytics.

On behalf of JISC, we have got together with Mimas to undertake this preliminary survey to understand potential demand for data analytics services, which can enhance business intelligence at the institutional level and so support strategic decision-making within libraries and more broadly.

Both projects envision a shared service that centrally ingests and processes raw usage data from different systems and provides analytics tools and data visualisations back to local institutions.

The survey is open until October 18th 2012 and should take approximately 15 minutes to complete.

http://www.survey.bris.ac.uk/mimas/analytics

Article in the Times Higher

Thanks to @Mitchley at DMU for spotting an interesting article in the Times Higher, which reports on a study at the University of East London that found that students who buy more books achieve better degree results.

Degree results speak volumes

Although we don’t have any hard statistical evidence, some initial work we did on measuring the link between a book’s usage and the average % final grade of the students who borrowed it seemed to indicate that some core/essential titles were borrowed less by higher achievers than we were expecting. We felt the most likely explanation was that high achievers weren’t borrowing the book from the library because they had in fact purchased it.

Final blog post: managing the data

In terms of the data, this project was both simpler and more complex than the first stage. The fact that we were only dealing with data from a single institution made a big difference: we didn’t have to worry about ensuring that our classifications of data matched across different institutional systems and we only had one set of people to talk to when the data wasn’t quite what we were expecting!

On the other hand, the data itself was considerably more complicated. We had a larger number of variables to manage, and in many cases the data didn’t present itself in the ideal format for analysis. So a lot of the work on this round of the project was taken up with recoding data – and that involves not just hours of shouting at Excel but also a bit of brainpower to decide how to create groups and sub-groups within different variables.

Let’s take, as an example, the ‘ethnicity’ variable. The data provided by Huddersfield divided students into 20 different ethnicity categories, some of which had only a handful of students in them. This was problematic for two reasons – first, the students might potentially be identifiable in such small groups, and second, it was very unlikely that statistical tests would reveal any meaningful differences at that level of detail. By grouping the 20 categories into 5 or 6, we protected student identities and increased the viability of our statistical tests.
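
To make the recoding step concrete, here is a minimal sketch in Python (pandas) of the kind of mapping we’re describing. The category labels are invented for illustration rather than being Huddersfield’s actual codes:

```python
import pandas as pd

# Illustrative mapping from detailed ethnicity categories to broader groups.
# The labels here are invented examples, not the actual Huddersfield codes.
regroup = {
    "White - British": "White",
    "White - Irish": "White",
    "Black - African": "Black",
    "Black - Caribbean": "Black",
    "Asian - Indian": "Asian",
    "Asian - Pakistani": "Asian",
    "Chinese": "Chinese",
}

students = pd.DataFrame(
    {"ethnicity": ["White - Irish", "Asian - Indian", "Chinese"]}
)

# Map detailed categories onto the broader groups; anything unmapped is
# flagged for manual review rather than silently guessed at.
students["ethnic_group"] = students["ethnicity"].map(regroup).fillna("Unmapped")

# Check group sizes: very small groups risk identifying individuals and
# make statistical tests unreliable.
print(students["ethnic_group"].value_counts())
```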

Now, it’s important to stress that you must not simply regroup data in order to get a statistically significant result, trying lots of different groupings until you get one that gives you the numbers you want. We had to have some theoretical underpinning to our categorisation – and this is where the ‘brainpower’ I mentioned earlier comes in. We could have categorised everyone into ‘white’ and ‘non-white’ but this wouldn’t necessarily give us particularly useful information: it would not reveal any differences between non-white groups, which might be important when planning activities to support those various groups. So we were looking for a happy medium somewhere between 20 and 2 groups.

Luckily, this is a problem that other researchers have faced before, and we were able to borrow from some fairly standard categories (you can see them in more detail in this post). Even then, though, we had to scratch our heads a bit to map the Huddersfield categories onto the new ones. For example, we had several groups called things like ‘Black – Other’ and ‘Asian – Other’. Should we put these into one big ‘other’ category, or into ‘Black’ and ‘Asian’ respectively? In the end, we felt that we didn’t have enough information to put them into the more specific categories. For example, our ‘Asian’ category referred specifically to the Indian subcontinent, and we grouped China separately. But we didn’t know whether our ‘Asian – Other’ students were of Indian or Chinese heritage, or whether they were from a group which didn’t fit into our existing categories – Korean, or Thai, for example.

In some cases, we didn’t have standard categories to borrow from, and had to create our own. For example, our data held only the names of courses that students were following, not their discipline or school. So when trying to group by subject, we had to start from scratch, taking the 100-odd courses and grouping them to allow analysis. We did this with the help of the library staff: it’s fairly likely that if we’d used other Huddersfield staff members we would have ended up with slightly different classifications. In the same way, other institutions might have a different take on how best to organise such data, according to their own organisational set-ups.

All in all, arranging and managing the data was one of the more time-consuming elements of the project. This is something that is only likely to get trickier if the project were working with data from several institutions. This begins with coding the text-based data: for example, imagine that one institution records students as ‘Black – African’, another as ‘African (Black)’ and another as ‘Black African’. It’s not too difficult to get Excel to recognise that these are all the same thing, but it would be very time-consuming to do it for each response in a number of variables. And for some variables it might involve some intellectual effort to decide upon equivalence – discipline is a particularly good example of this. Each university will offer slightly different courses, and it will take work to map them onto each other.
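
As a flavour of how that equivalence-coding might be automated, here’s a small sketch (Python) that treats the three hypothetical ‘Black – African’ variants above as one category by normalising punctuation and word order. Real data would still need a careful, partly manual mapping on top of this:

```python
import re

# Normalise a label so that equivalent variants from different
# institutions collapse to the same canonical string.
def canonical(label: str) -> str:
    # Strip punctuation and brackets, lower-case, and sort the tokens
    # so that word order no longer matters.
    tokens = re.sub(r"[^\w\s]", " ", label).lower().split()
    return " ".join(sorted(tokens))

variants = ["Black - African", "African (Black)", "Black African"]
assert len({canonical(v) for v in variants}) == 1  # all three map to one code
```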

After the variables have been coded and brought into alignment, another challenge exists in ensuring that analysis works for all partners. Take country of domicile as an example. Some universities may have a particularly large number of students from, say, the Middle East, and want to separate that out as an analytic category. But, given that we need to keep the overall numbers of analytic categories low in order for the analysis to work and that different universities are likely to have different areas of geographic interest, how do we decide on the best groupings?

We’ll have to have a think about how best to address these challenges if the research is to progress to include a larger number of institutions who want to work together and pool their data.

FINDINGS post 5: Predicting outcomes

One of the original aims for this project was to see whether library usage data could be combined with other variables to build a model that might help predict student outcomes. At first we thought this mightn’t be possible – student results weren’t normally distributed, and this precluded any regression analysis. But, once we weeded out the part-time, non-Huddersfield-based students, we found that the results were normal. Hurrah! This is always a happy, happy moment for statistics bods.
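
For anyone replicating this step outside SPSS, a normality check along these lines is the usual gatekeeper for regression; this sketch uses scipy on invented grade data:

```python
import numpy as np
from scipy import stats

# A minimal sketch of a normality check. 'grades' stands in for the
# final-grade percentages; the data here is invented.
rng = np.random.default_rng(42)
grades = rng.normal(loc=60, scale=8, size=500)

# D'Agostino-Pearson test: the null hypothesis is that the sample
# is drawn from a normal distribution.
stat, p = stats.normaltest(grades)
print(f"p = {p:.3f}: {'looks normal' if p > 0.05 else 'not normal'}")
```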

So, I set out with the data that I had to try and build a model. The aim, in statistics, is to build the best predictive model using the fewest input variables. This is for several reasons: it’s practical (it saves you from having to collect loads of data which doesn’t add much to the final prediction), it’s elegant, and it reduces the likelihood of problems such as multicollinearity, where two or more of your variables are correlated, which stuffs up the predictive power of your model. It’s also important that you only use variables which you have a strong theoretical reason for including – randomly picking variables to see which one gives you the best result is called data mining, and it’s Not Allowed. For these reasons, I only used the library usage variables which we’ve already shown to be related to outcome: e-resource hours, PDF downloads, number of e-resources accessed once, 5 and 25 times, and number of items borrowed.

Regression analysis in SPSS gives you about a million different outputs, but the key one, in terms of understanding the predictive ability of the model, is the adjusted R². This figure ranges from 0 (terrible) to 1 (perfect, and highly suspicious!). I was taught that in the social sciences an adjusted R² of around .7 is considered pretty good going. The one we achieved with this model is .106. Oh dear.
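
For readers without SPSS, here’s a minimal sketch of an equivalent fit in Python using statsmodels. The column names echo our usage dimensions, but the data is invented, so the adjusted R² it prints means nothing in itself:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Invented data standing in for the real usage variables.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "eres_hours": rng.gamma(2, 10, n),
    "pdf_downloads": rng.poisson(30, n),
    "items_borrowed": rng.poisson(15, n),
})
grade = 50 + 0.1 * X["items_borrowed"] + rng.normal(0, 8, n)

# Fit ordinary least squares with an intercept term.
model = sm.OLS(grade, sm.add_constant(X)).fit()
print(model.rsquared_adj)  # the adjusted R² discussed above
```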

Furthermore, there do seem to be some potential problems with multicollinearity. This isn’t hugely surprising – after all, we ought to expect the three variables for number of e-resources accessed to be related, especially the last two – if you’ve accessed something 25 times you have also accessed it 5 times after all! It may also be that there’s a relationship between the hours spent logged into e-resources and the number of resources accessed.
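
A standard way of checking this is the variance inflation factor (VIF). Here’s a sketch of the check in Python, with invented data that deliberately builds in the kind of overlap just described:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented predictors with deliberate overlap: the counts are nested
# within each other, much like the 1 / 5 / 25 access measures.
rng = np.random.default_rng(1)
n = 500
accessed_5 = rng.poisson(10, n)
X = pd.DataFrame({
    "accessed_once": accessed_5 + rng.poisson(5, n),  # overlaps by construction
    "accessed_5_times": accessed_5,
    "accessed_25_times": (accessed_5 * 0.4).round(),  # nested within the above
})

# Rule of thumb: a VIF above roughly 5-10 flags a problematic predictor.
Xc = sm.add_constant(X)
for i, col in enumerate(Xc.columns[1:], start=1):
    print(col, round(variance_inflation_factor(Xc.values, i), 2))
```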

So I next tried a more parsimonious model; one which eliminated the variables for number of e-resources accessed 5 and 25 times. The adjusted R² for this model is .107 – fractionally better – and the multicollinearity seems to be less of a problem. But it’s still not a great predictor. For the next step, I added in the demographic variables that we have, and got a model with an adjusted R² of .177. This pretty much exhausted all the data that I had available to me, and I have therefore declared the answer to our original question to be – no, you can’t build a model to predict outcomes with the data available to this project.

Now, that’s not to say that nobody could. There are a lot of variables we haven’t included. In some cases this is because we don’t have the data – for example, UCAS points might be a useful indicator of a student’s potential on arrival and a good thing to include, if you have them. Others are because the data is simply too difficult to collect – how do you measure a teacher’s ability (as opposed to how that ability is perceived by their students!), or stressful life events that a student is dealing with at exam time?

It would be really interesting to see whether integrating the missing information such as UCAS points would improve the model – and whether we could try and find a way of measuring proxies for some of those other issues: contact with pastoral staff, student satisfaction surveys. Of course, this would raise a lot of ethical issues and couldn’t be done lightly. But it would be interesting…

FINDINGS post 4: Final results and usage

In this round of analysis, we had some new metrics of library usage which weren’t available to researchers in the first round of the project. I’ve already blogged about one of them – overnight usage – and there’s more on that later in this post, but first I’d like to talk about the other three. These are – the number of e-resources accessed (as distinct from the hours spent logged into e-resources); the number of e-resources accessed 5 or more times, and the number of e-resources accessed 25 or more times. These metrics show how many of Huddersfield’s 240-odd e-resources, which range from large journal platforms and databases down to individual journal subscriptions, a student has logged into during the year, once, at least five times or at least 25 times.

You’ll already have seen these three dimensions in the posts on our demographic and subject-based analysis. But now I’m going to see whether there’s a relationship between use on these dimensions and final degree outcome, using the same methodology as Phase 1 of the project. Figure 1 shows the results.

Figure 1: Usage and final degree outcome

As you can see, there are quite a few statistically significant differences for the e-resource dimensions. Most of these are small effects, but the difference between a first and a third is medium-sized for the number of e-resources accessed, and the number accessed five or more times. This is a really interesting finding. It suggests that breadth of reading – indicated by using a number of different e-resources – might be a particularly important factor in degree success, and leads to all kinds of questions about how the library might support students in reading widely.

You can see that we’ve found a difference for the percentage of overnight usage as well! Weirdly, this only pops up in relation to the difference between 2.i and 2.ii degrees, and it’s a minuscule effect. I’m inclined to dismiss this as a blip and go with our previous finding that there isn’t a significant difference between grades in terms of their overnight usage: with the same caveat that our model is different from the one used by Manchester Met, and thus perhaps not as able to identify nuanced differences.

FINDINGS post 3: Dropping out

I’ve already posted some early findings about usage and dropping out, showing that there is a relationship between not using the library and dropping out. As you might remember, these findings had a fairly big health warning attached to them.

Since that blog was posted, I think I’ve found another way to look at the relationship between non-use and dropping out, one which is a bit more nuanced. You might remember that I created a ‘binary’ dummy variable for use of the library, in order to get around the problem of cumulative usage (we’d expect a third year student to have higher usage than a first year student, simply because they’ve been at the university longer, but we didn’t have access to the year-of-study data that would allow us to correct for this). I mentioned at the time that turning a continuous variable into a categorical one isn’t an ideal solution, and it turns out that a better one was staring me in the face the whole time (this, I find, is a frequent joy of statistical analysis!).

In that very same blog, I said that we were only looking at the most recent year of data: this was to give first and third year students the same opportunity to move themselves from the ‘non-user’ to ‘user’ category. But (and I’m kicking myself now for not seeing this at the time) once you look only at the last year of data, you can use cumulative measures again! After all, if you only look at one year of data for every student, you eliminate the problem whereby third year students will have accumulated more use simply by virtue of the fact that they’ve been there longer. Those first two years cease to be relevant as they’re not counted within the measure! Of course, we still have the problem of different patterns of usage in years 1, 2 and 3, but I think we just have to shrug our shoulders and accept that.

So, I have run some new tests – Mann-Whitney U again – using a cumulative measure of usage for the first two terms of the 2010-11 academic year. As you might remember, we’re only looking at people who dropped out in term three (again, this is to try and give the dropouts a decent amount of time to establish their usage patterns: including people who dropped out in their second week of term is going to skew our results as they may not even have had their library induction by that point!). This means that all the students included in this study were at the university in the first two terms, and they have all had exactly the same opportunity to accumulate usage.
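
For the curious, the test itself is a one-liner in most statistics packages. Here’s a sketch of the comparison using scipy’s implementation on invented usage counts:

```python
import numpy as np
from scipy import stats

# Invented data: cumulative term 1-2 usage for students who stayed
# versus those who dropped out in term three.
rng = np.random.default_rng(7)
current = rng.poisson(20, size=400)  # e.g. items borrowed by continuing students
dropout = rng.poisson(16, size=60)   # e.g. items borrowed by term-three dropouts

# Mann-Whitney U makes no normality assumption, which suits skewed usage counts.
u, p = stats.mannwhitneyu(current, dropout, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```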

You might remember that in the earlier analysis I mentioned that we couldn’t separate out the full-time and part-time students as the sample became too small. Well, the new test resolves that problem, at least in part. Figure 1 shows the effect sizes for the difference in usage between current and dropout students: in the first column, full-time and part-time students based on all sites (Huddersfield and its secondary campuses at Barnsley and Oldham); in the second column, full-time and part-time students based at Huddersfield only; and in the third column, full-time students based at Huddersfield.

Figure 1: Retention and usage

You can see that as we decrease the sample size by removing certain categories of student, some of the dimensions cease to be statistically significant. When we remove sites other than Huddersfield, library PC use stops being significant, and when we limit our analysis to full-time students, the number of library visits also stops being significant. This tallies with the findings from Phase 1 of the project, which found that items borrowed, e-resource hours and PDF downloads were the only dimensions to have a statistically significant relationship with final degree outcome.

The effect sizes here are all very small, but they are significant and they are there for the number of items borrowed, the hours logged into e-resources and the number of PDF downloads. This reinforces the findings from my earlier analysis: that non-usage of the library can’t be taken, on its own, as a predictor of dropping out, but that there is some kind of relationship which suggests that it could be a useful warning signal, taken in the context of other warning signals such as course tutor feedback, lack of attendance at lectures and so forth.

It might have been interesting to look at some of the other dimensions of use in relation to dropping out. Percentage of overnight use is one obvious one: we didn’t find a correlation between overnight use and poor grades, but perhaps there’s one between overnight use and dropping out? The number of e-resources accessed might also be interesting. We can’t do this with the Huddersfield data at the moment, because both of those dimensions contain information about the full year and it’s not possible to extract terms one and two for analysis. Maybe that’s one for Phase 3…?

FINDINGS post 2: discipline matters

We’re moving on in this second of our final blogs to look at the relationship between discipline and usage. There are some interesting – although not necessarily unexpected – findings here, and bigger effect sizes than we saw with the demographic variables. Again, I’ve only shown the differences that are statistically significant: effect sizes are highlighted by the depth of colour, with small effects pale, medium effects a little darker, and large effects very dark.

You might remember that we had to aggregate some of the demographic categories – such as ethnicity and country of origin – both to protect student confidentiality and to get meaningful results. The same is true of our discipline-based analysis. But, because we suspected that this was going to be quite an important area, we’ve decided to go a bit more granular. So, we’ve created two levels of analysis. Each of the 100-odd courses offered by Huddersfield to its full-time, undergraduate students has been classified into one of 17 ‘clusters’. These clusters have then been aggregated to form six ‘groups’. We can compare the groups to see some overarching differences, and then drill down to compare clusters within groups, to get a more detailed understanding.

It’s important to mention that the grouping work was done by librarians and student support staff, so it represents the relationships that they see between the different courses. I suspect that if we’d asked course tutors, lecturers or students themselves we might have seen slightly different combinations. Also, the grouping was slightly driven by numbers: we had to make sure that there were enough people in each category to make the statistical tests viable and to ensure that anonymity was protected.
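
In data terms, the two-level classification is just a pair of lookup tables: course to cluster, cluster to group. Here’s an illustrative sketch in Python; the course, cluster and group names are examples rather than our actual mapping:

```python
import pandas as pd

# Illustrative two-level classification: course -> cluster -> group.
# The real mapping covered 100-odd courses, 17 clusters and six groups.
course_to_cluster = {
    "BA English Literature": "English",
    "BA Drama": "Drama",
    "BSc Nursing (Adult)": "Nursing",
    "BSc Podiatry": "Other health",
}
cluster_to_group = {
    "English": "Humanities",
    "Drama": "Humanities",
    "Nursing": "Health",
    "Other health": "Health",
}

students = pd.DataFrame({"course": ["BA Drama", "BSc Podiatry"]})
students["cluster"] = students["course"].map(course_to_cluster)
students["group"] = students["cluster"].map(cluster_to_group)

# Compare across the groups first, then drill into clusters within a group.
print(students.groupby(["group", "cluster"]).size())
```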

Limitations duly mentioned, let’s go on to look at the findings. First, the aggregated subject groups. We used the social science group as our control, as it was the biggest. As you can see from Figure 1, it is also the higher user in most comparisons where significant differences exist: the only exceptions are the comparisons with the number of items borrowed and the number of e-resources accessed by health students. We think this might be because health students are often out on placements, limiting their opportunities to visit the library but making e-resources more important. Furthermore, e-resource use is heavily embedded into the nursing curriculum, and most students will have classes which require them to go into the library’s e-resources and search for items.

Figure 1: Aggregated subject groups

The overall takeaway from this figure, I think, is that computing and engineering students are less intensive users on a number of dimensions, with a medium effect for the number of items borrowed. Arts students are very low users compared to the social sciences, with medium effects on several dimensions.

Let’s move on now to look at how the smaller clusters relate to each other within the groups. Poor old science was in a group all by itself: it wasn’t possible to sub-divide this one so we’ll skip over it and look at health. This has been divided into nursing and other health disciplines (including subjects such as sports therapy, physiotherapy, podiatry and occupational therapy) and you can see from Figure 2 that nursing is a bigger user on pretty much every dimension. There’s a large effect for the number of e-resources accessed, suggesting that nursing students are reading more widely than their counterparts in other health disciplines. This might be because nurses are required to use a certain number of documents for some of their assignments – ten pieces of a specific kind of research for systematic reviews, or four journal articles for an early assignment; the wide use might represent their efforts to find exactly the right kind of resource. There are also medium-sized effects for library visits, hours logged into e-resources, number of e-resources accessed 5 or more times and number of PDF downloads, and small effects for hours logged into library PCs and number of e-resources accessed 25 or more times; in each case, nurses are the higher users.

Figure 2: Health group

The computing and engineering group has been divided into – errr – computing and engineering! The differences here are fewer and smaller. Perhaps unsurprisingly, engineers use the library PCs more often – presumably the computing students are happily tapping away on their personal laptops. And computing students are downloading more PDFs. But on the whole, the behaviour of these two clusters is quite similar.

Figure 3: Computing and engineering group

Now we move on to the humanities group, which has been subdivided into three clusters: English; drama; and media and journalism. The first thing to note is that there are no statistically significant differences between English and drama students. But there are differences between these two clusters and the media students, and where those differences exist the media students have the lower level of usage. Most of these differences relate to e-resource use: the English students have higher use on pretty much all the e-resource dimensions with medium-sized effects. The differences between media and drama are smaller.

Figure 4: Humanities group

Now, on to Figure 5 and the social science group, which looks colourful and complicated! The first thing to note is the number of cells which are shaded in dark colours: there are a lot of big effects within this group. Overall, behavioural sciences dominate. They have higher usage, at a statistically significant level, than every other cluster on at least one dimension. Business is the next-most-dominant discipline, although it’s worth noting that business students borrow fewer items than their colleagues in every other discipline except law, and that the effect sizes here are medium or large. Lawyers, in fact, have the lowest use compared to most subjects, but they do use the library, and especially its computers, more than their counterparts in social work and education. Finally, there are no significant differences between social work and education: perhaps it’s unsurprising that these two vocational courses have similar patterns of usage.

Figure 5: Social science group

Finally, we turn to the arts group. This one might have been skewed a bit by the inclusion of music, which contains a couple of courses which might have fitted alongside English and drama in the humanities group as well as more technology-focused subjects which fit with the design courses. Musicians are heavier users than every other cluster on at least four dimensions. They borrow more items than all of the other clusters, in each case with a large effect. They are also higher users of electronic resources, particularly when compared to the 2D and 3D designers. Elsewhere, there are fewer significant differences. Architects do not visit the library very often compared to the other clusters in this group, but fashion designers do. It’s possible that the architects’ relatively low level of use is because they have a ‘Design Centre’ in their department, which offers access to computers, journals and other materials; the ‘Art and Design Resource area’ in the library has traditionally been more focused on textiles and fashion design, which might explain the fashion designers’ higher number of library visits. And, as a final footnote, 3D designers are fond of e-resources.

Figure 6: Arts group

The main finding from this section of the analysis – that discipline has a big effect on patterns of library usage – might not be earth shattering. But it does provide statistical backing for something that many librarians will already know anecdotally or from their own observations. This could be a really useful starting point for conversations with academics – checking whether the low usage by their students is a cause for concern and, if so, what might be done to increase it.

FINDINGS post 1: demographic differences

And so, our exciting rollercoaster of findings gets underway with a quick look at some of the demographic factors which seem to affect usage of library resources. Now, I’ve already posted about some early findings, which hadn’t been tested for statistical significance. These new findings have, and I’m only showing the results which are significant: i.e. where we can be confident at the agreed level (which varies from test to test but is in every case a standard statistical method) that the results represent real differences within a wider population, and aren’t just a coincidence within the sample of data that we’ve got.

We’re looking here at final year students who have finished their degrees, were full-time students and whose courses were based at the Huddersfield campus. This helps us to exclude a few variables that might possibly confound our overall findings: for example, we might find that mature students have less library usage, but if lots of our mature students are part-time students (and we haven’t tested for this, so I don’t know – it’s just an example) then we wouldn’t be able to tell whether it’s their maturity or their part-time status that limits their use of the library. Now, one thing we haven’t been able to do is to control for the different variables that we want to test – so we don’t necessarily know whether, say, our Asian students are disproportionately male, and it’s their gender rather than their race that makes them use the library more often (again, just an example – don’t quote me on this). This is a bit of a problem, but it’s not unusual in statistics and with the sample that we’ve got there’s no way round it other than to shrug our shoulders and make sure we acknowledge this when we report the findings. (Hence this paragraph of caveat!)

For each finding, I’m showing the effect size. For the tests that we’ve used (Mann-Whitney U, fact fans), these are generally reckoned as follows: anything up to .3 is a small effect, between .3 and .5 is a medium sized effect, and anything over .5 is a large effect. (Ignore the minus signs by the way – they’re just a function of the test and don’t mean anything.) You’ll notice that most of the demographic variables only show small effect sizes but don’t worry – it gets a lot more exciting when we look at subjects.
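
For anyone wanting to reproduce these figures, the effect size usually reported for Mann-Whitney is r = Z / √N. This sketch computes it via the normal approximation for Z (tie corrections omitted for brevity); it also shows why the sign carries no meaning, since it depends only on which group is listed first:

```python
import numpy as np
from scipy import stats

# Effect size r = Z / sqrt(N) for a Mann-Whitney U test, using the
# normal approximation for Z (no tie correction, for brevity).
def mann_whitney_r(a, b):
    u, _ = stats.mannwhitneyu(a, b, alternative="two-sided")
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2
    sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return z / np.sqrt(n1 + n2)  # sign flips if the groups are swapped

# Invented data for illustration.
rng = np.random.default_rng(3)
print(mann_whitney_r(rng.poisson(20, 300), rng.poisson(24, 300)))
```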

I think most of the dimensions of use are fairly self-explanatory, but I should probably clarify what we mean by the three that refer to the ‘number of e-resources accessed’. This metric shows how many of Huddersfield’s 240-odd e-resources, which range from large journal platforms and databases down to individual journal subscriptions, a student has logged into during the year, once, at least five times or at least 25 times.

So, without any further ado, let’s crack on!

Figure 1: Age and usage

Figure 1 shows the relationship between age and usage. We’ve separated our students into mature – those who entered the university aged 21 or older – and non-mature students (it was VERY difficult to come up with a non-pejorative name for that group!). As you can see, mature students have higher library usage than non-mature students on every dimension except library visits and hours spent logged into the library PCs. Remember, these are all full-time students, so it’s not that they’re illicitly logging in from work rather than visiting the physical library. We wondered whether mature students are better able to afford their own laptops and therefore have less need of the library PCs; it might also have something to do with the way younger students treat the library as more of a social space to hang out with friends. Most departments also have resource centres where students can access computers, which may seem like a less daunting study environment for some students.

Figure 2: Gender and usage

Figure 2 looks at gender and usage. Again, we see differences here on almost every metric – hours logged into the library PCs and number of e-resources accessed 25 or more times are the only exceptions. And on almost every one, women are bigger users than men – the only exception is the number of visits to the library, where men dominate.

Now, we’re looking at something a bit different for the next few figures. Rather than comparing each group to every other, we’ve chosen one group as a control and compared all the rest to them (this is for reasons to do with our statistical methods). In each case, our control is simply the biggest group: this allows us to compare minorities with the majority and hopefully identify behaviour that might otherwise get lost.

Figure 3: Ethnicity and usage

Figure 3 looks at ethnicity and usage; the control group here is ‘white’. (For more information on how we constructed our ethnicity categories, go back to my earlier post.) You’ll notice that there are fewer significant differences here, and almost none to do with e-resources. The only exception is the number of e-resources accessed by Chinese students – we’ll come back to that. Asian students are big users of the libraries, and especially the library PCs, but their heavy use isn’t translating into borrowing more items. Facebook, anyone?! Black students are also big library users, and they are borrowing more than their white counterparts. Chinese students, as we’ve said, are borrowing less and accessing fewer e-resources than white students: there may be an issue here to do with breadth of reading which (as we’ll see in a few posts’ time) is important.

Figure 4: Country of domicile and usage

Figure 4 looks at country of domicile (Huddersfield-ese for ‘where do you live when you’re not at university?’), with the control group being students based in the UK. There are more significant differences here on several dimensions of use. Notably, students from the new EU (member states which joined in 2004 or later) are very keen on computers: they spend more time logged into e-resources, download more content and use more resources more often than UK-based students. Students from the old EU (pre-2004 member states) and the very broad ‘rest of the world’ category visit the library less than UK students, but borrow more items and use the PCs more often when they are there. Students hailing originally from China show lower usage on a number of dimensions: alongside Figure 3, this suggests that Chinese students are systematically lower users of library resources.

All this is very interesting, but how useful is it in helping librarians to develop services that meet the needs of their users? In truth, it’s really only a first step. We now know where the differences in usage are, but we don’t know why they exist, and that’s what we need to understand if we are to tailor services to students. Maybe the Chinese students are getting all their information from alternative sources and so it doesn’t matter that their usage is much lower. Perhaps the high use of e-resources by students from the new EU doesn’t indicate thorough, broad reading but rather very inefficient search and discovery strategies. In order to really understand what’s going on behind the numbers, we will need to take a more qualitative approach, running focus groups and case studies to explore students’ behaviours and the reasons for those behaviours. Only then can we attempt to understand what we should do to ensure the library’s doing everything it can to support all its users.