FINDINGS post 3: Dropping out

I’ve already posted some early findings about usage and dropping out, showing that there is a relationship between not using the library and dropping out. As you might remember, these findings had a fairly big health warning attached to them.

Since that blog was posted, I think I’ve found another way to look at the relationship between non-use and dropping out, one which is a bit more nuanced. You might remember that I created a ‘binary’ dummy variable for use of the library, in order to get around the problem of cumulative usage (we’d expect a third year student to have higher usage than a first year student, simply because they’ve been at the university longer, but we didn’t have access to the year-of-study data that would allow us to correct for this). I mentioned at the time that turning a continuous variable into a categorical one isn’t an ideal solution, and it turns out that a better one was staring me in the face the whole time (this, I find, is a frequent joy of statistical analysis!).

In that very same blog, I said that we were only looking at the most recent year of data: this was to give first and third year students the same opportunity to move themselves from the ‘non-user’ to ‘user’ category. But (and I’m kicking myself now for not seeing this at the time) once you look only at the last year of data, you can use cumulative measures again! After all, if you only look at one year of data for every student, you eliminate the problem whereby third year students will have accumulated more use simply by virtue of the fact that they’ve been there longer. Those first two years cease to be relevant as they’re not counted within the measure! Of course, we still have the problem of different patterns of usage in years 1, 2 and 3, but I think we just have to shrug our shoulders and accept that.
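For anyone wanting to try the same trick on their own data, here is a minimal Python sketch of the idea: sum each student's usage within a single academic year, ignoring anything from earlier years. The record layout (student id, academic year, term, amount) and the function name `usage_in_year` are assumptions made for illustration, not the project's actual data format, and the sample records are invented.

```python
from collections import defaultdict


def usage_in_year(records, academic_year, terms=None):
    """Sum each student's usage within one academic year.

    records: iterable of (student_id, academic_year, term, amount) tuples.
             This layout is hypothetical, chosen only for the example.
    terms:   optional set of term numbers to include; None means all terms.
    """
    totals = defaultdict(int)
    for student_id, year, term, amount in records:
        if year != academic_year:
            continue  # earlier years are deliberately left out of the measure
        if terms is not None and term not in terms:
            continue
        totals[student_id] += amount
    # Note: students with no records in the chosen period do not appear here,
    # so non-users would need to be added with a total of 0 from a student list.
    return dict(totals)


# Invented records: s1 has been around for three years, s2 for one.
records = [
    ("s1", "2008-09", 1, 12),
    ("s1", "2009-10", 2, 9),
    ("s1", "2010-11", 1, 4),
    ("s1", "2010-11", 2, 3),
    ("s2", "2010-11", 1, 5),
    ("s2", "2010-11", 3, 2),
]

totals = usage_in_year(records, "2010-11", terms={1, 2})
```

Because only one year is counted, s1's earlier borrowing has no effect on the total, which is the whole point: both students are measured over the same window.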

So, I have run some new tests (Mann-Whitney U again) using a cumulative measure of usage for the first two terms of the 2010-11 academic year. As you might remember, we’re only looking at people who dropped out in term three (again, this is to try to give the dropouts a decent amount of time to establish their usage patterns: including people who dropped out in their second week of term would skew our results, as they may not even have had their library induction by that point!). This means that all the students included in this study were at the university in the first two terms, and they have all had exactly the same opportunity to accumulate usage.
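As a rough illustration of what a test like this involves, here is a small, dependency-free Python sketch of a Mann-Whitney U comparison between two groups. It is not the analysis code used for the project. It uses the normal approximation with a tie correction and no continuity correction, and it reports the effect size as r = |z| / sqrt(N), which is one common convention and an assumption on my part. The function names and the sample numbers are made up for the example, and a statistics package would normally be the safer choice for real work.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns U, z, p and the effect size r = |z| / sqrt(N).
    Assumes both groups are non-empty and the values are not all identical.
    """
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)

    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group_a
    u2 = n1 * n2 - u1                         # U statistic for group_b

    # Tie correction: heavy ties (e.g. many zero-usage students) shrink the variance.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))

    z = (u1 - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"U": min(u1, u2), "z": z, "p": p, "r": abs(z) / math.sqrt(n)}


# Invented usage counts (e.g. items borrowed in terms one and two).
dropouts = [0, 0, 1, 2, 0, 3]
current = [2, 5, 0, 7, 4, 9, 3, 6]
result = mann_whitney_u(dropouts, current)
```

A negative z here means the first group (the dropouts) tends to have lower usage than the second. The test works on ranks rather than raw values, which is why it suits usage data that is heavily skewed towards zero.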

You might remember that in the earlier analysis I mentioned that we couldn’t separate out the full-time and part-time students as the sample became too small. Well, the new test resolves that problem, at least in part. Figure 1 shows the effect sizes for the difference in usage between current and dropout students: in the first column, full-time and part-time students based on all sites (Huddersfield and its secondary campuses at Barnsley and Oldham); in the second column, full-time and part-time students based at Huddersfield only; and in the third column, full-time students based at Huddersfield.

Figure 1: Retention and usage

You can see that as we decrease the sample size by removing certain categories of student, some of the dimensions cease to be statistically significant. When we remove sites other than Huddersfield, library PC use stops being significant, and when we limit our analysis to full-time students, the number of library visits also stops being significant. This tallies with the findings from Phase 1 of the project, which found that items borrowed, e-resource hours and PDF downloads were the only dimensions to have a statistically significant relationship with final degree outcome.

The effect sizes here are all very small, but they are statistically significant, and they are there for the number of items borrowed, the hours logged into e-resources and the number of PDF downloads. This reinforces the findings from my earlier analysis: non-usage of the library can’t be taken, on its own, as a predictor of dropping out, but there is some kind of relationship which suggests that it could be a useful warning signal, taken in the context of other warning signals such as course tutor feedback, lack of attendance at lectures and so forth.

It might have been interesting to look at some of the other dimensions of use in relation to dropping out. Percentage of overnight use is one obvious one: we didn’t find a correlation between overnight use and poor grades, but perhaps there’s one between overnight use and dropping out? The number of e-resources accessed might also be interesting. We can’t do this with the Huddersfield data at the moment, because both of those dimensions contain information about the full year and it’s not possible to extract terms one and two for analysis. Maybe that’s one for Phase 3…?