University of Huddersfield -- Circulation and Recommendation Data

Background

Since 2005, the University of Huddersfield has provided book recommendations within its library catalogue, driven by mining of the historical circulation usage data.

At the time of writing, the library has details of just under 3 million circulation transactions spanning a period of 13 years. The mining of this data has proved beneficial both to our students (via recommendation services and easy access to personal borrowing histories) and to the library (via usage analysis to inform stock management).

Involvement with the JISC TILE Project led to a decision to release a sizeable portion of the usage data in the hope that it might prove beneficial to others. The released data represents about 70% of the total circulation data available -- only items with low circulation and/or no ISBNs have been omitted.

Usage License

The data derived directly from the Library Service at the University of Huddersfield is released under a CC0 / Open Data Commons license. Ideally, we would like anyone who uses this data to adhere to the Open Data Commons Community Norms, although you are under no obligation to do so. The purpose of using such a license is to allow the data to be distributed, shared and used as widely as possible. If you do something cool with the data, please let us know about it (email: d.c.pattern[at]hud.ac.uk).

This material is Open Data

One file makes use of data derived from LibraryThing's thingISBN service. This file is used to provide a handy index of ISBNs and is distributed under the same Terms of Use as shown on the LibraryThing API web page. Specifically, this data may only be used for non-commercial purposes.

By releasing this aggregated and anonymised data, the University of Huddersfield is hoping to:

  1. allow others to mine the data and to exploit it
  2. stimulate a discussion about the value of library usage data
  3. promote collaboration and sharing of such data in order to benefit both libraries and their users

Data Files

The two main data files are not raw transaction logs -- they are aggregated data.

circulation_data.xml (circulation data file)

This file contains usage data for individual book titles (items), broken down by academic school, by academic course, and by year. Details for around 80,000 books and over 2 million circulation transactions have been included.

The licence section is followed by the items section which provides usage data per item.

For each item, there is an id attribute whose value is our internal bibliographical number. This is primarily of use to construct a permanent URL to our catalogue, which is included in the url element. The book title (as given on our OPAC), the current number of copies held by the library, and one (or more) 10-digit isbns are included. Following this is a loan_history section, which details the circulation data.

The loan_history section starts with the total number of loans and ends with a breakdown of loans per year. In between, where loans have been significant enough, details are given for the number of loans per academic school and per academic course. See the schools.xml and courses.xml files for mappings to the ids.

In the example below, "William the Conqueror" has been loaned 88 times, with 84 loans attributed to three of the seven academic schools (mostly to the School of Music, Humanities and Media). Borrowing by students on three courses in particular (especially the AH100 History course) was significant enough to warrant including in the usage data. Borrowing peaked in 1999 with a low in 2004, and we might want to consider weeding one or two of the 5 remaining copies.

<?xml version="1.0" encoding="UTF-8" ?>
 <circulation_data>
  <description>This file contains usage data for over 80,000 items borrowed from the libraries
               at the University of Huddersfield. Data has only been included where usage has been
               significant and when an ISBN is available. Usage breakdowns are provided by year and,
               where usage has been significant, by academic school and by academic course. See
               separate files for ID mappings for schools and courses. Total loans and the current
               number of copies have been included for each item. ISBNs included in the MARC record are
               shown, with further mappings available in the isbnlookup.xml file.</description>
  <licence>
   <type>CC0 / Open Data Commons</type>
   <statement>To the extent possible under law, Computing and Library Services, University of
              Huddersfield, UK has waived all copyright, moral rights, database rights, and any other
              rights that might be asserted over the data contained within this file.</statement>
   <url>http://labs.creativecommons.org/licenses/zero/1.0</url>
   <url>http://wiki.creativecommons.org/CC0</url>
   <url>http://www.opendatacommons.org</url>
  </licence>
  <items>
   <item id="50017">
    <title>William the Conqueror : the Norman impact upon England /</title>
    <copies>5</copies>
    <isbn>0413243206</isbn>
    <url>http://library.hud.ac.uk/catlink/bib/50017</url>
    <loan_history>
     <total>88</total>
     <schools>
      <school id="A">75</school>
      <school id="B">3</school>
      <school id="D">6</school>
     </schools>
     <courses>
      <course id="AH100">36</course>
      <course id="AH230">3</course>
      <course id="AX290">3</course>
     </courses>
     <years>
      <year id="1996">7</year>
      <year id="1998">7</year>
      <year id="1999">13</year>
      <year id="2000">11</year>
      <year id="2001">6</year>
      <year id="2002">9</year>
      <year id="2003">6</year>
      <year id="2004">1</year>
      <year id="2005">8</year>
      <year id="2006">9</year>
      <year id="2007">8</year>
      <year id="2008">3</year>
     </years>
    </loan_history>
   </item>
  </items>
 </circulation_data>
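
As a rough illustration of reading an item, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample is the example item above with the description and licence sections left out; the helper names (counts, summarise) are our own for this sketch and are not part of the data release.

```python
import xml.etree.ElementTree as ET

# The example item from above (description and licence sections omitted).
SAMPLE = """<circulation_data>
 <items>
  <item id="50017">
   <title>William the Conqueror : the Norman impact upon England /</title>
   <copies>5</copies>
   <isbn>0413243206</isbn>
   <url>http://library.hud.ac.uk/catlink/bib/50017</url>
   <loan_history>
    <total>88</total>
    <schools>
     <school id="A">75</school>
     <school id="B">3</school>
     <school id="D">6</school>
    </schools>
    <courses>
     <course id="AH100">36</course>
     <course id="AH230">3</course>
     <course id="AX290">3</course>
    </courses>
    <years>
     <year id="1996">7</year>
     <year id="1998">7</year>
     <year id="1999">13</year>
     <year id="2000">11</year>
     <year id="2001">6</year>
     <year id="2002">9</year>
     <year id="2003">6</year>
     <year id="2004">1</year>
     <year id="2005">8</year>
     <year id="2006">9</year>
     <year id="2007">8</year>
     <year id="2008">3</year>
    </years>
   </loan_history>
  </item>
 </items>
</circulation_data>"""


def counts(parent, tag):
    """Return {id: loans} for every descendant element named `tag`.

    Gives an empty dict when the breakdown is absent, which happens
    for schools and courses when loans were not significant enough.
    """
    return {el.get("id"): int(el.text) for el in parent.iter(tag)}


def summarise(item):
    """Collect the fields of one <item> into a plain dict."""
    history = item.find("loan_history")
    years = counts(history, "year")
    return {
        "id": item.get("id"),
        "title": item.findtext("title"),
        "copies": int(item.findtext("copies")),
        "isbns": [el.text for el in item.findall("isbn")],
        "total": int(history.findtext("total")),
        "schools": counts(history, "school"),
        "courses": counts(history, "course"),
        "years": years,
        "peak_year": max(years, key=years.get),
        "low_year": min(years, key=years.get),
    }


root = ET.fromstring(SAMPLE)
summary = summarise(root.find("items/item"))
print(summary["total"], sum(summary["schools"].values()),
      summary["peak_year"], summary["low_year"])
```

For the sample item this reports 88 total loans, 84 of them attributed to schools, with the peak in 1999 and the low in 2004, matching the description above.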

suggestion_data.xml (book recommendation data file)

This file contains "people who borrowed this, also borrowed..." style book recommendations for over 37,000 books.

For each item, we've included title, isbn(s), and a permanent url. This is followed by one or more suggestions.

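For each suggested item, there is an isbn. Where we have multiple ISBNs in the recommended item's MARC record, you'll see multiple lines but they'll have a common id attribute. The other attributes are:

  1. common -- the number of times both items were borrowed
  2. before -- the number of times the suggested item was borrowed before the item
  3. same -- the number of times the suggested item was borrowed at the same time as the item
  4. after -- the number of times the suggested item was borrowed after the item
  5. total -- the total number of times the suggested item has been borrowed (useful for re-ranking the results)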

The suggestions are listed in descending common order. This can give rise to the "Harry Potter Effect" (at least that's what Tim Spalding calls it!) where heavily borrowed books appear first in the list.

An improved order of recommendations can be obtained by taking into account the total number of times the suggested book has been borrowed.

For example, divide common by total to get a value between 0 and 1, and then sort on that value in descending order. In the example below, that makes 0713164069 ("The Norman Conquest") the best recommendation, rather than 0631132341 ("Anglo-Norman England 1066-1166").
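
The re-ranking just described can be sketched in a few lines of Python. The (isbn, common, total) figures are copied from the example item's suggestions; the rerank function is only an illustration of the idea, not the code used by our catalogue.

```python
# (isbn, common, total) for each suggestion of the example item.
SUGGESTIONS = [
    ("0631132341", 41, 119),
    ("0582482372", 29, 93),
    ("0413278301", 28, 91),
    ("0521379504", 25, 145),
    ("0521370760", 25, 145),
    ("0713164069", 24, 59),
    ("0582482771", 24, 146),
    ("0333429176", 23, 67),
]


def rerank(suggestions):
    """Return (isbn, score) pairs sorted by common/total, highest first.

    Rows with a total of zero are skipped to avoid dividing by zero.
    The sort is stable, so equal scores keep their original order.
    """
    scored = [(isbn, common / total)
              for isbn, common, total in suggestions if total > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rerank(SUGGESTIONS)
for isbn, score in ranked:
    print(f"{isbn}  {score:.3f}")
```

With these figures, 0713164069 moves to the top of the list and 0631132341 drops to second place.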

Depending on what the recommendations are being used for, you could also take into account the before, same, and after values. So, for recommending something to borrow at the same time, look for higher same values (e.g. 0333429176 "Conquest and colonisation : the Normans in Britain, 1066-1100"). For "we think you might be interested in..." recommendations based on previously borrowed items, look for suggestions with higher after values (e.g. 0521379504 "Religion and devotion in Europe, c.1215-c.1515").

<?xml version="1.0" encoding="UTF-8" ?>
 <suggestion_data>
  <description>This file contains recommendation data from our "people who borrowed this, also
               borrowed..." service. For each recommendation, attributes include the number of times
               both items were borrowed (common), the number of times the suggested item was borrowed
               before, after or at the same time, and the total number of times the suggested item
               has been borrowed (useful for re-ranking the results).</description>
  <licence>
   <type>CC0 / Open Data Commons</type>
   <statement>To the extent possible under law, Computing and Library Services, University of
              Huddersfield, UK has waived all copyright, moral rights, database rights, and any other
              rights that might be asserted over the data contained within this file.</statement>
   <url>http://labs.creativecommons.org/licenses/zero/1.0</url>
   <url>http://wiki.creativecommons.org/CC0</url>
   <url>http://www.opendatacommons.org</url>
  </licence>
  <items>
   <item id="50017">
    <isbn>0413243206</isbn>
    <url>http://library.hud.ac.uk/catlink/bib/50017</url>
    <title>William the Conqueror : the Norman impact upon England /</title>
    <suggestions>
     <isbn id="127265" common="41" before="16" same="11" after="14" total="119">0631132341</isbn>
     <isbn id="105889" common="29" before="9" same="6" after="14" total="93">0582482372</isbn>
     <isbn id="50054" common="28" before="11" same="8" after="9" total="91">0413278301</isbn>
     <isbn id="229475" common="25" before="3" same="0" after="22" total="145">0521379504</isbn>
     <isbn id="229475" common="25" before="9" same="0" after="16" total="145">0521370760</isbn>
     <isbn id="122036" common="24" before="7" same="13" after="4" total="59">0713164069</isbn>
     <isbn id="105969" common="24" before="20" same="0" after="4" total="146">0582482771</isbn>
     <isbn id="253678" common="23" before="5" same="13" after="5" total="67">0333429176</isbn>
    </suggestions>
   </item>
  </items>
 </suggestion_data>
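
A minimal Python sketch of reading the suggestions for an item follows, again using xml.etree.ElementTree on a copy of the example above (description and licence omitted). It groups ISBNs that share an id attribute and picks out the suggestions with the highest same and after values. The function names are our own for this sketch.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<suggestion_data>
 <items>
  <item id="50017">
   <isbn>0413243206</isbn>
   <url>http://library.hud.ac.uk/catlink/bib/50017</url>
   <title>William the Conqueror : the Norman impact upon England /</title>
   <suggestions>
    <isbn id="127265" common="41" before="16" same="11" after="14" total="119">0631132341</isbn>
    <isbn id="105889" common="29" before="9" same="6" after="14" total="93">0582482372</isbn>
    <isbn id="50054" common="28" before="11" same="8" after="9" total="91">0413278301</isbn>
    <isbn id="229475" common="25" before="3" same="0" after="22" total="145">0521379504</isbn>
    <isbn id="229475" common="25" before="9" same="0" after="16" total="145">0521370760</isbn>
    <isbn id="122036" common="24" before="7" same="13" after="4" total="59">0713164069</isbn>
    <isbn id="105969" common="24" before="20" same="0" after="4" total="146">0582482771</isbn>
    <isbn id="253678" common="23" before="5" same="13" after="5" total="67">0333429176</isbn>
   </suggestions>
  </item>
 </items>
</suggestion_data>"""

COUNT_ATTRS = ("common", "before", "same", "after", "total")


def parse_suggestions(item):
    """Return one dict per suggested <isbn> line, with counts as ints."""
    rows = []
    # Only look inside <suggestions>; the item's own <isbn> has no counts.
    for el in item.findall("suggestions/isbn"):
        row = {name: int(el.get(name)) for name in COUNT_ATTRS}
        row["id"] = el.get("id")
        row["isbn"] = el.text
        rows.append(row)
    return rows


def isbns_by_id(rows):
    """Group ISBNs that share an id, i.e. belong to the same suggested item."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["id"], []).append(row["isbn"])
    return grouped


def top_by(rows, field, n=1):
    """Return the n rows with the highest value of `field` (stable on ties)."""
    return sorted(rows, key=lambda row: row[field], reverse=True)[:n]


rows = parse_suggestions(ET.fromstring(SAMPLE).find("items/item"))
print(isbns_by_id(rows)["229475"])
print([row["isbn"] for row in top_by(rows, "same", 2)])
print(top_by(rows, "after")[0]["isbn"])
```

Note that two suggestions tie on the highest same value (13), so asking for the top two returns both 0713164069 and 0333429176.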

schools.xml (lookup file for academic school IDs)

The University of Huddersfield currently has seven academic schools.

courses.xml (lookup file for academic course IDs)

For each course, there is a school attribute linking back to the academic school which runs the course (see schools.xml).

The course codes (id) are internal codes used by the University of Huddersfield. To help map our courses to those run at other UK universities, see the ucas.xml file.

ucas.xml (mappings for the academic course IDs to UCAS codes)

For each course, there are one or more ucas entries. There is a potential many-to-many mapping between the internal course codes and the UCAS codes. Please note, some of our courses do not have a relevant UCAS code.

isbnlookup.xml (FRBR style ISBN lookup index)

Many thanks to Tim Spalding (LibraryThing) for allowing the use of the thingISBN data to generate this index. This file is distributed under the same licence as the thingISBN.xml.gz file (see the LibraryThing API web page for further details).

For each item, there is an id attribute which maps back to the two main data files. This is followed by a list of FRBRish isbns (which links together different works and editions). Where this data was derived from the thingISBN data, there is a librarything work ID which can be appended to http://www.librarything.com/work/ to provide the relevant LibraryThing URL.

This file can be used to provide better matches to our data from your own ISBNs.
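
Because the data files hold 10-digit ISBNs, your own ISBNs may need normalising before you look them up. The Python sketch below strips hyphens and converts 978-prefixed 13-digit ISBNs to their 10-digit form using the standard ISBN-10 check digit calculation. The layout of isbnlookup.xml is not reproduced in this document, so the index here is a hand-built dictionary standing in for whatever you load from that file; the single entry is the item from the examples above, and the work ID passed to librarything_url is a made-up placeholder.

```python
LIBRARYTHING_WORK_URL = "http://www.librarything.com/work/"

# Stand-in for an index built from isbnlookup.xml: ISBN -> item id.
# (Assumption: you have already loaded the file into a dict like this.)
ISBN_TO_ITEM = {"0413243206": "50017"}


def to_isbn10(isbn):
    """Return the 10-digit form of an ISBN, or None if it has none.

    Only 13-digit ISBNs starting with 978 have a 10-digit equivalent.
    """
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10:
        return digits
    if len(digits) == 13 and digits.isdigit() and digits.startswith("978"):
        body = digits[3:12]
        # ISBN-10 check digit: weights 10 down to 2 over the nine body digits.
        weighted = sum(int(d) * (10 - i) for i, d in enumerate(body))
        check = (11 - weighted % 11) % 11
        return body + ("X" if check == 10 else str(check))
    return None


def find_item(isbn, index):
    """Look up one of your own ISBNs in the index; None if not found."""
    isbn10 = to_isbn10(isbn)
    return index.get(isbn10) if isbn10 else None


def librarything_url(work_id):
    """Append a LibraryThing work ID to the base URL given above."""
    return LIBRARYTHING_WORK_URL + str(work_id)


print(to_isbn10("978-0-413-24320-1"))
print(find_item("9780413243201", ISBN_TO_ITEM))
print(librarything_url("12345"))  # "12345" is a hypothetical work ID
```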

Comments

If you have any comments, feedback, questions, etc, please send them to the Library Systems Manager, Dave Pattern (d.c.pattern[at]hud.ac.uk).


This document was last updated on 11/Dec/2008 at 21:39