JISC MOSAIC Project

Background

The project follows on from the JISC-funded TILE Project and, amongst other things, will investigate the benefits of mining usage data from multiple university libraries.

Usage License

The Perl scripts linked to from this page are released under a CC0 licence and are provided "as is", with no strings attached.

Data Files

In order to generate the XML version of the circulation data for submission to the project, you will need to prepare four separate text files (plus an optional fifth file listing exclusions). In each case, the data must be in TSV format (i.e. fields separated by a tab character, with a newline separating each row of data).

As contributing libraries may be submitting multiple years' worth of data, separate user, transaction and exclusion files can be prepared for each academic year (denoted by <year> in the file name).

In the samples below, the → character represents a tab character. An asterisk (*) means that the field is required.
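Because every file shares the same TSV layout, a pair of small helpers is enough to write and read rows. This is a minimal Python sketch, not part of the MOSAIC Perl scripts: `format_tsv_row` and `parse_tsv_line` are hypothetical names, and rejecting fields that contain a tab or line break is an assumption (such a field would break the one-row-per-line layout). Note that the → arrows in the samples stand for real tab characters in the files.

```python
def format_tsv_row(fields):
    """Join fields with tabs and end the row with a newline."""
    for field in fields:
        # A tab or line break inside a field would corrupt the TSV layout.
        if "\t" in field or "\n" in field or "\r" in field:
            raise ValueError(f"field contains a tab or line break: {field!r}")
    return "\t".join(fields) + "\n"


def parse_tsv_line(line):
    """Split one TSV line into fields, keeping empty fields (e.g. a blank course ID)."""
    # Strip only the line ending, so trailing empty fields survive.
    return line.rstrip("\r\n").split("\t")


# Two rows from the user sample; the staff member has no course ID.
rows = [["67890", "ABC123", "UG3"], ["76543", "", "staff"]]
text = "".join(format_tsv_row(row) for row in rows)
print(repr(text))  # '67890\tABC123\tUG3\n76543\t\tstaff\n'
```

Keeping empty fields matters here: a blank course ID still needs its own column so that the progression level stays in third position.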

user file: users.<year>.txt

FIELDS:

  * user ID
    course ID
  * progression level

SAMPLE:

    67890  →  ABC123  →  UG3
    45678  →  DEC987  →  PhD2
    76543  →          →  staff

The user ID is whatever ID you want to use to identify an individual library user. It will be converted to an MD5 hash value before the data is submitted to MOSAIC. It must match the user ID contained in the transaction file.
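The MD5 step can be illustrated with Python's standard `hashlib` module. This is a sketch under assumptions: the page does not say exactly how data2xml.pl hashes IDs (for example, which text encoding it uses, or whether anything is combined with the ID first), so `hash_user_id` is a hypothetical helper rather than the script's own code. What it does show is that the same input always produces the same digest, so a user ID that matches across the user and transaction files still matches after hashing.

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Return the MD5 digest of a user ID as 32 lowercase hex characters.

    Assumes the ID is encoded as UTF-8 before hashing.
    """
    return hashlib.md5(user_id.encode("utf-8")).hexdigest()


# The same ID in the user file and the transaction file hashes identically.
assert hash_user_id("67890") == hash_user_id("67890")
print(hash_user_id("67890"))
```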

The course ID is whatever ID or code you use to identify a course that a student studies on. It must match the course ID in the course file. For library users who are not on a course (e.g. staff), the value can be blank.

The progression level value is taken from the pre-defined list in the MOSAIC documentation.

transaction file: transactions.<year>.txt

FIELDS:

  * timestamp
  * item ID
  * user ID

SAMPLE:

    1222646400  →  114784  →  67890
    1225756800  →  103828  →  67890
    1225756800  →  62580   →  76543

The timestamp is in Unix time format (i.e. the number of seconds since 1st Jan 1970 UTC). It is used to calculate the day the transaction occurred on.
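Turning a timestamp into a day is a one-line conversion in most languages. The Python sketch below assumes the day is taken in UTC; the page does not say whether data2xml.pl uses UTC or local UK time, and the two can differ for loans made shortly before or after midnight during British Summer Time. `transaction_day` is a hypothetical name.

```python
from datetime import datetime, timezone


def transaction_day(timestamp: int) -> str:
    """Convert a Unix timestamp to the calendar day (YYYY-MM-DD) in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


# Timestamps from the transaction sample (both fall exactly at midnight UTC).
print(transaction_day(1222646400))  # 2008-09-29
print(transaction_day(1225756800))  # 2008-11-04
```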

The user ID is whatever ID you want to use to identify an individual library user. It will be converted to an MD5 hash value before the data is submitted to MOSAIC. It must match the user ID contained in the user file.

The item ID is whatever ID you want to use to identify a library book. It must match the item ID contained in the item file.
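Since the files are linked only by these IDs, it can be worth checking them before running the conversion. The following is a minimal Python sketch; `find_unmatched` is a hypothetical helper, and the page does not say exactly how data2xml.pl treats unmatched rows (its optional debug file lists transactions that were ignored or excluded).

```python
def find_unmatched(transactions, user_ids, item_ids):
    """Return (row, reason) pairs for transactions whose IDs are not defined elsewhere.

    transactions: iterable of (timestamp, item_id, user_id) tuples, in the
    column order of the transaction file.
    """
    problems = []
    for row in transactions:
        _timestamp, item_id, user_id = row
        if user_id not in user_ids:
            problems.append((row, "user ID not in user file"))
        if item_id not in item_ids:
            problems.append((row, "item ID not in item file"))
    return problems


# Hypothetical data: the user file lists 67890 and 76543, but only
# items 114784 and 103828 appear in the item file.
transactions = [
    ("1222646400", "114784", "67890"),
    ("1225756800", "103828", "67890"),
    ("1225756800", "62580", "76543"),
]
for row, reason in find_unmatched(transactions, {"67890", "76543"}, {"114784", "103828"}):
    print(row, reason)
```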

item file: items.txt

FIELDS:

  * item ID
  * ISBN(s)
  * title
    author(s)
    publisher
    publication year
    persistent URL

SAMPLE:

    123 → 0415972531 → Music & copyright → L. Marshall → Wiley   → 2004 → http://libcat.hud.ac.uk/123
    234 → 0415969298 → Songwriting tips  → N. Skilbeck → Phaidon → 1997 → http://libcat.hud.ac.uk/234

The item ID is whatever ID you want to use to identify a library book. It must match the item ID contained in the transaction file.

The ISBN(s) are one (or more) ISBNs, separated by a | pipe character where more than one ISBN is linked to the item (e.g. 0415966744|0415966752).

The title is the title of the book.

The author(s) are one (or more) names, separated by a | pipe character where more than one name is present (e.g. John Smith|Julie Johnson).

The publisher and publication year are the name of the publishing company and the year of publication.

The persistent URL is the web address the item can be found at (e.g. on your library catalogue).
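The item file is the only one with two multi-valued fields (ISBNs and authors). The Python sketch below shows one way to read a row and split those fields on the | pipe character. `Item`, `split_multi` and `parse_item_line` are hypothetical names, and it assumes the optional trailing fields may either be blank or left off the end of the row; the page does not say which the Perl script expects.

```python
from dataclasses import dataclass
from typing import List

ITEM_FIELD_COUNT = 7  # item ID, ISBN(s), title, author(s), publisher, year, URL


@dataclass
class Item:
    item_id: str
    isbns: List[str]
    title: str
    authors: List[str]
    publisher: str
    year: str
    url: str


def split_multi(value: str) -> List[str]:
    """Split a |-separated field into its values, dropping empty parts."""
    return [part for part in value.split("|") if part]


def parse_item_line(line: str) -> Item:
    fields = line.rstrip("\r\n").split("\t")
    # Pad omitted optional trailing fields; any extra columns are ignored.
    fields += [""] * (ITEM_FIELD_COUNT - len(fields))
    item_id, isbns, title, authors, publisher, year, url = fields[:ITEM_FIELD_COUNT]
    if not (item_id and isbns and title):
        raise ValueError(f"item ID, ISBN(s) and title are required: {line!r}")
    return Item(item_id, split_multi(isbns), title, split_multi(authors),
                publisher, year, url)


item = parse_item_line("345\t0415966744|0415966752\tA hypothetical title\tJohn Smith|Julie Johnson\n")
print(item.isbns)    # ['0415966744', '0415966752']
print(item.authors)  # ['John Smith', 'Julie Johnson']
```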

course file: courses.txt

FIELDS:

  * course ID
  * course title
    course code(s)

SAMPLE:

    AE110  →  BA(H) English & Media     →     QP33
    AE120  →  BA(H) Drama & English     →     W440|W4W3|WP43|WQ43
    AE200  →  BA(H) English Language PT →

The course ID is whatever ID or code you use to identify a course that a student studies on. It must match the course ID in the user file.

The course title is the human readable name of the course.

The course code(s) field is a list of zero or more UCAS or JACS course codes, with a | pipe character separating multiple values.
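Because the course code(s) column can hold zero, one or several values, a reader has to cope with an empty field as well as a pipe-separated list. A minimal Python sketch, where `parse_course_line` is a hypothetical helper and treating a missing third column the same as an empty one is an assumption:

```python
from typing import List, Tuple


def parse_course_line(line: str) -> Tuple[str, str, List[str]]:
    """Parse one courses.txt row into (course ID, course title, course codes)."""
    fields = line.rstrip("\r\n").split("\t")
    fields += [""] * (3 - len(fields))  # the course code(s) column may be omitted
    course_id, title, codes = fields[:3]
    if not (course_id and title):
        raise ValueError(f"course ID and course title are required: {line!r}")
    # Zero or more codes separated by |; an empty field gives an empty list.
    return course_id, title, [code for code in codes.split("|") if code]


print(parse_course_line("AE120\tBA(H) Drama & English\tW440|W4W3|WP43|WQ43\n"))
print(parse_course_line("AE200\tBA(H) English Language PT\t\n"))
```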

optional exclusion file: exclude<year>.txt

FIELDS:

  * match point
  * value

SAMPLE:

    course  →  AE100
    user    →  12345
    item    →  67890
    prog    →  PhD3+
    coprog  →  AE100|PhD3+

Although you can exclude any data you do not want to submit to MOSAIC simply by leaving it out of the files above, you can also list specific values to be excluded as the files are parsed.

The match point can be one of five values:

    course      = a specific course ID
    user        = a specific user ID
    item        = a specific item ID
    prog        = a specific progression level
    coprog      = a specific course and progression level combination

For example, if you have a borrower account for handling inter-library loans, you may want to exclude it from the data submitted to MOSAIC. Alternatively, if a course has only a single student on it, you may wish to exclude that course so that the borrowing habits of that individual are not exposed.

The value is the value to match. For coprog values, give the course ID and progression level with a | pipe character in between.
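The match points above can be applied with a simple set lookup. This Python sketch is not taken from data2xml.pl: the page lists the match points but not the script's exact matching logic, so it assumes a transaction is excluded when any one rule matches, with the course ID and progression level looked up from the borrower's row in the user file. `load_exclusions` and `is_excluded` are hypothetical names.

```python
MATCH_POINTS = {"course", "user", "item", "prog", "coprog"}


def load_exclusions(lines):
    """Read exclusion rows (match point, value) into a set of pairs."""
    rules = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are tolerated
        parts = line.split("\t")
        if len(parts) != 2 or parts[0] not in MATCH_POINTS:
            raise ValueError(f"bad exclusion row: {line!r}")
        rules.add((parts[0], parts[1]))
    return rules


def is_excluded(rules, user_id, course_id, prog, item_id):
    """True if any rule matches this transaction's user, item, course, level or course/level pair."""
    candidates = {
        ("user", user_id),
        ("item", item_id),
        ("course", course_id),
        ("prog", prog),
        ("coprog", f"{course_id}|{prog}"),  # course ID and level joined by |
    }
    return not rules.isdisjoint(candidates)


rules = load_exclusions(["coprog\tAE100|PhD3+\n", "item\t67890\n"])
print(is_excluded(rules, "45678", "AE100", "PhD3+", "114784"))  # True
print(is_excluded(rules, "45678", "AE100", "UG3", "114784"))    # False
```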

Perl Scripts

These scripts are released under a CC0 licence and are provided "as is", with no strings attached.

data2xml.pl

Notes:

  1. Various options can be configured in the MAIN VARIABLES section, although some of these can be overridden on the command line, e.g...
  2. The XML filename is based on the options used; for example, mosaic.2005.level1.1244486113.0000001.xml contains level 1 data from 2005.
  3. You can choose to generate a debug file which will list details of transactions that have been ignored or excluded.
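Note 2 gives only one example filename. Reading it, the name appears to contain the data year, the level, a Unix timestamp and a zero-padded counter. The last two are assumptions: the page does not define either field, although 1244486113 does decode to 8 June 2009, around when this page was last updated. A hypothetical Python helper for picking the parts out of such a name might look like this:

```python
import re
from datetime import datetime, timezone

# Assumed layout: mosaic.<year>.level<N>.<Unix timestamp>.<sequence number>.xml
FILENAME_RE = re.compile(r"^mosaic\.(\d{4})\.level(\d+)\.(\d+)\.(\d+)\.xml$")


def parse_output_filename(name: str) -> dict:
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised filename: {name!r}")
    year, level, stamp, sequence = match.groups()
    return {
        "year": int(year),
        "level": int(level),
        # Assumption: the fourth component is the creation time in Unix seconds.
        "created": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        # Assumption: the fifth component is a zero-padded counter.
        "sequence": int(sequence),
    }


print(parse_output_filename("mosaic.2005.level1.1244486113.0000001.xml"))
```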

Comments

If you have any comments, feedback or questions, please send them to the Library Systems Manager, Dave Pattern (d.c.pattern<at>hud.ac.uk).


This document was last updated on 08/Jun/2009 at 19:40.