Project Overview

Background

MOSAIC builds on the findings and recommendations of the JISC TILE Project, which investigated ‘pain points’ in the take-up of Web 2.0 opportunities by UK HE libraries, in particular those relating to the ‘context’ of users (e.g. their course) and their related use of resources.

The TILE project acknowledged the work done by Dave Pattern at the University of Huddersfield with local activity data. MOSAIC builds on this by aggregating data from several institutions and making it available for re-use and experimentation. The Talis podcast with Dave provides further background.

Scope

MOSAIC is investigating the technical feasibility, service value and issues around exploiting user activity data, primarily to assist resource discovery and evaluation in Higher Education. Such activity data might be combined from:

  • The circulation module of Library Management Systems (the initial project focus)
  • ERM systems and link resolvers (journal article access)
  • VLEs (resource and learning object downloads)
  • In addition, reading lists (from a variety of institutional sources, without activity data) may provide key indicators

A number of universities are actively interested in developing recommender services and other services driven by user activity. Our objective is to make a significant contribution to that effort, nationally and internationally, by generating a test dataset that goes beyond circulation data and beyond a single institution. The dataset will be freely available for re-use under an Open Data licence (either the Open Data Commons PDDL or the Creative Commons CC0 dedication), promoting experimentation and innovation by allowing anyone to share, modify and use it for any purpose without restriction.
The project will therefore assist others working on these issues by assessing scalability and service models, by making data available and by gathering feedback from the community. We hope you will join us and JISC in this journey.

Contact Helen Harrop (helen@sero.co.uk) for further details.

Participants

The project will run from April to November 2009 in order to provide immediate recommendations into UK HE sector planning processes. The team, led by David Kay of Sero Consulting, brings together Ken Chad, Mark van Harmelen (PLE Ltd), Helen Harrop (Sero), Paul Miller (Cloud of Data) and Dave Pattern (University of Huddersfield).

The project is working with a selection of volunteer UK HE libraries, covering most major LMS products, to assess the potential for aggregating anonymous user activity data from library and learning services. Initial collaborators in investigation and supply of activity data have been Dundee, Falmouth, Huddersfield, Lincoln, Sheffield, Sussex, Swansea, Warwick and Wolverhampton. In addition, Mimas is providing access to the Copac dataset.

The contributed data will be used to test the performance and utility of available technologies and to gather initial user feedback from librarians and students in autumn 2009.

Data

For pilot purposes, the project is developing a common data schema that covers all the record types of interest with a minimum of mandatory fields. This schema aims to be ‘good enough’ for the pilot and, if the pilot succeeds, to help the community agree on a durable schema. To help others wishing to begin experimentation, partners are also sharing the scripts and code they use to extract data from real systems.
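As an illustration of what ‘a minimum of mandatory fields’ can mean in practice, the sketch below models a single activity record as a Python dataclass that rejects records with missing mandatory values. It is not the MOSAIC schema: the field names (institution, item_id, course_code, event_type, title, year) and the allowed event types are hypothetical, chosen only to mirror the data sources listed under Scope.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event types mirroring the sources listed under Scope;
# the real pilot schema may name or group these differently.
EVENT_TYPES = {"loan", "article_access", "vle_download", "reading_list"}

# Hypothetical mandatory fields (assumed minimum, not the MOSAIC schema).
MANDATORY = ("institution", "item_id", "course_code", "event_type")


@dataclass(frozen=True)
class ActivityRecord:
    institution: str      # contributing library
    item_id: str          # e.g. an ISBN or local control number
    course_code: str      # course / unit of study; never a user identifier
    event_type: str       # one of EVENT_TYPES
    title: Optional[str] = None   # optional descriptive metadata
    year: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject records whose mandatory fields are empty.
        for name in MANDATORY:
            if not getattr(self, name):
                raise ValueError(f"mandatory field '{name}' is empty")
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type '{self.event_type}'")
```

Keeping optional metadata such as title out of the mandatory set lets each institution contribute whatever its system can export, while every record still carries enough to be aggregated by course and item.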

Data protection requirements place significant demands on such an undertaking. Submitted records will not include individual user details and will be aggregated at the level of course / unit of study and item (e.g. book title). Furthermore, in the data used to derive activity patterns (‘users who did this also did that’), lone transactions in a given group will be removed.
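The two safeguards described above can be sketched in a few lines of Python. This is a minimal illustration, not the partners' extraction code. It assumes that each library holds raw transactions locally as hypothetical (pseudonymous user, course, item) tuples, and it reads ‘lone transactions’ as groups with a count of one. Course/item counts below the threshold are dropped, and an ‘also borrowed’ pair is kept only if at least two distinct users account for it.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: (pseudonymous_user, course, item) tuples held locally.
# Only the aggregated outputs below would be submitted, never user keys.


def aggregate_by_course(transactions, min_count=2):
    """Count transactions per (course, item), dropping lone transactions."""
    counts = Counter((course, item) for _user, course, item in transactions)
    return {key: n for key, n in counts.items() if n >= min_count}


def also_borrowed(transactions, min_users=2):
    """Pair counts for 'users who borrowed X also borrowed Y'.

    A pair is kept only if at least min_users distinct users borrowed both
    items, so a pattern traceable to a single person is removed.
    """
    items_by_user = defaultdict(set)
    for user, _course, item in transactions:
        items_by_user[user].add(item)
    pairs = Counter()
    for items in items_by_user.values():
        # Sort so each unordered pair is counted under one key.
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_users}


loans = [
    ("u1", "HIS101", "A"), ("u1", "HIS101", "B"),
    ("u2", "HIS101", "A"), ("u2", "HIS101", "B"),
    ("u3", "ENG200", "C"), ("u3", "ENG200", "A"),
]
print(aggregate_by_course(loans))  # {('HIS101', 'A'): 2, ('HIS101', 'B'): 2}
print(also_borrowed(loans))        # {('A', 'B'): 2}
```

In this example the single ENG200 loans and the A/C pairing, each traceable to one user, are suppressed. The thresholds are parameters because the pilot may settle on a stricter minimum than two.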

To ensure the effectiveness of the pilot and to examine the options for wider developments, the pilot data will be made available under an Open Data Commons licence, so that all of it can be freely downloaded and re-used. Part of the MOSAIC business options appraisal will be to consider the pros and cons of Open Data and alternatives in this context.

The specific uses of the pilot data will be:

  • Population of the MOSAIC demonstrator, built on Apache Solr search technology
  • MOSAIC evaluation workshops
  • Focus for experimentation at the Mashed Library event (7 July 2009)
  • A competition for the best activity data application (closes 31 August 2009)
  • Experimental use by developers anywhere, including systems vendors