University of Huddersfield -- Keyword Search Data

Background

Since June 2006, the University of Huddersfield has logged details of keyword searches carried out on its library catalogue. At the time of writing, the library has details of over 3.2 million successful keyword searches.

Usage License

The data derived directly from the Library Service at the University of Huddersfield is released under a CC0 / Open Data Commons license. Ideally, we would like anyone who uses this data to adhere to the Open Data Commons Community Norms, although you are under no obligation to do so. The purpose of using such a license is to allow the data to be distributed, shared and used as widely as possible. If you do something cool with the data, please let us know about it (email: d.c.pattern[at]hud.ac.uk).

This material is Open Data

Data Files

keywords.xml (keywords and linked words)

This file contains a list of keywords (keyword) along with a list of the most commonly used words (word) that are paired with that keyword in a multi-keyword search.

For each keyword, a count shows the number of times it has been used as a search term.

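For each of the paired words, a count shows the number of searches in which both terms have been used. The frequency value (freq) indicates the strength of the link between that word and the parent keyword: it is the percentage of all searches using that word which also included the keyword.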

In the example below, the keyword "depression" has been used in 2,542 searches. Its most commonly paired word is "great" (as in "The Great Depression"), and there have been 99 searches that use both terms. However, the link from "great" to "depression" is weak (freq=5%) -- in total there have been 1,813 searches using the word "great", but only 99 of them also included "depression" (i.e. 99 / 1813 = 0.0546, or 5%). The word "great" is more commonly paired with words like "britain" and "war".

The word "postnatal" has a much stronger link to "depression" (freq=67%): it has been used in only 133 searches, and 89 of those also included "depression" (89 / 133 = 0.669, or 67%). The words "natal" (as in "post-natal"), "manic", and "sadness" also have strong links to the word "depression".
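The arithmetic in the two cases above can be reproduced with a short Python sketch. The function name link_strength is hypothetical, and rounding to the nearest whole percent is an assumption: it matches both worked values (5 and 67), but this document does not state how freq is rounded.

```python
def link_strength(pair_count, word_total):
    """Percentage of searches using a word that also used the keyword.

    pair_count: searches containing both the word and the keyword
    word_total: all searches containing the word
    Rounding to a whole percent is an assumption, not a documented rule.
    """
    return round(100 * pair_count / word_total)


# "great" with "depression": 99 of 1,813 searches
print(link_strength(99, 1813))

# "postnatal" with "depression": 89 of 133 searches
print(link_strength(89, 133))
```

Note that the total number of searches for a paired word (1,813 or 133 here) is quoted in the text but is not stored on the word element itself; only count and freq appear there.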

<keyword_data>
  <description>Keyword list from over 3 million searches carried out on the
               library catalogue at the University of Huddersfield...</description> 
  <licence>
   <type>CC0 / Open Data Commons</type>
   <statement>To the extent possible under law, Computing and Library Services, University of
              Huddersfield, UK has waived all copyright, moral rights, database rights, and any other
              rights that might be asserted over the data contained within this file.</statement>
   <url>http://labs.creativecommons.org/licenses/zero/1.0</url>
   <url>http://wiki.creativecommons.org/CC0</url>
   <url>http://www.opendatacommons.org</url>
  </licence>
  <keywords>
    <keyword count="2542">depression
      <word count="99" freq="5">great</word> 
      <word count="95" freq="1">cognitive</word> 
      <word count="89" freq="1">therapy</word> 
      <word count="89" freq="67">postnatal</word> 
      <word count="53" freq="2">post</word> 
      <word count="53" freq="87">natal</word> 
      <word count="44" freq="0">mental</word> 
      <word count="42" freq="0">health</word> 
      <word count="40" freq="4">anxiety</word> 
      <word count="39" freq="0">psychology</word> 
      <word count="38" freq="0">children</word> 
      <word count="38" freq="0">women</word> 
      <word count="28" freq="74">manic</word> 
      <word count="27" freq="23">overcoming</word> 
      <word count="15" freq="2">suicide</word> 
      <word count="15" freq="0">stress</word> 
      <word count="14" freq="1">elderly</word> 
      <word count="12" freq="63">sadness</word> 
      <word count="10" freq="22">persistent</word> 
    </keyword>
  </keywords>
</keyword_data>
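
As a minimal sketch of how the file might be read, the Python below parses a cut-down copy of the example above with the standard library's xml.etree.ElementTree and lists the paired words whose freq is at or above a threshold. The function name strong_links and the threshold of 50 are illustrative assumptions, not part of the data set.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the example above; the real file has many keywords.
SAMPLE = """<keyword_data>
  <keywords>
    <keyword count="2542">depression
      <word count="99" freq="5">great</word>
      <word count="89" freq="67">postnatal</word>
      <word count="53" freq="87">natal</word>
    </keyword>
  </keywords>
</keyword_data>"""


def strong_links(root, min_freq=50):
    """Map each keyword to its (word, count, freq) tuples with freq >= min_freq."""
    result = {}
    for kw in root.iter("keyword"):
        # The keyword itself is the text before the first <word> child,
        # so it carries trailing whitespace that needs stripping.
        name = (kw.text or "").strip()
        result[name] = [
            (w.text, int(w.get("count")), int(w.get("freq")))
            for w in kw.findall("word")
            if int(w.get("freq")) >= min_freq
        ]
    return result


root = ET.fromstring(SAMPLE)
links = strong_links(root)
for keyword, words in links.items():
    print(keyword, words)
```

To run the same function over the full file, ET.parse("keywords.xml").getroot() can be used in place of ET.fromstring(SAMPLE), assuming the file is in the current directory.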

Comments

If you have any comments, feedback, questions, etc., please send them to the Library Systems Manager, Dave Pattern (d.c.pattern[at]hud.ac.uk).


This document was last updated on 13/Apr/2008 at 12:02pm