Semantifying University of Huddersfield Library's Circulation Data

On Friday, Tony Hirst tweeted about the University of Huddersfield releasing a big pile of circulation data from their libraries. In his post about it here, he admonishes us: "DON’T YOU DARE NOT DO THIS…". After reading his post, then the arguments for doing so that Dave Pattern, the mastermind behind the release, puts forward in his own post, and especially after looking at the nice clean XML data, I knew what I'd be doing this weekend.

(Impatient? Cut to the chase here or a book's Exhibit here)

I can't get our library to do this, but I can take up Dave's invitation to see what can be done with such a useful dataset. Hmm. Did somebody say Semantic Web?

The data, released under an Open Data Commons/CC0 license, comes in four files: circulation data, suggestion data, courses, and schools. After spending so much time lately scraping data from the HTML coming out of blogs, it was a delight to work with well-formed XML files, which made it easy to write XSL transformations to turn the data into RDF. Here are the files: circulation_data.xsl, suggestion_data.xsl, courses.xsl, schools.xsl.
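The conversion here was done with XSLT, but the same XML-to-RDF mapping idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the <circulation>/<item isbn="...">/<loans> element names, the two made-up ISBNs, and both example.org namespaces are hypothetical, not the real Huddersfield schema or the vocabulary used in the XSLTs. It emits N-Triples, one triple per line.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces; neither comes from the original data or XSLTs.
BOOK_BASE = "http://example.org/huddersfield/book/"
VOCAB = "http://example.org/vocab/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical sample input with made-up ISBNs; the real files differ.
SAMPLE = """<circulation>
  <item isbn="1111111111"><loans>12</loans></item>
  <item isbn="2222222222"><loans>3</loans></item>
</circulation>"""

def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def circulation_to_ntriples(xml_text):
    """Turn each hypothetical <item> element into N-Triples lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for item in root.iter("item"):
        isbn = item.get("isbn")
        # Mint a URI from the ISBN instead of using a blank node.
        subject = f"<{BOOK_BASE}{isbn}>"
        lines.append(f'{subject} <{VOCAB}isbn> "{escape_literal(isbn)}" .')
        loans = item.findtext("loans")
        if loans is not None:
            lines.append(f'{subject} <{VOCAB}loanCount> "{int(loans)}"^^<{XSD_INTEGER}> .')
    return lines

for line in circulation_to_ntriples(SAMPLE):
    print(line)
```

Minting a URI per book from its ISBN, rather than describing it with a blank node, is a design choice that makes repeated loads harmless, since identical triples simply merge in the store.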

Of course, there weren't ready-made ontologies for this data (at least not that I knew of). Good thing Ian Davis recently created OpenVocab, so that for small one-off projects you can create a quick vocabulary. Alas, so far I've had trouble connecting to it. But here's the basic schema that I'll be putting in when my interwebs are happier.

I've got a couple of examples of ways to display and sort the data, using two of my favorite tools: ARC for RDF storage and processing, and Exhibit for display. I'm most excited about the feature for looking up a book by the library web page for its course; it was very smart of Dave to include that info. In a production environment, instead of the demo on my server (more on that below), it should be easy to add a link from the library's page for a title to this, or a similar, display of additional info about the book.
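The demo itself runs on ARC, which is PHP, so the following is not its code. It is a rough Python sketch of the glue step between a triplestore and Exhibit: turning query result rows into Exhibit's JSON data format, an object with an "items" list whose entries carry a label and, optionally, an id and type. The row keys ("book", "title", "course") and the "Book" type name are hypothetical.

```python
import json

def to_exhibit(rows, type_name):
    """Convert flat result rows into an Exhibit JSON string.

    Assumes (hypothetically) that each row has a "book" URI and a "title";
    any other keys are copied through as item properties.
    """
    items = []
    for row in rows:
        # Use the book URI as the id so two books with the same title don't collide.
        item = {"id": row["book"], "label": row["title"], "type": type_name}
        item.update({k: v for k, v in row.items() if k not in ("book", "title")})
        items.append(item)
    return json.dumps({"items": items}, indent=2)

rows = [{"book": "http://example.org/book/1", "title": "An Example Title", "course": "Chemistry"}]
print(to_exhibit(rows, "Book"))
```

Setting an explicit id matters because, without one, Exhibit falls back to the label, and books that share a title would be merged into one item.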

Update: Looks like some of the results are coming up as duplicates. That's because a lot of the RDF data came in as blank nodes, the mysterious green blotch on the Twinkie of RDF. Due to some problems in loading all the data, some of the info got loaded twice. Each load gives the blank nodes fresh identifiers, so the store can't tell that the second copy describes the same things as the first, and the blank node triples ended up duplicated.
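To see why only the blank node triples doubled up, here is a toy Python model of a triplestore as a set of triples. It is a simplified sketch, not how ARC works internally: the load function mimics a parser by giving each document's blank node labels fresh, never-before-used identifiers, and the sample triples and example.org URIs are hypothetical.

```python
import itertools

# A global counter stands in for the store's blank node allocator:
# every load mints identifiers that have never been used before.
_bnode_ids = itertools.count(1)

def load(store, document):
    """Add a document's (s, p, o) triples to a set-based store.

    Terms starting with "_:" are blank node labels, which are only
    meaningful inside the document being loaded.
    """
    mapping = {}

    def rename(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:b{next(_bnode_ids)}"
            return mapping[term]
        return term

    for s, p, o in document:
        store.add((rename(s), rename(p), rename(o)))

# Hypothetical sample: one plain URI triple, two triples via a blank node.
doc = [
    ("<http://example.org/book/1>", "<http://example.org/vocab/title>", '"An Example Title"'),
    ("<http://example.org/book/1>", "<http://example.org/vocab/loan>", "_:loan"),
    ("_:loan", "<http://example.org/vocab/course>", '"Chemistry"'),
]

store = set()
load(store, doc)
print(len(store))  # 3
load(store, doc)   # accidental second load
print(len(store))  # 5: the URI triple merged, the blank node triples did not
```

The title triple is byte-for-byte identical on both loads, so the set absorbs it; the loan triples come back with a new blank node each time and look like brand new data.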

Speaking of additional info, since the RDF Book Mashup constructs URIs for books based on ISBNs, it was easy enough to include owl:sameAs links from the circulation data to the Book Mashup, letting it play a little with the LOD Cloud. Depending on performance issues, I hope to make the Exhibits incorporate data from there, too.
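A minimal Python sketch of generating those owl:sameAs links, with the assumptions spelled out: the Book Mashup base URI below is the pattern as I understand it (base URI plus a bare ISBN-10) and should be checked before relying on it, and the local book URI is a hypothetical example.org address. Validating the ISBN-10 check digit first keeps malformed ISBNs from producing links to nonexistent resources. ISBN-13s are not handled here.

```python
# Assumed RDF Book Mashup URI pattern: base URI + bare ISBN-10.
BOOK_MASHUP = "http://www4.wiwiss.fu-berlin.de/bookmashup/books/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize_isbn10(raw):
    """Strip hyphens/spaces; return the ISBN-10 if its check digit is valid, else None."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return None
    if not (isbn[9].isdigit() or isbn[9] == "X"):
        return None
    # Weighted sum: weights 10 down to 1; a final "X" stands for 10.
    total = sum((10 - i) * int(c) for i, c in enumerate(isbn[:9]))
    total += 10 if isbn[9] == "X" else int(isbn[9])
    return isbn if total % 11 == 0 else None

def same_as_triple(local_book_uri, raw_isbn):
    """Return an N-Triples owl:sameAs line, or None for an invalid ISBN."""
    isbn = normalize_isbn10(raw_isbn)
    if isbn is None:
        return None
    return f"<{local_book_uri}> <{OWL_SAME_AS}> <{BOOK_MASHUP}{isbn}> ."

print(same_as_triple("http://example.org/book/0306406152", "0-306-40615-2"))
```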

The SPARQL endpoint is open for reading here. And here's a file of sample SPARQL queries to run; you'll also want to refer to the schema to see how things are related, or run the XSLTs on the sample data to see the RDF structure.
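For anyone who'd rather hit the endpoint from a script than a browser, here's a hedged Python sketch of a standard SPARQL Protocol GET request that asks for JSON results. The endpoint URL is a placeholder, since the real address isn't reproduced here, and the query is just an example that looks for owl:sameAs links. A given endpoint may expect different parameters or result formats, so treat this as a starting point.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: substitute the real endpoint address.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?book ?other WHERE { ?book owl:sameAs ?other } LIMIT 10
"""

def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})

def rows(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [{var: b[var]["value"] for var in b} for b in data["results"]["bindings"]]

# Usage (needs network access and a live endpoint):
# with urllib.request.urlopen(build_request(ENDPOINT, QUERY)) as resp:
#     for row in rows(resp.read().decode("utf-8")):
#         print(row)
```

Keeping request construction and result parsing separate means the parsing can be tried on saved results even when the shared server is having one of its off days.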

WARNING: Because I'm on a shared hosting server, the SQL resources sometimes aren't quite there. So this might or might not be available and working happily at any given time. Ideally, if this looks useful, I'd love to see it adopted by someone who can give it a happier home.

One important note. I saw the tweet from Tony on Friday afternoon. It's now Sunday afternoon and the store, with Exhibits and endpoint, is up and running. That's tweet to triplestore in two days. Anyone still wondering whether RDF and Semantic Web approaches are ready to do some really cool things?

Trackback URL for this post:

http://www.patrickgmj.net/trackback/150


Hello,

I think it would be good to look at this page too: http://vocab.org/aiiso/schema-20080514


have a nice day.

Is this only about XML technology? Or could we use other technologies to work with and change it?

The original data in the four files I linked to is XML, and the license leaves it pretty open for non-commercial use. So if you want the XML, you can get the original and use any tools/technologies you want. If you're talking about the SPARQL endpoint, you'd be looking at using any RDF tool to query the endpoint, and then doing anything you want with the results. I think the idea is to let everyone explore as many possibilities as they can, so as long as nothing goes against the license, I'd say use anything you want.

I think the idea is to let everyone explore as many possibilities as they can, so as long as nothing goes against the license, I'd say use any thing you want.

what?

Thanks much!
Bob

Thanks much!
Gokhan


Thank you for sharing.

Thanks for the information.

Thanks. :)

Not sure if this helps, but you might be able to map the 'schools' in the data to an appropriate term (School) in the AIISO vocabulary by Rob Styles and Nadeem Shabir at http://vocab.org/aiiso/schema-20080514

Owen,

Good call, I think that'd be very appropriate. I really ought to rewrite the XSLTs to make them better organized, and I'll see about adding that mapping in. If no one beats me to it, I'll also put an equivalentClass statement in the OpenVocab definitions.

Thanks much!
Patrick
