Linked Open Omeka


Here it is, my first stab at developing a plugin for Omeka: Linked Open Omeka (download below; hopefully it will find its way to the official Omeka site someday). Update: now in trunk. It's really incomplete and there is a lot of work to do, but it's a start at taking an Omeka installation and connecting its data with the Linked Data Cloud. For digital humanities, I hope that this will be a very useful move. First, I hope to help Omeka collections and exhibits intermesh with other data sources, for example by letting an item in Omeka about Thomas Jefferson connect up with the DBpedia data about Thomas Jefferson.

The first step was to try to make it work with the Linked Data principles in Cool URIs for the Semantic Web, especially 303 redirects and content negotiation. I hope I've figured out how to do that properly; if not, I hope someone will correct me. Since Omeka is built on the Zend Framework, if it really does work properly, the approach should generalize to any app built on Zend.

Making Zend do 303 redirects and content negotiation

Content negotiation with 303 redirects gives Linked Open Data a single URI to point to for each resource. Depending on the request's HTTP Accept header, that URI then redirects either to a page of RDF (if the request asks for application/rdf+xml) or to a human-readable page (if it asks for HTML).
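A minimal sketch of that decision, written in PHP like the plugin itself. Unlike a plain substring test, it honours q-values, so a client that sends `application/rdf+xml;q=0` is not sent RDF. The function name `chooseRepresentation()` and the mapping from media types to the plugin's three view names (rdfxml, rdfjson, humanreadable) are assumptions for illustration, not part of Omeka or Zend.

```php
<?php
// Hypothetical helper: pick which of the plugin's three views should serve a
// request, based on the HTTP Accept header. Honours q-values and falls back to
// the human-readable page when nothing matching is acceptable.
function chooseRepresentation($acceptHeader)
{
    // Assumed mapping of media types to view names.
    $views = array(
        'application/rdf+xml'  => 'rdfxml',
        'application/rdf+json' => 'rdfjson',
        'text/html'            => 'humanreadable',
    );

    $best  = 'humanreadable';
    $bestQ = 0.0;
    foreach (explode(',', (string) $acceptHeader) as $range) {
        // "type/subtype;q=0.9;other=param" -> type plus parameters
        $parts = array_map('trim', explode(';', $range));
        $type  = strtolower(array_shift($parts));
        $q     = 1.0; // default quality when no q parameter is given
        foreach ($parts as $param) {
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        // Strictly greater: on a tie, the earlier media range wins,
        // and q=0 ("not acceptable") can never be chosen.
        if (isset($views[$type]) && $q > $bestQ) {
            $best  = $views[$type];
            $bestQ = $q;
        }
    }
    return $best;
}
```

Wildcards such as `*/*` simply fall through to the human-readable page; a fuller implementation might treat them, and ties, differently.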

So to do this in Zend, I tried to follow the pattern used by DBpedia. DBpedia has URIs of the form http://dbpedia.org/resource/(name), where (name) corresponds to the last part of the URL for the Wikipedia entry. If you request HTML, you are redirected to http://dbpedia.org/page/(name). If you request RDF, you get back RDF that gets sucked in from another file. (At least, that's my best understanding of how it works.) To do this within the Zend Framework, you need a controller to handle the redirects, and a route in the plugin to send the request to it. Here's what the directory structure and routes look like:

    - LinkedOpenOmeka
        - controllers
            - ResourceController.php
            - DataController.php
        - models
            - Graph.php
        - views
            - public
                - rdfxml.php
                - rdfjson.php
                - humanreadable.php
    
function linked_open_omeka_define_routes($router)
{
    // resource/* : the one public URI per resource. ResourceController's
    // redirectAction() answers it with a 303 based on the Accept header.
    $router->addRoute(
        'linked_open_omeka_resource',
        new Zend_Controller_Router_Route(
            'resource/*',
            array(
                'module'     => 'linked-open-omeka',
                'controller' => 'resource',
                'action'     => 'redirect',
            )
        )
    );

    // data/<format>/* : the 303 targets, one DataController action
    // per representation.
    $router->addRoute(
        'linked_open_omeka_data_humanreadable',
        new Zend_Controller_Router_Route(
            'data/humanreadable/*',
            array(
                'module'     => 'linked-open-omeka',
                'controller' => 'data',
                'action'     => 'humanreadable',
            )
        )
    );

    $router->addRoute(
        'linked_open_omeka_data_rdfxml',
        new Zend_Controller_Router_Route(
            'data/rdfxml/*',
            array(
                'module'     => 'linked-open-omeka',
                'controller' => 'data',
                'action'     => 'rdfxml',
            )
        )
    );

    $router->addRoute(
        'linked_open_omeka_data_rdfjson',
        new Zend_Controller_Router_Route(
            'data/rdfjson/*',
            array(
                'module'     => 'linked-open-omeka',
                'controller' => 'data',
                'action'     => 'rdfjson',
            )
        )
    );
}
      

The first route sends everything with a URI like http://youromeka.org/resource/* to the ResourceController, which redirects the request according to its Accept header. Here's the full code of my ResourceController (such as it is; no claims that I'm not doing silly things!). In essence, it takes a path like resource/items/show/1 and redirects it to data/rdfxml/items/show/1 or data/humanreadable/items/show/1 as appropriate.

            
class LinkedOpenOmeka_ResourceController extends Omeka_Controller_Action
{
    /*
     * Parts lifted and modified from
     * http://www.w3pop.com/public/resource/book/ZendFramework10/zend.controller.actionhelpers.html
     */
    public $_redirector = null;

    // Incoming path, relative to the Omeka base URL, e.g. /resource/items/show/1
    protected $_path = '';

    // Where the request will be sent, e.g. /data/rdfxml/items/show/1
    protected $_redirectPath = '';

    public function init()
    {
        // Zend's Redirector helper, told to answer with 303 See Other.
        $this->_redirector = $this->_helper->getHelper('Redirector');
        $this->_redirector->setCode(303);

        $req = $this->getRequest();
        $this->_path = $req->getPathInfo();
        $reqMime = (string) $req->getServer('HTTP_ACCEPT');

        // Choose the target view with a simple substring test on Accept.
        // No Content-Type is set here: the 303 itself carries no RDF, so the
        // data action serving the redirect target is the place to set it.
        if (false !== strpos($reqMime, 'application/rdf+xml')) {
            $this->_redirectPath = $this->buildRedirectPath('rdfxml');
        } else if (false !== strpos($reqMime, 'application/rdf+json')) {
            $this->_redirectPath = $this->buildRedirectPath('rdfjson');
        } else {
            $this->_redirectPath = $this->buildRedirectPath('humanreadable');
        }
    }

    public function redirectAction()
    {
        // Send the 303 to the path chosen in init().
        $this->_redirector->gotoUrl($this->_redirectPath);
    }

    public function buildRedirectPath($action)
    {
        // Strip only the leading "/resource" segment, so that "resource/"
        // appearing later in the path (e.g. in a slug) is left alone.
        $path = preg_replace('#^/resource(?=/|$)#', '', $this->_path);
        return '/data/' . $action . $path;
    }
}
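Because the Location of the 303 depends on the Accept header, the redirect response should also carry `Vary: Accept`, so that a shared cache doesn't hand an HTML-bound redirect to an RDF client. The sketch below only computes the two headers (the 303 status itself comes from the Redirector's `setCode(303)`); `build303Headers()` is a hypothetical name, and in the controller the equivalent would presumably be a `$this->getResponse()->setHeader('Vary', 'Accept')` call before redirecting.

```php
<?php
// Hypothetical helper: given the site's base URL, the path the
// ResourceController received, and the chosen view name, compute the
// headers for the 303 response.
function build303Headers($baseUrl, $pathInfo, $view)
{
    // Drop only the leading "/resource" segment of the path.
    $path = preg_replace('#^/resource(?=/|$)#', '', $pathInfo);

    return array(
        'Location' => rtrim($baseUrl, '/') . '/data/' . $view . $path,
        // The target differs per Accept header, so caches must key on it.
        'Vary'     => 'Accept',
    );
}
```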

That hands the work off to the other controller, the DataController. In the plugin, routes have been defined to handle each possibility. Usually, one would have just one route for RDF/XML and another for human-readable data. I have a third for RDF/JSON, just because the Graph class I'm using was originally built to spit out RDF/JSON for another application, so while it was there, I figured I'd throw that option in, too.
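RDF/JSON (originally a Talis format) nests the triples as subject, then predicate, then a list of objects, each object carrying a type and a value. A minimal sketch of a `toJSON()` along those lines; the `MiniGraph` class is hypothetical and is not the plugin's Graph class, whose code isn't shown here.

```php
<?php
// Hypothetical, minimal stand-in for a graph that can emit RDF/JSON.
class MiniGraph
{
    // $index[subject][predicate] = list of object descriptions
    private $index = array();

    // Add a triple whose object is a URI.
    public function addUri($s, $p, $o)
    {
        $this->index[$s][$p][] = array('type' => 'uri', 'value' => $o);
    }

    // Add a triple whose object is a plain literal.
    public function addLiteral($s, $p, $o)
    {
        $this->index[$s][$p][] = array('type' => 'literal', 'value' => $o);
    }

    // Serialize as RDF/JSON: { "subject": { "predicate": [ {type, value}, ... ] } }
    public function toJSON()
    {
        return json_encode($this->index, JSON_UNESCAPED_SLASHES);
    }
}
```

One wrinkle: an empty graph encodes as `[]` rather than `{}`, which a real implementation would want to special-case.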

Then it's up to the DataController to parse the path it receives, locate the appropriate resource in Omeka, generate the RDF or human-readable data, and hand it to the view. Here again, since I had a Graph class lying around that could be tweaked to spit the data out, I just used it and pushed the graph into the view. So my views consist of little more than

<?php echo $this->graph->toRDF() ?>

in rdfxml.php, and

<?php echo $this->graph->toJSON() ?>

in rdfjson.php.
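The DataController's parsing step isn't shown in the post. One way it might work, assuming the paths it receives look like /data/<format>/<type>/show/<id> (or /data/<format>/site for the site as a whole), is sketched below; `parseDataPath()` and its return shape are hypothetical.

```php
<?php
// Hypothetical: split a DataController path into format, resource type and id.
// Returns null for anything that doesn't look like a known resource path.
function parseDataPath($pathInfo)
{
    // "/data/rdfxml/items/show/13" -> [data, rdfxml, items, show, 13]
    $segments = array_values(array_filter(explode('/', $pathInfo), 'strlen'));
    if (count($segments) < 3 || $segments[0] !== 'data') {
        return null;
    }
    $formats = array('rdfxml', 'rdfjson', 'humanreadable');
    if (!in_array($segments[1], $formats, true)) {
        return null;
    }
    // /data/<format>/site describes the installation as a whole.
    if (count($segments) === 3 && $segments[2] === 'site') {
        return array('format' => $segments[1], 'type' => 'site', 'id' => null);
    }
    // /data/<format>/<type>/show/<id>, e.g. items/show/13
    if (count($segments) === 5 && $segments[3] === 'show'
        && preg_match('/^\d+$/', $segments[4])) {
        return array('format' => $segments[1], 'type' => $segments[2], 'id' => (int) $segments[4]);
    }
    return null;
}
```

The controller would then look up the record by type and id and build the graph; anything that parses to null can simply be answered with a 404.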

Here are a few screenshots of the results. I have the Tabulator Data Browser plugin installed, which forces Firefox to accept RDF/XML and then renders it. (It sometimes seems to make Firefox crap out when it starts building up large graphs, but I still highly recommend it for exploring.) I just have a quick site of random fictional entries (many thanks to our student aide, Shannon, for putting them in!), but hopefully this will give the idea. For any resource being shown (i.e., one whose URL ends in something like "/show/*"), you can slide "resource/" in after the site's base URL to look at the Linked Data view. So, for example, if you start at http://devel.patrickgmj.net/omeka/items/show/13 and put "resource" between "omeka" and "items", like so: http://devel.patrickgmj.net/omeka/resource/items/show/13, you'll switch to the LOD version.
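That URL rewrite is mechanical enough to express as a tiny function. A sketch; `toResourceUri()` is a hypothetical helper, and it assumes the installation's base URL is already known.

```php
<?php
// Hypothetical helper: turn an ordinary Omeka page URL into its Linked Data
// URI by inserting "resource" right after the site's base URL.
function toResourceUri($pageUrl, $baseUrl)
{
    $base = rtrim($baseUrl, '/');
    // Only URLs under this installation's base URL can be rewritten.
    if (strpos($pageUrl, $base . '/') !== 0) {
        return null;
    }
    return $base . '/resource' . substr($pageUrl, strlen($base));
}
```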

First, the top-level view of the site. I've given the site a URI like "/omeka/resource/site".

site view gives overall site architecture


I'm using the SIOC ontology to describe the site, even though there are some questions I have about how I'm doing that (more below). This lists out all the Items on the site, as well as all the Collections (Containers in SIOC terms). Even though there's not much data about them yet, all the Users are also listed. Hopefully, then, this provides a starting point for a Linked Data crawler to start following its nose through the site.

Next, the RDF of an Item. Following SIOC structures, I'm distinguishing between the Item and the resource that the Item is about (sioc:about). So the metadata about the Item describes where it fits into the Omeka installation's overall structure, while the metadata for the object is about the object itself. Even though Omeka's interface doesn't draw a strong distinction between the two, so far I don't think that's an issue: the structural metadata for the Item is stored happily in Omeka's own record for it, so all the fields filled in when a new Item is made really seem to apply just to the object. The only exception is the title, which is easy enough to apply to both.
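In triple form, the Item/object split looks roughly like the N-Triples produced by the sketch below. The URI for the object (a `#object` fragment on the item's resource URI) and the function name `itemTriples()` are assumptions for illustration, since the post doesn't say how the object URI is minted; the SIOC, Dublin Core terms and RDF namespaces are the published ones.

```php
<?php
// Hypothetical: emit N-Triples for one Omeka Item and the thing it is about.
function itemTriples($itemUri, $title)
{
    $rdfType = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
    $sioc    = 'http://rdfs.org/sioc/ns#';
    $dcterms = 'http://purl.org/dc/terms/';

    // Assumed URI scheme for the described object: a fragment on the item URI.
    $objectUri = $itemUri . '#object';

    // Escape backslash, quote and line breaks for an N-Triples literal.
    $escaped = str_replace(array('\\', '"', "\n", "\r"), array('\\\\', '\\"', '\\n', '\\r'), $title);
    $literal = '"' . $escaped . '"';

    $lines = array(
        "<$itemUri> <{$rdfType}> <{$sioc}Item> .",
        "<$itemUri> <{$sioc}about> <$objectUri> .",
        // The title is the one field applied to both the Item and the object.
        "<$itemUri> <{$dcterms}title> $literal .",
        "<$objectUri> <{$dcterms}title> $literal .",
    );
    return implode("\n", $lines) . "\n";
}
```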

One major limitation so far is that I don't have much of anything for asserting links out. Eventually, I would like to have an easy way to make owl:sameAs links for the objects to DBpedia or other Linked Open Data URIs, and similarly have an easy way to put external links on the creator, subject, and other properties. That's work for the future. For now, I've just used the relations field to make some links out.

Item about a clip from the Mad Max trailer, related to the Mad Max DBpedia and Linked Movie Database entries via dcterms:relation


This is one where the Tabulator can get sluggish, since I've included a relation linking out to DBpedia. Tabulator goes out and grabs that RDF, too, which makes the graph pretty big. But at least this shows the first steps toward making this really linked into the LOD cloud.

item about an image of Walt Whitman related to Whitman's DBpedia entry via dcterms:relation


Omeka's internal structures handle tags and taggings in the same way that Richard Newman's Tag Ontology does, so I just carried that right over. In the future, I'd like to add a reference to the user who created each tagging.

items and taggings are related via tag:tagging


taggings and tags are related via tag:associated_tag

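Because each tagging is a resource of its own between the item and the tag, getting from an item to its tags takes two hops, and the tagging node is where a reference to the tagging user could be hung later. A sketch of that walk over plain [subject, predicate, object] arrays, using the property names as they appear in the captions (`tag:tagging`, `tag:associated_tag`) as opaque strings; `tagsForItem()` is hypothetical.

```php
<?php
// Hypothetical: find the tags on an item by hopping item -> tagging -> tag.
// $triples is a list of [subject, predicate, object] using prefixed names.
function tagsForItem(array $triples, $item)
{
    // Hop 1: the taggings attached to this item.
    $taggings = array();
    foreach ($triples as $t) {
        if ($t[0] === $item && $t[1] === 'tag:tagging') {
            $taggings[$t[2]] = true;
        }
    }
    // Hop 2: the tag each of those taggings points at.
    $tags = array();
    foreach ($triples as $t) {
        if (isset($taggings[$t[0]]) && $t[1] === 'tag:associated_tag') {
            $tags[] = $t[2];
        }
    }
    return $tags;
}
```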

Unresolved Issues

Asserting that a sioc:Site is also a sioc:Container

I don't see anything in the specs that makes this assertion fail validity tests, but I also see some reasons to raise an eyebrow, at the least. I haven't explicitly asserted that the site is also a container, but that's a quick inference from the definitions of sioc:parent_of and sioc:container_of.
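If, as the paragraph above implies, SIOC gives both sioc:parent_of and sioc:container_of the domain sioc:Container, then the RDFS domain rule (if a property has domain C, anything used as its subject is a C) types the site as a Container the moment it appears as the subject of either. A one-step sketch of that rule over plain [subject, predicate, object] arrays; the function name is hypothetical and the domain table is an assumption passed in by the caller, not read from the SIOC spec.

```php
<?php
// Hypothetical, one-step RDFS domain inference: for every triple whose
// predicate has a declared domain, infer that its subject has that type.
function inferDomainTypes(array $triples, array $domains)
{
    $inferred = array();
    foreach ($triples as $t) {
        list($s, $p) = $t;
        if (isset($domains[$p])) {
            $inferred[] = array($s, 'rdf:type', $domains[$p]);
        }
    }
    // The same subject may use several such properties; keep each fact once.
    return array_values(array_unique($inferred, SORT_REGULAR));
}
```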

Omeka vocabulary and item types

I haven't done much yet with classes for Omeka site structures (like Collections and Exhibits), or Item Types. For different Item Types in Omeka, you can also create custom metadata fields, which would need to have site-specific URIs minted for them.

Handy URIs for metadata fields

Right now I'm just trying to map things onto dcterms as best I can. I'd like to find a way to build URIs into Omeka's Element sets. I'm thinking now of building new Element Sets specifically for Linked Data. That way there's no reason to touch what currently exists. Maybe things like a LinkedDataDcterms Element Set, a LinkedDataFoaf Element Set, etc.
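A sketch of the mapping half of this, plus the site-specific minting that custom Item Type fields would need. The 15 Dublin Core element names are the ones Omeka's Dublin Core element set uses, and each maps onto the dcterms property of the same name; the `/resource/vocab/` URI pattern for everything else, and the function name `predicateForElement()`, are assumptions. It keys on the element name alone, which a real version shouldn't do, since a custom field could reuse a Dublin Core name.

```php
<?php
// Hypothetical mapping from Omeka's Dublin Core element names to dcterms URIs,
// with a fallback that mints an installation-specific URI for any other field.
function predicateForElement($elementName, $siteBase)
{
    $dc = array(
        'Title', 'Subject', 'Description', 'Creator', 'Source', 'Publisher',
        'Date', 'Contributor', 'Rights', 'Relation', 'Format', 'Language',
        'Type', 'Identifier', 'Coverage',
    );
    if (in_array($elementName, $dc, true)) {
        return 'http://purl.org/dc/terms/' . strtolower($elementName);
    }
    // Assumed URI pattern for custom fields; not something Omeka provides.
    $slug = strtolower(trim(preg_replace('/[^A-Za-z0-9]+/', '-', $elementName), '-'));
    return rtrim($siteBase, '/') . '/resource/vocab/' . $slug;
}
```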

Relatedly, I'd like to have a user profile plugin that would make use of FOAF properties, especially foaf:interest.

What Does This Give Us?

We have a (mostly) complete structured representation of an Omeka site's structure and the objects described within it.

I imagine one first step from here being a store that nom-nom-noms up all that data and puts it at one SPARQL endpoint, which would allow queries against the entire set of open Omeka installations.

That would be a step toward responding to questions like:

  • What Omeka installations have content about X?
  • What Omeka installations use tag X?
  • What Omeka installations have similar content, but organized differently?
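As a rough picture of the second question, here is a sketch that stands in for the SPARQL endpoint with an in-memory list of quads, each tagged with the installation it was harvested from. The quad layout, the use of `tag:Tag` as the tag class, and the assumption that each tag carries its name as an rdfs:label literal are all illustrative choices; `installationsUsingTag()` is hypothetical.

```php
<?php
// Hypothetical stand-in for the aggregated store. Each quad is
// [installation, subject, predicate, object], using prefixed names.
function installationsUsingTag(array $quads, $tagName)
{
    // Pass 1: which subjects are tags, per installation.
    $isTag = array();
    foreach ($quads as $q) {
        if ($q[2] === 'rdf:type' && $q[3] === 'tag:Tag') {
            $isTag[$q[0] . ' ' . $q[1]] = true;
        }
    }
    // Pass 2: installations with a tag whose label matches (case-insensitively).
    $found = array();
    foreach ($quads as $q) {
        list($site, $subject, $predicate, $object) = $q;
        if ($predicate === 'rdfs:label' && isset($isTag[$site . ' ' . $subject])
            && strcasecmp($object, $tagName) === 0) {
            $found[$site] = true;
        }
    }
    $sites = array_keys($found);
    sort($sites);
    return $sites;
}
```

Against a real triple store this would be a single SPARQL query; the loops only show the shape of the join.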

Let's add user profiles, and associate them wherever we can

What does that give us?

It'll help respond to questions like:

  • What's a person's history of tagging within an open Omeka installation? Across many installations?
  • What's the history of a tag on various items, across many people?
  • How is the same tag being used on many different things?
  • What groups of people are forming around a tag, across all open Omekas?

Let's make it really linked, with links out. Start just with links to DBpedia for subjects and ItemObjects

What does that give us?

Beyond being a means of disambiguating references, that would be a step toward responding to questions like:

  • I'm interested in this Wikipedia article. What Omeka installation will tell me more? (A link to DBpedia provides a reference to the Wikipedia entry, so they can be neatly related)
  • What Items in all open Omekas fall into the same categories? (DBpedia provides an organizational structure based on info in Wikipedia. It ain't always pretty, but serendipity ain't always pretty, either.)

Let's put the last two together. User profiles with interests by DBpedia URIs

What does that give us?

It'll continue the train, helping us respond to questions like:

  • I'm interested in the topic this Wikipedia article talks about. Who else in Digital Humanities has a similar interest?
  • Who else is building Omeka sites around my interests?
  • Who else with the same interest is building Omeka sites?
  • What topics are people with similar interests building Omeka sites about?
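The people-finding questions reduce to a join on shared interest URIs. A sketch, assuming harvested profiles are boiled down to a map from each user URI to the foaf:interest URIs in that profile (for instance DBpedia resource URIs); `sharedInterests()` and that data layout are hypothetical.

```php
<?php
// Hypothetical: $interests maps a user URI to the foaf:interest URIs in their
// profile. Returns the other users who share at least one interest with
// $user, each with the list of interests they share.
function sharedInterests(array $interests, $user)
{
    if (!isset($interests[$user])) {
        return array();
    }
    $mine = $interests[$user];
    $matches = array();
    foreach ($interests as $other => $theirs) {
        if ($other === $user) {
            continue;
        }
        $common = array_values(array_intersect($mine, $theirs));
        if ($common) {
            $matches[$other] = $common;
        }
    }
    return $matches;
}
```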

Where to go from here?

When we put this all together, we link up information sources to related information sources. That's hugely good. But the trajectory I'm trying to plot here ain't about sources of data. It's about helping the right people find each other. No matter what field, or interest, or research agenda, or level of inquiry, it's all about being able to discover the right person who might be able to respond to your interesting questions.

That's what Linked Open Data is all about to me. The data is great, but the more important thing is offering ways of discovering the right conversations by way of open data.

Attachment: omeka-linkedopenomeka-plugin-1.0-0.1beta.zip (13.14 KB)



Doug,

For the object URIs, yes, exactly -- the idea is definitely to link to things like the DBpedia entry for Thomas Jefferson, or other appropriate LOD URIs. (There's a variety of good sources for URIs in various domains, like geonames.org for places, buildings, landmarks, etc.) As it stands, one could conceivably put those URIs in now. I've done that with the Relation field in a couple of the examples. In the Mad Max example, the image shows the data from DBpedia being pulled in, and above it there's also a link to the Linked Movie Database.

The reason not to do that, and instead to build separate LOD fields corresponding to creator, subject, relation, etc., is that those URIs would also show up on the regular page for the Item, which would be ugly at best!

So for the object URIs, they should definitely link out to other URIs. Similarly, there should be a way to say that the subject URI is the same as another URI. So, for example, if the Item is about Thomas Jefferson, there should eventually be a way to assert that the Omeka-specific URI for that ItemObject is owl:sameAs DBpedia's entry.

The really sticky stuff will be in what you mentioned--the predicates. I've barely begun exploring this, but it looks like most of the out-of-the-box fields and Item Types will map happily onto existing predicates and classes in other vocabularies. But, since Omeka also allows you to create your own installation-specific fields, there will have to be a way to mint installation-specific URIs for them.

I think that could be really exciting in terms of the conversations we had about standards at THATCamp. If we pull all of the different installation-specific fields/predicates into one data store, we'd have a way to see what project-based, ad hoc standards people are creating. If very different fields are used to describe similar objects, that would reveal different ways that people are thinking about and categorizing those objects. And getting that sense of different ways of categorizing similar things might move toward what you were wondering about -- how to use data to raise questions -- because comparing categorizations is really comparing perspectives.

Patrick, this is great that you are inviting us to look over your shoulder as you develop this and think out loud about what you envision will be enabled. There are details I don't fully understand yet, but I do like the way you are minting URIs for useful resources, and what I wonder about is whether this approach is going to lead to installation-specific URIs for metadata values. Omeka has distinct URIs for items, collections, users, exhibits, etc., but (if I follow right) those are always the subjects of your triples. Omeka doesn't yet have installation-specific URIs for the objects of triples -- or predicates, for that matter -- but I can imagine those might be also useful. To spell that out in terms of your example, the string "Thomas Jefferson" may be a value attached to the "subject" field in an Omeka record, and the string "Thomas Jefferson" (or maybe "Jefferson, Thomas") may be a value in a "creator" field if Jefferson is the author of an item. But even if identical these are just string values. Omeka itself doesn't reify "Thomas Jefferson" as a resource in its own right about which assertions can be made, or can be found at a sensibly constructed ready-made URI. Is the idea of your separate LOD fields that the values would not be strings but URIs pointing to places such as DBpedia? I can see something like that kind of constraint being useful for editing, quality control, and browsing within a single Omeka installation, alongside the considerable value of linking multiple data sources.
