From Atom to SIOC + Tagging with ARC2 and SimplePie


A 'directory' for UMWBlogs has been cooking conceptually for some time, and has now started cooking into code. The first thing I'm working on is grabbing data from feeds and turning it into RDF. I have a test/demo of the first draft of the system here. Please feel free to give it a whirl, and offer comments on improvements here.

What I have so far lets you enter a blog or post address, and it tries to find an Atom feed for it (RSS2 is coming soon), then uses that feed to build SIOC and Tagging data. SimplePie searches for the feed and does the parsing, then I have scripts using ARC2 to build up the RDF.
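To make the feed-to-RDF step concrete, here is a minimal sketch of the idea in Python using only the standard library. It is not the actual code, which is PHP built on SimplePie and ARC2. It assumes the Atom entry's id can stand in as the post URI, and it emits plain (subject, predicate, object) tuples instead of a real RDF graph. The sample feed and the function name atom_to_triples are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"

# Made-up sample feed, trimmed to the elements the sketch reads.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://example.org/blog/2008/hello</id>
    <title>Hello</title>
    <author><name>Jane Doe</name></author>
    <category term="rdf" scheme="http://example.org/blog/tag"/>
  </entry>
</feed>"""


def atom_to_triples(atom_xml):
    """Turn each Atom entry into a few SIOC / Dublin Core triples."""
    triples = []
    root = ET.fromstring(atom_xml)
    for entry in root.iter(ATOM + "entry"):
        # Assumption: the entry id is a usable URI for the post.
        post_uri = entry.findtext(ATOM + "id")
        if not post_uri:
            continue
        triples.append((post_uri, RDF_TYPE, SIOC + "Post"))
        title = entry.findtext(ATOM + "title")
        if title:
            triples.append((post_uri, DCTERMS + "title", title))
        # Categories are kept as literal terms here; minting URIs
        # for them is a separate problem.
        for category in entry.findall(ATOM + "category"):
            term = category.get("term")
            if term:
                triples.append((post_uri, DCTERMS + "subject", term))
    return triples


for triple in atom_to_triples(SAMPLE_FEED):
    print(triple)
```

A real version would hand these triples to an RDF library for serialization rather than printing tuples.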

I'm still working through many issues, and I'm sure others will notice more that I haven't yet identified. Minting URIs for the tags is one issue. Atom generally gives a scheme attribute for categories (though not always; it depends on the generator), and I'm using that to build the URI. But when I move forward to RSS2, I'll have to get more creative. One option, of course, is to forget about basing the URIs on where the tags come from and put them all into the ravendesk.org namespace that this project will eventually be using. But I was hoping to maintain a quick way to connect to the list of tags from the site itself.
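The two options for tag URIs can be sketched roughly like this, again in Python rather than the PHP the project uses. When a category carries a scheme, the term is appended to it; otherwise the tag falls back to a project namespace. The path under ravendesk.org and the function name mint_tag_uri are placeholders I made up for the example, not settled parts of the design.

```python
from urllib.parse import quote

# Hypothetical fallback namespace; the real path is undecided.
FALLBACK_TAG_BASE = "http://ravendesk.org/tags/"


def mint_tag_uri(term, scheme=None, fallback_base=FALLBACK_TAG_BASE):
    """Build a tag URI from an Atom category's scheme and term."""
    # Prefer the feed's own scheme so the URI points back at the site.
    base = scheme if scheme else fallback_base
    if not base.endswith(("/", "#")):
        base += "/"
    # Percent-encode the term so spaces and slashes stay inside one segment.
    return base + quote(term.strip(), safe="")


print(mint_tag_uri("semantic web", "http://example.org/blog/tag"))
print(mint_tag_uri("rdf"))
```

The fallback branch is the one RSS2 feeds would hit most often, which is why the choice of namespace matters.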

Another big struggle is with URIs for users. Right now, they're being built from the author element in the Atom feed. That will clearly need some work, since it is based on the display name, and not on an actual or unique username in the system. (There's a project design constraint that I can't count on getting others to install the WordPress SIOC plugin, and in the future there will be blogs and other sites that don't allow plugins at all.) Eventually, I'll have to build a registration system so that people can register their blogs, and I might be able to figure something out through that.
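Here is a small Python sketch of why display names make shaky identifiers. It is only an illustration: the /author/ path and the function name mint_user_uri are invented for the example and are not what the scripts actually produce.

```python
import re


def mint_user_uri(display_name, blog_base):
    """Build a user URI from an Atom author's display name."""
    # Lowercase, then collapse every run of non-alphanumerics to a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    # Hypothetical path layout under the blog's base address.
    return blog_base.rstrip("/") + "/author/" + slug


base = "http://example.org/blog/"
# The same person, written two ways, ends up with two different URIs.
print(mint_user_uri("Jane Doe", base))
print(mint_user_uri("Jane Q. Doe", base))
# Two different people sharing a display name would get the same URI.
print(mint_user_uri("jane doe", base))
```

Both failure modes go away only when there is a stable username or registration record to key on.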

Nevertheless, I'm relatively happy so far...and I'd love to get comments and feedback.

