12.26.07

Realizing AtomPub entries from XML using ROXML

Posted in Ruby, blog, rails at 11:00 am by Robert Horvick

I have done some digging and there do not seem to be any plugins or gems (or even active projects on RubyForge) that provide support for an AtomPub server. I did find a sample based on Camping but it was not really what I was looking for. It supports a subset of AtomPub and not in a way that is really friendly for what I need.

So I set out on my own.

An AtomPub entry can be as simple as this:

<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:title>ATOM Post Test</atom:title>
  <atom:id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</atom:id>
  <atom:content>Some text.</atom:content>
</atom:entry>

Or quite a bit more complex (lifted from http://tools.ietf.org/html/rfc4287):

<entry>
  <title>Atom draft-07 snapshot</title>
  <link rel="alternate" type="text/html"
    href="http://example.org/2005/04/02/atom"/>
  <link rel="enclosure" type="audio/mpeg" length="1337"
    href="http://example.org/audio/ph34r_my_podcast.mp3"/>
  <id>tag:example.org,2003:3.2397</id>
  <updated>2005-07-31T12:29:29Z</updated>
  <published>2003-12-13T08:29:29-04:00</published>
  <author>
    <name>Mark Pilgrim</name>
    <uri>http://example.org/</uri>
    <email>f8dy@example.com</email>
  </author>
  <contributor>
    <name>Sam Ruby</name>
  </contributor>
  <contributor>
    <name>Joe Gregorio</name>
  </contributor>
  <content type="xhtml" xml:lang="en"
    xml:base="http://diveintomark.org/">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <p><i>[Update: The Atom draft is finished.]</i></p>
      </div>
  </content>
</entry>

I started with just caring about title and content. Everything else I could infer from authentication (author, anyway).

With this thought, I created some simple XPath queries to find the title and content. That was working out. I could get the Atom response created and the post serialized to the database. So with just a few lines of code (beyond the scaffolding) I was able to use Live Writer to post via AtomPub to my blog engine.
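For illustration, the XPath approach might look something like the minimal sketch below. It is not the code I actually ran: it assumes REXML as the parser, and the helper name extract_title_and_content and the two constants are made up for this example.

```ruby
require 'rexml/document'

# Prefix-to-URI mapping handed to REXML so the XPath can name Atom elements.
# (Constant and helper names here are hypothetical.)
ATOM_XPATH_NS = { 'atom' => 'http://www.w3.org/2005/Atom' }

SIMPLE_ENTRY = <<XML
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>ATOM Post Test</title>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <content>Some text.</content>
</entry>
XML

# Pull only the title and content text out of a posted entry.
# Missing elements come back as nil rather than raising.
def extract_title_and_content(xml)
  root = REXML::Document.new(xml).root
  title = REXML::XPath.first(root, 'atom:title', ATOM_XPATH_NS)
  content = REXML::XPath.first(root, 'atom:content', ATOM_XPATH_NS)
  { :title => title && title.text, :content => content && content.text }
end

post = extract_title_and_content(SIMPLE_ENTRY)
puts post[:title]
puts post[:content]
```

This is enough to create a post, but it ignores everything else in the entry (links, author, dates), which is the corner-cutting that bothered me.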

But I wasn’t happy with just winking away the atompub protocol in favor of getting done quickly.

So I sat down and really started reading the Atom publishing spec (and the related Atom RFC). I have a lot of work to do.

I started using XmlSimple to parse the XML. That was ok, except some things don’t map well to hashes and it did not get me closer to serializing atom entries to xml (for the feed later on).

So I moved down to REXML and that was working ok. But I still wasn’t happy. I felt like I was writing too much monkey code and not making progress on my real goal.
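To give a feel for the kind of hand-written code I mean, here is a rough sketch of building even a three-field entry directly with REXML. The build_entry helper is hypothetical and only covers title, id and content; every additional element would need its own lines like these.

```ruby
require 'rexml/document'

ATOM_URI = 'http://www.w3.org/2005/Atom'

# Build a minimal Atom entry document by hand (hypothetical helper).
def build_entry(title, id, content)
  doc = REXML::Document.new
  entry = doc.add_element('entry', 'xmlns' => ATOM_URI)
  entry.add_element('title').text = title
  entry.add_element('id').text = id
  entry.add_element('content', 'type' => 'text').text = content
  doc
end

doc = build_entry('ATOM Post Test',
                  'urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a',
                  'Some text.')
puts doc.to_s
```

Multiply that by links, authors, contributors and summaries, in both directions (parsing and serializing), and it adds up quickly.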

Finally I found ROXML - and now I’m happy. I’ll cut to the chase and just show the code:

require 'rubygems'
require 'roxml'

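# Shared base class: every Atom type mixes in ROXML and
# renders itself via to_xml when converted with to_s.
class AtomBase
  include ROXML

  def to_s
    to_xml
  end
end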

class AtomAuthor < AtomBase
  xml_name "author"
  xml_text :name
  xml_text :uri
  xml_text :email
end

class AtomContributor < AtomBase
  xml_name "contributor"
  xml_text :name
  xml_text :uri
  xml_text :email
end

class AtomLink < AtomBase
  xml_name "link"
  xml_attribute :rel
  xml_attribute :type
  xml_attribute :href
  xml_attribute :length
end

class AtomContent < AtomBase
  xml_name "content"
  xml_attribute :type
  xml_attribute :lang
  xml_attribute :base
  xml_text :text, nil, ROXML::TEXT_CONTENT
end

class AtomSummary < AtomBase
  xml_name "summary"
  xml_attribute :type
  xml_attribute :lang
  xml_attribute :base
  xml_text :text, nil, ROXML::TEXT_CONTENT
end

class AtomEntry < AtomBase
  xml_name "entry"
  xml_text :title
  xml_object :link, AtomLink, ROXML::TAG_ARRAY
  xml_text :id
  xml_object :summary, AtomSummary
  xml_text :updated
  xml_text :published
  xml_object :author, AtomAuthor
  xml_object :content, AtomContent
  xml_object :contributor, AtomContributor, ROXML::TAG_ARRAY
end

The major gap is that this does not support xhtml content and summary fields. Those fields contain embedded XML and I have not figured out how to get ROXML to stop navigating the tree and make the object property return the inner xml.
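To make the desired behavior concrete, here is a small sketch of "return the inner XML" written against plain REXML rather than ROXML. It is not a fix for the ROXML classes above, and the inner_xml helper and constants are names invented for this example.

```ruby
require 'rexml/document'

ATOM_NAMESPACES = { 'atom' => 'http://www.w3.org/2005/Atom' }

XHTML_ENTRY = <<XML
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom draft-07 snapshot</title>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p><i>[Update: The Atom draft is finished.]</i></p>
    </div>
  </content>
</entry>
XML

# Serialize an element's children (text and nested elements alike)
# back to a string instead of walking further down the tree.
def inner_xml(element)
  element.children.map { |child| child.to_s }.join.strip
end

root = REXML::Document.new(XHTML_ENTRY).root
content = REXML::XPath.first(root, 'atom:content', ATOM_NAMESPACES)
puts inner_xml(content)
```

What I want is for the content and summary properties to hand back a string like that whenever type is xhtml, instead of ROXML trying to map the div as another object.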

I did try hacking ROXML to support an XML_CONTENT tag. It solved about 70% of the problem but it was not working exactly as I wanted after about a half hour and I felt like I was shaving a yak. I can come back to that later on. For now the Atom bits are neatly wrapped up behind a class with a known bug. I’m ok with that.

I haven’t reviewed the atom syndication spec to make sure I’m doing everything right but it’s working with all the samples I’ve thrown at it (including the samples from the spec) and Live Writer seems to be having a field day with it. So that’s good.

Anyway - that’s where I’m heading.

I have the feeling that I’m reinventing the wheel. I hope not poorly.


Comments

  1. Matt Platte said,

    December 26, 2007 at 2:37 pm

    Hey, I enjoy watching wheelwrights at work, so thanks for opening your shop door so’s I can watch.

  2. Mark said,

    December 26, 2007 at 3:57 pm

    For XHTML have you looked @ hpricot?

  3. Joe Cheng [MSFT] said,

    December 28, 2007 at 4:09 pm

    Robert, great to see this–I too was wondering why there wasn’t more Ruby tooling for Atom. If you end up releasing this I’d encourage you to drop a note on the atom-protocol mailing list:
    http://www.imc.org/atom-protocol/

    Also, I have posted some notes about Writer’s AtomPub implementation that you may find helpful:
    http://jcheng.wordpress.com/2007/10/15/how-wlw-speaks-atompub-introduction/

    Please e-mail me with any questions.
