Another Ruby Moment

August 22nd, 2009

I wanted a regular expression to parse an (X)HTML snippet and escape the contents of pre tags, non-greedily.

It's most likely a multi-line string, so I wanted something with a wider scope than cat and sed. Perl? Probably, but Ruby's object-oriented nature is so wonderful. I tried it, and as usual the regular expressions took a little bit of work, but this is what I came up with:

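mystring = %q{<bodycontent>
<p>How are you?</p>
<pre>OK<br /></pre>
</bodycontent>}
# /m lets . match newlines; .*? keeps each match to a single pre block
mystring.gsub!(/<pre>.*?<\/pre>/m) {|esc|
    # newline for each <br />, escape every <, then restore the pre tags
    esc.gsub('<br />',"\n").gsub('<','&lt;').gsub(/&lt;(\/)?pre>/,'<\1pre>')
}
puts mystring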
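
For what it's worth, here is a rough sketch of the same chain pulled apart into named steps, mostly to show why the non-greedy match matters when there is more than one pre block. The method name escape_pre and the sample string are made up for illustration. It shares the limits of the one-liner above: only < is escaped (not & or >), and a pre tag carrying attributes would not match.

```ruby
# Hypothetical helper: same three gsub steps as the one-liner, one per line.
def escape_pre(html)
  # /m lets . span newlines; .*? stops at the first closing tag it finds.
  html.gsub(/<pre>.*?<\/pre>/m) do |block|
    with_newlines = block.gsub('<br />', "\n")   # <br /> becomes a real newline
    escaped = with_newlines.gsub('<', '&lt;')    # escape every <, tags included
    escaped.gsub(/&lt;(\/)?pre>/, '<\1pre>')     # put the pre tags back
  end
end

# Illustrative input: two pre blocks with ordinary markup between them.
sample = "<pre>a<b</pre><p>mid</p><pre>c<br />d</pre>"
puts escape_pre(sample)
```

With a greedy .* the match would run from the first opening tag to the last closing tag, so the paragraph between the two blocks would get escaped as well.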