Tidy is a nice little library for cleaning up HTML. It's been around for a while, and from what I understand, it has been maintained by a group of users since the original author, Dave Raggett, decided to halt upkeep.
There are many language bindings, and I've recently been using it a lot more for converting HTML into well-formed XML, which can then be transformed by XSLT into XHTML. Good stuff!
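To give a rough idea of the HTML-to-XML step, here is a minimal Python sketch that shells out to the `tidy` command-line tool rather than using a particular binding. It assumes a `tidy` executable is on your PATH; the helper names `build_tidy_command` and `html_to_xml` are made up for this illustration, and the flag choice is just one reasonable setup, not the only one.

```python
import shutil
import subprocess


def build_tidy_command(executable="tidy"):
    """Return the argument list for an HTML-to-XML run of the tidy CLI.

    Hypothetical helper: the flags are real tidy options, but this
    particular combination is an assumption, not a recommended default.
    """
    return [
        executable,
        "-q",        # quiet: suppress nonessential output
        "-asxml",    # convert HTML to well-formed XML (XHTML)
        "-numeric",  # write numeric entities, which XML parsers accept
        "-utf8",     # treat input and output as UTF-8
    ]


def html_to_xml(html, executable="tidy"):
    """Pipe an HTML string through tidy and return the cleaned-up XML."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")

    result = subprocess.run(
        build_tidy_command(executable),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 0 on success, 1 if there were warnings,
    # and 2 if there were errors it could not repair.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    return result.stdout.decode("utf-8")
```

Calling `html_to_xml("<p>Hello<br>world")` should hand back a complete document with the tags closed, which you can then feed to whatever XSLT processor you like. The XSLT step itself is left out here, since Python's standard library doesn't include one.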