I have a lot of appreciation for libTidy: it's an awesome tool for avoiding problems when you deal with foreign HTML and want to transform it with XSL.
However, I ran into an issue yesterday that's worth mentioning, and it involves the two things in the title of this post:
- HTML Comments
- Script Tags (elements)
You may have come across this type of code in the source of an HTML document:
LibTidy does something similar: it wraps the inner contents of script tags in CDATA sections. This was a problem for me because I was then processing the output with XSL, and XSL escaped the CDATA comment tag as:
My first workaround was to disable output escaping in my stylesheet for script tags, but I decided to fix the problem at the root and remove the comment tags.
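The post doesn't show how the comment tags were removed. As one hypothetical approach, here is a minimal Python sketch of a preprocessing step that strips the legacy `<!--` ... `//-->` wrapper from an inline script body before it reaches the XML toolchain. The helper name `strip_script_comment_wrapper` and the exact patterns it accepts are assumptions for illustration, not libTidy's behavior or the fix described above.

```python
import re

# Hypothetical patterns: a leading "<!--" line (with any trailing text on that
# line, e.g. "<!-- hide from old browsers") and a trailing "-->" optionally
# preceded by the JavaScript "//" comment marker.
_LEADING = re.compile(r"^\s*<!--[^\n]*\n?")
_TRAILING = re.compile(r"\n?[ \t]*(?://)?[ \t]*-->\s*$")


def strip_script_comment_wrapper(script_text: str) -> str:
    """Remove an HTML comment wrapper around a script body, if one is present.

    Script text without a wrapper is returned unchanged.
    """
    text = _LEADING.sub("", script_text, count=1)
    text = _TRAILING.sub("", text, count=1)
    return text


if __name__ == "__main__":
    body = "\n<!--\nif (a < b) { go(); }\n//-->\n"
    print(repr(strip_script_comment_wrapper(body)))
```

Regular expressions are enough here only because the wrapper is a fixed, line-oriented idiom; anything more involved belongs in a real HTML parser.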
So now the default behavior is that libTidy wraps script tag contents in CDATA sections, and libXSLT drops the CDATA markers and escapes the content instead. That works fine as long as there aren't any entities in there!
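To see why entities are the weak spot, here is a minimal Python sketch of the same round trip. It uses the standard library's `xml.etree.ElementTree` as a stand-in for an XML parser plus XML serializer; it is not libXSLT, but it shows the general effect: the CDATA section is parsed into plain text, and on output `<` and `&` are written as `&lt;` and `&amp;` rather than being re-wrapped in CDATA. The `reserialize` helper and the sample script body are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def reserialize(markup: str) -> str:
    """Parse an XML fragment and serialize it back out as XML."""
    return ET.tostring(ET.fromstring(markup), encoding="unicode")


# A script element as a CDATA-wrapping cleaner might emit it: inside the
# CDATA section, "<" and "&" need no escaping.
tidied = "<script><![CDATA[if (a < b && c) { go(); }]]></script>"

# The parser turns the CDATA section into ordinary character data, and the
# serializer escapes "<" and "&" instead of restoring the CDATA wrapper.
print(reserialize(tidied))
# prints: <script>if (a &lt; b &amp;&amp; c) { go(); }</script>
```

If that output is then served as HTML, where script contents are raw text, the browser sees `&amp;&amp;` literally and the JavaScript breaks.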
UPDATE: After wrestling with the comment, I found the strangest bug I’ve ever encountered.