To XHTML or not to XHTML?

Today, we had a conversation about HTML 4 vs XHTML 1.0. For me, the matter was neatly settled the very first time I saw an XML system produce XHTML like this:

  <p>An article with an <em/> empty emphasis tag.</p>

Perfectly legal XML, perfectly legal XHTML. But if you serve this XHTML as text/html (which almost everyone does), you end up with this:

[Screenshot: Empty tags considered harmful]

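Why? Because it’s parsed as HTML, and browsers ignore the trailing slash on an element like em. The browser sees the start of an em tag but no end tag, so the emphasis can spill over into the rest of the page.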
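One way to catch this before it reaches a browser is to look for elements that an XML serializer would write in minimized form even though HTML 4 does not declare them EMPTY. XHTML 1.0’s own Appendix C compatibility guidelines give the same advice: write `<p> </p>`, not `<p />`. Below is a minimal sketch, assuming the page is available as a well-formed XHTML string and parsed with Python’s standard `xml.etree.ElementTree` module; the helper `minimized_non_void` and the `HTML4_EMPTY` set are illustrative names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Elements declared EMPTY in HTML 4.01; only these are safe to write as <x />
# when the document is served as text/html.
HTML4_EMPTY = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


def local_name(tag):
    """Drop an ElementTree '{namespace}' prefix, if there is one."""
    return tag.rsplit("}", 1)[-1]


def minimized_non_void(xhtml):
    """Return the local names of elements with no text and no children
    (which an XML serializer writes as <x/>) that HTML 4 does not treat as EMPTY."""
    root = ET.fromstring(xhtml)
    return [
        local_name(el.tag)
        for el in root.iter()
        if not el.text and len(el) == 0 and local_name(el.tag) not in HTML4_EMPTY
    ]


doc = ('<p xmlns="http://www.w3.org/1999/xhtml">'
       'An article with an <em/> empty emphasis tag.<br/></p>')
print(minimized_non_void(doc))  # ['em']  (the <br/> is fine, the <em/> is not)
```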

And now I make sure that all our sites emit HTML 4. It’s a lot simpler.

This isn’t to say I don’t use XHTML. It’s a fine medium for further processing (e.g. applying XSLT). But it’s not right for serving up to browsers verbatim.
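Emitting HTML 4 instead is mostly a matter of which serializer you pick. As a minimal sketch outside the XSLT/Cocoon stack discussed here, Python’s `xml.etree.ElementTree` can write the same tree either way: `method="xml"` minimizes the empty `em`, while `method="html"` always gives non-void elements an explicit end tag. In a plain XSLT 1.0 stylesheet the corresponding setting is `<xsl:output method="html"/>`.

```python
import xml.etree.ElementTree as ET

# Build <p>An article with an <em/> empty emphasis tag.</p> (no namespace).
p = ET.Element("p")
p.text = "An article with an "
em = ET.SubElement(p, "em")  # empty: no text, no children
em.tail = " empty emphasis tag."

# XML output minimizes the empty element: fine for XML consumers such as XSLT.
as_xml = ET.tostring(p, encoding="unicode", method="xml")

# HTML output writes an end tag for every element that isn't void in HTML.
as_html = ET.tostring(p, encoding="unicode", method="html")

print(as_xml)   # <p>An article with an <em /> empty emphasis tag.</p>
print(as_html)  # <p>An article with an <em></em> empty emphasis tag.</p>
```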

Comments (6)

  1. Aristotle Pagaltzis wrote:

    You can’t serve XHTML as text/html. Whatever you serve as text/html is HTML – malformed HTML that happens to look like XHTML, maybe, but it’s HTML nonetheless. The only way to serve XHTML is to serve it as application/xhtml+xml.

    Posted 15 Jul 2009 at 21:44
  2. dom wrote:

    @Aristotle Pagaltzis The spec is kind of vague on this for XHTML 1.0, IIRC: its Appendix C compatibility guidelines allow it to be served as text/html. This is just an example of why doing so is a really bad idea.

    Posted 16 Jul 2009 at 11:44
  3. Aristotle Pagaltzis wrote:

    No, it’s not vague at all. If you serve it as text/html, browsers parse it as HTML, no matter what doctype and syntax you may have used.

    Posted 16 Jul 2009 at 20:22
  4. Mark wrote:

    You might talk about the config and/or code changes you made to switch between XHTML and HTML 4. The header has already been covered (text/html vs. application/xhtml+xml), but presumably you changed other things too, such as a dialect choice in XSLT.

    Great blog, BTW. I keep running across your entries when I’m doing Google searches, and they’re always a good read.

    Posted 23 Sep 2009 at 23:59
  5. dom wrote:

    @Mark The actual switch is worth documenting. This was on a Cocoon site, and we found that simply switching the serializer from xhtml to html wasn’t enough. We also had to add the following XSLT transform to the pipeline to strip out the XHTML namespace before serialization. I never did get to the bottom of why this was necessary.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

      <!-- Rebuild every element under its local name, dropping its namespace -->
      <xsl:template match="*">
        <xsl:element name="{local-name()}">
          <xsl:apply-templates select="@*|node()" />
        </xsl:element>
      </xsl:template>

      <!-- Likewise for attributes (e.g. xml:lang becomes lang) -->
      <xsl:template match="@*">
        <xsl:attribute name="{local-name()}">
          <xsl:value-of select="." />
        </xsl:attribute>
      </xsl:template>

      <!-- Copy comments and processing instructions through unchanged;
           text nodes are copied by XSLT's built-in template -->
      <xsl:template match="processing-instruction()|comment()">
        <xsl:copy />
      </xsl:template>
    </xsl:stylesheet>

    Posted 24 Sep 2009 at 11:32
  6. dom wrote:

    @Mark Actually, I wasn’t talking about the header. I was talking about XHTML served up as text/html. That caused so many problems that I switched to serving up HTML 4 instead.

    Posted 24 Sep 2009 at 14:30
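For comparison with dom’s stylesheet in comment 5, here is a minimal Python sketch of the same local-name() rewrite, assuming the page is held as an `xml.etree.ElementTree` tree; `strip_namespaces` is a hypothetical helper, not a library function. It also shows one way a leftover namespace can bite: ElementTree’s HTML serializer keeps namespaced tags qualified (as `html:p`, say), which a browser parsing text/html will not treat as a `p`. Whether Cocoon’s serializer went wrong for the same reason is not something this thread establishes.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def local_name(name):
    """'{http://www.w3.org/1999/xhtml}p' -> 'p'; plain names pass through."""
    return name.rsplit("}", 1)[-1]


def strip_namespaces(root):
    """Rename every element and attribute to its local name, in place,
    mirroring the xsl:element / xsl:attribute name="{local-name()}" templates."""
    for el in root.iter():
        if isinstance(el.tag, str):  # comments/PIs have a function as their tag
            el.tag = local_name(el.tag)
        attrs = {local_name(k): v for k, v in el.attrib.items()}
        el.attrib.clear()
        el.attrib.update(attrs)
    return root


doc = ET.fromstring(
    f'<p xmlns="{XHTML}" class="intro">An article with an <em/> empty emphasis tag.</p>'
)

# Before stripping, tags come out prefixed, e.g. <html:p xmlns:html="...">.
print(ET.tostring(doc, encoding="unicode", method="html"))

strip_namespaces(doc)
# After stripping: <p class="intro">An article with an <em></em> empty emphasis tag.</p>
print(ET.tostring(doc, encoding="unicode", method="html"))
```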