October 29, 2003
1/11/2022: This page is old and somewhat outdated; it remains here only for archival purposes.
Many website designers are now migrating from HTML to XHTML in an effort to become more standards compliant. After two-and-a-half years of using XHTML on The Oo Kingdom, I am doing exactly the opposite. This page explains why.
Kindly stop reading and let that sink in for a minute. If you
are using XHTML on your site, you are not using valid HTML. Try
changing only the DOCTYPE and leaving the rest of the document
alone, and then try validating it. You’ll see what I mean. Any
extra slashes in the head of the document will return errors,
because XHTML is “a horse of a different color.”
Every file served over the Web is labeled with a MIME type;
computers must know what type of file it is so they know what
application or plug-in should be used to open and read the file.
Normal, “old-fashioned” HTML carries the MIME type text/html.
That’s pretty basic; it’s simply text (hypertext, to be
specific), and more specifically, it’s HTML.
XHTML, properly served, carries the MIME type
application/xhtml+xml. What’s the difference?
While HTML is merely text used as a markup
language, XHTML is an application of XML, the Extensible
Markup Language. (By the way, the parent language of HTML
is called SGML
[Standard Generalized Markup Language] and has been around since
the 1980s.)
The problem with MIME types for Web pages is that most of
today’s browsers do not support application/xhtml+xml.
Among these “old-fashioned” browsers is the popular
Internet Explorer 6.0 for Windows. Our statistics for October 2003
indicate that 70% of our visitors are using this browser, and another
20% are using older versions of Internet Explorer. Newly emerging
technologies such as PDAs,
mobile devices and Internet-capable cell phones generally don’t
support it either; they support something closer to HTML 3.2.
Our recent solution to this problem was to serve XHTML as
application/xhtml+xml to browsers which will accept it
(currently Netscape 6/7 and Mozilla) and as text/html to
everyone else (Opera 6 and 7 will accept it, but scripting is
an issue with them; more on that later). This works, but to me
it is fundamentally wrong. The same Web page cannot be of two
different MIME types any more than a JPEG image can be played
as a video, or an ordinary donut can be a cream-filled
Bismarck, or a cup of black coffee can pass as a cappuccino.
Yet that is precisely what we are trying to do if we serve
XHTML as text/html, the MIME type that most (even modern)
browsers can understand.
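For readers who have not seen this kind of switching, here is a rough sketch of the decision itself. It assumes the server looks at the browser's HTTP Accept header; the function names acceptsXhtml and pickContentType are made up for illustration and are not the code this site used. It also leaves out the special case for Opera described above.

```javascript
// Hypothetical helper: does the Accept header list application/xhtml+xml
// with a non-zero quality value?
function acceptsXhtml(acceptHeader) {
  if (!acceptHeader) return false;
  return acceptHeader.split(",").some((part) => {
    const [type, ...params] = part.trim().split(";").map((s) => s.trim());
    if (type.toLowerCase() !== "application/xhtml+xml") return false;
    const q = params.find((p) => p.toLowerCase().startsWith("q="));
    // No q parameter means the default quality of 1.
    return q === undefined || parseFloat(q.slice(2)) > 0;
  });
}

// Hypothetical helper: choose the MIME type to send with the page.
function pickContentType(acceptHeader) {
  return acceptsXhtml(acceptHeader) ? "application/xhtml+xml" : "text/html";
}
```

A browser that only sends */* falls through to text/html here, which is the cautious choice: a wildcard does not prove the browser can actually parse XHTML as XML.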
Some get around this by dynamically changing the DOCTYPE
and removing the extra slashes, or by using content negotiation to
serve a separate HTML page to non-compliant browsers. The first
approach requires that all HTML files be saved with .php
extensions, which for most means changing all of the URIs—not a good thing.
The second approach means having to maintain two separate pages,
which can be a pain if you update frequently. It would also divide
your traffic among the two pages—not good for search engine ratings.
Neither is a tidy solution. I prefer the “one page fits all” approach;
after all, it is the World Wide Web, right?
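To make the first approach concrete, here is a minimal sketch of what “changing the DOCTYPE and removing the extra slashes” could look like. It is written in JavaScript rather than PHP, the name downgradeToHtml is hypothetical, and the regular expressions are deliberately naive: they would also rewrite a “/>” that happens to sit inside a script or an attribute value, and they leave attributes such as xml:lang alone.

```javascript
// DOCTYPE for HTML 4.01 Strict.
const HTML_DOCTYPE =
  '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" ' +
  '"http://www.w3.org/TR/html4/strict.dtd">';

// Hypothetical helper: turn a simple XHTML page into HTML 4.01 text.
function downgradeToHtml(xhtml) {
  return xhtml
    // Drop a leading XML declaration, if there is one.
    .replace(/^\s*<\?xml[^>]*\?>\s*/, "")
    // Swap the XHTML DOCTYPE for the HTML one.
    .replace(/<!DOCTYPE[^>]*>/i, HTML_DOCTYPE)
    // Remove the XHTML namespace declaration.
    .replace(/\s+xmlns="[^"]*"/g, "")
    // Turn self-closing tags such as <br /> into <br>.
    .replace(/\s*\/>/g, ">");
}
```

Even in this toy form it shows why the approach is untidy: every page has to pass through a filter on each request, and the filter has to know about every difference between the two languages.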
Most websites make extensive use of client-side scripting,
particularly JavaScript. To complicate matters for prospective
XHTML users, JavaScript works differently in an XML environment
than it does with normal HTML. Mark Pilgrim goes into some depth
on this in his article
for xml.com. One major problem is that often-used methods like
document.write simply don’t work in XML, so other methods must
be used instead. I tried this on our site, but it meant actually
using both methods, since the newer methods aren’t supported by
many existing browsers (including Opera 6; that’s why I don’t
serve application/xhtml+xml to Opera).
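As an illustration of the “other methods,” the sketch below builds a paragraph with DOM calls instead of document.write. The function names and the greeting text are hypothetical, and this is not the script that ran on this site; it only assumes the standard DOM methods createElementNS, createElement, createTextNode and appendChild. The document object is passed in as a parameter so the fallback for browsers without createElementNS is easy to see.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: create an element in the XHTML namespace when the
// browser supports namespaces, and fall back to createElement otherwise.
function createXhtmlElement(doc, tagName) {
  if (typeof doc.createElementNS === "function") {
    return doc.createElementNS(XHTML_NS, tagName);
  }
  return doc.createElement(tagName);
}

// Hypothetical helper: append <p>message</p> to a parent node,
// where document.write would have written the markup as a string.
function appendGreeting(doc, parent, message) {
  const paragraph = createXhtmlElement(doc, "p");
  paragraph.appendChild(doc.createTextNode(message));
  parent.appendChild(paragraph);
  return paragraph;
}
```

In a page this might be called as appendGreeting(document, document.body, "Good evening!") once the body has loaded. A browser with no DOM support at all would still need the old document.write path, which is exactly the duplication complained about above.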
Then another problem surfaced: HTML entity codes for special text characters appeared as plain text using the newer methods, so I had to key the characters into the script directly. This meant using characters which are not allowed in HTML. They work fine on Windows, but I wonder what weird Latin characters are showing up on other platforms?
Update, January 31, 2004: I later learned the syntax for presenting the characters in XML. We are using the greeting script on our Home page now.
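One way around the entity problem, which may or may not be the syntax referred to in the update above, is to use JavaScript’s own \uXXXX escapes. DOM text methods treat their argument as literal text, so an entity such as &eacute; is shown as typed, whereas a JavaScript escape is resolved by the script engine before the DOM ever sees the string. The helper name toAsciiEscapes below is hypothetical; it simply rewrites a string so that the script file contains only ASCII.

```javascript
// Hypothetical helper: replace every non-ASCII UTF-16 code unit with a
// \uXXXX escape so the text can be pasted into an ASCII-only script.
function toAsciiEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    out += code < 128
      ? text[i]
      : "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

// "\u00E9" is a single character (e with an acute accent) to the script
// engine, even though the source file contains only ASCII characters.
const word = "caf\u00E9";
```

Because the escaped form is plain ASCII, it no longer matters which character encoding the visitor’s platform assumes for the script file. The helper does not escape quotes or backslashes, so its output still has to be placed inside a string literal with care.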
If there’s one thing I hate about designing websites, it’s
using redundant code. Specifically I am referring to redundant
name and id attributes in anchor tags, and redundant style
sheet linking methods.
XHTML requires the id attribute for anchor tags used as link
targets. XHTML 1.1 does not allow the name attribute in
anchors; XHTML 1.0 allows both for backwards compatibility
with older browsers (such as Netscape 4) that don’t support
linking to id attributes. But regular HTML does not require
the id attribute, and everyone supports name, so why use both?
The <link> tag is used in HTML for linking to external style
sheets. Generally this works in XHTML also, but the preferred
method is to use the <?xml-stylesheet?> processing
instruction. To satisfy standards, both methods should be used
in “backwards-compatible XHTML.” Hardly anyone actually does
this, and I consider it a waste of time and bandwidth. The
<link> tag works fine for everyone in regular HTML.
What is “backwards-compatible XHTML,” anyway? The only reason
XHTML works on older browsers at all is that they are
forgiving of HTML errors. They have to be, because 95% of all
existing websites are written in invalid HTML. This is mainly
due to how the Web evolved; browser manufacturers invented
their own proprietary tags, and most people still use them.
Examples of these include <marquee>, <embed>, <bgsound> and
<font face…>.
Enough said.
The bottom line for me (and probably for most) is this: if you don’t need the extensibility of XML, XHTML is unnecessary. “What about forward compatibility?” you may ask. HTML is not going away any time soon. Too many expensive corporate websites are built with it. Remember when they said the United States would be entirely on the metric system by 1978? It’s now 2003, and everything is still measured in feet and inches here, because the public resisted the change so strongly. So I (and others) predict it will be with HTML.
Unfortunately, this means the 21st Anniversary Edition of The Oo Kingdom will be short-lived. It’s back to the drawing board, this time for a Greatly Simplified Version in HTML 4.01 Strict.