<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jabbering Giraffe &#187; css</title>
	<atom:link href="http://happygiraffe.net/blog/tag/css/feed/" rel="self" type="application/rss+xml" />
	<link>http://happygiraffe.net/blog</link>
	<description></description>
	<lastBuildDate>Wed, 19 Oct 2011 10:40:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Clich&#233;s are hard</title>
		<link>http://happygiraffe.net/blog/2007/12/19/cliches-are-hard/</link>
		<comments>http://happygiraffe.net/blog/2007/12/19/cliches-are-hard/#comments</comments>
		<pubDate>Wed, 19 Dec 2007 06:42:00 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[css]]></category>
		<category><![CDATA[html]]></category>
		<category><![CDATA[javascript]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2007/12/19/cliches-are-hard/</guid>
		<description><![CDATA[So yesterday, Jeremy asked: Wondering if accents are valid in class names (so I can mark up some text as being of the class &#8220;cliché&#8221;) It&#8217;s a damned good question. And you have to consider: character encodings; CSS; HTML; XHTML; &#8230; <a href="http://happygiraffe.net/blog/2007/12/19/cliches-are-hard/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>So yesterday, Jeremy <a href="http://twitter.com/adactio/statuses/511587712">asked</a>:</p>
<blockquote>
<p>Wondering if accents are valid in class names (so I can mark up some text as being of the class &#8220;cliché&#8221;)</p>
</blockquote>
<p>It&#8217;s a damned good question.  And you have to consider: character encodings; <span class="caps">CSS</span>; HTML; <span class="caps">XHTML</span>; JavaScript; <span class="caps">HTTP</span>.  Needless to say, it&#8217;s more complicated than it looks at first.</p>
<p>My first thought was that of <span class="caps">CSS</span> files.  Is it valid to say:</p>
<pre class="css">
  p.cliché { color: #f00; }
</pre>
<p>To answer this question, you have to visit the <a href="http://www.w3.org/TR/CSS21/"><span class="caps">CSS 2</span>.1 spec</a>.  Near the end is <a href="http://www.w3.org/TR/CSS21/grammar.html#grammar">§G.1 Grammar</a>.  It contains a <a href="http://en.wikipedia.org/wiki/Backus–Naur_form"><span class="caps">BNF</span></a> grammar describing the syntax of <span class="caps">CSS</span>.  They&#8217;re not that difficult to read when you get the hang of them.  In this case, I start by finding something I can recognise: a selector.  Then, I work down through the grammar to find what I&#8217;m interested in.</p>
<ul>
<li><code>selector : simple_selector [ combinator simple_selector ]* ;</code>
<ul>
<li>A selector is composed of one or more simple_selectors, separated by combinators.</li>
</ul>
</li>
<li><code>simple_selector : element_name [ HASH | class | attrib | pseudo ]* | [ HASH | class | attrib | pseudo ]+ ;</code>
<ul>
<li>A simple_selector is composed of an element_name followed by zero or more ID, class, attribute or pseudo selectors.  Alternatively, it&#8217;s composed of one or more ID/class/attribute/pseudo selectors (without an element name).</li>
</ul>
</li>
<li><code>class : '.' IDENT ;</code>
<ul>
<li>A class name is just &#8220;.&#8221; followed by an identifier.  That&#8217;s what we&#8217;re interested in here.</li>
</ul>
</li>
<li><code>ident    -?{nmstart}{nmchar}*</code>
<ul>
<li>This is now in §G.2.  But you can see that an identifier has an optional leading minus, followed by an nmstart and zero or more nmchar.  It&#8217;s those nmchar that we care about.</li>
</ul>
</li>
<li><code>nmchar [_a-z0-9-]|{nonascii}|{escape}</code>
<ul>
<li>nmchar allows letters, numbers, underscores and minuses, as well as non-ascii characters and escapes.  Oooh!  Getting closer!</li>
</ul>
</li>
<li><code>nonascii    [\200-\377]</code>
<ul>
<li>This is a horrid notation.  It&#8217;s an <a href="http://en.wikipedia.org/wiki/Octal">octal</a> character range.  Octal fell out of general use in the early 80s, although Unix and C perpetuate it.  Anyway, it says that any character whose code is between 128 and 255 is allowed.</li>
</ul>
</li>
</ul>
<p>So we get an answer: Because é is U+00E9 (or, 233 decimal), it&#8217;s allowed as part of an identifier in a <span class="caps">CSS</span> file.</p>
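<p>The walk through the grammar can be sketched as a regular expression.  This is only a rough transcription of the productions quoted above, taking the octal range at face value; <code>isCssIdent</code> and the other names are made up for illustration, and it&#8217;s written to run under Node.js.</p>

```javascript
// Building blocks from §G.2, written as regular-expression source text.
// nonascii takes the octal range [\200-\377] at face value: U+0080 to U+00FF.
const nonascii = "[\\u0080-\\u00FF]";
// unicode: a backslash, one to six hex digits, one optional trailing whitespace.
const unicode = "\\\\[0-9a-fA-F]{1,6}(?:\\r\\n|[ \\t\\r\\n\\f])?";
// escape: a unicode escape, or a backslash before any other ordinary character.
const escapeSeq = `(?:${unicode}|\\\\[^\\r\\n\\f0-9a-fA-F])`;
const nmstart = `(?:[_a-zA-Z]|${nonascii}|${escapeSeq})`;
const nmchar = `(?:[_a-zA-Z0-9-]|${nonascii}|${escapeSeq})`;
// ident: optional minus, one nmstart, then zero or more nmchar.
const identPattern = new RegExp(`^-?${nmstart}${nmchar}*$`);

// Hypothetical helper: does this class name match the ident production?
function isCssIdent(name) {
  return identPattern.test(name);
}

console.log(isCssIdent("clich\u00E9")); // true
console.log(isCssIdent("9lives"));      // false: a digit is not an nmstart
```

<p>The é is written as <code>\u00E9</code> so that the example doesn&#8217;t depend on the encoding of the file it&#8217;s saved in.</p>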
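<p>But it&#8217;s worth noting the apparent limit of 255 here.  It&#8217;s an artefact of the notation: the spec notes that <code>\377</code> is simply the highest character number that Flex can deal with, and that it should be read as the highest code point in Unicode.  So characters above 255 (e.g. Ā [U+0100]) are allowed verbatim in a <span class="caps">CSS</span> file too.  If you&#8217;d rather escape one, the <code>escape</code> declaration in that grammar says to write a backslash followed by up to six hex digits: <code>\100</code>.  Which is quite nasty.</p>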
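<p>For what it&#8217;s worth, the <code>unicode</code> production in §G.2 is a backslash, one to six hex digits and an optional single whitespace character.  Here&#8217;s a minimal sketch of producing such escapes; <code>toCssHexEscapes</code> is a made-up helper, and it only deals with characters outside <span class="caps">ASCII</span>, not with <span class="caps">ASCII</span> punctuation that would also need escaping.</p>

```javascript
// Hypothetical helper: write every character outside ASCII as a CSS hex escape.
function toCssHexEscapes(name) {
  return Array.from(name, (ch) => {
    const code = ch.codePointAt(0);
    // The trailing space ends the escape, so that a following hex digit
    // is not swallowed into it.
    return code > 0x7f ? "\\" + code.toString(16) + " " : ch;
  }).join("");
}

console.log(toCssHexEscapes("\u0100"));      // \100 (with a trailing space)
console.log(toCssHexEscapes("clich\u00E9")); // clich\e9 (with a trailing space)
```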
<p>There&#8217;s one other wrinkle to consider before this will work.  You also need to ensure that the <span class="caps">CSS</span> file is served over <span class="caps">HTTP</span> using the correct character set.  If you&#8217;ve saved it as <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1">Latin-1</a>, you need to ensure that it&#8217;s served up with this header:</p>
<pre>
  Content-Type: text/css; charset="iso-8859-1"
</pre>
<p>Latin-1 is the default, so the charset could be left off, but it&#8217;s usually better to be explicit.  Likewise, if the file is saved as <a href="http://en.wikipedia.org/wiki/UTF-8"><span class="caps">UTF</span>-8</a>, you need to add this header:</p>
<pre>
  Content-Type: text/css; charset="UTF-8"
</pre>
<p>If you&#8217;re using <a href="http://httpd.apache.org/">Apache</a>, check out the <a href="http://httpd.apache.org/docs/2.2/mod/core.html#adddefaultcharset">AddDefaultCharset</a> and <a href="http://httpd.apache.org/docs/2.2/mod/mod_mime.html#addcharset">AddCharset</a> directives.</p>
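<p>To see why the header matters, here&#8217;s a small sketch of what the same selector looks like on the wire in each encoding, and what a browser would see if <span class="caps">UTF</span>-8 bytes were labelled as Latin-1.  It assumes Node.js and its <code>Buffer</code> class; the <code>hex</code> helper is made up for illustration.</p>

```javascript
// The selector from the top of the post; \u00E9 is é.
const selector = "p.clich\u00E9";

// The same text saved in the two encodings discussed above.
const latin1Bytes = Buffer.from(selector, "latin1");
const utf8Bytes = Buffer.from(selector, "utf8");

// Hypothetical helper: show bytes as two-digit hex.
function hex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(hex(latin1Bytes.subarray(-1))); // e9
console.log(hex(utf8Bytes.subarray(-2)));   // c3 a9

// UTF-8 bytes read as if they were Latin-1: the é turns into two characters.
const misread = utf8Bytes.toString("latin1");
console.log(misread === "p.clich\u00C3\u00A9"); // true
```

<p>The misread selector no longer matches <code>class="cliché"</code>, which is exactly the failure the charset parameter is there to prevent.</p>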
<p>So that&#8217;s <span class="caps">CSS</span>.  But what about <span class="caps">HTML</span>?</p>
<p><span class="caps">HTML</span> is defined in the <a href="http://www.w3.org/TR/REC-html40/"><span class="caps">HTML 4</span>.01 specification</a>.  It&#8217;s defined using <a href="http://en.wikipedia.org/wiki/Standard_Generalized_Markup_Language"><span class="caps">SGML</span></a>, which means more complication in order to work out what the heck&#8217;s going on.  Thankfully, everybody knows that there are four ways to get an é into <span class="caps">HTML</span>:</p>
<ul>
<li>A literal <code>é</code>.</li>
<li>A character entity: <code>&amp;eacute;</code></li>
<li>A decimal character reference: <code>&amp;#233;</code></li>
<li>A hex character reference: <code>&amp;#xE9;</code></li>
</ul>
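<p>All of these end up as the same character.  Here&#8217;s a rough sketch of a decoder for them, nowhere near a real <span class="caps">HTML</span> parser: numeric references are an ampersand, a hash sign and a decimal number (or <code>x</code> and a hex number), and the only named entity it knows is <code>eacute</code>.  The names <code>decodeCharRefs</code> and <code>NAMED</code> are made up, and the ampersand is written as <code>\x26</code> so that the listing survives being embedded in a web page.</p>

```javascript
// Only the entity used in this post; HTML 4 defines many more.
const NAMED = new Map([["eacute", 0xe9]]);

// Hypothetical helper: replace character references with their characters.
// \x26 is the ampersand.
function decodeCharRefs(text) {
  return text.replace(/\x26(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole, body) => {
    if (body[0] !== "#") {
      // Named entity: leave unknown names untouched.
      return NAMED.has(body) ? String.fromCodePoint(NAMED.get(body)) : whole;
    }
    const isHex = body[1] === "x" || body[1] === "X";
    const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
    return String.fromCodePoint(code);
  });
}

const literal = "clich\u00E9";
console.log(decodeCharRefs("clich\x26eacute;") === literal); // true
console.log(decodeCharRefs("clich\x26#233;") === literal);   // true
console.log(decodeCharRefs("clich\x26#xE9;") === literal);   // true
```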
<p>In order to figure out what characters are allowed in a class attribute, though, you have to go and start looking at the <a href="http://www.w3.org/TR/REC-html40/sgml/dtd.html"><span class="caps">DTD</span></a>:</p>
<ul>
<li>The <a href="http://www.w3.org/TR/REC-html40/sgml/dtd.html#coreattrs">coreattrs</a> entity is the first mention.  It defines a class as being some <span class="caps">CDATA</span>.</li>
<li>The definition of <span class="caps">CDATA</span> is an intrinsic part of <span class="caps">SGML</span>, the details of which can be altered by the <a href="http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html"><span class="caps">SGML</span> Declaration for <span class="caps">HTML 4</span></a>.  There&#8217;s a section at the beginning which lists which characters are allowed.  It includes a large number of Unicode characters, all above 160 decimal.</li>
</ul>
<p>That means that it&#8217;s safe to include a character via any of the above methods.</p>
<p>But there are a few more wrinkles.  Firstly, whilst the two character references above are intrinsic to <span class="caps">HTML</span> (via <span class="caps">SGML</span>), where does the character entity come from?  Well, they are defined as part of the <span class="caps">HTML</span> spec: <a href="http://www.w3.org/TR/REC-html40/sgml/entities.html">Character entity references in <span class="caps">HTML 4</span></a>.</p>
<p>There&#8217;s also the problem of the character encoding in case you use the literal é.  Like the <span class="caps">CSS</span> above, you need to ensure that your web server is telling everybody what character encoding the file is served as.  Actually, for <span class="caps">HTML</span>, it&#8217;s less of a problem, as the browser will generally auto-detect character encodings.  But that&#8217;s not necessarily reliable, so it&#8217;s better to be explicit.  And in <span class="caps">HTML</span>, you can put the character encoding in the file itself:</p>
<pre>
  &lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;
</pre>
<p>Yes, this is a little bit like having a French dictionary containing the words &#8220;écrit en français&#8221; in the front.   But it&#8217;s a good idea to have both this and the <span class="caps">HTTP</span> declaration (and they <em>must</em> match).</p>
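<p>Checking that the two declarations match is easy enough to sketch.  The helpers below are made up for illustration; they compare the charset names as written, so they won&#8217;t spot two different aliases for the same encoding.</p>

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
// Accepts the value with or without quotes and ignores case.
function charsetOf(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: do the HTTP header and the meta element agree?
function declarationsAgree(httpContentType, metaContent) {
  const fromHttp = charsetOf(httpContentType);
  const fromMeta = charsetOf(metaContent);
  return fromHttp !== null && fromHttp === fromMeta;
}

console.log(charsetOf('text/css; charset="iso-8859-1"')); // iso-8859-1
console.log(declarationsAgree("text/html; charset=UTF-8", "text/html; charset=utf-8"));      // true
console.log(declarationsAgree("text/html; charset=iso-8859-1", "text/html; charset=utf-8")); // false
```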
<p>I&#8217;m not going to talk about <span class="caps">XHTML</span>/XML because it&#8217;s not in widespread use (i.e. serving it up as <code>application/xhtml+xml</code>).</p>
<p>Finally, what about JavaScript?  Well, it&#8217;s defined as <a href="http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf"><span class="caps">ECMA</span>-262</a> (3rd edition).  That spec explicitly defines everything in terms of Unicode, so it&#8217;s mostly OK.  You can still access characters you can&#8217;t type via an escape mechanism: <code>\u00E9</code> (see the definition of UnicodeEscapeSequence on page 19).  Additionally, JavaScript can get at the unicode characters in the <span class="caps">DOM</span> quite easily:</p>
<pre>
  &lt;p id='a1' class="cliché"&gt;Lessons will be learned.&lt;/p&gt;
  &lt;script type="text/javascript"&gt;
    alert(document.getElementById('a1').className)
  &lt;/script&gt;
</pre>
<p>As always, JavaScript files served over <span class="caps">HTTP</span> need to be supplied with the correct character-encoding through the Content-Type header.  Just like <span class="caps">CSS</span> and <span class="caps">HTML</span>.</p>
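<p>A quick sketch shows that the escape and the literal are the same thing once the script has been parsed.  It runs under Node.js, so a plain string stands in for what <code>className</code> would return in the browser, and <code>hasClass</code> is a made-up helper.</p>

```javascript
// The escape yields the same one-character string as typing é directly.
const escaped = "clich\u00E9";
console.log(escaped.length);                      // 6
console.log(escaped.codePointAt(5).toString(16)); // e9

// Hypothetical helper: test a className string for one class name.
function hasClass(className, wanted) {
  return className.split(/\s+/).includes(wanted);
}

console.log(hasClass("intro clich\u00E9", escaped)); // true
console.log(hasClass("intro cliche", escaped));      // false
```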
<p>So what&#8217;s the take-away from all this?</p>
<ul>
<li>Use literal characters and <span class="caps">UTF</span>-8 everywhere.  It&#8217;s consistent and extensible.</li>
<li>Know how to look in the specs when something&#8217;s going wrong – you&#8217;ll know whether it&#8217;s you, or the browser that&#8217;s getting it wrong.</li>
<li>Characters are hard, let&#8217;s go shopping!</li>
</ul>
<p>Jeremy <a href="http://twitter.com/adactio/statuses/511627392">worked it all out</a> in far less time than I did.</p>
<blockquote>
<p>Figuring I should be okay as long as I use a character entity. <a href="http://tinyurl.com/7p7qc">http://tinyurl.com/7p7qc</a></p>
</blockquote>
<p>Looking at that link, I notice that <span class="caps">CDATA</span> is handled specially within <span class="caps">STYLE</span> and <span class="caps">SCRIPT</span> tags.  Yet more exceptions to the rules!</p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2007/12/19/cliches-are-hard/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
	</channel>
</rss>

