<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jabbering Giraffe &#187; unicode</title>
	<atom:link href="http://happygiraffe.net/blog/tag/unicode/feed/" rel="self" type="application/rss+xml" />
	<link>http://happygiraffe.net/blog</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 20:49:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>The joy of apple keyboards</title>
		<link>http://happygiraffe.net/blog/2009/11/03/the-joy-of-apple-keyboards/</link>
		<comments>http://happygiraffe.net/blog/2009/11/03/the-joy-of-apple-keyboards/#comments</comments>
		<pubDate>Tue, 03 Nov 2009 22:56:13 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[osx]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/blog/?p=1638</guid>
		<description><![CDATA[Recently, I&#8217;ve been using a Linux desktop for the first time in ages. It&#8217;s Ubuntu (Hardy Heron), and it looks nice. But after using a mac for three years, I&#8217;m really missing quite a few little things. The ability to &#8230; <a href="http://happygiraffe.net/blog/2009/11/03/the-joy-of-apple-keyboards/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Recently, I&#8217;ve been using a Linux desktop for the first time in ages.  It&#8217;s Ubuntu (Hardy Heron), and it looks nice.  But after using a Mac for three years, I&#8217;m really missing quite a few little things.</p>
<ol>
<li>The ability to drag and drop anything anywhere.</li>
<li>Being able to type a wide range of Unicode characters easily.</li>
</ol>
<p>On a Mac, it&#8217;s really, really easy to type in a wide variety of useful characters.  All you need is alt (&#x2325;), sometimes known as “option”.</p>
<table border="1">
<tr>
<th>Keys</th>
<th>Character</th>
<th>Name</th>
</tr>
<tr>
<td><code>&#x2325; ;</code></td>
<td>…</td>
<td>HORIZONTAL ELLIPSIS</td>
</tr>
<tr>
<td><code>&#x2325; -</code></td>
<td>–</td>
<td>EN DASH</td>
</tr>
<tr>
<td><code>&#x2325; &#x21E7; -</code></td>
<td>—</td>
<td>EM DASH</td>
</tr>
<tr>
<td><code>&#x2325; [</code></td>
<td>“</td>
<td>LEFT DOUBLE QUOTATION MARK</td>
</tr>
<tr>
<td><code>&#x2325; &#x21E7; [</code></td>
<td>”</td>
<td>RIGHT DOUBLE QUOTATION MARK</td>
</tr>
<tr>
<td><code>&#x2325; 2</code></td>
<td>™</td>
<td>TRADE MARK SIGN</td>
</tr>
<tr>
<td><code>&#x2325; 8</code></td>
<td>•</td>
<td>BULLET</td>
</tr>
<tr>
<td><code>&#x2325; e &nbsp; e</code></td>
<td>é</td>
<td>LATIN SMALL LETTER E WITH ACUTE</td>
</tr>
</table>
<p>How did I find all this out?  The lovely keyboard viewer that comes with OS X.  You can get the flag in your menu bar by going to International in system preferences and checking “Show input menu in menu bar.”</p>
<div style="text-align:center;"><img src="http://happygiraffe.net/blog/wp-content/uploads/2009/11/input-menu.png" alt="Selecting the keyboard viewer in the input menu" border="0" width="237" height="196" /></div>
<div style="text-align:center;"><img src="http://happygiraffe.net/blog/wp-content/uploads/2009/11/keyboard-viewer-normal.png" alt="OS X Keyboard Viewer (normal state)" border="0" width="318" height="175" /></div>
<p>Now, hold down alt and see what you can get (try alt and shift too).</p>
<div style="text-align:center;"><img src="http://happygiraffe.net/blog/wp-content/uploads/2009/11/image-capture-alt.png" alt="OS X Keyboard Viewer (alt)" border="0" width="318" height="175" /></div>
<p>But not everything is attached to a key.  In case you need more characters, there&#8217;s always the character palette.  It&#8217;s usually bound to <code>&#x2325; &#x2318; T</code> and is also available from the Edit menu.  Here, you can get access to the vast repertoire of characters in Unicode.  Need an arrow?</p>
<div style="text-align:center;"><img src="http://happygiraffe.net/blog/wp-content/uploads/2009/11/character-palette-arrows.png" alt="Arrows in the Character Palette" border="0" width="370" height="390" /></div>
<p>There&#8217;s a lot you can do with the character palette, but the search box is probably the best way in.  Just tap in a bit of the name of the character you&#8217;re looking for and see what turns up.</p>
<p>This easy access to a wide array of characters is something I&#8217;ve rather come to take for granted in OS X.  So coming back to the Linux desktop, it was odd to find that I couldn&#8217;t as readily type them in.  Of course, I haven&#8217;t invested the time in figuring out how to set up <a href="http://en.wikipedia.org/wiki/X_keyboard_extension">XKB</a> correctly.  Doubtless I could achieve many of the same things.  But my past experiences of XKB and its documentation have shown me how complicated it can be, so I don&#8217;t rate my ability to pull it off.</p>
<p>The end result is that I&#8217;m spending most of my time on the (mac) laptop and ignoring the desktop.  I do like my characters. <img src='http://happygiraffe.net/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2009/11/03/the-joy-of-apple-keyboards/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Java Platform Encoding</title>
		<link>http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/</link>
		<comments>http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 13:45:37 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/blog/?p=1624</guid>
		<description><![CDATA[This came up at $WORK recently. We had a java program that was given input through command line arguments. Unfortunately, it went wrong when being passed UTF-8 characters (U+00A9 COPYRIGHT SIGN [©]). Printing out the command line arguments from inside &#8230; <a href="http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This came up at <code>$WORK</code> recently.  We had a Java program that was given input through command line arguments.  Unfortunately, it went wrong when being passed non-ASCII characters encoded as UTF-8, such as U+00A9 COPYRIGHT SIGN [©].  Printing out the command line arguments from inside Java showed that we had double-encoded Unicode.</p>
<p>Initially, we just slapped <code>-Dfile.encoding=UTF-8</code> on the command line.  But that failed when the site that called this code went through an automatic restart.  So we investigated the issue further.</p>
<p>We quickly found that the presence or absence of the <code>LANG</code> environment variable had a bearing on the matter.</p>
<p><strong>NB:</strong> <code>ShowSystemProperties.jar</code> is very simple and just lists all system properties in sorted order.</p>
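<p>The source of that jar isn&#8217;t shown here, so the following is only a minimal sketch of what such a tool might look like.  The class name and the <code>format()</code> helper are my own assumptions, not the original code.</p>

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical stand-in for ShowSystemProperties.jar; the original source is not shown.
class ShowSystemProperties {
    // Render every property as "name=value", one per line, sorted by name.
    static String format(Properties props) {
        String[] names = props.stringPropertyNames().toArray(new String[0]);
        Arrays.sort(names);
        StringBuilder out = new StringBuilder();
        for (String name : names) {
            out.append(name).append('=').append(props.getProperty(name)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(System.getProperties()));
    }
}
```

<p>Piping the output of something like this through <code>grep encoding</code> is all the transcripts in this post do.</p>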

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ java <span style="color: #660033;">-version</span>
java version <span style="color: #ff0000;">&quot;1.6.0_16&quot;</span>
Java<span style="color: #7a0874; font-weight: bold;">&#40;</span>TM<span style="color: #7a0874; font-weight: bold;">&#41;</span> SE Runtime Environment <span style="color: #7a0874; font-weight: bold;">&#40;</span>build 1.6.0_16-b01<span style="color: #7a0874; font-weight: bold;">&#41;</span>
Java HotSpot<span style="color: #7a0874; font-weight: bold;">&#40;</span>TM<span style="color: #7a0874; font-weight: bold;">&#41;</span> Server VM <span style="color: #7a0874; font-weight: bold;">&#40;</span>build <span style="color: #000000;">14.2</span>-b01, mixed mode<span style="color: #7a0874; font-weight: bold;">&#41;</span>
$ <span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #007800;">$LANG</span>
en_GB.UTF-<span style="color: #000000;">8</span>
$ java <span style="color: #660033;">-jar</span> ShowSystemProperties.jar <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> encoding
file.encoding=UTF-<span style="color: #000000;">8</span>
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=UTF-<span style="color: #000000;">8</span>
$ <span style="color: #007800;">LANG</span>= java <span style="color: #660033;">-jar</span> ShowSystemProperties.jar <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> encoding
file.encoding=ANSI_X3.4-<span style="color: #000000;">1968</span>
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=ANSI_X3.4-<span style="color: #000000;">1968</span></pre></div></div>

<p>So, <code>LANG</code> changes <code>file.encoding</code>, but it also changes an internal property, <code>sun.jnu.encoding</code>.</p>
<p>Next, see what happens when we add the explicit override.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #007800;">LANG</span>= java -Dfile.encoding=UTF-<span style="color: #000000;">8</span> <span style="color: #660033;">-jar</span> ShowSystemProperties.jar <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> encoding
file.encoding=UTF-<span style="color: #000000;">8</span>
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=ANSI_X3.4-<span style="color: #000000;">1968</span></pre></div></div>

<p>Hey!  <code>sun.jnu.encoding</code> isn&#8217;t changing!</p>
<p>Now, as far as I can see, <code>sun.jnu.encoding</code> isn&#8217;t actually documented anywhere.  So you have to go into the source code for Java (OpenJDK&#8217;s <a href="http://hg.openjdk.java.net/jdk6/jdk6/jdk/rev/536cbf2d9d0e">jdk6-b16</a> in this case) to figure out what&#8217;s up.</p>
<p>Let&#8217;s start in <code>main()</code>, which is in <a href="http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/536cbf2d9d0e/src/share/bin/java.c">java.c</a>.  Actually, it&#8217;s <code>JavaMain()</code> that we&#8217;re really interested in.  In there you can see:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">int</span> JNICALL
JavaMain<span style="color: #009900;">&#40;</span><span style="color: #993333;">void</span> <span style="color: #339933;">*</span> _args<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  …
  jobjectArray mainArgs<span style="color: #339933;">;</span>
&nbsp;
  …
  <span style="color: #808080; font-style: italic;">/* Build argument array */</span>
  mainArgs <span style="color: #339933;">=</span> NewPlatformStringArray<span style="color: #009900;">&#40;</span>env<span style="color: #339933;">,</span> argv<span style="color: #339933;">,</span> argc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>mainArgs <span style="color: #339933;">==</span> NULL<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      ReportExceptionDescription<span style="color: #009900;">&#40;</span>env<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #b1b100;">goto</span> leave<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
  …
<span style="color: #009900;">&#125;</span></pre></div></div>

<p><code>NewPlatformStringArray()</code> is defined in <code>java.c</code> and calls <code>NewPlatformString()</code> repeatedly with each command line argument.  In turn, that calls <code>new String(byte[], encoding)</code>.  It gets the encoding from <code>getPlatformEncoding()</code>.  That essentially calls <code>System.getProperty("sun.jnu.encoding")</code>.</p>
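<p>To see why the encoding passed to <code>new String(byte[], encoding)</code> matters, here is a small sketch that decodes the UTF-8 bytes of the copyright sign two ways.  The class and method names are made up for illustration; <code>US-ASCII</code> stands in for the <code>ANSI_X3.4-1968</code> value seen above, which is the formal name for ASCII.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; it mimics the launcher's per-argument new String(bytes, encoding).
class PlatformStringDemo {
    // Decode the raw bytes of one argument using the named encoding.
    static String decode(byte[] rawArg, String encodingName) {
        return new String(rawArg, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // U+00A9 COPYRIGHT SIGN is the two bytes 0xC2 0xA9 in UTF-8.
        byte[] copyright = "\u00a9".getBytes(StandardCharsets.UTF_8);
        String asUtf8 = decode(copyright, "UTF-8");
        String asAscii = decode(copyright, "US-ASCII");
        System.out.println("UTF-8 round trip ok:    " + asUtf8.equals("\u00a9"));
        System.out.println("US-ASCII round trip ok: " + asAscii.equals("\u00a9"));
    }
}
```

<p>Only the UTF-8 decode gives back the original character; with the ASCII decode the argument is already damaged before <code>main()</code> ever sees it.</p>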
<p>So where does that property get set?  If you look in <a href="http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/536cbf2d9d0e/src/share/native/java/lang/System.c"><code>System.c</code></a>, <code>Java_java_lang_System_initProperties()</code> calls:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">    PUTPROP<span style="color: #009900;">&#40;</span>props<span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;sun.jnu.encoding&quot;</span><span style="color: #339933;">,</span> sprops<span style="color: #339933;">-&gt;</span>sun_jnu_encoding<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p><code>sprops</code> appears to get set in <code>GetJavaProperties()</code> in <a href="http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/536cbf2d9d0e/src/solaris/native/java/lang/java_props_md.c">java_props_md.c</a>.  This interprets various environment variables, including the ones that control the locale.  It appears to pull out everything after the period in the <code>LANG</code> environment variable as the encoding in order to get <code>sun_jnu_encoding</code>.</p>
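<p>As a rough illustration of that &#8220;everything after the period&#8221; step, here is a simplified Java sketch.  It is not a port of the C code: the class and method names are hypothetical, dropping an <code>@modifier</code> suffix is my assumption, and the real code handles more cases (such as what to do when no encoding is given).</p>

```java
// Hypothetical helper; a simplification of the locale parsing, not the JDK's code.
class LangEncoding {
    // Returns the codeset part of a locale string such as "en_GB.UTF-8",
    // or null when the string has no codeset part.
    static String encodingFromLang(String lang) {
        if (lang == null) return null;
        int dot = lang.indexOf('.');
        if (dot == -1) return null;
        String encoding = lang.substring(dot + 1);
        int at = encoding.indexOf('@');   // drop a modifier like "@euro"
        if (at != -1) encoding = encoding.substring(0, at);
        return encoding.isEmpty() ? null : encoding;
    }

    public static void main(String[] args) {
        System.out.println(encodingFromLang(System.getenv("LANG")));
    }
}
```

<p>With <code>LANG</code> empty there is nothing after the period to find, which fits the ASCII fallback seen in the earlier transcript.</p>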
<p>Phew.  So we now know that there is a special property which gets used for interpreting &#8220;platform&#8221; strings like:</p>
<ul>
<li>Command line arguments</li>
<li>Main class name</li>
<li>Environment variables</li>
</ul>
<p>And it <em>can</em> be overridden:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ <span style="color: #007800;">LANG</span>= java -Dsun.jnu.encoding=UTF-<span style="color: #000000;">8</span> -Dfile.encoding=UTF-<span style="color: #000000;">8</span> <span style="color: #660033;">-jar</span> ShowSystemProperties.jar <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">grep</span> encoding
file.encoding=UTF-<span style="color: #000000;">8</span>
file.encoding.pkg=sun.io
sun.io.unicode.encoding=UnicodeLittle
sun.jnu.encoding=UTF-<span style="color: #000000;">8</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>No escape() from JavaScript</title>
		<link>http://happygiraffe.net/blog/2009/09/14/no-escape-from-javascript/</link>
		<comments>http://happygiraffe.net/blog/2009/09/14/no-escape-from-javascript/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 12:50:58 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jslint]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/blog/?p=1618</guid>
		<description><![CDATA[A couple of days ago, we got caught out by a few encoding issues in a site at $WORK. The Perl related ones were fairly self explanatory and I&#8217;d seen before (e.g. not calling decode_utf8() on the query string parameters). &#8230; <a href="http://happygiraffe.net/blog/2009/09/14/no-escape-from-javascript/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A couple of days ago, we got caught out by a few encoding issues in a site at <code>$WORK</code>.  The Perl-related ones were fairly self-explanatory and were things I&#8217;d seen before (e.g. not calling <a href="http://search.cpan.org/dist/Encode/Encode.pm#$string_=_decode_utf8%28$octets_[,_CHECK]%29;"><code>decode_utf8()</code></a> on the query string parameters).  But the JavaScript part was new to me.</p>
<p>The problem was that we were using JavaScript to create a URL, but this wasn&#8217;t encoding some characters correctly.  After a bit of investigation, the problem came down to the difference between <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Functions#escape_and_unescape_Functions"><code>escape()</code></a> and <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Functions/encodeURIComponent"><code>encodeURIComponent()</code></a>.</p>
<style type="text/css">
#escaper {
  border: 1px dotted black;
  text-align: center;
}
#escaper thead {
  background: #eee;
}
#escaper thead th {
  border-bottom: 1px solid black;
}
#escaper td, #escaper th {
  padding: .25em;
}
</style>
<table id="escaper" cellspacing="0">
<thead>
<tr>
<th>input</th>
<th><code>escape(…)</code></th>
<th><code>encodeURIComponent(…)</code></th>
</tr>
</thead>
<tr>
<td><code>a&#038;b</code></td>
<td><code>a%26b</code></td>
<td><code>a%26b</code></td>
</tr>
<tr>
<td><code>1+2</code></td>
<td><code>1+2</code></td>
<td><code>1%2B2</code></td>
</tr>
<tr>
<td><code>caf&#xE9;</code></td>
<td><code>caf%E9</code></td>
<td><code>caf%C3%A9</code></td>
</tr>
<tr>
<td><code>&#x100;dam</code></td>
<td><code>%u0100dam</code></td>
<td><code>%C4%80dam</code></td>
</tr>
</table>
<p>The last is particularly troublesome, as no server I know of will support decoding that <code>%u</code> form.</p>
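<p>To make the server side of this concrete, here is a sketch that feeds the values from the table to Java&#8217;s <code>java.net.URLDecoder</code>, which does form-style decoding.  It is only one example of a server-side decoder, and the class and helper names are invented for the illustration.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo of how one common server-side decoder treats each form.
class PercentDecodingDemo {
    // Form-style decoding, interpreting the percent-encoded bytes as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // True if the input can be decoded at all.
    static boolean decodes(String encoded) {
        try {
            decodeUtf8(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false; // malformed escape such as "%u0"
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8("caf%C3%A9").equals("caf\u00e9")); // encodeURIComponent form
        System.out.println(decodeUtf8("caf%E9").equals("caf\u00e9"));    // escape() form
        System.out.println(decodeUtf8("1+2").equals("1 2"));             // unescaped plus
        System.out.println(decodes("%u0100dam"));                        // escape() %u form
    }
}
```

<p>The <code>encodeURIComponent()</code> output round-trips.  The <code>escape()</code> output does not: <code>%E9</code> is not valid UTF-8 on its own, the bare <code>+</code> comes back as a space, and the <code>%u</code> form is rejected outright.</p>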
<p>The takeaway is that <code>encodeURIComponent()</code> <em>always</em> encodes as UTF-8 and doesn&#8217;t miss characters out.  As far as I can see, this means you should simply never use <code>escape()</code>.  Which is why I&#8217;ve asked Douglas Crockford to <a href="http://tech.groups.yahoo.com/group/jslint_com/message/906">add it as a warning</a> to <a href="http://jslint.com/">JSLint</a>.</p>
<p>Once we switched the site&#8217;s JavaScript from <code>escape()</code> to <code>encodeURIComponent()</code>, everything worked as expected.</p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2009/09/14/no-escape-from-javascript/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mixed Character Encodings</title>
		<link>http://happygiraffe.net/blog/2009/03/20/mixed-character-encodings/</link>
		<comments>http://happygiraffe.net/blog/2009/03/20/mixed-character-encodings/#comments</comments>
		<pubDate>Fri, 20 Mar 2009 18:57:44 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[perl]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/blog/?p=1490</guid>
		<description><![CDATA[I&#8217;ve been given a MySQL dump file at work. It&#8217;s got problems — Windows-1252 and UTF-8 characters are mixed in. Bleargh. How can we clean it up to be all UTF-8? Perl to the rescue. use Encode qw&#40; encode decode &#8230; <a href="http://happygiraffe.net/blog/2009/03/20/mixed-character-encodings/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been given a MySQL dump file at work.  It&#8217;s got problems — <a href="http://en.wikipedia.org/wiki/Windows-1252">Windows-1252</a> and <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a> characters are mixed in.  Bleargh.  How can we clean it up to be all UTF-8?  Perl to the rescue.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">use</span> Encode <span style="color: #000066;">qw</span><span style="color: #009900;">&#40;</span> encode decode <span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;"># From http://www.cl.cam.ac.uk/~mgk25/unicode.html#perl</span>
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$utf8_char</span> <span style="color: #339933;">=</span> <span style="color: #000066;">qr</span><span style="color: #009900;">&#123;</span>
    <span style="color: #009900;">&#40;</span><span style="color: #339933;">?:</span>
        <span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\x00</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\x7f</span><span style="color: #009900;">&#93;</span>
        <span style="color: #339933;">|</span>
        <span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\xc0</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\xdf</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\x80</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\xbf</span><span style="color: #009900;">&#93;</span>
        <span style="color: #339933;">|</span>
        <span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\xe0</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\xef</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\x80</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\xbf</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#123;</span><span style="color: #cc66cc;">2</span><span style="color: #009900;">&#125;</span>
        <span style="color: #339933;">|</span>
        <span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\xf0</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\xf7</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">\x80</span><span style="color: #339933;">-</span><span style="color: #0000ff;">\xbf</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#123;</span><span style="color: #cc66cc;">3</span><span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span>x<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">&lt;&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">s</span><span style="color: #009900;">&#123;</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$utf8_char</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">|</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">.</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066;">defined</span> <span style="color: #0000ff;">$1</span> <span style="color: #009900;">&#41;</span>    <span style="color: #009900;">&#123;</span> <span style="color: #0000ff;">$1</span> <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">elsif</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066;">defined</span> <span style="color: #0000ff;">$2</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span> encode<span style="color: #009900;">&#40;</span> <span style="color: #ff0000;">&quot;utf8&quot;</span><span style="color: #339933;">,</span> decode<span style="color: #009900;">&#40;</span> <span style="color: #ff0000;">&quot;cp1252&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$2</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">else</span>                 <span style="color: #009900;">&#123;</span> <span style="color: #ff0000;">&quot;&quot;</span> <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #b1b100;">ge</span><span style="color: #339933;">;</span>
    <span style="color: #000066;">print</span> <span style="color: #0000ff;">$_</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Yes, that&#8217;s a regex for matching UTF-8 characters (courtesy of <a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html#perl">Markus Kuhn</a>).  I hadn&#8217;t considered using a regex when I first started down this road.  I started examining bytes by hand.  And the code was about three times longer.</p>
<p>Anyway, this seems to solve the issues I was having.</p>
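<p>For comparison, the by-hand approach of examining bytes might look something like the Java sketch below.  This is not the code I started with; the names are made up, and it shares the regex&#8217;s limitations (it only checks the shape of a sequence, so a Windows-1252 byte that happens to be followed by continuation bytes will be left alone).</p>

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-by-byte equivalent of the Perl substitution.
class MixedEncodingFixer {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Length of the UTF-8-shaped sequence starting at i, or 0 if there is none.
    static int utf8Length(byte[] in, int i) {
        int b = in[i] & 0xff;
        int len;
        if (b <= 0x7f) len = 1;
        else if (b >= 0xc0 && b <= 0xdf) len = 2;
        else if (b >= 0xe0 && b <= 0xef) len = 3;
        else if (b >= 0xf0 && b <= 0xf7) len = 4;
        else return 0;
        if (i + len > in.length) return 0;
        for (int k = 1; k < len; k++) {
            int c = in[i + k] & 0xff;
            if (c < 0x80 || c > 0xbf) return 0; // not a continuation byte
        }
        return len;
    }

    // Copy UTF-8 sequences through; re-encode any other byte from Windows-1252.
    static byte[] fix(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            int len = utf8Length(in, i);
            if (len > 0) {
                out.write(in, i, len);
                i += len;
            } else {
                byte[] enc = new String(in, i, 1, CP1252).getBytes(StandardCharsets.UTF_8);
                out.write(enc, 0, enc.length);
                i++;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "caf" + Windows-1252 e-acute, a space, then "caf" + UTF-8 e-acute.
        byte[] mixed = { 'c', 'a', 'f', (byte) 0xe9, ' ', 'c', 'a', 'f', (byte) 0xc3, (byte) 0xa9 };
        String fixed = new String(fix(mixed), StandardCharsets.UTF_8);
        System.out.println(fixed.equals("caf\u00e9 caf\u00e9"));
    }
}
```

<p>That is noticeably longer than the regex version for the same result.</p>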
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2009/03/20/mixed-character-encodings/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Character Encodings Bite Again</title>
		<link>http://happygiraffe.net/blog/2008/07/31/character-encodings-bite-again/</link>
		<comments>http://happygiraffe.net/blog/2008/07/31/character-encodings-bite-again/#comments</comments>
		<pubDate>Thu, 31 Jul 2008 23:17:06 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[servlets]]></category>
		<category><![CDATA[solr]]></category>
		<category><![CDATA[tomcat]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2008/07/31/character-encodings-bite-again/</guid>
		<description><![CDATA[A colleague gave me a nudge today. &#8220;This page doesn&#8217;t validate because of an encoding error&#8221;. It was fairly simple: the string &#8220;Jiménez&#8221; contained a single byte&#8212;Latin1. Ooops. It turned out that we were generating the page as ISO-8859-1 instead &#8230; <a href="http://happygiraffe.net/blog/2008/07/31/character-encodings-bite-again/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A colleague gave me a nudge today.  &#8220;This page doesn&#8217;t validate because of an encoding error&#8221;.  It was fairly simple: the &#8220;é&#8221; in the string &#8220;Jiménez&#8221; was a single byte, i.e. Latin-1.  Oops.  It turned out that we were generating the page as <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1"><span class="caps">ISO</span>-8859-1</a> instead of <a href="http://en.wikipedia.org/wiki/UTF-8"><span class="caps">UTF</span>-8</a> (which is what the page had been declared as in the <span class="caps">HTML</span>).</p>
<p>So, which bit of <a href="http://static.springframework.org/spring/docs/2.5.x/reference/mvc.html">Spring WebMVC</a> sets the character encoding?  A bit of poking around in the debugger didn&#8217;t pop up any obvious extension point.  So we stuck this in our <a href="http://static.springframework.org/spring/docs/2.5.x/api/org/springframework/stereotype/Controller.html">Controller</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">  response.<span style="color: #006633;">setCharacterEncoding</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;UTF-8&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>This worked, but it&#8217;s pretty awful having to do this in every single controller.  So, we poked around a bit more and found <a href="http://static.springframework.org/spring/docs/2.5.x/api/org/springframework/web/filter/CharacterEncodingFilter.html">CharacterEncodingFilter</a>.  Installing this into <code>web.xml</code> made things work.</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;">  <span style="color: #ddbb00;">&lt;</span>filter<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>filter-name<span style="color: #ddbb00;">&gt;</span>CEF<span style="color: #ddbb00;">&lt;</span>/filter-name<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>filter-class<span style="color: #ddbb00;">&gt;</span>org.springframework.web.filter.CharacterEncodingFilter<span style="color: #ddbb00;">&lt;</span>/filter-class<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>init-param<span style="color: #ddbb00;">&gt;</span>
      <span style="color: #ddbb00;">&lt;</span>param-name<span style="color: #ddbb00;">&gt;</span>encoding<span style="color: #ddbb00;">&lt;</span>/param-name<span style="color: #ddbb00;">&gt;</span>
      <span style="color: #ddbb00;">&lt;</span>param-value<span style="color: #ddbb00;">&gt;</span>UTF-8<span style="color: #ddbb00;">&lt;</span>/param-value<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>/init-param<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>init-param<span style="color: #ddbb00;">&gt;</span>
      <span style="color: #ddbb00;">&lt;</span>param-name<span style="color: #ddbb00;">&gt;</span>forceEncoding<span style="color: #ddbb00;">&lt;</span>/param-name<span style="color: #ddbb00;">&gt;</span>
      <span style="color: #ddbb00;">&lt;</span>param-value<span style="color: #ddbb00;">&gt;</span>true<span style="color: #ddbb00;">&lt;</span>/param-value<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>/init-param<span style="color: #ddbb00;">&gt;</span>
  <span style="color: #ddbb00;">&lt;</span>/filter<span style="color: #ddbb00;">&gt;</span>
  <span style="color: #ddbb00;">&lt;</span>filter-mapping<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>filter-name<span style="color: #ddbb00;">&gt;</span>CEF<span style="color: #ddbb00;">&lt;</span>/filter-name<span style="color: #ddbb00;">&gt;</span>
    <span style="color: #ddbb00;">&lt;</span>url-pattern<span style="color: #ddbb00;">&gt;</span>/*<span style="color: #ddbb00;">&lt;</span>/url-pattern<span style="color: #ddbb00;">&gt;</span>
  <span style="color: #ddbb00;">&lt;</span>/filter-mapping<span style="color: #ddbb00;">&gt;</span></pre></div></div>

<p>Whilst rummaging around in here, we noticed something interesting: the filter is set up like a Spring bean, so it doesn&#8217;t read the init-params directly.  There&#8217;s some crafty code in <a href="http://static.springframework.org/spring/docs/2.5.x/api/org/springframework/web/filter/GenericFilterBean.html">GenericFilterBean</a> to get this to work.  Check it out.</p>
<p>Anyway, that Filter ensured that we output <span class="caps">UTF</span>-8 correctly.  The <code>forceEncoding</code> parameter ensured that it was set on the response as well as the request.</p>
<p>Incidentally, we figured out where the default value of <span class="caps">ISO</span>-8859-1 gets applied.  Inside <a href="http://static.springframework.org/spring/docs/2.5.x/api/org/springframework/web/servlet/DispatcherServlet.html#render(org.springframework.web.servlet.ModelAndView,%20javax.servlet.http.HttpServletRequest,%20javax.servlet.http.HttpServletResponse)">DispatcherServlet.render()</a>, the <a href="http://static.springframework.org/spring/docs/2.5.x/api/org/springframework/web/servlet/LocaleResolver.html">LocaleResolver</a> gets called, followed by <a href="http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletResponse.html#setLocale(java.util.Locale)">ServletResponse.setLocale()</a>.  Tomcat uses the Locale to set the character encoding if it hasn&#8217;t been already.  Which frankly is a pretty daft thing to do.  Being British does not indicate my preference as to Latin-1 vs <span class="caps">UTF</span>-8.</p>
<p>Then, the next problem reared its head.  The &#8220;Jiménez&#8221; text was actually a link to search for &#8220;Jiménez&#8221; in the author field.  The <span class="caps">URL</span> itself was correctly encoded as <code>q=Jim%C3%A9nez</code>.  But when we clicked on it, it didn&#8217;t find the original article.</p>
<p>Our search is implemented in <a href="http://lucene.apache.org/solr/">Solr</a>.  So we immediately had a look at the Solr logs.  That clearly had Unicode problems (which is why it wasn&#8217;t finding any results).  The two bytes of <span class="caps">UTF</span>-8 were being interpreted as individual characters (i.e. something was interpreting the <span class="caps">URI</span> as <span class="caps">ISO</span>-8859-1).  Bugger.</p>
<p>Working backwards, we looked at the access logs for Solr.  After a brief diversion to enable the access logs for tomcat inside <span class="caps">WTP</span> inside Eclipse (oh, the pain of yak shaving), we found that the sender was passing doubly encoded <span class="caps">UTF</span>-8.  Arrgh.</p>
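<p>To show what &#8220;doubly encoded&#8221; means here, a small sketch.  This is not the code from our app: the class and method names are made up for the illustration, and <code>StandardCharsets</code> assumes Java 7 or later.  It deliberately misreads the <span class="caps">UTF</span>-8 bytes as <span class="caps">ISO</span>-8859-1 and then <span class="caps">URL</span>-encodes the result a second time.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncoding {
    // The mistake: treat UTF-8 bytes as ISO-8859-1, so every byte becomes its own character.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = "Jim\u00e9nez"; // Jimenez with an e-acute
        // Encoded once, as the link in the page was.
        System.out.println(URLEncoder.encode(name, "UTF-8"));
        // Misread as Latin-1, then encoded again on the way to the search server.
        System.out.println(URLEncoder.encode(misdecode(name), "UTF-8"));
    }
}</pre></div></div>

<p>The first line comes out as <code>Jim%C3%A9nez</code>, the second as <code>Jim%C3%83%C2%A9nez</code>: four bytes where there should be two, which is the sort of thing to look for in an access log.</p>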
<p>So we jumped all the way back to the beginning of the search, back in the Controller.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">  <span style="color: #003399;">String</span> q <span style="color: #339933;">=</span> request.<span style="color: #006633;">getParameter</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;q&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Looking at <code>q</code> in the debugger, that was also wrong.  So at that point, the only thing that could have affected it would be tomcat itself.  A quick google turned up the <code>URIEncoding</code> parameter of the <a href="http://tomcat.apache.org/tomcat-6.0-doc/config/http.html"><span class="caps">HTTP</span> connector</a>.  Setting that to <code>UTF-8</code> in <code>server.xml</code> fixed our search problem by making <code>getParameter</code> return the correct string.</p>
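<p>For reference, a sketch of what that looks like in <code>server.xml</code>.  Only the <code>URIEncoding</code> attribute matters here; the port and the other attributes are assumed to be Tomcat&#8217;s stock values, so keep whatever your own connector already has.</p>

<pre>
  &lt;Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             redirectPort="8443"
             URIEncoding="UTF-8" /&gt;
</pre>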
<p>I have no idea why tomcat doesn&#8217;t just listen to the <code>request.setCharacterEncoding()</code> that the CharacterEncodingFilter performs, but there you go.</p>
<p>So, the lessons are:</p>
<ol>
<li>Use CharacterEncodingFilter with Spring WebMVC to get the correct output encoding (and input encoding for <span class="caps">POST</span> requests).</li>
<li>Always configure tomcat to use <span class="caps">UTF</span>-8 for interpreting <span class="caps">URI</span> query strings.</li>
<li>Always include some test data with accents to ensure it goes through your system cleanly.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2008/07/31/character-encodings-bite-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Character Encodings</title>
		<link>http://happygiraffe.net/blog/2008/07/27/character-encodings/</link>
		<comments>http://happygiraffe.net/blog/2008/07/27/character-encodings/#comments</comments>
		<pubDate>Sun, 27 Jul 2008 22:24:41 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[tech]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2008/07/27/character-encodings/</guid>
		<description><![CDATA[Q: When a program reads input, what is it reading? A: Bytes. i.e. not characters. If you want characters, you have to convert from one to the other. Thanks to decades of ASCII and Latin-1, with one-to-one byte to character &#8230; <a href="http://happygiraffe.net/blog/2008/07/27/character-encodings/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Q: When a program reads input, what is it reading?</p>
<p>A: Bytes.</p>
<p>i.e. not <em>characters</em>.</p>
<p>If you want characters, you have to convert from one to the other.  Thanks to decades of <a href="http://en.wikipedia.org/wiki/ASCII"><span class="caps">ASCII</span></a> and <a href="http://en.wikipedia.org/wiki/ISO/IEC_8859-1">Latin-1</a>, with one-to-one byte to character mappings, most programmers have never even noticed that there&#8217;s a difference.</p>
<p>But there is.  And as soon as somebody feeds your program something like <a href="http://en.wikipedia.org/wiki/UTF-8"><span class="caps">UTF</span>-8</a>, your code is <em>broken</em>.</p>
<p>Now some environments are aware of the difference between bytes and characters.  Like Perl and Java.</p>
<p>But there&#8217;s still a nasty breakage waiting for you in these environments.  It&#8217;s called the &#8220;default character encoding&#8221;.  And it&#8217;s bitten me several times in the last few weeks.</p>
<p>Picking on Java for a moment, take a look at <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html">InputStreamReader</a>.  It has <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html#constructor_summary">four constructors</a>, the first of which doesn&#8217;t take an encoding.  So when you use that you have (effectively) no idea of what encoding you&#8217;re reading.  It could be <a href="http://en.wikipedia.org/wiki/Windows-1252">Windows-1252</a> if you&#8217;re on a PC.  It could be <a href="http://en.wikipedia.org/wiki/Mac_OS_Roman">MacRoman</a> on <span class="caps">OS X</span> (seriously).  On Linux, it&#8217;s probably <span class="caps">UTF</span>-8.  But you&#8217;re at the mercy of not only changes in the OS, but also &#8220;helpful&#8221; administrators, environment variables, system properties.  Really, <em>anything</em> could change it.</p>
<p>Which is why when I see somebody saying <code>new InputStreamReader(someInputStream)</code> in a 3rd party library, I scream.  Loudly.  And often.  Because they&#8217;ve suddenly decided that everybody else knows what form my input should be, except me, the person writing the program.  Needless to say, this is rather difficult to cope with.</p>
<p>The lesson is:</p>
<blockquote>
<p>If you ever do <strong>any</strong> I/O or bytes ↔ characters conversion without <strong>explicitly</strong> specifying the character set, you will be fucked.</p>
</blockquote>
<p>Don&#8217;t do it kids, just use <span class="caps">UTF</span>-8.</p>
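<p>As a minimal sketch of doing it explicitly (the class and method names are invented for the example, and <code>StandardCharsets</code> assumes Java 7 or later; on older versions pass the name <code>"UTF-8"</code> to the constructor instead):</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Decode the first line of a byte stream as UTF-8, never the platform default.
    static String readFirstLine(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Jimenez with an e-acute; in UTF-8 the accented letter is the two bytes C3 A9.
        byte[] utf8 = "Jim\u00e9nez".getBytes(StandardCharsets.UTF_8);
        String decoded = readFirstLine(new ByteArrayInputStream(utf8));
        // Seven characters, not eight: the two bytes were decoded as one character.
        System.out.println(decoded.length());
    }
}</pre></div></div>

<p>The same goes for the other direction: <code>getBytes()</code> and <code>OutputStreamWriter</code> both have variants that take a charset, and those are the ones to use.</p>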
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2008/07/27/character-encodings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mongrel&#039;s Default Charset</title>
		<link>http://happygiraffe.net/blog/2006/11/18/mongrels-default-charset/</link>
		<comments>http://happygiraffe.net/blog/2006/11/18/mongrels-default-charset/#comments</comments>
		<pubDate>Sat, 18 Nov 2006 18:07:00 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[mongrel]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2006/11/18/mongrels-default-charset/</guid>
		<description><![CDATA[I suddenly noticed that my last entry had Unicode problems. How embarrassing. It turns out that mongrel doesn&#8217;t set a default charset, so the usual caveats apply. Looking through the mongrel docs, you can do something with the -m option, &#8230; <a href="http://happygiraffe.net/blog/2006/11/18/mongrels-default-charset/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I suddenly noticed that my <a href="http://happygiraffe.net/blog/archives/2006/11/14/java-is-free">last entry</a> had Unicode problems.  How <a href="http://happygiraffe.net/blog/archives/2006/09/16/unicode-for-rails">embarrassing</a>.  It turns out that <a href="http://mongrel.rubyforge.org/">mongrel</a> doesn&#8217;t set a default charset, so the usual <a href="http://tools.ietf.org/html/rfc2616#section-3.7.1" title="default to ISO-8859-1 [or more likely cp1252]">caveats</a> apply.  Looking through the mongrel docs, you can do something with the <code>-m</code> option, but it still seems difficult to apply a default universally.</p>
<p>Thankfully, I&#8217;m proxying to mongrel via Apache.  So correcting the situation turned out to be as simple as adding this to my VirtualHost config.</p>
<pre>
  AddDefaultCharset UTF-8
</pre>
<p>I was actually not sure that this would work, because Apache is proxying rather than serving files directly.  But it does work.  I suspect that it may not work under Apache 1.3, but that would need to be confirmed.</p>
<p>But now the error is corrected and I&#8217;m Unicode happy once more.  Hurrah!</p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2006/11/18/mongrels-default-charset/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Locales That Work</title>
		<link>http://happygiraffe.net/blog/2006/11/06/locales-that-work/</link>
		<comments>http://happygiraffe.net/blog/2006/11/06/locales-that-work/#comments</comments>
		<pubDate>Mon, 06 Nov 2006 12:01:00 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[unicode]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2006/11/06/locales-that-work/</guid>
		<description><![CDATA[As I mentioned before, I don&#8217;t like locales. But of course, the solution is blindingly obvious and had passed me by. Unicode Support on FreeBSD points out the correct solution, which avoids breaking ls. % export LANG=en_GB.UTF-8 LC_COLLATE=POSIX Marvellous. Now &#8230; <a href="http://happygiraffe.net/blog/2006/11/06/locales-that-work/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>As I mentioned before, I don&#8217;t like locales.  But of course, the solution is blindingly obvious and had passed me by.  <a href="http://opal.com/freebsd/unicode.html#langctype">Unicode Support on FreeBSD</a> points out the correct solution, which avoids breaking ls.</p>
<pre>
  % export LANG=en_GB.UTF-8 LC_COLLATE=POSIX
</pre>
<p>Marvellous.  Now things can autodetect that I&#8217;d like <span class="caps">UTF</span>-8, please.</p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2006/11/06/locales-that-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unicode in Rails</title>
		<link>http://happygiraffe.net/blog/2006/10/05/unicode-in-rails/</link>
		<comments>http://happygiraffe.net/blog/2006/10/05/unicode-in-rails/#comments</comments>
		<pubDate>Thu, 05 Oct 2006 09:33:00 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2006/10/05/unicode-in-rails/</guid>
		<description><![CDATA[Unicode in Rails takes a step further today, as ActiveSupport::MultiByte is committed to the edge (r5223). More information is available over at fingertips, including a neat demo video. This should really help people who need proper Unicode support. There&#8217;s no &#8230; <a href="http://happygiraffe.net/blog/2006/10/05/unicode-in-rails/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Unicode in Rails takes a step further today, as ActiveSupport::MultiByte is committed to the edge (<a href="http://dev.rubyonrails.org/changeset/5223">r5223</a>).  More information is available over at <a href="http://www.fngtps.com/2006/10/activesupport-multibyte">fingertips</a>, including a neat demo video.  This should really help people who need proper Unicode support.  There&#8217;s no excuse to not use <span class="caps">UTF</span>-8 now!</p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2006/10/05/unicode-in-rails/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unicode in Rails</title>
		<link>http://happygiraffe.net/blog/2006/09/18/unicode-in-rails-2/</link>
		<comments>http://happygiraffe.net/blog/2006/09/18/unicode-in-rails-2/#comments</comments>
		<pubDate>Mon, 18 Sep 2006 22:51:00 +0000</pubDate>
		<dc:creator>Dominic Mitchell</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[rails]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://happygiraffe.net/2006/09/18/unicode-in-rails-2/</guid>
		<description><![CDATA[I&#8217;m really happy to see that Thijs has just pointed out that the unicode_hacks plugin is undergoing further development: We’re almost ready with a new version of Julik’s ‘Unicode Hacks’ that’s now called ‘ActiveSupport::Multibyte’. You can find more information and &#8230; <a href="http://happygiraffe.net/blog/2006/09/18/unicode-in-rails-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m really happy to see that <a href="http://www.fngtps.com/">Thijs</a> has just <a href="http://happygiraffe.net/blog/archives/2006/09/16/unicode-for-rails#comment-1071">pointed out</a> that the unicode_hacks plugin is undergoing further development:</p>
<blockquote>
<p> We’re almost ready with a new version of Julik’s ‘Unicode Hacks’ that’s now called ‘ActiveSupport::Multibyte’. You can find more information and code <a href="https://fngtps.com/projects/multibyte_for_rails/wiki">on the ‘Multibyte for Rails’ project site</a>.</p>
</blockquote>
<p>I&#8217;m particularly pleased to see that: &#8220;We hope to get ActiveSupport::Multibyte accepted as a new core extension in the 1.2 release of Ruby on Rails&#8221;.  That would be a real boon.  Check out the <a href="https://fngtps.com/projects/multibyte_for_rails/wiki/FAQ"><span class="caps">FAQ</span></a> too.</p>
]]></content:encoded>
			<wfw:commentRss>http://happygiraffe.net/blog/2006/09/18/unicode-in-rails-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

