Tag Archives: unicode

Character Encodings Bite Again

A colleague gave me a nudge today. “This page doesn’t validate because of an encoding error.” The cause was fairly simple: the string “Jiménez” contained a single-byte “é”, which is Latin-1. Oops. It turned out that we were generating the page as ISO-8859-1 instead of UTF-8 (which is what the page had been declared as [...]
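
The mismatch described above is easy to reproduce. What follows is a minimal Python sketch, not the code behind the original page; the helper `is_valid_utf8` is a name made up for illustration. It shows that the Latin-1 encoding of “Jiménez” is not valid UTF-8, which is the sort of thing a validator would complain about when a page is declared as UTF-8.

```python
# "Jiménez", with the é written as an escape so the source file's own
# encoding cannot affect the result.
name = "Jim\u00e9nez"

# Latin-1 (ISO-8859-1) stores "é" as the single byte 0xE9.
latin1_bytes = name.encode("latin-1")
# UTF-8 stores "é" as the two bytes 0xC3 0xA9.
utf8_bytes = name.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(latin1_bytes)                 # b'Jim\xe9nez'
print(utf8_bytes)                   # b'Jim\xc3\xa9nez'
print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

The lone 0xE9 byte looks like the start of a multi-byte UTF-8 sequence, but the byte after it (“n”) is not a continuation byte, so decoding fails.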

Character Encodings

Q: When a program reads input, what is it reading?
A: Bytes.
i.e. not characters.
If you want characters, you have to convert from one to the other. Thanks to decades of ASCII and Latin-1, with their one-to-one byte-to-character mappings, most programmers have never even noticed that there’s a difference.
But there is. And as soon [...]
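
To make the byte/character distinction concrete, here is a small Python sketch. It assumes the input is UTF-8, and it uses an in-memory `io.BytesIO` stream as a stand-in for a file or socket; the sample text is made up for illustration.

```python
import io

# Simulate a byte stream, such as a file opened in binary mode.
# These bytes are the UTF-8 encoding of "naïve café".
stream = io.BytesIO(b"na\xc3\xafve caf\xc3\xa9")

raw = stream.read()          # what the program actually reads: bytes
text = raw.decode("utf-8")   # explicit conversion from bytes to characters

print(type(raw).__name__, len(raw))    # bytes 12
print(type(text).__name__, len(text))  # str 10

# Decoding with the wrong encoding does not fail for Latin-1, because every
# byte maps to some character. It silently yields one character per byte.
mojibake = raw.decode("latin-1")
print(len(mojibake))                   # 12
```

The byte count and the character count differ as soon as anything outside ASCII appears, which is why the conversion step cannot be skipped.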

Mongrel’s Default Charset

I suddenly noticed that my last entry had Unicode problems. How embarrassing. It turns out that Mongrel doesn’t set a default charset, so the usual caveats apply. Looking through the Mongrel docs, you can do something with the -m option, but it still seems difficult to apply a default universally.
Thankfully, I’m proxying [...]

Locales That Work

As I mentioned before, I don’t like locales. But of course, the solution is blindingly obvious and had passed me by. “Unicode Support on FreeBSD” points out the correct solution, which avoids breaking ls.

% export LANG=en_GB.UTF-8 LC_COLLATE=POSIX

Marvellous. Now things can autodetect that I’d like UTF-8, please.
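
As a rough illustration of what that autodetection involves, here is a simplified Python sketch. It is not how any particular program does it (C programs would normally go through setlocale and nl_langinfo), and `codeset_from_env` is a hypothetical helper. It relies on the POSIX rule that, for character handling, LC_ALL is consulted first, then LC_CTYPE, then LANG.

```python
from typing import Mapping, Optional


def codeset_from_env(env: Mapping[str, str]) -> Optional[str]:
    """Return the codeset named by the locale variables, or None.

    Hypothetical helper: checks LC_ALL, then LC_CTYPE, then LANG, and
    uses the first one that is set to a non-empty value.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # Locale names look like language_TERRITORY.codeset@modifier.
            _, dot, rest = value.partition(".")
            if not dot:
                return None  # e.g. "C" or "POSIX": no codeset given
            return rest.split("@", 1)[0]
    return None


# The settings from the export line above.
env = {"LANG": "en_GB.UTF-8", "LC_COLLATE": "POSIX"}
print(codeset_from_env(env))  # UTF-8
```

LC_COLLATE only governs sort order, so setting it to POSIX leaves the character encoding taken from LANG untouched.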

Unicode in Rails

Unicode in Rails takes a step further today, as ActiveSupport::Multibyte is committed to the edge (r5223). More information is available over at fingertips, including a neat demo video. This should really help people who need proper Unicode support. There’s no excuse to not use UTF-8 now!

Unicode in Rails

I’m really happy to see that Thijs has just pointed out that the unicode_hacks plugin is undergoing further development:

We’re almost ready with a new version of Julik’s ‘Unicode Hacks’ that’s now called ‘ActiveSupport::Multibyte’. You can find more information and code on the ‘Multibyte for Rails’ project site.

I’m particularly pleased to see that: “We hope [...]

Unicode for Rails

I finally gave my talk this afternoon. I rushed through things in 40 minutes; I was planning on 45, but I started a little late due to microphone difficulties.
The talk seemed to go down well; a few people came up to ask questions afterwards. My official hecklers, Tom and Paul, were noticeably silent. [...]

Character Info in TextMate

One rather useful feature of vim is that you can pull up information about a character by positioning your cursor over it and hitting ga (get ASCII?). I quite miss this in TextMate, so I created a small command to add to the Text bundle. This is “Character Info”, which I’ve assigned to [...]

Unicode Depresses Me

Perl is meant to have reasonable Unicode support. So why do I still have to write this at the top of a test?

use utf8;
use Test::More 'no_plan';
{
my $Test = Test::Builder->new;
binmode( $Test->output, [...]

Unicode for Rails — accepted

I had a little note today to say that my talk on “Unicode for Rails” has been accepted for RailsConf Europe 2006. Yay!
Now I have to write the thing. This is going to be interesting. I have only a few weeks to go, and most of those weekends are already taken…