Tag Archives: unicode

The joy of Apple keyboards

Recently, I’ve been using a Linux desktop for the first time in ages. It’s Ubuntu (Hardy Heron), and it looks nice. But after using a Mac for three years, I’m really missing quite a few little things. The ability to … Continue reading

Posted in Uncategorized | 1 Comment

Java Platform Encoding

This came up at $WORK recently. We had a Java program that was given input through command line arguments. Unfortunately, it went wrong when passed non-ASCII characters such as U+00A9 COPYRIGHT SIGN (©). Printing out the command line arguments from inside … Continue reading

Posted in Uncategorized | 4 Comments

No escape() from JavaScript

A couple of days ago, we got caught out by a few encoding issues in a site at $WORK. The Perl-related ones were fairly self-explanatory, and I’d seen them before (e.g. not calling decode_utf8() on the query string parameters). … Continue reading

Posted in Uncategorized | 2 Comments

Mixed Character Encodings

I’ve been given a MySQL dump file at work. It’s got problems — Windows-1252 and UTF-8 characters are mixed in. Bleargh. How can we clean it up to be all UTF-8? Perl to the rescue. use Encode qw( encode decode … Continue reading
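The excerpt above stops before the original Perl, so the following is only a rough sketch of the general idea, written in Python rather than taken from the post. It assumes that any bytes which do not form valid UTF-8 are Windows-1252; the handler name `cp1252_fallback` and the function `to_utf8` are made up for illustration.

```python
import codecs


def _cp1252_fallback(err):
    """Error handler: reinterpret bytes that are not valid UTF-8 as Windows-1252."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # A few byte values are undefined in Windows-1252; those become U+FFFD.
    return bad.decode("cp1252", errors="replace"), err.end


# Hypothetical handler name, registered once per process.
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8, assuming stray high bytes are Windows-1252."""
    return raw.decode("utf-8", errors="cp1252_fallback").encode("utf-8")


# A UTF-8 "é", Windows-1252 curly quotes, and a Windows-1252 "é" in one string.
mixed = b"\xc3\xa9 \x93quoted\x94 \xe9"
cleaned = to_utf8(mixed)
print(cleaned.decode("utf-8"))
```

One caveat with this approach: Windows-1252 text that happens to form a valid UTF-8 sequence (such as the two characters "Ã©") is indistinguishable from real UTF-8 and will be read as such.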

Posted in Uncategorized | 4 Comments

Character Encodings Bite Again

A colleague gave me a nudge today. “This page doesn’t validate because of an encoding error”. It was fairly simple: the “é” in the string “Jiménez” was encoded as a single byte, i.e. Latin-1. Oops. It turned out that we were generating the page as ISO-8859-1 instead … Continue reading
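To illustrate why a validator would object, here is a small Python sketch (not from the original post; the helper `is_valid_utf8` is a made-up name). It shows that the Latin-1 encoding of “Jiménez” is not a valid UTF-8 byte sequence, while the UTF-8 encoding is.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_name = "Jiménez".encode("latin-1")  # é is the single byte 0xE9
utf8_name = "Jiménez".encode("utf-8")      # é is the two bytes 0xC3 0xA9

print(latin1_name, is_valid_utf8(latin1_name))
print(utf8_name, is_valid_utf8(utf8_name))
```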

Posted in Uncategorized | 1 Comment

Character Encodings

Q: When a program reads input, what is it reading? A: Bytes, not characters. If you want characters, you have to convert from one to the other. Thanks to decades of ASCII and Latin-1, with one-to-one byte-to-character … Continue reading
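As a small illustration of the bytes-versus-characters distinction (a Python sketch, not part of the original post; the sample word is arbitrary), the same four characters take a different number of bytes depending on the encoding, and decoding with the wrong encoding silently produces the wrong characters:

```python
text = "café"  # four characters

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: 0xE9

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding UTF-8 bytes as Latin-1 raises no error, it just yields mojibake.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)

# Decoding with the encoding that was actually used recovers the characters.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == text)
```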

Posted in Uncategorized | Comments Off

Mongrel's Default Charset

I suddenly noticed that my last entry had Unicode problems. How embarrassing. It turns out that Mongrel doesn’t set a default charset, so the usual caveats apply. Looking through the Mongrel docs, you can do something with the -m option, … Continue reading

Posted in Uncategorized | Comments Off

Locales That Work

As I mentioned before, I don’t like locales. But of course, the solution is blindingly obvious and had passed me by. Unicode Support on FreeBSD points out the correct solution, which avoids breaking ls. % export LANG=en_GB.UTF-8 LC_COLLATE=POSIX Marvellous. Now … Continue reading

Posted in Uncategorized | Comments Off

Unicode in Rails

Unicode in Rails takes a step further today, as ActiveSupport::MultiByte is committed to the edge (r5223). More information is available over at fingertips, including a neat demo video. This should really help people who need proper Unicode support. There’s no … Continue reading

Posted in Uncategorized | Comments Off

Unicode in Rails

I’m really happy to see that Thijs has just pointed out that the unicode_hacks plugin is undergoing further development: We’re almost ready with a new version of Julik’s ‘Unicode Hacks’ that’s now called ‘ActiveSupport::Multibyte’. You can find more information and … Continue reading

Posted in Uncategorized | 1 Comment