Tag Archives: unicode
The joy of Apple keyboards
Recently, I’ve been using a Linux desktop for the first time in ages. It’s Ubuntu (Hardy Heron), and it looks nice. But after using a Mac for three years, I’m really missing quite a few little things. The ability to …
Java Platform Encoding
This came up at $WORK recently. We had a Java program that took its input from command-line arguments. Unfortunately, it went wrong when passed UTF-8 characters such as U+00A9 COPYRIGHT SIGN (©). Printing out the command-line arguments from inside …
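A minimal Python sketch of the general failure (not the post's Java code): the two bytes a UTF-8 terminal sends for “©” decode differently depending on which charset the receiving program assumes. Which default a JVM actually uses for command-line arguments depends on the platform and locale, so the ISO-8859-1 and ASCII cases here are illustrative assumptions.

```python
# Bytes a UTF-8 terminal would pass for U+00A9 COPYRIGHT SIGN.
raw = "©".encode("utf-8")  # b'\xc2\xa9'

# Decoding with the matching charset recovers a single character.
right = raw.decode("utf-8")  # '©'

# A single-byte charset such as ISO-8859-1 turns the two bytes into two characters.
wrong = raw.decode("iso-8859-1")  # 'Â©'

# ASCII cannot represent either byte; errors="replace" substitutes U+FFFD for each.
ascii_view = raw.decode("ascii", errors="replace")  # '\ufffd\ufffd'
```

The bytes never change; only the assumed charset does, which is why the argument looks fine in one environment and garbled in another.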
No escape() from JavaScript
A couple of days ago, we got caught out by a few encoding issues in a site at $WORK. The Perl-related ones were fairly self-explanatory, and I’d seen them before (e.g. not calling decode_utf8() on the query string parameters). …
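JavaScript’s escape() encodes characters below U+0100 as a single Latin-1 byte (“é” becomes %E9) and uses a non-standard %uXXXX form above that, while encodeURIComponent() percent-encodes UTF-8 (%C3%A9). A server that decodes query strings as UTF-8 then chokes on the escape() form. This Python sketch mimics both encodings with urllib.parse to show the mismatch; it is an illustration, not the code from the post.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

text = "é"

# UTF-8 percent-encoding, the same result as encodeURIComponent("é").
utf8_form = quote(text, safe="")  # '%C3%A9'

# Latin-1 percent-encoding, the same result as escape("é").
latin1_form = quote(text, safe="", encoding="latin-1")  # '%E9'

# A server decoding as UTF-8 recovers the first form correctly...
decoded_ok = unquote(utf8_form)  # 'é'

# ...but the second is the lone byte 0xE9, which is not valid UTF-8.
lone_byte = unquote_to_bytes(latin1_form)  # b'\xe9'

# unquote() defaults to errors="replace", so the character silently becomes U+FFFD.
decoded_bad = unquote(latin1_form)  # '\ufffd'
```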
Mixed Character Encodings
I’ve been given a MySQL dump file at work. It’s got problems: Windows-1252 and UTF-8 characters are mixed in. Bleargh. How can we clean it up to be all UTF-8? Perl to the rescue. use Encode qw( encode decode …
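The post’s solution uses Perl’s Encode module, and its code is truncated here. As an alternative sketch in Python (a swapped-in technique, not the post’s), one common approach is to decode as UTF-8 and, wherever the bytes are not valid UTF-8, decode just those bytes as Windows-1252. The handler name cp1252_fallback is hypothetical. Caveat: a Windows-1252 sequence that also happens to be valid UTF-8 (for example the two bytes that spell “Ã©” in Windows-1252) will be read as UTF-8, so this is a heuristic.

```python
import codecs


def _cp1252_fallback(err: UnicodeError) -> tuple[str, int]:
    """Decode the bytes that UTF-8 rejected as Windows-1252 instead."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # A few byte values (e.g. 0x81) are undefined in cp1252; replace those with U+FFFD.
    return bad.decode("cp1252", errors="replace"), err.end


# Hypothetical handler name, registered so it can be passed as errors=...
codecs.register_error("cp1252_fallback", _cp1252_fallback)


def fix_mixed(data: bytes) -> str:
    """Decode bytes that mix UTF-8 and Windows-1252 into a single str."""
    return data.decode("utf-8", errors="cp1252_fallback")


if __name__ == "__main__":
    # Valid UTF-8 followed by cp1252 bytes: 0xE9 is é, 0x93/0x94 are curly quotes.
    dump = "Jiménez ©".encode("utf-8") + b" caf\xe9 \x93hi\x94"
    result = fix_mixed(dump)  # 'Jiménez © café “hi”'
    # Re-encoding gives a file that is entirely UTF-8.
    clean = result.encode("utf-8")
```

Working byte-by-byte on the error spans, rather than line-by-line, copes with both encodings appearing on the same line of the dump.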
Character Encodings Bite Again
A colleague gave me a nudge today: “This page doesn’t validate because of an encoding error”. It was fairly simple: the “é” in the string “Jiménez” was a single byte, i.e. Latin-1. Ooops. It turned out that we were generating the page as ISO-8859-1 instead …
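The difference the validator tripped over can be seen directly in Python: in ISO-8859-1 “é” is the single byte 0xE9, while UTF-8 uses the two bytes 0xC3 0xA9. Read as UTF-8, the lone 0xE9 is an invalid sequence. A small illustration, not the tooling used in the post.

```python
name = "Jiménez"

latin1 = name.encode("iso-8859-1")  # b'Jim\xe9nez', 7 bytes
utf8 = name.encode("utf-8")         # b'Jim\xc3\xa9nez', 8 bytes

try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE9 starts a three-byte UTF-8 sequence, but "n" is not a continuation byte.
    print(f"byte {exc.start}: {exc.reason}")  # byte 3: invalid continuation byte
```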
Character Encodings
Q: When a program reads input, what is it reading? A: Bytes, i.e. not characters. If you want characters, you have to convert from one to the other. Thanks to decades of ASCII and Latin-1, with one-to-one byte-to-character …
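A short Python sketch of decoding at the input boundary: the program receives bytes, converts them to characters once, and only then counts or slices. The second half shows why Latin-1 hides mistakes: it maps every one of the 256 byte values to exactly one character, so decoding with it never fails, even when the bytes were really UTF-8.

```python
import io

# Stand-in for a file or socket: reading it yields bytes, not characters.
stream = io.BytesIO("naïve café ©".encode("utf-8"))
raw = stream.read()

# Convert bytes to characters once, at the input boundary.
text = raw.decode("utf-8")
sizes = (len(raw), len(text))  # (15, 12): three characters take two bytes each

# Latin-1 maps each byte 0x00-0xFF to one character, so this never raises...
every_byte = bytes(range(256)).decode("latin-1")

# ...which means decoding UTF-8 input as Latin-1 silently produces the wrong text.
mojibake = raw.decode("latin-1")  # 'naÃ¯ve cafÃ© Â©'
```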
Mongrel's Default Charset
I suddenly noticed that my last entry had Unicode problems. How embarrassing. It turns out that Mongrel doesn’t set a default charset, so the usual caveats apply. Looking through the Mongrel docs, you can do something with the -m option, …
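Mongrel is a Ruby server, and the post’s fix involves its -m option, which this does not reproduce. As an illustration of the underlying point only, here is a minimal Python WSGI app (standard library only) that declares the charset explicitly in Content-Type rather than leaving the browser to guess, and encodes the body to match.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Serve UTF-8 HTML and say so in the Content-Type header."""
    body = "<p>Jiménez ©</p>".encode("utf-8")
    start_response("200 OK", [
        # Without charset=..., the client has to guess the encoding.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Call the app directly with a stub start_response instead of running a server.
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {}
    setup_testing_defaults(environ)
    body = b"".join(app(environ, start_response))
    print(captured["headers"]["Content-Type"])  # text/html; charset=utf-8
```

The charset declared in the header and the codec passed to encode() must agree; declaring one and sending the other just moves the problem.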
Locales That Work
As I mentioned before, I don’t like locales. But of course, the solution is blindingly obvious and had passed me by. Unicode Support on FreeBSD points out the correct solution, which avoids breaking ls. % export LANG=en_GB.UTF-8 LC_COLLATE=POSIX Marvellous. Now …
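To see what LC_COLLATE=POSIX changes, this Python sketch sorts some file names under the C/POSIX collation, which orders strings by code point: uppercase before lowercase, and punctuation by its ASCII value. A UTF-8 locale such as en_GB.UTF-8 typically collates case-insensitively and largely ignores punctuation, which is presumably the behaviour that “broke” ls. Only the C locale is used here because other locales may not be installed on a given machine; the file names are made up.

```python
import locale

names = ["b.txt", "Makefile", "a.txt", "README", "_build"]

# The C/POSIX collation is always available and compares code points,
# the same ordering ls uses when LC_COLLATE=POSIX.
locale.setlocale(locale.LC_COLLATE, "C")
posix_order = sorted(names, key=locale.strxfrm)

print(posix_order)  # ['Makefile', 'README', '_build', 'a.txt', 'b.txt']
```

Setting only LC_COLLATE keeps UTF-8 for everything else (character classification, messages) while restoring the traditional sort order.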
Unicode in Rails
Unicode in Rails takes a step forward today, as ActiveSupport::Multibyte has been committed to edge Rails (r5223). More information is available over at Fingertips, including a neat demo video. This should really help people who need proper Unicode support. There’s no …
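Rails at the time ran on Ruby 1.8, whose strings are plain byte sequences, so operations such as length and reverse work on bytes; ActiveSupport::Multibyte wraps strings in a character-aware proxy. The difference can be shown in Python by treating the same text as bytes and as characters. This is an analogy only, not the Multibyte API.

```python
word = "Jiménez"
raw = word.encode("utf-8")

# Byte length counts é twice; character length counts it once.
byte_len, char_len = len(raw), len(word)  # 8, 7

# Reversing bytes splits é's two-byte sequence, producing invalid UTF-8.
broken = raw[::-1].decode("utf-8", errors="replace")  # 'zen\ufffd\ufffdmiJ'

# Reversing characters keeps every character intact.
intact = word[::-1]  # 'zenémiJ'
```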
Unicode in Rails
I’m really happy to see that Thijs has just pointed out that the unicode_hacks plugin is undergoing further development: We’re almost ready with a new version of Julik’s ‘Unicode Hacks’ that’s now called ‘ActiveSupport::Multibyte’. You can find more information and …