Java Unicode Characters

Working on jenx, I’ve started looking at Characters. In particular, Astral characters. My first question was “how do I create one in a string literal?” Well I still don’t know. But my researches have shown that to do anything outside the Basic Multilingual Plane (BMP) requires JDK 5. Drat. That kind of limits the usefulness of this library. But I really need the stuff in JSR 204.

Which is a story in itself. It’s good thing that Java can handle the full Unicode range. But the support is (to be quite frank) a bit crap. Mostly down to the fact that char is a UTF-16 codepoint, not a “Unicode Character.” I personally don’t find it helpful that they’ve propogated the C problem of confusing char and int, and generally allowing the two to roam freely amongst each other. Plus, JSR 204 looks like it was extremely careful to avoid breaking backwards compatibility, which is always a noble goal, but in the case makes the end result incredibly difficult to use. I shouldn’t have to test each codepoint to see whether or not it’s a surrogate. Really. This is an OO language, I should be able to get the next Character object from the String. Shocking, I know.

It strikes me that Python is pretty much the only language I know that got Unicode right by making an entirely separate object, the “Unicode String.”

Update: Oh alright. In my whinging, I managed to miss String.codePointAt, which does what I need.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s