Categories
Uncategorized

No escape() from JavaScript

A couple of days ago, we got caught out by a few encoding issues in a site at $WORK. The Perl related ones were fairly self explanatory and I’d seen before (e.g. not calling decode_utf8() on the query string parameters). But the JavaScript part was new to me.

The problem was that we were using JavaScript to create an URL, but this wasn’t encoding some characters correctly. After a bit of investigation, the problem comes down to the difference between escape() and encodeURIComponent().

#escaper {
border: 1px dotted black;
text-align: center;
}
#escaper thead {
background: #eee;
}
#escaper thead th {
border-bottom: 1px solid black;
}
#escaper td, #escaper th {
padding: .25em;
}

input escape(…) encodeURIComponent(…)
a&b a%26b a%26b
1+2 1+2 1%2B2
café caf%E9 caf%C3%A9
Ādam %u0100dam %C4%80dam

The last is particularly troublesome, as no server I know of will support decoding that %u form.

The takeaway is that encodeURIComponent() always encodes as UTF-8 and doesn’t miss characters out. As far as I can see, this means you should simply never use escape(). Which is why I’ve asked Douglas Crockford to add it as a warning to JSLint.

Once we switched the site’s JavaScript from escape() to encodeURIComponent(), everything worked as expected.

Categories
Uncategorized

Which whitespace was that again?

We recently saw this at $WORK. It appears corrupted in Internet Explorer only. Firefox and Safari show it normally.

Corrupted text in internet explorer

After much exploration in the debugger, we eventually found it was caused by using the innerText property in internet explorer. This has the mildly surprising property of turning multiple spaces into U+00A0 (NO-BREAK SPACE) characters (  to you and me). This behaviour doesn’t appear to be documented. And before you ask, this was all being done by a third-party — I know to not use proprietary extensions where possible.

Anyway, I nailed it down to a small test. Given this markup.

Then this script demonstrates the problem.

var foo = document.getElementById('foo');

// OK
foo.innerText = ['A', ' ', 'B'].join('');
alert(foo.innerHTML);   // "A B"

// FAIL
foo.innerText = ['A', ' ', ' ', 'B'].join('');
alert(foo.innerHTML);   // "A  B"

If you want to try it yourself, check out http://jsbin.com/anigo. (jsbin is awesome! Thanks, rem!)

The quick solution is simple: normalize whitespace before insertion before using innerText.

var text = 'some       where        over      the      rainbow';

foo.innerText = text.split(/s+/).join(' ');

Of course, you should really be using appendChild() and createTextNode().