Static Code Analysis

Most of my work these days is in Java (groan). Well, if you’re going to write in a language with static typing, you might as well make use of it. So I’ve been playing with a few static code analyzers to try to find bugs. I use eclipse, so a plugin is the only sensible way of doing this for me.

This is an eclipse project which aims to “integrate a variety of analysis providers”. Whatever. You can set it up to scan your source for “best practice violations”, and it adds a new “analysis view” to your eclipse screen with the results.

On a sample project I ran it over, it found 600+ problems. The vast majority of those were of the form “String literal should be a constant”. Which is not something I happen to 100% agree with.

Some of the problems have a Quick Fix1 option, which is really handy. But not many.

Whilst it found some problems I agreed with, I disagreed with most of what it flagged, or couldn’t see the reason for it, so I ignored it. And that’s the biggest problem with this tool: it never explains why something is bad. You can configure which problems are scanned for (or even add comments to ignore a particular problem for a particular line of code). But the lack of reasons deters me from relying on this more.

Overall: 3/5

FindBugs is available as an eclipse plugin as well as a standalone tool. Instead of making a new view, it just adds any problems found to the standard “Problems” view, where all the usual compilation errors and warnings are. So it feels like it integrates better.

When it finds a problem, you can right click to get an “explain the problem” view where the exact nature of the problem is spelled out clearly. That rocks.

Not only that, but it’s really good at spotting things that I, as a relatively immature java coder, just don’t even think about. For example, it found where I was returning mutable objects directly from accessors. But this is Java—surely that’s what you’re meant to do? No. You should return a copy of the object, because you don’t want somebody else altering your object’s internal state. I’d not have spotted that one in a hurry.
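To illustrate the point (with a hypothetical class, not code FindBugs actually showed me): if an accessor hands back the internal object itself, any caller can mutate your state behind your back.

```java
import java.util.Date;

// Hypothetical example of the defensive-copy point above.
public class Meeting {
    private final Date start;

    public Meeting(Date start) {
        this.start = new Date(start.getTime()); // copy in
    }

    // Bad: "return start;" would expose the internal, mutable Date.
    // Better: hand out a copy, so callers can't change our state.
    public Date getStart() {
        return new Date(start.getTime());       // copy out
    }

    public static void main(String[] args) {
        Meeting m = new Meeting(new Date(1000L));
        m.getStart().setTime(9999L);            // mutates only the copy
        System.out.println(m.getStart().getTime()); // 1000
    }
}
```

FindBugs flags the “return the field itself” version because Date is mutable; returning an immutable type would avoid the copying entirely.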

I did have some problems with the eclipse plugin making it a little hard to get rid of the markers in the source code. I’m not sure why, but a little fiddling usually cleared it up where it happened. But that’s a problem with the eclipse integration, not findbugs itself.

The problems it found were generally pretty reasonable, and it didn’t overwhelm me with warnings. But it appears to be easy to make it a lot pickier. In other words, it got the defaults right for me.

Overall: 4/5

PMD is another one that’s available as both an eclipse plugin and a standalone tool. I’ve only used the eclipse plugin.

Unfortunately, when I first tried it, it switched to the PMD Perspective and brought up a NullPointerException in the “Violations Overview” view (update: looks like bug 1595798). Annoying, but again I suspect it’s the Eclipse integration rather than PMD itself.

The warnings provided by PMD are sensible, but on first use there are quite a lot of them (though you can get the outline view to filter by priority). For example, it complains about direct field access rather than going through accessors, even when the field is only accessed within its own class. I don’t care that much; I can go either way. That said, the explanations of why it’s complaining are excellent. They even include little code samples.

Whilst it claims to offer Quick Fix, it’s grayed out in the menu, so it’s not that useful…

One of the features I really liked is the “cut’n’paste code detection”. It doesn’t integrate properly into eclipse (it just dumps its output into a file), but I despise cut’n’paste code, so I’m glad of it anyway.

Overall: 4/5

I’m glad to have the variety of tools. Each of them found different problems. Overall I like findbugs the best, but PMD is pretty close. There’s no excuse for stupid little bugs…

1 If you haven’t used Quick Fix in eclipse, bring yourself out of the dark age and press Ctrl-1 on a bit of red underlined code. Do this for an hour. Now try and live without it.


contype is an MSIE bug

Yesterday, I was helping a colleague debug a problem involving dynamically generated PDFs. For some reason, he was occasionally getting double requests for the same file. Eventually, we cottoned on to the fact that the second request had a user-agent of “contype”. Not having heard of that before, I googled for it.

Tim Strehle is the first hit for user agent contype, and he has an explanation and a link to the MSDN article PRB: Three GET Requests Are Sent When You Retrieve Plug-in Served Content.

I have never been so astonished by what I saw.

Symptoms: When a server returns a document that is handled by a plug-in (such as an Adobe Acrobat PDF file), three requests are made for the document in Internet Explorer versions 4.x and 5, and two requests are made in Internet Explorer version 5.5.

Cause: The first behavior is by design. When an initial request is sent for the server file, this returns a data stream with a content-type that is handled by a plug-in (not an ActiveX control), and Internet Explorer closes the initial port and sends a new request with userAgent = contype. The only information that is needed in return from the contype request is the content-type.

That’s pretty broken design. You already have the information in the browser, yet you have to go across the network again to fetch it? What a crock. They didn’t even have the clue to use a HEAD request instead of a GET, which would have been a bit of a hint in the right direction.

But the bit that really infuriated me was this:

However, because most developers are unaware of this request style, they treat each GET the same and return the entire document.

Which I read as:

Because we fucked up and you didn’t know about it, you lose dumbass.

The whole thing has an infuriatingly arrogant tone to it. There’s no apology for their insane design. Just Nelson going “Har-Har”.

You have been warned.


Coding Dojo V

Last night I attended the 5th Brighton Coding Dojo. The task: to implement binary search in 5 different ways (source).

Only four people turned up (myself included), but we still managed to make a good go of things. Richard Dallaway had prepared an interface, a unit test and a sample implementation (using Arrays.binarySearch). This was an enormous head start, allowing us to really focus on the problem at hand, as well as check that we’d actually implemented it correctly.

Binary search is an algorithm which is readily understood, yet it has a large number of corner cases when it comes to actually implementing it. I believe our basic maths skills were all found to be lacking. I think the only problem we didn’t come across was the infamous Google “large arrays” bug.
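For reference, a minimal iterative implementation (my own sketch, not code from the dojo) shows where the corner cases live, including the overflow behind that large-arrays bug:

```java
// A sketch of iterative binary search over a sorted int array.
public class BinarySearch {
    static int search(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            // (low + high) / 2 can overflow int for arrays of a billion
            // or more elements -- the famous bug. The unsigned shift
            // keeps the sum's top bit from being read as a sign bit.
            int mid = (low + high) >>> 1;
            if (a[mid] < key) {
                low = mid + 1;
            } else if (a[mid] > key) {
                high = mid - 1;
            } else {
                return mid;        // found
            }
        }
        return -1;                 // not found
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};
        System.out.println(search(a, 7));  // 3
        System.out.println(search(a, 4));  // -1
    }
}
```

The off-by-one traps are all in the `<=`, `mid + 1` and `mid - 1` lines, which is exactly where our attempts kept going wrong.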

When we’d finished, we’d managed to implement four different variations:

  • A SimpleSearch, which in all honesty probably doesn’t count, as it was pretty much a linear scan through the array.
  • A RecursiveSearch, which was far too much hard work for what it was doing.
  • A RandomSearch, which worked surprisingly well, once we’d upped the maximum number of attempts to something suitably large.
  • A GuessingSearch, which tried to spot when we were looking at a value near the beginning or end of the array and optimise accordingly.

Overall, really good fun. Especially familiarizing yourself with an area of programming that you probably don’t use all that often. Once again, thanks to Joh for organising.

When I got home, I started on an alternate approach: converting the list to a string of numbers and counting the number of commas before and after. I haven’t finished this yet, and I don’t expect it to be performant, but hey, it’s another approach!
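For what it’s worth, here is my guess at how the comma-counting idea might look (a sketch under my own assumptions, not the finished code, and certainly not fast: building the string alone is linear):

```java
// A sketch of the comma-counting idea: join the array into a CSV
// string, find the key, and the number of commas before it is its
// index. (My guess at the approach, not the author's actual code.)
public class CommaSearch {
    static int search(int[] a, int key) {
        StringBuilder sb = new StringBuilder(",");
        for (int n : a) {
            sb.append(n).append(',');
        }
        String csv = sb.toString();              // e.g. ",1,3,5,"
        int at = csv.indexOf("," + key + ",");   // delimited, so 3 won't match 13
        if (at < 0) {
            return -1;                           // not found
        }
        int commas = 0;
        for (int i = 1; i <= at; i++) {          // skip the leading comma
            if (csv.charAt(i) == ',') commas++;
        }
        return commas;                           // commas before key == its index
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5};
        System.out.println(search(a, 3));  // 1
        System.out.println(search(a, 4));  // -1
    }
}
```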


BBC Backstage Xmas Bash

After the LPW, I followed a few other Perl mongers along to the BBC Backstage Xmas Bash. It looked completely empty when we got there, which was slightly disconcerting, but it just meant that the iceberg was 90% hidden, with all the action being downstairs.

Immediately on my arrival, I spotted Ribot and Devi from Future Platforms—always glad to see a friendly face. But they weren’t the only part of the strong Brighton contingent. I also bumped into Andy Budd and Aral Balkan, as well as James McCarthy. And I’m sure I missed a couple of people as well. It’s really good to let everyone know how much good talent is down in Brighton.

What was really nice about the party as a whole was the grand mixing of different people. The Perl people met the Ruby people who met the BBC people, etc. It felt like one huge, noisy (very noisy) melting pot. Top marks to Aunty for putting it on!


London Perl Workshop 2006

Yesterday was the London Perl Workshop, a one-day, two-track conference put together by the London Perl Mongers (in particular, muttley and Greg). I arrived early (mostly due to my lack of faith in train times), so got to put up posters first.

First talk of the day was Jesse Vincent on Jifty, the web application framework originating from within Best Practical. It was a basic introduction—making blog software, of course. Jesse took us through the details quite quickly, but it was immediately clear that very little work needs to be done for a basic CRUD app. It appears to use Mason by default, which I consider a plus point. Overall it felt very railsish, particularly the fact that it has migrations. I love migrations.

After the basic intro, he went over a few advanced features like the continuations support (of which more later) and the developer support. There’s a bunch of stuff in there that I really need to look at nicking for work, like the builtin Mason profiler support and the inline fragment editing, not to mention CSS::Squish and JS::Squish.

On the whole, I’m not sure about Jifty. It looks lovely, and quick to develop in. But that amount of concentrated magic scares me. I need to try out a couple of small applications to get a better feel for it.

I stayed on for the next talk, Mike Astle on wigwam (a deployment tool). I’m interested in doing deployment better, but ultimately, wigwam didn’t seem to offer that much more than what I have at the moment: a way of building packages into a compartmentalised space that can be distributed around different servers. Looking around the audience in the question time, I think I wasn’t alone.

Afterwards, I popped down to see Tom Hukins talk on “Just in time testing”. Sadly, Tom was ill. However, abigail stepped up and offered to talk about Benchmark instead, highlighting the many ways in which it can be abused. He had culled a number of uses of Benchmark from perlmonks and demonstrated various flaws, such as not benchmarking the same thing, or trying to benchmark volatile data. The best was when he demonstrated how the compiler had completely optimised away one of the branches. Naturally, executing a statement which has been optimised away is very quick.

My main beef with all of this was simply that the things being benchmarked were phenomenally simple. Really, if you care about map vs grep performance, go write it in C. Otherwise, profile your app long before you start to think about these things. demerphq pointed out, however, that you can end up dying the death of a thousand cuts if you don’t care about some of these little things.

Funnily enough, demerphq was speaking next on the changes to the regex engine coming up in Perl 5.10. This was a really deep, informative talk, and I’ll admit to glossing over some parts of it, but there were two things that really stuck out for me:

  • Recursive patterns. This makes it really easy to call back into the regex you’re matching. This is very handy for doing things like matching balanced tags correctly.
  • Named parameter groups. Nicked from .Net and Python. This should make large regexes much simpler.
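Named groups use the (?&lt;name&gt;…) syntax. The same syntax exists in Java’s regex engine (from Java 7 onwards), so a quick Java sketch gives the flavour of why it makes large regexes easier to live with:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Named capture groups: (?<name>...) names the group, and
// group("name") retrieves it -- no more counting parentheses.
public class NamedGroups {
    public static void main(String[] args) {
        Pattern date = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
        Matcher m = date.matcher("The workshop was on 2006-12-02.");
        if (m.find()) {
            System.out.println(m.group("year"));   // 2006
            System.out.println(m.group("day"));    // 02
        }
    }
}
```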

Apart from that, there’s been a whole lot of work to optimize the regex engine, as well as making it properly pluggable. This now leads to the situation of making PCRE truly Perl compatible by embedding it…

After lunch, I listened to Jesse Vincent again, on “Advanced Jifty”. This was basically peeking inside some of the deep magic that’s going on in there. First, Jesse gave an overview of the message bus inside Jifty. The heart of it is IPC::PubSub. Moving on, he peeked inside Template::Declare. This is a bit like markaby. Jesse pointed out a couple of “unusual” implementation details such as local *__ANON__ = $tag; which is an undocumented way of naming auto-generated subroutines so that stack traces make sense. He also presented a quote from Audrey: “we read every bit of perlsub.pod and abused it all”.

Lastly he covered the i18n pragma, which is just filled with scary magic to make ~~'hello world' look up hello world in a message catalog and return a translated version. There’s a great deal of use of overload and overload::constant.

At this point, Jesse started to run out of time, so he rushed through a few other interesting uses of Perl:

  • Using a function called _, which is globally available in all packages.
  • Blessing into a class called 0 in order to return false from ref.
  • Creating an MD5 sum of the call stack. I’m guessing that’s how they implement continuations support.

From one mind-boggling talk to another: Abigail was on next, with “Sudoku by regex”. I won’t begin to pretend to understand what it was all about, except to note that a standard 9×9 Sudoku grid took 1.5 hours to solve in a single regex. Apparently, he’s also been trying out other games in a similar fashion, except that one of them he let run for two weeks before pressing ^C.

Alistair McGlinchy talked about “How to make a grumpy network capacity planner happy”. This was a really nice little piece on what his work as a network admin involves, and why developers chew up lots of bandwidth without thinking. He gave a really good overview of HTTP caching and compression, which needs to be more widely known.

Ash Berlin spoke on Angerwhale, which is a blog that doesn’t use a database, but does use Catalyst. Consensus from my bit of the audience: why aren’t you using Blosxom? It’s smaller, simpler and works just as well.

Finally, Jos Boumans gave his superb talk on barely legal XXX Perl. It’s a detailed blow by blow account of making Acme::BadExample run, despite all the deviousness contained therein. As with Jesse’s talk, this is scary stuff, but gives a real insight into how to mold Perl to your will.

All in all, a superb day. Interesting people, interesting talks. It was well organised. I’m extremely grateful that it was put on, particularly for free thanks to the sponsors…

Anyway, after the conference, the only natural thing to do was retreat to another pub. I only spent a short while there before departing for the BBC Backstage Xmas bash.


mod_security now switched off

Last night, I spent a long while trying to get mod_security working on this web server. Installation was simple, thanks to the FreeBSD ports system. Configuration was another matter entirely.

Not having much experience of web application firewalls, I opted for the ModSecurity Core Rules to give me a head start. These are essentially some pre-provided Apache configs that you can include into your existing config files. It seems like a good idea, although some of the rules are questionable—I don’t really think that “googlebot visited” is a security event.

After configuring minor details like the audit log file, I deployed it. Hmmm, front page still comes up, so let’s commit the configs to subversion. Booom.

It turns out that the Core Rules don’t interact well with subversion in a number of minor, but irritating ways. For example, they expect every request to come with an Accept header. And only certain Content-Types can be submitted to the web server. It’s all minor stuff, and relatively easy to work out how to fix. This gave me a good feel for what’s involved in properly customizing mod_security.
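For example (the exact rule ID here is an assumption drawn from the Core Rules set I was using, so verify it against your own copy), exempting the Subversion location from the missing-Accept-header check looks something like this in the Apache config:

```apache
# Hypothetical fix: let Subversion clients through without an
# Accept header by disabling that one Core Rule for /svn only.
# 960015 was the "Request Missing an Accept Header" rule in the
# Core Rules I had; check the ID against your ruleset.
<Location /svn>
    SecRuleRemoveById 960015
</Location>
```

Note that SecRuleRemoveById has to appear after the rule it removes has been defined, i.e. after the Core Rules include.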

This afternoon, I came back and inspected the logs. There are nearly 400 events from mod_security. Quite a lot of these were people trying to spam my trac instance (which I’ve now finally gotten under control) and blog. Importantly however, I noticed that it had blocked a legitimate user of an RSS feed.

At this point, I realised the problem. mod_security needs a lot of work to set up and maintain. You customize it towards a specific purpose—your application. But I’m running lots of applications. So it becomes harder and harder to customize correctly (particularly as I’m not running everything on its own virtualhost), because a rule that’s correct for subversion might well not be correct for trac. Or more to the point, it’s going to take me a very long time to get it configured correctly. So I’ve switched off mod_security for now.

Don’t take this as a slur on mod_security. It’s a useful tool, and I will be using it again. But it’s far easier to configure when you’re covering a single application running inside that Apache. And you’ll still need to invest a good chunk of time to get it set up correctly (a very iterative process).