mod_perl 1 blows chunks

At $WORK, I’m looking at a web service built on mod_perl 1 / Apache 1. The service takes XML as input and returns XML as output. So far, so good.

Unfortunately, whilst I was testing with curl, I found something odd:

  curl -s -v -T- -H'Content-Type: text/xml;charset=UTF-8' http://localhost/api < input.xml

That -T- says to do a PUT request from stdin. It failed, and my code returned “no input”.

But when I did this, things worked:

  curl -s -v -Tinput.xml -H'Content-Type: text/xml;charset=UTF-8' http://localhost/api

That reads directly from the file. The only difference between the two requests is that the latter includes a Content-Length header whilst the former has Transfer-Encoding: chunked instead.

This is the code that was reading the request.

    my $content;
    if ( $r->header_in( 'Content-Type' ) ) {
        # Reads exactly Content-Length bytes; a chunked request has no such header.
        $r->read( $content, $r->header_in( 'Content-Length' ) );
    }
    return $content;

So, if there’s no Content-Length, what should we do? My first stop is always the venerable eagle book. There’s a little footnote next to read():

At the time of this writing, HTTP/1.1 requests which do not have a Content-Length header, such as one that uses chunked encoding, are not properly handled by this API.

Marvellous. Now, I had a look around in the source code and noticed a function called new_read(). Unfortunately, that failed to work. It stopped chunked reads, but failed to work for ordinary ones.

I did see a post on the mod_perl mailing list which reckoned you could loop and read all input. But I was unable to get that to work either.

So I just decided to disallow chunked input. That’s fairly easy to do, and HTTP has a special status code for it: 411 Length Required. It’s not ideal, but unless this project gets upgraded to Apache 2 (unlikely, quite frankly), it seems to be the best option.
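
A quick way to confirm the behaviour from the client side is to ask curl for nothing but the status code. This is only a sketch: put_status is a made-up helper name, not part of curl, and it assumes the same headers as the requests above. Once chunked input is rejected, a PUT from stdin should report 411.

```shell
# put_status is a hypothetical helper, not part of curl.
# It PUTs stdin to the URL given in $1 (so curl sends the body chunked)
# and prints only the HTTP status code of the response.
put_status() {
    curl -s -o /dev/null -w '%{http_code}\n' -T- \
        -H 'Content-Type: text/xml;charset=UTF-8' "$1"
}
```

Run it as: put_status http://localhost/api < input.xml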


mod_security now switched off

Last night, I spent a long while trying to get mod_security working on this web server. Installation was simple, thanks to the FreeBSD ports system. Configuration was another matter entirely.

Not having much experience of web application firewalls, I opted for the ModSecurity Core Rules to give me a head start. These are essentially some pre-provided Apache configs that you can include into your existing config files. It seems like a good idea, although some of the rules are questionable—I don’t really think that “googlebot visited” is a security event.

After configuring minor details like the audit log file, I deployed it. Hmmm, front page still comes up, so let’s commit the configs to subversion. Booom.

It turns out that the Core Rules don’t interact well with subversion in a number of minor, but irritating ways. For example, they expect every request to come with an Accept header. And only certain Content-Types can be submitted to the web server. It’s all minor stuff, and relatively easy to work out how to fix. This gave me a good feel for what’s involved in properly customizing mod_security.

This afternoon, I came back and inspected the logs. There were nearly 400 events from mod_security. Quite a lot of these were people trying to spam my trac instance (which I’ve now finally gotten under control) and blog. Importantly, however, I noticed that it had blocked a legitimate user of an RSS feed.

At this point, I realised the problem. mod_security needs a lot of work to set up and maintain. You customize it towards a specific purpose: your application. But I’m running lots of applications. So it becomes harder and harder to customize correctly (particularly as I’m not running everything on its own virtual host), because a rule that’s correct for subversion might well not be correct for trac. Or more to the point, it’s going to take me a very long time to get it configured correctly. So I’ve switched off mod_security for now.

Don’t take this as a slur on mod_security. It’s a useful tool, and I will be using it again. But it’s far easier to configure when you’re covering a single application running inside that Apache. And you’ll still need to invest a good chunk of time to get it set up correctly (a very iterative process).


Mongrel’s Default Charset

I suddenly noticed that my last entry had Unicode problems. How embarrassing. It turns out that mongrel doesn’t set a default charset, so the usual caveats apply. Looking through the mongrel docs, you can do something with the -m option, but it still seems difficult to apply a default universally.

Thankfully, I’m proxying to mongrel via Apache. So correcting the situation turned out to be as simple as adding this to my VirtualHost config.

  AddDefaultCharset UTF-8

I was actually not sure that this would work, because Apache is proxying rather than serving files directly. But it does work. I suspect that it may not work under Apache 1.3, but that would need to be confirmed.

But now the error is corrected and I’m Unicode happy once more. Hurrah!


Log Rotation

I hate log rotation. It’s a pain to configure on my FreeBSD server. Just look at newsyslog.conf. That, and my experiences of the utter non-portability of log rotation programs between different Unixes, have led me to believe that programs should probably handle their own log rotation. It just makes life easier having one less thing to integrate with the operating system.

So, I’ve switched my Apache over to using cronolog in order to get date-based logfiles automatically. I’ve used it on a project at work recently and it really works a treat.

I also noticed that PostgreSQL has grown the ability to take care of its own log files in recent versions. So I’ve switched that to doing date-based logging with automatic rotation as well. Lovely.

All is not perfect of course. Oh no, that would be far too simple. There’s still the issue of removing old logfiles and/or compressing them. But that seems to be a smaller integration problem.
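
For the clean-up side, something along these lines run daily from cron would probably do. It’s a minimal sketch, not what I’m actually running: tidy_logs is a made-up name, it assumes the date-based files end in .log, and the 7 and 90 day defaults are arbitrary.

```shell
# tidy_logs is a hypothetical helper for cleaning up date-based logfiles.
#   $1 = log directory (assumed to hold files named *.log)
#   $2 = compress logs older than this many days (default 7, arbitrary)
#   $3 = delete compressed logs older than this many days (default 90, arbitrary)
tidy_logs() {
    tidy_dir=$1
    tidy_compress_after=${2:-7}
    tidy_delete_after=${3:-90}
    # Delete first, so a file compressed in this run survives until the next run.
    find "$tidy_dir" -type f -name '*.gz' -mtime +"$tidy_delete_after" -exec rm -f {} +
    # gzip keeps the original modification time on the .gz file.
    find "$tidy_dir" -type f -name '*.log' -mtime +"$tidy_compress_after" -exec gzip -f {} +
}
```

Call it with the log directory, for example tidy_logs /var/log/httpd (the path is just an illustration).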

Of course the trigger for all this activity was finding a 220MB access_log lying around. Doh!


mod_perl 2 not ready yet

I’ve spent nearly three days this week trying to port one of our sites to Apache 2.2 and mod_perl 2.0.2 (from Apache 1.3.33). It should be a relatively simple exercise thanks to the porting notes available:

Yet sadly, there are still a lot of problems with mod_perl 2. Firstly, much CPAN software still hasn’t adjusted. In my case it was SOAP-Lite. But I also noticed that libapreq wasn’t classed as “ready” by Mason, so we had to fall back to there.

But the real killer is that they managed to completely break environment variables, in the name of thread safety. Unfortunately, our application uses Inline::Java from inside Apache to talk to Lucene. Now Inline::Java spawns a JVM to run a JAR file and be the server. So that the JVM can find the jar (as well as the lucene jar and our code), it sets $ENV{CLASSPATH}. Except that change never shows up, so the JVM just says “unknown class” and exits.

Basically, mod_perl2 breaks system. This is not clever.

So we won’t be upgrading to Apache 2.2 for a while. This is a shame, as it has a bug fix we really need (>4GB file support). Instead, we switched to using FTP. Yuck.


Cygwin / Apache

A quick note about running Apache under cygwin. If you just run httpd2 on its own, you get an error about “invalid system call”. Very annoying.

A quick bit of googling reveals that you have to have something called cygserver running. You have to set it up by running cygserver-config the first time. After that, you just need to ensure that cygserver is running.

Then, you can start Apache by saying CYGWIN=server httpd2. And it all works!
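
To stop myself forgetting the cygserver step, the start-up could be wrapped in a small shell function. This is just a sketch: start_httpd is a made-up name, and it assumes that cygserver shows up in the output of ps -e when it is running.

```shell
# start_httpd is a hypothetical wrapper around the steps above.
# It refuses to start Apache unless a cygserver process is visible.
start_httpd() {
    if ! ps -e | grep -q 'cygserver'; then
        echo "cygserver is not running; start it first" >&2
        return 1
    fi
    # CYGWIN=server is the setting from above, applied to this one command.
    CYGWIN=server httpd2 "$@"
}
```

After that, start_httpd takes the place of typing CYGWIN=server httpd2 by hand.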