Categories
Uncategorized

Mapping a servlet to /

An interesting problem cropped up today. We want to use a servlet for the home page in a Spring Web MVC project. Initially we had this in web.xml.

  <servlet>
    <servlet-name>foo</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>foo</servlet-name>
    <url-pattern>*.html</url-pattern>
  </servlet-mapping>

Now, we want to add in a handler specifically for /. How to do that? A bit of googling turned up this method.

  <servlet>
    <servlet-name>foo</servlet-name>
    <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
  </servlet>
  <servlet-mapping>
    <servlet-name>foo</servlet-name>
    <url-pattern>*.html</url-pattern>
  </servlet-mapping>
  <servlet-mapping>
    <servlet-name>foo</servlet-name>
    <url-pattern>/home</url-pattern>
  </servlet-mapping>
  <welcome-file-list>
    <welcome-file>home</welcome-file>
  </welcome-file-list>

I’m not 100% sure why this works—the rules for mapping URLs to servlets are a little confusing to me. But it certainly seems to handle the situation correctly. My initial attempts to get this working managed to break everything by handling everything through the DispatcherServlet, even things like CSS. That was unacceptable.

Aha. Looking in the servlet spec shows why:

Web Application developers can define an ordered list of partial URIs called
welcome files in the Web application deployment descriptor.

I had thought that they were files, not URIs. That explains why it works nicely.

Of course, the solution above will attempt to look for “…/home” in any directory, even though I’ve only set it up to work for the root URL. I don’t think that’s much of a problem though.

Categories
Uncategorized

Character Encodings Bite Again

A colleague gave me a nudge today. “This page doesn’t validate because of an encoding error”. It was fairly simple: the string “Jiménez” contained a single byte—Latin1. Ooops. It turned out that we were generating the page as ISO-8859-1 instead of UTF-8 (which is what the page had been declared as in the HTML).

So, which bit of Spring WebMVC sets the character encoding? A bit of poking around in the debugger didn’t pop up any obvious extension point. So we stuck this in our Controller.

  response.setContentType("UTF-8");

This worked, but it’s pretty awful having to do this in every single controller. So, we poked around a bit more and found CharacterEncodingFilter. Installing this into web.xml made things work.

  <filter>
    <filter-name>CEF</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-name>
    </init-param>
    <init-param>
      <param-name>forceEncoding</param-name>
      <param-value>true</param-name>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>CEF</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>

Whilst rummaging around in here, we noticed something interesting: the code is set up like a spring bean—it doesn’t read the init-params directly. There’s some crafty code in GenericFilterBean to get this to work. Check it out.

Anyway, that Filter ensured that we output UTF-8 correctly. The forceEncoding parameter ensured that it was set on the response as well as the request.

Incidentally, we figured out where the default value of ISO-8859-1 gets applied. Inside DispatcherServlet.render(), the LocaleResolver gets called, followed by ServletResponse.setLocale(). Tomcat uses the Locale to set the character encoding if it hasn’t been already. Which frankly is a pretty daft thing to do. Being british does not indicate my preference as to Latin-1 vs UTF-8.

Then, the next problem reared its head. The “Jiménez” text was actually a link to search for “Jiménez” in the author field. The URL itself was correctly encoded as q=Jim%C3%A9nez. But when we clicked on it, it didn’t find the original article.

Our search is implemented in Solr. So we immediately had a look at the Solr logs. That clearly had Unicode problems (which is why it wasn’t finding any results). The two bytes of UTF-8 were being interpreted as individual characters (i.e. something was interpreting the URI as ISO-8859-1). Bugger.

Working backwards, we looked at the access logs for Solr. After a brief diversion to enable the access logs for tomcat inside WTP inside Eclipse (oh, the pain of yak shaving), we found that the sender was passing doubly encoded UTF-8. Arrgh.

So we jumped all the way back to the beginning of the search, back in the Controller.

  String q = request.getParameter("q");

Looking at q in the debugger, that was also wrong. So at that point, the only thing that could have affected it would be tomcat itself. A quick google turned up the URIEncoding parameter of the HTTP connector. Setting that to UTF-8 in server.xml fixed our search problem by making getParameter return the correct string.

I have no idea why tomcat doesn’t just listen to the request.setContentType() that the CharacterEncodingFilter performs, but there you go.

So, the lessons are:

  1. Use CharacterEncodingFilter with Spring WebMVC to get the correct output encoding (and input encoding for POST requests).
  2. Always configure tomcat to use UTF-8 for interpreting URI query strings.
  3. Always include some test data with accents to ensure it goes through your system cleanly.
Categories
Uncategorized

Servlets vs mod_perl

I’ve been looking at the Java servlets stuff for a couple of days now and it’s clear that it bears a remarkable similarity to mod_perl.

  • In mod_perl, you define handler() which gets passed a $r object to deal with the request and send a response. Servlets make things a little more explicit. You make doGet() or doPost() methods, which get passed separate request and response. But they’re basically equivalent to $r.
  • mod_perl and servlets both have filters, but they work in slightly different ways. servlets are much more explicit; you have to forward the call on to the next servlet when you’re done. mod_perl handles things nearly implicitly (with help from Apache::Filter).
  • mod_perl is exposing the apache httpd API so you get access to different request phases. servlets run in a completely different environment, so that just isn’t applicable to them.
  • Whats also incredibly similiar to both pieces of software is that most people think that they’re far too low level and have come up with higher level “frameworks” on top. The standard one for Java appears to be JSP, which on close inspection bear a marked similiarity to HTML::Mason. As with mod_perl though, no one framework covers everybody’s needs, so there are lots of similiar, competing ones.
  • One thing that’s markedly different between the two is deployment and configuration. Most mod_perl deployments tend towards the messy, unless the application is packaged up like a CPAN module. But configuration is always manually done. Servlets come with a nicely defined “deployment descriptor” (aka “config file”—they love big words in the Java world). In addition, there’s a standardised directory layout and packaging format making it easy to deploy your application. In theory. I haven’t actually got as far as deploying an application in a production environment yet, but it looks reasonable to me.