Categories
Uncategorized

java -jar blats your classpath

This tripped up a colleague today. When he mentioned it, about three other people (myself included) piped up with “oh yeah, that caught me out a while back”.

  java -cp 'a.jar:b.jar' -jar foo.jar

This completely ignores the classpath. It’s not blindingly obvious that this is going on. If you manage to wind your way down to java(1), you do see:

-jar

Execute a program encapsulated in a JAR file. The first argument is the name of a JAR file instead of a startup class name. In order for this option to work, the manifest of the JAR file must contain a line of the form Main-Class: classname. Here, classname identifies the class having the public static void main(String[] args) method that serves as your application’s starting point. See the Jar tool reference page and the Jar trail of the Java Tutorial for information about working with Jar files and Jar-file manifests.

When you use this option, the JAR file is the source of all user classes, and other user class path settings are ignored.

Note that JAR files that can be run with the “java -jar” option can have their execute permissions set so they can be run without using “java -jar”. Refer to Java Archive (JAR) Files.

(underlining mine)

This is fairly clear, but unless you’re in the habit of reading man pages, you might well never find it. In particular, some indication in the java -help output would be a boon.
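One way to check what the JVM actually ended up with is to print the java.class.path system property from inside the program. This is only a minimal sketch: the class name ClassPathDump is made up for illustration, and it assumes you can get it into the jar you’re launching.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class ClassPathDump {
    // Split a classpath string into its entries using the given separator.
    static List<String> entries(String classPath, String separator) {
        if (classPath == null || classPath.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(classPath.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        // The classpath the running JVM was really started with.
        String classPath = System.getProperty("java.class.path");
        for (String entry : entries(classPath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Launched with -cp alone, I’d expect it to list every entry you passed. Launched with -jar, I’d expect only the jar itself to show up. The usual ways round it are to put foo.jar on the classpath and name the main class yourself (something like java -cp 'a.jar:b.jar:foo.jar' com.example.Main, where com.example.Main is a placeholder), or to add a Class-Path line to the jar’s manifest.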

Scala Actors

I’ve been wanting to play with scala for a while. So when I needed an irc bot to read RSS feeds recently, it seemed like a good excuse to play. Sure, there are probably loads of existing bots out there that can do this. But I wanted to play. With tools like pircbot and rome, it should be little more than an integration exercise, right? In particular, it seemed like a great excuse to play with scala’s actors.

Actors are a really nice API for handling concurrency. Modelled after erlang’s concurrency, they remove most of the locking problems you’re likely to face in your own code by funnelling everything through a single mailbox (which is well implemented and thread-safe). You just pass messages.

So I started bender and ended up with the obvious solution: two actors, one for speaking irc and one for reading feeds. I can’t make the bot an actor, as that’s already a thread under the control of pircbot.

Original Bender architecture

This worked fine to start with, but I needed to kick off periodic downloads of all the feeds I’m supposed to be monitoring. So after looking at programming in scala, I came up with something like this.

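import scala.actors.Actor
import scala.actors.Actor._

class Feeder extends Actor {
  val periodSeconds = 60 * 60
  private def periodicFetch() {
    // Capture the current actor so the timer actor below can message it.
    val feeder = self
    // A helper actor that just sleeps, then prods the feeder.
    actor {
      loop {
        Thread.sleep(periodSeconds * 1000)
        feeder ! "fetch"
      }
    }
  }
  // Kick off the periodic fetch as soon as the Feeder is constructed.
  periodicFetch()
  // …
}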
bender2.png

This introduces a third actor, which does nothing except sleep and prod the feed reading actor into action every now & again. All very simple. Or so I thought.

The problem was that the fetch message never arrived at Feeder. I put lots of debugging in, but it just never showed up. There was one weirdness though. When I printed out the value of feeder inside periodicFetch(), it claimed to be an ActorProxy.

This is where my inexperience kicked in. I assumed that it was a proxy for the Feeder actor. Sadly not. Really, it’s a proxy for Thread.currentThread(). So the reason I was never seeing the fetch message turn up was that it was going to a completely different mailbox. Simple.

But why was I ending up with an ActorProxy? Well, it stems from my ignoring the rule: “don’t do real work in a constructor”. When I call periodicFetch() from the constructor, the Feeder isn’t running as an actor yet; the code is still executing on whichever thread is constructing it. So Actor.self returns an ActorProxy for that thread instead.

The fix is simple: either make the initialisation explicit or override start(). Hey presto, my feeds all start downloading now.

class Feeder extends Actor {
  val periodSeconds = 60 * 60
  private def periodicFetch() {
    // … as before …
  }
  // …
  def act() {
    loop {
      react {
        // …
        // Explicit initialisation: this runs inside the actor, so self is the Feeder.
        case "init" => periodicFetch()
      }
    }
  }
}
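
The “don’t do real work in a constructor” rule isn’t specific to scala actors. Here is a rough Java analogue of the same shape. It is not bender’s code, just a sketch: the class name PeriodicFeeder is made up, a BlockingQueue stands in for the actor’s mailbox, and a ScheduledExecutorService plays the part of the sleeping timer actor. The constructor only records configuration; nothing is started until start() is called on the finished object.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicFeeder {
    // Stands in for the actor's mailbox: the only way to talk to the feeder.
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long periodMillis;

    public PeriodicFeeder(long periodMillis) {
        // The constructor only records configuration; nothing is started here.
        this.periodMillis = periodMillis;
    }

    // Explicit initialisation, called once the object is fully constructed.
    public void start() {
        timer.scheduleAtFixedRate(
                () -> mailbox.offer("fetch"),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Wait for the next message; returns null if none arrives in time.
    public String nextMessage(long timeoutMillis) throws InterruptedException {
        return mailbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PeriodicFeeder feeder = new PeriodicFeeder(100);
        feeder.start();
        System.out.println(feeder.nextMessage(1000));
        feeder.stop();
    }
}
```

The design choice is the same as in the scala fix: whoever creates the feeder decides when the timer starts, so there is no window where a half-built object is already handing out references to itself.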

This is all obvious in retrospect, but wasn’t particularly easy to figure out from first principles. I have to say that I found actors to be quite difficult to debug. That said, they have one huge advantage. They’re part of a library. So I can just extract the library, and start editing it to put debug println()s in.

The other thing that has caused me grief so far was import scala.actors.Actor._. The examples in the book tend to use this, and it’s handy for that. But it introduces a lot of extra names into your namespace. In particular, exit() caught me out.

Anyway, despite all this, I still like actors very much. They’re so much simpler to use than any other concurrency mechanism I’ve come across.

Mercurial support on Google Code

Google Code Blog: Mercurial support for Project Hosting on Google Code: We are happy to announce that Project Hosting on Google Code now supports the Mercurial version control system in addition to Subversion.

This is great news. I love git, but mercurial is also widely used by many projects (Python, Mozilla, …). Bringing DVCSs to a wider audience can only be a good thing. The freedom of making everyone a “committer” is awesome.

Cross Site Scripting, again

Twitter all clear after worm wave

Twitter has been given the all clear after a worm infected “tens of thousands of users”. But experts say the attack could have been much worse.

Another day, another XSS hole. It reminds me of something (probably apocryphal) that I heard about lung cancer research. There’s no real need for it. We know what causes lung cancer — smoking.

True or not, we know what causes XSS holes. It’s poor tools. Now given a choice between:

And

${userName}

Guess which one is going to be picked, every single time. And guess which one doesn’t escape HTML properly. Lest you think I’m picking on JSPs, most templating systems have the same flaw.
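
For a sense of what the escaping step has to do, here is a minimal sketch in Java. The class name HtmlEscaper is made up for illustration, and it only covers plain HTML text and quoted attribute values; it makes no attempt at JavaScript, CSS or URL contexts. In real code you’d want your templating system or a well-tested library to do this for you.

```java
public class HtmlEscaper {
    // Replace the characters that are significant in HTML text and
    // quoted attribute values with their entity equivalents.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String userName = "<script>alert('hi')</script>";
        // prints &lt;script&gt;alert(&#39;hi&#39;)&lt;/script&gt;
        System.out.println(escapeHtml(userName));
    }
}
```

The function itself is trivial. The problem is that somebody has to remember to call it every time, which is exactly the thing that doesn’t happen.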

This is why I was immensely pleased to see Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems from Google. Unfortunately, their solution isn’t universally available yet, but it really serves the purpose of showing how this can be done correctly.

More Java Memory Analysis

If you found Heap Dump Analysis interesting, you might also be interested in Identifying ThreadLocal Memory Leaks in JavaEE Web Apps from Igo Molnar. I saw this a day or so after my original post. He uses the Eclipse memory analysis tool (MAT) to figure out what’s going on.

In his case, he couldn’t use visualvm as the heap dump was too large.

I downloaded MAT for my problem and had a look at it. It seems like it does much more than visualvm (there are many “pre-canned” queries available, plus an “object query language”). But that makes it correspondingly harder to use. I spent a while going through the tutorials to get the hang of it, whereas visualvm seemed to come together quickly for me. I suspect that if I ploughed more time into it, I’d get a lot more out of it. It seems to be something that I should learn before I need it again. 🙂

Pedantry Time

Please stop saying redgex. It’s painful to listen to. They are regular expressions. Not rejular expressions.

That is all.

Heap Dump Analysis

As part of the investigation into cocoon memory usage, I had to try and understand what was going on inside the JVM. The best way to do that is a heap dump. The general idea is to dump out the contents of the JVM’s memory into a file, then analyse it using a tool.

First, getting a heap dump. This used to be something of a pain, but Java 6 now includes the jmap -dump command. This allows you to (relatively) easily extract a heap dump from a running JVM.

  % pid=$(cat /some/where/pid.txt)
  % jmap=$JAVA_HOME/bin/jmap
  % sudo $jmap -F -dump:live,format=b,file=mla-post-cache-clear.hprof $pid

That takes a little while (a minute or two) to run, and I ended up with a file of over 500MB. It’s best to bring it back to your workstation to look through it. Thankfully it compresses quite nicely.

  % time rsync -avz server:mla-post-cache-clear.hprof .
  receiving file list ... done
  mla-post-cache-clear.hprof

  sent 42 bytes  received 72988157 bytes  3105880.81 bytes/sec
  total size is 580115453  speedup is 7.95
  rsync -avz server:mla-post-cache-clear.hprof .  8.42s user 2.85s system 47% cpu 23.643 total
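
As an aside to the jmap step, a HotSpot JVM can also write the same kind of dump from inside the process, which can be handy when attaching jmap isn’t convenient. This is a minimal sketch, assuming a HotSpot-based JVM and Java 7 or later (for ManagementFactory.getPlatformMXBean). The class name HeapDumper and the default file name are made up. Note that dumpHeap refuses to overwrite an existing file, and newer JDKs insist on a .hprof suffix.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    // Write an hprof heap dump of the current JVM to the given path.
    // live=true dumps only reachable objects, like jmap's "live" suboption.
    static void dumpHeap(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live);
    }

    public static void main(String[] args) throws IOException {
        // The output path is a placeholder; pass your own as the first argument.
        String path = args.length > 0 ? args[0] : "heap.hprof";
        dumpHeap(path, true);
        System.out.println("Wrote " + new File(path).length() + " bytes to " + path);
    }
}
```

The resulting file should load into visualvm in the same way as one produced by jmap.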

Now things get interesting. We can fire up visualvm. Inside visualvm, do File → Load… and point it at the heap dump file we just obtained. You should end up with something like this:

VisualVM 1.1 heap dump summary

Now this is interesting, but not that useful. If you click on the classes button, you see this:

VisualVM 1.1 classes in heap dump

What’s really interesting about that is the byte[] line. Those instances are only 2% of objects, but are taking up 67% of memory! If you double click on that line, you get taken through to the instances view.

VisualVM 1.1 instances in heap dump

There are three panels here:

  1. The left hand view is a list of all the instances, ordered by size. In this case the top instances are the same size, which is quite suspicious.
  2. The top right hand view is a breakdown of the fields in the object. Not that useful for a byte[].
  3. The bottom right hand view is the useful one: it lists all references to this instance. Here it’s showing that this byte[] is linked to a ByteArrayOutputStream.

Now, you can expand the references to that byte[] to find out where it’s being referenced. But there’s a really nifty trick. If you right-click, you get an option to “Show Nearest GC Root”. And this is what it shows (after a little while).

VisualVM 1.1 nearest GC root

This isn’t the simplest thing to interpret. But it does show that the HttpServletResponseBufferingWrapper is holding the byte[] itself, whilst that servlet response is held by a TraxTransformer. That sounds like a good place to begin investigating. Why is TraxTransformer holding on to ServletResponses?