Google Collections to the rescue

A few days ago, I was writing a piece of code that turned a line at a time into an Object. And it was using iterators. I had a RecordStream, which wrapped a LineStream (just a thin veneer over LineNumberReader).

Then I discovered that there was a terminating record at the end of each file. And it was in a completely different format to all the other lines. Bother.

Ok, I know, I’ll insert another iterator in the middle, which specifically ignores that record. Well, easier said than done as it turns out. I spent the best part of a day trying to create an Iterator which reads the next value and pretends that it’s not there. It turns out to have an awful lot of state.

Eventually I managed the task, and it worked. But boy, was it ugly. And it was long—about two pages of code.
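For a sense of where all that state comes from, here is a minimal sketch of a hand-rolled look-ahead iterator. It is not the original two pages of code; the class name SkippingIterator and its fields are hypothetical, and it assumes the unwanted value can be recognised with a plain equals() check.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical hand-rolled iterator that hides one unwanted value. */
class SkippingIterator implements Iterator<String> {
    private final Iterator<String> source;
    private final String unwanted;
    private String next;      // the value read ahead, if any
    private boolean hasNext;  // whether 'next' currently holds a value

    SkippingIterator(Iterator<String> source, String unwanted) {
        this.source = source;
        this.unwanted = unwanted;
        advance();
    }

    // Read ahead until a wanted value turns up or the source runs dry.
    private void advance() {
        hasNext = false;
        next = null;
        while (source.hasNext()) {
            String candidate = source.next();
            if (!unwanted.equals(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public String next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        String result = next;
        advance();
        return result;
    }

    @Override
    public void remove() {
        // Removal makes little sense once we have already read ahead.
        throw new UnsupportedOperationException();
    }
}
```

Even this stripped-down version has to track a buffered value and a flag, and keep both consistent across hasNext() and next().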

Then the light bulb went on. I remembered that Google Collections had some tools for dealing with Iterators. In particular, there’s a function, Iterators.filter(), which takes a Predicate. And look! The Predicates class contains some handy built-ins!

After about five minutes’ work, my two pages of code boiled down to three lines.

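    import static com.google.common.base.Predicates.*;

    import com.google.common.base.Predicate;
    import com.google.common.collect.Iterators;
    import java.util.Iterator;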

    private static final String END_RECORD = "END RECORD,END RECORD,END RECORD";

    public Iterator<T> iterator() {
        // Produce an iterator that returns one line at a time.
        Iterator<String> lines = new LineStream(reader).iterator();
        // A predicate that accepts every record except the end record.
        // (Later releases appear to name this method equalTo(); check your version.)
        Predicate<String> notEndRecord = not(isEqualTo(END_RECORD));
        // Apply the predicate to the iterator.
        final Iterator<String> it = Iterators.filter(lines, notEndRecord);
        return new Iterator<T>() { … };
    }

Marvellous and powerful stuff. It’s seriously worth checking out in case you haven’t played with it before. My favourites are the static factory methods, e.g.

  // Before
  Map<String, String> myMap = new HashMap<String,String>();

  // After
  Map<String, String> myMap = Maps.newHashMap();

Isn’t it lovely how the compiler just figures it all out for you? Anything that can save space like that has to be a Good Thing™.
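The trick behind this is an ordinary generic static method: the compiler infers the type parameters from the type the result is assigned to. A minimal sketch of the idea follows; MyMaps and MyMapsDemo are hypothetical names used for illustration, not the library's own source.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Maps-style utility class. */
final class MyMaps {
    private MyMaps() {}

    // K and V are inferred from the type the caller assigns the result to.
    static <K, V> HashMap<K, V> newHashMap() {
        return new HashMap<K, V>();
    }
}

/** Hypothetical caller showing the inference at work. */
class MyMapsDemo {
    static Map<String, Integer> wordLengths(String... words) {
        // Inferred as HashMap<String, Integer>; no type arguments repeated.
        Map<String, Integer> lengths = MyMaps.newHashMap();
        for (String word : words) {
            lengths.put(word, word.length());
        }
        return lengths;
    }
}
```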

There are a whole bunch of other useful things in there.

  • Preconditions.checkNotNull() is a compact way of validating your arguments.
  • Join.join(): I don’t know how many times I’ve written this by hand (usually badly). Much better to have somebody else do it for me.
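To make those two concrete, here is a rough sketch of what helpers like these do. The Helpers class and its method signatures are hypothetical and written against the plain JDK; they are not the library's implementation, and the real signatures may differ.

```java
import java.util.Iterator;

/** Hypothetical helpers in the spirit of checkNotNull() and join(). */
final class Helpers {
    private Helpers() {}

    // Returns the reference unchanged, or fails fast if it is null.
    static <T> T checkNotNull(T reference) {
        if (reference == null) {
            throw new NullPointerException();
        }
        return reference;
    }

    // Joins the parts, placing the delimiter between elements only.
    static String join(String delimiter, Iterable<?> parts) {
        checkNotNull(delimiter);
        checkNotNull(parts);
        StringBuilder sb = new StringBuilder();
        Iterator<?> it = parts.iterator();
        if (it.hasNext()) {
            sb.append(it.next());
            while (it.hasNext()) {
                sb.append(delimiter).append(it.next());
            }
        }
        return sb.toString();
    }
}
```

The usual hand-written mistake is a stray delimiter at the start or end; handling the first element separately avoids it.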

Do yourself a favour and go check them out. You won’t regret it.

6 Comments to Google Collections to the rescue

  1. Sam says:

    I LOVE google collections. Here is a series of blog posts by one of the guys that works on it, detailing some nice idioms:

    http://publicobject.com/2007/09/series-recap-coding-in-small-with.html

  2. Matt says:

    We’ve recently incorporated Google Collections at work, and it’s chock full of useful stuff. Admittedly, much of it is the sort of thing that any decent programmer would inevitably invent for themselves at some point, but would probably not have the time or inclination to do as thorough a job as Google has. I’d like to see the best bits folded into Java 7.

  3. @KiLVaiDeN: I was reading from a file originally and passing on the iterator from that directly. It’s a streaming source, so it made sense to try and incorporate it into the iterator.

    As to efficiency, I very much doubt that would be an issue. The cost of IO usually dwarfs the cost of processing.

    The main point isn’t really about this particular problem. It’s more of an indication of how conceptually nice the google-collections API is. The more I look at it, the more useful I find it.

  4. KiLVaiDeN says:

    Interesting stuff, but I think the problem is not when you need to iterate, but rather when you gather the collection in the first place.

    If the method that returns the collection doesn’t filter, order or check the validity of the input data (be it a database filter or something else), you will of course need to find a way to do those steps afterwards. But it’s usually a much better design to gather only the lines you need from the database or file: it saves time, simplifies code, and is a much more efficient way of dealing with the problems that Google Collections solves.

    My 2 cents
    K

  5. I missed one more that was utterly invaluable when I needed it: Iterables.reverse().