Memory leak hunting in Java

A bit about a recent memory leak hunt in a university project
Published on May 20, 2010 under the tag ugent

What is this

This semester at UGent, we’re following a course “Software-Development II”. It’s a Java-based course, where we learned a bit about design patterns, nothing really special. The fun part was that this course included a project. I’m not a big Java fan, but I do think it’s not a bad language for cross-platform event-based GUI programming.

Our project consisted of creating a multimedia, peer-to-peer chat application. The “multimedia” part meant that the application should find images (nothing fancy, just Google Image or Flickr search) relevant to the conversation subject and show these to the users. The user can then recommend images to the conversation partner.

Screenshot of the application

The problem

Two weeks or so ago, the project was getting close to completion, and we were quite excited about this. However, while testing the program for slightly longer than usual, we suddenly got the not-so-awesome java.lang.OutOfMemoryError. Yay!

I started a bit of bug hunting with nudded. Using jconsole, we saw that the memory usage rose linearly – which is, erm, not really good. Our university tends to push NetBeans as the IDE for Java development[1], so we tried using its profiler on our project, but we didn’t really get any results.

Because of the plugin-based nature of the project, we quickly established that the memory leak was located in the image recommendation system (read: it didn’t occur when we disabled this).

I was the author of it, and I thought the possibility of a memory leak was real: the images were kept in different lists and sets, for convenience and performance reasons.

The solution

However, after a lot of frustrations, it turned out to be something completely different. We were using Java serialization for our communication, and we were sending the images over the network using an ObjectOutputStream.

I discovered Eclipse’s Memory Analyzer Tool. It’s not a great piece of software, and quite unstable, but it has one particularly cool feature: the possibility to trace the path to the GC roots for an object.

The path to the GC roots

Wait, the image is kept by the ObjectOutputStream? Our class looked a bit like:

import java.awt.Image;
import java.io.Serializable;
import java.net.URL;

public class ProvidedImage implements Serializable {
    /** URL of the image. */
    private URL url;

    /**
     * Actual image. We use transient here so we don't send the (possibly large)
     * image over the network; the receiver can retrieve it using the URL.
     */
    private transient Image image;

    /* More stuff ... */
}

Wait? The Image is transient? How the heck can we have a memory leak here? Well, it turns out that Java serialization tries to be smarter than it should be.

Imagine the following scenario: we send objects a, b to the ObjectOutputStream. We still have an object c, which has a reference to a. What will happen when we send c? In particular, what will the reference to a look like?

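The answer is that the ObjectOutputStream “remembers” the object a: it keeps a table of previously serialized objects, so the reference to a inside c can be written as a short back-reference instead of a second copy. The fact that the image is transient does not matter here: the table holds a strong reference to every ProvidedImage we sent, and each ProvidedImage still holds a reference to its Image, so neither can be garbage collected while the stream is alive.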
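To make the scenario concrete, here is a minimal sketch of the back-reference behaviour. The classes Node and Holder are hypothetical stand-ins for a and c, not code from our project, and the stream writes to a byte array instead of a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class HandleTableDemo {
    /** Hypothetical stand-in for object a. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String label;
        Node(String label) { this.label = label; }
    }

    /** Hypothetical stand-in for object c, which refers to a. */
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        Node target;
        Holder(Node target) { this.target = target; }
    }

    /** Writes a, then c, to one stream and returns what a receiver reads back. */
    static Object[] roundTrip() {
        try {
            Node a = new Node("a");
            Holder c = new Holder(a);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(a); // a is written in full and remembered by the stream
                out.writeObject(c); // c.target is written as a back-reference to a
            }

            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return new Object[] { in.readObject(), in.readObject() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] received = roundTrip();
        Node a = (Node) received[0];
        Holder c = (Holder) received[1];
        // The receiver ends up with one shared Node, not two copies.
        System.out.println(c.target == a); // prints "true"
    }
}
```

Preserving object identity like this is only possible because the sending stream keeps hold of everything it has written, which is exactly what bit us on a long-lived connection.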

Possible solutions included:

  1. Dropping the Image member from the class, and storing that elsewhere.
  2. Using XML or JSON (probably the best solution, but we didn’t have the time to throw half of the project around).
  3. Closing or resetting the ObjectOutputStream regularly. But when, and why?
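For option (3), a rough sketch of what resetting does, assuming each message is self-contained so nothing is lost by forgetting earlier objects. The Message class and the method name are made up for illustration. ObjectOutputStream.reset() discards the stream's table of written objects, so the stream no longer keeps them reachable, and a later write of the same object is sent in full again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ResetDemo {
    /** Hypothetical message type, not from the project. */
    static class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        String text;
        Message(String text) { this.text = text; }
    }

    /**
     * Sends the same Message twice, changing its text in between, and returns
     * the text a receiver sees for the second message.
     */
    static String secondTextSeen(boolean resetBetweenWrites) {
        try {
            Message m = new Message("first");
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(m);
                if (resetBetweenWrites) {
                    out.reset(); // forget every object written so far
                }
                m.text = "second";
                out.writeObject(m); // back-reference unless the stream was reset
            }

            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject(); // first message
                return ((Message) in.readObject()).text;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTextSeen(false)); // prints "first"
        System.out.println(secondTextSeen(true));  // prints "second"
    }
}
```

Calling reset() after every message would be one possible answer to the "when", at the cost of re-sending any object that is shared between messages.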

We chose (1), because (2) was impossible and (3) looked a little dirty. And that, as they say, is that.
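A minimal sketch of what (1) could look like, not our actual code: the serializable class carries only the URL, and the pixels live in a separate, local lookup that is never handed to the stream. The names ProvidedImageRef and ImageStore are invented for this example.

```java
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.io.Serializable;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** What goes over the network: only the URL, no reference to the image. */
class ProvidedImageRef implements Serializable {
    private static final long serialVersionUID = 1L;
    private final URL url;

    ProvidedImageRef(URL url) { this.url = url; }

    URL getUrl() { return url; }
}

/** Hypothetical local store for the actual images; it is never serialized. */
class ImageStore {
    // Keyed by the URL's string form, because URL.equals/hashCode may resolve hostnames.
    private final Map<String, Image> images = new ConcurrentHashMap<>();

    void put(ProvidedImageRef ref, Image image) { images.put(key(ref), image); }

    Image get(ProvidedImageRef ref) { return images.get(key(ref)); }

    /** Entries must be removed explicitly, or the store becomes the next leak. */
    void remove(ProvidedImageRef ref) { images.remove(key(ref)); }

    private static String key(ProvidedImageRef ref) {
        return ref.getUrl().toExternalForm();
    }

    public static void main(String[] args) throws MalformedURLException {
        ProvidedImageRef ref =
                new ProvidedImageRef(URI.create("http://example.com/cat.png").toURL());
        ImageStore store = new ImageStore();
        store.put(ref, new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB));
        System.out.println(store.get(ref) != null); // prints "true"
    }
}
```

The ObjectOutputStream still remembers every ProvidedImageRef it has sent, but those are tiny, and the large Image is no longer reachable through them.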

  1. No worries, I still use vim.