The "exposes" Keyword in Java

Don't worry. You have not been sleeping under a rock for the past few years. There is no such keyword in Java. It's just something I wish for Christmas. It's great that Java modules finally allow us to truly seal off our carefully crafted API surfaces, without clients being able to crack them open anyway using reflection . If a module does not " open" a package to the outside world, there is no way you can access its private and package-private m embers. No more reflectively setting a private field's accessible flag to true and force it to yield or change its value. Unfortunately, you can also no longer use reflection to dynamically read/write public fields or execute public methods. It does not matter whether the fields and methods hail from an exported or non-exported package. As long as the module does not also open the package, any reflection is off limits. In fact you no longer have any self-contained (purely programmatic) way of knowing what publi...

LinkedList - Lost Opportunity?

I've always found LinkedList a bit of a strange bird among the Collections Framework classes. By rights it should probably come right after ArrayList, HashMap, HashSet and LinkedHashMap as a general-purpose collection class, but it never quite provided the functionality I needed — and expected from a linked list implementation.

Internals

It seems like the makers of LinkedList were very intent on emphasising its queue-like potential and it is easy to see why. A linked list consists of a series of "nodes". A node contains a value, a reference to the previous node and a reference to next node. In addition, a linked list maintains a reference to the first node and a reference to the last node.

With this setup, queue-like operations like push, pop, shift and unshift become cheap and simple. To push 54 onto the list, wire it up to 13 and make tail point to 54. To pop 13 off the list, just make tail point to 33 and nullify the reference from 33 to 13. Using a linked list, these operations never ever require an array copy, as would be the case with ArrayList. They are just pointer manipulations, and very few at that.

However, there are many more useful operations that can exploit the pointer-based mechanics of a linked list and that can therefore be (dramatically) faster than ArrayList. For example:
  • cut - cutting out a chunk of contiguous elements from the list
  • embed - embedding one linked list in another one
  • replace - the combination of cut and embed
  • attach - attaching one linked list to another one (which would be as simple as connecting the tail of one to the head of the other)
  • split - splitting the list into multiple smaller lists
  • swap - swapping two regions within the list
  • exchange - exchanging lists segments between two lists
  • move - moving a region to another position
Unfortunately, LinkedList does not implement these operations. This not only diminishes its usefulness, it also causes it to perform very poorly in places where it could have performed at lightning speed, as the benchmarks below show.

WiredList, CrisprList

The klojang-collections module contains two linked list implementations that do implement these operations: WiredList and CrisprList. The latter's name is a nod to the CRISPR gene editing technique. 

Both WiredList and CrisprList ignore the queue-like aspect of linked lists, as that has already been taken care of by LinkedList. Instead, they exclusively focus on list manipulation, notably through destructive editing: edits that sever, establish and rewire links between nodes, but leave their values alone. They do implement the push, pop, shift and unshift operations, but they have been given more neutral names: append, removeLast, removeFirst and prepend, respectively.

To use WiredList and CrisprList, add the following dependency to your POM file:

<dependency>
    <groupId>org.klojang</groupId>
    <artifactId>klojang-collections</artifactId>
    <version>2.0.8</version>
</dependency>

WiredList or CrisprList?

WiredList and CrisprList contain exactly the same methods and they behave identically. They only differ in how, under the hood, they handle the possibly very long chains of nodes that get expelled from the list by various operations. CrisprList will just leave them floating around in deep space (the heap). WiredList, on the other hand, meticulously cleans up after itself. It will nullify the values of the disposed nodes as well as the links between them. This is how LinkedList behaves as well. In principle, this should make it easier for the garbage collector to identify unreachable chunks of heap memory, at the cost some extra administration.

Depending on your use case, one class may perform somewhat better than the other. Generally, though, you can just pick the one with the best name.

Performance Benchmarks

So let's start with the kicker. The benchmark below measures the performance of the replace operation. Within a list containing 50,000,000 elements, it replaces a region of about 500,000 elements with a list that also contains about 500,000 elements. ArrayList and LinkedList do not directly implement the replace operation, so for them the operation is emulated using a series of method calls. If you want to verify that it is a good faith emulation, here is the code for JMH benchmark.

Benchmark  Mode  Cnt   Score   Error  Units
ArrayList  avgt   12   5.322 ± 2.065   s/op
CrisprList avgt   12   0.004 ± 0.004   s/op
LinkedList avgt   12   9.708 ± 1.151   s/op
WiredList  avgt   12   0.003 ± 0.001   s/op
 

WiredList and CriprList are four orders of magnitude faster than ArrayList. Notice that it takes LinkedList an astounding 9.7 seconds to complete the replacement ... once!  Whereas it should (or at least could) have been realy, really good at it. After all, it is built on the same data structure as WiredList and CrisprList, which complete the operation in a couple of milliseconds (three and four milliseconds, respectively, in this run of the benchmark).

So, Ditch ArrayList?

Well, no. Though authentic, this benchmark very much played to the strengths of WiredList and CrisprList. What it does not account for is where the two lists involved in the benchmark came from. Yet, they must have been instantiated at some point. As you can imagine, wiring up 50,000,000 nodes is quite a bit more expensive than creating an array with 50,000,000 elements.

The benchmark below pulls the instantiation into the measurement window. This time the list sizes are 10,000,000 and 1,000,000, respectively. Notice that we are now measuring performance in milliseconds rather than seconds.

Benchmark   Mode  Cnt    Score     Error  Units
ArrayList   avgt   12   54.474 ±  27.380  ms/op
CrisprList  avgt   12   97.730 ±   1.039  ms/op
LinkedList  avgt   12  323.225 ± 128.591  ms/op
WiredList   avgt   12  102.433 ±  26.810  ms/op

So, Ditch WiredList and CrisprList?

Well, neither. It just means that list edits must outnumber list instantiations to make WiredList and CrisprList gain the upper hand over ArrayList. That's not an uncommon scenario.

Moreover, this benchmark turns out to be very sensitive to the amount of memory you give the JVM - not entirely surprising given the rather extreme list sizes. The above benchmark was done with 4 gigabytes of JVM memory. Here is the exact same benchmark, but with 16 gigabytes of JVM memory:

Benchmark   Mode  Cnt    Score     Error  Units
ArrayList   avgt   12  201.044 ±  67.754  ms/op
CrisprList  avgt   12   82.067 ±   1.292  ms/op
LinkedList  avgt   12  717.640 ± 856.598  ms/op
WiredList   avgt   12   81.694 ±   1.966  ms/op

That's a baffling result. That WiredList and CrisprList now run faster was to be expected. That they now run faster than ArrayList is remarkable. But that ArrayList now runs substantially slower than when it had just 4 gigabytes of memory is a big surprise. Yet, this result got replicated every time I ran the benchmark.

Maybe the garbage collector is elbowing itself to the front in a way that is very unfortunate for ArrayList and less so for WiredList and CrisprList. But whatever it is, it reaffirms the mantra: if you want to know how things will work out for your use case, you will have to measure it for your use case.

Other Considerations

For small lists (anything smaller than about 500 elements) don't bother straying away from ArrayList. As the list size comes down to about 100 elements, ArrayList starts to outperform WiredList and CrisprList even if you push the list instantiation outside the measurement.
 
Index-based access with linked lists is slow compared to ArrayList. For example, List.get(5_000_000) is near instantaneous with ArrayList, and just as fast as List.get(0). With linked lists, on the other hand, you would first have to chase 5,000,000 pointers before you get to the node you are interested in. WiredList and CrisprList include a few methods that mitigate this problem, but it is something to reckon with.

Conclusion

  1. ArrayList remains hard to beat, and System.arraycopy(), its weapon of choice, is just very, very fast.
  2. Nevertheless, when working with large lists that require lots of edits, you might want to try out WiredList or CrisprList, especially if you can afford to endow your JVM with generous amounts of memory
  3. Don't bother with LinkedList if all you want to do is some good old list editing.

Comments

Popular posts from this blog

The "exposes" Keyword in Java

Type Maps