LinkedList - Lost Opportunity?
- Get link
- X
- Other Apps
I've always found LinkedList a bit of a strange bird among the Collections Framework classes. By rights it should probably come right after ArrayList, HashMap, HashSet and LinkedHashMap as a general-purpose collection class, but it never quite provided the functionality I needed — and expected from a linked list implementation.
Internals
It seems like the makers of LinkedList were very intent on emphasising its queue-like potential and it is easy to see why. A linked list consists of a series of "nodes". A node contains a value, a reference to the previous node and a reference to next node. In addition, a linked list maintains a reference to the first node and a reference to the last node.
With this setup, queue-like operations like push, pop, shift and unshift become cheap and simple. To push 54 onto the list, wire it up to 13 and make tail point to 54. To pop 13 off the list, just make tail point to 33 and nullify the reference from 33 to 13. Using a linked list, these operations never ever require an array copy, as would be the case with ArrayList. They are just pointer manipulations, and very few at that.
However, there are many more useful operations that can exploit the pointer-based mechanics of a linked list and that can therefore be (dramatically) faster than ArrayList. For example:- cut - cutting out a chunk of contiguous elements from the list
- embed - embedding one linked list in another one
- replace - the combination of cut and embed
- attach - attaching one linked list to another one (which would be as simple as connecting the tail of one to the head of the other)
- split - splitting the list into multiple smaller lists
- swap - swapping two regions within the list
- exchange - exchanging lists segments between two lists
- move - moving a region to another position
WiredList, CrisprList
The klojang-collections module contains two linked list implementations that do implement these operations: WiredList and CrisprList. The latter's name is a nod to the CRISPR gene editing technique.
Both WiredList and CrisprList ignore the queue-like aspect of linked lists, as that has already been taken care of by LinkedList. Instead, they exclusively focus on list manipulation, notably through destructive editing: edits that sever, establish and rewire links between nodes, but leave their values alone. They do implement the push, pop, shift and unshift operations, but they have been given more neutral names: append, removeLast, removeFirst and prepend, respectively.
To use WiredList and CrisprList, add the following dependency to your POM file:
<dependency>
<groupId>org.klojang</groupId>
<artifactId>klojang-collections</artifactId>
<version>2.0.8</version>
</dependency>
WiredList or CrisprList?
WiredList and CrisprList contain exactly the same methods and they behave identically. They only differ in how, under the hood, they handle the possibly very long chains of nodes that get expelled from the list by various operations. CrisprList will just leave them floating around in deep space (the heap). WiredList, on the other hand, meticulously cleans up after itself. It will nullify the values of the disposed nodes as well as the links between them. This is how LinkedList behaves as well. In principle, this should make it easier for the garbage collector to identify unreachable chunks of heap memory, at the cost some extra administration.
Depending on your use case, one class may perform somewhat better than the other. Generally, though, you can just pick the one with the best name.
Performance Benchmarks
So let's start with the kicker. The benchmark below measures the performance of the replace operation.
Within a list containing 50,000,000 elements, it replaces a region of
about 500,000 elements with a list that also contains about 500,000
elements. ArrayList and LinkedList do not directly implement the replace operation,
so for them the operation is emulated using a series of method calls.
If you want to verify that it is a good faith emulation, here is the code for JMH benchmark.
Benchmark Mode Cnt Score Error Units
ArrayList avgt 12 5.322 ± 2.065 s/op
CrisprList avgt 12 0.004 ± 0.004 s/op
LinkedList avgt 12 9.708 ± 1.151 s/op
WiredList avgt 12 0.003 ± 0.001 s/op
WiredList and CriprList are four orders of magnitude faster than ArrayList. Notice that it takes LinkedList an astounding 9.7 seconds to complete the replacement ... once! Whereas it should (or at least could) have been realy, really good at it. After all, it is built on the same data structure as WiredList and CrisprList, which complete the operation in a couple of milliseconds (three and four milliseconds, respectively, in this run of the benchmark).
So, Ditch ArrayList?
Well, no. Though authentic, this benchmark very much played to the strengths of WiredList and CrisprList. What it does not account for is where the two lists involved in the benchmark came from. Yet, they must have been instantiated at some point. As you can imagine, wiring up 50,000,000 nodes is quite a bit more expensive than creating an array with 50,000,000 elements.
The benchmark below pulls the instantiation into the measurement window. This time the list sizes are 10,000,000 and 1,000,000, respectively. Notice that we are now measuring performance in milliseconds rather than seconds.
Benchmark Mode Cnt Score Error Units
ArrayList avgt 12 54.474 ± 27.380 ms/op
CrisprList avgt 12 97.730 ± 1.039 ms/op
LinkedList avgt 12 323.225 ± 128.591 ms/op
WiredList avgt 12 102.433 ± 26.810 ms/op
So, Ditch WiredList and CrisprList?
Well, neither. It just means that list edits must outnumber list instantiations to make WiredList and CrisprList gain the upper hand over ArrayList. That's not an uncommon scenario.
Moreover, this benchmark turns out to be very sensitive to the amount of memory you give the JVM - not entirely surprising given the rather extreme list sizes. The above benchmark was done with 4 gigabytes of JVM memory. Here is the exact same benchmark, but with 16 gigabytes of JVM memory:
Benchmark Mode Cnt Score Error Units
ArrayList avgt 12 201.044 ± 67.754 ms/op
CrisprList avgt 12 82.067 ± 1.292 ms/op
LinkedList avgt 12 717.640 ± 856.598 ms/op
WiredList avgt 12 81.694 ± 1.966 ms/op
That's a baffling result. That WiredList and CrisprList now run faster was to be expected. That they now run faster than ArrayList is remarkable. But that ArrayList now runs substantially slower than when it had just 4 gigabytes of memory is a big surprise. Yet, this result got replicated every time I ran the benchmark.
Maybe the garbage collector is elbowing itself to the front in a way that is very unfortunate for ArrayList and less so for WiredList and CrisprList. But whatever it is, it reaffirms the mantra: if you want to know how things will work out for your use case, you will have to measure it for your use case.
Other Considerations
For small lists (anything smaller than about 500 elements) don't bother straying away from ArrayList. As the list size comes down to about 100 elements, ArrayList starts to outperform WiredList and CrisprList even if you push the list instantiation outside the measurement.Conclusion
- ArrayList remains hard to beat, and System.arraycopy(), its weapon of choice, is just very, very fast.
- Nevertheless, when working with large lists that require lots of edits, you might want to try out WiredList or CrisprList, especially if you can afford to endow your JVM with generous amounts of memory
- Don't bother with LinkedList if all you want to do is some good old list editing.
- Get link
- X
- Other Apps
Comments
Post a Comment