GapList feedback

95 views
Skip to first unread message

Thibault Kruse

unread,
Oct 25, 2013, 7:15:55 AM10/25/13
to brownies-c...@googlegroups.com
Hi Thomas,

looking at the GapList article, I have a few questions.

In the gaplist article, removal of an item from the middle of a Gaplist is hown like this:
http://java.dzone.com/articles/gaplist-%E2%80%93-lightning-fast-list

Before: 1, 3, 6, 8, -, -, -, -,
After (removing 6): 1, 3, -, 8, -, -, -, -,

Shouldn't this be: 1, 3, -, -, -, -, -, 8
such that only one gap remains? Or do you keep an additional variable to allow a mid-gap and an end-gap? Or maybe have a variable number of gaps?

Also have you considered putting your source code on a public repository like github, for easier forking and pull requests?

Also have you considered suggesting a JSR to either add gaplist to the JDK, change the ArrayList implementation to be a GapList?

Thomas Mauch

unread,
Oct 25, 2013, 8:23:47 AM10/25/13
to brownies-c...@googlegroups.com

Hi Thibault

GapList manages one "real" gap which can be anywhere: "If a GapList has no gap, one is created if needed. If the gap is at a wrong place, it is moved".
However as GapList is always handling operations at head and tail of the list specially (because adding/remove to/from start/end are very common operation),
it can look like there are three gaps in the array used for storage: the real one, one at the head and one at the tail (as shown in the attached image).

The source coude is in private repository right now and there are currently no plans to migrate it anywhere else. You can however easily download the sources directly from the website or from the maven central repository.

I take the suggestion to add GapList to the JDK as compliment - thanks. So if Oracle contacts me, I'm happy to contribute it, but no, there are no plans, it'a all matter of time...

Regards,
Thomas

Thibault Kruse

unread,
Oct 25, 2013, 9:36:03 AM10/25/13
to brownies-c...@googlegroups.com
Ah, I see. So GapList resides somewhere between CircularArrayList and TreeList (e.g. org.apache.commons.collections.list), in performance and memory consumption. Maybe it can be expressed as a "TreeArrayList" with only two branches. So maybe not exciting enough for a JSR.

Out of curiosity: Do you not want to migrate to a public repo because you think it's too much work, or for other reasons? The issue i see with a private repo is that you only share the java runtime classes source, not the build/test/benchmark files, which are essential parts of the code, and necessary for others to extend it. Maybe that is intentional.

Thibault Kruse

unread,
Oct 25, 2013, 10:35:51 AM10/25/13
to brownies-c...@googlegroups.com
I also found that the idea of keeping a gap is not new:
http://en.wikipedia.org/wiki/Gap_buffer

So your GapList is really a circular Gap Buffer, and I guess the general recommendation is that when adding deleting  a lot from start/end, a CircularArray is best, whereas when changing a lot at a same location in an array, a GapBuffer is bset. That anyone might need to do both seems rare enough that the CircularGapBuffer was never invented.

Thomas Mauch

unread,
Oct 25, 2013, 7:38:23 PM10/25/13
to brownies-c...@googlegroups.com
Have a look at the following performance figures for GapList, CircularArrayList, and TreeList (this are the benchmarks also used in the GapList article):

Get last 50000
GapList                         :  0.000284s, factor     1.000, min/max  0.000229s/ 0.000500s, elapsed  0.528325s, work  0.500030s, 1758 runs , Memory: 400072)
CircularArrayList               :  0.000430s, factor     1.513, min/max  0.000362s/ 0.000757s, elapsed  0.529896s, work  0.500515s, 1163 runs , Memory: 600064)
TreeList                        :  0.004675s, factor    16.437, min/max  0.004161s/ 0.005678s, elapsed  1.029560s, work  0.500244s, 107 runs , Memory: 2200036)
Get first 50000
GapList                         :  0.000267s, factor     1.000, min/max  0.000233s/ 0.000554s, elapsed  0.505663s, work  0.500301s, 1872 runs )
CircularArrayList               :  0.000401s, factor     1.501, min/max  0.000323s/ 0.000751s, elapsed  0.503644s, work  0.500124s, 1247 runs )
TreeList                        :  0.004122s, factor    15.423, min/max  0.003871s/ 0.005547s, elapsed  0.552541s, work  0.502871s, 122 runs )
Get random 50000
GapList                         :  0.001592s, factor     1.000, min/max  0.001457s/ 0.002649s, elapsed  0.505556s, work  0.501554s, 315 runs )
CircularArrayList               :  0.002216s, factor     1.392, min/max  0.002017s/ 0.003655s, elapsed  0.502841s, work  0.500846s, 226 runs )
TreeList                        :  0.017914s, factor    11.251, min/max  0.016903s/ 0.019125s, elapsed  0.548883s, work  0.501586s, 28 runs )
Add last 50000
GapList                         :  0.002418s, factor     1.991, min/max  0.001735s/ 0.021124s, elapsed  1.527609s, work  0.507779s, 210 runs )
CircularArrayList               :  0.001214s, factor     1.000, min/max  0.000757s/ 0.004271s, elapsed  1.670181s, work  0.500351s, 412 runs )
TreeList                        :  0.030254s, factor    24.912, min/max  0.025648s/ 0.068574s, elapsed  1.597100s, work  0.514315s, 17 runs )
Add first 50000
GapList                         :  0.002322s, factor     1.815, min/max  0.001903s/ 0.005850s, elapsed  1.514019s, work  0.501515s, 216 runs )
CircularArrayList               :  0.001279s, factor     1.000, min/max  0.000918s/ 0.002313s, elapsed  1.182593s, work  0.500196s, 391 runs )
TreeList                        :  0.035450s, factor    27.711, min/max  0.027245s/ 0.069694s, elapsed  1.532523s, work  0.567199s, 16 runs )
Add random 50000
GapList                         :  2.669687s, factor    48.662, min/max  2.669687s/ 2.669687s, elapsed  2.673531s, work  2.669687s, 1 runs )
CircularArrayList               :  2.184082s, factor    39.811, min/max  2.184082s/ 2.184082s, elapsed  2.186020s, work  2.184082s, 1 runs )
TreeList                        :  0.054861s, factor     1.000, min/max  0.046039s/ 0.093450s, elapsed  1.180101s, work  0.548615s, 10 runs )
Add random near 50000/0.1
GapList                         :  0.288570s, factor     5.617, min/max  0.288225s/ 0.288915s, elapsed  0.585012s, work  0.577140s, 2 runs )
CircularArrayList               :  2.162942s, factor    42.105, min/max  2.162942s/ 2.162942s, elapsed  2.164936s, work  2.162942s, 1 runs )
TreeList                        :  0.051370s, factor     1.000, min/max  0.044759s/ 0.066556s, elapsed  1.190917s, work  0.513700s, 10 runs )
Add random near 50000/0.01
GapList                         :  0.038012s, factor     1.000, min/max  0.037004s/ 0.041729s, elapsed  0.600566s, work  0.532168s, 14 runs )
CircularArrayList               :  3.299469s, factor    86.801, min/max  3.299469s/ 3.299469s, elapsed  3.301682s, work  3.299469s, 1 runs )
TreeList                        :  0.042915s, factor     1.129, min/max  0.035300s/ 0.087095s, elapsed  1.303448s, work  0.514983s, 12 runs )
Add iter 100000/2
GapList index                   :  0.007354s, factor     1.000, min/max  0.006640s/ 0.011812s, elapsed  0.831418s, work  0.500075s, 68 runs )
CircularArrayList index         :  1.717975s, factor   233.610, min/max  1.717975s/ 1.717975s, elapsed  1.719883s, work  1.717975s, 1 runs )
TreeList index                  :  0.044491s, factor     6.050, min/max  0.037139s/ 0.078990s, elapsed  1.296346s, work  0.533891s, 12 runs )

Performance:
As we can expect, GapList and CircularArrayList behave similar, except if several add/remove operation happens locally: here GapList can be factor 100 faster.
TreeList is never really slow, but each get operation (which is typically the most common one), is at least factor 10 slower - and it uses a lot of memory.

Memory:
For the first test, I also added the size of the resulting collection (for 100'000 elements). CircularArrayList should conceptually have the same size as GapList, so I suppose that they are using a slightly different capacity management than GapList (which mimics the behavior of ArrayList). TreeList however is using more than 5 time more memory.

As written in the article, it was one of the design goals to write a list which should behave well in every use case - and I think this was successful (of course I am biased...). Why is this important? If you're having lists as part of your API, you can only guess what the caller will do. So what implementation should you choose? You can GapList in any case, no need to make decision for one of ArrayList, LinkedList, ArrayDequeu, CircularArrayList, TreeList - which could then be wrong...

Regarding gap buffer: yes, I know this concept - it's hard to invent really new things... But as far as I know, GapList is the first implementation mixing the two concept, so it may be a evolution instead of a revolution...
Regarding the repo: yes, too much work, the collection stuff is just part of a SVN repo which all fits together, may be I can split at some time if I migrate to Git or so.

Thibault Kruse

unread,
Oct 26, 2013, 10:43:53 AM10/26/13
to brownies-c...@googlegroups.com
Thanks.

Going down that route however seems will lead to even better runtime performance for arbitrary operations such as here: http://cg.scs.carleton.ca/~morin/teaching/2402/notes/arrays-iii.pdf
Not sure whether an free implementation exists for that beast.

I can see how it is painful to get subprojects out of SVN, might not be worth the effort for now.

Thomas Mauch

unread,
Oct 28, 2013, 7:39:37 PM10/28/13
to brownies-c...@googlegroups.com
Interesting reading... I googled around and found the site
https://github.com/patmorin/ods/blob/master/java/ods/
which seems to be from this guy and contains some list implementations.

So I added some of them to my test suite, but GapList still seems to be fastest - see the results below.

Get last 50000
GapList                         :  0.000452s, factor     1.000, min/max  0.000294s/ 0.001230s, elapsed  0.533220s, work  0.500051s, 1107 runs , Memory: 400072)
RootishArrayStack               :  0.002769s, factor     6.129, min/max  0.001787s/ 0.006107s, elapsed  0.553739s, work  0.501100s, 181 runs , Memory: 410032)
DualRootishArrayDeque           :  0.003743s, factor     8.286, min/max  0.002900s/ 0.005334s, elapsed  0.557941s, work  0.501576s, 134 runs , Memory: 415592)
DualArrayDeque                  :  0.002601s, factor     5.759, min/max  0.002031s/ 0.003535s, elapsed  0.537627s, work  0.502055s, 193 runs , Memory: 426192)
Get first 50000
GapList                         :  0.000379s, factor     1.000, min/max  0.000277s/ 0.000608s, elapsed  0.502703s, work  0.500224s, 1319 runs )
RootishArrayStack               :  0.001290s, factor     3.401, min/max  0.001069s/ 0.001862s, elapsed  0.508142s, work  0.500408s, 388 runs )
DualRootishArrayDeque           :  0.002780s, factor     7.330, min/max  0.002406s/ 0.003828s, elapsed  0.541044s, work  0.500391s, 180 runs )
DualArrayDeque                  :  0.001395s, factor     3.679, min/max  0.001197s/ 0.002120s, elapsed  0.514832s, work  0.500865s, 359 runs )
Get random 50000
GapList                         :  0.001707s, factor     1.000, min/max  0.001356s/ 0.003142s, elapsed  0.503825s, work  0.501723s, 294 runs )
RootishArrayStack               :  0.003997s, factor     2.342, min/max  0.003425s/ 0.005634s, elapsed  0.509114s, work  0.503620s, 126 runs )
DualRootishArrayDeque           :  0.005689s, factor     3.334, min/max  0.004860s/ 0.007606s, elapsed  0.537081s, work  0.500629s, 88 runs )
DualArrayDeque                  :  0.003591s, factor     2.104, min/max  0.003103s/ 0.005357s, elapsed  0.511619s, work  0.502777s, 140 runs )
Add last 10000
GapList                         :  0.000216s, factor     1.000, min/max  0.000165s/ 0.000971s, elapsed  0.912245s, work  0.500037s, 2317 runs )
RootishArrayStack               :  0.000664s, factor     3.075, min/max  0.000578s/ 0.001647s, elapsed  0.954196s, work  0.500450s, 754 runs )
DualRootishArrayDeque           :  0.003509s, factor    16.260, min/max  0.003164s/ 0.005309s, elapsed  0.977955s, work  0.501795s, 143 runs )
DualArrayDeque                  :  0.001108s, factor     5.133, min/max  0.000959s/ 0.002170s, elapsed  0.920414s, work  0.500701s, 452 runs )
Add first 10000
GapList                         :  0.000230s, factor     1.000, min/max  0.000192s/ 0.000842s, elapsed  0.835665s, work  0.500051s, 2177 runs )
RootishArrayStack               : 10.889707s, factor 47408.979, min/max 10.889707s/10.889707s, elapsed 10.890403s, work 10.889707s, 1 runs )
DualRootishArrayDeque           :  0.000990s, factor     4.310, min/max  0.000920s/ 0.001683s, elapsed  2.142844s, work  0.500964s, 506 runs )
DualArrayDeque                  :  0.000735s, factor     3.202, min/max  0.000640s/ 0.001359s, elapsed  1.078165s, work  0.500137s, 680 runs )
Add random 10000
GapList                         :  0.042018s, factor     4.184, min/max  0.040079s/ 0.046279s, elapsed  0.506260s, work  0.504212s, 12 runs )
RootishArrayStack               :  5.489413s, factor   546.661, min/max  5.489413s/ 5.489413s, elapsed  5.490243s, work  5.489413s, 1 runs )
DualRootishArrayDeque           :  2.807646s, factor   279.598, min/max  2.807646s/ 2.807646s, elapsed  2.811134s, work  2.807646s, 1 runs )
DualArrayDeque                  :  0.010042s, factor     1.000, min/max  0.009651s/ 0.011235s, elapsed  0.549603s, work  0.502085s, 50 runs )
Add random near 10000/0.1
GapList                         :  0.005771s, factor     1.000, min/max  0.005457s/ 0.007091s, elapsed  0.515470s, work  0.502104s, 87 runs )
RootishArrayStack               :  4.985861s, factor   863.904, min/max  4.985861s/ 4.985861s, elapsed  4.986558s, work  4.985861s, 1 runs )
DualRootishArrayDeque           :  2.931530s, factor   507.949, min/max  2.931530s/ 2.931530s, elapsed  2.935029s, work  2.931530s, 1 runs )
DualArrayDeque                  :  0.010580s, factor     1.833, min/max  0.010193s/ 0.011202s, elapsed  0.550754s, work  0.507827s, 48 runs )
Add random near 10000/0.01
GapList                         :  0.001666s, factor     1.000, min/max  0.001515s/ 0.002439s, elapsed  0.547805s, work  0.501596s, 301 runs )
RootishArrayStack               :  8.498374s, factor  5099.744, min/max  8.498374s/ 8.498374s, elapsed  8.499085s, work  8.498374s, 1 runs )
DualRootishArrayDeque           :  2.430136s, factor  1458.287, min/max  2.430136s/ 2.430136s, elapsed  2.433396s, work  2.430136s, 1 runs )
DualArrayDeque                  :  0.009071s, factor     5.443, min/max  0.008675s/ 0.009897s, elapsed  0.557619s, work  0.507965s, 56 runs )
Add iter 10000/2
GapList                         :  0.000753s, factor     1.000, min/max  0.000681s/ 0.001358s, elapsed  0.601939s, work  0.500711s, 665 runs )
RootishArrayStack               :  3.636562s, factor  4829.758, min/max  3.636562s/ 3.636562s, elapsed  3.637223s, work  3.636562s, 1 runs )
DualRootishArrayDeque           :  2.713461s, factor  3603.776, min/max  2.713461s/ 2.713461s, elapsed  2.717311s, work  2.713461s, 1 runs )
DualArrayDeque                  :  0.009079s, factor    12.058, min/max  0.008608s/ 0.009945s, elapsed  0.557737s, work  0.508445s, 56 runs )

Thibault Kruse

unread,
Oct 29, 2013, 5:40:35 AM10/29/13
to brownies-c...@googlegroups.com
Ah, I confused RootishArrayStack with something else. It is strictly optimized reducing memory waste.

I though it was using multiple arrays to represent multiple blocks with gaps in between. Maybe nobody has thought of that, using a GapBuffer with multiple gaps, and using multiple arrays to reduce the duration of growing operations.
Reply all
Reply to author
Forward
0 new messages