Introducing HPPC-RT, a fork for RealTime, with some additions...


Vincent Sonnier

Oct 23, 2013, 3:43:36 AM
to java-high-performance...@googlegroups.com
Hello everyone,

Following the advice of Dawid Weiss in this discussion, I finally forked my own HPPC and started toying with it.
Soon after, the toy became not quite a toy anymore, so HPPC-RT appeared.
My initial goal was to create a "Realtime" HPPC, in the sense that dynamic allocations could be reduced to zero unless explicitly requested through the API.
This consisted of removing the creation of temporaries and adding a recycling mechanism for iterators.
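
As an illustration of the recycling idea, here is a minimal sketch. It is not HPPC-RT's actual code: `IntCursor`, `IntCursorPool`, `borrow` and `release` are hypothetical names chosen for this example. The point is only that the cursor object is created once and re-armed for every loop, so steady-state iteration allocates nothing.

```java
import java.util.ArrayDeque;

/** Hypothetical reusable cursor over an int[]; not an HPPC-RT class. */
final class IntCursor {
    private int[] data;
    private int size;
    private int index;

    /** Re-arms the cursor so the same object can serve a new iteration. */
    void reset(int[] data, int size) {
        this.data = data;
        this.size = size;
        this.index = 0;
    }

    boolean hasNext() { return index < size; }

    int next() { return data[index++]; }
}

/** Hypothetical pool: cursors are allocated once up front and then reused. */
final class IntCursorPool {
    private final ArrayDeque<IntCursor> free;
    private int allocations; // counts cursor objects ever created

    IntCursorPool(int capacity) {
        free = new ArrayDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new IntCursor());
            allocations++;
        }
    }

    /** Hands out a pooled cursor; allocates only if the pool is exhausted. */
    IntCursor borrow(int[] data, int size) {
        IntCursor c = free.poll();
        if (c == null) {
            c = new IntCursor();
            allocations++;
        }
        c.reset(data, size);
        return c;
    }

    /** Returns the cursor to the pool so the next loop can reuse it. */
    void release(IntCursor c) {
        free.push(c);
    }

    int allocations() { return allocations; }
}
```

A caller would `borrow` a cursor, loop with `hasNext()`/`next()`, then `release` it. Forgetting the `release` call simply falls back to allocating, which is the trade-off of this kind of design.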

In the end, I also added some more features:
- In-place sorts as a replacement for Arrays.sort,
- Heap-based priority queues,
- Doubly-linked lists with powerful iteration methods (not the ugly java.util.Iterator ones...)
... and more (here).
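
To give a feel for what a heap-based priority queue over primitives looks like, here is a minimal sketch of a binary min-heap of `int` values. `IntMinHeap` and its methods are hypothetical names for this example and are not taken from HPPC-RT. The backing array grows only when full, so a queue sized up front performs no allocation while in use.

```java
import java.util.Arrays;

/** Hypothetical min-heap of ints backed by a plain array (no boxing). */
final class IntMinHeap {
    private int[] heap;
    private int size;

    IntMinHeap(int initialCapacity) {
        heap = new int[Math.max(1, initialCapacity)];
    }

    int size() { return size; }

    /** Inserts a value; the array grows only when it is full. */
    void add(int value) {
        if (size == heap.length) {
            heap = Arrays.copyOf(heap, heap.length * 2);
        }
        heap[size] = value;
        siftUp(size++);
    }

    /** Removes and returns the smallest value. */
    int popMin() {
        if (size == 0) throw new IllegalStateException("empty heap");
        int min = heap[0];
        heap[0] = heap[--size];
        if (size > 0) siftDown(0);
        return min;
    }

    /** Moves the element at i up until its parent is not larger. */
    private void siftUp(int i) {
        while (i > 0) {
            int parent = (i - 1) >>> 1;
            if (heap[parent] <= heap[i]) break;
            swap(i, parent);
            i = parent;
        }
    }

    /** Moves the element at i down until both children are not smaller. */
    private void siftDown(int i) {
        while (true) {
            int left = 2 * i + 1;
            if (left >= size) break;
            int right = left + 1;
            int smallest = (right < size && heap[right] < heap[left]) ? right : left;
            if (heap[i] <= heap[smallest]) break;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int a, int b) {
        int t = heap[a];
        heap[a] = heap[b];
        heap[b] = t;
    }
}
```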

Everything is unit tested, so it should work OK.

I realised it might interest someone other than me, so here it is:

Vincent

Dawid Weiss

Oct 24, 2013, 12:43:16 PM
to java-high-performance-primitive-collections
This looks pretty cool, Vincent! I'm glad you've decided to release
it! One request -- could you change the group ID in the POM from

com.carrotsearch

to something that would be more related to you? I think you'd have to
do it eventually anyway if you wanted to release it via Maven Central.
You could also repackage the entire source tree to avoid conflicts
with HPPC, although I don't think this is as crucial -- probably not a
biggie in practice.

Dawid
> --
> You received this message because you are subscribed to the Google Groups
> "High Performance Primitive Collections for Java" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to
> java-high-performance-primi...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Dawid Weiss

Oct 24, 2013, 12:48:24 PM
to java-high-performance-primitive-collections
Oh, I added this so that whoever downloads HPPC can try your fork too!
https://github.com/carrotsearch/hppc/blob/master/FORKS

Dawid

Vincent Sonnier

Oct 24, 2013, 4:05:10 PM
to java-high-performance...@googlegroups.com
Well, thank you Dawid!

To be honest, I know next to nothing about Maven, I've just renamed some strings in some files :)

Still, I will consult my team and firm (legal and confidentiality matters...) to find out whether they agree to a meaningful namespace with their name in it;
otherwise I will make one up myself so that it does not conflict with yours.

Vincent

Dawid Weiss

Oct 24, 2013, 5:04:00 PM
to java-high-performance-primitive-collections
> To be honest, I know next to nothing about Maven, I've just renamed some
> strings in some files :)

Ha. Don't know if I can recommend that you enter the Maven world --
it's a land of surprises and twists ;) And people are very opinionated
about it so I'll shut up now.

Anyway, the "group id" is what you eventually get a permission to
publish your code at, for example:
http://central.maven.org/maven2/com/carrotsearch/

You don't have to use your company's identifier -- it can be anything
that uniquely identifies you (or the code), for example the project's
github namespace:

http://central.maven.org/maven2/com/github/

Dawid

Vincent Sonnier

Oct 24, 2013, 9:52:22 PM
to java-high-performance...@googlegroups.com
Truth be told, I don't plan to publish anywhere except on GitHub.

Besides, as far as I understand it, if I change the "group id", doesn't that mean
that the Java source namespace must match it, for instance

com.nowsomethingcompletelydifferentwithmynameinit.hppc. ...

Then what about future merges back and forth with the original HPPC, or a drop-in replacement?
In addition, wouldn't that be somehow "taking credit" for something I didn't entirely create?

Vincent

Dawid Weiss

Oct 25, 2013, 6:40:40 AM
to java-high-performance-primitive-collections
The POM's group id has nothing to do with actual packaging/sources --
it can be independent. The only difference is that if you decide to
publish it in Maven Central it'll end up in your "namespace".

As for repackaging sources -- the project is licensed in a way that
allows it. The only "credit" you probably should include is the origin
of the fork. This would be nice.

Dawid

Vincent Sonnier

Oct 25, 2013, 7:16:04 AM
to java-high-performance...@googlegroups.com
So the class naming will continue to be 'com.carrotsearch.hppc' while the group id can be different? Right then.
About the credit, what form would it take? Some kind of disclaimer in README, LICENCE, CHANGES?

Thank you

Vincent 

Dawid Weiss

Oct 25, 2013, 7:52:59 AM
to java-high-performance-primitive-collections
Ideally I'd like you to repackage too but I'm not fussy -- if you
don't have the time to do it, fine. The credit can be given in a
NOTICE file or anywhere else -- again, it's just a kind gesture by
you, not a strict requirement.

Dawid

Tatu Saloranta

Oct 25, 2013, 1:22:43 PM
to java-high-performance...@googlegroups.com
On Thu, Oct 24, 2013 at 6:52 PM, Vincent Sonnier <vson...@gmail.com> wrote:
> Truth be told, I don't plan to publish anywhere except on GitHub.

If not, how would others use it? Maven is becoming the de facto dependency management system (both directly and as the backend that other systems use to resolve their dependencies). So most developers expect a Maven repository where released versions can be found.

As far as I know, GitHub does not have its own Maven repo (I would be glad to be proven wrong here).
Because of this, most OSS projects I know of use Sonatype's OSS repo:

https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide

which allows easy release of versions (at least when using Maven).

So I would strongly encourage using this if you would like others to be able to use it. It may seem like a lot of work, but after the initial setup it is easy to use and works quite nicely.

But it is of course your project and you can choose to do whatever you want. My suggestion is based on my own experience of being asked for a Maven-deployed jar about the second time someone finds a new project of mine. :-D

> Besides, as far as I understand it, if I change the "group id", doesn't that mean
> that the Java source namespace must match it, for instance
>
> com.nowsomethingcompletelydifferentwithmynameinit.hppc. ...


No. The group id is just a logical name, which may or may not match the physical Java package names. It may make sense to keep the two the same, but it is not a requirement.

> Then what about future merges back and forth with the original HPPC, or a drop-in replacement?
> In addition, wouldn't that be somehow "taking credit" for something I didn't entirely create?

I'll let Dawid comment on that, but I don't think the group id or package names matter too much with respect to credit: you can document this separately. And since the project itself has a dependency on HPPC, it seems reasonably clear.
Finally, you can keep hppc in the artifact name to make it clear that this is a derivative.

Just my 2c,

-+ Tatu +-
 


Dawid Weiss

Oct 25, 2013, 4:10:29 PM
to java-high-performance-primitive-collections
>> Then what about future merges back and forth with the original HPPC, or a
>> drop-in replacement?
>> In addition, wouldn't that be somehow "taking credit" for something I
>> didn't entirely create?
>
> I'll let Dawid comment on that, but I don't think the group id or package names
> matter too much with respect to credit: you can document this separately.

I somehow missed that, sorry. Yeah -- like Tatu said, package names
really don't matter much. I would still repackage -- this isn't much
of an issue with modern development tools (reorganize imports...), so
after you switch implementations you can just update your sources and
voila. Whether the drop-in replacement idea is worth pursuing... I don't
have an opinion on that -- I think with time you may want to add
further things or deviate from the baseline so drop-in won't work
anyway.

Dawid

Vincent Sonnier

Oct 29, 2013, 4:20:20 AM
to java-high-performance...@googlegroups.com
Good morning,

Thanks tsaloranta and Dawid for your input.
I'm considering publishing on Sonatype, maybe with a repackaging, but
I'm waiting for my team's/firm's opinion about the naming, groupId and such, since we are (now)
using HPPC-RT internally. I'll also add the NOTICE eventually.
So in the meantime, I fear everybody is stuck with the vintage way of using this library: download from GitHub ==> "Add to build path" :)

Dawid, if you are interested in any feature of mine, I'll be glad to send you pull requests; it's the least I can do :). By the way, the fact that
the "templates", both sources and tests, are valid Java is just amazing for ease of development.

Have a good day.

Vincent

Dawid Weiss

Oct 29, 2013, 4:25:20 AM
to java-high-performance-primitive-collections
> So in the meantime, I fear everybody is stuck with the vintage way of using
> this library: download from GitHub ==> "Add to build path" :)

el clasico :)

> Dawid, if you are interested in any feature of mine, I'll be glad to send
> you pull requests; it's the least I can do :).

I think these projects have slightly different goals and it's better
not to mix them. It's good to have the option to switch to what you
need, unifying the codebase will bring more harm than good I think.

If you have a spare cycle it'd be great to see some benchmarks/
results of how your fork manages to solve the problems you wanted to
solve (how's the performance, how many iterator runs do you need to
actually see the difference in non-allocating vs. allocating, etc.)?

> the "templates", both sources and tests, are valid Java is just amazing for
> ease of development.

Thanks, appreciated. That was actually my primary goal. I remember
looking at fastutil, PCJ and Trove and it just seemed wrong that you
had to go through the whole generate/compile phase every time. These
templates are far from perfect but in my humble opinion they work
better than conditionals preprocessed with external tools.
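
To illustrate the idea of a template that is itself valid Java, here is a deliberately naive sketch. It is not HPPC's actual template processor: `NaiveTemplateSpecializer`, `KTypeHolder` and the plain text substitution are assumptions made for this example. Only the `KType` placeholder convention is borrowed from HPPC's template sources. The template string would compile as ordinary generic Java, and a primitive variant is derived by substitution.

```java
/** Deliberately naive sketch of template specialization; not HPPC's real processor. */
final class NaiveTemplateSpecializer {
    /** A tiny template that is itself valid generic Java. */
    static final String TEMPLATE =
        "final class KTypeHolder<KType> {\n" +
        "    KType value;\n" +
        "    KType get() { return value; }\n" +
        "}\n";

    /**
     * Produces a primitive variant by plain text substitution:
     * the generic parameter is dropped, the class-name prefix is renamed,
     * and the remaining KType tokens become the primitive type.
     */
    static String specialize(String template, String primitive) {
        String prefix = Character.toUpperCase(primitive.charAt(0)) + primitive.substring(1);
        return template
            .replace("<KType>", "")                  // drop the type parameter
            .replaceAll("KType(?=[A-Z])", prefix)    // KTypeHolder -> IntHolder
            .replace("KType", primitive);            // KType -> int
    }

    public static void main(String[] args) {
        System.out.print(specialize(TEMPLATE, "int"));
    }
}
```

Because the template compiles on its own, it can be edited, refactored and unit tested in an IDE like any other class, with generation needed only to produce the primitive variants.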

Dawid

Vincent Sonnier

Dec 7, 2013, 6:48:48 AM
to java-high-performance...@googlegroups.com
Hello again.
I'm just passing by to say that the project is not dead at all; I just haven't had the time and motivation to work on it.
All suggestions about repackaging and Maven publication are currently being considered in-house.
Alas, the bigger the firm, the slower decisions are made.

On the development side, I'm planning two future evolutions:

1) A simple evolution of the template processor to generate boolean versions where it makes sense (lists, deques, values in maps), as was suggested to me.
2) A tree-based, ordered data structure. The goal is to provide range queries, in-order traversal, and such.

As usual, performance, minimization of object creation, the ability to run with zero runtime allocation, etc., and also cache-conscious design have been taken into consideration.

So far:
- Classic AVL and Red-Black trees (left-leaning or not) are excluded for their complexity.
- Splay trees / Scapegoat trees look too rough in their rebalancing policy, which is bad for realtime systems. We don't want an O(n) operation occurring,
even if the later operations are much faster. Also, they seem to have inferior performance in practice.

- Probabilistic trees look better on average (!!!) in terms of simplicity and performance, counting on the power of randomization to minimize the average cost: treaps and the shuffle tree.
The drawback is that their height is not bounded by O(log n) in the worst case, which means iteration is complicated without dynamic memory allocation (even if such bad luck is supposed to be "vanishingly small").

- B(+)-trees: This last choice may be the one I eventually implement, because it is the only one with a cache-conscious design, since it was originally designed with slow disk access in mind (filesystems, databases).
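
The iteration concern raised for probabilistic trees in the list above can be made concrete with a small sketch. All names here (`IntTreeWalker`, the parallel `key`/`left`/`right` arrays) are hypothetical and not part of HPPC-RT. An allocation-free in-order traversal needs a stack preallocated for some maximum height, and a tree without a guaranteed height bound can overflow it.

```java
/** Hypothetical walker over an array-backed BST: node i has key[i], left[i], right[i] (-1 = none). */
final class IntTreeWalker {
    private final int[] stack; // preallocated once; its length is the supported height

    IntTreeWalker(int maxHeight) {
        stack = new int[maxHeight];
    }

    /**
     * Writes keys in ascending order into out and returns how many were written.
     * Allocates nothing, but fails if the tree is taller than the preallocated stack.
     */
    int inOrder(int root, int[] key, int[] left, int[] right, int[] out) {
        int top = 0;
        int n = 0;
        int node = root;
        while (node != -1 || top > 0) {
            // Descend along left children, remembering the path.
            while (node != -1) {
                if (top == stack.length) {
                    throw new IllegalStateException("tree taller than preallocated stack");
                }
                stack[top++] = node;
                node = left[node];
            }
            // Visit the deepest pending node, then continue in its right subtree.
            node = stack[--top];
            out[n++] = key[node];
            node = right[node];
        }
        return n;
    }
}
```

With a hard height bound the stack can be sized once and the exception can never fire. With only a probabilistic bound, the implementation has to either accept that failure mode or fall back to growing the stack dynamically.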

The thing is, RAM is also highly hierarchical, so the same problem arises there, and all the other BST solutions neglect it.
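
As a rough illustration of why a B+-tree layout is friendlier to the memory hierarchy, here is a minimal sketch of the leaf level only. It is not the STX B+ Tree design and not HPPC-RT code: `IntLeaf`, `IntLeafChain`, `bulkLoad` and `countRange` are hypothetical names, inner nodes are omitted, and keys are assumed to be distinct. Keys sit contiguously in small sorted arrays, so a range query reads memory sequentially instead of chasing one pointer per key.

```java
import java.util.Arrays;

/** Hypothetical B+-tree leaf: keys sit contiguously in one small sorted array. */
final class IntLeaf {
    final int[] keys;
    int count;
    IntLeaf next; // leaves are chained so range scans never revisit inner nodes

    IntLeaf(int capacity) { keys = new int[capacity]; }

    /** Appends a key; the caller must supply keys in ascending order. */
    boolean append(int key) {
        if (count == keys.length) return false;
        keys[count++] = key;
        return true;
    }
}

final class IntLeafChain {
    /** Builds a chain of leaves from sorted, distinct keys (inner nodes omitted). */
    static IntLeaf bulkLoad(int[] sortedKeys, int leafCapacity) {
        IntLeaf head = new IntLeaf(leafCapacity);
        IntLeaf tail = head;
        for (int k : sortedKeys) {
            if (!tail.append(k)) {
                tail.next = new IntLeaf(leafCapacity);
                tail = tail.next;
                tail.append(k);
            }
        }
        return head;
    }

    /** Counts keys in [from, to] by scanning leaf arrays sequentially. */
    static int countRange(IntLeaf head, int from, int to) {
        IntLeaf leaf = head;
        // Skip leaves that end before the range; a real tree descends inner nodes instead.
        while (leaf != null && (leaf.count == 0 || leaf.keys[leaf.count - 1] < from)) {
            leaf = leaf.next;
        }
        int n = 0;
        while (leaf != null) {
            int i = Arrays.binarySearch(leaf.keys, 0, leaf.count, from);
            if (i < 0) i = -i - 1; // insertion point when 'from' is absent
            for (; i < leaf.count; i++) {
                if (leaf.keys[i] > to) return n;
                n++;
            }
            leaf = leaf.next;
        }
        return n;
    }
}
```

The leaf capacity would normally be tuned so that one leaf spans a small number of cache lines; the value used is an implementation choice, not something stated in this thread.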

So I'm finally considering a port of the STX B+ Tree C++ implementation (GitHub). Its general philosophy seems to match my goals.

I may return to development soon, but do not bet on a Christmas present.

Vincent   