Suggestion: hppc-contrib

Andrew Clegg

Mar 15, 2015, 1:07:28 PM
to java-high-performance...@googlegroups.com
Hi everyone. This is my first post, so a quick note before anything else: thanks to Dawid and collaborators for making this really useful library available.

I've been working a lot with HPPC recently and have got a couple of ideas of my own that I've been prototyping, e.g. an arraylist backed by multiple arrays in a chain. (So you don't have to reallocate and copy memory if you're aggregating data from multiple sources, and you can grow without moving the whole lot around...)
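
A minimal sketch of the chained-array idea, assuming fixed-size blocks and int elements. The class name ChunkedIntList, the blockSize parameter and the method names are hypothetical and are not part of HPPC; the size is also kept as an int here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: an int list stored as a chain of fixed-size blocks. Not HPPC API. */
final class ChunkedIntList {
    private final int blockSize;
    private final List<int[]> blocks = new ArrayList<>();
    private int size;

    ChunkedIntList(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Appends a value; when the last block is full a new block is chained on. */
    void add(int value) {
        if (size == blocks.size() * blockSize) {
            // Only the small list of block references may grow; element data never moves.
            blocks.add(new int[blockSize]);
        }
        blocks.get(size / blockSize)[size % blockSize] = value;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return blocks.get(index / blockSize)[index % blockSize];
    }

    int size() {
        return size;
    }

    int blockCount() {
        return blocks.size();
    }
}
```

Adding ten values with a block size of 4 allocates three blocks, and none of the earlier blocks is reallocated or copied along the way.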

Putting these in a separate maven project is a little awkward -- it's possible to install the maven templating plugin from the original project and hook it into the new project's build, but there are other issues like useful testing utilities which are package-private and not included in the hppc jar.

But I'm a bit reluctant to fork it and make major changes there, because my needs may not be mainstream enough (or sufficiently well written!) to go back into trunk, and then inevitably it'd end up diverging and being a pain to maintain.

Dawid, how would you feel about setting up a separate hppc-contrib project, with access to the same build and test tools, but an "anything goes" policy and very lightweight review process? (i.e. don't break other people's tests.)

This might also be a good place for hppc-rt to live if Vincent agreed, and it might be a good way to test out new ideas that could later make it into the main codebase.

Or there might be better ways to approach this -- splitting out the build and test tools into a separate package that other people could use more easily, or maybe something involving optional git subtrees/submodules that can pull other codebases in. Interested to hear other people's suggestions.

Thanks,

A.

Dawid Weiss

Mar 15, 2015, 1:19:12 PM
to java-high-performance-primitive-collections
Hi Andrew!

Thanks for your kind words, I'm glad you're finding the library useful.

> I've been working a lot with HPPC recently and have got a couple of ideas of
> my own that I've been prototyping, e.g. an arraylist backed by multiple
> arrays in a chain.

Funny -- I actually need something like this myself (the reasons being
not just memcpy on grow, but also the limits on the maximum size of
Java arrays).

> Putting these in a separate maven project is a little awkward -- it's
> possible to install the maven templating plugin from the original project
> and hook it into the new project's build, but there are other issues like
> useful testing utilities which are package-private and not included in the
> hppc jar.

I've been changing a *lot* in HPPC recently -- check out the master
branch. I am on vacation this week, but I'll be working on it once I
get back with the intention of releasing an API-breaking version as
soon as I am happy with the changes.

While it's not a good idea for a library to expose its internals in
general, in HPPC's case there are reasons for doing so (we sometimes
need to fiddle with the buffers directly). So if there's something
that is package-private (and hasn't been removed) we should probably
open it up. If somebody wants an implementation-behind-an-api then
Koloboke and fastutil are probably better choices anyway (because they
implement the java.util collection interfaces).

> But I'm a bit reluctant to fork it and make major changes there, because my
> needs may not be mainstream enough (or sufficiently well written!) to go
> back into trunk, and then inevitably it'd end up diverging and being a pain
> to maintain.

Yeah, I understand. But it's the nature of a fork -- the more changes
you introduce, the more you'll have to maintain... eventually the
overhead will probably cause the fork to become a stand-alone
project. Vincent has implemented so many changes (and good for him!)
that it's effectively a different project now :)

Also, I admit we do have some selfish needs in HPPC and I would like
to have this freedom of deciding what we want to introduce, what to
maintain and what to drop.

> Dawid, how would you feel about setting up a separate hppc-contrib project,
> with access to the same build and test tools, but an "anything goes" policy
> and very lightweight review process? (i.e. don't break other people's
> tests.)

But how would it be different from a fork? For example when I make
some major refactorings in the codebase I'd need to go and fix all the
contribs as well? This wouldn't work for me. As much as I'd love to do
it, I just won't have the time :(

> Or there might be better ways to approach this -- splitting out the build
> and test tools into a separate package that other people could use more
> easily, or maybe something involving optional git subtrees/submodules that
> can pull other codebases in.

The template processor should be fairly functional, especially since on
master it's a fully fledged Maven mojo. We can make it part of an
official release (I don't think it's currently deployed to Maven
Central). It is a simple tool, but if you (or others) find it useful
then sure thing. Would this work?

Dawid

Andrew Clegg

Mar 15, 2015, 2:42:58 PM
to java-high-performance...@googlegroups.com

On Sunday, 15 March 2015 17:19:12 UTC, Dawid Weiss wrote:
 
> Funny -- I actually need something like this myself (the reasons being
> not just memcpy on grow, but also the limits on the maximum size of
> Java arrays).

I would offer to share mine but it's only half finished, and may well be obsolete already if you've overhauled the API recently.

If your implementation could support construction from existing arrays, and access to the underlying arrays after construction, e.g. for bulk copying, then it would do everything I had been planning to do...
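
A rough sketch of what those two requirements could look like, assuming a read-only view over int arrays. The names SegmentedIntView, segment and toArray are hypothetical and not taken from HPPC; empty arrays are rejected here only to keep the index lookup simple.

```java
import java.util.Arrays;

/** Hypothetical sketch: a view over existing int arrays, chained without copying. */
final class SegmentedIntView {
    private final int[][] segments;
    private final int[] starts; // starts[i] is the logical index of segments[i][0]
    private final int size;

    SegmentedIntView(int[]... arrays) {
        this.segments = arrays.clone(); // copies the references, not the element data
        this.starts = new int[arrays.length];
        int total = 0;
        for (int i = 0; i < arrays.length; i++) {
            if (arrays[i].length == 0) {
                throw new IllegalArgumentException("empty segments are not supported");
            }
            starts[i] = total;
            total = Math.addExact(total, arrays[i].length);
        }
        this.size = total;
    }

    int size() {
        return size;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Find the last segment whose start is <= index.
        int pos = Arrays.binarySearch(starts, index);
        int seg = pos >= 0 ? pos : -pos - 2;
        return segments[seg][index - starts[seg]];
    }

    /** Direct access to an underlying array, e.g. for bulk operations. */
    int[] segment(int i) {
        return segments[i];
    }

    /** Bulk copy of all segments into one contiguous array. */
    int[] toArray() {
        int[] out = new int[size];
        for (int i = 0; i < segments.length; i++) {
            System.arraycopy(segments[i], 0, out, starts[i], segments[i].length);
        }
        return out;
    }
}
```

Because the view holds the caller's arrays directly, a write to one of the original arrays is visible through get, and segment hands the same array back for bulk copying.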

> While it's not a good idea for a library to expose its internals in
> general, in HPPC's case there are reasons for doing so (we sometimes
> need to fiddle with the buffers directly). So if there's something
> that is package-private (and hasn't been removed) we should probably
> open it up. If somebody wants an implementation-behind-an-api then
> Koloboke and fastutil are probably better choices anyway (because they
> implement the java.util collection interfaces).


Internals and BitUtil in main are package-private; it would be useful to change that. For the test classes, the problem is just that they aren't included in the jar. There is some stuff in there that would be generally useful to people writing extensions, though. Would it be possible to publish a separate test jar, the same way e.g. Lucene does?

>> Dawid, how would you feel about setting up a separate hppc-contrib project,
>> with access to the same build and test tools, but an "anything goes" policy
>> and very lightweight review process? (i.e. don't break other people's
>> tests.)
>
> But how would it be different from a fork? For example when I make
> some major refactorings in the codebase I'd need to go and fix all the
> contribs as well? This wouldn't work for me. As much as I'd love to do
> it, I just won't have the time :(

Well it would be different in the trivial sense that it wouldn't contain the main codebase, except as a dependency. But... I hear you about not being able to maintain it through breaking API changes. And on reflection, leaving it up to the individual contributors will never be sustainable, sad fact of life...

The only other thing it could do is provide a place to grow an ecosystem of useful hppc-compatible stuff, e.g. algorithms and data structures that have been templatized and are compatible with all your interfaces. But there are other ways to do that.

> The template processor should be fairly functional, especially since on
> master it's a fully fledged Maven mojo. We can make it part of an
> official release (I don't think it's currently deployed to Maven
> Central). It is a simple tool, but if you (or others) find it useful
> then sure thing. Would this work?

I for one would find that really useful! Especially if there was an hppc-test.jar as well. Then I could just write anything I needed as a totally separate project without breaking compatibility.

Thanks!

A.


 

Dawid Weiss

Mar 15, 2015, 3:50:26 PM
to java-high-performance-primitive-collections
> I would offer to share mine but it's only half finished, and may well be
> obsolete already if you've overhauled the API recently.

That's all right, I just mentioned it because we needed it recently.
Such "big" arrays are already part of fastutil, I just wanted to
experiment with block sizes to see what kind of block size minimizes
runtime penalty.
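
One factor that plausibly feeds into that penalty is the per-access index arithmetic. As a sketch only, assuming the block size is a power of two, a long index can be split into a block number and an offset with a shift and a mask instead of a division. The BlockIndex class below is hypothetical and is neither HPPC nor fastutil code.

```java
/** Hypothetical helper: maps a long index onto (block, offset) for power-of-two block sizes. */
final class BlockIndex {
    private final int shift;
    private final int mask;

    BlockIndex(int blockSize) {
        if (blockSize <= 0 || Integer.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException("blockSize must be a positive power of two");
        }
        this.shift = Integer.numberOfTrailingZeros(blockSize);
        this.mask = blockSize - 1;
    }

    /** Which block holds the element; the cast assumes the block count fits in an int. */
    int block(long index) {
        return (int) (index >>> shift);
    }

    /** Position of the element inside its block. */
    int offset(long index) {
        return (int) (index & mask);
    }

    /** Number of blocks needed to hold the given number of elements. */
    long blocksNeeded(long size) {
        return (size + mask) >>> shift;
    }
}
```

With a block size of 1024, index 3,000,000,000 (beyond what a single Java array can address) lands in block 2,929,687 at offset 512.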

> Internals and BitUtil in main are package private, that would be useful to change.

Internals has been removed -- "hash" is now moved to a bit mixing class:
https://github.com/carrotsearch/hppc/blob/master/hppc/src/main/java/com/carrotsearch/hppc/BitMixer.java

This better reflects what we're after: ideally we need a uniform
bit mixer that scatters values across the integer domain (we
can't address larger arrays in Java anyway). I removed MurmurHash and
was actually thinking about dropping the bit utilities as well. Adrien
Grand and others made a lot of improvements in the bit-fiddling domain
over in Lucene; perhaps I'll extract those classes and move them here
(I don't know how much effort that would take).
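
To illustrate what a bit mixer buys in a hash table, here is a sketch that uses the well-known 32-bit finalizer from MurmurHash3 as the mixing step. This is an assumption made for illustration and not necessarily what BitMixer does; the MixSketch class and its method names are hypothetical.

```java
import java.util.BitSet;

/** Illustrative only: the MurmurHash3 32-bit finalizer used as a bit mixer. Not HPPC's BitMixer. */
final class MixSketch {
    private MixSketch() {}

    /** Scatters the bits of the input across the whole int range. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Slot for a key in a table whose capacity is a power of two. */
    static int slot(int key, int capacity) {
        return mix(key) & (capacity - 1);
    }

    /** Counts how many different slots the keys occupy, with or without mixing. */
    static int distinctSlots(int[] keys, int capacity, boolean useMixer) {
        BitSet seen = new BitSet(capacity);
        for (int key : keys) {
            seen.set(useMixer ? slot(key, capacity) : key & (capacity - 1));
        }
        return seen.cardinality();
    }
}
```

Keys that are all multiples of the table capacity collide in a single slot when only the low bits are used; running them through the mixer first spreads them over more slots.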

> In the test classes, the problem is just that these aren't included
> in the jar. There is some stuff in here that would be generally useful to
> people writing extensions though. Would it be possible to publish a separate
> test jar, the same way e.g. Lucene does?

Lucene's test utilities are actually dedicated utilities for writing
Lucene tests... those JARs don't include the tests themselves, though. I
don't think there's much test-specific code in HPPC... but I'd
have to take a look. Randomized testing is used, but it's a separate
project already.

> I for one would find that really useful! Especially if there was an
> hppc-test.jar as well.

Sure, I'll look into that.

Dawid

Andrew Clegg

Mar 15, 2015, 4:31:21 PM
to java-high-performance...@googlegroups.com
On Sunday, 15 March 2015 19:50:26 UTC, Dawid Weiss wrote:
 
>> In the test classes, the problem is just that these aren't included
>> in the jar. There is some stuff in here that would be generally useful to
>> people writing extensions though. Would it be possible to publish a separate
>> test jar, the same way e.g. Lucene does?
>
> Lucene's test utilities are actually dedicated utilities for writing
> Lucene tests... those JARs don't include the tests themselves, though. I
> don't think there's much test-specific code in HPPC... but I'd
> have to take a look. Randomized testing is used, but it's a separate
> project already.

I dug around through my attempts again, and apart from a couple of potentially useful things in TestUtils, the only thing that caused issues for me was AbstractKTypeTest.

If you're writing a class that obeys the same contract as an existing one, but with a different storage backend, it's useful to be able to make sure all the same tests pass, but replicating KTypeArrayListTest was annoying as I couldn't inherit from AbstractKTypeTest.
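
The pattern being described can be sketched roughly as follows: the contract checks live in an abstract base class, and each storage backend only supplies a factory method. This is plain Java with no test framework so that it stays self-contained, and every name in it (IntSequence, AbstractIntSequenceContract, ArrayBackedSequence) is hypothetical; it does not reproduce AbstractKTypeTest or KTypeArrayListTest.

```java
import java.util.Arrays;

/** Hypothetical contract shared by several list implementations (not an HPPC interface). */
interface IntSequence {
    void add(int value);
    int get(int index);
    int size();
}

/** Contract checks written once; a subclass only supplies the implementation under test. */
abstract class AbstractIntSequenceContract {
    /** Factory hook: each storage backend returns a fresh, empty instance. */
    protected abstract IntSequence newInstance();

    final void runAll() {
        newInstanceIsEmpty();
        addThenGetKeepsInsertionOrder();
    }

    private void newInstanceIsEmpty() {
        check(newInstance().size() == 0, "new instance should be empty");
    }

    private void addThenGetKeepsInsertionOrder() {
        IntSequence seq = newInstance();
        for (int i = 0; i < 100; i++) {
            seq.add(i * 3);
        }
        check(seq.size() == 100, "size should match the number of adds");
        for (int i = 0; i < 100; i++) {
            check(seq.get(i) == i * 3, "unexpected value at index " + i);
        }
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}

/** One backend: a single growable array. */
final class ArrayBackedSequence implements IntSequence {
    private int[] buffer = new int[4];
    private int size;

    @Override public void add(int value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);
        }
        buffer[size++] = value;
    }

    @Override public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return buffer[index];
    }

    @Override public int size() {
        return size;
    }
}

/** Running the shared contract against this backend takes one overridden method. */
final class ArrayBackedSequenceContract extends AbstractIntSequenceContract {
    @Override protected IntSequence newInstance() {
        return new ArrayBackedSequence();
    }
}
```

A second backend, such as a chained-array list, would get the same coverage from another three-line subclass, which is only possible when the abstract base class is visible outside its package and shipped in a jar.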

The randomized testing library is really cool, I've used that before in other projects.

>> I for one would find that really useful! Especially if there was an
>> hppc-test.jar as well.
>
> Sure, I'll look into that.

Nice one, thanks again!
 