MultiKey?

3,713 views
Skip to first unread message

Miki Tebeka

unread,
Dec 18, 2010, 10:37:23 AM12/18/10
to guava-...@googlegroups.com
Greetings,

Is there an equivalent of org.apache.commons.collections.keyvalue.MultiKey in Guava?

Thanks,
--
Miki

Chris Povirk

unread,
Dec 18, 2010, 11:47:28 AM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
Hi, Miki,

MultiKey has several possible uses, most of which we've tried to
support with more specialized classes.


1. map.put(new MultiKey(url, today), latencyHistogram);

Here we're storing the latency distribution for a given URL on a given
day. The keys are more or less independent. That's kind of a vague
statement, and I can't come up with a better one, so I'll give two
ideas that I hope will convey the flavor of what I mean by
"independent keys": First, if I know we have data for
http://www.google.com and I know we have data for Tuesday, we might
well have data for http://www.google.com on Tuesday. Second, I might
be interested in "all latency distributions for http://www.google.com,
regardless of day" or "all latency distributions today, regardless of
URL."

For this kind of data, we have the Table class:
Table<Keyword, LocalDate, LatencyHistogram>

Here it makes sense to ask for "all rows" (rowKeySet(); "all
keywords") or "all columns" (columnKeySet(); "all dates"). It also
makes sense to as for either "all values for keyword 'guava'"
(row(guava)) or "all values for today" (column(today)). Also,
Table<Keyword, LocalDate, LatencyHistogram> is much more explicit than
Map<MultiKey, LatencyHistogram>, especially if you end up with
multiple MultiKey-keyed maps.


2. map.put(new MultiKey(userName, jobName), jobDetails);

Here we still have a fixed number of keys, but there's a clearer
relationship between them. There might be two users running jobs with
the same name, but it's probably not very interesting to see all users
who happened to name one of their jobs "collector." There's a clear
hierarchy: user, then job.

Table can still be useful here. HashBasedTable and TreeBasedTable
aren't stored in memory as 2D arrays, so it's not wasteful for one row
to have values for columns A, B, and C and another key to have values
for columns for D, E, and F. Table again provides you with the
ability to ask for all users or all jobs for a given user, something
that you can't easily do with MultiKey or other solutions.

However, a lot of times it makes sense to simply define your own class:

public final class JobId {
private final String userName;
private final String jobName;

public JobId(String userName, String jobName) {
this.userName = checkNotNull(userName);
this.jobName = checkNotNull(jobName);
}

public String getUserName() {
return userName;
}

public String getJobName() {
return jobName;
}

@Override
public boolean equals(Object o) {
if (o instanceof JobId) {
JobId other = (JobId) o;
return userName.equals(other.userName)
&& jobName.equals(other.jobName);
}
return false;
}

@Override
public int hashCode() {
return userName.hashCode() * 37 + jobName.hashCode();
}
}

Java makes this much harder than it ought to be, but it took me maybe
2 minutes, even without an IDE to autogenerate much of it for me.

As in my first example, we benefit here from a clearer type name:
Map<JobId, JobDetails> vs. Map<MultiKey, JobDetails>. We also gain
the ability to use JobId elsewhere, like as a *value* to a Map.
(True, we could have a Map<Port, MultiKey> if we really wanted, but if
the <userName, jobName> identifier is coming up that often, it's
probably time to acknowledge it as a first-class concept and give it
its own type.) Nominal types like these can also help cut down on
methods with parameter types <String, String, String, String> (src
user, src job, dest user, dest job), replacing them with <JobId,
JobId> (src job, dest job).


3. map.put(pathComponents, fileMetadata);

This is somewhat contrived because I couldn't come up with a
non-contrived example, but it's conceivable that we'd store ["etc",
"passwd"] instead of "/etc/passwd" for some reason or other. Here we
have keys of arbitrary length, so we can't use the fixed-key-count
Table or custom-class solutions. If the key type is really an
arbitrarily sized series of items, we need another solution.

Luckily, we have a word for "arbitrarily sized series of items" in Java: List.
Map<ImmutableList<PathComponent>, FileMetadata>

List provides us with access to the components, should we require it,
including its usual methods like subList. Additionally, we can
specialize the element type (here, PathComponent, or more likely plain
"String") instead of MultiKey's "Object."


There are cases that these three approaches can't quite cover, like a
3D Table-like structure. But MultiKey is even less flexible. (Think
of it this way: If nothing else, you can always substitute
ImmutableList<?> for MultiKey.)

Chris Povirk

unread,
Dec 18, 2010, 12:04:54 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com

One more advantage I'd meant to include: If you ever need to add a new
dimension to a key (maybe it's OK for a user to run two jobs with the
same name as long as they're in different VMs), none of your Maps need
to change. You simply update the JobId class to include a vmId field,
and only at the edges of the system, where you initially create the
JobId objects, do you need to change your code. And the compiler can
help you, unlike in the MultiKey case.

Miki Tebeka

unread,
Dec 18, 2010, 12:47:10 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
Hello Chris,

Thanks for the detailed answer.



2. map.put(new MultiKey(userName, jobName), jobDetails);

...



However, a lot of times it makes sense to simply define your own class:

...

Java makes this much harder than it ought to be, but it took me maybe
2 minutes, even without an IDE to autogenerate much of it for me.

This  is the case I was aiming for. I'd like to avoid adding code just for a key class. It might be 2 minutes to implement, but  about 25 more lines of code that are just noise.  Also since I'm probably going to write several of these "key" classes I'd rather use one utility class for that.

All the best,
--
Miki

Miki Tebeka

unread,
Dec 18, 2010, 12:50:24 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
One more thing ... :)

3. map.put(pathComponents, fileMetadata);

Map<ImmutableList<PathComponent>, FileMetadata>

I can't use this one since the items the make the key are not from the same class (and I don't want to cast everything to Object).

Chris Povirk

unread,
Dec 18, 2010, 1:55:08 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
>> 2. map.put(new MultiKey(userName, jobName), jobDetails);
>>
>> ...

>>
>> However, a lot of times it makes sense to simply define your own class:
>>
>> ...

>>
>> Java makes this much harder than it ought to be, but it took me maybe
>> 2 minutes, even without an IDE to autogenerate much of it for me.
>
> This is the case I was aiming for. I'd like to avoid adding code just for a
> key class. It might be 2 minutes to implement, but about 25 more lines of
> code that are just noise. Also since I'm probably going to write several of
> these "key" classes I'd rather use one utility class for that.

This is the crux of the matter, and there's certainly no good option,
just various bad ones. The intermediate option, which I've carefully
avoided mention of, is of course Pair<A, B> (and perhaps Triple<A, B,
C>, etc.). These classes are... contentious :) We've also got some
internal , reflection-based value-type magic, which may some day see
the light of day.
http://groups.google.com/group/guava-discuss/browse_thread/thread/185e10c81bb482c2/bbbf4f0f1dcffaf9

You can probably tell that I'm not a big fan of MultiKey and Pair, but
I'll admit that my dislike for them probably goes beyond what they
objectively deserve. It's probably become symbolic to me of the
conflict between hacky, short-term solutions and overengineered,
long-term solutions. And I'm nothing if not an overengineer.

>> 3. map.put(pathComponents, fileMetadata);
>>
>> Map<ImmutableList<PathComponent>, FileMetadata>
>
> I can't use this one since the items the make the key are not from the same
> class (and I don't want to cast everything to Object).

For the closest possible equivalent to MultiKey, declare the key type
as ImmutableList<?> to avoid having to cast your arguments or
explicitly specify ImmutableList.<Object>of(...):

Map<ImmutableList<?>, FileMetadata> metadata = newHashMap();
metadata.put(ImmutableList.of("etc", "passwd"), defaultMetadata); // fine

Dimitris Andreou

unread,
Dec 18, 2010, 2:12:02 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
-or- you could start using scala (for case classes, not tuples!) :)

True story: I was pair programming (basically just watching), and the
need comes up to use as a map key the pair of two things, which
unfortunately were both strings too. So, Pair.of(thingy1, thingy2) of
course! I mentioned that this is, but let it go there were more
important things to focus on than the drudgery of writing the
boilerplate of a specific class. (But my "protest" was probably
communicated indirectly, since I was asking all the time what's the
first string and what's the second!). There were few mentions of this
Pair<String, String>. I left, and the next day I reviewed the code.
Now, there were 22 appearances of Pair<String, String> in a class of
~200 lines body! So I said that now we really ought to create a new
type here, and lo and behold, the type was created....and it was a
subclass of Pair<String, String> :)

> --
> guava-...@googlegroups.com.
> http://groups.google.com/group/guava-discuss?hl=en
> unsubscribe: guava-discus...@googlegroups.com
>
> This list is for discussion; for help, post to Stack Overflow instead:
> http://stackoverflow.com/questions/ask
> Use the tag "guava".
>

Nikolas Everett

unread,
Dec 18, 2010, 4:21:45 PM12/18/10
to guava-...@googlegroups.com

Vg

We don this kind of thing all the time and Project Lombok makes those classes 5 lines instead 25.

Nik

Tim Peierls

unread,
Dec 18, 2010, 5:01:08 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
On Sat, Dec 18, 2010 at 12:47 PM, Miki Tebeka <miki....@gmail.com> wrote:
I'd like to avoid adding code just for a key class. It might be 2 minutes to implement, but about 25 more lines of code that are just noise.

I've seen far too many examples of programs where it seems as though the only quality metric applied is 1/LoC, as if the time spent typing or (yuck) auto-generating is the only important part of a program's existence.

People are (or should be) more interested in readability, which is not something easily described by LoC. Those extra 25 lines of code might be tedious to type, but it's not hard to scroll past them when reading (I use a plain text editor, but most IDEs will let you collapse those lines neatly), and they can guide the reader in understanding what you're doing. Of course *you* know what you mean, but putting a name on something, even if you only use it in one place, helps convey the idea clearly. (Being forced to choose a name for something is a valuable exercise in itself.)

It's the same thing that goes on when you give names to intermediate values in a complicated expression instead of putting the whole thing on one line. It might take up more vertical space, but the reader can better follow the logic of the expression when it's broken up into named pieces. Sometimes it's nice to be reminded what the type of some sub-expression is.

I'm not against concision. I look forward to being able to use lambdas in Java in the future. But I predict that there will be many egregious uses of lambdas where readability will suffer, all because the programmer wanted to save a few keystrokes and didn't care about the reader.

--tim

lacroix1547

unread,
Dec 18, 2010, 6:13:18 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
I am a big fan of Pair<F, S>
The worst use case of Pair<F, S> is precisely when both F and S are of the same type.
Its pretty creative to subclass Pair in that case.
But you cant always can come up with better names then first-second, first-last, previous-new.

Still, when you really just want to pair two different types of object,
Pair<String, SomePojo> will be much more explicit than StringSomePojo.java about what is going on.
For the reader, the Pair makes it clear that this is a pair, just a pair, with no hidden wisdom.
The reader will eventually get there too with StringSomePojo.java, but just after scanning those extra 25 lines once or twice, in fear of hidden wisdom, as usual.

Tim Peierls

unread,
Dec 18, 2010, 6:40:06 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com
On Sat, Dec 18, 2010 at 6:13 PM, lacroix1547 <simon....@gmail.com> wrote:
I am a big fan of Pair<F, S>
The worst use case of Pair<F, S> is precisely when both F and S are of the same type.
Its pretty creative to subclass Pair in that case.
But you cant always can come up with better names then first-second, first-last, previous-new.

I think Chris Povirk's original response demonstrated that there are often better ways (in Guava) in those cases where you can't come up with a better name.

 
Still, when you really just want to pair two different types of object,
Pair<String, SomePojo> will be much more explicit than StringSomePojo.java about what is going on.
For the reader, the Pair makes it clear that this is a pair, just a pair, with no hidden wisdom.

Except for the fact that its only use is as a key. I would rather see StringSomePojoKey than Pair<String, SomePojo> any day.

 
The reader will eventually get there too with StringSomePojo.java, but just after scanning those extra 25 lines once or twice, in fear of hidden wisdom, as usual.

Do people really read code that way? I don't. I scan everything, but I get most of my understanding from names. I find Java particularly congenial for scanning. I trust that if there's "wisdom" in a class I'm scanning, the author will have made it stand out for me.

--tim

Miki Tebeka

unread,
Dec 18, 2010, 8:38:57 PM12/18/10
to guava-...@googlegroups.com, miki....@gmail.com

People are (or should be) more interested in readability, which is not something easily described by LoC.

I agree, but IMO
    MultiKey key = new MultiKey(obj.getProperty1(), obj.getProperty2());
    Value value = myHashMap.get(key)

is pretty readable and also uses less code.

Tim Peierls

unread,
Dec 18, 2010, 9:14:54 PM12/18/10
to guava-...@googlegroups.com

Less code than what? More readable than what? How about

  Value value = myHashMap.get(
      keyFromObj(obj));

Looks nicer to me.

--tim

Miki Tebeka

unread,
Dec 18, 2010, 9:34:25 PM12/18/10
to guava-...@googlegroups.com

Less code than what? More readable than what? How about

  Value value = myHashMap.get(
      keyFromObj(obj));

Looks nicer to me.

It's the same readability a the a MultiKey code posted but does not have keyFromObj class definition (this is what I mean by "less code").

Tim Peierls

unread,
Dec 19, 2010, 12:32:34 AM12/19/10
to guava-...@googlegroups.com
Really? You find this

  MultiKey key = new MultiKey(obj.getProperty1(), obj.getProperty2());
  Value value = myHashMap.get(key)

just as easy to understand as this

  Value value = myHashMap.get(keyFromObj(obj));

?

The keyFromObj method definition can be in one place, but its use can be all over. The "new MultiKey(obj.getProperty1(), obj.getProperty2())" incantation has to made everywhere you need a key for the map. That might be only a couple of places now, but it could easily change.

Furthermore, if you use MultiKey, you miss an opportunity to document your intent in the type of the map:

  private final Map<BrandAspectRatioKey, Integer> tvInventory;

tells me more than 

  private final Map<MultiKey<String, Double>, Integer> tvInventory;

This is getting way OT -- entirely my fault -- so let me try to drag it back to relevance: The Guava team has done an amazing job of providing a lot of useful tools without letting Guava turn into a kitchen sink. The standard for inclusion has to be set very high. Chris Povirk's response made an excellent case for why MultiKey doesn't currently meet that standard.

--tim

Kevin Bourrillion

unread,
Dec 19, 2010, 12:57:27 PM12/19/10
to Tim Peierls, guava-...@googlegroups.com, miki....@gmail.com
I agree with you completely, Tim -- but the magnitude of those 25 lines demands a better solution be invented.  Pair certainly ain't it.  Lombok has its ups and downs (handwave).  Our internal reflection-based utility is workable, but if you start to use it a lot the performance tax is very real.  We still need something better....




--
guava-...@googlegroups.com.
http://groups.google.com/group/guava-discuss?hl=en
unsubscribe: guava-discus...@googlegroups.com
 
This list is for discussion; for help, post to Stack Overflow instead:
http://stackoverflow.com/questions/ask
Use the tag "guava".



--
Kevin Bourrillion @ Google
http://guava-libraries.googlecode.com

Tim Peierls

unread,
Dec 19, 2010, 4:21:17 PM12/19/10
to Kevin Bourrillion, guava-...@googlegroups.com, miki....@gmail.com
On Sun, Dec 19, 2010 at 12:57 PM, Kevin Bourrillion <kev...@google.com> wrote:
I agree with you completely, Tim -- but the magnitude of those 25 lines demands a better solution be invented.  Pair certainly ain't it.  Lombok has its ups and downs (handwave).  Our internal reflection-based utility is workable, but if you start to use it a lot the performance tax is very real.  We still need something better....

I don't see the urgency, but I trust that you'll do it right. Short of code generation, though, I don't have any ideas for a truly general solution that wouldn't be likewise taxed; I would rather chip away at the use cases with more targeted data structures like Table.

--tim

Nikolas Everett

unread,
Dec 19, 2010, 7:49:52 PM12/19/10
to Kevin Bourrillion, Tim Peierls, guava-...@googlegroups.com, miki....@gmail.com
On Sun, Dec 19, 2010 at 12:57 PM, Kevin Bourrillion <kev...@google.com> wrote:
Lombok has its ups and downs (handwave).

I know it is off topic but I'd certainly like to hear your impressions of Lombok and those of the other Guava users/developers.  My peers and I love it but it certainly has failings.  I just can't get over the novelty of not having generated getters, setters, hashcodes, and equals.

--Nik

Kevin Bourrillion

unread,
Dec 20, 2010, 11:44:43 AM12/20/10
to Nikolas Everett, Tim Peierls, guava-...@googlegroups.com, miki....@gmail.com
I looked back at our internal discussions and unfortunately can't summarize them too easily.  There were just a hodgepodge of negative impressions and positive impressions from a dozen people -- just enough that I could credibly say "it has its ups and downs" without understanding what they all are. :-)  Sorry.

Reply all
Reply to author
Forward
0 new messages