Is there a way to do this?
Ie. if I had a name for the objects I'm adding as well as a social
security number, I'd like to be able to return the object by either
name or number depending on need. Is there an object that does this,
or do I have to write it myself?
Thanks!
You can create your own SuperHashMap with .put(k1, k2, val) and
.getByKey1(k1) and .getByKey2(k2) methods.
But usually you would just put the same objects in two different
HashMaps.
(Since a HashMap stores only a reference to each object, not a copy,
that works fine.)
Arne
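For what it's worth, here is a minimal sketch of the kind of wrapper described above. The class name TwoKeyMap, the nested Entry holder, and the remove methods are my own assumptions, not an existing library class; it simply keeps two HashMaps in step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-key map; the class and method names are illustrative only.
class TwoKeyMap<K1, K2, V> {
    // Each entry remembers both keys so a removal by one key can clean up the other index.
    private static final class Entry<A, B, T> {
        final A key1;
        final B key2;
        final T value;

        Entry(A key1, B key2, T value) {
            this.key1 = key1;
            this.key2 = key2;
            this.value = value;
        }
    }

    private final Map<K1, Entry<K1, K2, V>> byKey1 = new HashMap<>();
    private final Map<K2, Entry<K1, K2, V>> byKey2 = new HashMap<>();

    void put(K1 k1, K2 k2, V value) {
        // Evict any entry already using either key so the two indexes never disagree.
        removeByKey1(k1);
        removeByKey2(k2);
        Entry<K1, K2, V> entry = new Entry<>(k1, k2, value);
        byKey1.put(k1, entry);
        byKey2.put(k2, entry);
    }

    V getByKey1(K1 k1) {
        Entry<K1, K2, V> entry = byKey1.get(k1);
        return entry == null ? null : entry.value;
    }

    V getByKey2(K2 k2) {
        Entry<K1, K2, V> entry = byKey2.get(k2);
        return entry == null ? null : entry.value;
    }

    V removeByKey1(K1 k1) {
        Entry<K1, K2, V> entry = byKey1.remove(k1);
        if (entry == null) return null;
        byKey2.remove(entry.key2);
        return entry.value;
    }

    V removeByKey2(K2 k2) {
        Entry<K1, K2, V> entry = byKey2.remove(k2);
        if (entry == null) return null;
        byKey1.remove(entry.key1);
        return entry.value;
    }

    int size() {
        return byKey1.size();
    }
}
```

Removing through either key clears both indexes, which is the bookkeeping you would otherwise have to remember at every call site.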
You can put the same object in more than once, with different keys.
This is a little less safe in terms of programmer usage, but works fine.
class Person {
    int ssn;
    String name = "";
}

HashMap<Object,Person> map = new HashMap<Object,Person>();
Person person = new Person();
map.put( person.ssn, person );
map.put( person.name, person );
Now "person" is in the map twice, once under SSN and once by their name.
As Arne said, this is done by reference so there's no wasted space or
extra copies or anything bad like that.
If you have two maps, then you would need to delete it from both.
Arne
>I would like to use an object that behaves like a hashmap (ie. put,
>get, with a key), but I'd like to be able to index items in that
>object by more than one key.
You could do this with two HashMaps, or with one HashMap where you do
a put with two different keys and the same value object. HashMap is
really concerned only with the keys. It does not care if two keys
point to the same object.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Almost certainly yes, for correct operation.
markspace wrote:
> Almost certainly yes, for correct operation.
You don't put objects into maps, you put references into maps.
If you have two different keys, say a name string (reference) and a
social-security-number string (reference), that retrieve the same value
(reference) from the same map, both keys need to go away for the value to be
removed from the map.
--
Lew
A Social Security Number is a string, not an int. Were you to foolishly
represent it as a numeric type, an int wouldn't hold the range of values.
> String name = "";
> }
>
> HashMap<Object,Person> map = new HashMap<Object,Person>();
Say, rather,
Map <String, Person> map = new HashMap <String, Person> ();
> Person person = new Person();
> map.put( person.ssn, person );
> map.put( person.name, person );
>
> Now "person" is in the map twice, once under SSN and once by their name.
> As Arne said, this is done by reference so there's no wasted space or
> extra copies or anything bad like that.
There could be wasted space. Using your example:
map.put( person.name, person );
map.put( new String( person.name ), person );
will create two instances of name strings with the same value, neither of
which can be GCed while the person lives and is in the map.
I also fear it will create two entries in the map. While 'equals()' is
polymorphic, it also has to handle comparing String to Integer.
--
Lew
> [...]
>> Now "person" is in the map twice, once under SSN and once by their
>> name. As Arne said, this is done by reference so there's no wasted
>> space or extra copies or anything bad like that.
>
> There could be wasted space. Using your example:
>
> map.put( person.name, person );
> map.put( new String( person.name ), person );
>
> will create two instances of name strings with the same value, neither
> of which can be GCed while the person lives and is in the map.
>
> I also fear it will create two entries in the map.
Surely not. That would be a bug in the HashMap class, as the two strings
are identical in content and so will compare as equal. Keys must be
unique, so the second call to put() will simply replace the entry created
by the first call.
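This is easy to check. A small sketch (the class and method names are mine, and a plain Object stands in for the Person):

```java
import java.util.HashMap;
import java.util.Map;

class EqualKeysDemo {
    // Inserts one value under two distinct String instances with equal content
    // and reports how many entries the map ends up with.
    static int entriesAfterDuplicatePut() {
        Map<String, Object> map = new HashMap<>();
        Object person = new Object();          // stand-in for a Person
        String name = "Mortimer Snerd";
        String copy = new String(name);        // different instance, equal content
        map.put(name, person);
        map.put(copy, person);                 // same key by equals(): no new entry
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(entriesAfterDuplicatePut()); // prints 1
    }
}
```

One detail worth hedging: as far as I know, the OpenJDK HashMap keeps the key object from the first put and only swaps the value, so it is the second String instance that the map never holds on to.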
It's true that, with the first string coming from the Person instance,
having that entry no longer in the HashMap won't allow the string instance
to be GC'ed (it's still referenced by the Person instance). But if you
reverse the order, so that the newly constructed string is added first,
then when the second call to put() happens, that first newly constructed
string will be collectable (assuming no other code, where it's retained
elsewhere, of course).
In other words, it's no worse than creating a new copy of a String
instance would be normally. Any "waste" is inherent in the copying
process and general use, not its use specifically in a HashMap.
> While 'equals()' is polymorphic, it also has to handle comparing String
> to Integer.
I don't really know how that's supposed to be relevant here. Maybe you
can elaborate, if you still feel it's important.
Pete
Depends on the country.
But US SSNs have only 9 digits, correct? (And that fits in an int.)
>> String name = "";
>> }
>>
>> HashMap<Object,Person> map = new HashMap<Object,Person>();
>
> Say, rather,
>
> Map <String, Person> map = new HashMap <String, Person> ();
>
>> Person person = new Person();
>> map.put( person.ssn, person );
>> map.put( person.name, person );
>>
>> Now "person" is in the map twice, once under SSN and once by their
>> name. As Arne said, this is done by reference so there's no wasted
>> space or extra copies or anything bad like that.
>
> There could be wasted space. Using your example:
>
> map.put( person.name, person );
> map.put( new String( person.name ), person );
>
> will create two instances of name strings with the same value, neither
> of which can be GCed while the person lives and is in the map.
True.
But I can not see any reason for doing that.
Arne
> I also fear it will create two entries in the map.
Not if it is java.lang.String.
Arne
>
> A Social Security Number is a string, not an int. Were you to foolishly
> represent it as a numeric type, an int wouldn't hold the range of values.
>
I did a quick check before I posted that, maybe I miscounted:
2^31 = 2147483648
0xxxyyzzzz = ssn xxx-yy-zzzz
> 2^31 = 2147483648
> 0xxxyyzzzz = ssn xxx-yy-zzzz
This example looks right while I type this reply but seemed to have
extra spaces in the second line when being viewed by Thunderbird. Line
up the 2147483648 and the 0xxxyyzzzz to see what I'm trying to display
there.
2147483648 = 2^31
0xxxyyzzzz = ssn xxx-yy-zzzz
There, that should line up regardless.
The text-to-html converter swallowed the ^ when making the letters
superscript: there's your missing space.
--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
I hate it when programs try to "improve" my ASCII. :)
markspace wrote:
> I did a quick check before I posted that, maybe I miscounted:
>
> 2^31 = 2147483648
> 0xxxyyzzzz = ssn xxx-yy-zzzz
Silly me.
I must have been counting the dashes.
Everyone who corrected me on that point is correct, of course.
An SSN is still not numeric, though.
--
Lew
Lew wrote:
>> There could be wasted space. Using your example:
>>
>> map.put( person.name, person );
>> map.put( new String( person.name ), person );
>>
>> will create two instances of name strings with the same value,
>> neither of which can be GCed while the person lives and is in the map.
Peter Duniho wrote:
> It's true that, with the first string coming from the Person instance,
> having that entry no longer in the HashMap won't allow the string
> instance to be GC'ed (it's still referenced by the Person instance).
> But if you reverse the order, so that the newly constructed string is
> added first, then when the second call to put() happens, that first
> newly constructed string will be collectable (assuming no other code,
> where it's retained elsewhere, of course).
Sure, but my point was a response to the notion that the Map as construed in
the example unequivocally would have "no wasted space or extra copies or
anything bad like that". All it took was one example of how that isn't true.
--
Lew
> Sure, but my point was a response to the notion that the Map as
> construed in the example unequivocally would have "no wasted space or
> extra copies or anything bad like that". All it took was one example of
> how that isn't true.
Well, even the example you posted won't create two entries in the
HashMap. As far as whether the copied string is "wasted space", it's only
wasted space if copying the string is wasteful in the first place.
Granted, it usually is. But if that's the case, then the waste comes from
creating the copy, not from the use of the HashMap itself. I.e. the
"example" is entirely contrived and has nothing to do with what
"markspace" was writing about.
Pete
>
> An SSN is still not numeric, though.
>
Well.... an SSN can be pretty trivially encoded in an Integer. ;)
AHS
Arved Sandstrom wrote:
> It's numeric, consisting of 3 numbers. Even the SSA says so.
Where does it say so?
> It's not a string. It's just that it makes little or no sense to contemplate
> arithmetic operations involving SSNs, except perhaps for incrementing
> the serial number part.
If you simply increment the serial number part, you could break the checksum,
so that operation isn't actually valid.
--
Lew
As an alternative you could use a database. Set up a table with Name,
SSN, etc. and build indexes for the Name and SSN fields for fast
retrieval.
rossum
Without being facetious, all over their website, including at
http://www.socialsecurity.gov/history/ssn/geocard.html.
The group number and serial number definitely use numerical concepts
(like odd and even for group number).
>> It's not a string. It's just that it makes little or no sense to
>> contemplate arithmetic operations involving SSNs, except perhaps for
>> incrementing the serial number part.
>
> If you simply increment the serial number part, you could break the
> checksum, so that operation isn't actually valid.
What is the checksum procedure for a SSN? I wasn't aware there was any.
The SSA does not describe any kind of checksum procedure for figuring
out whether an SSN is valid. There is a procedure of course for doing so.
AHS
For the Name key, the OP could use a multimap implementation easily
enough. In fact, one approach would be to use an interface that allows
for either a Java multimap implementation or the use of a database
(again, for the Name key). Lookups by SSN would return single values;
lookups by name ought to (just my 2 ISK worth) return multiple
results...either implementation would do that.
AHS
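A rough sketch of the in-memory multimap variant, assuming Java 8 or later for computeIfAbsent. MultiIndex is a made-up name, not a JDK or library class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical non-unique index: one key (e.g. a name) maps to many values.
class MultiIndex<K, V> {
    private final Map<K, List<V>> index = new HashMap<>();

    void add(K key, V value) {
        // Create the bucket on first use, then append to it.
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Always returns a list (possibly empty), never null.
    List<V> lookup(K key) {
        List<V> found = index.get(key);
        return found == null ? Collections.emptyList() : Collections.unmodifiableList(found);
    }
}
```

A lookup by name then hands back every match, while the SSN index can stay a plain Map with single values.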
There are two main strategies I know of, and you can blend
them in various degrees:
1) Use multiple Maps, one for each key. Your example would
have a Map<Name,Thing> and a Map<SSN,Thing>. Each time you add
a Thing to one Map you also add it to the other; same business
on deletions. It may be convenient to gather all these Maps
into one big MegaMap, so you can more easily produce things
like Iterators that support remove(). See "inverted file."
(I doubt that a MegaMap could easily implement Map, since the
keySet() method would be problematic. But maybe your problem
suggests some domain-specific approach to that difficulty.)
2) Use a single Map<Object,Thing>, entering each Thing
into the Map once for each of its keys. If the keys are easily
distinguished (e.g., "Mortimer Snerd" is easily recognizable as
a Name and 011-22-4343 as an SSN), you might just use the keys
as they stand. If not, it might be best to make yourself a
little TaggedKey class holding the key's value along with an
enum designating its type:
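class TaggedKey {
    enum KeyType { NAME, SSN }
    private final KeyType type;
    private Object value;
    // constructor omitted; equals() and hashCode() must also be
    // overridden for this to work as a Map key
}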
... and use a Map<TaggedKey,Thing>. See "library card catalog."
Note that an Iterator over the values() of such a Map will
encounter each Thing as many times as it has keys, just as
"The Sot-Weed Factor" appears many times in the library's catalog.
It will often turn out that not all the "keys" satisfy the
properties one wants in a key, properties like uniqueness. In
your example, Name is unlikely to be a unique key and even SSN
has problems. If so, you may want to synthesize a "primary key"
of some kind (this is why you likely have an "employee number")
and use one Map<PrimaryKey,Thing> for it, and use additional
Map<SecondaryKey,Collection<Thing>> maps for the others. (As
a refugee from the Bad Old Small-Memory days, I might instead
use a Map<SecondaryKey,Object> and store a mixture of Things
and Collection<Thing>s, using `instanceof Thing' at run-time
to figure out what I'd retrieved, and earning the scorn of
every modern-era right-thinking Java zealot.[*]) See also
TAOCP Volume III, "Retrieval on Secondary Keys."
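Going back to strategy 2 for a moment: the TaggedKey idea only works as a Map key if the class defines equals() and hashCode(). A fuller sketch, where the constructor and the two overrides are my additions to the outline above:

```java
import java.util.Objects;

// Fleshed-out version of the TaggedKey outline; constructor and overrides are assumptions.
final class TaggedKey {
    enum KeyType { NAME, SSN }

    private final KeyType type;
    private final Object value;

    TaggedKey(KeyType type, Object value) {
        this.type = Objects.requireNonNull(type);
        this.value = Objects.requireNonNull(value);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TaggedKey)) return false;
        TaggedKey other = (TaggedKey) o;
        // Two keys match only if both the tag and the underlying value match.
        return type == other.type && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, value);
    }
}
```

With that in place, a NAME key and an SSN key that happen to carry the same text stay distinct entries in a Map<TaggedKey,Thing>.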
[*] No offense, folks. But I recall a respected senior
programmer/architect at a former job whose response to "Why
worry? Memory's cheap" was to extend his hand, palm upwards,
and wait for the speaker to give him some. Oddly, nobody
ever did ...
--
Eric Sosman
eso...@ieee-dot-org.invalid
You will need multiple Maps for that. Even though you could stuff
everything in a single Map I'd rather not do this because then you
cannot manipulate indexes individually any more. I would probably write
my own class which does all the Map handling and ensuring consistency
internally.
If you need this in multiple places with different data types and index
values it may pay off to look for a generic solution to the problem or
write it yourself (probably using reflection).
If the properties of objects which you use as keys are allowed to
change, things get more complicated. You probably then would have to
use some form of observer pattern to notify your multi index class of
the change. At the very least you would need a reindex operation which
can be invoked manually.
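To illustrate the manual reindex idea, here is a small sketch. ReindexableIndex and its key-extractor function are hypothetical names; an observer-based version would call the same rebuild logic from a change notification instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical unique index whose keys are derived from a mutable property.
class ReindexableIndex<K, V> {
    private final Function<V, K> keyOf;
    private final List<V> items = new ArrayList<>();
    private final Map<K, V> byKey = new HashMap<>();

    ReindexableIndex(Function<V, K> keyOf) {
        this.keyOf = keyOf;
    }

    void add(V item) {
        items.add(item);
        byKey.put(keyOf.apply(item), item);
    }

    V get(K key) {
        return byKey.get(key);
    }

    // Rebuilds the lookup table from the items' current key values.
    void reindex() {
        byKey.clear();
        for (V item : items) {
            byKey.put(keyOf.apply(item), item);
        }
    }
}
```

Until reindex() is called, an item whose key property changed is still found under its old key and not under the new one.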
The fun starts, if you mix both (generic solution and mutable key
fields). :-)
Another interesting question is whether your indexes are all unique or
non unique and what happens if you have a unique index and get two
objects with equivalent key values...
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
They use "number" in the loose English sense of "numeric string"; they make no
claim there that the SSN is actually numeric, a term I selected advisedly.
Their use of "odd" and "even" is cognate to describing letters as "capital" or
"lower case". There is a numeric algorithm involved in creating an SSN, but
once assigned it acts as a label, and not as a number.
I could have been more accurate and said that the SSN is non-numeric as
commonly used in situations such as described in this thread.
> The group number and serial number definitely use numerical concepts
> (like odd and even for group number).
...
>>> It's not a string. It's just that it makes little or no sense to
>>> contemplate arithmetic operations involving SSNs, except perhaps for
>>> incrementing the serial number part.
The only numeric operation that makes sense for an SSN is confirmation of a
check digit, an operation also commonly performed in software on character
strings with letters in them.
>> If you simply increment the serial number part, you could break the
>> checksum, so that operation isn't actually valid.
>
> What is the checksum procedure for a SSN? I wasn't aware there was any.
> The SSA does not describe any kind of checksum procedure for figuring
> out whether an SSN is valid. There is a procedure of course for doing so.
I don't know the procedure in detail, and I don't think the SSA particularly
wants to publicize it lest identity hackers mess with it.
Note that their site, in particular the page you linked, describes three
groups of characters (that they call digits) that are independent of each
other. Note also that leading zeros are significant - you cannot have an SSN
of "12345678" or "012-4-5678". Also, group numbers are assigned in an
arbitrary (but fixed) lexical order, not a numeric order.
As you point out, it makes no sense to do any calculations (other than
checksum) on an SSN. What is the square root of an SSN?
Bear in mind that the English word "number" is used for numeric strings as
well as quantitative entities. My point is that SSNs are the former, not the
latter, not that the word "number" does not apply.
I do agree that it is possible to model an SSN as an int or as a String.
For the OP's purpose in particular, it makes more sense to model an SSN as a
String than an int.
--
Lew
Given that:
- arithmetic operations do not make sense
- leading zeros are significant
then I would suggest String over int.
Arne
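The leading-zero point is easy to demonstrate. A quick sketch, with class and method names made up for the example:

```java
import java.util.Locale;

class SsnLeadingZeroDemo {
    // Parsing the nine digits as an int silently discards any leading zeros.
    static int asInt(String nineDigits) {
        return Integer.parseInt(nineDigits);
    }

    // Getting the original text back requires knowing the fixed width of nine.
    static String backToText(int value) {
        return String.format(Locale.ROOT, "%09d", value);
    }

    public static void main(String[] args) {
        int n = asInt("012345678");
        System.out.println(n);                                  // prints 12345678
        System.out.println(backToText(n));                      // prints 012345678
        System.out.println(999_999_999 <= Integer.MAX_VALUE);   // prints true
    }
}
```

So an int does hold any nine-digit value, but every place that displays or compares the number has to remember to pad it back out.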
> As an alternative you could use a database. Set up a table with Name,
> SSN, etc. and build indexes for the Name and SSN fields for fast
> retrieval.
In most cases apps have both a persistent and an in memory
representation.
Even with the correct indexes, a database does not have exactly the
same performance characteristics as a HashMap.
Arne
I'm still not sold on the idea that there is a checksum for SSNs. I
myself don't see how there can be one. You yourself say that the three
numbers making up an SSN are independent of each other.
AHS
I would model a SSN as an SSN - meaning: since a SSN is not exactly a
number and has more properties than a simple length (for example, a
valid format) I would create a class for SSN handling. It's likely that
it will include a constructor with a String (or even CharSequence)
argument but the internal representation does not really matter. And it
can be changed, too if you notice that all of a sudden the long you used
initially is not sufficient to hold all the relevant information. My
0.02 EUR anyway.
How I would approach SSN processing (or SIN or credit card number
processing, for that matter) would depend on where I am getting the
number from, and what I need to do with it. For example, a number of the
government applications I am helping to maintain deal with SINs, but we
get them electronically from trusted partners, and so there is no need
to deal with them other than as opaque strings.
If OTOH the SSN/SIN was being typed in by a clerk I'd consider
validation of some sort. It won't be perfect but it'll catch some typos.
I would contemplate your design if my application was tasked with
generating and/or validating an SSN/SIN, but not otherwise.
AHS
I have been told by programmers at SSA that there is a checksum, but not the
details of the algorithm. However, Wikipedia states that there is no check
digit in an SSN, so it seems I was wrong.
<http://en.wikipedia.org/wiki/Social_security_number>
<http://en.wikipedia.org/wiki/Social_security_number#Absence_of_a_check_digit>
I was probably confusing the checksum idea with
<http://www.socialsecurity.gov/employer/ssnvhighgroup.htm>
--
Lew
Even with validation String would probably be a better format than
int or int's.
Like matching the string against \d{3}-\d{2}-\d{4}.
Arne
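A sketch of that check, assuming the dashed NNN-NN-NNNN form is the only accepted input. SsnFormat is a hypothetical helper name:

```java
import java.util.regex.Pattern;

class SsnFormat {
    // Three digits, dash, two digits, dash, four digits; nothing else.
    private static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    static boolean looksLikeSsn(String text) {
        // matches() requires the whole string to fit the pattern.
        return text != null && SSN.matcher(text).matches();
    }
}
```

This only checks the shape of the text. It says nothing about whether the number was ever issued.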
With more than 300 million US citizens, there would not be much
room for a checksum in a 9-digit number.
French SSNs are quite a bit longer (although the population of France is
lower than the US population, at about 65 million) and have a two-digit
checksum (with an algorithm involving a numerical reduction modulo 97)
supposedly designed to catch common typing errors.
--Thomas Pornin
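For the curious, a sketch of how such a modulo-97 key can work. I am assuming the commonly described rule, key = 97 - (13-digit number mod 97), and ignoring any special cases (for example, department codes containing letters). Treat the formula and the names as assumptions, not as the official specification.

```java
class FrenchNirKey {
    // Assumed formula: key = 97 - (13-digit number mod 97), giving a value from 1 to 97.
    static int controlKey(long thirteenDigitNumber) {
        return (int) (97 - (thirteenDigitNumber % 97));
    }

    // True when the supplied key matches the one computed from the number.
    static boolean isConsistent(long thirteenDigitNumber, int key) {
        return key == controlKey(thirteenDigitNumber);
    }
}
```

Under that rule, the number plus its key is always a multiple of 97, so changing a single digit changes the expected key.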
I would not want to go with plain String in an OO language if there must
be validation. IMHO that approach is only justifiable in the light of
performance issues. The default for me would be a separate class which
ensures all the invariants that are needed for an SSN in the application.
Only if that class is the bottleneck of the application would I consider
changing this. First line of defense might be to make the validation
optional if that is actually the bottleneck and I have sources that I
can trust. Then I'd consider switching to String. It's easier to
modify a complete program to exchange class SSN for class String than the
reverse operation.
Given that the HashMap contains the population of the U.S. and the use case
needs one search from that population, a HashMap that needs to be loaded before
the search does not have the performance characteristics of an SQL SELECT with
the key.
True.
But having the entire US population and only needing to look up
one person is a rather unlikely scenario.
Arne
I disagree.
It comes in as a string, it is persisted as a string, and I cannot think of
any operation to be done on it that will not work fine on a string.
It is not particularly OO to convert data to unnatural forms.
Arne
When I stated earlier that I would contemplate Robert's design for an
SSN class if validation was demanded, I would still store the actual SSN
as a String, with no structure. I am not sure that Robert was suggesting
otherwise himself.
AHS
It's not that there are operations that would not work on a String. My
point is that there are _too many_ operations that work on a String that
are either superfluous for a SSN or actually do harm (for mutable
strings). From what I have read a SSN is significantly important and
special enough that a dedicated class is warranted.
>> It is not particular OO to convert data to unnatural forms.
Whatever "unnatural" means in this context. I find it most natural when
working in an OO language to model important entities of the real world
as separate classes and not try to cover everything with basic types.
After all, that's what OO is about.
> When I stated earlier that I would contemplate Robert's design for an
> SSN class if validation was demanded, I would still store the actual SSN
> as a String, with no structure. I am not sure that Robert was suggesting
> otherwise himself.
As you rightly assumed, I wasn't. If Arne's argument were followed,
there would be no point in having classes other than String in an
application at all because all the data comes in as strings and goes out
as strings. (Of course I am exaggerating here - but just a bit.) ;-)
Map<String, Person> byName;
Map<SocialSecurityNumber, Person> bySsn;
Map<Color, List<Person>> byFavoriteColor;
--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
It is a type in and of itself.
Come on, people, we're OO programmers: use classes over primitives!
Great examples.
For that last I proffer
Map <Color, Set <Person>> byFavoriteColor;
to prevent duplicate entries in the target collection.
A word to the wise - this technique increases the programmer's responsibility
to clean up references. It's all too easy to leave a 'Person' reference
buried in 'byFavoriteColor' that you removed from 'byName' and 'bySsn'.
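One way to keep that cleanup in one place is to hide the maps behind a small class. A sketch under my own assumptions: PersonDirectory and its nested Person are invented for the example, the favorite color is a plain String rather than a Color type, and Java 8 or later is assumed for computeIfAbsent and the two-argument remove.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper that owns all three indexes so they are updated together.
class PersonDirectory {
    static final class Person {
        final String name;
        final String ssn;
        final String favoriteColor;

        Person(String name, String ssn, String favoriteColor) {
            this.name = name;
            this.ssn = ssn;
            this.favoriteColor = favoriteColor;
        }
    }

    private final Map<String, Person> byName = new HashMap<>();
    private final Map<String, Person> bySsn = new HashMap<>();
    private final Map<String, Set<Person>> byFavoriteColor = new HashMap<>();

    void add(Person p) {
        byName.put(p.name, p);
        bySsn.put(p.ssn, p);
        byFavoriteColor.computeIfAbsent(p.favoriteColor, c -> new HashSet<>()).add(p);
    }

    void remove(Person p) {
        // Two-argument remove only deletes the entry if it still maps to this person.
        byName.remove(p.name, p);
        bySsn.remove(p.ssn, p);
        Set<Person> sameColor = byFavoriteColor.get(p.favoriteColor);
        if (sameColor != null) {
            sameColor.remove(p);
            if (sameColor.isEmpty()) {
                byFavoriteColor.remove(p.favoriteColor); // do not keep empty buckets around
            }
        }
    }

    Person findByName(String name) { return byName.get(name); }

    Person findBySsn(String ssn) { return bySsn.get(ssn); }

    Set<Person> findByFavoriteColor(String color) {
        Set<Person> found = byFavoriteColor.get(color);
        return found == null ? Collections.emptySet() : Collections.unmodifiableSet(found);
    }
}
```

Callers never touch the individual maps, so there is only one remove() to get right.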
With JPA, @Entity and judicious use of @OneToMany and siblings to inject
member collections, you mitigate the risk of reference retention because you
don't maintain multiple long-lived independent structures.
EntityManager and its related mechanisms provide useful resource and entity
object management. The programmer can't abdicate responsibility, but the JPA
provides methods that simplify control.
For example you could create the cited 'Map's at need rather than as permanent
edifices. You do a little query, retrieve one or another useful collection of
entities expressing one or another useful relationship, do something useful
with those objects, then pass the objects out of scope, mayhap closing an
entity manager on the way out. There's less chance of packratting with that
idiom.
JPA managers also unify managed instances so that you don't get entity bloat.
--
Lew
String is a class.
If it should be a custom class, then we should have a purpose
for doing so.
Beyond the "cuteness" factor.
More code (measured in functionality not KLOC) => higher cost.
Arne
Strings are not mutable in Java.
And encapsulating String's in classes to make String's methods
unavailable seems superfluous to me.
>>> It is not particular OO to convert data to unnatural forms.
>
> Whatever "unnatural" means in this context. I find it most natural when
> working in an OO language to model important entities of the real world
> as separate classes and not try to cover everything with basic types.
> After all, that's what OO is about.
No.
Keywords are: information hiding, data abstraction, encapsulation,
modularity, polymorphism, and inheritance.
Good OO is using basic types if that makes most sense.
>> When I stated earlier that I would contemplate Robert's design for an
>> SSN class if validation was demanded, I would still store the actual
>> SSN as a String, with no structure. I am not sure that Robert was
>> suggesting otherwise himself.
>
> As you rightly assumed, I wasn't. If Arne's argument were followed,
> there would be no point in having other classes as String in an
> application at all because all the data comes in as strings and goes out
> as strings. (Of course I am exaggerating here - but just a bit.) ;-)
Good programming is not about doing things just for doing it.
Good programming is about doing things for good reasons.
If you have an app that:
* only takes input, persists it, and produces output as strings
* does not have a need to bundle data together
* does not have a need for any methods on the data besides
those provided by String
then you don't need any classes except String.
I don't think I have ever seen an app meeting those
requirements, so your point is extremely irrelevant.
Arne
PS: And you were exaggerating more than a bit. Claiming that
not using a custom class for SSN is the same as meaning
no custom classes at all is off the scale for
exaggerating.
String is a class, but it is a fairly primitive abstraction. The
purpose of using a custom SSN class is to provide an easy way to handle
invariants in one place. SSNs have a definite structure which can be
checked when created and modified.
The cost of having an SSN class is less than the cost of sprinkling SSN
verification code throughout the rest of the app.
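A minimal sketch of such a class, assuming the only invariant is the dashed NNN-NN-NNNN format and that the value is stored internally as a String. The class name follows the Map example earlier in the thread; the parse() factory and everything else here is my own assumption.

```java
import java.util.regex.Pattern;

// Hypothetical immutable value class; the format check lives in exactly one place.
final class SocialSecurityNumber {
    private static final Pattern FORMAT = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    private final String value;

    private SocialSecurityNumber(String value) {
        this.value = value;
    }

    // The only way to obtain an instance, so every instance has passed the check.
    static SocialSecurityNumber parse(String text) {
        if (text == null || !FORMAT.matcher(text).matches()) {
            throw new IllegalArgumentException("expected the form NNN-NN-NNNN");
        }
        return new SocialSecurityNumber(text);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SocialSecurityNumber
                && value.equals(((SocialSecurityNumber) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

Because equals() and hashCode() delegate to the underlying String, instances work directly as HashMap keys, and the internal representation can change later without touching callers.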