Hash weirdness

Jared Nedzel

May 14, 2008, 4:55:38 PM
to boston-r...@googlegroups.com
Folks:

I'm getting some weird behavior that I don't understand. I'm probably
doing something really noobish. I've subclassed Hash:

class ResultHash < Hash

  def process_line(line, value_type)
    # do some stuff that isn't important here....
  end
end

There's nothing else in my ResultHash class.

In a different class, I create a ResultHash instance, call process_line
repeatedly, which creates multi-level hash of hashes.

Then I want to iterate over the top level of keys:

result_hash = ResultHash.new()
# add a bunch of stuff to it using process_line

# loop over it
keys = result_hash.keys
keys.each do |well_key|
  # do some stuff here
end

At the keys.each call, I get the exception "wrong number of arguments (1
for 0)"

In the debugger I can see that the "keys" temporary variable is an
instance of Array with the 96 elements that I expect.
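
Since `keys` itself looks fine in the debugger, one way to narrow this down is to rescue the exception and print its backtrace. The top frames name the method that actually received the wrong number of arguments, which may be something called inside the block rather than `each` itself. This is only a minimal sketch: `report_argument_error` and the deliberately broken `no_args` method are hypothetical names that exist purely to trigger the same class of error.

```ruby
# Hypothetical helper: run a block and, if it raises ArgumentError,
# return the message plus the first few backtrace frames.
def report_argument_error
  yield
  nil
rescue ArgumentError => e
  { :message => e.message, :frames => e.backtrace.first(3) }
end

# Hypothetical method that takes no arguments, used only to provoke the error.
def no_args
  :ok
end

keys = ["A1", "A2"]
report = report_argument_error do
  keys.each do |well_key|
    no_args(well_key) # wrong arity: the error comes from here, not from keys.each
  end
end

puts report[:message]
puts report[:frames]
```

If the first frame points somewhere other than the loop line, that is the place to look.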

I did a small test case using just bare Hash:

def test_hash
  a_hash = {"A" => 1, "B" => 2, "C" => 3}
  keys = a_hash.keys
  keys.each do |key|
    puts "key: " + key + " value: " + a_hash[key].to_s
  end
end

and this works as expected.

I was already thinking about changing my design to get rid of my
ResultHash methods (since all it adds is a single method that can
logically go elsewhere). But I'd like to know what I'm doing wrong.

Any ideas?

--
Jared Nedzel
Cancer Genomics Informatics
Broad Institute
7 Cambridge Center
Cambridge, MA 02142

617-324-4825
jne...@broad.mit.edu

Ron Newman

May 14, 2008, 5:09:05 PM
to Jared Nedzel, boston-r...@googlegroups.com
I created a Ruby source file with exactly your code, and did not get any exception:

class ResultHash < Hash

  def process_line(line, value_type)
    # do some stuff that isn't important here....
  end
end

result_hash = ResultHash.new()

# add a bunch of stuff to it using process_line

result_hash[3] = 8
result_hash['foo'] = 'bar'

# loop over it
keys = result_hash.keys
keys.each do |well_key|
  # do some stuff here

  puts well_key
end

It prints
foo
3
as expected.

Doug Pfeffer

May 14, 2008, 5:23:02 PM
to boston-r...@googlegroups.com, Jared Nedzel
Is process_line() returning some kind of funky object? Maybe it's just
not a hash?

Doug

Jared Nedzel

May 14, 2008, 5:46:43 PM
to Doug Pfeffer, boston-r...@googlegroups.com
No, process_line just adds things to the hash:

def process_line(line, value_type)
  if (line == nil || line.empty?)
    return
  end

  well_location = line[0]
  return if well_location.empty?

  well_hash = nil
  if (!self.has_key?(well_location))
    self[well_location] = Hash.new()
  end
  well_hash = self[well_location]

  bead_num = 1
  for i in 2..(line.length() - 3)
    value = line[i]

    if (!well_hash.has_key?(bead_num))
      self[well_location][bead_num] = Hash.new()
    end
    bead_hash = self[well_location][bead_num]

    bead_hash[value_type] = value
    bead_num += 1
  end
  self
end

I'm parsing a file that has a bunch of lines that look like this:

A1,,"37.8108108108108","15.4789473684211","9.33838383838384","32.9772727272727","24.7336956521739","21.9666666666667","11.7808219178082","558.337662337662","272.688073394495","10.1146788990826","27.2953020134228","7.55294117647059","6.39285714285714","16.1666666666667","12.3673469387755","26.8061224489796","9.72222222222222","11.3552631578947","13.4054054054054","19.4487179487179","100.627272727273","16.2857142857143","22.1777777777778","15.4556962025316","35.962962962963","13.8641975308642","28.546875","9.65384615384615","41.4260869565217","12.2441314553991","25.8511627906977","16.7647058823529","85.6335877862595","18.4748603351955","23.2871794871795","157.963235294118","231.4","6.22685185185185","22.5991189427313","9.04878048780488","7.35","21.1284403669725","22.3235294117647","50.7471264367816","32.7946428571429","8.60869565217391","65.4065934065934","24.71","31.9396551724138","19.952380952381","59.9272727272727","10.86","70.4367816091954","41.5462962962963","138.794117647059","56.61","26.3736263736264","37.6865671641791","49.5733333333333","35.1063829787234","46.2661290322581","20.3575418994413","35.0940170940171","58.7946428571429","23.3913043478261","53.1682242990654","38.1470588235294","36.075","54.5555555555556","47.6172839506173","16.9107142857143","25.1111111111111","22.5185185185185","55.8602150537634","65.0422535211268","21.6463414634146","14.5483870967742","49.6526315789474","43.5","22.2020202020202","20.1401869158878","18.7246376811594","63.1971830985916","187.8","26.5294117647059","86.8807339449541","27.9791666666667","100.045454545455","40.0561797752809","52.6736842105263","66.1166666666667","1825.46875","323.371681415929","30.6818181818182","61.0157480314961","39.4190476190476","2108.79816513761","100.670454545455","24785.1619047619","150.12037037037",13306,"Sample Empty"
A2,,"39.7594501718213","15.5533596837945","8.5764192139738","34.6787003610108","30.2253521126761","25.5748031496063","11.8037735849057","729.758293838863","336.157024793388","10.2933333333333","27.4210526315789","7.71341463414634","7.3047619047619","19.2368421052632","12.7637795275591","25.0181818181818","11.6813186813187","9.56818181818182","17.25","19.7375","202.242424242424","17.5494505494506","24.3737373737374","13.4235294117647","35.256880733945","15.0238095238095","31.9158878504673","7.56521739130435","44.8888888888889","14.2791666666667","32.1936936936937","20.7925531914894","74.9831932773109","17.8461538461538","27.8851674641148","209.347826086957","328.926829268293","7.21611721611722","24.7683397683398","10.0648148148148","8.92771084337349","27.3092105263158","28.5042016806723","60.1477272727273","40.603305785124","9.77931034482759","95.1428571428571","23.1868131868132","30.0952380952381","24","59.6639344262295","12.979797979798","73.2903225806452","47.3898305084746","149.336363636364","66.1415094339623","24.4646464646465","42.9736842105263","60.4556962025316","40.6282051282051","49.5658914728682","19.8011695906433","58.7155172413793","64.9259259259259","31.2260869565217","46","40.625","44.8260869565217","56.7241379310345","63.1022727272727","16.4444444444444","21.6060606060606","22.7916666666667","72.6194690265487","52.9425287356322","24.6714285714286","15.2992125984252","51.4536082474227","36.2293577981651","25.5714285714286","20.2053571428571","20.5604395604396","56.68","386.684931506849","31.7361111111111","80.9404761904762","30.0943396226415","123.491525423729","65.1964285714286","59.4945054945055","62.2592592592593","2181.97674418605","519.81512605042","35.0625","61.0747663551402","43.6020408163265","2829.99212598425","98.5454545454545","25799.5365853659","132.223140495868",14255,"Sample Empty"
...


Column 0 (e.g., A1 in the first row) is a well location on a 96-well
plate (rows A-H, columns 1-12). Columns 2 through 101 are data columns,
holding the values for bead_nums 1-100.

The file has repeating blocks of this pattern, with each block
containing the values for a particular value_type.

So I'm creating a structure that looks like:

<well location> --> <bead_num_hash> --> <value_type_hash> --> value

For example,

A1 --> 1 --> trimmed_mean --> 37.8108108108108
         --> peak --> 51.2
         ...
   --> 2 --> trimmed_mean --> 15.4789473684211
   ...
   --> 100 --> trimmed_mean --> 150.1203704
A2 --> 1 --> trimmed_mean --> 39.7594501718213
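
For readers who want to try out that three-level shape, here is a small sketch that builds it with `Hash.new` default blocks, so missing inner hashes are created on first access. This is not how `process_line` above does it; the variable name `results` is an assumption, and the sample values are the ones from the diagram.

```ruby
# Nested hash: well location -> bead number -> value type -> value.
# The default blocks create missing inner hashes on first access.
results = Hash.new do |wells, well|
  wells[well] = Hash.new { |beads, bead| beads[bead] = {} }
end

results["A1"][1]["trimmed_mean"] = 37.8108108108108
results["A1"][1]["peak"]         = 51.2
results["A1"][2]["trimmed_mean"] = 15.4789473684211
results["A2"][1]["trimmed_mean"] = 39.7594501718213

puts results.keys.inspect
puts results["A1"][1].keys.inspect
puts results["A1"][2]["trimmed_mean"]
```

One caveat with this approach: merely reading a missing key also creates an entry, so use `key?` or `fetch` when checking for presence.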


This works for a small test case. But when I run on a real data file
(96 wells x 100 beads per well x 11 value_types) I get this behavior.

I've tried refactoring the process_line method to a different class and
just using an instance of Hash, and I'm getting the same behavior.

Thanks,

Jared

Tyler McMullen

May 14, 2008, 6:00:20 PM
to boston-r...@googlegroups.com
You probably already looked into this, but is there any chance the line it is failing on (I assume it's the same line every time) is out of the ordinary in some way, specifically in the number of data points? I don't immediately see how something like that would cause it to fail, but anything out of the ordinary could certainly point you in the right direction...

Also, because I'm a succinctness nazi...


well_hash = nil
if (!self.has_key?(well_location))
  self[well_location] = Hash.new()
end
well_hash = self[well_location]

... can be refactored into ...

well_hash = self[well_location] || {}


And...



if (!well_hash.has_key?(bead_num))
  self[well_location][bead_num] = Hash.new()
end
bead_hash = self[well_location][bead_num]

... can be refactored into ...

bead_hash = self[well_location][bead_num] ||= {}



Sorry if the refactoring was too forward. :)

tyler

Tyler McMullen

May 14, 2008, 6:01:30 PM
to boston-r...@googlegroups.com
And I just realized that first succinctness tweak is wrong and should look more like the second one.  Sorry.
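
The difference matters because `self[well_location] || {}` hands back a fresh hash without storing it, so anything written to that hash is lost. A small sketch on a plain Hash illustrates this; the names `store`, `lost`, and `kept` are chosen only for illustration.

```ruby
store = {}

# `||` only picks a value; the new hash is never stored under "A1".
lost = store["A1"] || {}
lost[1] = "x"
puts store.key?("A1")   # false

# `||=` assigns the new hash into the store when the key is missing.
kept = store["A2"] ||= {}
kept[1] = "y"
puts store["A2"].inspect
```

So the first tweak would need to read `well_hash = self[well_location] ||= {}` to behave like the original `if` block.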


tyler

Jared Nedzel

May 14, 2008, 6:36:19 PM
to boston-r...@googlegroups.com
Oddly enough, while looping over the hash this way breaks:


keys.each do |well_key|
  well_hash = result_hash[well_key]
  puts "processing well: " + well_key + " size: " + well_hash.size.to_s
end

This way works (please excuse the many temp variables in here, I put
them in so I could easily see what was going on in the debugger):

keys = result_hash.keys
length = keys.length
for i in 0..(length - 1)
  key = keys[i]
  well_hash = result_hash[key]
  puts "processing well: " + key + " size: " + well_hash.size.to_s
end

That's pretty nasty, but since it is working, I'll live with it for
now. Thanks for the suggestions.

Jared

Ron Newman

May 14, 2008, 6:41:20 PM
to Jared Nedzel, boston-r...@googlegroups.com
Could you please post a short, self-contained, and complete program that demonstrates
the problem you are having? Without this, there's little chance that any of
us can usefully debug.
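
A starting point for such a program might look like the sketch below. It fills a ResultHash with synthetic rows of the shape described earlier in the thread (96 wells, 100 beads, 11 value types) and then iterates with `keys.each`. The value-type names and generated numbers are made up, and `process_line` is condensed from the version posted above. Nothing here is claimed to reproduce the exception; it only gives a complete script to cut down from.

```ruby
class ResultHash < Hash
  # Condensed from the process_line posted earlier in the thread;
  # `line` is an array of already-split fields.
  def process_line(line, value_type)
    return if line.nil? || line.empty?
    well_location = line[0]
    return if well_location.nil? || well_location.empty?

    well_hash = (self[well_location] ||= {})
    bead_num = 1
    (2..(line.length - 3)).each do |i|
      bead_hash = (well_hash[bead_num] ||= {})
      bead_hash[value_type] = line[i]
      bead_num += 1
    end
    self
  end
end

# Synthetic input: made-up value-type names and values.
value_types = (1..11).map { |n| "type_#{n}" }
wells = ("A".."H").flat_map { |row| (1..12).map { |col| "#{row}#{col}" } }

result_hash = ResultHash.new
value_types.each do |value_type|
  wells.each do |well|
    # Field 0 is the well, field 1 is blank, fields 2..101 are bead values,
    # and the last two fields are trailing columns that are skipped.
    line = [well, ""] + (1..100).map { |b| (b * 1.5).to_s } + ["13306", "Sample Empty"]
    result_hash.process_line(line, value_type)
  end
end

result_hash.keys.each do |well_key|
  well_hash = result_hash[well_key]
  puts "processing well: " + well_key + " size: " + well_hash.size.to_s
end
```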