Problem with 0.9.2.2, please use 0.9.2.1 for now


Doug Judd

Mar 13, 2009, 12:54:35 PM
to hyperta...@googlegroups.com, hyperta...@googlegroups.com
Several people have reported problems with this release when trying to load a large amount of data. Unfortunately, our bulk load test does not trigger the problem. I'm trying to reproduce it now, and from the problem reports so far I've got a hunch as to what is going on. As soon as we get to the bottom of it, we will A) augment our suite of integration tests to include a test case that triggers this problem, and B) cut a 0.9.2.3 release.

For now, please use the 0.9.2.1 release (you can branch from the git repository using the "v0.9.2.1" tag).

- Doug

url81-1

Mar 13, 2009, 6:07:59 PM
to Hypertable User
Doug,

Thanks, keep me posted. Let me know if I can somehow help in producing a data sample that triggers this.

In the meantime I've reverted to 0.9.2.1 and managed to load everything in; now I'm restarting to see if it cleans up all the log files OK so I can get 0.9.2.1 operational with a few terabytes of data.

Best,
earle.

Adrenalin

Mar 13, 2009, 7:28:13 PM
to Hypertable User
Hi,
Any hope for FreeBSD support?

Doug Judd

Mar 13, 2009, 8:10:28 PM
to hyperta...@googlegroups.com
That's not something we're focused on.  If someone has time and would like to add support for FreeBSD, then we'd be more than happy to merge it in.

- Doug

url81-1

Mar 14, 2009, 4:28:23 PM
to Hypertable User
Doug,

I have a little more information for you that may help.

I backed off to 0.9.2.1 and loaded in all the data successfully (3+ terabytes). From here, I can do queries, use HQL, and everything else without any problems. However, I did see that the user log directories had a ton of stuff in them. Still, everything worked fine.

Once I did an HQL shutdown, ran cap stop, restarted Hadoop, and tried to restart Hypertable, I began to get errors. It would clean up all the user logs and load all the cell stores, then the Master RangeServer would crash, apparently complaining that it was unable to create a range_txn log. I'm going to work on isolating this and getting logs together.

Best,
e.

Doug Judd

Mar 15, 2009, 12:05:21 PM
to hyperta...@googlegroups.com
Hi Earle,

The Master server has a file garbage collection process that runs periodically. It is responsible for removing old unreferenced CellStore files. I suspect what's going on is that the Master garbage collector is removing CellStores that are actively being referenced. This would explain the "Could not obtain block blk_-8476058389477224798_1789 from any node:  java.io.IOException: No live nodes contain current block" errors that you're seeing in the DfsBroker log. One of the bugs that got fixed in 0.9.2.2 was a problem where the Master GC interval was being interpreted as seconds instead of milliseconds. The default value for the Master GC interval is 300000. When interpreted as milliseconds this is 5 minutes; when interpreted as seconds it is about 3.5 days. This bug effectively prevented the Master garbage collector from running, which I suspect is why we never saw the problem before.
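To illustrate what that mis-scaling does in practice, here is a minimal standalone sketch; the names and structure below are simplified for illustration and are not the actual Master code:

// gc_interval_sketch.cc -- hypothetical illustration of the
// seconds-vs-milliseconds mix-up; not the real Hypertable Master code.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

// Default config value: Hypertable.Master.Gc.Interval = 300000
constexpr int64_t GC_INTERVAL = 300000;

// Intended interpretation: milliseconds -> a GC pass every 5 minutes.
void wait_for_next_gc_correct() {
  std::this_thread::sleep_for(std::chrono::milliseconds(GC_INTERVAL));
}

// Pre-0.9.2.2 interpretation: seconds -> a GC pass every ~3.5 days,
// i.e. effectively never during a typical bulk-load run.
void wait_for_next_gc_buggy() {
  std::this_thread::sleep_for(std::chrono::seconds(GC_INTERVAL));
}

int main() {
  std::cout << "300000 as milliseconds = "
            << GC_INTERVAL / 1000.0 / 60.0 << " minutes\n";   // 5
  std::cout << "300000 as seconds      = "
            << GC_INTERVAL / 3600.0 / 24.0 << " days\n";      // ~3.47
  return 0;
}

So with the old (buggy) interpretation the garbage collector almost never ran, and with the 0.9.2.2 fix it runs every 5 minutes, which may be what is exposing the premature CellStore removal.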

Could you try the following experiment?  Run Hypertable 0.9.2.2 (ideally on Hadoop 0.19.1) and set the following hypertable.cfg property:

Hypertable.Master.Gc.Interval=300M

Be sure this gets pushed to all of your nodes before starting the Hypertable processes. This config change effectively rolls back the fix for the misinterpretation of the Gc.Interval value. Then run your load test and see if the problem goes away. Knowing the outcome of this test will help tremendously in isolating the problem. In the meantime, I'll continue trying to re-create it on our end.
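In case it helps, the entry would go into the config file you push to each node, something like the following (the conf/ path is just an example; adjust to your layout):

# conf/hypertable.cfg -- example location only; use whatever config file
# you normally push to your nodes.
# Assuming the 'M' suffix multiplies by a million, 300M = 300,000,000 ms,
# i.e. about 3.5 days between GC passes -- roughly the pre-0.9.2.2 behavior.
Hypertable.Master.Gc.Interval=300M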

One more thing: the inability to create the range_txn log and the thousands of files in the log/user directory were both fixed in the 0.9.2.2 release. Thanks for all of your help!

- Doug

url81-1

Mar 16, 2009, 1:55:08 PM
to Hypertable User
Doug,

This makes sense.

I'm running the import now, using the looping mechanism over small (< 5-200MB) load files, and it appears to be working! I will let this (hopefully) finish, then I'll try loading a giant single file (a terabyte or so) and we'll see how it goes.

Assuming this works, should I try stopping/restarting once it's complete, or will that trigger the garbage collection?

Otherwise I'm assuming I'm sitting on a time bomb until the GC interval hits?

Best,
earle.

url81-1

Mar 16, 2009, 2:04:36 PM
to Hypertable User
Doug,

I clearly jinxed us. Immediately after posting that, I started getting errors... However, I'm not seeing any DFS errors anymore... Everything was fine until an error loading a range hit a timeout. I have tar'd up the logs for you to take a look; let me know where to send them.

On the Master I see:

1237226147 ERROR Hypertable.Master : (/home/hadoop/dist/hypertable/src/cc/Hypertable/Master/Master.cc:783) Problem issuing 'load range' command for events[2009-01-16 08:59:00.0000068824:ÿÿ] at server 192.168.114.27:41730 - HYPERTABLE request timeout

On the Master RangeServer I see:

1237226142 INFO Hypertable.RangeServer : (/home/hadoop/dist/hypertable/src/cc/Hypertable/RangeServer/Range.cc:978) Replayed 1369822 updates (257 blocks) from split log '/hypertable/servers/192.168.114.24_38060/log/802D19C3E5852CD852954FFF' into events[2009-01-16 08:59:00.0000068824..ÿÿ]

Hypertable.RangeServer: /home/hadoop/dist/hypertable/src/cc/Hypertable/RangeServer/TableInfo.cc:139: void Hypertable::TableInfo::add_range(Hypertable::RangePtr&): Assertion `iter == m_range_map.end()' failed.
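From the assertion text, add_range() appears to be guarding against the same range being registered twice; presumably something roughly like this simplified sketch (hypothetical names, not the actual TableInfo code):

// Hypothetical, simplified sketch of the guard that assertion suggests;
// not the actual Hypertable TableInfo implementation.
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Range { std::string end_row; };
using RangePtr = std::shared_ptr<Range>;

class TableInfo {
  std::map<std::string, RangePtr> m_range_map;  // ranges keyed by end row
public:
  void add_range(const RangePtr &range) {
    auto iter = m_range_map.find(range->end_row);
    // Fires if a range with this end row has already been registered,
    // e.g. if a range that is already live gets added again after
    // split-log replay.
    assert(iter == m_range_map.end());
    m_range_map[range->end_row] = range;
  }
};

int main() {
  TableInfo info;
  auto r = std::make_shared<Range>();
  r->end_row = "2009-01-16 08:59:00.0000068824";
  info.add_range(r);
  info.add_range(r);  // second add trips the assertion, as in the log above
  return 0;
}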

Doug Judd

Mar 16, 2009, 2:23:46 PM
to hyperta...@googlegroups.com
Hi Earle,

Is there any way you could tar up all your log files and send them to me directly (doug at zvents dot com)?  Or you can post them here:  http://groups.google.com/group/hypertable-user/files

- Doug