Using HBaseTap as a sink - strange error with "Relative path in absolute URI"

Anton

May 8, 2009, 12:23:18 AM
to cascading-user
Hey all

I'm trying to write to HBase as output of my job, but am getting this
strange error:

Exception in thread "flow organicCount*paidCount"
java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: hbase:url_summary

Originally I had my family names in the fields, and thought the colons
in the field names were the problem (ie, the tap was set up like so):

String table = "url_summary";
Fields key = new Fields ( "urlid_day" );
Fields[] fields = new Fields[]{ new Fields ( "default:organic",
"default:paid" ) };
HBaseScheme scheme = new HBaseScheme( key, fields );
Tap logHBaseSinkTap = new HBaseTap( table, scheme );

I changed it to be set up like this:

String table = "url_summary";
Fields key = new Fields ( "urlid_day" );
String[] family = { "default" };
Fields[] fields = new Fields[]{ new Fields ( "organic", "paid" ) };
HBaseScheme scheme = new HBaseScheme( key, family, fields );
Tap logHBaseSinkTap = new HBaseTap( table, scheme );

(basing off the examples in the cascading.hbase unit test at
http://github.com/cwensel/cascading.hbase/blob/a8e188021120bc3e58463aaf09ce5b6df6a3884e/src/test/cascading/hbase/MultiFamilyHBaseTest.java)

but I still get the same error.

I googled around a bit and came across this bug in Hadoop:
https://issues.apache.org/jira/browse/HADOOP-2066 - could that be
related? Not really sure what else to try.

full log below in case that's useful:

hadoop jar ./build/loganalysis.jar
09/05/07 20:56:58 INFO flow.MultiMapReducePlanner: using application
jar: /home/anton/loganalysis/./build/loganalysis.jar
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
starting
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
starting flows: 1
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
allocating threads: 1
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
starting flow: organicCount*paidCount
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] atleast one
sink is marked for delete
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] sink oldest
modified date: Wed Dec 31 15:59:59 PST 1969
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] starting
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] source: Dfs
["TextLine[['offset', 'line']->[ALL]]"]["/logs/short-20090504.log"]"]
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] sink:
cascading.hbase.HBaseTap@62ca6f60
09/05/07 20:56:58 INFO zookeeper.ZooKeeperWrapper: Quorum servers:
10.10.20.15:2181,10.10.20.14:2181,10.10.20.13:2181,10.10.20.19:2181,10.10.20.17:2181
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.1.0--1, built on 03/05/2009 20:16 GMT
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:host.name=face
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_14-ea
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_14/jre
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/usr/local/hadoop/bin/../conf:/usr/java/
jdk1.6.0_14/lib/tools.jar:/usr/local/hadoop/bin/..:/usr/local/hadoop/
bin/../hadoop-0.20.0-core.jar:/usr/local/hadoop/bin/../lib/commons-
cli-2.0-SNAPSHOT.jar:/usr/local/hadoop/bin/../lib/commons-
codec-1.3.jar:/usr/local/hadoop/bin/../lib/commons-el-1.0.jar:/usr/
local/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop/
bin/../lib/commons-logging-1.0.4.jar:/usr/local/hadoop/bin/../lib/
commons-logging-api-1.0.4.jar:/usr/local/hadoop/bin/../lib/commons-
net-1.4.1.jar:/usr/local/hadoop/bin/../lib/core-3.1.1.jar:/usr/local/
hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop/bin/../lib/
jasper-compiler-5.5.12.jar:/usr/local/hadoop/bin/../lib/jasper-
runtime-5.5.12.jar:/usr/local/hadoop/bin/../lib/jets3t-0.6.1.jar:/usr/
local/hadoop/bin/../lib/jetty-6.1.14.jar:/usr/local/hadoop/bin/../lib/
jetty-util-6.1.14.jar:/usr/local/hadoop/bin/../lib/junit-3.8.1.jar:/
usr/local/hadoop/bin/../lib/kfs-0.2.2.jar:/usr/local/hadoop/bin/../lib/
log4j-1.2.15.jar:/usr/local/hadoop/bin/../lib/oro-2.0.8.jar:/usr/local/
hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/local/hadoop/bin/../
lib/slf4j-api-1.4.3.jar:/usr/local/hadoop/bin/../lib/slf4j-
log4j12-1.4.3.jar:/usr/local/hadoop/bin/../lib/xmlenc-0.52.jar:/usr/
local/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/bin/../
lib/jsp-2.1/jsp-api-2.1.jar:/usr/local/hbase/conf:/usr/local/hbase/
build/hbase-0.20.0-dev.jar:/usr/local/hbase/lib/zookeeper-3.1.0-
hbase-1241.jar
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/local/hadoop/bin/../lib/native/
Linux-amd64-64
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:os.name=Linux
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:os.arch=amd64
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.24-23-server
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:user.name=anton
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/anton
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/anton/loganalysis
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Initiating client
connection,
host=10.10.20.15:2181,10.10.20.14:2181,10.10.20.13:2181,10.10.20.19:2181,10.10.20.17:2181
sessionTimeout=10000
watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@92f1bf0
09/05/07 20:56:58 INFO zookeeper.ClientCnxn:
zookeeper.disableAutoWatchReset is false
09/05/07 20:56:58 INFO zookeeper.ClientCnxn: Attempting connection to
server /10.10.20.13:2181
09/05/07 20:56:58 INFO zookeeper.ClientCnxn: Priming connection to
java.nio.channels.SocketChannel[connected local=/10.10.20.42:38554
remote=/10.10.20.13:2181]
09/05/07 20:56:58 INFO zookeeper.ClientCnxn: Server connection
successful
09/05/07 20:56:59 INFO flow.Flow: [organicCount*paidCount] parallel
execution is enabled: true
09/05/07 20:56:59 INFO flow.Flow: [organicCount*paidCount] starting
jobs: 4
09/05/07 20:56:59 INFO flow.Flow: [organicCount*paidCount] allocating
threads: 4
09/05/07 20:56:59 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (1/4) TempHfs["SequenceFile[['day', 'urlid',
'method']]"][import/26762/]
09/05/07 20:56:59 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:56:59 INFO mapred.FileInputFormat: Total input paths to
process : 1
09/05/07 20:57:34 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (2/4) TempHfs["SequenceFile[['day', 'urlid', 'count']]"]
[organicCount/44134/]
09/05/07 20:57:34 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (3/4) TempHfs["SequenceFile[['paid_day', 'paid_urlid',
'paid_count']]"][paidCount/41830/]
09/05/07 20:57:34 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:57:34 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:57:34 INFO mapred.FileInputFormat: Total input paths to
process : 200
09/05/07 20:57:34 INFO mapred.FileInputFormat: Total input paths to
process : 200
09/05/07 20:59:00 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (4/4) cascading.hbase.HBaseTap@62ca6f60
09/05/07 20:59:00 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount]
completion events count: 10
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000245_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000005_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000006_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000012_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000015_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000000_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000009_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000010_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000016_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000008_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.Flow: stopping jobs
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (4/4) cascading.hbase.HBaseTap@62ca6f60
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (3/4) TempHfs["SequenceFile[['paid_day', 'paid_urlid',
'paid_count']]"][paidCount/41830/]
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (2/4) TempHfs["SequenceFile[['day', 'urlid', 'count']]"]
[organicCount/44134/]
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (1/4) TempHfs["SequenceFile[['day', 'urlid', 'method']]"]
[import/26762/]
09/05/07 21:00:35 WARN flow.Flow: stopped jobs
09/05/07 21:00:35 WARN flow.Flow: shutting down job executor
09/05/07 21:00:35 WARN flow.Flow: shutdown complete
Exception in thread "flow organicCount*paidCount"
java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: hbase:url_summary
at org.apache.hadoop.fs.Path.initialize(Path.java:140)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at cascading.hbase.HBaseTap.getPath(Unknown Source)
at cascading.tap.hadoop.Hadoop18TapUtil.cleanupTap
(Hadoop18TapUtil.java:166)
at cascading.flow.FlowStep.cleanTap(FlowStep.java:363)
at cascading.flow.FlowStep.clean(FlowStep.java:348)
at cascading.flow.Flow.cleanTemporaryFiles(Flow.java:1007)
at cascading.flow.Flow.run(Flow.java:839)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.URISyntaxException: Relative path in absolute URI:
hbase:url_summary
at java.net.URI.checkPath(URI.java:1787)
at java.net.URI.<init>(URI.java:735)
at org.apache.hadoop.fs.Path.initialize(Path.java:137)
... 8 more
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount] flow
failed: organicCount*paidCount
cascading.flow.FlowException: step failed: (4/4)
cascading.hbase.HBaseTap@62ca6f60
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:478)
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:409)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
stopping flows
09/05/07 21:00:35 INFO cascade.Cascade: [organicCount*paidCount]
stopping flow: organicCount*paidCount
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
stopped flows
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
shutting down flow executor
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
shutdown complete
Exception in thread "main" cascading.cascade.CascadeException: flow
failed: organicCount*paidCount
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:428)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:369)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: cascading.flow.FlowException: step failed: (4/4)
cascading.hbase.HBaseTap@62ca6f60
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:478)
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:409)
... 5 more

Chris K Wensel

May 8, 2009, 11:24:53 AM
to cascadi...@googlegroups.com
Hey Anton

Sorry for the troubles.

In part this is a 'bug' introduced by bug fixes in recent releases.
The quick fix is to have the getPath() method on the Tap return
something like hbase://.... instead of hbase:...

This value should be unused except for identity, but to get around
sticky speculative execution issues, recent patches have blindly asked
for this to perform some cleanup.

Let me see if I can get a more stable fix out sometime.
--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/

Anton Stroganov

May 8, 2009, 2:35:57 PM
to cascadi...@googlegroups.com
Hey Chris

I changed getPath() to return SCHEME + "://" + tableName;, but still get:

Exception in thread "flow organicCount*paidCount"
java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: hbase://url_summary_temporary

Trying to dig through the source to find the reason for the problem,
will let you know if I find anything.

Anton

Chris K Wensel

May 9, 2009, 6:37:30 PM
to cascadi...@googlegroups.com
try one slash maybe, i'm having luck with jdbc:/foo/bar

ckw
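
For anyone following along, the three path shapes discussed so far can be checked with plain java.net.URI. This is a minimal sketch, assuming (as the stack trace above suggests: Path.initialize, then URI.<init>, then URI.checkPath) that Hadoop's Path passes a separate scheme, authority and path to the five-argument URI constructor. The class UriPathCheck and its describe helper are made up for illustration and are not part of Cascading or Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of Cascading or Hadoop.
final class UriPathCheck {

    // Builds a URI from separate components and reports either the
    // resulting URI or the reason java.net.URI rejected it.
    static String describe(String scheme, String authority, String path) {
        try {
            return "ok: " + new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Scheme plus a path without a leading slash: rejected.
        System.out.println(describe("hbase", null, "url_summary"));
        // Table name as authority plus a relative child name: also rejected.
        System.out.println(describe("hbase", "url_summary", "_temporary"));
        // A single leading slash makes the path absolute: accepted.
        System.out.println(describe("hbase", null, "/url_summary"));
    }
}
```

The second case is only a guess at why hbase://url_summary still failed: with two slashes the table name would be read as the authority and the path would be empty, so a relative child name appended during cleanup would again yield a path with no leading slash.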

Anton Stroganov

May 13, 2009, 4:34:21 PM
to cascadi...@googlegroups.com
So, we tracked it down to an underlying problem: the real reason it
was failing was that it was trying to write null values to HBase.
Somehow that messed things up in a weird way so that the error message
I got was the one at the beginning of this thread. It would've been
helpful to see the real reason for the failure, but I suppose there's
no easy way to do that with many distributed workers.

Anyway, if anybody else runs into this problem, double check that you
are either filtering out tuples with nulls, or changing nulls to
zeroes. You can get nulls in your tuple stream when you do an outer
join, so it's something to watch out for.

Anton
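
The two remedies described above can be sketched in plain Java. This is only an illustration under assumptions: rows are modelled as List<Object> rather than Cascading tuples, and NullGuard, dropRowsWithNulls and nullsToZero are hypothetical names. In a real flow the same logic would live in whatever filter or function operation your Cascading version provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for tuple handling; this is not the Cascading API.
final class NullGuard {

    // Returns true if any value in the row is null.
    static boolean hasNull(List<Object> row) {
        for (Object value : row) {
            if (value == null) {
                return true;
            }
        }
        return false;
    }

    // Remedy 1: drop every row that contains at least one null value.
    static List<List<Object>> dropRowsWithNulls(List<List<Object>> rows) {
        List<List<Object>> kept = new ArrayList<>();
        for (List<Object> row : rows) {
            if (!hasNull(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Remedy 2: keep the row but replace each null with zero.
    static List<Object> nullsToZero(List<Object> row) {
        List<Object> fixed = new ArrayList<>(row.size());
        for (Object value : row) {
            fixed.add(value == null ? Integer.valueOf(0) : value);
        }
        return fixed;
    }

    public static void main(String[] args) {
        // An outer join can leave one side empty, e.g. organic hits but no paid hits.
        List<Object> joined = Arrays.<Object>asList("42_20090504", 17, null);
        System.out.println(nullsToZero(joined));
        System.out.println(dropRowsWithNulls(Arrays.asList(joined)));
    }
}
```

Which remedy fits depends on the data: zero is a sensible default for counts, while dropping rows loses the non-null side of the join.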

Chris K Wensel

May 13, 2009, 4:55:55 PM
to cascadi...@googlegroups.com
very interesting. thanks for the note.

btw, which branch from GitHub are you using?

ckw

Anton Stroganov

May 13, 2009, 5:00:09 PM
to cascadi...@googlegroups.com
Your official one.

Chris K Wensel

May 13, 2009, 5:06:34 PM
to cascadi...@googlegroups.com
well, 'official' is relative. but i'll assume you are on 'master'.

you might poke inside 'allmerged', i made an attempt to pull in
changes from other forks (specifically from one of the HBase guys).

he uses HBase in a very meta way, so I was curious if the interfaces
available in the 'allmerged' branch were more useful for you.

ckw

mlimotte

Jun 24, 2009, 12:16:35 PM
to cascading-user
We have no nulls in our tuples, but were still seeing this problem.

Chris's suggestion of a single "/" worked for us, though. We modified
the HBaseTap with a new getPath():

public Path getPath() {
  // Original: return new Path(getURI().toString());
  URI uri = getURI();
  return new Path(uri.getScheme() + ":/" + uri.getSchemeSpecificPart());
}

Seems to work. Thanks for the suggestion, Chris.

Marc
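
The string rewrite in the modified getPath() can be tried without Hadoop on the classpath. A minimal sketch, assuming the tap's getURI() returns an opaque URI of the form hbase:<table> (that form is inferred from the original error message); SingleSlash and toSingleSlash are hypothetical names.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of cascading.hbase.
final class SingleSlash {

    // Same concatenation as the modified getPath(), minus the Hadoop Path wrapper.
    static String toSingleSlash(URI uri) {
        return uri.getScheme() + ":/" + uri.getSchemeSpecificPart();
    }

    public static void main(String[] args) {
        URI original = URI.create("hbase:url_summary");
        String rewritten = toSingleSlash(original);
        System.out.println(rewritten);
        // The rewritten form parses as a hierarchical URI with an absolute path.
        System.out.println(URI.create(rewritten).getPath());
    }
}
```

If getURI() already returned a hierarchical form such as hbase://url_summary, the scheme-specific part would itself begin with //, so this rewrite probably only suits the opaque form.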

