Hey all,
I'm trying to write to HBase as the output of my job, but I'm getting this
strange error:
Exception in thread "flow organicCount*paidCount"
java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: hbase:url_summary
Originally I had the family names in the field names, and thought the
colons in the field names were the problem (i.e., the tap was set up like so):
String table = "url_summary";
Fields key = new Fields ( "urlid_day" );
Fields[] fields = new Fields[]{ new Fields ( "default:organic",
"default:paid" ) };
HBaseScheme scheme = new HBaseScheme( key, fields );
Tap logHBaseSinkTap = new HBaseTap( table, scheme );
I changed it to be set up like this:
String table = "url_summary";
Fields key = new Fields ( "urlid_day" );
String[] family = { "default" };
Fields[] fields = new Fields[]{ new Fields ( "organic", "paid" ) };
HBaseScheme scheme = new HBaseScheme( key, family, fields );
Tap logHBaseSinkTap = new HBaseTap( table, scheme );
(basing off the examples in the cascading.hbase unit test at
http://github.com/cwensel/cascading.hbase/blob/a8e188021120bc3e58463aaf09ce5b6df6a3884e/src/test/cascading/hbase/MultiFamilyHBaseTest.java)
but I still get the same error.
I googled around a bit and came across this bug in Hadoop:
https://issues.apache.org/jira/browse/HADOOP-2066 - could that be
related? Not really sure what else to try.
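For what it's worth, the message itself seems to be reproducible with plain java.net.URI, with no Hadoop or HBase involved. Here is a minimal sketch, assuming (going by the Path.initialize -> URI.<init> -> URI.checkPath frames in the trace) that "hbase" ends up as the scheme and "url_summary" as the path passed to the JDK's multi-argument URI constructor. The class and method names (RelativePathRepro, tryBuild) are made up for illustration and are not part of Cascading or Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RelativePathRepro {
    // Builds a URI from a separate scheme and path using the JDK's
    // five-argument constructor (scheme, authority, path, query, fragment).
    // Returns the URI text on success, or the exception message on failure.
    static String tryBuild(String scheme, String path) {
        try {
            return new URI(scheme, null, path, null, null).toString();
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A scheme combined with a path that has no leading slash is rejected.
        // prints "Relative path in absolute URI: hbase:url_summary"
        System.out.println(tryBuild("hbase", "url_summary"));

        // The same scheme with an absolute path is accepted by the JDK.
        // prints "hbase:/url_summary"
        System.out.println(tryBuild("hbase", "/url_summary"));
    }
}
```

The second call only shows that the JDK accepts the same scheme once the path starts with a slash; whether HBaseTap should be building its path that way is a separate question I can't answer from here.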
Full log below in case that's useful:
hadoop jar ./build/loganalysis.jar
09/05/07 20:56:58 INFO flow.MultiMapReducePlanner: using application
jar: /home/anton/loganalysis/./build/loganalysis.jar
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
starting
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
starting flows: 1
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
allocating threads: 1
09/05/07 20:56:58 INFO cascade.Cascade: [organicCount*paidCount]
starting flow: organicCount*paidCount
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] atleast one
sink is marked for delete
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] sink oldest
modified date: Wed Dec 31 15:59:59 PST 1969
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] starting
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] source: Dfs
["TextLine[['offset', 'line']->[ALL]]"]["/logs/short-20090504.log"]"]
09/05/07 20:56:58 INFO flow.Flow: [organicCount*paidCount] sink:
cascading.hbase.HBaseTap@62ca6f60
09/05/07 20:56:58 INFO zookeeper.ZooKeeperWrapper: Quorum servers:
10.10.20.15:2181,10.10.20.14:2181,10.10.20.13:2181,10.10.20.19:2181,10.10.20.17:2181
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.1.0--1, built on 03/05/2009 20:16 GMT
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:host.name=face
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_14-ea
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/java/jdk1.6.0_14/jre
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/usr/local/hadoop/bin/../conf:/usr/java/
jdk1.6.0_14/lib/tools.jar:/usr/local/hadoop/bin/..:/usr/local/hadoop/
bin/../hadoop-0.20.0-core.jar:/usr/local/hadoop/bin/../lib/commons-
cli-2.0-SNAPSHOT.jar:/usr/local/hadoop/bin/../lib/commons-
codec-1.3.jar:/usr/local/hadoop/bin/../lib/commons-el-1.0.jar:/usr/
local/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop/
bin/../lib/commons-logging-1.0.4.jar:/usr/local/hadoop/bin/../lib/
commons-logging-api-1.0.4.jar:/usr/local/hadoop/bin/../lib/commons-
net-1.4.1.jar:/usr/local/hadoop/bin/../lib/core-3.1.1.jar:/usr/local/
hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop/bin/../lib/
jasper-compiler-5.5.12.jar:/usr/local/hadoop/bin/../lib/jasper-
runtime-5.5.12.jar:/usr/local/hadoop/bin/../lib/jets3t-0.6.1.jar:/usr/
local/hadoop/bin/../lib/jetty-6.1.14.jar:/usr/local/hadoop/bin/../lib/
jetty-util-6.1.14.jar:/usr/local/hadoop/bin/../lib/junit-3.8.1.jar:/
usr/local/hadoop/bin/../lib/kfs-0.2.2.jar:/usr/local/hadoop/bin/../lib/
log4j-1.2.15.jar:/usr/local/hadoop/bin/../lib/oro-2.0.8.jar:/usr/local/
hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/usr/local/hadoop/bin/../
lib/slf4j-api-1.4.3.jar:/usr/local/hadoop/bin/../lib/slf4j-
log4j12-1.4.3.jar:/usr/local/hadoop/bin/../lib/xmlenc-0.52.jar:/usr/
local/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop/bin/../
lib/jsp-2.1/jsp-api-2.1.jar:/usr/local/hbase/conf:/usr/local/hbase/
build/hbase-0.20.0-dev.jar:/usr/local/hbase/lib/zookeeper-3.1.0-
hbase-1241.jar
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/local/hadoop/bin/../lib/native/
Linux-amd64-64
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:os.name=Linux
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:os.arch=amd64
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.24-23-server
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:user.name=anton
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:user.home=/home/anton
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/anton/loganalysis
09/05/07 20:56:58 INFO zookeeper.ZooKeeper: Initiating client
connection, host=10.10.20.15:2181,10.10.20.14:2181,10.10.20.13:2181,10.10.20.19:2181,10.10.20.17:2181
sessionTimeout=10000 watcher=org.apache.hadoop.hbase.zookeeper.WatcherWrapper@92f1bf0
09/05/07 20:56:58 INFO zookeeper.ClientCnxn:
zookeeper.disableAutoWatchReset is false
09/05/07 20:56:58 INFO zookeeper.ClientCnxn: Attempting connection to
server /10.10.20.13:2181
09/05/07 20:56:58 INFO zookeeper.ClientCnxn: Priming connection to
java.nio.channels.SocketChannel[connected local=/10.10.20.42:38554
remote=/10.10.20.13:2181]
09/05/07 20:56:58 INFO zookeeper.ClientCnxn: Server connection
successful
09/05/07 20:56:59 INFO flow.Flow: [organicCount*paidCount] parallel
execution is enabled: true
09/05/07 20:56:59 INFO flow.Flow: [organicCount*paidCount] starting
jobs: 4
09/05/07 20:56:59 INFO flow.Flow: [organicCount*paidCount] allocating
threads: 4
09/05/07 20:56:59 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (1/4) TempHfs["SequenceFile[['day', 'urlid',
'method']]"][import/26762/]
09/05/07 20:56:59 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:56:59 INFO mapred.FileInputFormat: Total input paths to
process : 1
09/05/07 20:57:34 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (2/4) TempHfs["SequenceFile[['day', 'urlid', 'count']]"]
[organicCount/44134/]
09/05/07 20:57:34 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (3/4) TempHfs["SequenceFile[['paid_day', 'paid_urlid',
'paid_count']]"][paidCount/41830/]
09/05/07 20:57:34 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:57:34 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:57:34 INFO mapred.FileInputFormat: Total input paths to
process : 200
09/05/07 20:57:34 INFO mapred.FileInputFormat: Total input paths to
process : 200
09/05/07 20:59:00 INFO flow.FlowStep: [organicCount*paidCount]
starting step: (4/4) cascading.hbase.HBaseTap@62ca6f60
09/05/07 20:59:00 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 20:59:00 INFO mapred.FileInputFormat: Total input paths to
process : 79
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount]
completion events count: 10
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000245_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000005_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000006_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000012_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000015_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000000_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000009_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000010_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000016_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.FlowStep: [organicCount*paidCount] event =
Task Id : attempt_200905020157_0109_m_000008_0, Status : SUCCEEDED
09/05/07 21:00:35 WARN flow.Flow: stopping jobs
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (4/4) cascading.hbase.HBaseTap@62ca6f60
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (3/4) TempHfs["SequenceFile[['paid_day', 'paid_urlid',
'paid_count']]"][paidCount/41830/]
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (2/4) TempHfs["SequenceFile[['day', 'urlid', 'count']]"]
[organicCount/44134/]
09/05/07 21:00:35 INFO flow.FlowStep: [organicCount*paidCount]
stopping: (1/4) TempHfs["SequenceFile[['day', 'urlid', 'method']]"]
[import/26762/]
09/05/07 21:00:35 WARN flow.Flow: stopped jobs
09/05/07 21:00:35 WARN flow.Flow: shutting down job executor
09/05/07 21:00:35 WARN flow.Flow: shutdown complete
Exception in thread "flow organicCount*paidCount"
java.lang.IllegalArgumentException: java.net.URISyntaxException:
Relative path in absolute URI: hbase:url_summary
at org.apache.hadoop.fs.Path.initialize(Path.java:140)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at cascading.hbase.HBaseTap.getPath(Unknown Source)
at cascading.tap.hadoop.Hadoop18TapUtil.cleanupTap
(Hadoop18TapUtil.java:166)
at cascading.flow.FlowStep.cleanTap(FlowStep.java:363)
at cascading.flow.FlowStep.clean(FlowStep.java:348)
at cascading.flow.Flow.cleanTemporaryFiles(Flow.java:1007)
at cascading.flow.Flow.run(Flow.java:839)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.URISyntaxException: Relative path in absolute URI:
hbase:url_summary
at java.net.URI.checkPath(URI.java:1787)
at java.net.URI.<init>(URI.java:735)
at org.apache.hadoop.fs.Path.initialize(Path.java:137)
... 8 more
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount] flow
failed: organicCount*paidCount
cascading.flow.FlowException: step failed: (4/4)
cascading.hbase.HBaseTap@62ca6f60
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:478)
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:409)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
stopping flows
09/05/07 21:00:35 INFO cascade.Cascade: [organicCount*paidCount]
stopping flow: organicCount*paidCount
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
stopped flows
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
shutting down flow executor
09/05/07 21:00:35 WARN cascade.Cascade: [organicCount*paidCount]
shutdown complete
Exception in thread "main" cascading.cascade.CascadeException: flow
failed: organicCount*paidCount
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:428)
at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:369)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: cascading.flow.FlowException: step failed: (4/4)
cascading.hbase.HBaseTap@62ca6f60
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:478)
at cascading.flow.FlowStep$FlowStepJob.call(FlowStep.java:409)
... 5 more