Defining a job for a daily pull


lbe...@gmail.com

Apr 13, 2015, 11:03:02 AM
to gobbli...@googlegroups.com

Hi,

Let's assume that a full dump of a table was done using the following settings:
...
extract.is.full=true
#can we use snapshot_append here as defined in one of the examples?
extract.table.type=snapshot_only 
source.querybased.extract.type=snapshot
...

Now, if I want a daily load, how should that job be defined?
According to the documentation, there are two options:

1. append:
...
source.querybased.extract.type=append_daily
extract.table.type=append_only
source.querybased.append.max.watermark.limit=CURRENTDATE-1
...


2. incremental:
...
extract.table.type=snapshot_append
source.querybased.extract.type=snapshot
source.querybased.low.watermark.backup.secs=86400
...


What is the difference between the two, and when should each be used?

On a fresh project checkout (built with the Hadoop2 profile) I wasn't able to run either of them because of the following exception:
ERROR [AbstractJobLauncher] Failed to get work units for job job_mysql_import_mytable_1428935941024
java.lang.IllegalArgumentException: Invalid format: "0" is too short
    at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:673)
    at gobblin.source.extractor.utils.Utils.toDateTime(Utils.java:287)
    at gobblin.source.extractor.utils.Utils.toDateTime(Utils.java:299)
    at gobblin.source.extractor.partition.Partitioner.getAppendLowWatermark(Partitioner.java:183)
    at gobblin.source.extractor.partition.Partitioner.getLowWatermark(Partitioner.java:130)
    at gobblin.source.extractor.partition.Partitioner.getPartitions(Partitioner.java:75)
    at gobblin.source.extractor.extract.QueryBasedSource.getWorkunits(QueryBasedSource.java:67)
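
Reading the top frames, it looks as if the string "0" reached a date-time parser that expected a full timestamp while the append low watermark was being computed. Below is a minimal sketch of that failure mode, not the Gobblin code itself. It assumes a yyyyMMddHHmmss watermark pattern (a guess, not confirmed anywhere in this thread) and uses java.time instead of Joda-Time so that it runs on a plain JDK. The class, constant, and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class WatermarkParseSketch {
    // Assumed watermark pattern; the real pattern used by Gobblin is not shown in this thread.
    static final DateTimeFormatter WATERMARK_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Returns true if the value parses as a full timestamp in the assumed pattern.
    static boolean isParsableWatermark(String value) {
        try {
            LocalDateTime.parse(value, WATERMARK_FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            // A value such as "0" is shorter than the pattern requires and is rejected here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isParsableWatermark("20150413110302"));
        System.out.println(isParsableWatermark("0"));
    }
}
```

If this reading is right, the thing to check would be where the "0" comes from, for example a default low watermark used when no earlier watermark is available.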

I noticed that the full dump didn't produce a task state file (.tst) under /state-store in HDFS.
Is there anything here I'm not aware of?


Thanks,
Lorand

Yinan Li

Apr 13, 2015, 1:28:15 PM
to gobbli...@googlegroups.com
Lorand,

Regarding the .tst task state file: we stopped persisting task states into .tst files because a job state already includes the task states and is itself persisted, which made the .tst files redundant. You should still see .jst files being written to the state store.

Yinan 