Gobblin MySql implementation


Divya Krishnan

Oct 12, 2015, 6:37:09 AM
to gobblin-users
Hi all,

I'm trying to implement data ingestion from MySQL. I changed the job configuration files accordingly, but it's still not working. Can someone explain the flow and the changes I have to make in the configuration?

Thanks!

Chavdar Botev

Oct 12, 2015, 12:48:11 PM
to Divya Krishnan, gobblin-users
Hi Divya,

Can you please provide more details about your changes, what is not
working and any error messages that you see?
> --
> You received this message because you are subscribed to the Google Groups
> "gobblin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to gobblin-user...@googlegroups.com.
> To post to this group, send email to gobbli...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/gobblin-users/09ed208b-ad58-4c66-b4f4-7a73ff463c9a%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Divya Krishnan

Oct 12, 2015, 11:59:32 PM
to gobblin-users


Hi,

nohup.out shows the following, so I couldn't see the actual error messages. Kindly help.

Java HotSpot(TM) Client VM warning: Cannot open file /gobblin-gc.log due to Permission denied

log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /gobblin-current.log (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:133)
    at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
    at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
    at org.apache.log4j.rolling.RollingFileAppender.activateOptions(RollingFileAppender.java:180)
    at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
    at org.apache.log4j.xml.DOMConfigurator.parseAppender(DOMConfigurator.java:295)
    at org.apache.log4j.xml.DOMConfigurator.findAppenderByName(DOMConfigurator.java:176)
    at org.apache.log4j.xml.DOMConfigurator.findAppenderByReference(DOMConfigurator.java:191)
    at org.apache.log4j.xml.DOMConfigurator.parseChildrenOfLoggerElement(DOMConfigurator.java:523)
    at org.apache.log4j.xml.DOMConfigurator.parseRoot(DOMConfigurator.java:492)
    at org.apache.log4j.xml.DOMConfigurator.parse(DOMConfigurator.java:1006)
    at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:872)
    at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:778)
    at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
    at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:284)
    at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:304)
    at gobblin.scheduler.SchedulerDaemon.<clinit>(SchedulerDaemon.java:51)



Sahil Takiar

Oct 13, 2015, 12:34:06 AM
to Divya Krishnan, gobblin-users
Hey Divya,

Are you using the gobblin-standalone.sh script? If so, be sure to set the "--logdir" option to a directory you have write access to (for example, "sh gobblin-standalone.sh --logdir myGobblinDir ...").

Currently, it looks like this property is set to your root directory, so Gobblin is trying to write its logs to the file /gobblin-current.log. Typically you need sudo permissions to write to the root directory, which is why you see the "Permission denied" exception.
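
As a rough sketch of the check described above, the snippet below makes sure a log directory exists and is writable before Gobblin is started. The function name ensure_log_dir and the LOG_DIR default are hypothetical, chosen only for illustration; only the "--logdir" option comes from the launcher itself.

```shell
# Create the log directory if needed and report whether it is writable.
# (ensure_log_dir is a hypothetical helper, not part of Gobblin.)
ensure_log_dir() {
  dir="$1"
  mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]
}

# LOG_DIR is an assumed location; any directory you own will do.
LOG_DIR="${LOG_DIR:-${TMPDIR:-/tmp}/gobblin-logs}"

if ensure_log_dir "$LOG_DIR"; then
  echo "log directory is writable: $LOG_DIR"
  # Next step (not run here): sh gobblin-standalone.sh --logdir "$LOG_DIR" ...
else
  echo "cannot write to $LOG_DIR; choose another --logdir" >&2
fi
```

If the check fails, pick a directory under your home directory rather than running the launcher with elevated privileges.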

Let us know if that works!

Thanks

--Sahil


Divya Krishnan

Oct 13, 2015, 1:51:12 AM
to gobblin-users
Hey Sahil,

Thanks a lot! That issue got resolved. I'm using http://github.com/linkedin/gobblin. Since this is my first attempt, I couldn't find out what is missing. I'm getting some errors.

mysql.properties

# Source properties - source class to extract data from Mysql Source
source.class=gobblin.source.extractor.extract.jdbc.MysqlSource

# Source properties
source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=true
source.querybased.watermark.type=timestamp

# Source connection properties
source.conn.driver=com.mysql.jdbc.Driver
source.conn.username=root
source.conn.password=701569
source.conn.host=localhost
source.conn.port=3306
source.conn.timeout=10

# Converter properties - Record from mysql source will be processed by the below series of converters
converter.classes=gobblin.converter.avro.JsonIntermediateToAvroConverter

# date columns format
converter.avro.timestamp.format=yyyy-MM-dd HH:mm:ss'.0'
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss

# Qualitychecker properties
qualitychecker.task.policies=gobblin.policies.count.RowCountPolicy,gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL

# Publisher properties
data.publisher.type=gobblin.publisher.BaseDataPublisher

table_full.pull

# Job properties
job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql

# Extract properties
extract.namespace=gobblin.source.extractor.extract.jdbc.MysqlExtractor
extract.table.type=snapshot_append
extract.delta.fields=<delta fields>
extract.primary.key.fields=<primary key columns>

# Property to consider the extract as full dump
extract.is.full=true

# Source properties
source.querybased.schema=students
source.entity=Students
source.querybased.extract.type=snapshot


Errors:

org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class 'com.mysql.jdbc.Driver'


2015-10-13 11:13:06 IST ERROR [TaskExecutor-0] gobblin.source.extractor.extract.jdbc.MysqlSource [students_Students_1444714984612_0] 42 - Failed to prepare extractor: error - Failed to get schema for this object; error - Failed to get metadata using JDBC; error - Failed to get schema from Mysql; error - null

2015-10-13 11:13:06 IST ERROR [TaskExecutor-0] gobblin.source.extractor.extract.jdbc.JdbcExtractor [students_Students_1444714984612_0] 694 - Failed to execute sql:select  col.column_name,  col.data_type,  case when CHARACTER_OCTET_LENGTH is null then 0 else 0 end as length,  case when NUMERIC_PRECISION is null then 0 else NUMERIC_PRECISION end as precesion,  case when NUMERIC_SCALE is null then 0 else NUMERIC_SCALE end as scale,  case when is_nullable='NO' then 'false' else 'true' end as nullable,  '' as format,  case when col.column_comment is null then '' else col.column_comment end as comment  from information_schema.COLUMNS col  WHERE upper(col.table_name)=upper(?) AND upper(col.table_schema)=upper(?)  order by col.ORDINAL_POSITION




Sahil Takiar

Oct 13, 2015, 2:46:56 PM
to Divya Krishnan, gobblin-users
Hey Divya,

It seems that a MySQL driver is not automatically added to the Gobblin package when you build the project. Can you download a release of mysql-connector-java (http://mvnrepository.com/artifact/mysql/mysql-connector-java) and add the jar to the gobblin-dist/lib/ folder?
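
A minimal sketch for confirming that the driver jar is in place is shown here. It assumes the jar keeps the usual Maven file name pattern (mysql-connector-java-<version>.jar) and that GOBBLIN_LIB points at your gobblin-dist/lib/ folder; both the variable and the helper name has_mysql_connector are hypothetical.

```shell
# Succeeds if a MySQL connector jar is present in the given lib directory.
# (has_mysql_connector is a hypothetical helper, not part of Gobblin.)
has_mysql_connector() {
  lib_dir="$1"
  ls "$lib_dir"/mysql-connector-java-*.jar >/dev/null 2>&1
}

# GOBBLIN_LIB is an assumed path; adjust it to where gobblin-dist was unpacked.
GOBBLIN_LIB="${GOBBLIN_LIB:-gobblin-dist/lib}"

if has_mysql_connector "$GOBBLIN_LIB"; then
  echo "MySQL connector found in $GOBBLIN_LIB"
else
  echo "no MySQL connector in $GOBBLIN_LIB; copy the downloaded jar there"
fi
```

After copying the jar, restart Gobblin so the new classpath entry is picked up.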

--Sahil


Divya Krishnan

Oct 14, 2015, 1:13:39 AM
to gobblin-users


Hey Sahil,

Thank you! The error got resolved after adding the jar. It's now fetching the number of records in the DB. But now there is one more error.

2015-10-14 10:31:45 IST ERROR [TaskExecutor-0] gobblin.runtime.Task [students_samp_1444798897862_0] 249 - Task task_GobblinMySql_1444798897862_0 failed
org.apache.avro.SchemaParseException: Empty name
    at org.apache.avro.Schema.validateName(Schema.java:1076)
    at org.apache.avro.Schema.access$200(Schema.java:79)
    at org.apache.avro.Schema$Name.<init>(Schema.java:436)
    at org.apache.avro.Schema.createRecord(Schema.java:145)
    at gobblin.converter.avro.JsonIntermediateToAvroConverter.convertSchema(JsonIntermediateToAvroConverter.java:91)
    at gobblin.converter.avro.JsonIntermediateToAvroConverter.convertSchema(JsonIntermediateToAvroConverter.java:52)
    at gobblin.instrumented.converter.InstrumentedConverterDecorator.convertSchema(InstrumentedConverterDecorator.java:74)

Sahil Takiar

Oct 14, 2015, 2:05:07 AM
to Divya Krishnan, gobblin-users
Hey Divya,

Can you try setting the config "extract.table.name" in your .job file? You can set it to something simple like extract.table.name=testGobblinJob

--Sahil


Divya Krishnan

Oct 14, 2015, 2:30:38 AM
to gobblin-users


Hey Sahil,

I tried changing it, but it's still showing the same error.

Sahil Takiar

Oct 14, 2015, 2:07:20 PM
to Divya Krishnan, gobblin-users
Hey Divya,

Can you send me all your configuration files, along with their locations, and the command you are using to launch Gobblin? I want to see if I can replicate the exception.

Thanks

--Sahil


Divya Krishnan

Oct 15, 2015, 3:43:47 AM
to gobblin-users

Hey Sahil,

I have attached the configuration files.

I used bin/gobblin-standalone.sh start --workdir /home/divya/gobblinwork/ --conf /home/divya/gobblinjob/ --logdir /home/divya/gobblinlog/, where the gobblinjob directory holds mysql.properties and table.pull.

The source and converter classes are here: https://github.com/linkedin/gobblin/tree/master/gobblin-core/src/main (I have a local copy of this codebase).

Thanks,
Divya


mysql.properties
table_full.pull

Divya Krishnan

Oct 16, 2015, 12:30:00 AM
to gobblin-users


Hey Sahil,

I resolved the issue myself. Thanks for your help!

Sahil Takiar

Oct 16, 2015, 1:46:33 PM
to Divya Krishnan, gobblin-users
Hey Divya,

Glad the issue got resolved! Just curious, what was the problem?

--Sahil


Divya Krishnan

Oct 18, 2015, 2:43:53 AM
to gobblin-users
Hey Sahil,

It was a very silly mistake: I had left out the extract namespace. Now it's working. How do I make it write to HDFS? Currently it's writing to the local FS.

Thanks!
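
For anyone hitting a similar omission, a rough sketch of a pre-flight check follows. It prints any of the listed keys that have no "key=" line in a job file. The helper name missing_keys is hypothetical, the keys are the two discussed in this thread, and the check only understands the "key=value" form, not other separators that .properties files allow.

```shell
# Print every listed property key that has no "key=" assignment in the file.
# (missing_keys is a hypothetical helper, not part of Gobblin.)
missing_keys() {
  file="$1"
  shift
  for key in "$@"; do
    # Escape dots so a key like extract.namespace is matched literally.
    pattern="$(printf '%s' "$key" | sed 's/\./\\./g')"
    grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file" || echo "$key"
  done
}

# Example call (file name taken from the configuration posted earlier):
#   missing_keys table_full.pull extract.namespace extract.table.name
```

An empty result means every listed key is assigned somewhere in the file; it does not check that the values are sensible.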

Sahil Takiar

Oct 18, 2015, 9:32:20 PM
to Divya Krishnan, gobblin-users
Hey Divya,

In your config files you should see a property named "fs.uri". Simply change that property to the NameNode URI of the HDFS file system you want to write to (for example, hdfs://localhost:8020).
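
To see which file system a given config file currently points at, a small sketch like this one can help. The helper names get_property and uri_scheme are hypothetical, and the host and port depend on your own NameNode; 8020 is only the example value used above.

```shell
# Print the value of the last "key=value" line for the given key.
# (get_property and uri_scheme are hypothetical helpers, not part of Gobblin.)
get_property() {
  awk -v k="$2" '
    index($0, k "=") == 1 { v = substr($0, length(k) + 2); found = 1 }
    END { if (found) print v }
  ' "$1"
}

# Print the scheme of a URI such as hdfs://localhost:8020, or "none".
uri_scheme() {
  case "$1" in
    *://*) echo "${1%%://*}" ;;
    *) echo "none" ;;
  esac
}

# Example call (the path is a placeholder for your own config file):
#   uri="$(get_property /path/to/your.properties fs.uri)"
#   [ "$(uri_scheme "$uri")" = "hdfs" ] || echo "fs.uri is not an HDFS URI: $uri"
```

The match is anchored at the start of the line, so a longer key that merely ends in "fs.uri" is not mistaken for "fs.uri" itself.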

--Sahil


Divya Krishnan

Oct 19, 2015, 12:28:29 AM
to gobblin-users
Hey Sahil,

Do you mean I have to include fs.uri=hdfs://localhost:9000 in my job configuration? Should I make any changes to the configuration keys?

Divya Krishnan

Oct 19, 2015, 3:08:02 AM
to gobblin-users


Hey Sahil,

I tried it and got the following error. It created the workdir (gobblinworkpath) in HDFS but failed to create the output.

2015-10-19 12:22:53 IST ERROR [ForkExecutor-0] gobblin.runtime.Fork  167 - Fork 0 of task task_GobblinDemo_1445237570275_0 failed to process data records
java.io.IOException: Mkdirs failed to create /gobblinworkpath/task-staging/gobblin/example/simplejson/ExampleTable/20151019065253_append
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)