gobblin scalability without hadoop

Lian Jiang

Jun 14, 2017, 6:39:07 PM

to gobblin-users

Hi,

I am new to Gobblin and have some questions after reading the documentation.

1. Gobblin has standalone and Hadoop deployment modes. In my case, I cannot set up a Hadoop cluster because it is too expensive. However, I do need more scalability than standalone mode offers. Is there a middle ground between standalone and a Hadoop cluster?

2. Many source sections are empty (e.g. https://gobblin.readthedocs.io/en/latest/sources/OracleSource/). Is there any timeline for supporting more sources?

3. Gobblin currently writes to HDFS. However, we may need to write to a blob store. Is there any documentation on extending Gobblin with plugins for additional sources and sinks?

Thanks a lot for your help!

Shirshanka Das

Jun 15, 2017, 1:18:12 PM

to Lian Jiang, gobblin-users

Hi Lian,

1. Gobblin supports a standalone cluster concept as well as integration with AWS (ASG) to support this use-case.

2. We actually support all the sources and sinks listed in the documentation tree and are lagging behind in updating the documentation for the sources. Which specific Source are you interested in?

3. Which blob storage are you interested in writing to? We support S3 and Ambry right now and have extensible classes to support additional blob stores.

Shirshanka

Lian Jiang

Jun 15, 2017, 1:39:03 PM

to gobblin-users, jiang...@gmail.com

Hi Shirshanka,

Thanks for clarifying.

I am confused about "standalone cluster". The statement below says that standalone mode runs in a single JVM:
"In the standalone mode, a Gobblin instance runs in a single JVM and tasks run in a thread pool, the size of which is configurable."

Is there any documentation on a "standalone cluster mode" in which Gobblin is deployed on multiple machines (so that it can scale horizontally) without relying on Hadoop?

I am interested in the source for Oracle Database and the sink for Oracle Object Storage.

Thanks
Lian