Brainstorming: documentation, getting started more easily etc


Thibaut Barrère

Mar 9, 2012, 4:05:28 AM
to activewareh...@googlegroups.com
Hi folks,

I'd like to pick your brains on this: now that 1.0.0.rc1 has been shipped, my current goal is to ensure we can provide a decent way for people to learn about how to use activewarehouse-etl (other gems will come later once up to date).

I won't ship the final 1.0.0 until we have something fine for new users, because I won't be able to handle the load otherwise!

Questions to newcomers

Alexandre and others: what have been (what are) your pain points when getting started? What is confusing? Any remark will be very useful (please dump your brain :-).

Once you get started it's fairly easy to use (and even extend) IMO, but getting started is not easy; I know some pain points already but please highlight the obvious!

The bare minimum

Apart from the etl-sample, I plan this bare minimum at least:
- having an online and offline up-to-date rdoc (or similar) to be used by the experienced user as a reference
- centralized place (http://www.activewarehouse.info) which will need some SEO and links to rdoc etc
- up to date READMEs on all gems

What I'd like to reach

I'd like a community-edited, up-to-date series of guides, properly advertised on http://www.activewarehouse.info.

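I find the following websites inspiring:
- http://freelancing-god.github.com/ts/en/
- http://guides.rubyonrails.org/
- http://vagrantup.com/docs/index.html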

If you read this, what do you think? Please share any insight you'll have.

-- Thibaut

Stefan Urbanek

Mar 12, 2012, 6:06:42 PM
to ActiveWarehouse Discuss
Hi,

On Mar 9, 10:05 am, Thibaut Barrère <thibaut.barr...@gmail.com> wrote:
> Hi folks,
>
> I'd like to pick your brains on this: now that 1.0.0.rc1 has been shipped,
> my current goal is to ensure we can provide a decent way for people to
> learn about how to use activewarehouse-etl (other gems will come later once
> up to date).
>
> I won't ship the final 1.0.0 until we have something fine for new users,
> because I won't be able to handle the load otherwise!
>
> *Questions to newcomers*
>
> Alexandre and others: what have been (what are) your pain points when
> getting started? What is confusing? Any remark will be very useful (please
> dump your brain :-).
>
> Once you get started it's fairly easy to use (and even extend) IMO, but
> getting started is not easy; I know some pain points already but please
> highlight the obvious!
>
> *The bare minimum*
>
> Apart from the etl-sample<https://github.com/activewarehouse/activewarehouse-etl-sample>,

... I would stay here for a while. A few notes:

The sample is too complex for a beginner. Even for those who know what
ETL is, it has too many files all over the place, and those who do not
know anything about ETL might be really confused. It would be much
better if there were multiple "levels" of samples, from one of up to
five lines to a complex one like this.

Example of whole ETL samples:

Example 1: load file from CSV to a database table (nothing more)
Example 2: Ex 1 + do some field transformation
Example 3: use two sources, for example CSV + table --> table
Example 4: ...

Something like this: http://flask.pocoo.org/

I know that in the case of ETL it can not be done with one file, but
anyway - as simple as possible. I've learned that from my tutorials
for Cubes - I thought they were simple enough and obvious. Yeah, sure,
for me and myself.
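
To make the idea of levels concrete, here is a minimal sketch of what
Examples 1 and 2 boil down to, written in plain Ruby with only the
standard csv library. It deliberately does not use the
activewarehouse-etl control-file syntax: the column names, the sample
rows and the in-memory array standing in for a database table are all
assumptions made up for illustration.

```ruby
require 'csv'

# Hypothetical input: in a real run this would be read from a CSV file.
CSV_TEXT = <<~DATA
  first_name,last_name,email
  Ada,Lovelace,ADA@EXAMPLE.COM
  Alan,Turing,
DATA

# Extract: parse each CSV row into a hash keyed by column name.
def extract(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

# Transform (the Example 2 step): normalize one field and give
# missing values a default.
def transform(row)
  email = row['email'].to_s.strip.downcase
  row.merge('email' => email.empty? ? 'unknown' : email)
end

# Load: append rows to a destination. An Array stands in for a
# database table here (an assumption, to keep the sketch self-contained).
def load_rows(rows, table)
  rows.each { |row| table << row }
  table
end

TABLE = load_rows(extract(CSV_TEXT).map { |row| transform(row) }, [])
TABLE.each { |row| puts row.inspect }
```

Example 1 is the same thing without the transform step; Example 3
would add a second extract feeding the same load.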

You might have some steps in the examples, that you will just say "do
this, you will learn later what it is". If you do, then encapsulate
them in one single step for the user - put them in a separate script.
For example data preparation. You can explain that later, in another
example.

I started writing the Cubes "Hello World! - Aggregation and OLAP
Server" [1] and I am still not satisfied with its complexity for new
users; however, the goal is to give an impression of what the
framework does in general.

[1] https://github.com/Stiivi/cubes/tree/master/examples/hello_world

> I plan this bare minimum at least:
> - having an online and offline up-to-date rdoc (or similar) to be used by
> the experienced user as a reference

Concerning docs, I like the approach of SQLAlchemy (see mostly the
right side, about Core):

http://docs.sqlalchemy.org/en/latest/

It is mostly "how-to" based, with a separate class reference. I do not
need to know all the switches/parameters/methods/whatever, I just want
to play to get the idea and then apply it to my data. The best thing
is commented examples, or a whole simple workflow with comments.

> - centralized place (http://www.activewarehouse.info) which will need some
> SEO and links to rdoc etc

I do not think this should be a priority, as google yields pretty good
results for "ruby etl" ;-)

> - up to date READMEs on all gems
>
> *What I'd like to reach*
>
> I'd like a community-edited, up-to-date series of guides, properly
> advertised on http://www.activewarehouse.info.
Nice, inspiring guides.

> If you read this, what do you think? Please share any insight you'll have.
>

I think aw-etl has plenty of resources available; it just needs a good
"quick start" and 3-4 incremental, primitive examples. Also think of
those who need ETL but do not know what ETL patterns are: "Why should
I use this instead of a bunch of SQL scripts?"

Regards,

Stefan Urbanek
data analyst and data brewmaster

Twitter: @Stiivi
Home: http://stiivi.com
Brewery: http://databrewery.org
Github: https://github.com/Stiivi

Thibaut Barrère

Mar 13, 2012, 3:55:13 PM
to activewareh...@googlegroups.com
Hi Stefan,

very good points - thanks a lot for your appreciated feedback!

The sample is definitely for advanced users, so I created https://github.com/activewarehouse/activewarehouse-etl/issues/76 to track down your suggestion on beginner level samples.

> I know that in the case of ETL it can not be done with one file, but
> anyway - as simple as possible. I've learned that from my tutorials
> for Cubes - I thought they were simple enough and obvious. Yeah, sure,
> for me and myself.

Oh I definitely agree and do realize this :)
 
> http://docs.sqlalchemy.org/en/latest/
>
> It is mostly "how-to" based, with a separate class reference. I do not
> need to know all the switches/parameters/methods/whatever, I just want
> to play to get the idea and then apply it to my data. The best thing
> is commented examples, or a whole simple workflow with comments.

I totally agree: I want to keep the existing "RDoc" (or similar) for the switches/parameters detail, and have more guide-ish stuff to help people discover the ETL.
 
> > - centralized place (http://www.activewarehouse.info) which will need some
> > SEO and links to rdoc etc
>
> I do not think this should be a priority, as google yields pretty good
> results for "ruby etl" ;-)

Yeah actually it's just that the website itself is not properly referenced and I want to ensure it will become the centralized place for newcomers. I just need to work on this a bit, not a big deal.
 
> I think aw-etl has plenty of resources available; it just needs a good
> "quick start" and 3-4 incremental, primitive examples. Also think of
> those who need ETL but do not know what ETL patterns are: "Why should
> I use this instead of a bunch of SQL scripts?"

Very good point, again: I recently came across a lot of people who only realized after a couple of years of practice that they were doing ETL work, and who feel like they are discovering everything from the ground up (like: why not SQL scripts?). They end up finding the Ralph Kimball ETL book (or similar) with plenty of patterns and have this "a-ha" moment :)

So part of the mission here (I think) is to ensure people willing to munge data in a code-centric approach will be able to realize there's this project able to help them (without necessarily knowing what ETL is beforehand).

Thanks a ton for the feedback! Really appreciated and this gives me food for thought.

Thibaut
--

Alexandre Mendes

Mar 17, 2012, 5:57:05 PM
to activewareh...@googlegroups.com
Hi guys, here are some ideas I had about the documentation.

I think some people find activewarehouse-etl and the other gems through Google. It would be cool if the first page of every project sent the user to the website http://www.activewarehouse.info/, which should link to the official documentation, so that people can find out whether it really works for their problems.

The current documentation explains how to install the gem. I think there should also be something about how to install a gem version that has not been released yet: in my case I was using the unreleased 1.0.0rc while the released version was 0.9.5.

Most of my problems were with configuring the environment. I think it would help to add some links about the expected errors that happen with the prerequisites.

As Stefan said before, short examples at several levels make it easier to understand...

Teach how to customize or hack the ETL files. I mean, explain what happens, like: you can put some Ruby or Rails code here, and then you can do whatever you want.

Actually I'm working on my own project, so I don't have much time, but in a while I'm going to send a lot of examples of what you can do with MySQL as source and destination, plus some examples with queries.

Well, I can help with the front end and the website http://www.activewarehouse.info/; if you are trying to do something there, you can count on me.

That's it. I don't remember anything bigger than that, but I think it helps...

A little problem :D

When I went to this link


the "Processing" text shows the wrong version:

activewarehouse-etl (0.9.1) is being processed. You'll be redirected when the pages are built, it shouldn't take much longer.

Alexandre Mendes


-- Thibaut

--
You received this message because you are subscribed to the Google Groups "ActiveWarehouse Discuss" group.
To view this discussion on the web visit https://groups.google.com/d/msg/activewarehouse-discuss/-/AFgUbJPjjZ4J.
To post to this group, send email to activewareh...@googlegroups.com.
To unsubscribe from this group, send email to activewarehouse-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/activewarehouse-discuss?hl=en.

Thibaut Barrère

Mar 18, 2012, 5:40:02 AM
to activewareh...@googlegroups.com
Hi Alexandre!

thanks for your input, very good points indeed! So good that I added a "documentation" label.

All the issues tagged "documentation" will have to be solved before I publish the final 1.0.0; see: https://github.com/activewarehouse/activewarehouse-etl/issues?milestone=3&state=open

> I think some people find activewarehouse-etl and the other gems through Google. It would be cool if the first page of every project sent the user to the website http://www.activewarehouse.info/, which should link to the official documentation, so that people can find out whether it really works for their problems.

I asked for help on rubyforge and will set up a redirect to the new website. Follow this at: https://github.com/activewarehouse/activewarehouse-etl/issues/78 

> The current documentation explains how to install the gem. I think there should also be something about how to install a gem version that has not been released yet: in my case I was using the unreleased 1.0.0rc while the released version was 0.9.5.
> Most of my problems were with configuring the environment. I think it would help to add some links about the expected errors that happen with the prerequisites.

Very good point, especially for people who don't usually use Ruby or don't have an existing Rails app to plug into. Will track this at https://github.com/activewarehouse/activewarehouse-etl/issues/79 (please add any suggestions you may come up with later on!)
 
> As Stefan said before, short examples at several levels make it easier to understand...
> Teach how to customize or hack the ETL files. I mean, explain what happens, like: you can put some Ruby or Rails code here, and then you can do whatever you want.
> Actually I'm working on my own project, so I don't have much time, but in a while I'm going to send a lot of examples of what you can do with MySQL as source and destination, plus some examples with queries.

Sure! Please pour any suggestions at https://github.com/activewarehouse/activewarehouse-etl/issues/76
 
> Well, I can help with the front end and the website http://www.activewarehouse.info/; if you are trying to do something there, you can count on me.

Will definitely ask for help :) So you know, the website code is hosted at:


(contact me before doing something significant because I have plans for better stuff here, we'd better sync first).

> A little problem :D
> When I went to this link
> the "Processing" text shows the wrong version:
> activewarehouse-etl (0.9.1) is being processed. You'll be redirected when the pages are built, it shouldn't take much longer.

Very good point again: I opened this to track it:



(but not at rubydoc).

We need to ensure everything is up to date, maybe a minor issue somewhere.

Thanks for all your feedback, much appreciated!

Thibaut
--

opensourcechris

Mar 30, 2012, 9:52:04 PM
to ActiveWarehouse Discuss
Documentation on installing the gem from the current github master is
becoming essential to me. Could you explain it?

Chris

On Mar 18, 5:40 am, Thibaut Barrère <thibaut.barr...@gmail.com> wrote:
> Hi Alexandre!
>
> thanks for your input, very good points indeed! So good that I added a
> "documentation" label.
>
> All the issues tagged "documentation" will have to be solved before I
> publish the final 1.0.0;
> see: https://github.com/activewarehouse/activewarehouse-etl/issues?milesto...
>
> > I think some people find activewarehouse-etl and the other gems through
> > Google. It would be cool if the first page of every project sent the user
> > to the website http://www.activewarehouse.info/, which should link to the
> > official documentation, so that people can find out whether it really
> > works for their problems.
>
> I asked for help on rubyforge and will set up a redirect to the new
> website. Follow this at: https://github.com/activewarehouse/activewarehouse-etl/issues/78
>
> > The current documentation explains how to install the gem. I think there
> > should also be something about how to install a gem version that has not
> > been released yet: in my case I was using the unreleased 1.0.0rc while
> > the released version was 0.9.5.
> > Most of my problems were with configuring the environment. I think it
> > would help to add some links about the expected errors that happen with
> > the prerequisites.
>
> Very good point, especially for people who don't usually use Ruby or don't
> have an existing Rails app to plug into. Will track this at
> https://github.com/activewarehouse/activewarehouse-etl/issues/79 (please
> add any suggestions you may come up with later on!)
>
> > As Stefan said before, short examples at several levels make it easier to
> > understand...
> > Teach how to customize or hack the ETL files. I mean, explain what happens,
> > like: you can put some Ruby or Rails code here, and then you can do
> > whatever you want.
> > Actually I'm working on my own project, so I don't have much time, but in
> > a while I'm going to send a lot of examples of what you can do with MySQL
> > as source and destination, plus some examples with queries.
>
> Sure! Please pour any suggestions at https://github.com/activewarehouse/activewarehouse-etl/issues/76
>
> > Well, I can help with front end and the website
> > http://www.activewarehouse.info/ if you trying do something can count on

Thibaut Barrère

Mar 31, 2012, 2:51:09 AM
to activewareh...@googlegroups.com
Hi Chris!

(sidenote: did you see my reply to your other question here: https://github.com/activewarehouse/activewarehouse-etl/issues/84 ? let me know if it worked for you or not)

> Documentation on installing the gem from the current github master is
> becoming essential to me. Could you explain it?

I will explain right away, absolutely :)

But let me ask first - which platform are you on? Windows, Mac, Linux? (Windows may require a bit of tweaking but I can still help, as I deploy on all platforms).

1/ the recommended way of using the current github master is to use bundler (http://gembundler.com/) and its git support.

gem 'activewarehouse-etl', :git => "git://github.com/activewarehouse/activewarehouse-etl.git"

You can specify a given commit or a branch to avoid getting unreviewed updates.
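
For instance, a Gemfile along these lines would do it (a sketch, not a tested setup). The :branch and :ref options are standard Bundler git options; the commit SHA shown is a made-up placeholder, and the URL uses https because GitHub no longer serves the unauthenticated git:// protocol.

```ruby
# Gemfile (fragment): use ONE of the two declarations below, not both.
source 'https://rubygems.org'

# Option A: follow a branch (you get whatever is on it at "bundle update" time).
gem 'activewarehouse-etl',
    :git => 'https://github.com/activewarehouse/activewarehouse-etl.git',
    :branch => 'master'

# Option B: pin to one exact commit so nothing changes under you.
# 'abc1234' is a hypothetical SHA; replace it with a commit you have reviewed.
# gem 'activewarehouse-etl',
#     :git => 'https://github.com/activewarehouse/activewarehouse-etl.git',
#     :ref => 'abc1234'
```

After editing the Gemfile, run "bundle install" to fetch the gem from git.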

Be sure to have a look at the activewarehouse-etl-sample which uses bundler, too.

Once you have bundler installed, don't forget to use "bundle exec etl" instead of "etl" to run the command.

2/ alternatively, you can also work without bundler but this is a bit more work

This will require you to check out the source from git then package the gem and install it from there.

Can you try out bundler first and let me know if it doesn't work for you? (I think it's usually easier to do).

Thibaut
--

Thibaut Barrère

Mar 31, 2012, 3:53:40 AM
to activewareh...@googlegroups.com
Hi again,

my previous answer assumed that you already know how to install a "release candidate", but in case you don't:

http://rubygems.org/gems/activewarehouse-etl/versions/1.0.0.rc1

gem install activewarehouse-etl --pre

the differences between master and 1.0.0.rc1 are somewhat small:
- new support for on_error callbacks
- fixes for the mysql streaming mode

This is even simpler to install in this case :)

Thibaut
--

Chris Roberts

Mar 31, 2012, 4:00:53 PM
to activewareh...@googlegroups.com
Thibaut, thank you for your replies; I have not had time to sit down and go through them yet.

My initial thought on removing .ctl altogether from the filename is no: I like the distinction between batch and ctl files.

I would also extend the .rb to batch files:

Control file example: control_file.ctl.rb
batch file example: batch_file.ebf.rb


I am using OS X for development and Ubuntu 10.04 for prod deployment. I am going to document my issues and suggestions for documentation once I get the current master set up and running on my dev machine. I'm excited: it will be my first open source contribution, even though it is just some documentation. I suspect that if activewarehouse-etl works for me I will try to contribute to the actual code base, at first probably in the form of some custom transformations I'm working on.

If there are any code-related changes that you want or need some help with, I am looking for some direction on what to work on for activewarehouse-etl.

Chris




--
CR

Thibaut Barrère

Apr 1, 2012, 11:21:02 AM
to activewareh...@googlegroups.com
> Thibaut, thank you for your replies; I have not had time to sit down and go through them yet.

Sure, no worries :) Everyone at their own pace.
 
> My initial thought on removing .ctl altogether from the filename is no: I like the distinction between batch and ctl files.
> I would also extend the .rb to batch files:
> Control file example: control_file.ctl.rb
> batch file example: batch_file.ebf.rb

I've updated issue https://github.com/activewarehouse/activewarehouse-etl/issues/86 with this - sounds like a good idea. I struggled myself a bit with SublimeText at first to get the coloring right, here.

Let me know if that's something you want to tackle, I'll give you more help (ie: how to run tests etc, recommendations on working on a separate feature branch etc).
 
> I am using OS X for development and Ubuntu 10.04 for prod deployment. I am going to document my issues and suggestions for documentation once I get the current master set up and running on my dev machine. I'm excited: it will be my first open source contribution, even though it is just some documentation. I suspect that if activewarehouse-etl works for me I will try to contribute to the actual code base, at first probably in the form of some custom transformations I'm working on.
> If there are any code-related changes that you want or need some help with, I am looking for some direction on what to work on for activewarehouse-etl.

That's great to hear!

Well let me be clear: by far, the need for contributions is on documentation and helping newcomers get started (much more important than custom transforms or fixes).

Here are my 3 immediate suggestions:
- write down each hurdle, small or large, and anything that makes you think twice as you go: it's the most precious thing you can bring to this project currently
- join the conversation on my next email to the list, where I'll suggest a plan for the documentation before we start writing down those guides
- contribute to the docs once we have the plan set right

What do you think?

Thibaut
--

Thibaut Barrère

Apr 3, 2012, 10:43:41 AM
to Chris Roberts, ActiveWarehouse Discuss
Hi Chris,
 
(responding to the list in case it helps someone else, or anybody wants to chime in)

> I've installed the gem to my computer using 'gem install activewarehouse-etl --pre' from your most recent instruction; however, the first instructions seem like they are for Rails.
> When I am using bundler to run etl, do I have to start the Rails server or is being in the Rails app root enough?

You don't need to start the Rails server to run the scripts: if you have a Rails app, you should be able to start the etl command from the root, or from a subfolder if you want to organize things under a subfolder (I do that often, something like "etl" or "db/etl" depending on the projects).

A common pattern is to stick a "common.rb" file there too, with ETL-related code you'd like to share between the various control files (like I did there - not a rails app though).
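
As a rough illustration of that pattern, a "common.rb" could hold helpers shared across control files. This is a plain-Ruby sketch: the module name EtlCommon and both helpers are hypothetical and not part of activewarehouse-etl, and how a control file loads the file (for instance with a require relative to its own directory) is an assumption to check against your own setup.

```ruby
# etl/common.rb (hypothetical shared file, not shipped with the gem)
require 'date'

module EtlCommon
  # Directory holding this file, the control files and their input data.
  ETL_ROOT = File.expand_path(File.dirname(__FILE__))

  # Build an absolute path to a data file that sits next to the control files.
  def self.data_path(name)
    File.join(ETL_ROOT, name)
  end

  # Parse a date such as "03/31/2012"; return nil when the value is
  # blank or malformed instead of raising.
  def self.parse_us_date(value)
    Date.strptime(value.to_s.strip, '%m/%d/%Y')
  rescue ArgumentError
    nil
  end
end
```

Control files could then call EtlCommon.data_path('input.csv') or EtlCommon.parse_us_date(...) instead of each repeating the same code.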

I haven't worked with a Rails-based setup recently though (I create standalone ETL projects instead, but only because there isn't a companion Rails app; otherwise it makes sense to keep the ETL in the same place): you will likely have to require the app environment from "common.rb".

To load the Rails environment you will have to do something along the lines of:

    require File.dirname(__FILE__) + "/../config/application"
    Rails.application.require_environment!

Hope this helps, and let us know if you're stuck - the "getting started" part really needs a guide for all you're going through.

Thibaut
--

Chris Roberts

Apr 3, 2012, 11:42:50 AM
to Thibaut Barrère, ActiveWarehouse Discuss
This is what I have done so far:

  1. 'rails new etltester -d mysql'
  2. Added to the Gemfile: gem 'activewarehouse-etl', :git => "git://github.com/activewarehouse/activewarehouse-etl.git"
    1. Question: when I push my acceptable file-name changes (I think I have them figured out) to my fork of aw-etl, will my Gemfile line be :git => "git://github.com/chrisgogreen/activewarehouse-etl.git"?
  3. created folder 'etl' in rails root resulting in etl directory path of 'etltester/etl'
  4. placed two files: 'megamillions.csv' and 'megamillions.ctl' in directory 'etltester/etl'
  5. added this database configuration to the rails generated database.yml
    etl_execution:
      adapter: mysql2
      encoding: utf8
      reconnect: false
      database: etl_execution
      pool: 5
      username: root
      password: secret
      host: localhost
  6. From the root of the Rails app I run 'bundle exec etl etl/megamillions.ctl'
  7. Here is where I get an access denied error for the root user; I verified that my password in config/database.yml is correct.

I tried this on Windows, so for now I will chalk the step 7 error up to Windows. But I wanted to document my procedure and get your feedback on whether my steps are correct for a Rails installation; this can serve as the working instructions for Rails installation.



Chris

--
CR

Thibaut Barrère

Apr 4, 2012, 4:26:41 PM
to Chris Roberts, ActiveWarehouse Discuss
Hi Chris,
>   1. 'rails new etltester -d mysql'
>   2. Added to the Gemfile: gem 'activewarehouse-etl', :git => "git://github.com/activewarehouse/activewarehouse-etl.git"
>     1. Question: when I push my acceptable file-name changes (I think I have them figured out) to my fork of aw-etl, will my Gemfile line be :git => "git://github.com/chrisgogreen/activewarehouse-etl.git"?
>   3. created folder 'etl' in rails root resulting in etl directory path of 'etltester/etl'
>   4. placed two files: 'megamillions.csv' and 'megamillions.ctl' in directory 'etltester/etl'
>   5. added this database configuration to the rails generated database.yml
>     etl_execution:
>       adapter: mysql2
>       encoding: utf8
>       reconnect: false
>       database: etl_execution
>       pool: 5
>       username: root
>       password: secret
>       host: localhost
>   6. From the root of the Rails app I run 'bundle exec etl etl/megamillions.ctl'
>   7. Here is where I get an access denied error for the root user; I verified that my password in config/database.yml is correct.
>
> I tried this on Windows, so for now I will chalk the step 7 error up to Windows. But I wanted to document my procedure and get your feedback on whether my steps are correct for a Rails installation; this can serve as the working instructions for Rails installation.

Thanks a lot for writing this down.

I think you will have to specify another database section in your database.yml: this one stores job information and must be kept separate (but you do need to have it).
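
A sketch of what that could look like: database.yml keeps etl_execution for the job bookkeeping and gains a second section for the data itself. The section name "datawarehouse" and its database name are assumptions chosen for illustration, and the credentials are placeholders. The Ruby around it only parses the YAML and checks that the two sections point to different databases.

```ruby
require 'yaml'

# Hypothetical config/database.yml content. "etl_execution" comes from the
# steps quoted above; "datawarehouse" is an assumed name for the extra section.
DATABASE_YML = <<~YAML
  etl_execution:
    adapter: mysql2
    database: etl_execution
    username: root
    password: secret
    host: localhost

  datawarehouse:
    adapter: mysql2
    database: etltester_warehouse
    username: root
    password: secret
    host: localhost
YAML

CONFIG = YAML.load(DATABASE_YML)

# Both sections must exist...
missing = %w[etl_execution datawarehouse] - CONFIG.keys
raise "missing sections: #{missing.join(', ')}" unless missing.empty?

# ...and they must not share a database.
databases = CONFIG.values.map { |section| section['database'] }
raise 'sections must use separate databases' unless databases.uniq.size == databases.size

puts "sections: #{CONFIG.keys.join(', ')}"
```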

Let me know later if you're stuck with this, I'll try to help.

Thibaut
--

stellard

Apr 5, 2012, 12:26:22 PM
to ActiveWarehouse Discuss
Hey Thibaut,

Just saw this the other day and thought it might interest you

https://github.com/blog/1081-instantly-beautiful-project-pages

Cheers!
-scott

On Mar 9, 10:05 am, Thibaut Barrère <thibaut.barr...@gmail.com> wrote:
> Hi folks,
>
> I'd like to pick your brains on this: now that 1.0.0.rc1 has been shipped,
> my current goal is to ensure we can provide a decent way for people to
> learn about how to use activewarehouse-etl (other gems will come later once
> up to date).
>
> I won't ship the final 1.0.0 until we have something fine for new users,
> because I won't be able to handle the load otherwise!
>
> *Questions to newcomers*
>
> Alexandre and others: what have been (what are) your pain points when
> getting started? What is confusing? Any remark will be very useful (please
> dump your brain :-).
>
> Once you get started it's fairly easy to use (and even extend) IMO, but
> getting started is not easy; I know some pain points already but please
> highlight the obvious!
>
> *The bare minimum*
>
> Apart from the etl-sample<https://github.com/activewarehouse/activewarehouse-etl-sample>,
> I plan this bare minimum at least:
> - having an online and offline up-to-date rdoc (or similar) to be used by
> the experienced user as a reference
> - centralized place (http://www.activewarehouse.info) which will need some
> SEO and links to rdoc etc
> - up to date READMEs on all gems
>
> *What I'd like to reach*
>
> I'd like a community-edited, up-to-date series of guides, properly
> advertised on http://www.activewarehouse.info.
>
> I find the following websites inspiring:
> - http://freelancing-god.github.com/ts/en/
> - http://guides.rubyonrails.org/
> - http://vagrantup.com/docs/index.html

Thibaut Barrère

Apr 6, 2012, 12:18:23 PM
to activewareh...@googlegroups.com
Hi Scott!


> Just saw this the other day and thought it might interest you
> https://github.com/blog/1081-instantly-beautiful-project-pages

The site is now on Heroku but I may steal a theme there, yes! Thanks!

Thibaut
--


On Thu, Apr 5, 2012 at 6:26 PM, stellard <sc...@kujilabs.com> wrote:
Hey Thibaut,