Call for contributions: what do you want in the guides?

Thibaut Barrère

Apr 1, 2012, 11:34:13 AM

to activewareh...@googlegroups.com

Hi folks,

I need your help to figure out a rough plan of what we're going to put in the new guides (à la vagrant: http://vagrantup.com/docs/index.html).

Here are a few raw ideas, in no particular order - can you brain dump what you have in mind?

- getting started: how to install/setup

- a hello world: transforming a sample csv into another csv

- a more advanced hello world: loading data into activerecord with migrations, upsert etc on 2 tables maybe

- the control file lifecycle explained (what happens when a file is loaded, declarations of sources, transforms, before_read/after_write, screens, error handlers...)

- the engine lifecycle explained (how rows are processed under the hood once the control file is loaded)

- activewarehouse-etl for non-rubyists (i.e. explanations about ruby, bundler, gems etc)

- extending activewarehouse-etl: how to create custom sources, destinations, transforms...

- a chapter on real-life use-cases

I would really stick to something like Vagrant (see https://github.com/mitchellh/vagrant/tree/docs for the code).

We could also include:

- code samples

- contributed transforms etc in a separate repo

Voilà - what do you think?

Any idea is most welcome - time to create real guides already.

Thibaut

--

http://www.logeek.fr

Giovanni Messina

Apr 1, 2012, 12:11:30 PM

to activewareh...@googlegroups.com

Hi all!

I think that, for structure and readability, the guides could use a breakdown into the three key concepts: Extraction, Transformation and Loading, plus the two intermediate steps: E => T and T => L.

That would make it clearer what each component does and when.

As the level of detail increases, the "magic" decreases.

In particular, this would help us add documentation for the critical steps.

my 2 cents

Gio

Chris Roberts

Apr 1, 2012, 12:26:22 PM

to activewareh...@googlegroups.com

A general requirement: for each feature (or each type of that feature), include a commented code sample with source, transformations and destination.
Managing and scheduling jobs: how is this done now? With a gem like delayed_job?

--
CR

Chris Roberts

Apr 1, 2012, 12:42:10 PM

to activewareh...@googlegroups.com

When contributing a specific piece of documentation, what are the steps?

I found this page: https://github.com/activewarehouse/activewarehouse-etl/wiki/_access

It looks like the wiki is its own repository, but I can't see how to create my own fork. Could someone explain the contribution steps?

Chris

--
CR

Chris Roberts

Apr 1, 2012, 2:09:43 PM

to activewareh...@googlegroups.com

Another documentation request:

*An example of the Decode Transformation table - Column Name requirements, number of columns?

--
CR

Chris Roberts

Apr 1, 2012, 2:13:28 PM

to activewareh...@googlegroups.com

And another:

Multiple input sources, but with different input schemas (this might be the built-in resolver?).

--
CR

Thibaut Barrère

Apr 1, 2012, 4:52:07 PM

to activewareh...@googlegroups.com

Hi folks,

quick reply before going to bed: first thanks for your replies :)

Giovanni: my first idea was to split the docs in two parts: the guides, for which I proposed a plan above (high level, concepts - this would describe E, T and L at a high level too, but without going into too much detail), and the API doc aka RDoc (which would give an up-to-date list of all the available sources, destinations and transforms, with all their parameters etc).

=> do you mean, for instance, that the E, T and L sections of the guides could link to the detailed API documentation? That's a great idea, so the user could dive in progressively as needed, straight from the high level doc... Let me know what you think!

Chris, hopefully addressing all your points:

> A general requirement: for each feature or type of that feature include a commented code sample with source, transformations and destination.

Very good idea; I think this would probably belong to the API/RDoc part which is usually written as comments in the code directly. We should make sure that each actual piece of code has an example of use.

> Managing and scheduling jobs: how is this done now? With a gem like delayed_job?

There is no standardized way to manage and launch jobs in aw-etl. Some people use cron, some use other specific schedulers, some on Windows use system tasks etc. It's currently up to the developer to do that.

=> do you think having a specific guide on this would be interesting? I.e. how to manage and schedule jobs, how to provide some web reporting, emails on exit, common patterns etc?

> When contributing a specific piece of documentation, what are the steps?

> I found this page: https://github.com/activewarehouse/activewarehouse-etl/wiki/_access

> It looks like the wiki is its own repository, but I can't see how to create my own fork. Could someone explain the contribution steps?

The docs are currently in very bad shape IMO.

You have two things:

- the wiki, which I'd like to replace by the "guides" mentioned above which will directly be used to generate the website

- the rdoc/api doc, generated from comments in the code (see eg here)

The thing is that the wiki is largely out of date and not very structured, and the RDoc quality is not very good.

Currently, the best documentation most of the time is the code itself: this is what I want to change.

So the current doc contribution guidelines are:

- direct editing of the wiki

- changes in the RDoc and pull-requests

But I'd rather tell you to wait until:

- we have a proper plan and a doc structure (I'll work on this this week), after that you'll just have to fork and pull-request on the guides section ("docs" branch to be created)

- I have a better idea of how the current RDoc is generated etc

Hopefully this week we will have more structure where we can all contribute - in the meantime please write down your issues somewhere on your side, or edit the wiki if you run into small issues.

> *An example of the Decode Transformation table - Column Name requirements, number of columns?

I think it's:

key0:value0

key1:value1

etc

see https://github.com/activewarehouse/activewarehouse-etl/blob/master/lib/etl/transform/decode_transform.rb to be sure, until we can provide a better doc for that!
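Until that doc exists, the format described above can be sketched in plain Ruby. This is only an illustration, assuming one `key:value` pair per line as guessed above; `parse_decode_table` and `decode` are hypothetical helpers, not part of activewarehouse-etl, and the real transform may handle delimiters, comments or defaults differently.

```ruby
# Hypothetical helper (not the library's API): parse "key:value" lines into a Hash.
def parse_decode_table(text, delimiter = ':')
  table = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    # Split on the first delimiter only, so values may contain the delimiter.
    key, value = line.split(delimiter, 2)
    table[key] = value
  end
  table
end

# Look up a raw value, falling back to a default when the key is unknown.
def decode(table, raw, default = nil)
  table.fetch(raw, default)
end

table = parse_decode_table("M:Male\nF:Female\n")
puts decode(table, "F")
puts decode(table, "X", "Unknown")
```

So under this assumption the table has exactly two columns and no header row: the first column is the raw value found in the data, the second is what it gets replaced with.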

> Multiple input sources, but with different input schemas (this might be the built-in resolver?).

There is no doc for such a thing IMO but it works fairly well (as is often the case, actually :-); you can do it at least two ways:

- first, declare all the sources in the same control file then use the Engine.current_source property to apply processing conditionally

    after_read do |row|
      # here you can access Engine.current_source and
      # apply different processing based on it
      row
    end

- second, use several control files, the first ones to transform the data into something homogeneous with same fields and dump this into csv files (one for each kind of format), then another control file to use those (now similarly formatted) files as sources and apply a common processing.
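Both approaches boil down to mapping each input schema onto a common set of fields. Here is a plain Ruby sketch of that idea, independent of activewarehouse-etl: the source names, the field names and the `normalize` helper are all made up for illustration. Inside a control file, the source would come from Engine.current_source as described above.

```ruby
# Hypothetical per-source mappings: source name => { input field => common field }.
FIELD_MAPS = {
  crm:     { "full_name" => :name, "mail" => :email },
  billing: { "customer"  => :name, "email_address" => :email }
}.freeze

# Rename the fields of a row according to the mapping of the source it came from.
def normalize(row, source_name)
  mapping = FIELD_MAPS.fetch(source_name)
  mapping.each_with_object({}) do |(from, to), out|
    out[to] = row[from]
  end
end

crm_row     = { "full_name" => "Ada", "mail" => "ada@example.com" }
billing_row = { "customer" => "Bob", "email_address" => "bob@example.com" }

p normalize(crm_row, :crm)
p normalize(billing_row, :billing)
```

Once every row has the same fields, the rest of the processing no longer needs to know which source it came from.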

This wasn't a quick reply after all.

Let me know if you have more suggestions!

Thibaut

--

http://www.logeek.fr
