Hi folks,
Quick reply before going to bed: first, thanks for your replies :)
Giovanni: my first idea was to split the docs into two parts: the guides, for which I proposed a plan above (high level, concepts; this would describe E, T and L at a high level too, but without getting into too much detail), and the API doc aka RDoc (which would give an up-to-date list of all the sources, destinations and transforms actually available, with all their parameters etc).
=> do you mean, for instance, that the E, T and L sections of the guides could link to the detailed API documentation? That's a great idea: the user could dive in progressively as needed, straight from the high-level doc... Let me know what you think!
Chris, hopefully addressing all your points:
> A general requirement: for each feature or type of that feature include a commented code sample with source, transformations and destination.
Very good idea; I think this would probably belong in the API/RDoc part, which is usually written as comments directly in the code. We should make sure that each actual piece of code has a usage example.
> Managing and scheduling jobs, How is this done now? with a gem like delayed job?
There is no standardized way to manage and launch jobs in aw-etl. Some people use cron, some use other specific schedulers, some on Windows use system tasks etc. Currently it's up to the developer to do that.
=> do you think having a specific guide on this would be interesting? I.e. how to manage and schedule jobs, how to provide some web reporting, emails on exit, common patterns etc?
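To give an idea of what such a guide could cover, here is a minimal sketch of a wrapper script that a scheduler (cron, a Windows task, ...) could call. It is not something aw-etl ships: `run_jobs` and `DEFAULT_RUNNER` are hypothetical names, and it assumes the `etl` executable is on the PATH.

```ruby
# Hypothetical wrapper, not part of aw-etl.
# Assumption: the `etl` executable is on the PATH.
DEFAULT_RUNNER = ->(ctl) { system("etl", ctl) }

# Runs each control file in order and returns the ones that failed,
# so the caller can report on them (email, log, exit code...).
def run_jobs(control_files, runner: DEFAULT_RUNNER)
  # `system` returns true only when the command exits with status 0
  control_files.reject { |ctl| runner.call(ctl) }
end
```

A cron entry would then just call a small script that runs `run_jobs` on the list of control files and exits non-zero (or sends an email) when the returned list is not empty. The `runner:` argument is only there so the logic can be tried without launching real jobs.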
> When contributing a specific piece of documentation, what are the steps?
> It looks like the wiki is it's own repository but I can't see how to create my own fork. Could someone explain the contribution steps?
The docs are currently in very bad shape IMO.
You have two things:
- the wiki, which I'd like to replace with the "guides" mentioned above, which will be used directly to generate the website
- the RDoc (the API doc), which is written as comments in the code
The thing is that the wiki is largely out of date and not very structured, and the RDoc quality is not very good.
Currently, most of the time the best documentation is the code itself: this is what I want to change.
So the current doc contribution guidelines are:
- direct editing of the wiki
- changes in the RDoc and pull-requests
But I'd rather tell you to wait until:
- we have a proper plan and a doc structure (I'll work on this this week); after that you'll just have to fork and send pull requests for the guides section ("docs" branch to be created)
- I have a better idea of how the current RDoc is generated etc
Hopefully this week we will have more structure where we can all contribute. In the meantime, please write down your issues somewhere on your side, or edit the wiki if you run into small issues.
> *An example of the Decode Transformation table - Column Name requirements, number of columns?
I think it's:
key0:value0
key1:value1
etc
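To make that format concrete, here is a small sketch in plain Ruby that reads such `key:value` lines into a Hash. This is only an illustration of the format as I described it above, not the code aw-etl actually uses; `parse_decode_table` is a hypothetical helper.

```ruby
# Hypothetical helper, not aw-etl's own code: parses "key:value" lines
# into a Hash, skipping blank lines.
def parse_decode_table(text)
  text.each_line.with_object({}) do |line, table|
    line = line.strip
    next if line.empty?
    key, value = line.split(":", 2) # split on the first colon only
    table[key] = value
  end
end

table = parse_decode_table("key0:value0\nkey1:value1\n")
table["key1"] # => "value1"
```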
> Multiple input sources but with different input schemas(this might be the built in resolver?).
There is no doc for such a thing IMO, but it works fairly well (as is often the case, actually :-); you can do it in at least two ways:
- first, declare all the sources in the same control file, then use the Engine.current_source property to apply processing conditionally:
    after_read do |row|
      # here you can access Engine.current_source and apply different processing based on it
      row # return the (possibly modified) row
    end
- second, use several control files: the first ones transform the data into something homogeneous with the same fields and dump it into CSV files (one for each kind of format); then another control file uses those (now similarly formatted) files as sources and applies common processing.
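Both options boil down to the same thing: mapping each input format to one common set of fields. Here is a minimal sketch of that idea in plain Ruby, independent of aw-etl. The source names (`:crm`, `:webshop`), the field names and the `normalize` helper are all made up for the example; how you derive the key from Engine.current_source is not shown here.

```ruby
# Hypothetical per-source normalizers; source and field names are made up.
NORMALIZERS = {
  crm:     ->(row) { { name: row[:full_name], email: row[:email_address] } },
  webshop: ->(row) { { name: "#{row[:first]} #{row[:last]}", email: row[:mail] } }
}

# Maps a row to the common schema based on which source it came from.
def normalize(source_name, row)
  normalizer = NORMALIZERS.fetch(source_name) do
    raise ArgumentError, "no normalizer for source #{source_name.inspect}"
  end
  normalizer.call(row)
end

normalize(:webshop, { first: "Ada", last: "Lovelace", mail: "ada@example.com" })
# => { name: "Ada Lovelace", email: "ada@example.com" }
```

With the first option, something like this would be called from the after_read block; with the second, each "first stage" control file would do its own mapping before dumping to CSV.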
This wasn't a quick reply after all.
Let me know if you have more suggestions!