Hi all,
Unless I missed something, this is a historic first email on the
Dremel mailing list!!!
To start developing Dremel code, follow the steps described on the
following wiki page:
http://code.google.com/p/dremel/wiki/SettingUpTheDevEnv
Please feel free to edit it and enrich it with more information to
make installation easier for the next folks. If you have trouble
setting up the dev environment, feel free to ask for help on this
mailing list.
Next, which tasks are available right now for getting involved with
the code base?
1. CONSOLE: Find a good console framework for argument parsing and
related needs (ask me for advice on this mailing list) and use it to
implement a nice console interface for Dremel. It should support
interactive mode as well as running scripts from files, with rich
argument/option support in the best Unix tradition. Also write a JUnit
test that runs the console program and checks that it works correctly.
A nice beginner task. In the future it will get more complicated, with
managing several running JVMs, etc.
2. PARSER: Start by learning the ANTLR library and its accompanying
book. The ANTLRWorks application is also highly recommended. One task
is to complete the conversion of the AST (abstract syntax tree) to the
semantic model (Query & Expression classes). It is a great task to
begin with. Whoever takes it, ask me for more information on this
mailing list, but in a separate thread (different email subject).
There is also a lot to improve in giving the caller more descriptive
errors when the syntax is incorrect. ANTLR has built-in facilities for
this; one needs to find them, configure and use them properly, and of
course write those friendly error messages. The parser also needs to
resolve column references in queries, etc. So if you like parsing,
this is an excellent task. Leonid, you recently worked with SQL
parsing; do you want to take it?
3. QUERY_ENGINE: Help is most wanted here. The bulk of the core engine
code is currently in a single file, Drec.java (Dremel RECords), and its
test, AvroTest.java. You must first become familiar with the Apache
Avro framework (the documentation on the Avro site and the Dremel code
in AvroTest.java can be used for that), particularly the Avro generic
interface. After that, most of the work is refactoring the functions
process/copy/importFromQuery in the WriterFacade class. That is a
really complicated and intricate piece of logic, for those who miss a
challenge in life.
4. TEST_SUITE: The test suite is essentially a collection of all (or
most) possible test-cases. Each test-case is a set of five files: the
first contains a BQL query, the second the schema of the original
dataset, the third the original dataset itself, the fourth the
expected result schema, and the fifth the expected result dataset. The
test then simply iterates over all the test-cases, runs each query
against its original dataset, and compares the actual results with the
expected results.
5. WEBSERVICE implementation: The goal of this task is to implement a
web service compatible with Google BigQuery and, of course, to
accompany it with proper tests. This task is in no way urgent, but
Metaxa will not be complete without it. As usual, if you like this
task, ask me for more details so I can hand over what is already
done.
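
To make task 4 (TEST_SUITE) a bit more concrete, here is a rough sketch
of how the iteration over test-cases could look. Everything named in it
is an assumption made for illustration only: the QueryRunner interface
is a hypothetical stand-in for the real engine entry point, the five
file names are invented, and results are compared as plain text rather
than as Avro records. Treat it as a starting point, not as the actual
Dremel design.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TestSuiteSketch {

    /** Hypothetical stand-in for the query engine; not the real Dremel API. */
    public interface QueryRunner {
        Result run(String query, String schema, String data) throws Exception;
    }

    /** A result schema plus a result dataset, both held as plain text. */
    public static final class Result {
        public final String schema;
        public final String data;

        public Result(String schema, String data) {
            this.schema = schema;
            this.data = data;
        }
    }

    // Assumed file names: one directory per test-case, five files in each.
    static final String QUERY = "query.bql";
    static final String INPUT_SCHEMA = "input.schema";
    static final String INPUT_DATA = "input.data";
    static final String EXPECTED_SCHEMA = "expected.schema";
    static final String EXPECTED_DATA = "expected.data";

    /** Reads one file of a test-case as trimmed UTF-8 text. */
    static String read(Path dir, String name) throws IOException {
        byte[] bytes = Files.readAllBytes(dir.resolve(name));
        return new String(bytes, StandardCharsets.UTF_8).trim();
    }

    /**
     * Runs every test-case directory under root and returns the sorted
     * names of the test-cases whose actual result differs from the
     * expected one, or which could not be run at all.
     */
    public static List<String> runAll(Path root, QueryRunner runner) throws IOException {
        List<String> failures = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path dir : entries) {
                if (!Files.isDirectory(dir)) {
                    continue; // only directories are treated as test-cases
                }
                String name = dir.getFileName().toString();
                try {
                    Result actual = runner.run(
                            read(dir, QUERY), read(dir, INPUT_SCHEMA), read(dir, INPUT_DATA));
                    boolean same = actual.schema.trim().equals(read(dir, EXPECTED_SCHEMA))
                            && actual.data.trim().equals(read(dir, EXPECTED_DATA));
                    if (!same) {
                        failures.add(name);
                    }
                } catch (Exception e) {
                    // A missing file or an engine error also counts as a failure.
                    failures.add(name);
                }
            }
        }
        Collections.sort(failures);
        return failures;
    }
}
```

A JUnit test could then call TestSuiteSketch.runAll(root, runner) with
a runner that delegates to the real engine and assert that the returned
list is empty. A real comparison would probably need to be smarter than
text equality (for example, ignoring record order where the query does
not define one).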
God bless open source,
Camuel.