Fwd: [VOTE] Accept Optiq into the incubator

Julian Hyde

May 19, 2014, 7:35:48 PM
to opti...@googlegroups.com, hash...@apache.org, tdun...@apache.org, ga...@apache.org, ste...@apache.org
Great news! 

Apache Incubator PMC has voted to accept Optiq. We are now an Apache incubator project. 

While in the incubator, there are some tasks to achieve (such as moving development to Apache git, moving email discussions to an Apache list, and checking the status of the IP). But just as important, we need to demonstrate that we, as a community, can function as an open meritocracy, just like any Apache project. And, I’d like to start bringing on new committers. More about that soon.

Being accepted into the incubator is a vote of confidence in the unique capabilities of Optiq, and also a recognition of the strength of the developer community. You should all be proud.

During the incubation process, our champion is Ashutosh Chauhan, and our mentors are Ted Dunning, Alan Gates, and Steven Noels. I’d like to welcome them to the Optiq community.

Julian



Begin forwarded message:

From: Ashutosh Chauhan <hash...@apache.org>
Subject: Re: [VOTE] Accept Optiq into the incubator
Date: May 19, 2014 at 4:09:03 PM PDT

With 6 +1s, the vote passes. Thanks, everyone, for taking the time to vote.
The vote thread is now closed. I will proceed with the next steps now.

Thanks,
Ashutosh


On Mon, May 12, 2014 at 12:53 PM, Suresh Srinivas <sur...@hortonworks.com> wrote:

+1 (binding)


On Fri, May 9, 2014 at 11:03 AM, Ashutosh Chauhan <hash...@apache.org> wrote:

Based on the results of the discussion thread (http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E), I would like to call a vote on accepting Optiq into the incubator.

[ ] +1 Accept Optiq into the Incubator
[ ] +0 Indifferent to the acceptance of Optiq
[ ] -1 Do not accept Optiq because ...

The vote will be open until Tuesday May 13 18:00 UTC.

https://wiki.apache.org/incubator/OptiqProposal

= Optiq =
== Abstract ==

Optiq is a framework that allows efficient translation of queries
involving heterogeneous and federated data.

== Proposal ==

Optiq is a highly customizable engine for parsing and planning queries on
data in a wide variety of formats. It allows database-like access, and in
particular a SQL interface and advanced query optimization, for data not
residing in a traditional database.

== Background ==

Databases were traditionally engineered in a monolithic stack, providing a
data storage format, data processing algorithms, query parser, query
planner, built-in functions, metadata repository and connectivity layer.
They innovate in some areas but rarely in all.

Modern data management systems are decomposing that stack into separate
components, separating data, processing engine, metadata, and query
language support. They are highly heterogeneous, with data in multiple
locations and formats, caching and redundant data, different workloads, and
processing occurring in different engines.

Query planning (sometimes called query optimization) has always been a key
function of a DBMS, because it allows the implementors to introduce new
query-processing algorithms, and allows data administrators to re-organize
the data without affecting applications built on that data. In a
componentized system, the query planner integrates the components (data
formats, engines, algorithms) without introducing unnecessary coupling or
performance tradeoffs.

But building a query planner is hard; many systems muddle along without a
planner, and indeed a SQL interface, until the demand from their customers
is overwhelming.

There is an opportunity to make this process more efficient by creating a
re-usable framework.

== Rationale ==

Optiq allows database-like access, and in particular a SQL interface and
advanced query optimization, for data not residing in a traditional
database. It is complementary to many current Hadoop and NoSQL systems,
which have innovative and performant storage and runtime systems but lack a
SQL interface and intelligent query translation.

Optiq is already in use by several projects, including Apache Drill, Apache
Hive and Cascading Lingual, and commercial products.

Optiq's architecture consists of:

* An extensible relational algebra.
* SPIs (service-provider interfaces) for metadata (schemas and tables),
planner rules, statistics, cost-estimates, user-defined functions.
* Built-in sets of rules for logical transformations and common
data-sources.
* Two query planning engines driven by rules, statistics, etc. One engine
is cost-based, the other rule-based.
* Optional SQL parser, validator and translator to relational algebra.
* Optional JDBC driver.
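
As a rough illustration of how a relational algebra and a rule-driven planner fit together, here is a toy sketch in Python. It is not Optiq's API (Optiq is a Java framework); every class and function name below is hypothetical and exists only for this example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


# Toy relational operators (hypothetical; not Optiq's RelNode classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: str


# A planner rule returns a rewritten node, or None when it does not match.
Rule = Callable[[object], Optional[object]]


def merge_filters(node):
    """Rewrite Filter(Filter(x, p), q) into Filter(x, '(p) AND (q)')."""
    if isinstance(node, Filter) and isinstance(node.child, Filter):
        inner = node.child
        combined = f"({inner.predicate}) AND ({node.predicate})"
        return Filter(inner.child, combined)
    return None


def apply_rules(node, rules: List[Rule]):
    """Rule-based planning: rewrite bottom-up until no rule fires."""
    if isinstance(node, Filter):
        node = replace(node, child=apply_rules(node.child, rules))
    for rule in rules:
        rewritten = rule(node)
        if rewritten is not None:
            return apply_rules(rewritten, rules)
    return node


plan = Filter(Filter(Scan("emps"), "deptno = 10"), "sal > 1000")
optimized = apply_rules(plan, [merge_filters])
print(optimized)
```

New operators and new rules can be added without touching `apply_rules`, which is the sense in which such an algebra is extensible.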

== Initial Goals ==

The initial goals are to move the existing codebase to Apache and
integrate with the Apache development process. Once this is accomplished,
we plan for incremental development and releases that follow the Apache
guidelines.

As we move the code into the org.apache namespace, we will restructure
components as necessary to allow clients to use just the components of
Optiq that they need.

A version 1.0 release, including pre-built binaries, will foster wider
adoption.

== Current Status ==

Optiq has had over a dozen minor releases over the last 18 months. Its core
SQL parser and validator, and its planning engine and core rules, are
mature and robust and are the basis for several production systems; but
other components and SPIs are still undergoing rapid evolution.

=== Meritocracy ===

We plan to invest in supporting a meritocracy. We will discuss the
requirements in an open forum. We encourage the companies and projects
using Optiq to discuss their requirements in an open forum and to
participate in development. We will encourage and monitor community
participation so that privileges can be extended to those that
contribute.

Optiq's pluggable architecture encourages developers to contribute
extensions such as adapters for data sources, new planning rules, and
better statistics and cost-estimation functions. We look forward to
fostering a rich ecosystem of extensions.

=== Community ===

Building a data management system requires a high degree of technical
skill, and correspondingly, the community of developers directly using
Optiq is potentially fairly small, albeit highly technical and engaged. But
we also expect engagement from members of the communities of projects that
use Optiq, such as Drill and Hive. And we intend to structure Optiq so that
it can be used for lighter weight applications, such as providing a SQL and
JDBC interface to a NoSQL system.

=== Core Developers ===

The developers on the initial committers list are all experienced open
source developers, and are actively using Optiq in their projects.

* Julian Hyde is lead developer of Mondrian, an open source OLAP engine,
and an Apache Drill committer.
* Chris Wensel is lead developer of Cascading, and of Lingual, the SQL
interface to Cascading built using Optiq.
* Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.

In addition, there are several regular contributors who we hope will
graduate to committers during the incubation process.

We realize that additional employer diversity is needed, and we will work
aggressively to recruit developers from additional companies.

=== Alignment ===

Apache, and in particular the ecosystem surrounding Hadoop, contains
several projects for building data management systems that leverage each
other's capabilities. Optiq is a natural fit for that ecosystem, and will
help foster projects meeting new challenges.

Optiq is already used by Apache Hive and Apache Drill; Optiq embeds Apache
Spark as an optional engine; we are in discussion with Apache Phoenix about
integrating JDBC and query planning.

== Known Risks ==

=== Orphaned Products ===

Optiq is already a key component in three independent projects, each backed
by a different company, so the risk of being orphaned is relatively low. We
plan to mitigate this risk by recruiting additional committers, and
promoting Optiq's adoption as a framework by other projects.

=== Inexperience with Open Source ===

The initial committers are all Apache members, some of whom have several
years in the Apache Hadoop community. The founder of the project, Julian
Hyde, has been a founder and key developer in open source projects for over
ten years.

=== Homogenous Developers ===

The initial committers are employed by a number of companies, including
Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are
committed to recruiting additional committers from outside these companies.

=== Reliance on Salaried Developers ===

Like most open source projects, Optiq receives substantial support from
salaried developers. This is to be expected given that it is a highly
technical framework. However, they are all passionate about the project,
and we are confident that the project will continue even if no salaried
developers contribute to the project. As a framework, the project
encourages the involvement of members of other projects, and of academic
researchers. We are committed to recruiting additional committers including
non-salaried developers.

=== Relationships with Other Apache Products ===

As mentioned in the Alignment section, Optiq is being used by Apache Hive
and Apache Drill, and has adapters for Apache Phoenix and Apache Spark.
Optiq often operates on data in a Hadoop environment, so collaboration with
other Hadoop projects is desirable and highly likely.

Unsurprisingly there is some overlap in capabilities between Optiq and
other Apache projects. Several projects that are databases or database-like
have query-planning capabilities. These include Hive, Drill, Phoenix,
Spark, Apache Derby, Apache Pig, Apache Jena and Apache Tajo. Optiq's query
planner is extensible at run time, and does not have a preferred runtime
engine on which to execute compiled queries. These capabilities, and the
large corpus of pre-built rules, are what allow Optiq to be embedded in
other projects.

Several other Apache projects access third-party data sources, including
Hive, Pig, Drill, Spark and Apache MetaModel. Optiq allows users to
optimize access to third-party data sources by writing rules to push
processing down to the data source, and provide a cost model to choose the
optimal location for processing. That said, maintaining a library of
adapters is burdensome, and so it would make sense to collaborate with
other projects on adapter libraries, and re-use libraries where possible.
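
To make the idea of a cost model choosing the location for processing concrete, here is a minimal sketch in Python. It is an illustration only, not Optiq's cost model: the `Plan` class, the cost weights, and the row counts are all assumptions made up for this example.

```python
from dataclasses import dataclass


# Hypothetical candidate plan: how many rows cross the wire from the data
# source, and how many rows are examined in total.
@dataclass(frozen=True)
class Plan:
    description: str
    rows_transferred: float
    rows_examined: float


def cost(plan, transfer_weight=10.0, cpu_weight=1.0):
    """Assumed cost function: moving a row costs more than examining it."""
    return (transfer_weight * plan.rows_transferred
            + cpu_weight * plan.rows_examined)


def candidate_plans(table_rows, selectivity, source_supports_filter):
    """Enumerate plans; the push-down plan exists only if a rule allows it."""
    plans = [Plan("scan at source, filter locally", table_rows, table_rows)]
    if source_supports_filter:
        matched = table_rows * selectivity
        plans.append(Plan("filter pushed down to source", matched, table_rows))
    return plans


def choose_plan(table_rows, selectivity, source_supports_filter):
    """Cost-based choice: pick the cheapest candidate."""
    plans = candidate_plans(table_rows, selectivity, source_supports_filter)
    return min(plans, key=cost)


best = choose_plan(1_000_000, 0.01, source_supports_filter=True)
print(best.description)
```

With these assumed weights, the pushed-down plan wins whenever the source can evaluate the filter, because far fewer rows are transferred; if the source cannot, the only candidate is to filter locally.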

Optiq supports several front ends for submitting queries. The most popular
is SQL, with driver connectivity via JDBC (and ODBC planned). Other Apache
projects with a SQL parser include Hive, Spark, Phoenix, Derby, Tajo. Drill
uses Optiq's parser and JDBC stack; both Phoenix and Drill have expressed
interest in collaborating on JDBC and ODBC. Optiq's Linq4j API is similar
to the fluent query-builder APIs in Spark and MetaModel. Use of a front end
is not required; for instance, Hive integrates with Optiq by directly
building a graph of RelNode objects.

=== An Excessive Fascination with the Apache Brand ===

Optiq solves a real problem, as evidenced by its take-up by other projects.
This proposal is not for the purpose of generating publicity. Rather, the
primary benefits to joining Apache are those outlined in the Rationale
section.

== Documentation ==

Additional documentation for Optiq may be found on its github site:

* [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
* [[https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
* [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
* [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Reference guide]]

==== Presentation ====

* [[https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|SQL on Big Data using Optiq]]

== Initial Source ==

The initial codebase resides in three projects, all hosted on github:

* https://github.com/julianhyde/optiq
* https://github.com/julianhyde/optiq-csv
* https://github.com/julianhyde/linq4j

=== Source and Intellectual Property Submission Plan ===

The initial codebase is already distributed under the Apache 2.0 License.
The owners of the IP have indicated willingness to sign the SGA.

=== External Dependencies ===

Optiq and Linq4j have the following external dependencies.

* Java 1.6, 1.7 or 1.8
* Apache Maven, Commons
* JavaCC (BSD license)
* Sqlline 1.1.6 (BSD license)
* Junit 4.11 (EPL)
* Janino (BSD license)
* Guava (Apache 2.0 license)
* Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
license)

Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
optiq-splunk) are currently developed alongside core Optiq, and have the
following additional dependencies:

* Open CSV 2.3 (Apache 2.0 license)
* Apache Incubator Spark
* Mongo Java driver (Apache 2.0 license)

Upon acceptance to the incubator, we would begin a thorough analysis of all
transitive dependencies to verify this information and introduce license
checking into the build and release process by integrating with Apache Rat.

=== Cryptography ===

Optiq will eventually support encryption on the wire. This is not one of
the initial goals, and we do not expect Optiq to be a controlled export
item due to the use of encryption.

== Required Resources ==

=== Mailing Lists ===

* pri...@optiq.incubator.apache.org
* d...@optiq.incubator.apache.org (will be migrated from
opti...@googlegroups.com)
* com...@optiq.incubator.apache.org

=== Source control ===

The Optiq team would like to use git for source control, due to our current
use of git/github. We request a writeable git repo
git://git.apache.org/incubator-optiq, and mirroring to be set up to github
through INFRA.

=== Issue Tracking ===

Optiq currently uses the github issue tracking system associated with its
github repo: https://github.com/julianhyde/optiq/issues. We will migrate to
the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ.

== Initial Committers ==

* Julian Hyde (jhyde at apache dot org)
* Jacques Nadeau (jacques at apache dot org)
* James R. Taylor (jamestaylor at apache dot org)
* Chris Wensel (cwensel at apache dot org)

=== Affiliations ===

The initial committers are employees of Concurrent, Hortonworks, MapR and
Salesforce.com.

* Julian Hyde (Hortonworks)
* Jacques Nadeau (MapR Technologies)
* James R. Taylor (Salesforce.com)
* Chris Wensel (Concurrent)

== Sponsors ==

=== Champion ===

* Ashutosh Chauhan (hashutosh at apache dot org)

=== Nominated Mentors ===

* Ted Dunning (tdunning at apache dot org) - Chief Application Architect
at MapR Technologies; committer for Lucene, Mahout and ZooKeeper.
* Alan Gates (gates at apache dot org) - Architect at Hortonworks;
committer for Pig, Hive and others.
* Steven Noels (stevenn at apache dot org) - Chief Technical Officer at
NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.

=== Sponsoring Entity ===

The Apache Incubator.





Steven Noels

May 20, 2014, 2:42:47 AM
to opti...@googlegroups.com
On Mon, May 19, 2014, at 04:35 PM, Julian Hyde wrote:
> Great news!
>
> Apache Incubator PMC has voted to accept Optiq. We are now an Apache incubator project.
 
Good on you, Julian - looking forward to the incubation phase (and a smooth exit)!
 
Steven.

Ashutosh Chauhan

May 21, 2014, 2:54:04 AM
to Julian Hyde, opti...@googlegroups.com, tdun...@apache.org, Alan Gates, ste...@apache.org
Hello everyone,

Congrats on acceptance into the Incubator!

As the next step, I have set up a status page at http://incubator.apache.org/projects/optiq.html. As you can see on that page, there are a few items that need to get done. Also, I have filed an infra ticket for creating a git repo for us at: https://issues.apache.org/jira/browse/INFRA-7776. We need to determine whether we want to import the github repo as it is or some variant of it.

Thanks,
Ashutosh

Julian Hyde

May 21, 2014, 3:22:07 PM
to opti...@googlegroups.com
Moving champion & mentors to Bcc. They can subscribe to optiq-dev (and soon d...@optiq.incubator.apache.org) to stay in the discussion.

On May 20, 2014, at 11:54 PM, Ashutosh Chauhan <ashutosh...@gmail.com> wrote:

> As the next step, I have set up a status page at http://incubator.apache.org/projects/optiq.html. As you can see on that page, there are a few items that need to get done.

Thanks for setting up the page. Yup, lots to do.

When we have jira access, I plan to create a jira case for each task, so that we can assign and track.

There are a lot of regular contributors on this project that I would like to promote to committers in the near future. I am hoping that people will volunteer to take on a share of the housekeeping.

I'll start a discussion with the community on what the tasks should be, and who can do them.

> Also, I have filed an infra ticket for creating a git repo for us at: https://issues.apache.org/jira/browse/INFRA-7776. We need to determine whether we want to import the github repo as it is or some variant of it.

I think we should import the git repository, but just the master branch. (About 2 years of history, 800 commits.) We also need to import the repositories of https://github.com/julianhyde/linq4j and https://github.com/julianhyde/optiq-csv.

See my comments on https://issues.apache.org/jira/browse/INFRA-7776 and let me know whether that works from an IT and legal perspective.

Julian

Julian Hyde

May 21, 2014, 5:17:49 PM
to Ted Dunning, opti...@googlegroups.com

On May 21, 2014, at 2:12 PM, Ted Dunning <ted.d...@gmail.com> wrote:

> I recommend auto-subscribing mentors and champion to all lists. We can always unsubscribe, but it is much easier to stay aware if we are scanning the mailing list. Better that the default is that we stay aware, especially early on.

I agree. I've subscribed mentors to optiq-dev until the apache lists are set up.

Julian

Ashutosh Chauhan

May 24, 2014, 10:30:19 PM
to opti...@googlegroups.com, dev
Another update.

Following lists have been created.


Interested folks may choose to subscribe by sending a blank email to dev-su...@optiq.incubator.apache.org and so on for the other two.

I encourage everyone on this list to move all project-related discussion to d...@optiq.incubator.apache.org.

