A status update:
We have finished merging iteration 3 from the original breakdown below,
and we have refined the plan for what remains. If you're comparing this
with the original plan
(https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/iterations/gpdb-dev/qlTs4J36Xo4/OhGzrZDhCgAJ),
note that we split the originally planned huge iteration 4 into two: 4 and 5.
We are slightly behind schedule at this point. Iterations #1 and #2 went
quickly, in less time than I estimated, but we squandered that lead in
iteration #3. I had estimated iteration #3 to take two weeks, and it
took six. We had a lot of trouble with the pipeline, and also ran into
some existing bugs. From the point where we were "done" with iteration
#3 and installcheck-world was passing locally and on the pipeline, it
took us over 2 weeks to hunt down and fix the remaining failures. It's
no time to panic yet, but we cannot afford to get bogged down like that
too often.
ITERATION #3.2
--------------
Theme: hash-based DISTINCT and UNION/INTERSECT/EXCEPT
Merge up to: eca1388629 (17 commits)
Duration: 3 days
Notes:
We decided to split off this small group of 17 commits into a separate
iteration, for multiple reasons. First, it's a nice, tight group of
commits that provides a specific piece of functionality. Second, this works as a
practice run for new developers joining the team, because we can go
through the whole cycle very quickly. Third, the Window Functions work
(see below) depends on or conflicts with this, so it's nice to get this
merged sooner.
ITERATION #4
------------
Theme: Free-Space Map and Visibility Map, relation forks
Merge up to: 38e9348282 (582 commits)
Duration: 30 days
Notes:
This iteration includes the introduction of relation forks. The
Free-Space Map is reimplemented using relation forks, and a new
visibility map is introduced.
The relation forks work will conflict heavily with GPDB's filerep
code, because many of the storage APIs are changed to deal with relation
forks. I had hoped that filerep would be gone, replaced with WAL
replication, before reaching this point, but alas. In principle the FSM
and VM are optional, so we will just not replicate them. Performance
after failover might suck without an up-to-date FSM though. We will
leave that as a TODO/FIXME, and the problem will eventually go away once
WAL replication lands. (I'm not sure if we have the same problem even
with filerep today; I don't think we keep the FSM up-to-date in the
mirrors.)
This iteration also includes changes to the planner, for pulling up
subqueries into SEMI and ANTI joins, as well as CTEs (WITH clause). The
QP team already cherry-picked both of these changes earlier, which
should hopefully let this go in smoothly.
WINDOW FUNCTIONS
----------------
PostgreSQL 8.4 added support for Window Functions. GPDB already had an
implementation of window functions, but it was quite different. We will
replace the GPDB implementation with the upstream implementation, to
avoid merge conflicts, now and in the future. This needs to be done
before Iteration #5, because the upstream commit that introduces window
functions comes in that iteration.
90% of the work for this has already been done, and there is a PR open:
https://github.com/greenplum-db/gpdb/pull/3426. It's almost there; we just
need to fix some remaining bugs and make ORCA work with the new
executor implementation. So, only the other 90% of the work remains, as
these things tend to go :-).
ITERATION #5
------------
Theme: Window functions, cleanups
Merge up to: The rest, up to PostgreSQL 8.4.0 (803 commits)
Duration: 30 days
Notes:
This includes Window Functions. Per the previous item, by the time we
reach this point, that should already be taken care of, so it is
expected to cause only trivial merge conflicts.
This is the tail of the 8.4 release cycle, so aside from the window functions,
most of the commits are little tweaks here and there, refactorings and
fixes for things that were done earlier in the cycle. But there are a
lot of them.
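For a rough sense of scale, the estimates above can be tallied with a few
lines of Python. This is only a sketch: the numbers are copied from the
iteration headers, the Window Functions work has no estimate of its own and
is left out, and the total assumes the iterations run back to back with no
slippage, which iteration #3 suggests is optimistic.

```python
# Remaining iterations as listed in this update: (name, commits, estimated days).
# The Window Functions work is not included because no estimate is given for it.
remaining = [
    ("3.2", 17, 3),
    ("4", 582, 30),
    ("5", 803, 30),
]

total_commits = sum(commits for _, commits, _ in remaining)
total_days = sum(days for _, _, days in remaining)

for name, commits, days in remaining:
    # Commits per day implied by the estimate, as a sanity check on the pace.
    print(f"Iteration #{name}: {commits} commits, {days} days, "
          f"{commits / days:.1f} commits/day")

print(f"Total: {total_commits} commits, {total_days} days")
```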
- Heikki