PostgreSQL 8.4, breakdown into iterations

Heikki Linnakangas

Aug 28, 2017, 9:04:54 AM
to Greenplum Developers
Hi all,

I did some planning for the PostgreSQL 8.4 merge. Daniel and I talked
about how large a chunk we'll try to merge in each iteration, and based
on that I came up with 4 iterations. The rough schedule looks like this:

ITER 1
#####
    | ITER 2
    +>#########
              | ITER 3
              +>##########
                         | ITER 4
                         +>#########################################
                           ^
Window functions           |
#####################------+
                           ^
Subselects refactoring     |
#########------------------+

Sep 1         Sep 22     Oct 6                                 Dec 5

The timing and the dates obviously depend a lot on the availability of
people to work on this, on disruptions like serious bugs that need
fixing after the GPDB 5 release, and so on. We'll adjust as we go, but
this is what it now looks like.
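
To make the date arithmetic explicit, here is a minimal sketch that derives the iteration boundaries from the durations listed for each iteration further down. It assumes the year is 2017, that iteration 1 starts on Sep 1, and that the iterations run back to back with no gaps; the names `START`, `DURATIONS_DAYS` and `schedule` are made up for illustration.

```python
from datetime import date, timedelta

# Assumptions: the year is 2017 (the year of this message), iteration 1
# starts on Sep 1, and the iterations run back to back with no gaps.
START = date(2017, 9, 1)
DURATIONS_DAYS = [("ITER 1", 7), ("ITER 2", 14), ("ITER 3", 14), ("ITER 4", 60)]


def schedule(start, durations):
    """Return (name, start_date, end_date) for consecutive iterations."""
    rows = []
    cursor = start
    for name, days in durations:
        end = cursor + timedelta(days=days)
        rows.append((name, cursor, end))
        cursor = end  # the next iteration starts where this one ends
    return rows


if __name__ == "__main__":
    for name, begin, end in schedule(START, DURATIONS_DAYS):
        print(f"{name}: {begin:%b %d} -> {end:%b %d}")
```

Under those assumptions the iterations end on Sep 8, Sep 22, Oct 6 and Dec 5, which agrees with the dates under the chart.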

ITERATION #1
------------

Theme: Warming up, getting people on board, learning the tools and tricks.

Merge up to: commit 0f855d621b, Feb 3 2008

Duration: 7 days

Notes:

A short iteration, with nothing too exciting coming in from the
upstream. The point is to get started, so that we get ourselves
organized, and everyone gets a taste of what the following iterations
will be like.

2-3 days to fix merge conflict markers and make it compile
2-3 days to make all the regression tests pass again
1-2 days to open a PR and fix straggling regression test failures from
Concourse.
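
For the first of those steps, a small script can help confirm that no conflict markers were left behind before trying to compile. This is only a sketch, not part of any GPDB tooling; the function name `find_conflict_markers` and the choice to scan every file under the current directory are assumptions for illustration.

```python
import os

# Prefixes git writes at the start and end of an unresolved conflict hunk.
# "=======" is deliberately not checked, because it also appears in
# legitimate files (for example as a heading underline).
MARKERS = ("<<<<<<< ", ">>>>>>> ")


def find_conflict_markers(root):
    """Return sorted (path, line_number) pairs for leftover conflict markers."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if line.startswith(MARKERS):
                            hits.append((path, number))
            except OSError:
                continue  # unreadable file; ignore it in this sketch
    return sorted(hits)


if __name__ == "__main__":
    for path, number in find_conflict_markers("."):
        print(f"{path}:{number}")
```

Run from the top of the source tree, it prints one `path:line` entry per leftover marker, and nothing if the tree is clean.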


ITERATION #2
------------

Theme: Random stuff

Merge up to: commit 4e82a95476, Apr 10 2008

Duration: 14 days

Notes:

Longer and bigger than the first iteration, but nothing particular in
the commit list jumps out that would conflict badly.


ITERATION #3
------------
Theme: amgetmulti AM changes, plus lots more random stuff

Merge up to: commit 3f0e808c4a, Aug 11 2008

Duration: 14 days

Notes:

Lots more random stuff. The last commit of this iteration is just before
the relation forks commit (more on that in the next iteration). A couple
of notable changes:

* The amgetmulti API changes will conflict with similar changes in GPDB.
We need to determine whether we continue to support "streaming bitmaps".
Probably yes, because it's easier to keep supporting them than to do the
benchmarking and explaining required to remove them, even though I suspect
no customer would notice if we just removed them.

* Misc planner improvements that I suspect will conflict, but nothing
too major I hope.


ITERATION #4
------------

Theme: Big features

Up to: The rest of REL8_4_STABLE

Duration: 60 days

Notes:

This includes all the major features from the 8.4 cycle:
* Window functions
* relation forks, FSMs and visibility map. (trouble with filerep expected)
* SEMI and ANTI join handling (Subselect refactoring)
* CTEs

That may sound bad, but the plan is to have the Window functions, CTEs,
and subselect refactoring completed, in the master branch, before we
start this iteration. If all goes well, after the refactoring, those
upstream commits will slide in without much conflict, because it's
mostly done already.

Relation forks will conflict with filerep. Not sure what we need to do to
filerep to keep it working. In principle the FSM and VM are optional, so
we could just not replicate them. Performance after failover might suck
without an up-to-date FSM though. But we could leave that as a
TODO/FIXME, and the problem will eventually go away once WAL replication
lands.

This assumes that we will not have WAL replication by the time iteration
#4 begins. I wish we did, because then there would be no major conflicts
related to relation forks, but it seems unlikely.

This iteration is massive compared to the others, but because those big
features were added incrementally, and the increments on each feature
overlap, it would be inconvenient to split. Let's take a closer look
once we get there, but all things considered, I think this one big
iteration is the best.


Thoughts, comments?

- Heikki

Heikki Linnakangas

Nov 7, 2017, 4:13:13 AM
to Greenplum Developers
A status update:

We have completed merging iteration 3 of the original breakdown
below. We've refined the plan for what remains. If you're comparing this
with the original plan
(https://groups.google.com/a/greenplum.org/forum/#!searchin/gpdb-dev/iterations/gpdb-dev/qlTs4J36Xo4/OhGzrZDhCgAJ),
we split the originally planned huge iteration 4 into two: 4 and 5.

We are slightly behind schedule at this point. Iterations #1 and #2 went
quickly, in less time than I estimated, but we squandered that lead in
iteration #3. I had estimated iteration #3 to take two weeks, and it
took six. We had a lot of trouble with the pipeline, and also ran into
some existing bugs. From the point where we were "done" with iteration
#3 and installcheck-world was passing locally and on the pipeline, it
took us over 2 weeks to hunt down and fix the remaining failures. It's
no time to panic yet, but we cannot afford to get bogged down like that
too often.


ITERATION #3.2
--------------

Theme: hash-based DISTINCT and UNION/INTERSECT/EXCEPT

Merge up to: eca1388629 (17 commits)

Duration: 3 days

Notes:

We decided to split off this small group of 17 commits into a separate
iteration, for multiple reasons. First, it's a nice, tight group of
commits that provide certain functionality. Second, this works as a
practice run for new developers joining the team, because we can go
through the whole cycle very quickly. Third, the Window Functions work
(see below) depends on or conflicts with this, so it's nice to get this
merged sooner.


ITERATION #4
------------

Theme: Free-Space Map and Visibility Map, relation forks

Merge up to: 38e9348282 (582 commits)

Duration: 30 days

Notes:

This iteration includes the introduction of relation forks. The
Free-Space Map is reimplemented using relation forks, and a new
visibility map is introduced.

The relation forks work will conflict heavily with GPDB's filerep
code, because many of the storage APIs are changed to deal with relation
forks. I had hoped that filerep would be gone, replaced with WAL
replication, before reaching this point, but alas. In principle the FSM
and VM are optional, so we will just not replicate them. Performance
after failover might suck without an up-to-date FSM though. We will
leave that as a TODO/FIXME, and the problem will eventually go away once
WAL replication lands. (I'm not sure if we have the same problem even
with filerep today; I don't think we keep the FSM up-to-date in the
mirrors.)

This iteration also includes changes to the planner, for pulling up
subqueries into SEMI and ANTI joins, as well as CTEs (WITH clause). The
QP team cherry-picked both of these changes earlier already, which
hopefully makes this go in smoothly.


WINDOW FUNCTIONS
----------------

PostgreSQL 8.4 added support for Window Functions. GPDB already had an
implementation of window functions, but it was quite different. We will
replace the GPDB implementation with the upstream implementation, to
avoid merge conflicts, now and in the future. This needs to be done
before Iteration #5, because the upstream commit that introduces window
functions comes in that iteration.

90% of the work for this has already been done, and there is a PR open:
https://github.com/greenplum-db/gpdb/pull/3426. It's almost there, just
need to fix some remaining bugs, and make ORCA work with the new
executor implementation. So, only the other 90% of the work remains, as
these things tend to go :-).


ITERATION #5
------------

Theme: Window functions, cleanups

Up to: The rest, up to PostgreSQL 8.4.0 (803 commits)

Duration: 30 days

Notes:

This includes Window Functions. Per the previous item, by the time we
reach this point, that should already be taken care of, so it is
expected to cause only trivial merge conflicts.

This is the tail of the 8.4 release cycle, so aside from the window functions,
most of the commits are little tweaks here and there, refactorings and
fixes for things that were done earlier in the cycle. But there are a
lot of them.

- Heikki