Replicated tables and COPY FROM SEGMENT


Heikki Linnakangas

Nov 29, 2018, 11:57:44 AM
to Greenplum Developers, Pengzhou Tang
Hi!

Currently, it's possible to load different rows on different segments of
a replicated table, using COPY FROM ON SEGMENT. COPY FROM ON SEGMENT
on a replicated table loads the data like it would on any other table.
If the per-segment files contain different data on different segments,
it will merrily load them anyway.
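For illustration, here's roughly what that looks like (table name and path are made up; <SEGID> is the token that COPY ON SEGMENT substitutes with each segment's content ID):

```sql
-- A replicated table: every segment is supposed to hold a full copy.
CREATE TABLE repl_t (i int) DISTRIBUTED REPLICATED;

-- Each segment reads its own local file. If the per-segment files
-- differ, the "replicas" silently diverge -- nothing stops it.
COPY repl_t FROM '/data/gpseg<SEGID>/repl_t.csv' ON SEGMENT;
```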

It's symmetric with COPY TO ON SEGMENT, which also copies the data to
the file on each segment, producing an identical file on each segment
(unless you managed to load inconsistent data into them already).

Is that intentional? Should we do something to protect the users from
that? Refuse ON SEGMENT with replicated tables?

- Heikki

Robert Eckhardt

Nov 29, 2018, 12:13:21 PM
to Heikki Linnakangas, gpdb...@greenplum.org, Pengzhou Tang
We should protect users.

I think only loading replicated tables through the master makes
perfect sense to me.

-- Rob

Scott Kahler

Nov 29, 2018, 12:39:17 PM
to Robert Eckhardt, Heikki Linnakangas, gpdb...@greenplum.org, Pengzhou Tang
How does that currently work when you restore a backup with replicated tables? Does each segment get its own dump and COPY it back in, which would make it a bit of an issue if we try to restore back through the master? Or is only one version of the table dumped on the master and fed back to all of the segments during restore?
--

Scott Kahler | Pivotal, Greenplum Product Management  | ska...@pivotal.io | 816.237.0610

Karen Huddleston

Nov 29, 2018, 2:44:37 PM
to Scott Kahler, Robert Eckhardt, Heikki Linnakangas, gpdb...@greenplum.org, pt...@pivotal.io
I'm a little confused about what the intended behavior for replicated tables is. My understanding is that each segment has a copy of all (or only some?) of the data in the table, so that UDFs running on the segments can access all the data they need without having to bring data over from other segments.

I did a small test to see how backup/restore currently handles this:

CREATE TABLE repl_foo(i int, j int) DISTRIBUTED REPLICATED; 
INSERT INTO repl_foo SELECT generate_series(1,1000), generate_series(1,1000);

SELECT * from repl_foo; on master showed 1000 rows.

I then ran gpbackup on the database with the replicated table and looked at the data files.
Each segment data file showed 1000 rows.
I then ran gprestore using the backup I had just taken and looked at the resulting table.
SELECT * from repl_foo; on master showed 1000 rows and didn't appear to have any duplicates.
I also did yet another backup of the newly restored table for good measure and saw that the data files looked the same as when I had taken the first backup.

From this experiment, it seems like backup/restore, and thus also COPY TO ON SEGMENT and COPY FROM ON SEGMENT, are working correctly for replicated tables. Please correct me if there is something I am missing or something else I should look at.

-Karen
 

Ashwin Agrawal

Nov 29, 2018, 2:55:56 PM
to Karen Huddleston, Scott Kahler, Robert Eckhardt, Heikki Linnakangas, gpdb...@greenplum.org, Pengzhou Tang
On Thu, Nov 29, 2018 at 11:44 AM Karen Huddleston <khudd...@pivotal.io> wrote:
> I'm a little confused about what the intended behavior for replicated tables is. My understanding is that each segment has a copy of all (or only some?) of the data in the table so that UDFs run from functions can access all the data they need without having to bring data from other segments.
>
> I did a small test to see how backup/restore handles this currently
>
> CREATE TABLE repl_foo(i int, j int) DISTRIBUTED REPLICATED;
> INSERT INTO repl_foo SELECT generate_series(1,1000), generate_series(1,1000);
>
> SELECT * from repl_foo; on master showed 1000 rows.
>
> I then ran gpbackup on the database with the replicated table and looked at the data files.
> Each segment data file showed 1000 rows.
> I then ran gprestore using the backup I had just taken and looked at the resulting table.
> SELECT * from repl_foo; on master showed 1000 rows and didn't appear to have any duplicates.
> I also did yet another backup of the newly restored table for good measure and saw that the data files looked the same as when I had taken the first backup.
>
> From this experiment it seems like backup/restore and thus also COPY TO ON SEGMENT and COPY FROM ON SEGMENT are working correctly for replicated tables? Please correct me if there seems to be something I am missing or something else I should look at.

I think the key point in the initial mail was "If the per-segment files contain different data on different segments". What you tested is the same data in the files on all segments. If for whatever reason the files differ (go ahead and delete data from one segment's file), there seems to be no check that enforces or validates that the contents are the same after restore.

But yes, I am not sure when such a case would arise, apart from corruption of the backup file. And if that's the case, how is it any different from any other table's data restore, where rows could go missing? (Sure, for replicated tables the redundant copies can be leveraged, and we could perform a check if we wished to.)

Karen Huddleston

Nov 29, 2018, 5:09:16 PM
to Ashwin Agrawal, Scott Kahler, Robert Eckhardt, Heikki Linnakangas, gpdb...@greenplum.org, pt...@pivotal.io
Ok, I understand a bit more. I'll start by saying that if COPY FROM ON SEGMENT is being used via gprestore, we do have row count checks to make sure we are restoring the same number of rows as were backed up. If the content of the rows changes, but not the number, we don't have a way to check that; but any table carries a similar risk, so I don't see how it is worse for replicated tables. This covers cases where the data changed on the filesystem between COPY TO ON SEGMENT and COPY FROM ON SEGMENT.

For cases where the data is messed up on the system before it is copied out, we probably don't have a good way to handle it. I would argue that this shouldn't be our problem, though, and I'm not sure what the recommended way to fix it would be if it happened. What would be the source of truth in that case?

If we did modify COPY FROM ON SEGMENT to load data through the master, this would impact backup/restore, though I'm not sure by how much. I'm still struggling to see the benefit: the current system seems to work fine when the data is intact, the issues only arise when the data gets corrupted, and we do have ways to catch that if gpbackup/gprestore is being used. I'm not sure what our procedure generally is in the case of data corruption.

Robert Eckhardt

Nov 29, 2018, 5:17:23 PM
to Karen Huddleston, Ashwin Agrawal, Scott Kahler, Heikki Linnakangas, gpdb...@greenplum.org, Pengzhou Tang
On Thu, Nov 29, 2018 at 4:09 PM Karen Huddleston <khudd...@pivotal.io> wrote:
>
> Ok, I understand a bit more. I'll start by saying that if COPY FROM ON SEGMENT is being used via gprestore, we do have row count checks to make sure we are restoring the same number of rows as were backed up. If the content of the rows change, but not the number, we don't have a way to check that, but I think any table could have a similar risk of that happening so I don't see how it is worse for replicated tables. This would be for cases when the data changed on the filesystem between COPY TO ON SEGMENT and COPY FROM ON SEGMENT.

I'm honestly not concerned about gprestore. My assumption with restore
is that it is just putting back what was taken out. We can assume, I
hope, properly.

Where Heikki's note did concern me was the number of other programs
that use COPY as an API for getting data into or out of
Greenplum, where the intent is to update and modify things.

-- Rob

Heikki Linnakangas

Nov 29, 2018, 5:19:22 PM
to Karen Huddleston, Ashwin Agrawal, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org, pt...@pivotal.io
Yeah, it happens to work fine for gpbackup. It's not optimal to back up
the same data on every segment, but it's probably not a problem in
practice, because you wouldn't mark very large tables as replicated in
the first place.

But COPY ON SEGMENT is not just for gpbackup, it's a general purpose
command, available for users. It needs to have sane, intuitive behavior
on its own.

- Heikki

Karen Huddleston

Nov 29, 2018, 5:38:43 PM
to Heikki Linnakangas, Ashwin Agrawal, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org, pt...@pivotal.io
It is generally available, but the ON SEGMENT clause was primarily intended to be used for gpbackup and gprestore. There is a note in the documentation that says "Warning: Use of the ON SEGMENT clause is recommended for expert users only."

It also seems to me like the current behavior is pretty consistent with what someone would expect it to do. If replicated tables are supposed to replicate the same data to all segments, it makes sense that all segment files get copies of the same data, and also that all the files you copy in need to have the same data for things to work properly.

I do agree that we don't want to make it too easy for users to mess things up, so that is an argument for only backing up a single copy of the data and restoring it through the master. It just seems to me like mistakes could happen regardless of the method we pick: either the data on one segment could get messed up, or the one copy of the data you have could get messed up. Keeping the behavior consistent across all types of tables seems like it will help reduce complexity and prevent us, the developers, from making mistakes in the code that would break functionality, so that would be my vote.

Ashwin Agrawal

Nov 29, 2018, 9:26:21 PM
to Karen Huddleston, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org, Pengzhou Tang

One of the ideas proposed initially for having the backup files in CSV was that someone could take the backup files and load them back into the same or a different-size cluster via gpfdist. If that's true, then having multiple copies of the replicated table files is super problematic.

I am in the same boat: keeping it consistent across table types is simpler than having exceptions that go via the master or dump differently. But the flexibility provided around what can be done with the backed-up files may prove concerning in the long run.

Adam Lee

Nov 29, 2018, 10:18:40 PM
to Ashwin Agrawal, Karen Huddleston, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org, Pengzhou Tang
I vote for only allowing loading replicated tables through the master.

To make life easier for utilities like gpcopy/gprestore, maybe we could
have a GUC for experts? Just like "gp_enable_segment_copy_checking".
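Something like this, perhaps (the GUC name below is invented for illustration; only the analogy to gp_enable_segment_copy_checking is from this thread):

```sql
-- Hypothetical expert-only escape hatch; the GUC name is made up.
SET gp_allow_replicated_copy_on_segment = on;
COPY repl_t FROM '/backup/repl_t_<SEGID>.csv' ON SEGMENT;
```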

BTW, since keeping replicated tables consistent is important, do we
need an automatic checking and repair mechanism to guarantee it?
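A rough sketch of such a check, assuming gp_dist_random(), which scans a table on every segment (its behavior on replicated tables may vary by version):

```sql
-- If the replicas are consistent, this should return
-- (number of segments) x (rows in repl_t); any other total
-- means at least one replica has diverged.
SELECT count(*) FROM gp_dist_random('repl_t');
```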

--
Adam Lee

Pengzhou Tang

Nov 29, 2018, 10:43:21 PM
to Adam Lee, Ashwin Agrawal, khudd...@pivotal.io, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org
When the replicated table feature was added, we considered that gpdump/gprestore still needed COPY ON SEGMENT, so we didn't disable it.

All the concerns about this seem reasonable. As Heikki said, when the files differ, COPY ON SEGMENT into a replicated table is problematic; or, as
Ashwin said, using gpfdist may also produce unexpected duplicated rows. Another example: a user copies a replicated table to files using ON SEGMENT,
and then copies those files into a partitioned table using ON SEGMENT; this would be problematic.

So, disabling COPY ON SEGMENT for replicated tables sounds like the right thing to do. Normally the size of a replicated table is not that big,
so copying it through the master should not be a problem.
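A master-side load of a replicated table would then just be the ordinary form (the path is illustrative):

```sql
-- Plain COPY: the QD reads one file on the master host, and the
-- rows reach every segment through the normal replicated-insert path.
COPY repl_t FROM '/backup/repl_t.csv' CSV;
```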

Are there other components that rely on the COPY ON SEGMENT feature? gpcopy? We would need to change COPY ON SEGMENT to normal COPY for
those components.

I am also wondering whether there is a need for a consistency check on replicated tables, if all the user interfaces can guarantee consistency.

Thanks,
Pengzhou  


Karen Huddleston

Nov 30, 2018, 3:02:29 PM
to Pengzhou Tang, Oak Barrett, Adam Lee, Ashwin Agrawal, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org
I want to get Oak's feedback on some of this.

It sounds like the proposal is to disable the use of COPY ON SEGMENT for replicated tables and force them to use regular COPY without ON SEGMENT. I assume this would include both COPY TO ON SEGMENT and COPY FROM ON SEGMENT.

This would definitely impact gpbackup and gprestore. COPY places the resulting data files on the master host. Currently, the architecture for backup is that the schema/metadata files go on the master and all data goes on the segments; this would break that pattern. I am not sure that customers would be ok with us placing data files on the master host, even if they are relatively small because they are replicated tables. Oak, what do you think about that?

Asim R P

Dec 1, 2018, 2:37:57 AM
to Karen Huddleston, Pengzhou Tang, Oak Barrett, Adam Lee, Ashwin Agrawal, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org
On Fri, Nov 30, 2018 at 12:02 PM Karen Huddleston
<khudd...@pivotal.io> wrote:
>
> It sounds like the proposal is to disable the use of COPY ON SEGMENT for replicated tables and force them to use regular COPY without ON SEGMENT. I assume this would include both COPY TO ON SEGMENT and COPY FROM ON SEGMENT.

Yes, the proposed change applies to both TO and FROM.

>
> This would definitely impact gpbackup and gprestore. COPY places the resulting data files on the master host. Currently, the architecture for backup is that the schema/metadata files go on the master and all data goes on the segments. This would break that pattern. I am not sure that customers would be ok with us placing data files on the master host, even if they are relatively small because they are replicated tables.
>

Well, the current behavior potentially leads to wrong results. That's
much worse than the inconvenience of placing data files on the
master.

Another alternative is to make COPY ON SEGMENT analogous to direct
dispatch when invoked on a replicated table. The command is
dispatched to all segments, but the QD includes additional information
such as the content ID of one segment. Only the segment specified by
the QD actually writes data to the file. All other segments create
empty files. The dump thus gets created on the segments. COPY FROM
ON SEGMENT may not even need any change.

Disclaimer: do not expect reason in my thinking out loud, especially on
a Friday night!

Asim

Ashwin Agrawal

Dec 3, 2018, 2:30:36 AM
to Asim Praveen, Karen Huddleston, Pengzhou Tang, Oak Barrett, Adam Lee, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org
On Fri, Nov 30, 2018 at 11:37 PM Asim R P <apra...@pivotal.io> wrote:

> Another alternative is to make COPY ON SEGMENT analogous to direct
> dispatch when invoked on a replicated table.  The command is
> dispatched to all segments but QD includes additional information such
> as content ID of one segment.  Only that segment which is specified by
> the QD actually writes data to the file.  All other segments create
> empty files.  The dump thus gets created on segments.  COPY FROM
> SEGMENT may not even need any change.

Sounds like an option to consider to avoid dumping the data on the master. But extra care still needs to be taken for restore: it shouldn't happen that the data is loaded back to only one segment and missed completely on the others.

Adam Lee

Dec 3, 2018, 3:57:53 AM
to Ashwin Agrawal, Asim Praveen, Karen Huddleston, Pengzhou Tang, Oak Barrett, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org
If it is dumped to only one segment, with the others having empty files,
COPY FROM ON SEGMENT won't distribute the data to every segment as
things stand.

I would like to have a GUC: forbid COPY FROM ON SEGMENT into a
replicated table by default, and allow it when the GUC is switched on.
--
Adam Lee

Simon Gao

Dec 5, 2018, 1:36:56 AM
to Adam Lee, Ashwin Agrawal, Asim Praveen, Karen Huddleston, Pengzhou Tang, Oak Barrett, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, Greenplum Developers, Jinbao Chen
Similar trouble can also happen in the following cases, when a DBA wants to back up a table before a change and then restore it after the change:
1) change a table from random to hash distributed
2) change a table from hash distributed to replicated, or the reverse
3) expand a cluster, e.g. from 4 segments to 6 segments

All tables have some internal distribution constraints. If the data is inserted or selected by normal SQL through the master, the master checks and follows these constraints. COPY ON SEGMENT actually provides a new data input/output channel to the cluster, so it had better follow and handle these constraints too.

Some ideas:
1) COPY TO ON SEGMENT will let only one segment dump its data for a replicated table, so that there is only one copy of the data in the backup files
2) COPY FROM ON SEGMENT will introduce a motion node and broadcast replicated table data to the other nodes, or check and redistribute each tuple to the right segment in other cases
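In COPY terms, the two proposed behaviors might look like this (the paths are illustrative, and the single-writer and broadcast mechanics are the proposal, not current behavior):

```sql
-- (1) Dump: only one designated segment (say, content 0) writes its
--     rows; the other segments would produce empty files.
COPY repl_t TO '/backup/repl_t_<SEGID>.csv' ON SEGMENT;

-- (2) Restore: each segment reads its local file, and a broadcast
--     motion would re-replicate the rows so that every segment ends
--     up with the full copy, whichever file actually held the data.
COPY repl_t FROM '/backup/repl_t_<SEGID>.csv' ON SEGMENT;
```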

I discussed this offline with Pengzhou and Jinbao, and it seems that this is feasible; it just needs some effort.

Any thoughts? 


Oak Barrett

Dec 5, 2018, 3:49:54 PM
to Simon Gao, Adam Lee, Ashwin Agrawal, Asim Praveen, Karen Huddleston, pt...@pivotal.io, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, gpdb...@greenplum.org, jin...@pivotal.io
I like Simon's ideas. 

I would prefer not to have gpbackup move data files from the segments to the master, and to ensure proper loading of replicated tables upon gprestore.
If the COPY...SEGMENT functionality can be modified to support Simon's suggestions, I believe that is the most straightforward way to go.

oak

Ashwin Agrawal

Dec 5, 2018, 9:50:48 PM
to Simon Gao, Adam Lee, Asim Praveen, Karen Huddleston, Pengzhou Tang, Oak Barrett, Heikki Linnakangas, Scott Kahler, Robert Eckhardt, Greenplum Developers, Jinbao Chen
On Tue, Dec 4, 2018 at 10:36 PM Simon Gao <sg...@pivotal.io> wrote:
> The similar trouble can also happen in the following cases when DBA wants to backup a table before the change, and then restore it after the change:
> 1) change a table from random to hash distributed
> 2) change a table from hash distributed to replicated, or the reverse
> 3) expand a cluster, e.g. from 4 segments to 6 segments
>
> All tables have some internal distribution constraints. If the data is inserted or selected by normal SQLs through master, master checks and follows these constraints. COPY ON SEGMENTS actually provides a new data input/output channel to the cluster, and it is better to follow and handle these constraints too.
>
> Some ideas:
> 1) COPY TO ON SEGMENT will only let one segment dump its data for replicated table so that there is only one copy of the data in backup file
> 2) COPY FROM ON SEGMENT will introduce a motion node and broadcast replicated table data to other nodes, or check and re-distributed a tuple to right segment for other cases
>
> offline discussed with Pengzhou and Jinbao, it seems that this is feasible, but just needs some efforts.
>
> Any thoughts?

Good find. The outcome of a discussion between me and Asim: agreed, replicated tables are not the only ones needing special treatment for COPY FROM ON SEGMENT. All tables essentially need to check the distribution constraints and reshuffle if the data doesn't match the conditions.

Explicitly calling out: when restoring hash-distributed dump data to a replicated table, each segment will essentially need to send its data to all other nodes.
