Hi,

debug_query_string is a useful variable for developers to find the current query with gdb. It's recorded on the master and dispatched to the segments. One problem is that if the SQL is parsed into multiple statements, debug_query_string is dispatched repeatedly, once for each statement, which can be very expensive in some cases.
[snip]
We are considering optimizing this. There are several options:

0. never dispatch debug_query_string;
1. only dispatch debug_query_string in debug builds;
2. only dispatch debug_query_string if it's small enough (and the number of substmts is small enough);
3. only dispatch debug_query_string with the first substmt, and provide some way to let segments reuse it for the other substmts;
4. provide a GUC to control whether to dispatch it;
5. etc.

Without debug_query_string it's still possible to find the session id and command id in a segment core dump (gp_session_id, gp_command_count), and from those find the SQL in the master logs.
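A rough back-of-the-envelope model of the cost described above, assuming the full query text is re-sent with every substmt to every segment and ignoring the rest of the dispatched payload. The function name and the numbers are illustrative only, not measurements from this thread; `first_only=True` models option 3.

```python
def dispatched_query_bytes(n_substmts: int, query_len: int, n_segments: int,
                           first_only: bool = False) -> int:
    """Hypothetical model: bytes of debug_query_string shipped by the QD.

    By default every substmt re-sends the full query text to every segment;
    with first_only=True (option 3) it is sent once per segment.
    """
    sends = 1 if first_only else n_substmts
    return sends * query_len * n_segments


# Illustrative numbers only: a 1 MB script split into 1,000 substmts, 8 segments.
repeated = dispatched_query_bytes(1000, 1_000_000, 8)
once = dispatched_query_bytes(1000, 1_000_000, 8, first_only=True)
print(f"repeated: {repeated:,} bytes, first substmt only: {once:,} bytes")
```

With these made-up inputs the repeated case ships 8 GB of query text versus 8 MB when it is sent only once, i.e. the overhead scales with the number of substmts.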
How easy is it to strip out the comments? Wouldn't that be a good next step?
--
You received this message because you are subscribed to the Google Groups "Greenplum Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpdb-dev+u...@greenplum.org.
Yes, a truncated debug_query_string for the QEs seems to be a better idea. By the way, from a performance perspective, I'm wondering whether we should also start thinking about compressing the dispatched data. Personally, I'm curious about the compression ratio of the dispatched data and the CPU vs. network I/O trade-off.
On Wed, Oct 10, 2018 at 8:31 AM Ning Yu <n...@pivotal.io> wrote:
Yes, personally I also vote for truncating.

On Wed, Oct 10, 2018 at 1:25 AM, Asim R P <apra...@pivotal.io> wrote:
Good finding.
On Mon, Oct 8, 2018 at 8:17 PM Ning Yu <n...@pivotal.io> wrote:
>
> 2. only dispatch debug_query_string if it's small enough (and count of substmts is small enough);
I support this option. How about dispatching always but truncating if
it's longer than a threshold?
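A minimal sketch of "dispatch always, but truncate past a threshold", assuming the threshold is a byte count and the query text is UTF-8. `QE_QUERY_STRING_MAX` and `truncate_for_qe` are hypothetical names, not existing GPDB settings or functions. The one subtlety is not cutting a multi-byte character in half; inside the server, PostgreSQL's `pg_mbcliplen()` does this kind of clipping for the server encoding.

```python
QE_QUERY_STRING_MAX = 1024  # hypothetical threshold in bytes, not a real GPDB setting


def truncate_for_qe(query: str, max_bytes: int = QE_QUERY_STRING_MAX) -> bytes:
    """Return the UTF-8 bytes to dispatch: the full query if it fits,
    otherwise the longest prefix of at most max_bytes bytes that does not
    split a character."""
    data = query.encode("utf-8")
    if len(data) <= max_bytes:
        return data
    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a UTF-8
    # continuation byte (0b10xxxxxx), the character it belongs to started
    # before the cut, so back up to that character's lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


print(truncate_for_qe("SELECT 1"))
print(len(truncate_for_qe("x" * 5000)))
```

Keeping a prefix rather than dropping the string entirely preserves the start of the statement, which is usually enough to recognize the query in a core dump, while capping the per-substmt cost at the threshold.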
When debugging a problem, developers can find the full SQL in the master log without recompiling. We reviewed the existing GUCs and didn't see a suitable one to control this. We want to avoid introducing a new GUC just for this specific purpose.
On Wed, Oct 10, 2018 at 3:27 AM Simon Gao <sg...@pivotal.io> wrote:
> When debugging a problem, developers can find the full SQL in the master log without recompiling. We reviewed the existing GUCs and didn't see a suitable one to control this. We want to avoid introducing a new GUC just for this specific purpose.

Why avoid a new GUC? Isn't the performance gain so HUGE that it's worth adding a GUC? The only other good option is to never make that debug info available on the segments and get the gain without adding a GUC. But I am not sure whether developers will ever need this info.
On 10/10/2018 07:28, Ning Yu wrote:
Lots of good discussion. I couldn't decide which parts to reply to, so I
just collected my thoughts as a list:
* The elephant in the room is that a user's CREATE TABLE command can be
expanded to thousands of commands between the QD and QEs, if the table
is partitioned. If we could refactor that so that we'd dispatch all the
sub-commands as one batch, that would be nice. Aside from the
debug_query_string effect, it would reduce the number of round-trips.
* I object to removing debug_query_string completely, and to having a
GUC for it. debug_query_string is a very valuable debugging aid. When I
attach a debugger to a core dump or a hung backend, that's the first
thing I look at. If we have a GUC for it, it's in practice never going
to be there when you need it the most.
This will tremendously decrease the time it takes for users to restore their databases with partitioned tables.

My quick test:

Test case:
- 2-host virtual cluster, 4 segments per host
- 1 partitioned table with 1461 child partitions
- original CREATE TABLE statement: 139 lines, 12 KB file size
- gpbackup-generated CREATE TABLE statement: 191,525 lines, 18 MB file size

On 5.11.0:
- original CREATE: 66.37 seconds
- gpbackup CREATE: 423.98 seconds

On 5.11.1+dev.51.g23ae137e54:
- original CREATE: 77.16 seconds
- gpbackup CREATE: 78.52 seconds
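The ratios implied by these timings, computed only from the numbers above (no new data): the gpbackup-generated CREATE becomes about 5.4x faster on the dev build, while the original CREATE becomes about 16% slower.

```python
# Timings in seconds, copied from the test results above.
timings = {
    "5.11.0": {"original": 66.37, "gpbackup": 423.98},
    "5.11.1+dev.51.g23ae137e54": {"original": 77.16, "gpbackup": 78.52},
}

old = timings["5.11.0"]
new = timings["5.11.1+dev.51.g23ae137e54"]

# How many times faster the gpbackup-generated CREATE runs on the dev build.
speedup = old["gpbackup"] / new["gpbackup"]
# Relative change of the original CREATE (positive means slower).
orig_change_pct = (new["original"] - old["original"]) / old["original"] * 100

print(f"gpbackup CREATE speedup: {speedup:.1f}x")          # 5.4x
print(f"original CREATE change: {orig_change_pct:+.1f}%")  # +16.3%
```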
On Thu, Oct 11, 2018 at 1:22 PM Oak Barrett <obar...@pivotal.io> wrote:
> This will tremendously decrease the time it takes for users to restore their databases with partitioned tables.
Nice!!!
Surprised we didn't notice and evaluate earlier what was causing the slowdown, or that difference. Not sure why the original CREATE TABLE on 5.11.1+dev didn't perform at the same speed as on 5.11.0, or even better.