statement                                           | pqc (ms) | postgres (ms)
----------------------------------------------------+----------+--------------
SELECT 1                                            |     0.13 |          0.05
SELECT * FROM pgbench_accounts WHERE aid = ?        |     0.14 |          0.10
SELECT max(bid) FROM pgbench_accounts WHERE bid = ? |     0.14 |          0.17
SELECT count(*) FROM pgbench_accounts WHERE bid = ? |     3.10 |         36.00
Now, I ran 1000 iterations of each query in a loop (PHP script), picking
a range of values so that there was a reasonable chance of repeated
values.
Looks like the overhead of the cache check (and the pgpool-based bits) is
about 0.13-0.14 ms per query. Clearly, as the statements get more
expensive, pqc helps a lot.
By way of comparison, here is how MariaDB 5.2 with the same dataset in
XtraDB behaves with its query cache:
statement                                           | cache (ms) | nocache (ms)
----------------------------------------------------+------------+--------------
SELECT 1                                            |       0.06 |         0.05
SELECT * FROM pgbench_accounts WHERE aid = ?        |       0.10 |         0.10
SELECT max(bid) FROM pgbench_accounts WHERE bid = ? |       0.07 |         0.09
SELECT count(*) FROM pgbench_accounts WHERE bid = ? |       3.57 |        34.00
The cache probe overhead in this case looks to be around 0.06 ms
(actually, re-executing the above queries several times brings the times
down to 0.05 ms for them all). This is quite a bit faster than the
pgpool + memcache design of pqc (at the moment) - is there any way to
improve this for pqc? This could well be important, as there are certain
web apps that do a great many simple single-table queries with
key-equality access, and it would be good if these were faster with the
cache enabled!
Cheers
Mark
Thanks for the report.
These are great observations, and they show PostgreSQL is catching up
with MySQL. :)
2011/04/15 7:45, Mark Kirkwood wrote:
> statement                                           | pqc (ms) | postgres (ms)
> ----------------------------------------------------+----------+--------------
> SELECT 1                                            |     0.13 |          0.05
> SELECT * FROM pgbench_accounts WHERE aid = ?        |     0.14 |          0.10
> SELECT max(bid) FROM pgbench_accounts WHERE bid = ? |     0.14 |          0.17
> SELECT count(*) FROM pgbench_accounts WHERE bid = ? |     3.10 |         36.00
According to this measurement, pqc seems to have a 0.04~0.08 ms overhead
within it for each query it processes.
> This is quite a bit faster than the
> pgpool + memcache design of pqc (at the moment), is there any way to
> improve this for pqc?
It's a very interesting question.
I don't have any clear idea about that right now, but I think
some kind of profiling, using SystemTap, oprofile or similar, would
be able to clarify the bottleneck and suggest a way to solve it.
> This could well be important as there are certain
> web apps that do a great many simple single table queries with key
> equality access, and it would be good if these were faster with the
> cache enabled!
Yes, I'm interested in that way, too.
Regards,
--
NAGAYASU Satoshi <satoshi....@gmail.com>
2011/04/15 9:32, Satoshi Nagayasu wrote:
> 2011/04/15 7:45, Mark Kirkwood wrote:
> > statement                                           | pqc (ms) | postgres (ms)
> > ----------------------------------------------------+----------+--------------
> > SELECT 1                                            |     0.13 |          0.05
> > SELECT * FROM pgbench_accounts WHERE aid = ?        |     0.14 |          0.10
> > SELECT max(bid) FROM pgbench_accounts WHERE bid = ? |     0.14 |          0.17
> > SELECT count(*) FROM pgbench_accounts WHERE bid = ? |     3.10 |         36.00
>
> According to this measurement, pqc seems to have a 0.04~0.08 ms overhead
> within it for each query it processes.
>
> > This is quite a bit faster than the
> > pgpool + memcache design of pqc (at the moment), is there any way to
> > improve this for pqc?
>
> It's a very interesting question.
>
> I don't have any clear idea about that right now, but I think
> some kind of profiling, using SystemTap, oprofile or similar, would
> be able to clarify the bottleneck and suggest a way to solve it.
I have reproduced the situation and got a result.
On my server, a "SELECT 1" query takes 80 usec each without pqc,
and 200 usec each with pqc. So I did some profiling with
oprofile to observe more details.
Here is the result reported by oprofile.
--------------------------------------------------------------
[root@devsv02 snaga]# opcontrol --no-vmlinux --start; ./test2; opcontrol --stop
[root@devsv02 snaga]# opreport -l
warning: /no-vmlinux could not be found.
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % app name symbol name
10577 97.6729 no-vmlinux /no-vmlinux
30 0.2770 libmemcached.so.5.0.2 /opt/uptime/querycache/lib/libmemcached.so.5.0.2
22 0.2032 memcached /opt/uptime/querycache/bin/memcached
19 0.1755 libc-2.5.so __write_nocancel
15 0.1385 libc-2.5.so send
13 0.1200 libc-2.5.so recv
13 0.1200 libc-2.5.so vfprintf
13 0.1200 libpthread-2.5.so __sendmsg_nocancel
12 0.1108 libevent-1.4.so.2.2.0 /opt/uptime/querycache/lib/libevent-1.4.so.2.2.0
7 0.0646 librt-2.5.so clock_gettime
6 0.0554 pqcd encode_key
6 0.0554 pqcd pqc_get_cache
5 0.0462 libc-2.5.so memcpy
5 0.0462 libpthread-2.5.so __read_nocancel
4 0.0369 bash /bin/bash
4 0.0369 libc-2.5.so __select_nocancel
4 0.0369 libc-2.5.so memset
4 0.0369 libc-2.5.so poll
4 0.0369 libpthread-2.5.so pthread_mutex_lock
4 0.0369 pqcd dump_cache_data
4 0.0369 pqcd pool_process_query
4 0.0369 pqcd pqc_send_cached_messages
3 0.0277 libc-2.5.so __ctype_b_loc
3 0.0277 libc-2.5.so _int_free
3 0.0277 libc-2.5.so buffered_vfprintf
2 0.0185 libc-2.5.so __epoll_wait_nocancel
2 0.0185 libc-2.5.so __find_specmb
2 0.0185 libc-2.5.so __strtoul_internal
2 0.0185 libc-2.5.so htonl
2 0.0185 libc-2.5.so malloc
2 0.0185 libc-2.5.so strncpy
2 0.0185 libc-2.5.so vsnprintf
2 0.0185 libpq.so.5.3 pqParseInput3
2 0.0185 pqcd .plt
2 0.0185 pqcd pool_debug
2 0.0185 pqcd pool_write
1 0.0092 libc-2.5.so _IO_default_xsputn
1 0.0092 libc-2.5.so _IO_do_write@@GLIBC_2.2.5
1 0.0092 libc-2.5.so _IO_setb
1 0.0092 libc-2.5.so __errno_location
1 0.0092 libc-2.5.so __libc_enable_asynccancel
1 0.0092 libc-2.5.so __read_nocancel
1 0.0092 libc-2.5.so _int_malloc
1 0.0092 libc-2.5.so epoll_wait
1 0.0092 libc-2.5.so free
1 0.0092 libc-2.5.so strdup
1 0.0092 libc-2.5.so strncasecmp
1 0.0092 libpq.so.5.3 PQresultStatus
1 0.0092 libpq.so.5.3 getRowDescriptions
1 0.0092 libpq.so.5.3 pqGetInt
1 0.0092 libpq.so.5.3 pqGetc
1 0.0092 libpq.so.5.3 pqGets_internal
1 0.0092 libpq.so.5.3 pqSendSome
1 0.0092 libpq.so.5.3 pqsecure_read
1 0.0092 libpthread-2.5.so __pthread_disable_asynccancel
1 0.0092 oprofiled /usr/bin/oprofiled
1 0.0092 pqcd pool_flush_it
1 0.0092 pqcd pqc_send_message
1 0.0092 pqcd strlcpy
[root@devsv02 snaga]#
--------------------------------------------------------------
According to this observation, pqc does not have much overhead
within itself; calling memcached takes a bit more time,
and calls into the kernel take most of the elapsed time.
So, what I can do right now is try to reduce costs in encode_key
and pqc_get_cache, but I guess that would not improve pqc performance
dramatically.
Of course, if anyone has a good idea for a breakthrough,
it would be welcome. :)
Hmm - not very enlightening, is it? By way of comparison, what does an
oprofile of "SELECT 1" without pqc look like?
regards
Mark
2011/04/24 18:06, Mark Kirkwood wrote:
> Hmm - not very enlightening, is it? By way of comparison, what does an
> oprofile of "SELECT 1" without pqc look like?
Here is the result without pqc.
--------------------------------------------------------------
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % app name symbol name
5960 97.7370 no-vmlinux /no-vmlinux
26 0.4264 postgres hash_seq_search
7 0.1148 postgres SHMQueueNext
5 0.0820 postgres LockReleaseAll
4 0.0656 libc-2.5.so send
4 0.0656 postgres CleanupTempFiles
4 0.0656 postgres set_ps_display
3 0.0492 bash /bin/bash
3 0.0492 postgres hash_any
3 0.0492 postgres hash_search_with_hash_value
3 0.0492 postgres pg_encoding_mbcliplen
2 0.0328 libc-2.5.so _int_free
2 0.0328 libc-2.5.so _int_malloc
2 0.0328 libc-2.5.so poll
2 0.0328 postgres AllocSetDelete
2 0.0328 postgres AllocSetFreeIndex
2 0.0328 postgres AtCommit_Notify
2 0.0328 postgres CommitTransaction
2 0.0328 postgres IsTransactionOrTransactionBlock
2 0.0328 postgres MemoryContextDelete
2 0.0328 postgres PostgresMain
2 0.0328 postgres pgstat_report_xact_timestamp
2 0.0328 postgres pq_getbyte
2 0.0328 postgres pq_recvbuf
1 0.0164 ld-2.5.so _dl_sysdep_start
1 0.0164 ld-2.5.so dl_main
1 0.0164 libc-2.5.so __xstat
1 0.0164 libc-2.5.so _dl_addr
1 0.0164 libc-2.5.so _nl_intern_locale_data
1 0.0164 libc-2.5.so mbrtowc
1 0.0164 libc-2.5.so memcmp
1 0.0164 libc-2.5.so recv
1 0.0164 libc-2.5.so strchr
1 0.0164 libpq.so.5.3 PQmakeEmptyPGresult
1 0.0164 libpq.so.5.3 PQsendQueryStart
1 0.0164 libpq.so.5.3 appendBinaryPQExpBuffer
1 0.0164 postgres .plt
1 0.0164 postgres AllocSetReset
1 0.0164 postgres AtEOXact_GUC
1 0.0164 postgres AtEOXact_Inval
1 0.0164 postgres AtEOXact_LargeObject
1 0.0164 postgres AtEOXact_LocalBuffers
1 0.0164 postgres AtEOXact_MultiXact
1 0.0164 postgres AtEOXact_Namespace
1 0.0164 postgres AtEOXact_SPI
1 0.0164 postgres AtEarlyCommit_Snapshot
1 0.0164 postgres EnableNotifyInterrupt
1 0.0164 postgres EndCommand
1 0.0164 postgres ExecEvalConst
1 0.0164 postgres GetCurrentStatementStartTimestamp
1 0.0164 postgres LWLockRelease
1 0.0164 postgres MemoryContextAlloc
1 0.0164 postgres MemoryContextSwitchTo
1 0.0164 postgres ProcArrayEndTransaction
1 0.0164 postgres ReadyForQuery
1 0.0164 postgres RecordTransactionCommit
1 0.0164 postgres RegisterSnapshot
1 0.0164 postgres ReleaseResources_hash
1 0.0164 postgres ResourceOwnerNewParent
1 0.0164 postgres ResourceOwnerRelease
1 0.0164 postgres ShowTransactionState
1 0.0164 postgres TupleDescInitEntry
1 0.0164 postgres deregister_seq_scan
1 0.0164 postgres errstart
1 0.0164 postgres hash_seq_init
1 0.0164 postgres hash_seq_term
1 0.0164 postgres internal_flush
1 0.0164 postgres internal_putbytes
1 0.0164 postgres pgstat_clear_snapshot
1 0.0164 postgres pgstat_report_stat
--------------------------------------------------------------
Can you read something from this?
Hmm - again all kernel. What event is oprofile actually counting here?
Usually I'd expect to see something like: "Counted CPU_CLK_UNHALTED
events (clocks processor is not halted)" in the output.
I'm wondering if oprofile is configured to count something strange on
your box. Heh - we might need someone with heavy oprofile foo to help us :-)
regards
Mark
Hmm, I have used oprofile with its default settings on CentOS 5.5,
so I need to review it again to understand what was actually reported.
I will try it, and yes, oprofile experts are also needed. :)
I'll also give it a try on my (Ubuntu 10.10) box here at home (am on
leave having kittens), and let you know if I find anything interesting
(probably tomorrow sometime).
Cheers
Mark
Doing some oprofiling, I'm seeing similar stuff to you:
$ opreport -l
CPU: AMD64 processors, speed 2000.48 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        image name                                   app name    symbol name
318125   79.8924  no-vmlinux                                   no-vmlinux  /no-vmlinux
13375     3.3589  libc-2.12.1.so                               pqcd        /lib/libc-2.12.1.so
7794      1.9573  libmemcached.so.5.0.0                        pqcd        /usr/lib/libmemcached.so.5.0.0
6333      1.5904  libc-2.12.1.so                               php         /lib/libc-2.12.1.so
5682      1.4269  memcached                                    memcached   /usr/bin/memcached
2198      0.5520  oprofiled                                    oprofiled   /usr/bin/oprofiled
1935      0.4859  [vdso] (tgid:20539 range:0x50f000-0x510000)  pqcd        [vdso] (tgid:20539 range:0x50f000-0x510000)
1866      0.4686  libevent-1.4.so.2.1.3                        memcached   /usr/lib/libevent-1.4.so.2.1.3
1741      0.4372  pqcd                                         pqcd        pqc_check_cache_avail
1614      0.4053  libc-2.12.1.so                               memcached   /lib/libc-2.12.1.so
1474      0.3702  libpq.so.5.2                                 php         pqParseInput3
952       0.2391  php                                          php         zend_parse_va_args
790       0.1984  libpthread-2.12.1.so                         memcached   pthread_mutex_lock
759       0.1906  [vdso] (tgid:1116 range:0x3ec000-0x3ed000)   memcached   [vdso] (tgid:1116 range:0x3ec000-0x3ed000)
680       0.1708  php                                          php         zend_do_fcall_common_helper_SPEC
652       0.1637  libpq.so.5.2                                 php         pqGetInt
635       0.1595  [vdso] (tgid:21222 range:0x2fa000-0x2fb000)  php         [vdso] (tgid:21222 range:0x2fa000-0x2fb000)
631       0.1585  pqcd                                         pqcd        T.76
So the global report shows most of the CPU time going to the kernel and
libc - however, that is for the entire system (notice some php stuff too -
my script to do "SELECT 1" is php).
Trying some more specific reports:
$ opreport -l /usr/local/bin/pqcd
CPU: AMD64 processors, speed 2000.48 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
13375 44.5225 libc-2.12.1.so /lib/libc-2.12.1.so
7794 25.9445 libmemcached.so.5.0.0 /usr/lib/libmemcached.so.5.0.0
1935 6.4412 [vdso] (tgid:20539 range:0x50f000-0x510000) [vdso] (tgid:20539 range:0x50f000-0x510000)
1741 5.7954 pqcd pqc_check_cache_avail
631 2.1005 pqcd T.76
515 1.7143 pqcd Query
454 1.5113 pqcd dump_cache_data
399 1.3282 pqcd pqc_send_message
374 1.2450 pqcd pool_process_query
374 1.2450 pqcd pool_write
299 0.9953 pqcd pool_debug
263 0.8755 pqcd pqc_get_cache
237 0.7889 pqcd query_ps_status
224 0.7456 pqcd pool_check_fd
203 0.6757 pqcd pqc_send_cached_messages
178 0.5925 pqcd pool_flush_it
161 0.5359 pqcd strlcpy
132 0.4394 pqcd ProcessFrontendResponse
112 0.3728 pqcd pool_flush
110 0.3662 pqcd skip_comment
97 0.3229 pqcd pool_read
90 0.2996 pqcd set_ps_display
88 0.2929 pqcd consume_pending_data
84 0.2796 pqcd pqc_check_cache_hint
70 0.2330 pqcd save_pending_data
43 0.1431 pqcd pool_read2
41 0.1365 pqcd is_drop_database
16 0.0533 pqcd pqc_push_current_query
1 0.0033 pqcd pqc_buf_init
That is a bit more interesting - suggests most of the time is spent
doing non pqcd things like libc and libmemcached. Let's look at them:
$ opreport -l /lib/libc-2.12.1.so
CPU: AMD64 processors, speed 2000.48 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples  %        image name      app name            symbol name
13375    59.4815  libc-2.12.1.so  pqcd                /lib/libc-2.12.1.so
6333     28.1642  libc-2.12.1.so  php                 /lib/libc-2.12.1.so
1614      7.1778  libc-2.12.1.so  memcached           /lib/libc-2.12.1.so
423       1.8812  libc-2.12.1.so  multiload-applet-2  /lib/libc-2.12.1.so
219       0.9739  libc-2.12.1.so  bash                /lib/libc-2.12.1.so
193       0.8583  libc-2.12.1.so  oprofiled           /lib/libc-2.12.1.so
61        0.2713  libc-2.12.1.so  Xorg                /lib/libc-2.12.1.so
49        0.2179  libc-2.12.1.so  ophelp              /lib/libc-2.12.1.so
38        0.1690  libc-2.12.1.so  compiz              /lib/libc-2.12.1.so
35        0.1557  libc-2.12.1.so  gawk                /lib/libc-2.12.1.so
26        0.1156  libc-2.12.1.so  postgres            /lib/libc-2.12.1.so
Hmm - so most of our libc time is used by pqcd. What about memcached:
$ opreport -l /usr/bin/memcached
CPU: AMD64 processors, speed 2000.48 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 100000
samples % image name symbol name
5682 44.3283 memcached /usr/bin/memcached
1866 14.5577 libevent-1.4.so.2.1.3 /usr/lib/libevent-1.4.so.2.1.3
1614 12.5917 libc-2.12.1.so /lib/libc-2.12.1.so
790 6.1632 libpthread-2.12.1.so pthread_mutex_lock
759 5.9214 [vdso] (tgid:1116 range:0x3ec000-0x3ed000) [vdso] (tgid:1116 range:0x3ec000-0x3ed000)
485 3.7837 librt-2.12.1.so /lib/librt-2.12.1.so
280 2.1844 libpthread-2.12.1.so __pthread_mutex_unlock_usercnt
262 2.0440 libpthread-2.12.1.so __i686.get_pc_thunk.bx
259 2.0206 libpthread-2.12.1.so pthread_mutex_unlock
259 2.0206 libpthread-2.12.1.so sendmsg
258 2.0128 libpthread-2.12.1.so read
193 1.5057 libpthread-2.12.1.so __read_nocancel
66 0.5149 libpthread-2.12.1.so __pthread_enable_asynccancel
45 0.3511 libpthread-2.12.1.so __pthread_disable_asynccancel
Hmm - not sure why memcached is showing up in its own profile like that....
Anyway, I'd guess that the pqcd -> memcached communication is a good place
to start looking for things to improve. Casting an eye over
pqc_check_cache_avail, which is the top actual function in pqcd's profile,
I can't see anything obvious that would be slow. There is a call to
pqc_get_cache, which in turn does memcached_get. I'd be tempted to put
timers on those functions and see what percentage of time is spent therein!
Cheers
Mark
Here's an interesting data point: I built (what I think is) the Pgpool
version (3.4.1) you have used in Pqc and added that to the mix:
statement (avg over 1000 repetitions) | pqc (ms) | pgpool (ms) | postgres (ms)
--------------------------------------+----------+-------------+--------------
SELECT 1                              |     0.12 |        0.13 |          0.07
So Pgpool's overhead appears to be about the same as what we are seeing
from Pqc; maybe looking at what extra processing Pgpool is doing in its
Query function would be interesting.
Cheers
Mark
Apologies - it would have been better if I had not placed the Pgpool and
Pqc data in the wrong columns! The correct table:
statement (avg over 1000 repetitions) | pqc (ms) | pgpool (ms) | postgres (ms)
--------------------------------------+----------+-------------+--------------
SELECT 1                              |     0.13 |        0.12 |          0.07
Also note that I'm using a slightly slower machine (2 GHz vs 2.7 GHz
previously), which is why the native Postgres elapsed time is 0.07 instead
of 0.05 (interesting that the Pqc measurement is exactly the same).
regards
Mark