I think this would be very good feature but in general I would like to
see a bit more broad notion of priorities attached to the index pages.
This can range from "never replace"=pinned to be first candidates
for replacement.
For example full table scans can place pages with low priority, so
large table scans do not wipe out buffer pool. Non-leaf pages also
can get higher priority than leaf pages etc.
--
Peter Zaitsev, CEO, Percona Inc.
Tel: +1 888 401 3401 ext 501
24/7 Emergency Line +1 888 401 3401 ext 911
Our Services: http://www.percona.com/services.html
Our Blog: http://www.mysqlperformanceblog.com/
How complex do you think these are?
Peter Zaitsev wrote:
> Baron,
>
> I think this would be very good feature but in general I would like to
> see a bit more broad notion of priorities attached to the index pages.
> This can range from "never replace"=pinned to be first candidates
> for replacement.
>
> For example full table scans can place pages with low priority, so
> large table scans do not wipe out buffer pool. Non-leaf pages also
> can get higher priority than leaf pages etc.
>
>
>
>
> Saturday, January 3, 2009, 11:32:51 AM, you wrote:
>
>
>> What do you think about the possibility to pin a table (or an index --
>> if you want the "table" you pin the PRIMARY index) into the buffer
>> pool so its pages don't get replaced by something else?
>
>
>
>
--
Baron Schwartz, Director of Consulting, Percona Inc.
Tel: +1 888 401 3401 ext 507 (Timezone: EST5EDT; GMT-5/GMT-4)
Skype: baron.schwartz AIM: bps7j
Yasufumi would be good to check. In general I assume some information
will need to be tracked in the buffer pool to track priority of pages
and other information. I bet there is extra space for it on the page
(for example both checksum slots can be used because checksum is only
counted on write our)
When we will need to change the main replacement policy from LRU to be
something smarter.
Finally interface needs to be added to set priority for pages which
are being read (if it is some pinned tables or something else).
Pinning tables also has other issue - we need to store this pinned
flag somewhere if mapping is permanent (ie MyISAM key cache allocation
is not)
BTW looking at midpoint insertion strategy in MyISAM could be also
interesting for ideas for Innodb.
Sunday, January 4, 2009, 3:24:50 PM, you wrote:
> Peter,
> How complex do you think these are?
> Peter Zaitsev wrote:
>> Baron,
>>
>> I think this would be very good feature but in general I would like to
>> see a bit more broad notion of priorities attached to the index pages.
>> This can range from "never replace"=pinned to be first candidates
>> for replacement.
>>
>> For example full table scans can place pages with low priority, so
>> large table scans do not wipe out buffer pool. Non-leaf pages also
>> can get higher priority than leaf pages etc.
>>
>>
>>
>>
>> Saturday, January 3, 2009, 11:32:51 AM, you wrote:
>>
>>
>>> What do you think about the possibility to pin a table (or an index --
>>> if you want the "table" you pin the PRIMARY index) into the buffer
>>> pool so its pages don't get replaced by something else?
>>
>>
>>
>>
--
Peter Zaitsev, CEO, Percona Inc.
Tel: +1 888 401 3401 ext 501
If we do that, can we modify the undo-slots patch so there are no backwards
compatibility problems?
I think we need to check how such things are done.
I think in 5.1 you can have various options specified in CREATE TABLE
syntax.
The hack many tables uses (like FEDERATED) is using special table
comment.
> Is it possible to create supplement file with our extended data, like
> pinning flag for the table?
--
Peter Zaitsev, CEO, Percona Inc.
Tel: +1 888 401 3401 ext 501
There are two things. Here is one example: a customer wants to have a hot
standby master-master replication, but does not want to send any reads to
the standby machine. Since it has replication's write workload but not the
read, its buffer pool is not ready to support reads after a failover. I
have already measured the difference in the buffer pool with a write-only
workload and with a read-write workload, and it is significant. It might be
useful to pin the most important tables in the buffer pool so the write
workload doesn't push them out.
I admit that pinning is a bad solution for this, and I think actually the
"right" solution is to mirror the read workload onto the standby server. I
don't like pinning either.
The second thing is what you pointed out -- your idea makes a lot of sense.
I think the best test is this: benchmark on a workload that has a mixture of
a) single-row lookups by PK and other indexes, and b) full table and index
scans. But this requires you to actually make the change you suggested :)
Baron
--
> For example, OracleDB don't give LRU priority to "(whole) scanned"
> datapages.
> So, I never have used pinning table of OracleDB's.
> I think "pinning table" is special function for specific cases.
This is only one of the reasons. There are other cases when you simply
want to have real time access to the table such as it is in memory.
> Currently, my simplest first idea is
> giving different LRU priority to "leaf pages" and "not leaf pages
> (root, branch)".
> InnoDB datapages have flag shows the level in the btree (0:leaf,
> max:root).
> So, I think we can implement simply by using also an another LRU list
> (for not leaf pages)
> without any change of the datafile format or compatibility.
What does data file have to do with it at all ?
Honestly I think it is best to split notion of the priority for the
pages and when how this priority is set.
This would allow to do a lot of cool stuff easily... For example for
batch jobs (even not full table scans) we may want to lower their
priority so pages touched by these jobs are not high priority. Leaf,
Full Table scans are also different ways to set priority.
--
Peter Zaitsev, CEO, Percona Inc.
Tel: +1 888 401 3401 ext 501
Right, mk-table-checksum queries are not full scans (if you're wisely using
--chunksize and if you have a usable index) but I wouldn't want them to
pollute the cache if I could help it. Right now I can't.
I think we can talk about a lot of complex ways to do this. All models are
wrong but some models are useful. Is there a simple/elegant/whatever model
that has a lot of benefits because it approximates real-world needs better
than just LRU?
I think straightforward leaf/nonleaf is probably worth looking into.
I think it is worth checking what others are using. MyISAM ? Maria ?
PostgreSQL ? PBXT ? What file caches use in Linux ? Solaris ?
Some interesting links are here:
http://www.informatik.uni-trier.de/~ley/db/dbimpl/buffer.html
http://en.wikipedia.org/wiki/Cache_algorithms
http://en.wikipedia.org/wiki/Page_replacement_algorithm
http://en.wikipedia.org/wiki/Adaptive_Replacement_Cache
> I think we can talk about a lot of complex ways to do this. All models are
> wrong but some models are useful. Is there a simple/elegant/whatever model
> that has a lot of benefits because it approximates real-world needs better
> than just LRU?
> I think straightforward leaf/nonleaf is probably worth looking into.
--
> I think there are several properties of implementation.
> [volatile or static?]
> volatile: implement as new command like "FLUSH TABLE"?
> (difficulty) new plugin specific command implementation (?)
> static: implement as new attribution of "CREATE TABLE"?
> (difficulty) where the attribution is stored?
> we must change the datafile format to store the new
> attribution
> (or it may be internal table? are there any problem?)
We need to check what 5.1 offers as storing table properties in .frm
file - I think it was greatly extended.
See for example:
http://dev.mysql.com/doc/refman/5.0/en/federated-use.html
Federated engine adds CONNECTION option. I guess cache policy should
be done similar way.
> * Adding the new attribution to the entries of the online dictionary
> cache
> may not cause compatibility problem of datafiles.
Note this is what we say on per table specification. We also may have
it dynamic on connection/query basics.
> [passive or active?]
> passive: give high priority to the specified tables/indexes.
> active: + read the all pages of the specified tables/indexes at the
> time.
> * "active" may have more things (exception cases?) to consider than
> "passive"
What do you mean by reading all pages by tables/indexes ? I think
this is rather prefetch which is independent from anything else.
If table is pinned we can also do prefetch for it (I think we need a
command to be able to prefetch table anyway)
> * Anyway, I think the all of read/create pages must look up the
> dictionary
> to check the attribution and set the new flag of the page.
Yeah. This is something what needs to be efficient. Though really it
can be handled similar way as compression properties or
compact/redundant format exist now - there is already some space to
store options in Innodb