_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
Send new Informix-list mailing list postings to
inform...@iiug.org
If replying to a posting, the message number must appear on the
Subject line. This will maintain the discussion thread. The message
number is inside square brackets (e.g. [102]). You may edit the rest
of the Subject line.
To subscribe or unsubscribe log in to the Member Area of
http://www.iiug.org and click on "Subscribe to E-mail Lists".
You can reach the person managing the list at
informix-...@iiug.org
Today's Topics:
1. Best practices for IDS on virtual machine? (R. Spitz)
2. Re: Best practices for IDS on virtual machine? (Nick Lello)
3. Re: Best practices for IDS on virtual machine? (Art Kagel)
4. Execution Plan (tuser...@gmail.com)
5. Re: Execution Plan (Art Kagel)
6. Re: Execution Plan (tuser...@gmail.com)
7. Re: Execution Plan (Art Kagel)
---------- Forwarded message ----------
From: "R. Spitz" <rspi...@googlemail.com>
To: inform...@iiug.org
Cc:
Date: Thu, 11 Apr 2013 13:05:29 -0700 (PDT)
Subject: Best practices for IDS on virtual machine?
Hi Informixers,
I'm system and database admin for our departmental IDS server. Currently we are running IDS 11.50 on Suse Linux Enterprise Server (SLES) 10 (32bit) on a 5 year old Xeon machine. Overall data and transaction volume is rather modest, total amount of data is about 3GB. Performance has never been an issue with the current hardware, but now its time to plan for a replacement server.
Company policy dictates that dedicated server hardware for departmental servers is to be avoided, so it will have to be a virtual machine under VMWare ESX. OS will be SLES 11 64bit, and we will probably move to the latest IDS 12.10 version.
I've always used raw devices on partitions in RAID1 configuration, but what is the best way to go in a virtual environment? I'm aware that IO performance might be rather poor compared to physical hardware, but I have no choice. The ESX admins promised me storage space on a RAID10 LUN, how should I configure this?
Raw devices seem to be a thing of the past. Is ext2 with DIRECT_IO (if available in IDS Workgroup Edition) still the recommended file system type? I'm unsure about the number of chunks and dbspaces in the virtual Environment: Does it make sense to use separate chunks and dbspaces for rootdbs, data and logical logs, or should I put everything into one large rootdbs since I cannot influence the physical storage anyway?
I'd appreciate any pointers to "best practices" in the virtual environment.
Regards, Richard
---------- Forwarded message ----------
From: Nick Lello <nick....@rentrakmail.com>
To: "R. Spitz" <rspi...@googlemail.com>
Cc: inform...@iiug.org
Date: Thu, 11 Apr 2013 21:14:11 +0100
Subject: Re: Best practices for IDS on virtual machine?ext2 with DIRECT_IO worked well for me under VMWare ESXi .... however get as much memory as possible assigned to the VM and make your buffers larger than normal.Expect slower than usual checkpoints and adjust to cater.Try to persuade the VMWare admins to give your VM higher than normal disk priority.. every little helps.
On 11 April 2013 21:05, R. Spitz <rspi...@googlemail.com> wrote:
Hi Informixers,
I'm system and database admin for our departmental IDS server. Currently we are running IDS 11.50 on Suse Linux Enterprise Server (SLES) 10 (32bit) on a 5 year old Xeon machine. Overall data and transaction volume is rather modest, total amount of data is about 3GB. Performance has never been an issue with the current hardware, but now its time to plan for a replacement server.
Company policy dictates that dedicated server hardware for departmental servers is to be avoided, so it will have to be a virtual machine under VMWare ESX. OS will be SLES 11 64bit, and we will probably move to the latest IDS 12.10 version.
I've always used raw devices on partitions in RAID1 configuration, but what is the best way to go in a virtual environment? I'm aware that IO performance might be rather poor compared to physical hardware, but I have no choice. The ESX admins promised me storage space on a RAID10 LUN, how should I configure this?
Raw devices seem to be a thing of the past. Is ext2 with DIRECT_IO (if available in IDS Workgroup Edition) still the recommended file system type? I'm unsure about the number of chunks and dbspaces in the virtual Environment: Does it make sense to use separate chunks and dbspaces for rootdbs, data and logical logs, or should I put everything into one large rootdbs since I cannot influence the physical storage anyway?
I'd appreciate any pointers to "best practices" in the virtual environment.
Regards, Richard
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
--
Nick Lello | Web Architect
o +44 (0) 843.330.9374 | o 503.284.7581 ext. 418 | m +44 (0) 750.890.3456 | Skype: nicholas.lello
Regards, Richard
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
---------- Forwarded message ----------
From: tuser...@gmail.com
To: inform...@iiug.org
Cc:
Date: Fri, 12 Apr 2013 06:59:46 -0700 (PDT)
Subject: Execution Plan
Hi All,
Given a statement that is prepared this way
select
p.*
from
person p
join address a on a.person_id = p.id
where
(0 = $1 or p.name = $2)
and (0 = $3 or a.city = $4)
where $1-$4 are host variables and there is a non-unique index on both p.name and c.city - what execution plan would be used?
Would conditional clause selection like this always do a table scan on both tables, because it doesn't know prior to execution which indices to use?
Or does table statistics aid in avoiding table scan and choose the indices that are present?
Will the plan creation be deferred until the first execution with values for the host variables?
Thoughts on this article?
http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
>From the Performance Guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.
Thanks,
Tom
---------- Forwarded message ----------
From: Art Kagel <art....@gmail.com>
To: tuser...@gmail.com
Cc: inform...@iiug.org
Date: Fri, 12 Apr 2013 10:16:54 -0400
Subject: Re: Execution Plan
Tom:
I don't have any easy answer for you, but I do have a few questions:
- What version of Informix are you using? Different versions will optimize this join differently.
- What host language are you using that uses $<num> as a host var that is not replaced until OPEN time?
- Move the filters into the ON clause of the JOIN otherwise the engine will have to perform a sequential scan of one or both tables, write the joined tuples into a temp table and perform the filtered fetches from the temp table even if you supplied constant values instead of replaceable parameters. ANSO SQL rules require that filters in the WHERE clause be applied post-join requiring the temp table.
- Why not just run the thing with SET EXPLAIN enabled or the EXPLAIN optimizer directive in it (with or without the AVOID_EXECUTE option) and look at the actual query plan rather than have us speculate?
- Also try this as a UNION (OK three UNIONed SELECTs) instead of the OR conditions.
- Also try it with the address table in a non-correlated subquery (both with the OR and with a UNION instead) like:
AND p.id in (select a.id from address as a where a.city = $4)There are many ways to write a SELECT statement and get the same results. If you have not tested them all, you may not be using the best one.
ArtArt S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
On Fri, Apr 12, 2013 at 9:59 AM, <tuser...@gmail.com> wrote:Hi All,
Given a statement that is prepared this way
select
p.*
from
person p
join address a on a.person_id = p.id
where
(0 = $1 or p.name = $2)
and (0 = $3 or a.city = $4)
where $1-$4 are host variables and there is a non-unique index on both p.name and c.city - what execution plan would be used?
Would conditional clause selection like this always do a table scan on both tables, because it doesn't know prior to execution which indices to use?
Or does table statistics aid in avoiding table scan and choose the indices that are present?
Will the plan creation be deferred until the first execution with values for the host variables?
Thoughts on this article?
http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
>From the Performance Guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.
Thanks,
Tom
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
---------- Forwarded message ----------
From: tuser...@gmail.com
To: inform...@iiug.org
Cc:
Date: Fri, 12 Apr 2013 08:04:06 -0700 (PDT)
Subject: Re: Execution Plan
Thanks much, Art
Will answer more of the questions in a bit - I am working through some developers in this posting.
We are on version 11.7.FC5
as an added FYI from one of the team :
I do not know if this is worth adding or not.
A common coding practice is to make use of short circuiting. If you know something will be false, you can use that to avoid executing the rest of the line - goDoAnExpensiveAndLongThing() :
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
<skipped>
Even changing that into:
if( 4==4 || tryThisTheHardWay()) {
<executed and avoided the hard way>
That same approach would not have the same impact here:
where
(0 = $1 or p.name = $2)
Since SQL statements are not executed in a sequential manner, I am not sure if it is worth highlighting that difference. This might be absolutely obvious to everyone already.
---------- Forwarded message ----------
From: Art Kagel <art....@gmail.com>
To: tuser...@gmail.com
Cc: inform...@iiug.org
Date: Fri, 12 Apr 2013 11:31:30 -0400
Subject: Re: Execution Plan
Wait, are those filter "zero equal $1" and "zero equal $3"? I thought that you just had a column named letter O! OK, those will evaluate to false if $1 and $3 are not also zero. That's not a short circuit, especially since they are OR'd with the other filters. They will be evaluated every time.
Informix 11.70 supports multi-index scans so it's possible that multiple indexes could be used to satisfy the multiple filters, but we'd need to know the indexing structure. Again, the best thing to do is to try all of the various versions of the select under SET EXPLAIN, time them and see which runs fastest and has the most reasonable query plan (since the specific values plugged into replaceable parameters can change the query plan you always want a version of the select that will produce a reasonable plan for any given value). Make sure the check the query plans to make sure that your data distributions are up-to-date.
Note that prepared queries with replaceable parameters go through a final optimization step after the parameter values are known at OPEN time (or the first FETCH if the open is deferred).
Art
Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
On Fri, Apr 12, 2013 at 11:04 AM, <tuser...@gmail.com> wrote:Thanks much, Art
Will answer more of the questions in a bit - I am working through some developers in this posting.
We are on version 11.7.FC5
as an added FYI from one of the team :
I do not know if this is worth adding or not.
A common coding practice is to make use of short circuiting. If you know something will be false, you can use that to avoid executing the rest of the line - goDoAnExpensiveAndLongThing() :
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
<skipped>
Even changing that into:
if( 4==4 || tryThisTheHardWay()) {
<executed and avoided the hard way>
That same approach would not have the same impact here:
where
(0 = $1 or p.name = $2)
Since SQL statements are not executed in a sequential manner, I am not sure if it is worth highlighting that difference. This might be absolutely obvious to everyone already.
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
2013/4/12 R. Spitz <rspi...@googlemail.com>
Thanks for the quick replies. After a quick look at what the IDS 12.10 admin guide says, I'm thoroughly confused:
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
- IBM's Hypervisor has the lowest overhead and highest IO rates.
- If you must use VMWare make sure you are using the vSphere hypervisor, it's IO is about double the older hypervisors from VMWare or running without one.
- Yes RAID10.
- As much as possible make sure you do not have competition for the physical disks underneath your LUNS, ESPECIALLY not Windows Filesystems and NEVER email stores. The access patterns are just too different from database access patterns.
- EXT2 with DIRECT_IO, yes.
- Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes. Informix assigns IO resources per dbspace and per chunk depending on specifics. More dbspaces and more chunks can lead to more total IO throughput. Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
Regards, Richard
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
--
Nick Lello | Web Architect
o +44 (0) 843.330.9374 | o 503.284.7581 ext. 418 | m +44 (0) 750.890.3456 | Skype: nicholas.lello
---------- Forwarded message ----------
From: Art Kagel <art....@gmail.com>
To: "R. Spitz" <rspi...@googlemail.com>
Cc: inform...@iiug.org
Date: Thu, 11 Apr 2013 16:20:50 -0400
Subject: Re: Best practices for IDS on virtual machine?
VM Best practices for databases:
- IBM's Hypervisor has the lowest overhead and highest IO rates.
- If you must use VMWare make sure you are using the vSphere hypervisor, it's IO is about double the older hypervisors from VMWare or running without one.
- Yes RAID10.
- As much as possible make sure you do not have competition for the physical disks underneath your LUNS, ESPECIALLY not Windows Filesystems and NEVER email stores. The access patterns are just too different from database access patterns.
- EXT2 with DIRECT_IO, yes.
- Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes. Informix assigns IO resources per dbspace and per chunk depending on specifics. More dbspaces and more chunks can lead to more total IO throughput. Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
ArtArt S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
On Thu, Apr 11, 2013 at 4:05 PM, R. Spitz <rspi...@googlemail.com> wrote:
Hi Informixers,
I'm system and database admin for our departmental IDS server. Currently we are running IDS 11.50 on Suse Linux Enterprise Server (SLES) 10 (32bit) on a 5 year old Xeon machine. Overall data and transaction volume is rather modest, total amount of data is about 3GB. Performance has never been an issue with the current hardware, but now its time to plan for a replacement server.
Company policy dictates that dedicated server hardware for departmental servers is to be avoided, so it will have to be a virtual machine under VMWare ESX. OS will be SLES 11 64bit, and we will probably move to the latest IDS 12.10 version.
I've always used raw devices on partitions in RAID1 configuration, but what is the best way to go in a virtual environment? I'm aware that IO performance might be rather poor compared to physical hardware, but I have no choice. The ESX admins promised me storage space on a RAID10 LUN, how should I configure this?
Raw devices seem to be a thing of the past. Is ext2 with DIRECT_IO (if available in IDS Workgroup Edition) still the recommended file system type? I'm unsure about the number of chunks and dbspaces in the virtual Environment: Does it make sense to use separate chunks and dbspaces for rootdbs, data and logical logs, or should I put everything into one large rootdbs since I cannot influence the physical storage anyway?
I'd appreciate any pointers to "best practices" in the virtual environment.
Regards, Richard
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
ArtArt S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
On Fri, Apr 12, 2013 at 9:59 AM, <tuser...@gmail.com> wrote:
Hi All,
Given a statement that is prepared this way
select
p.*
from
person p
join address a on a.person_id = p.id
where
(0 = $1 or p.name = $2)
and (0 = $3 or a.city = $4)
where $1-$4 are host variables and there is a non-unique index on both p.name and c.city - what execution plan would be used?
Would conditional clause selection like this always do a table scan on both tables, because it doesn't know prior to execution which indices to use?
Or does table statistics aid in avoiding table scan and choose the indices that are present?
Will the plan creation be deferred until the first execution with values for the host variables?
Thoughts on this article?
http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic
>From the Performance Guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.
Thanks,
Tom
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
Art
Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
On Fri, Apr 12, 2013 at 11:04 AM, <tuser...@gmail.com> wrote:
Thanks much, Art
Will answer more of the questions in a bit - I am working through some developers in this posting.
We are on version 11.7.FC5
as an added FYI from one of the team :
I do not know if this is worth adding or not.
A common coding practice is to make use of short circuiting. If you know something will be false, you can use that to avoid executing the rest of the line - goDoAnExpensiveAndLongThing() :
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
<skipped>
Even changing that into:
if( 4==4 || tryThisTheHardWay()) {
<executed and avoided the hard way>
That same approach would not have the same impact here:
where
(0 = $1 or p.name = $2)
Since SQL statements are not executed in a sequential manner, I am not sure if it is worth highlighting that difference. This might be absolutely obvious to everyone already.
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
--Peace
Colin McDermott
- IBM's Hypervisor has the lowest overhead and highest IO rates.
- If you must use VMWare make sure you are using the vSphere hypervisor, it's IO is about double the older hypervisors from VMWare or running without one.
- Yes RAID10.
- As much as possible make sure you do not have competition for the physical disks underneath your LUNS, ESPECIALLY not Windows Filesystems and NEVER email stores. The access patterns are just too different from database access patterns.
- EXT2 with DIRECT_IO, yes.
- Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes. Informix assigns IO resources per dbspace and per chunk depending on specifics. More dbspaces and more chunks can lead to more total IO throughput. Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
Apologies for the previous mispost, Moderators please delete.I notice that you recommend ext2 with DIRECT_IO. Is this to avoid the journaling overhead of ext3? Is there a substantial performance boost in switching to ext2? Is JFS or Reiserfs better for this. I take it that the Plog and Llog should remove the need for the journal in ext3.
Cesar, your timings for read-reread using the OS cache are credible and I will say something about that at the end. However, note that whether you use RAW or COOKED without DIRECT_IO or COOKED with DIRECT_IO your data is safe! When you disable DIRECT_IO and are using COOKED chunks, Informix opens all chunk files, except temp dbspaces, with the O_SYNC flag enabled which causes any writes to those files to be immediately flushed to disk before acknowledging the write as complete. Informix in turn does not complete any transaction on an unbuffered log database unless the logical log buffer write containing the COMMIT record has been acknowledged by the OS that means that your data is safe! Now if you are using BUFFERED logging, then there is a small risk that transactions accumulating in the log buffers may not be written to disk until the buffer fills. That opens a window of risk, but the risk is the same for RAW and COOKED with DIRECT_IO if you use buffered logging.
Now to your timings. Yes, reading and rereading the same data from the buffer cache will be faster with COOKED files than with RAW or DIRECT_IO which bypass the buffer cache. However, you did not present any timings for writing! That is where RAW and DIRECT_IO improve your performance. With COOKED files and no DIRECT_IO Informix is writing from its buffers to the OS buffers and the O_SYNC flag is forcing the OS to wake a sync thread to write the newly dirtied cache pages to disk. Informix has to wait for the extra memory copy then for the OS IO scheduler to wake an IO thread and perform the write. With RAW devices and with DIRECT_IO enabled for COOKED files, the copy from Informix memory to OS memory is skipped and an Informix AIO VP performs the write directly or a KAIO thread schedules the asynchronous IO in the OS kernel's asynchronous IO service and returns to work without having to wait. This is at least 25-20% faster than an O_SYNC write which is completely synchronous and so must be performed by Informix's AIO VPs.
Test it!
Art
Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
2013/4/12 R. Spitz <rspi...@googlemail.com>
Thanks for the quick replies. After a quick look at what the IDS 12.10 admin guide says, I'm thoroughly confused:
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
Regards, Richard
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list