
Best practices for IDS on virtual machine?


R. Spitz

Apr 11, 2013, 4:05:29 PM
to
Hi Informixers,

I'm system and database admin for our departmental IDS server. Currently we are running IDS 11.50 on SUSE Linux Enterprise Server (SLES) 10 (32-bit) on a 5-year-old Xeon machine. Overall data and transaction volume is rather modest; the total amount of data is about 3GB. Performance has never been an issue with the current hardware, but now it's time to plan for a replacement server.

Company policy dictates that dedicated server hardware for departmental servers is to be avoided, so it will have to be a virtual machine under VMware ESX. The OS will be SLES 11 64-bit, and we will probably move to the latest IDS 12.10 version.

I've always used raw devices on partitions in a RAID1 configuration, but what is the best way to go in a virtual environment? I'm aware that IO performance might be rather poor compared to physical hardware, but I have no choice. The ESX admins promised me storage space on a RAID10 LUN; how should I configure this?

Raw devices seem to be a thing of the past. Is ext2 with DIRECT_IO (if available in IDS Workgroup Edition) still the recommended file system type? I'm also unsure about the number of chunks and dbspaces in the virtual environment: does it make sense to use separate chunks and dbspaces for rootdbs, data and logical logs, or should I put everything into one large rootdbs, since I cannot influence the physical storage anyway?

I'd appreciate any pointers to "best practices" in the virtual environment.

Regards, Richard

Nick Lello

Apr 11, 2013, 4:14:11 PM
to R. Spitz, inform...@iiug.org
ext2 with DIRECT_IO worked well for me under VMware ESXi. However, get as much memory as possible assigned to the VM and make your buffers larger than normal.

Expect slower-than-usual checkpoints and adjust to cater for them.

Try to persuade the VMware admins to give your VM higher-than-normal disk priority; every little helps.
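Nick's buffer advice maps to the BUFFERPOOL onconfig parameter. A sketch only, with hypothetical values that depend on how much RAM the VM is given (check them against your version's docs):

```
# Hypothetical $INFORMIXDIR/etc/$ONCONFIG fragment -- sizes are examples only
BUFFERPOOL size=4k,buffers=500000,lrus=16,lru_min_dirty=60,lru_max_dirty=70
RTO_SERVER_RESTART 60   # recovery-time target; lets the engine pace checkpoints
AUTO_CKPTS 1            # enable automatic checkpoints
```

A larger buffer pool also means each checkpoint may flush more dirty pages at once, which is part of why checkpoints feel slower in a VM.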



_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list



--
 
 
Nick Lello | Web Architect
o +44 (0) 843.330.9374 | 503.284.7581 ext. 418 | m +44 (0) 750.890.3456 | Skype: nicholas.lello
Email: nick.lello at rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT

Art Kagel

Apr 11, 2013, 4:20:50 PM
to R. Spitz, inform...@iiug.org
VM Best practices for databases:

  • IBM's hypervisor has the lowest overhead and highest IO rates.
  • If you must use VMware, make sure you are using the vSphere hypervisor; its IO is about double that of the older hypervisors from VMware, or of running without one.
  • Yes, RAID10.
  • As much as possible, make sure you do not have competition for the physical disks underneath your LUNs, ESPECIALLY not Windows filesystems and NEVER email stores.  The access patterns are just too different from database access patterns.
  • ext2 with DIRECT_IO, yes.
  • Yes to separate dbspaces for high-access data like rootdbs, logical logs, physical logs, and high-access tables and indexes.  Informix assigns IO resources per dbspace and per chunk depending on specifics; more dbspaces and more chunks can lead to more total IO throughput.  Keeping logical logs on different LUNs from physical logs and from high-IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
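As a concrete sketch of that layout (paths, names, and sizes here are hypothetical; onspaces and onparams are the standard utilities, but verify the flags against your version's docs):

```shell
# Cooked chunk files for separate dbspaces; Informix wants them
# owned by informix:informix with mode 660.
touch /ifxchunks/datadbs.c1 /ifxchunks/llogdbs.c1
chown informix:informix /ifxchunks/*.c1
chmod 660 /ifxchunks/*.c1

# Create a data dbspace and a dbspace for the logical logs (-s is size in KB)
onspaces -c -d datadbs -p /ifxchunks/datadbs.c1 -o 0 -s 4000000
onspaces -c -d llogdbs -p /ifxchunks/llogdbs.c1 -o 0 -s 1000000

# Then add logical logs in the new dbspace and drop the ones left in rootdbs
onparams -a -d llogdbs -s 100000
```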

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.


R. Spitz

Apr 12, 2013, 12:14:51 PM
to
Thanks for the quick replies. After a quick look at what the IDS 12.10 admin guide says, I'm thoroughly confused:

"Important: While you must use raw disk devices on UNIX to achieve better performance, recent advances in I/O caching for cooked writes can provide similar if not better performance. [...] If optimum performance is unimportant, you can configure the database server to store data in cooked files. Cooked files are easier to set up than raw disk devices."

This is a little contradictory in itself. Do raw devices still deliver the best IO performance or not? On the other hand, I'm not even sure I would be able to use raw devices in the virtual environment.

If I go the route of an ext2 filesystem with cooked files, what is the best strategy: use one big ext2-formatted partition and create a file for each chunk, or use several partitions with just one chunk file in each partition?

Regards, Richard

Cesar Inacio Martins

Apr 12, 2013, 8:19:19 PM
to R. Spitz, informix-list cdi
Hi Richard,

Check this practical test of the effect of the cooked-file cache (focused on performance):

 | tablename      = my_table                  
 | size allocated = 3168 MB      
 | size used      = 2974 MB
 | rows           = 33.120.000

I reconfigured my test instance with only 40MB of 4k-page buffers (where the data is allocated):
 | $ onstat -
 | IBM Informix Dynamic Server Version 11.70.FC6 -- On-Line -- Up 00:05:43 -- 688536 Kbytes
 | $ onstat -b | grep buff
 |  0 modified, 10000 total, 16384 hash buckets, 2048 buffer size
 |  0 modified, 10000 total, 16384 hash buckets, 4096 buffer size
 |  0 modified, 10000 total, 16384 hash buckets, 8192 buffer size


The machine: openSUSE 12.2 with 4GB memory, running under VMware ESXi with 1 SATA disk (this is a desktop I use for tests).
The chunks are on an ext2 partition, but I'm not using DIRECT_IO, which is what makes all the difference here, because with it the chunks would behave like RAW devices.
 | $ mount | grep ifx
 | /dev/mapper/vgifxdados-lvifxdisco1 on /ifxdados type ext2 (rw,noatime)

Here I clear the Linux cache:
 | jdivm06:~ # echo 3 > /proc/sys/vm/drop_caches

This output is from the "dstat -tmd 5" utility (an improved sar):
 | ----system---- ------memory-usage----- -dsk/total-
 |      time     | used  buff  cach  free| read  writ
 | 12-04 20:29:36| 429M 28.6M 2384M 1022M| 211k   75k
 | 12-04 20:29:41| 429M 28.6M 2384M 1022M|   0     0
 | 12-04 20:29:46| 429M 28.6M 2384M 1022M|   0     0
 | 12-04 20:29:51| 215M  264k  298M 3350M|  16k    0   <<<< drop caches run here
 | 12-04 20:29:54| 214M  264k  298M 3351M|5461B    0

I run this select forcing a full scan. Remember: at least 2.9 GB will be read from disk, and the engine has only 40 MB of buffers.
(I don't know why the output of my "time" command ran together on one line; anyway, you can see it took 1 minute to run.)
 | $ echo "select {+full (my_table)} count(1) from my_table" | time dbaccess testedb
 | Database selected.
 |          (count)
 |         33120929
 | 1 row(s) retrieved.
 | Database closed.
 | 0.00 user 0.00 system 1:02.78 elapsed 0%CPU (0 avgtext+0avgdata 11248maxresident)k
 | 0inputs+0outputs (0major+781minor)pagefaults 0swaps

dstat output: check the memory usage and disk reads.
 | ----system---- ------memory-usage----- -dsk/total-
 |      time     | used  buff  cach  free| read  writ
 | 12-04 20:34:46| 214M  232k  320M 3329M| 958k    0
 | 12-04 20:34:51| 213M  232k  320M 3330M|   0     0
 | 12-04 20:34:56| 217M  336k  455M 3192M|  21M    0  <<<<< start the select here
 | 12-04 20:35:01| 219M  636k  755M 2888M|  60M    0
 | 12-04 20:35:06| 217M  852k  970M 2675M|  43M    0
 | 12-04 20:35:11| 217M 1032k 1151M 2494M|  36M    0
 | 12-04 20:35:16| 217M 1120k 1238M 2407M|  17M    0
 | 12-04 20:35:21| 218M 1160k 1284M 2360M|9162k  401k
 | 12-04 20:35:26| 217M 1440k 1556M 2088M|  54M  345k
 | 12-04 20:35:31| 217M 1684k 1801M 1843M|  49M    0
 | 12-04 20:35:36| 218M 1928k 2044M 1600M|  49M    0
 | 12-04 20:35:41| 218M 2264k 2379M 1264M|  67M    0
 | 12-04 20:35:46| 219M 2472k 2587M 1054M|  42M    0
 | 12-04 20:35:51| 219M 2720k 2839M  803M|  50M    0
 | 12-04 20:35:56| 220M 3108k 3224M  416M|  77M    0  <<<<<< finished here
 | 12-04 20:36:01| 219M 3200k 3319M  322M|  19M    0  
 | 12-04 20:36:06| 220M 3200k 3319M  322M|   0     0
 | 12-04 20:36:11| 220M 3200k 3319M  322M|   0   211k
 | 12-04 20:36:16| 220M 3200k 3319M  322M|   0     0

 
You must agree we will not get much help from the engine's buffer pool here, right?
So if you went with RAW devices or DIRECT_IO here and ran the same statement again, it would take about the same time.
With cooked files without DIRECT_IO, it takes only 9 seconds...

 | $ echo "select {+full (my_table)} count(1) from my_table" | time dbaccess testedb
 | Database selected.
 |          (count)
 |         33120929
 | 1 row(s) retrieved.
 | Database closed.
 | 
 | 0.00user 0.00system 0:09.23elapsed 0%CPU (0avgtext+0avgdata 11248maxresident)k
 | 0inputs+0outputs (0major+781minor)pagefaults 0swaps

dstat output:
 | ----system---- ------memory-usage----- -dsk/total-
 |      time     | used  buff  cach  free| read  writ
 | 12-04 20:37:16| 220M 3200k 3319M  321M|   0     0
 | 12-04 20:37:21| 220M 3200k 3319M  322M|   0     0
 | 12-04 20:37:26| 220M 3200k 3319M  322M|   0     0
 | 12-04 20:37:31| 220M 3200k 3322M  318M| 729k    0  <<< select run here
 | 12-04 20:37:36| 219M 3200k 3329M  312M|1326k    0
 | 12-04 20:37:41| 219M 3200k 3329M  312M|   0     0
 | 12-04 20:37:46| 219M 3200k 3329M  312M|   0     0

This occurs because the engine reads the cooked file from the Linux cache...

Running the test for a third and last time, now clearing the cache again...

 | jdivm06:~ # echo 3 > /proc/sys/vm/drop_caches
 
 | ----system---- ------memory-usage----- -dsk/total-
 |      time     | used  buff  cach  free| read  writ
 | 12-04 20:37:56| 219M 3200k 3329M  312M|   0     0
 | 12-04 20:38:01| 219M 3200k 3329M  312M|   0     0
 | 12-04 20:38:06| 220M  196k 2356M 1287M|   0     0   <<<< drop caches here....
 | 12-04 20:38:11| 217M  196k  339M 3307M|  74k  200k


We're back to 1 minute...

 | $ echo "select {+full (my_table)} count(1) from my_table" | time dbaccess testedb
 | Database selected.
 |          (count)
 |         33120929
 | 1 row(s) retrieved.
 | Database closed.
 | 0.00user 0.00system 0:56.56elapsed 0%CPU (0avgtext+0avgdata 11232maxresident)k
 | 9080inputs+0outputs (11major+769minor)pagefaults 0swaps

And everything was read from disk again:
 | ----system---- ------memory-usage----- -dsk/total-
 |      time     | used  buff  cach  free| read  writ
 | 12-04 20:38:16| 217M  196k  339M 3307M|   0     0
 | 12-04 20:38:21| 217M  196k  339M 3307M|   0     0
 | 12-04 20:38:26| 217M  196k  339M 3307M|  10k    0
 | 12-04 20:38:31| 217M  260k  420M 3226M|  16M    0 <<< select start here
 | 12-04 20:38:36| 216M  472k  635M 3012M|  43M    0
 | 12-04 20:38:41| 218M  868k 1028M 2617M|  79M    0
 | 12-04 20:38:46| 217M 1232k 1389M 2255M|  72M    0
 | 12-04 20:38:51| 218M 1584k 1739M 1905M|  70M    0
 | 12-04 20:38:56| 218M 1784k 1943M 1701M|  41M  254k
 | 12-04 20:39:01| 218M 1860k 2019M 1625M|  15M    0
 | 12-04 20:39:06| 218M 1980k 2139M 1504M|  24M    0
 | 12-04 20:39:11| 218M 2184k 2341M 1302M|  40M    0
 | 12-04 20:39:16| 219M 2556k 2714M  928M|  75M    0
 | 12-04 20:39:21| 220M 2932k 3088M  552M|  75M    0
 | 12-04 20:39:26| 219M 3136k 3323M  318M|  47M    0 <<< finish here
 | 12-04 20:39:31| 220M 3136k 3323M  317M|   0     0
 | 12-04 20:39:36| 220M 3136k 3323M  318M|   0     0
 

But don't forget: if you rely on the OS cache and a power-off or OS crash occurs, you can lose more information and have a higher chance of ending up with a corrupt database.

Regards
Cesar



colin mcdermott

Apr 13, 2013, 4:40:37 AM
to inform...@iiug.org





---------- Forwarded message ----------
From: tuser...@gmail.com
To: inform...@iiug.org
Cc: 
Date: Fri, 12 Apr 2013 06:59:46 -0700 (PDT)
Subject: Execution Plan
Hi All,

Given a statement that is prepared this way

select
 p.*
from
 person p
 join address a on a.person_id = p.id
where
 (0 = $1 or p.name = $2)
 and (0 = $3 or a.city = $4)

where $1-$4 are host variables and there is a non-unique index on both p.name and a.city - what execution plan would be used?

Would conditional clause selection like this always do a table scan on both tables, because it doesn't know prior to execution which indices to use?

Or does table statistics aid in avoiding table scan and choose the indices that are present?

Will the plan creation be deferred until the first execution with values for the host variables?

Thoughts on this article?

 http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic


From the Performance Guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.

Thanks,
Tom



---------- Forwarded message ----------
From: Art Kagel <art....@gmail.com>
To: tuser...@gmail.com
Cc: inform...@iiug.org
Date: Fri, 12 Apr 2013 10:16:54 -0400
Subject: Re: Execution Plan
Tom:

I don't have any easy answer for you, but I do have a few questions:
  1. What version of Informix are you using?  Different versions will optimize this join differently.
  2. What host language are you using that uses $<num> as a host var that is not replaced until OPEN time?
  3. Move the filters into the ON clause of the JOIN; otherwise the engine will have to perform a sequential scan of one or both tables, write the joined tuples into a temp table, and perform the filtered fetches from the temp table even if you supplied constant values instead of replaceable parameters.  ANSI SQL rules require that filters in the WHERE clause be applied post-join, which requires the temp table.
  4. Why not just run the thing with SET EXPLAIN enabled or the EXPLAIN optimizer directive in it (with or without the AVOID_EXECUTE option) and look at the actual query plan rather than have us speculate?
  5. Also try this as a UNION (OK three UNIONed SELECTs) instead of the OR conditions.
  6. Also try it with the address table in a non-correlated subquery (both with the OR and with a UNION instead) like:
    AND p.id in (select a.id from address as a where a.city = $4)

There are many ways to write a SELECT statement and get the same results.  If you have not tested them all, you may not be using the best one.
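As a sketch of the UNION idea (untested; one branch per flag combination, and whether you want UNION or UNION ALL depends on whether duplicates matter):

```sql
-- Each branch has simple filters the optimizer can match to an index
select p.* from person p join address a on a.person_id = p.id
 where $1 != 0 and $3 != 0 and p.name = $2 and a.city = $4
union
select p.* from person p join address a on a.person_id = p.id
 where $1 != 0 and $3 = 0 and p.name = $2
union
select p.* from person p join address a on a.person_id = p.id
 where $1 = 0 and $3 != 0 and a.city = $4
union
select p.* from person p join address a on a.person_id = p.id
 where $1 = 0 and $3 = 0
```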


Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.





---------- Forwarded message ----------
From: tuser...@gmail.com
To: inform...@iiug.org
Cc: 
Date: Fri, 12 Apr 2013 08:04:06 -0700 (PDT)
Subject: Re: Execution Plan
Thanks much,  Art

Will answer more of the questions in a bit - I am working through some developers in this posting.

We are on version 11.7.FC5


as an added FYI from one of the team :

   I do not know if this is worth adding or not.

   A common coding practice is to make use of short circuiting.  If you know something will be false, you can use that to avoid executing the rest of the line - goDoAnExpensiveAndLongThing() :
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
   <skipped>

   Even changing that into:
if( 4==4 || tryThisTheHardWay()) {
  <executed and avoided the hard way>

   That same approach would not have the same impact here:
where
 (0 = $1 or p.name = $2)

   Since SQL statements are not executed in a sequential manner, I am not sure if it is worth highlighting that difference.  This might be absolutely obvious to everyone already.



---------- Forwarded message ----------
From: Art Kagel <art....@gmail.com>
To: tuser...@gmail.com
Cc: inform...@iiug.org
Date: Fri, 12 Apr 2013 11:31:30 -0400
Subject: Re: Execution Plan
Wait, are those filters "zero equals $1" and "zero equals $3"?  I thought that you just had a column named with the letter O!  OK, those will evaluate to false if $1 and $3 are not zero.  That's not a short circuit, especially since they are OR'd with the other filters.  They will be evaluated every time.

Informix 11.70 supports multi-index scans, so it's possible that multiple indexes could be used to satisfy the multiple filters, but we'd need to know the indexing structure.  Again, the best thing to do is to try all of the various versions of the select under SET EXPLAIN, time them, and see which runs fastest and has the most reasonable query plan (since the specific values plugged into replaceable parameters can change the query plan, you always want a version of the select that will produce a reasonable plan for any given value).  Make sure to check the query plans to verify that your data distributions are up-to-date.

Note that prepared queries with replaceable parameters go through a final optimization step after the parameter values are known at OPEN time (or the first FETCH if the open is deferred).


Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.







--
Peace

Colin McDermott

colin mcdermott

Apr 13, 2013, 4:45:13 AM
to inform...@iiug.org
  • IBM's Hypervisor has the lowest overhead and highest IO rates.
  • If you must use VMWare make sure you are using the vSphere hypervisor, it's IO is about double the older hypervisors from VMWare or running without one.
  • Yes RAID10.
  • As much as possible make sure you do not have competition for the physical disks underneath your LUNS, ESPECIALLY not Windows Filesystems and NEVER email stores.  The access patterns are just too different from database access patterns.
  • EXT2 with DIRECT_IO, yes.
  • Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes.  Informix assigns IO resources per dbspace and per chunk depending on specifics.  More dbspaces and more chunks can lead to more total IO throughput.  Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
Apologies for the previous mispost; moderators, please delete.

I notice that you recommend ext2 with DIRECT_IO. Is this to avoid the journaling overhead of ext3? Is there a substantial performance boost in switching to ext2? Would JFS or ReiserFS be better for this? I take it that the plog and llog should remove the need for ext3's journal.

Art Kagel

Apr 13, 2013, 9:22:05 PM
to Cesar Inacio Martins, informix-list cdi, R. Spitz
Cesar, your timings for read-reread using the OS cache are credible, and I will say something about that at the end.  However, note that whether you use RAW, COOKED without DIRECT_IO, or COOKED with DIRECT_IO, your data is safe!  When you disable DIRECT_IO and are using COOKED chunks, Informix opens all chunk files, except temp dbspaces, with the O_SYNC flag enabled, which causes any writes to those files to be immediately flushed to disk before the write is acknowledged as complete.  Informix in turn does not complete any transaction on an unbuffered-log database unless the logical log buffer write containing the COMMIT record has been acknowledged by the OS, which means that your data is safe!  Now, if you are using BUFFERED logging, there is a small risk that transactions accumulating in the log buffers may not be written to disk until the buffer fills.  That opens a window of risk, but the risk is the same for RAW and COOKED with DIRECT_IO if you use buffered logging.

Now to your timings.  Yes, reading and rereading the same data will be faster with COOKED files than with RAW or DIRECT_IO, which bypass the OS buffer cache.  However, you did not present any timings for writing!  That is where RAW and DIRECT_IO improve your performance.  With COOKED files and no DIRECT_IO, Informix is writing from its buffers to the OS buffers, and the O_SYNC flag is forcing the OS to wake a sync thread to write the newly dirtied cache pages to disk.  Informix has to wait for the extra memory copy, then for the OS IO scheduler to wake an IO thread and perform the write.  With RAW devices, and with DIRECT_IO enabled for COOKED files, the copy from Informix memory to OS memory is skipped, and an Informix AIO VP performs the write directly, or a KAIO thread schedules the asynchronous IO in the OS kernel's asynchronous IO service and returns to work without having to wait.  This is at least 20-25% faster than an O_SYNC write, which is completely synchronous and so must be performed by Informix's AIO VPs.
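You can see the raw cost of O_SYNC-style writes outside Informix with a quick dd comparison (a rough illustration, not a benchmark; dd's oflag=sync makes it open its output with O_SYNC, similar to what Informix does for cooked chunks without DIRECT_IO):

```shell
# Write the same 4 MB (1024 x 4k pages) buffered, then with O_SYNC per write.
# Absolute times vary with the storage; the relative gap is the point.
F=/tmp/osync_demo.dat

time dd if=/dev/zero of="$F" bs=4k count=1024 2>/dev/null             # buffered
time dd if=/dev/zero of="$F" bs=4k count=1024 oflag=sync 2>/dev/null  # O_SYNC

rm -f "$F"
```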

Test it!


Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.





Art Kagel

Apr 13, 2013, 9:23:44 PM
to colin mcdermott, inform...@iiug.org
Gee, that response looks awfully familiar!?!?!


Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Advanced DataTools, the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.


On Sat, Apr 13, 2013 at 4:40 AM, colin mcdermott <colinjame...@gmail.com> wrote:
  • IBM's Hypervisor has the lowest overhead and highest IO rates.
  • If you must use VMware, make sure you are using the vSphere hypervisor; its IO is about double that of the older VMware hypervisors, or of running without one.
  • Yes RAID10.
  • As much as possible, make sure you do not have competition for the physical disks underneath your LUNs, ESPECIALLY not Windows filesystems and NEVER email stores.  The access patterns are just too different from database access patterns.
  • EXT2 with DIRECT_IO, yes.
  • Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes.  Informix assigns IO resources per dbspace and per chunk depending on specifics.  More dbspaces and more chunks can lead to more total IO throughput.  Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
Regards, Richard
--
 
 
Nick Lello | Web Architect
o +44 (0) 843.330.9374 | 503.284.7581 ext. 418 | m +44 (0) 750.890.3456 | Skype: nicholas.lello
Email: nick.lello at rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT



---------- Forwarded message ----------
From: Art Kagel <art....@gmail.com>
To: "R. Spitz" <rspi...@googlemail.com>
Cc: inform...@iiug.org
Date: Thu, 11 Apr 2013 16:20:50 -0400
Subject: Re: Best practices for IDS on virtual machine?
VM Best practices for databases:

  • IBM's Hypervisor has the lowest overhead and highest IO rates.
  • If you must use VMware, make sure you are using the vSphere hypervisor; its IO is about double that of the older VMware hypervisors, or of running without one.
  • Yes RAID10.
  • As much as possible, make sure you do not have competition for the physical disks underneath your LUNs, ESPECIALLY not Windows filesystems and NEVER email stores.  The access patterns are just too different from database access patterns.
  • EXT2 with DIRECT_IO, yes.
  • Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes.  Informix assigns IO resources per dbspace and per chunk depending on specifics.  More dbspaces and more chunks can lead to more total IO throughput.  Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
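For anyone translating the list above into configuration, a minimal sketch follows. The DIRECT_IO onconfig parameter and the onspaces utility are documented in the IDS admin guide, but the mount point, dbspace names, and sizes below are invented for illustration:

```
# onconfig excerpt: enable direct I/O for cooked-file chunks
DIRECT_IO 1

# Example layout: one cooked file per dbspace on the ext2 mount,
# created with onspaces (sizes in KB; adjust to your own needs), e.g.:
#   onspaces -c -d llogdbs  -p /ifmxchunks/llogdbs  -o 0 -s 1048576
#   onspaces -c -d plogdbs  -p /ifmxchunks/plogdbs  -o 0 -s 524288
#   onspaces -c -d datadbs1 -p /ifmxchunks/datadbs1 -o 0 -s 2097152
```

Keeping the logical logs, physical log, and busiest data dbspaces separate (ideally on separate LUNs) follows directly from the last bullet.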

Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



On Thu, Apr 11, 2013 at 4:05 PM, R. Spitz <rspi...@googlemail.com> wrote:
Hi Informixers,

I'm system and database admin for our departmental IDS server. Currently we are running IDS 11.50 on SUSE Linux Enterprise Server (SLES) 10 (32-bit) on a 5-year-old Xeon machine. Overall data and transaction volume is rather modest; the total amount of data is about 3GB. Performance has never been an issue with the current hardware, but now it's time to plan for a replacement server.

Company policy dictates that dedicated server hardware for departmental servers is to be avoided, so it will have to be a virtual machine under VMWare ESX. OS will be SLES 11 64bit, and we will probably move to the latest IDS 12.10 version.

I've always used raw devices on partitions in a RAID1 configuration, but what is the best way to go in a virtual environment? I'm aware that IO performance might be rather poor compared to physical hardware, but I have no choice. The ESX admins promised me storage space on a RAID10 LUN; how should I configure this?

Raw devices seem to be a thing of the past. Is ext2 with DIRECT_IO (if available in IDS Workgroup Edition) still the recommended file system type? I'm unsure about the number of chunks and dbspaces in the virtual environment: does it make sense to use separate chunks and dbspaces for rootdbs, data and logical logs, or should I put everything into one large rootdbs, since I cannot influence the physical storage anyway?

I'd appreciate any pointers to "best practices" in the virtual environment.

Regards, Richard
Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



On Fri, Apr 12, 2013 at 9:59 AM, <tuser...@gmail.com> wrote:
Hi All,

Given a statement that is prepared this way

select
 p.*
from
 person p
 join address a on a.person_id = p.id
where
 (0 = $1 or p.name = $2)
 and (0 = $3 or a.city = $4)

where $1-$4 are host variables and there is a non-unique index on both p.name and a.city - what execution plan would be used?

Would conditional clause selection like this always do a table scan on both tables, because it doesn't know prior to execution which indices to use?

Or does table statistics aid in avoiding table scan and choose the indices that are present?

Will the plan creation be deferred until the first execution with values for the host variables?

Thoughts on this article?

 http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic


From the Performance Guide:
Page: 13-32
When a statement contains host variables, the database server replaces the host variables with placeholders when it stores the statement in the SQL statement cache. Therefore, the statement is optimized without the database server having access to the values of the host variables. In some cases, if the database server had access to the values of the host variables, the statement might be optimized differently, usually because the distributions stored for a column inform the optimizer exactly how many rows pass the filter.

Thanks,
Tom
Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



On Fri, Apr 12, 2013 at 11:04 AM, <tuser...@gmail.com> wrote:
Thanks much,  Art

Will answer more of the questions in a bit - I am working through some developers in this posting.

We are on version 11.7.FC5


as an added FYI from one of the team :

   I do not know if this is worth adding or not.

   A common coding practice is to make use of short circuiting.  If you know something will be false, you can use that to avoid executing the rest of the line - goDoAnExpensiveAndLongThing() :
if( 3!=3 && goDoAnExpensiveAndLongThing()) {
   <skipped>

   Even changing that into:
if( 4==4 || tryThisTheHardWay()) {
  <executed and avoided the hard way>

   That same approach would not have the same impact here:
where
 (0 = $1 or p.name = $2)

   Since SQL statements are not executed in a sequential manner, I am not sure if it is worth highlighting that difference.  This might be absolutely obvious to everyone already.
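For what it's worth, the host-language half of that comparison is easy to demonstrate. A minimal Python sketch of short-circuit evaluation (function name invented):

```python
calls = []

def expensive():
    # Stand-in for goDoAnExpensiveAndLongThing(); records that it ran.
    calls.append("ran")
    return True

# "and" never evaluates its right side once the left is false...
assert (3 != 3 and expensive()) is False
# ...and "or" never evaluates its right side once the left is true.
assert (4 == 4 or expensive()) is True

assert calls == []   # expensive() was never invoked
print("short-circuit confirmed: expensive() never ran")
```

As the poster notes, a declarative WHERE clause gives no such left-to-right guarantee; the optimizer is free to reorder or merge predicates.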





--
Peace

Colin McDermott

Art Kagel

unread,
Apr 13, 2013, 9:29:12 PM
to colin mcdermott, inform...@iiug.org
Sorry, Colin, I missed your question at the end.  Best not to bottom post!

Yes, in part, recommending EXT2 is to avoid journaling overhead.  The rest is the copy-on-write feature, which physically fragments your chunks over time.  As to the others, see my presentation from last year's IIUG Conference, "Doing Storage Better", which includes a table of timings I ran (available in the members' pages on the IIUG web site - www.iiug.org).  Also see a post to the forums that Eric did last month where he posted his own timings, which agree.

Yes, Informix logging eliminates the need for FS journaling.  It's backed by over 20 years of experience proving that it works; why duplicate it with an inferior version?


Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



On Sat, Apr 13, 2013 at 4:45 AM, colin mcdermott <colinjame...@gmail.com> wrote:
  • IBM's Hypervisor has the lowest overhead and highest IO rates.
  • If you must use VMware, make sure you are using the vSphere hypervisor; its IO is about double that of the older VMware hypervisors, or of running without one.
  • Yes RAID10.
  • As much as possible, make sure you do not have competition for the physical disks underneath your LUNs, ESPECIALLY not Windows filesystems and NEVER email stores.  The access patterns are just too different from database access patterns.
  • EXT2 with DIRECT_IO, yes.
  • Yes to separate dbspaces for high access data like rootdb, logical logs, physical logs, high access tables and indexes.  Informix assigns IO resources per dbspace and per chunk depending on specifics.  More dbspaces and more chunks can lead to more total IO throughput.  Keeping logical logs on different LUNs from physical logs and from high IO data dbspaces will take advantage of more physical resources like IO channels and spindles.
Apologies for the previous mispost, Moderators please delete.

I notice that you recommend ext2 with DIRECT_IO. Is this to avoid the journaling overhead of ext3? Is there a substantial performance boost in switching to ext2? Is JFS or ReiserFS better for this? I take it that the plog and llog should remove the need for the journal in ext3.

Cesar Inacio Martins

unread,
Apr 13, 2013, 10:05:20 PM4/13/13
to Art Kagel, informix-list cdi, R. Spitz
Hi Art , 

Hmm... I would like to disagree with you... but before doing that I will run the write test.
From tests I did in the past (while studying this behaviour), I concluded that write I/O throughput is effectively bounded by the Linux kernel's dirty-buffer flush/sync parameters and by how much memory is available to hold those dirty buffers (relative to the amount of data the engine writes during a checkpoint, for example).
Like I said, I *would like* to, but I do not dare disagree with THE Art Kagel :)
I will test and post my results here...
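For reference, the kernel parameters alluded to above are the vm.dirty_* sysctls. A sketch of an /etc/sysctl.conf excerpt (parameter names are real; the values shown are examples, not recommendations):

```
vm.dirty_background_ratio = 5       # % of RAM dirty before background flushing starts
vm.dirty_ratio = 20                 # % of RAM dirty before writers are forced to block
vm.dirty_expire_centisecs = 3000    # age at which dirty pages must be written out
vm.dirty_writeback_centisecs = 500  # how often the flusher threads wake up
```

These only matter for cached (non-DIRECT_IO) writes; with O_SYNC, RAW, or DIRECT_IO the page cache is flushed immediately or bypassed entirely.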

Regards
Cesar



2013/4/13 Art Kagel <art....@gmail.com>
Cesar, your timings for read-reread using the OS cache are credible, and I will say something about that at the end.  However, note that whether you use RAW, COOKED without DIRECT_IO, or COOKED with DIRECT_IO, your data is safe!  When you disable DIRECT_IO and are using COOKED chunks, Informix opens all chunk files, except temp dbspaces, with the O_SYNC flag enabled, which causes any writes to those files to be immediately flushed to disk before the write is acknowledged as complete.  Informix, in turn, does not complete any transaction on an unbuffered-log database unless the logical log buffer write containing the COMMIT record has been acknowledged by the OS, which means that your data is safe!  Now, if you are using BUFFERED logging, then there is a small risk that transactions accumulating in the log buffers may not be written to disk until the buffer fills.  That opens a window of risk, but the risk is the same for RAW and for COOKED with DIRECT_IO if you use buffered logging.

Now to your timings.  Yes, reading and rereading the same data from the buffer cache will be faster with COOKED files than with RAW or DIRECT_IO, which bypass the buffer cache.  However, you did not present any timings for writing!  That is where RAW and DIRECT_IO improve your performance.  With COOKED files and no DIRECT_IO, Informix writes from its buffers to the OS buffers, and the O_SYNC flag forces the OS to wake a sync thread to write the newly dirtied cache pages to disk.  Informix has to wait for the extra memory copy and then for the OS IO scheduler to wake an IO thread and perform the write.  With RAW devices, and with DIRECT_IO enabled for COOKED files, the copy from Informix memory to OS memory is skipped: either an Informix AIO VP performs the write directly, or a KAIO thread schedules the asynchronous IO in the OS kernel's asynchronous IO service and returns to work without having to wait.  This is at least 20-25% faster than an O_SYNC write, which is completely synchronous and so must be performed by Informix's AIO VPs.

Test it!
Art

Art S. Kagel
Advanced DataTools (www.advancedatatools.com)
Blog: http://informix-myview.blogspot.com/



2013/4/12 R. Spitz <rspi...@googlemail.com>
Thanks for the quick replies. After a quick look at what the IDS 12.10 admin guide says, I'm thoroughly confused:

Fernando Nunes

unread,
Apr 14, 2013, 10:56:34 AM4/14/13
to R. Spitz, IIUG Informix List
Best performance possible = RAW.
But RAW devices are not as easy to use as cooked files, especially if your environment is segregated, meaning you - the DBA - don't have access to root.

But if I'm not confusing posts, you mentioned 3GB of data... Unless you use a very small amount of memory, there's no way you can get bad performance with that :)
Regards




Regards, Richard




--
Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...