Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

ADO.NET Performance with Large Record set

0 views
Skip to first unread message

John Thuma

unread,
Jul 25, 2002, 6:51:47 PM7/25/02
to
We are about to begin our second round of benchmarks
testing the throughput of C#\ADO.NET when communicating to
an Oracle database. Our previous test results, which I
have included for you, left us a little disappointed in
the performance numbers. I think a great deal of it is
centered around the batch update process in C# so we wrote
our own called "String Builder" that builds a dynamic
string that is sent through in one large command at the
end.

This is going to be a significant decision for us on
whether we should run our integration on the Unix machines
or on the Windows N-Tier. Most of our integration
scenarios are going to be the bulk read\write (Could be
multi-threaded, could be be read and wrote asynchronously)
of 10,000,000 records in about a 30 minute window.

I am most concerned with what moving 10,000,000 rows off
of one Unix machine to the N-Tier and then to another Unix
machine will do as far as network performance, or in the
need to hold the 10,000,000 records in state while they
are processed in the N-Tier. Couple this with the not so
impressive performance of the .NET Oracle managed provider
and you can see our concern.

If possible can you make any recommendations that we
should look towards to improve the performance. Whether
it is tuning the Windows installation, a different
architecture, or a new code strategy we are open to
suggestions.

Please review and give any advice you might think
relevant. I am looking for ADO.NET ACTION QUERY BATCH
PERFORMANCE RUN TIME IMPROVEMENTS.

My first assumption would be to break up the single batch
of 1.5 million records - 5 million records into multiple
50,000 record batches and process. ADO.NET seems to lose
efficiency after 100,000 records. Again, any advice would
be greatly appreciated.

I have also discussed with my customer the use of DTS like
procedures in oracle and doing the batch queries inside of
Stored Procedures, heterogeneous attached tables, and
possibly using Yukon. None of these are an option at the
moment.

ISSUE:

Some concern has arisen regarding ADO.NET's relative
performance to existing code in an Oracle batch update
environment. Therefore, the batch update data access
performance of ADO.NET needs to be ascertained. Knowledge
of the performance characteristics of ADO.NET will shape
important aspects of current and future enterprise
application designs.

SCOPE:

There are many variables that play a role in a data access
operation's overall performance. The size of the table
(i.e., number of rows and columns), indexes available for
the table, sizes of the data being retrieved, use of
appropriate selection criteria and statement
optimizations, proximity of the data to the requesting
program, and communication/transport protocol used are
some of those variables. The approach chosen for achieving
a data access task can also have a large impact on
performance and scalability. Most of these factors are
beyond the scope of this evaluation. This study focuses on:

· Simple Java and ADO.NET data access for an
Oracle table with ~1,500,000 rows.

· Simple Java and ADO.NET data access for a MSSQL
table with ~5,000,000 rows.

For both of these scenarios, we conduct a series of tests
where an increasing number of rows from a single table are
operated on. Each test uses a single thread to select data
by a primary key, modify several non-indexed fields in
each retrieved row, and then send the modified data back
to the database in a single batch update operation. The
batch update operation occurs within a single database
transaction. Duration is recorded for each data access
related execution step.

The database schema and operations performed against it
are identical for both Oracle and MSSQL

ASSUMPTIONS:

· No external machine load impact on performance.

· No statement optimizations are specified (such
as optimizer hints).

· The algorithm being tested consists of:

o Opening a connection to the data source

o Querying all the pertinent data

o Iterating through the results to make changes to
data in several columns

o Updating the changed data back to the data source
in a single transaction

· Response (time) is based on SQL executions only.

· Each test must be exception free for a
meaningful result.

.NET Implementation

The .NET test application was written in C# to initial
public release of the Microsoft .NET Common Language
Runtime (CLR), version 1.0.3705.0. Several different data
access technologies were tested for access to both data
sources. For Oracle, the ADO.NET OLE DB Managed Provider
and ODBC Managed Provider used OLE DB and ODBC drivers
respectively included in the Oracle 8.1.7.0.0
distribution. For MSSQL, the SQL Server Managed Provider
was used in addition to the OLE DB and ODBC Managed
Providers with drivers supplied by Microsoft in MDAC 2.7.

ADO.NET presents several objects, and consequently,
several approaches to accessing data. The option that most
closely matches the directive of this study is to use the
DataSet object, which can be thought of as a full
featured, in-memory database. Microsoft advocates the use
of the lighter-weight DataReader object instead when a
forward-only result set (i.e., non-scrollable cursor) is
all that is required. Results are reported for the
DataReader approach for completeness but are not directly
comparable with the other results. Numbers are also
reported for a string-building approach that appends each
UPDATE statement to a string that subsequently will be
submitted to the database as a single "batch". The
inclusion of this approach is intended to underscore the
fact that the DataSet does not currently support true
physical batch update operations but rather a logical or
convenience batch update instead.

HARDWARE:

Server hardware consists of a Dell 2 x 1GHz Pentium III
Xeon server with 1GB RAM and a hardware RAID 5 array
consisting of five 10,000-RPM UltraSCSI disks. The tested
operating system is Windows .NET Server Enterprise beta 3.
Database software consisted of Microsoft SQL Server 2000
Enterprise Edition SP2 and Oracle 8i Enterprise Edition
8.1.7.0.0.

The Java and .NET testing applications executed directly
on the database server to better isolate the test
environment from other influences such as network traffic.
Only a single instance of the testing application was
executing with data access occurring on only a single
thread within the testing application.

FINDINGS:

When accessing the Oracle database, the Java solution
outperforms the C# OLEDB DataSet solution at 1,000 rows by
a factor of three. At 100,000 rows, the gap widens to a
factor of six. Note that the C# OLEDB StringBuilder
implementation is less than 20% behind Java at 1,000 rows.
This might be important if smaller batch sizes are
allowable in other scenarios.

I have informed the partner of the following and they are
testing them and will get back to me end of today.
Microsoft .NET data provider for Oracle

Oracles Beta version of their .NET provider


0 new messages