
DataRelation concept problem.


Grant

Sep 2, 2001, 5:46:40 PM
I understand, and can set up a data relation in a dataset,
using either code or the xml designer, but there is one
simple concept that has got me stumped. I will use the
example given in the documentation as follows:

[C#]
DataRelation custOrderRel = custDS.Relations.Add("CustOrders",
    custDS.Tables["Customers"].Columns["CustomerID"],
    custDS.Tables["Orders"].Columns["CustomerID"]);

[VB]
Dim custOrderRel As DataRelation = custDS.Relations.Add("CustOrders", _
    custDS.Tables("Customers").Columns("CustomerID"), _
    custDS.Tables("Orders").Columns("CustomerID"))

What I want to be able to do is have a subset of (say 10)
customers that meet a condition (e.g. CustomerBad = True).
Say, for example, these 10 customers each have 2 orders. I
want a relationship that links all the orders (20 in total)
for these 10 customers.

In ADO, I would use a JOIN.
In ADO.NET, I have tried to do this by:

1. Fill the dataset with a DataAdapter that has a
SelectCommand of "Select * from Customers where
CustBad=TRUE".
2. Fill the dataset with a DataAdapter that has a
SelectCommand of "Select * from Orders".

This is not ideal because:

1. I will get exceptions because there are orders
that exist in my datatable that don't have a corresponding
customerid in my customers table.
2. I just read the entire Orders table, when all I
wanted was 20 records.

How do I use relations in ADO.NET to get just the 20
orders that are linked to the bad customers?

I know this is a very common situation (in fact would be
the 'norm') so there must be a very simple answer, but I
can't see it! Can someone enlighten me?

David Sceppa

Sep 5, 2001, 8:30:53 PM
Grant,

Great question. And great timing. I'm in the middle of writing a
chapter on working with relational data. You're right, this is a
rather common situation. There are a couple fairly straightforward
solutions.


Joins:
You can still use joins in ADO .NET. DataTables don't have to
map back to tables in your database.

The CommandBuilder object and the Visual Studio .NET Data Adapter
Configuration Wizard will be unable to generate updating logic if you
use them with a query that retrieves data from multiple tables.
Because of their structure, DataAdapters are designed to update single
tables. But that doesn't mean that you can't use a DataAdapter to
store the results of a join query in a DataTable.

So, if you're working with read-only data, your best solution may
be to work with joins.


DataRelations and constraints:
I prefer working with DataRelations and separate DataTables, even
if I'm working with read-only data. By default, creating a
DataRelation creates a foreign key constraint on the child DataTable.
This is why you're getting an exception when you fetch child rows
that do not have corresponding parent rows within your DataSet.

There are a couple simple solutions to this problem.

    You can turn off the constraint associated with the DataRelation
when you create it; that's an option both in code and in the schema
designer. You can also set the EnforceConstraints property of the
DataSet to False before calling the Fill method on your
DataAdapter(s).
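
    For example, a rough sketch in C# (the adapter names daCustomers
and daOrders are just placeholders here, not anything from your code):

    custDS.EnforceConstraints = false;     // don't validate during the load
    daCustomers.Fill(custDS, "Customers");
    daOrders.Fill(custDS, "Orders");
    // Re-enable only once the data is consistent; this line throws
    // if child rows without parent rows are still present.
    custDS.EnforceConstraints = true;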


Modifying your child query:
A more palatable option is to make sure that you only fetch the
desired child rows. In your case, you could use a query like:

SELECT O.OrderID, O.CustomerID, ... FROM Orders O, Customers C
WHERE O.CustomerID = C.CustomerID AND C.Bad = 1

Neither the Data Adapter Configuration Wizard nor the CommandBuilder
will complain about generating updating logic against this query since
it only retrieves columns from a single table. Another major benefit
is that you're fetching less data. You could also use other query
constructs such as sub-queries.
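
    To put the pieces together, here's a rough C# sketch of the whole
fetch (assuming SQL Server and the SqlClient provider; connectionString
and the column lists are placeholders):

    // using System.Data; using System.Data.SqlClient;
    SqlConnection cn = new SqlConnection(connectionString);
    SqlDataAdapter daCustomers = new SqlDataAdapter(
        "SELECT CustomerID, CompanyName FROM Customers WHERE Bad = 1", cn);
    SqlDataAdapter daOrders = new SqlDataAdapter(
        "SELECT O.OrderID, O.CustomerID, O.OrderDate " +
        "FROM Orders O, Customers C " +
        "WHERE O.CustomerID = C.CustomerID AND C.Bad = 1", cn);

    DataSet custDS = new DataSet();
    daCustomers.Fill(custDS, "Customers");
    daOrders.Fill(custDS, "Orders");

    custDS.Relations.Add("CustOrders",
        custDS.Tables["Customers"].Columns["CustomerID"],
        custDS.Tables["Orders"].Columns["CustomerID"]);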


I hope this information proves helpful.

David Sceppa
Microsoft
The opinions expressed are not necessarily those of Microsoft

Sergio Sette

Sep 6, 2001, 2:52:40 AM

> DataRelations and constraints:
>
>    There are a couple simple solutions to this problem.
>
>    You could turn off the constraint associated with the
> DataRelation when you create it.

This cannot be a real solution, since all the Orders
records are still fetched!

> Modifying your child query:
>    A more palatable option is to make sure that you only
> fetch the desired child rows. In your case, you could use
> a query like:
>
> SELECT O.OrderID, O.CustomerID, ... FROM Orders O, Customers C
>    WHERE O.CustomerID = C.CustomerID AND C.Bad = 1

This is a possible solution, but in effect what you are doing
is executing the query on the Customers table again (and
this is a simple case).
I'm surprised that ADO.NET doesn't have a simple mechanism
for handling master/detail relations.
With other tools that use a concept similar to ADO.NET's
(disconnected work), the master/detail relation
automatically fetches only the detail records related to
the master record(s), and in some situations it is possible
to fetch the details only when needed (fetch on demand).
I think this can be done with ADO.NET too (with
parameterized queries), but it requires a lot of work. I
hope that Microsoft will include an updated relation object
in the final release...

Regards

Sergio Sette


Grant

Sep 6, 2001, 5:44:59 PM
Sergio,

I couldn't agree more. As I see it, the way the DataRelation is currently
implemented means that unless you want to write inefficient code, you won't
use it. The DataRelation collection is only useful when you want to read
the entire child table, and since you would never want to do that, you
wouldn't use a DataRelation.

I was very surprised too, and assumed MS *must* have thought of this; that
is why I spent many hours looking for the construct that would do this. But
I never found it.

David, don't you agree that the DataRelation collection in its current form
does not function in a way that is useful in the real world?

It's a shame, because using a DataRelation with Typed DataSets lends itself
to some very elegant code. I would love for someone to prove me wrong and
show me how to use DataRelations efficiently.

Grant

"Sergio Sette" <sse...@tin.it> wrote in message
news:175f501c136a0$85bb5ac0$9ee62ecf@tkmsftngxa05...

Sergio Sette

Sep 7, 2001, 4:29:59 AM
Hi Grant,

> I couldn't agree more. As I see it, the way the
> DataRelation is currently implemented means that unless
> you want to write inefficient code, you won't use it.

I think the DataSet is implemented this way because it must
be usable in a stateless, multi-tier architecture. The
problem is that there are many situations where this
architecture is not the right one, or at least not the most
useful. The DataSet cannot be used conveniently in a
"normal" stateful architecture, and this is (IMHO) a big
design limitation. But this seems to be the "new age"...


Regards

Sergio Sette

Grant

Sep 8, 2001, 3:41:26 AM
Sergio,

I see what you're getting at.

I still believe the DataRelation class (as it stands) is not useful even in
the 'stateless, multi-tier architecture' you described.

But, I think it *could be made to be useful* while still remaining true to
the stateless disconnected model.

For example, if I had available a DataSet construct such as:

<dataadapterwithjoin>.FillUsingJoin(<dataset>,<table1>,<table2>,<tablen>,...)

I know there are other issues here, but if I could get the required (and
only the required) values and relationships into my dataset efficiently,
using elegant code, then I would find a lot of use for the DataRelation
classes.

Grant

__________

"Sergio Sette" <sse...@tin.it> wrote in message

news:1d1c901c13777$48144b50$b1e62ecf@tkmsftngxa04...

David Sceppa

Sep 10, 2001, 6:30:38 PM

There are a few points I'd like to address in this post so I've
divided it into three sections. The first section focuses on
DataRelations and functionality. The second section focuses on
different options between joins and separate queries. The last
section focuses on some tests I ran to compare performance of the
different query options. You might be surprised at the results.

DataRelations and functionality:

The hierarchical functionality in ADO.NET is more powerful
and flexible than previous Microsoft data access object models
(namely, using MSDataShape to create hierarchical ADO Recordsets).
I'm not familiar with other object models, but I'm always interested
to hear how other object models handle hierarchical data.

Many developers use hierarchical Recordsets in real world ADO
applications. The DataRelation object makes navigating through
hierarchical data extremely simple. You can even use DataRelations
to relate data from different data sources. We've had ADO
developers ask how they could use the MSDataShape provider to relate
queries against different data sources.

Developers have had problems modifying the results of a join
query and trying to have object models like ADO or DAO interpret
those changes and submit them to back-end databases, often with
very little success. Using separate DataTables in ADO.NET lets
you handle these problems with ease.

Need to add, modify and delete rows at multiple levels of a
hierarchy and submit those changes to your database? You'll need
to submit new rows in the hierarchy in a top-down manner, but submit
deleted rows bottom-up. ADO.NET gives you the control required to
handle such situations.
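
    As a rough C# illustration of that ordering (assuming daCustomers
and daOrders already have their INSERT/UPDATE/DELETE commands
configured; the names are just placeholders):

    DataTable customers = custDS.Tables["Customers"];
    DataTable orders = custDS.Tables["Orders"];

    // Deletes go bottom-up: remove child rows before their parents.
    daOrders.Update(orders.Select("", "", DataViewRowState.Deleted));
    daCustomers.Update(customers.Select("", "", DataViewRowState.Deleted));

    // New and modified rows go top-down: parents before children.
    daCustomers.Update(customers.Select("", "",
        DataViewRowState.Added | DataViewRowState.ModifiedCurrent));
    daOrders.Update(orders.Select("", "",
        DataViewRowState.Added | DataViewRowState.ModifiedCurrent));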

Do you need to change data for a parent row? Simply change the
row in the parent DataTable. Now all child rows will see the
change. Submitting the change to the database is also simple.

Need to change which parent a child row is related to? Just
change the value(s) of the column(s) that you use to define the
relationship.
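
    In code, using the CustOrders DataRelation from earlier in the
thread (the column names and key value here are only examples):

    // Navigate from each parent row to its children.
    foreach (DataRow customer in custDS.Tables["Customers"].Rows)
    {
        foreach (DataRow order in customer.GetChildRows("CustOrders"))
            Console.WriteLine("{0}  {1}", customer["CustomerID"], order["OrderID"]);
    }

    // Re-parent a child row by changing the relating column's value.
    DataRow someOrder = custDS.Tables["Orders"].Rows[0];
    someOrder["CustomerID"] = "ALFKI";   // hypothetical new parent key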

Handling these scenarios by modifying the results of a join
query can be quite a challenge, if it's even possible.

After working with this functionality, I'd much rather use
DataRelations and separate DataTables to handle relational data
so long as there isn't a significant performance penalty to get
my data into these structures.

Joins and/or breaking up your queries:

The suggestions of having ADO.NET automatically generate more
intelligent logic to fetch relational data are definitely welcome.
Grant's suggestion of a FillWithJoin method is an interesting one.

If you have to use, or really want to use a single join query,
you can walk through the resultset returned by a join query and
split the data into logical DataTables. However, writing such
code can be tedious.
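
    Just to illustrate the idea, here's a sketch only. It assumes the
join is sorted by the parent key, that the Customers and Orders
DataTables already exist with matching columns, and that cn is an
existing SqlConnection:

    SqlCommand cmd = new SqlCommand(
        "SELECT C.CustomerID, C.CompanyName, O.OrderID, O.OrderDate " +
        "FROM Customers C, Orders O " +
        "WHERE O.CustomerID = C.CustomerID AND C.Bad = 1 " +
        "ORDER BY C.CustomerID", cn);

    cn.Open();
    SqlDataReader rdr = cmd.ExecuteReader();
    string lastCustomerID = null;
    while (rdr.Read())
    {
        string customerID = (string) rdr["CustomerID"];
        if (customerID != lastCustomerID)
        {
            // First row for this parent: add the customer row once.
            customers.Rows.Add(new object[] { customerID, rdr["CompanyName"] });
            lastCustomerID = customerID;
        }
        // Every row in the join result carries one child row.
        orders.Rows.Add(new object[] { rdr["OrderID"], customerID, rdr["OrderDate"] });
    }
    rdr.Close();
    cn.Close();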

If you are able to use separate queries, there are a number
of possible options. You can use the method I described in my
previous post where you include the criteria via your preferred
method:

SELECT ChildTable.Field1, ChildTable.Field2, ...
FROM ChildTable, ParentTable
WHERE ChildTable.ParentID = ParentTable.ID AND
ParentTable.SearchField = SearchValue

or a subquery:

SELECT Field1, Field2, ... FROM ChildTable WHERE ParentID IN
(SELECT ID FROM ParentTable WHERE SearchField = SearchValue)


You can use parameterized child queries that you execute at run-
time based on each parent row you retrieve:

SELECT Field1, Field2, ... FROM ChildTable WHERE ParentID = ?


You can concatenate all of the parent ID's together and issue a
single query using an IN clause:

SELECT Field1, Field2, ... FROM ChildTable WHERE ParentID IN
(Value1, Value2, ...)
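
    A quick sketch of building that list from the parent rows already
in the DataSet (the names are placeholders, cn is an existing
SqlConnection, and string keys would need proper quoting and escaping):

    System.Text.StringBuilder keyList = new System.Text.StringBuilder();
    foreach (DataRow parent in custDS.Tables["Customers"].Rows)
    {
        if (keyList.Length > 0)
            keyList.Append(", ");
        keyList.Append("'" + parent["CustomerID"] + "'");
    }

    SqlDataAdapter daOrders = new SqlDataAdapter(
        "SELECT OrderID, CustomerID, OrderDate FROM Orders " +
        "WHERE CustomerID IN (" + keyList.ToString() + ")", cn);
    daOrders.Fill(custDS, "Orders");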


Temp tables are also an option. No, I didn't say they're
generally a good option.

Testing different query options:

Based on the recent feedback on this thread, I ran some tests in
SQL Query Analyzer. I worked with a Customers / Orders / Order
Details hierarchy where I only want the customers from a particular
country.

Using one multi-table query to pull back data from all three
tables took 20ms. Using three separate queries that queried each
table separately and used the filters within the queries took 10ms.
Using parameterized child queries also took 10ms. Using three
separate queries that relied on IN clauses and lists of parent key
values also took 10ms. I ran the tests ten times and those were the
average results. I'm guessing the reason the big join query took
longer is that it had to return more total data.

But that's just SQL Query Analyzer. What about ADO.NET?

I created two simple applications to test the performance of
the different query structures. I built one with VB.NET/ADO.NET
and one with VB 6/ADO 2.6. Each application uses the same four
query structures - join, separate queries with filters,
parameterized queries, and separate queries with IN clauses.

The VB 6 app is built to simulate fetching data into an ADO.NET
DataSet. The closest equivalent in the ADO object model is a
read-only client-side Recordset. An updateable client-side
Recordset provides comparable functionality, but requires
fetching base table, column and key information when running the
query. Since I'm trying to simulate simply filling a DataSet, I
chose to use read-only client-side Recordsets in the app. I built
this VB 6 app really for a sanity check. While the app could
neither prove nor disprove my sanity, it did provide performance
numbers that were right in line with the VB.NET/ADO.NET app.

Getting back to the ADO.NET app, here's how I process the
results of the various queries. With the join routine, I simply
pull the data back into a single DataTable. I don't do any work
at the client side to separate the data into separate DataTables.
With the other routines, I pull the data back into a DataSet that
has a DataTable for each distinct table, a primary key on each
DataTable and DataRelations set up between the DataTables.

Since the primary key and DataRelations perform validation at
the client-side, I set EnforceConstraints to False on the DataSet
prior to fetching each set of data from the database. Since the
data passes identical validation routines in the database, there's
no need to validate this data as I fetch it.

I ran the routines through fifty passes (100 sets of queries per
pass). The separate filtered queries proved to be the fastest means
to fetch the data. The queries that used IN clauses finished a
close second, generally 5% slower. The big join query was usually
15-25% slower than the separate filtered queries. The parameterized
queries were generally 6x to 10x (yes, that's 10x and not 10%)
slower than the separate queries. That result surprised me, but the
numbers were consistent for both the VB.NET/ADO.NET and VB 6/
ADO 2.6 applications.

I'm no SQL performance guru, but I'm assuming that the join
query was slower because it has to return more data. The
parameterized queries were slower because they required more round-
trips to the server to run each query. They're more performant in
SQL Query Analyzer than in a real-world application because SQL
Query Analyzer runs much closer to the server itself.

Based on all of this information, working with different
DataTables proved a faster and more powerful solution, at least for
the scenario I tested.

As with all tests, your mileage may vary. Some methods might
be completely inappropriate depending on your application. When in
doubt, write your own tests and draw your own conclusions.

I hope this information proves helpful.

David Sceppa
Microsoft
The opinions expressed are not necessarily those of Microsoft

P.S. I added two more tests afterwards - one combining the
separate filtered queries into a batch and one that used IN
subqueries. The batched queries improved performance by between
five and ten percent. The performance of the IN subqueries was
comparable to that of the original filtered queries.
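
    For reference, here's a rough sketch of what a batched fetch can
look like with the SqlClient provider (illustrative only; the column
lists and country filter are placeholders, and cn is an existing
SqlConnection). By default the result sets come back named "Table",
"Table1", and so on, so table mappings route them into the right
DataTables:

    SqlDataAdapter da = new SqlDataAdapter(
        "SELECT CustomerID, CompanyName FROM Customers WHERE Country = 'UK'; " +
        "SELECT O.OrderID, O.CustomerID FROM Orders O, Customers C " +
        "WHERE O.CustomerID = C.CustomerID AND C.Country = 'UK'", cn);

    // First result set maps to "Table", the second to "Table1".
    da.TableMappings.Add("Table", "Customers");
    da.TableMappings.Add("Table1", "Orders");

    custDS.EnforceConstraints = false;   // skip client-side validation during the load
    da.Fill(custDS);
    custDS.EnforceConstraints = true;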

Grant

Sep 13, 2001, 4:24:51 AM

David,

 

Fantastic feedback. Thank you.

 

The primary problem I can see with using the join / sub query solution is that it is reading a lot of data twice, e.g., if I had 1000 bad customers I'll end up reading those customers three times, and the orders twice (for the customers/orders/orderDetail situation). Just to clarify your test results: did they indicate that using three separate queries was still just as fast as a single Select with Join, even though it seems tables are getting read many times?

 

A second problem is that it is hard to maintain. Say, for example, you wanted to create customer filters to look at orders for Bad customers, and then customers who live in New Zealand, customers starting with A, etc… You have to create a dummy 'join query' or 'Select with sub-queries' for each child table (the orders table and the order details table) and for each customer filter. You'll end up with a lot of DataAdapters or Select commands that have had to duplicate the same information.

 

Here's another possible solution. It uses your third suggestion (parameterized child queries), seems elegant enough, and doesn't require defining a relationship twice (i.e. in SQL statements and in DataRelations).

 

This is what I did:

1. Added a component class called dbAccess.vb.

2. Added DataAdapters to the component: daCustomers, daOrders and daLineItems (all with a simple "SELECT * FROM <TABLE>").

3. Generated a dataset (dsAll) using all three adapters, and then set up the appropriate relationships using the XML schema designer.

4. Added 2 more DataAdapters to the dbAccess component:

   a) daOrdersForCustomers: SELECT * FROM Orders WHERE (CustomerID = ?)

   b) daLineItemsForOrders: SELECT * FROM LineItems WHERE (OrderID = ?)

 

Then create a form (add any DataGrids bound to the Customer data or the CustomerOrders relationship, etc.). Also, add an instance of the dsAll class to the form designer (called DsAll1). Then add code like this:

 

    Private WithEvents CustTable As dsAll.CustomersDataTable
    Private WithEvents OrdTable As dsAll.OrdersDataTable
    Private doCascadeFill As Boolean = False
    Private oDB As New dbAccess()

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        CustTable = Me.DsAll1.Customers
        OrdTable = Me.DsAll1.Orders
        FillCascading(Me.oDB.daCustomers)
    End Sub

    Private Sub FillCascading(ByVal daConditional As OleDb.OleDbDataAdapter)
        Me.doCascadeFill = True
        daConditional.Fill(Me.DsAll1)
        Me.doCascadeFill = False
    End Sub

    'As each customer row arrives, fetch just that customer's orders.
    Private Sub CustomersFilling(ByVal sender As System.Object, ByVal e As System.Data.DataRowChangeEventArgs) Handles CustTable.RowChanged
        If Me.doCascadeFill And e.Action = DataRowAction.Add Then
            Me.oDB.daOrdersForCustomers.SelectCommand.Parameters(0).Value = e.Row.Item("CustomerID")
            Me.oDB.daOrdersForCustomers.Fill(Me.DsAll1)
        End If
    End Sub

    'As each order row arrives, fetch just that order's line items.
    Private Sub OrdersFilling(ByVal sender As System.Object, ByVal e As System.Data.DataRowChangeEventArgs) Handles OrdTable.RowChanged
        If Me.doCascadeFill And e.Action = DataRowAction.Add Then
            Me.oDB.daLineItemsForOrders.SelectCommand.Parameters(0).Value = e.Row.Item("OrderID")
            Me.oDB.daLineItemsForOrders.Fill(Me.DsAll1)
        End If
    End Sub


 

The great thing about this is that you can easily change your filter criteria on the customers and then, just by calling FillCascading, the child tables will be refreshed with the correct values (i.e. no need to create SelectCommands / DataAdapters on all the child tables for each different customer filter, although you may need to add some <datatable>.Clear code). Another important advantage is that you can limit the size of the child tables for 'each' parent. For example, you could show the last 10 orders for each bad customer.

 

  

You could also make some changes so that the tables are filled on demand instead of loading the dataset with ALL the children.

 

My only concern is performance. I'm executing a lot of SELECT statements here. David, I wasn't completely sure about your test results: is this going to be 10x slower?

 

If this is the case, perhaps it would be better to modify the code slightly to use your last suggestion, and replace the FillCascading / CustomersFilling methods with something like:

 

    Private InClause As String = ""

    Private Sub FillCascading(ByVal daConditional As OleDb.OleDbDataAdapter)
        Me.doCascadeFill = True

        ''Fill the customers and set up for Orders
        ''(the RowChanged handler below builds up InClause)
        InClause = ""
        daConditional.Fill(Me.DsAll1)

        ''Fill the orders and set up for LineItems
        Me.oDB.OleDbSelectCommand5.CommandText = _
            "SELECT * FROM ORDERS WHERE CUSTOMERID IN (" & InClause.Substring(1) & ")"
        Me.oDB.daOrdersForCustomers.Fill(Me.DsAll1)

        ''Fill the line items
        ''...etc

        Me.doCascadeFill = False
    End Sub

    Private Sub CustomersFilling(ByVal sender As System.Object, ByVal e As System.Data.DataRowChangeEventArgs) Handles CustTable.RowChanged
        If Me.doCascadeFill And e.Action = DataRowAction.Add Then
            ''Collect the customer keys instead of querying per row.
            InClause = InClause & "," & e.Row.Item("CustomerID")
        End If
    End Sub

 

 

This is a little messy, but you could easily create generic methods to build the SQL statement, so no problem.

  

Okay, now my concerns are: can you really create a SELECT statement string that lists 1,000+ values? Does this create any new issues, performance-wise or limits-wise…

 

Any thoughts?

 

Regards,

 

Grant

   

Final Note:

 

Trying to do a Fill *while* another Fill is being executed caused ADO.NET some problems. It seems to be due to the fact that only one DataReader (which the DataSet must use internally) can be active on a connection at a time.

 

There's probably a better solution, but to get around this I went to my dbAccess component, duplicated my connection 3 times, and updated the 'Generated Code' so that my … DataAdapters' SelectCommands used 3 different connections. E.g.:

 

In dbAccess.vb, change Me.cnTestData as follows:

 

 Me.OleDbSelectCommand5.Connection = Me.cnTestData2

 

David Sceppa

Sep 13, 2001, 6:11:17 PM
Grant,

> The primary problem I can see with using the join / sub
> query solution is that it is reading a lot of data twice
> e.g., if I had 1000 bad customers I'll end up reading
> those customers three times, and the orders twice (for
> the customers/orders/orderDetail situation). Just to
> clarify your test results, did they indicate that using
> three separate queries was still just as fast as a single
> Select with Join, even though it seems tables are getting
> read many times?

    The separate queries weren't just as fast as the join,
they were actually faster, usually by between 15 and 25%.
I'm guessing that the major reason that the separate queries
were faster is that they returned less data combined than
the join query.


> Say for example you wanted to create customer filters to
> look at orders for Bad customers, and then Customers who
> live in New Zealand, customers starting with A etc. You
> have to create a dummy 'join query' or 'Select with sub-
> queries' for each child table (the orders table and the
> orders details table) and for each Customer Filter.
> You'll end up with a lot of DataAdapters or Select
> commands that have had to duplicate the same
> information.

Whether you work with joins or separate queries, you
will have to change the structure of your DataAdapter(s) or
Command(s) in order to support different search criteria
per query. Working with a join means that you would only
need to change the search criteria with a single DataAdapter
or Command. Changing the search criteria for separate
filtered queries requires more work. The code may be
inelegant, but I think the difference in run-time performance
may outweigh the annoyance at design-time.

Your parameterized solution definitely simplifies the
process of fetching the hierarchy because you only have to
change the filter on the top-level query. Based on the
tests I ran, the performance of such parameterized queries
was between six and ten times slower, for exactly the
reason you pointed out. It's a lot of queries to run. The
parameterized queries really strike me as a more elegant
solution, but the numbers imply that it's not practical in
this particular scenario.

    You've already written so much code that I'd strongly
suggest you generate your own numbers. I'm confident
you'll see similar numbers but I still think it's worth
testing yourself.


> ... to get around this I went to my dbaccess component
> and duplicated my connection 3 times ...

Yes, calling DataAdapter.Fill does open a DataReader
on your Connection, which blocks it until the DataReader is
closed. You may encounter problems with your solution if
one DataAdapter fetches order information but the "parent"
DataAdapter hasn't fetched its corresponding customer yet.
If you go with this solution, you should probably set the
EnforceConstraints property on your DataSet to False while
fetching your data.

Also, keep in mind that while this solution may improve
the performance of your application, you're asking the
database to do more work when your application requests this
hierarchical data. I'd recommend testing the single and
multiple connection solutions under more real-world
circumstances before running with the multiple connection
solution. You may find it improves performance, but you
may also find that it degrades performance. I'd be very
curious to hear how the different solutions performed if
you ran the application from multiple machines at the same
time.

I'm not trying to knock your solution at all. I'm just
recommending that you run some tests to make sure you'll be
pleased with the solution later in the development process.
