ToDataTable() - merge with modelshredder


Johannes Rudolph

Dec 20, 2009, 3:18:10 PM12/20/09
to MoreLINQ Developers
Hi,

I am the owner of the modelshredder project (http://code.google.com/p/modelshredder/),
which provides a ToDataTable convenience method for IEnumerables. I have
discussed merging modelshredder into MoreLINQ with Jon Skeet. He likes
the idea, but he can't get around to reviewing the code at the
moment. I would therefore ask you to have a look at the current code base
and let me know if there's anything that should be fixed before the
code ends up in MoreLINQ.

Kind Regards and a Merry Christmas
Johannes Rudolph

Konrad

Dec 24, 2009, 9:04:53 AM12/24/09
to MoreLINQ Developers
Hi,

a few quick things that I noted in the project. I hope this helps.

* At the moment, your `IEnumerable` extension is only part of the Demo
project. I imagine it will be the main use of your library (even
though it can do so much more) so it should be moved into the library
itself.
* Some of your methods lack argument checks.
* `ShredderOptions` should probably be marked `sealed` since its
members cannot be modified from derived classes.
* `*.suo` files contain only user information; they shouldn't be
checked into source control.

Kind regards, and also a Merry Christmas
Konrad Rudolph (duh)

Johannes Rudolph

Dec 26, 2009, 2:37:08 PM12/26/09
to moreli...@googlegroups.com
Thanks for your suggestions, Konrad. I have fixed these points in the public repository. I also improved the code documentation and refactored some of the code. Since MoreLINQ works only on IEnumerable<T>, I decided to stick with that convention in modelshredder too.
I have also replaced the ShredderOptions handling: ShredderOptions are now passed directly into the Shredder. I plan to turn ShredderOptions into something like the DataLoadOptions in LINQ to SQL.

If you think the modelshredder code is a useful contribution to the MoreLINQ project and the code quality is on par, please let me know how we should proceed with the merge.

Kind Regards,
Johannes Rudolph






Johannes Rudolph

Dec 30, 2009, 5:52:30 PM12/30/09
to moreli...@googlegroups.com
I have finally integrated the modelshredder code into MoreLinq, and I have tried to stick to all of MoreLinq's conventions. Because modelshredder forms its own subsystem, I decided to create a MoreLinq.Modelshredder namespace; the source directory layout reflects this, and the same goes for the Test project.

According to NCover, the modelshredder code has 100% test coverage (even though I don't put too much stock in coverage).

Attached is a patch containing my efforts. I hereby agree to license my code under the Apache 2.0 License.

Kind Regards,
Johannes Rudolph
modelshredderMerge.diff

Johannes Rudolph

Dec 31, 2009, 11:56:28 AM12/31/09
to moreli...@googlegroups.com
One quick question regarding the Silverlight Project:

Why are all files referenced as links inside the Silverlight project? Is there any way I can incorporate my changes without hacking the .csproj XML? I couldn't find any info on how VS would let me do this.

I wish you all a happy new year; for me it's 6 hours and 5 minutes away ^^
Johannes Rudolph

Jon Skeet

Dec 31, 2009, 12:02:44 PM12/31/09
to moreli...@googlegroups.com
Unfortunately building for Silverlight and the Compact Framework as well as desktop is pretty tricky. We have some hacks in Protocol Buffers to keep Silverlight in the same project file, but even then we have to use a different project file for the Compact Framework :(

Jon


Johannes Rudolph

Jan 1, 2010, 9:45:59 AM1/1/10
to moreli...@googlegroups.com
OK, after a bit of research on Silverlight (I have never worked with it before), I found out that there is no DataTable support. Consequently, there will be no modelshredder support for Silverlight.

Kind regards,
Johannes Rudolph 

Atif Aziz

Jan 11, 2010, 7:27:29 AM1/11/10
to moreli...@googlegroups.com
Hi,

I'm sorry that I'm late to respond here. I had been meaning to since the original request for review but couldn't find the time until now.

I'm afraid I find the integration of ModelShredder into MoreLINQ against the spirit of MoreLINQ and LINQ operators. I believe the integration introduces far too many new types and muddies the scope of MoreLINQ with the accompanying support code (schema building, reflection and IL generation). Don't get me wrong: this is not criticism of ModelShredder, its utility or its implementation. Rather, I'm not sure that a full integration of the code as it sits right now in the trunk (which, by the way, is broken in compilation) is the best approach. Anyway, here's a breakdown to walk through my arguments.

The main function of concern seems to be Shred:

public DataTable Shred(IEnumerable<T> source) {
  DataTable table = new DataTable();
  m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
  IEnumerator e = source.GetEnumerator();
  table.BeginLoadData();
  while (e.MoveNext()) {
      table.Rows.Add(m_ShredderMethod.Invoke(e.Current));
  }
  return table;
}

First of all, the function doesn't care about T for the purpose of walking through the sequence, but we can take care of this with var so that it implicitly gets typed as IEnumerator<T>:

public DataTable Shred(IEnumerable<T> source) {
  DataTable table = new DataTable();
  m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
  var e = source.GetEnumerator();
  table.BeginLoadData();
  while (e.MoveNext()) {
      table.Rows.Add(m_ShredderMethod.Invoke(e.Current));
  }
  return table;
}

Next, Shred does not dispose the enumerator so we can take care of this with using:

public DataTable Shred(IEnumerable<T> source) {
  DataTable table = new DataTable();
  m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
  using (var e = source.GetEnumerator()) {
    table.BeginLoadData();
    while (e.MoveNext()) {
        table.Rows.Add(m_ShredderMethod.Invoke(e.Current));
    }
  }
  return table;
}

But then we might as well use foreach:

public DataTable Shred(IEnumerable<T> source) {
  DataTable table = new DataTable();
  m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
  table.BeginLoadData();
  foreach (var element in source) {
      table.Rows.Add(m_ShredderMethod.Invoke(element));
  }
  return table;
}

I also noticed that EndLoadData is missing to complement BeginLoadData. Minor detail but let's fix that quickly:

public DataTable Shred(IEnumerable<T> source) {
  DataTable table = new DataTable();
  m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
  table.BeginLoadData();
  foreach (var element in source) {
      table.Rows.Add(m_ShredderMethod.Invoke(element));
  }
  table.EndLoadData();
  return table;
}

Shred is not an extension method on IEnumerable<T>, which is a shame because that's how LINQ operators should be designed. To turn it into an extension method, we have to make it static and disconnect it from instance-based state. We can take care of m_ShredderMethod easily through a delegate:

public DataTable Shred(
  IEnumerable<T> source, 
  Func<T, object[]> rowFunc) 
{
  DataTable table = new DataTable();
  m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
  table.BeginLoadData();
  foreach (var element in source) {
      table.Rows.Add(rowFunc(element));
  }
  return table;
}

We can get rid of the dependency on m_SchemaBuilder the same way by delegating it:

public DataTable Shred(
  IEnumerable<T> source, 
  Action<DataTable> schemaAction,
  Func<T, object[]> rowFunc) 
{
  DataTable table = new DataTable();
  schemaAction(table);
  table.BeginLoadData();
  foreach (var element in source) {
      table.Rows.Add(rowFunc(element));
  }
  return table;
}

The schema building through an action is a bit ugly, but at this stage we can turn it into a static extension method:

public static DataTable Shred<TSource>(
  this IEnumerable<TSource> source, 
  Action<DataTable> schemaAction,
  Func<TSource, object[]> rowFunc) 
{
  // ...
}

One problem with this approach is that it is biased towards a plain DataTable and excludes working with subclasses, such as typed data sets that already have a schema. One way to solve this is to let the caller pass in a table that is typed generically yet constrained to DataTable, like this:

public static TTable Shred<TSource, TTable>(
  this IEnumerable<TSource> source, 
  TTable table,
  Func<TSource, object[]> rowFunc)
  where TTable : DataTable 
{
  table.BeginLoadData();
  foreach (var element in source) {
      table.Rows.Add(rowFunc(element));
  }
  table.EndLoadData();
  return table;
}

The benefit here is that the function returns the input table type, which can be either DataTable or a subclass. Type inference will also just work and let the type bleed through the system. Finally, we can call this method ToDataTable, a more stand-alone and meaningful name:

public static TTable ToDataTable<TSource, TTable>(...) { ... }
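[Editor's sketch] To make the proposed shape concrete, here is a small, self-contained example of how a caller might use the operator described above. The `Sketch` class, sample data and column names are hypothetical; the method body simply restates the implementation shown in the message.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class Sketch
{
    // The operator as outlined above, restated so the sample compiles on its own.
    public static TTable ToDataTable<TSource, TTable>(
        this IEnumerable<TSource> source,
        TTable table,
        Func<TSource, object[]> rowFunc)
        where TTable : DataTable
    {
        table.BeginLoadData();
        foreach (var element in source)
            table.Rows.Add(rowFunc(element));
        table.EndLoadData();
        return table;
    }

    static void Main()
    {
        // TTable is inferred from the argument: a plain DataTable here, but a
        // typed DataTable subclass would flow through unchanged.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        var people = new[]
        {
            new { Id = 1, Name = "Ada" },
            new { Id = 2, Name = "Grace" },
        };

        var result = people.ToDataTable(table, p => new object[] { p.Id, p.Name });
        Console.WriteLine(result.Rows.Count); // 2
    }
}
```

Because the return type is `TTable`, a caller passing in a typed data set's table gets that same type back without a cast.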

What I wanted to point out through this little exercise is really how Shred, or ToDataTable, could and should be implemented as far as MoreLINQ is concerned. It does not preclude anyone from using ModelShredder as a library in combination with MoreLINQ. It also does not need to bring in any of the types or supporting code from ModelShredder.

LINQ operators should be lightweight, remain oblivious to inputs and outputs, encode the algorithm of the operation and delegate the rest of the decisions to the caller. An operator that helps with DataTables should be implemented in the same spirit.

Comments, expansions and tasteful flames welcome.

- Atif

Johannes Rudolph

Jan 11, 2010, 12:32:03 PM1/11/10
to moreli...@googlegroups.com
Hi Atif,

I'm grateful that you have found the time to thoroughly review my code; it's very valuable to me. I am very sorry to have committed a broken version to svn. I did my best to check my commit beforehand, but somehow I am not handling the csproj files correctly. I hope it is finally fixed in r108. Not a very good start for me; again, I am very sorry.

I need to finish some important things for school today, but I will address your points tomorrow at the latest.

Kind regards,
Johannes

Atif Aziz

Jan 11, 2010, 1:14:00 PM1/11/10
to moreli...@googlegroups.com
Hi Johannes,

I checked out r108 and it builds fine now. Thanks for taking care of this.
BTW, I wrote some of it in a hurry, so I hope the typos or omissions didn't make it too hard to understand. Let me know if any passage needs to be clarified or if you want me to expand on any particular aspect.

- Atif 

Johannes Rudolph

Jan 12, 2010, 9:48:00 AM1/12/10
to moreli...@googlegroups.com
Hi Atif,

I have incorporated your suggestions about using a foreach loop instead of IEnumerator and correctly calling table.EndLoadData(), and have committed them to svn in r109.

LINQ operators should be lightweight, remain oblivious to inputs and outputs, encode the algorithm of the operation and delegate the rest of the decisions to the caller. An operator that helps with DataTables should be implemented in the same spirit.

I do fully agree with you on that one. You might have missed MoreEnumerable/ToDataTable.cs which provides the actual extension methods. It is implemented like this:

public static DataTable ToDataTable<T>(this IEnumerable<T> source, ShredderOptions options)
{
    Shredder<T> ms = new Shredder<T>(options, new InjectionObjectShredder(), new DefaultSchemaBuilder());
    return ms.Shred(source);
}

There is also an overload without ShredderOptions. My idea behind the implementation of the Shredder was to use it as a coordinator between the different things one needs to do when translating an object collection to a DataTable.

It is very important for the user to be able to decide which members of an object should be represented in the DataTable. As I plan to support creating DataSets in the future, it will become even more important to decide which associations are included in the resulting DataSet. This is the task of ShredderOptions.

The SchemaBuilder takes the ShredderOptions and creates a DataTable/DataSet accordingly. I originally thought it should be possible to provide custom implementations, but I realize schema building can all be handled by a single SchemaBuilder.
Your idea of putting data into a strongly typed DataTable (which thus already has a schema) goes a little against the idea of modelshredder, I guess. You would normally use a tabular representation for presentation/reporting only. In my experience, strongly typed DataSets are used to provide a more convenient "row per object" style of access to the data, but you still have your object collection for that.

In the same way as for the SchemaBuilder, support for the IShredderDelegateProvider extension point can be dropped. I don't think any other shredder mechanism will need to be implemented in the future; extending the existing one to take DataSets into account should be possible.

ShredderDelegate can be replaced by Func<T, object[]>. I don't know if that's a real benefit, however. To me it's like using a Tuple<int, string, string> instead of an (anonymous) class with three members. What's your take on this?

All in all, this would considerably slim down the infrastructure code required and result in a more lightweight implementation, as you suggested. Please let me know what you think.

Kind regards,
Johannes

Atif Aziz

Jan 13, 2010, 6:55:03 PM1/13/10
to moreli...@googlegroups.com
You might have missed MoreEnumerable/ToDataTable.cs which provides the actual extension methods

You're right. I don't know how I missed that one; I guess it was late. In any event, the implementation mostly delegates to Shredder<T>'s Shred, so that's where the most interesting things happen. You could pretty much inline it into ToDataTable, and then we're back to square one. That's because Shredder<T> doesn't seem to be doing much except holding on to its constructor arguments and causing the creation of the shredding method. The T is not really used until inside Shred, and then only to type the looping variable of the foreach. ShredderDelegate doesn't seem to care about it, since its first argument is typed as object. In other words, if you make Shredder non-generic, everything still works. And if you then inline Shred into ToDataTable, you get rid of Shredder altogether. This gives us back the implementation I showed in my first review:

public static TTable ToDataTable<T, TTable>(
  this IEnumerable<T> source, 
  TTable table, 
  Func<T, object[]> rowFunc)
  where TTable : DataTable 
{
  table.BeginLoadData();
  foreach (var element in source)
    table.Rows.Add(rowFunc(element));
  table.EndLoadData();
  return table;
}

With this, you can go back and work the other overload as simply helping with providing defaults like this:

public static DataTable ToDataTable<T>(
  this IEnumerable<T> source) 
{
  var options = DefaultShredderOptionsProvider.ProvideShredderOptions(typeof(T));
  var provider = new InjectionObjectShredder();
  var table = new DataTable();
  var builder = new DefaultSchemaBuilder();
  builder.BuildTableSchema(table, options);
  var shredder = provider.GetShredderMethod(options);
  return source.ToDataTable(table, e => shredder(e));
}

In theory, I believe we can further improve this and completely remove the need for nearly all supporting code via lambda expressions, but that may have to be a topic for another post.

Your idea of putting data into a strongly typed datatable (which thus already has a schema) is a little against the idea of modelshredder

Perhaps, but I was talking about keeping ToDataTable unbiased (as an operator implementation should be in MoreLINQ) rather than satisfying the idea of modelshredder especially when you're proposing to fold the latter into MoreLINQ.

ShredderDelegate can be replaced by Func<T, object[]>. I don't know if that's a real benefit, however. It's the same like e.g. using a Tuple<int, string, string> instead of a (anonymous) class with three members to me. What's your take on this?

Not sure I've understood your point here. Func<T, object[]> is only a generic version of ShredderDelegate.

- Atif

Johannes Rudolph

Jan 14, 2010, 3:18:48 AM1/14/10
to moreli...@googlegroups.com
 In other words, if you make Shredder non-generic, everything would still work. 
Nah, I couldn't be sure that all objects are of the same (or a sub-) type.

But then if you inline Shred into ToDataTable then you get rid of Shredder altogether. This would give us back the implementation I showed in my first review.
...
In theory, I believe we can further improve this and completely remove the need for nearly all supporting code via lambda expressions, but that may have to be a topic for another post.
Yes, that's what I was trying to say last time. It feels to me like replacing object-oriented with functional programming paradigms. This also fits better with LINQ.

Your idea of putting data into a strongly typed datatable (which thus already has a schema) is a little against the idea of modelshredder
 
Perhaps, but I was talking about keeping ToDataTable unbiased (as an operator implementation should be in MoreLINQ) rather than satisfying the idea of modelshredder especially when you're proposing to fold the latter into MoreLINQ.

I think letting the user pass in a prepared DataTable is an option; however, it would require validating that the schema is set up correctly. That said, I think we should drop support for custom schema building altogether, as stated in my last mail.

 
ShredderDelegate can be replaced by Func<T, object[]>. I don't know if that's a real benefit, however. It's the same like e.g. using a Tuple<int, string, string> instead of a (anonymous) class with three members to me. What's your take on this?

Not sure I've understood your point here. Func<T, object[]> is only a generic version of ShredderDelegate.

I think you understood; it's basically what you were saying about replacing concrete infrastructure with generic versions.

Atif Aziz

Jan 14, 2010, 5:54:27 AM1/14/10
to MoreLINQ Developers
> It feels to me like
> replacing object-oriented with functional programming paradigms.

It's just a matter of how one twists the brain. A delegate is an
object, and closures are just syntactic sugar for getting the compiler
to generate the classes you would otherwise have written anyway.
So while it all looks and sounds very functional, we're still OOP-ish
behind the scenes, except the compiler is doing all the heavy
lifting. ;)

I think your modelshredder has all the right ingredients and extension
points from a design standpoint. We just need to rewire the
abstractions and mechanics so we avoid introducing new types. The
supporting code and implementation details like reflection, schema
building and IL generation can be more tightly integrated as the
default behavior for simpler overloads. In the end, we wouldn't be
losing anything.

- Atif


Atif Aziz

Jan 14, 2010, 7:40:19 PM1/14/10
to moreli...@googlegroups.com
Hi Johannes,

In theory, I believe we can further improve this and completely remove the need for nearly all supporting code via lambda expressions, but that may have to be a topic for another post.
Yes, that's what I was trying to say last time. It feels to me like replacing object-oriented with functional programming paradigms. This also fits better with LINQ.

I experimented with this idea to see how it could be implemented. Check out the attached proposal and let me know what you think. It should (hopefully) maintain the spirit and flexibility of modelshredder as well as MoreLINQ, but without introducing any new types. Also, most of the supporting code has been inlined (it didn't amount to much) and defaulted as needed. It supports many scenarios, including the most basic one from the modelshredder home page:

var query = from x in xs 
            select new {
              x.Id, x.Name, 
              x.Price, x.Qty, x.LineTotal
            };
DataGridView.DataSource = query.ToDataTable();

You can plug the attached file into the MoreLINQ project to try it out. I've also posted it online at:

- Atif
ToDataTable.cs

Johannes Rudolph

Jan 15, 2010, 12:41:11 PM1/15/10
to moreli...@googlegroups.com
Hi,

there are some interesting bits in it. I especially like the way you handle schema building. Interestingly enough, performance is OK, but not as good as it could be. I have done some basic perf testing (1 million records) and Shredder is almost exactly twice as fast. I wonder a little why this is the case, as Expression.MakeMemberAccess should compile down to the same IL. I think the difference comes from loading the DataTable "cell by cell" instead of "row by row". I have confirmed that the DataTable itself is not the reason by comparing:

foreach (T element in source)
{
    var objs = m_ShredderMethod.Invoke(element); // get obj array
    var row = table.NewRow();
    for (var i = 0; i < objs.Length; i++)
        row[i] = objs[i]; // load each cell
    table.Rows.Add(row);
}
against:

foreach (var element in source)
{
    var row = table.NewRow();
    for (var i = 0; i < members.Length; i++)
        row[columns[i]] = members[i].GetValue(element);
    table.Rows.Add(row);
}

Both approaches should generate the same number of member access calls. The only obvious difference is all the array indexing, but I guess that should have no significance. Interesting.
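[Editor's sketch] For anyone wanting to reproduce this kind of measurement, a rough Stopwatch harness along the following lines contrasts the two loading styles being compared. The column layout and row count are made up for illustration, and the numbers will vary by machine.

```csharp
using System;
using System.Data;
using System.Diagnostics;

class LoadComparison
{
    const int N = 100000; // scaled down from the 1 million records used in the mail

    static DataTable MakeTable()
    {
        var table = new DataTable();
        table.Columns.Add("A", typeof(int));
        table.Columns.Add("B", typeof(string));
        return table;
    }

    static long Time(Action<DataTable> load)
    {
        var table = MakeTable();
        var sw = Stopwatch.StartNew();
        table.BeginLoadData();
        load(table);
        table.EndLoadData();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // "Cell by cell" loading: assign each column of the new row in turn.
        var cellByCell = Time(table =>
        {
            for (var i = 0; i < N; i++)
            {
                var row = table.NewRow();
                row[0] = i;
                row[1] = "x";
                table.Rows.Add(row);
            }
        });

        // "Row" loading: hand the whole value array over in one call.
        var wholeRow = Time(table =>
        {
            for (var i = 0; i < N; i++)
                table.Rows.Add(new object[] { i, "x" });
        });

        Console.WriteLine("cell-by-cell: " + cellByCell + " ms, whole-row: " + wholeRow + " ms");
    }
}
```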

Further:

  • I don't like that the most overloaded ToDataTable is now a little over 100 LoC; refactoring would be easy, though.
  • It doesn't take indexer properties into account, failing with a low-level exception at runtime.
  • As I plan to support DataSets in the future, I need some kind of load options that specify which associations will be represented in the DataSet. I could do it like LINQ to SQL (my original intention, hence ShredderOptions) or with params Expression<Func<T, object>>[] expressions as access "indicators", but not use them for real access, for performance reasons (see above).

Do you think I should commit to a branch in svn, or are you happy to share code via patches/gists?

Atif Aziz

Jan 16, 2010, 8:47:39 AM1/16/10
to moreli...@googlegroups.com
Hi Johannes,

Interestingly enough, performance is OK, but not as good as it could be. I have done some basic perf testing (1 million records) and Shredder is almost exactly twice as fast

Good catch, and thanks for taking the implementation through its paces with performance tests! I also observed the same performance issues on my end. It turns out the problem has nothing to do with the MSIL compiled through lambda expressions or with how the rows are built, but rather with the schema constraints. You and I were setting the AllowDBNull property differently. In the refactored version, AllowDBNull was being set to false for everything except nullable value types. DefaultSchemaBuilder, on the other hand, was coded such that AllowDBNull would always turn out to be true, irrespective of the property or field type! This difference affected the execution speed of EndLoadData between the two implementations. In my implementation, it caused the null constraint checks to run over all rows after the loading took place, which is why the performance ended up twice as bad.

Here's what DefaultSchemaBuilder was doing for properties (it's identical for fields):

if (IsNullable(pi.PropertyType)) 
{
  dc.DataType = pi.PropertyType.GetGenericArguments()[0];
  dc.AllowDBNull = true;
}
else
{
  dc.DataType = pi.PropertyType;
}

The funny thing is, AllowDBNull is true by default for a DataColumn, so setting it to true for nullable value types was actually having no effect. DBNull would still be allowed for, say, a non-nullable int. Since all columns always allowed DBNull, EndLoadData had no null constraint checks to carry out after loading.
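[Editor's sketch] The default described here is easy to confirm directly; a minimal example (the column name is arbitrary):

```csharp
using System;
using System.Data;

class AllowDbNullDefault
{
    static void Main()
    {
        // AllowDBNull defaults to true, even for a value-typed column...
        var column = new DataColumn("N", typeof(int));
        Console.WriteLine(column.AllowDBNull); // True

        // ...so a DBNull cell is accepted, and EndLoadData has no
        // null-constraint pass to perform over the loaded rows.
        var table = new DataTable();
        table.Columns.Add(column);
        table.BeginLoadData();
        table.Rows.Add(DBNull.Value);
        table.EndLoadData();
        Console.WriteLine(table.Rows.Count); // 1
    }
}
```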

When I was experimenting with the refactoring, I miswrote that as:

...
let nullable = type.IsGenericType 
               && type.GetGenericTypeDefinition() == typeof(Nullable<>)
...

Later, the column was being created like this:

new DataColumn(m.Name, m.Type) { AllowDBNull = m.Nullable }

And this is not exactly the same. I've now fixed this issue to keep the behavior identical by completely removing the nullable condition and the setting of AllowDBNull, since with DefaultSchemaBuilder it was effectively always resulting in true (the default). With this, the two are nearly on par, with ModelShredder being a tad (8% to 9%) quicker according to my tests. This small difference now comes from two things: how the row is built, and differences in the generated MSIL. ModelShredder creates and invokes a single method that builds an entire array from member values; in the lambda expression scenario, there are several invocations of delegates to read out the member values. I played around with dynamically combining the lambda expressions so that they, too, initialize an array from member values in a single invocation, and it went something like this:

private static Func<T, object[]> CreateShredder<T>(
  IEnumerable<MemberInfo> members)
{
  var parameter = Expression.Parameter(typeof (T), "e");
  var initializers = members.Select(m => CreateMemberAccessor(parameter, m))
                            .Cast<Expression>();
  var array = Expression.NewArrayInit(typeof(object), initializers);
  var lambda = Expression.Lambda<Func<T, object[]>>(array, parameter);            
  return lambda.Compile();
}

This brings the performance of the two implementations level, for all intents and purposes.
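[Editor's sketch] The snippet above leaves `CreateMemberAccessor` undefined. One plausible definition (an assumption, not necessarily what was committed) boxes each member access to `object` so it fits the array initializer. Here is the idea as a self-contained sketch with a hypothetical `Item` type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

static class ShredderSketch
{
    // A possible CreateMemberAccessor (assumed): read the member off the
    // parameter and box it to object for the array initializer.
    static Expression CreateMemberAccessor(ParameterExpression parameter, MemberInfo member)
        => Expression.Convert(Expression.MakeMemberAccess(parameter, member), typeof(object));

    static Func<T, object[]> CreateShredder<T>(IEnumerable<MemberInfo> members)
    {
        var parameter = Expression.Parameter(typeof(T), "e");
        var initializers = members.Select(m => CreateMemberAccessor(parameter, m));
        // e => new object[] { (object)e.M1, (object)e.M2, ... }
        var array = Expression.NewArrayInit(typeof(object), initializers);
        return Expression.Lambda<Func<T, object[]>>(array, parameter).Compile();
    }

    class Item { public int Id { get; set; } public string Name { get; set; } }

    static void Main()
    {
        var members = new MemberInfo[]
        {
            typeof(Item).GetProperty("Id"),
            typeof(Item).GetProperty("Name"),
        };
        var shredder = CreateShredder<Item>(members);
        var cells = shredder(new Item { Id = 7, Name = "widget" });
        Console.WriteLine(cells[0] + ", " + cells[1]); // 7, widget
    }
}
```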

Do you think I should commit to a branch in svn or are you happy to share code via patches/gist?

I've gone ahead and committed the refactored version to a branch named linqish-to-data-table. I have left out the combined lambda version for now; I don't know if it's worth the small gain, and one can add the optimization any time once we agree on the general approach. If you're happy with it, feel free to merge it into the trunk.

I don't like that the most overloaded ToDataTable is now a little over 100 LoC, refactoring would be easy though.

This is rather subjective, and there's always a trade-off involved. If you think about it, nothing is really different: the code that was spread across various types is now under one roof, but without any loss of functionality (unless I missed something). Also, some of the code bloat comes from the schema mapping and extra checks. Which nicely leads to your next comment...

It doesn't take indexer properties into account, failing with low level exception at runtime.

Agreed! I didn't want to add too much validation noise right now so that the overall approach can be clear for discussion. Feel free to add it now, though.

but don't use them for real access for performance reasons (see above).

Still applicable?

The only bad news is that the use of lambda expressions means MoreLINQ can no longer be used in .NET Framework 2.0 projects, but this can be taken care of through conditional compilation.

- Atif

Atif Aziz

Jan 16, 2010, 9:03:35 AM1/16/10
to moreli...@googlegroups.com
Adding to that, memory usage patterns also seem very similar between the two implementations. See the attached screenshot that tests a load of 4M rows. I'll agree any day, though, that observations in Task Manager are the most accurate method for reaching the conclusion :) but it's a start and shows that there's nothing warranting a deeper look right now.

- Atif
ToDataTable-GC.png

Johannes Rudolph

Jan 16, 2010, 10:16:15 AM1/16/10
to moreli...@googlegroups.com
Hi Atif,

That's good news; I really like your LINQ implementation. Working with plain MSIL is rather nasty, you know :-). I will add validation against indexer properties now. I think we should opt for the approach of combining the MemberAccessExpressions into an ArrayInitExpression; it's not too complicated, and certainly much less complicated than the original ModelShredder code.

but don't use them for real access for performance reasons (see above)
Still applicable?

No, no longer relevant. 

I will leave the branch unmerged for now and add some unit tests to verify the correctness of ToDataTable(). After that, I'll merge it back into the trunk.

I'm also keen to implement a ToDataSet() method. I'm currently using a similar approach to feed a report engine with strongly typed DataSets. This nicely decouples my domain layer from the reports, and I can handle versioning easily without breaking my clients' report templates. A ToDataSet() extension would also be useful for integrating with old .NET 2.0 code and, in general, for interfacing with infrastructure that relies on DataSets. What's your take on this?

Kind regards,
Johannes Rudolph

Just curious: how have you done the performance testing? Have you used a profiler?

Johannes Rudolph

Jan 16, 2010, 2:35:05 PM1/16/10
to moreli...@googlegroups.com
I have just finished implementing row based loading. 

In terms of the DataTables one can supply to ToDataTable this approach is a little less flexible as it requires column ordering to be exactly the same as in the source or as in the expressions provided.

Your previous implementation was lacking necessary validation of the expressions that can be passed to ToDataTable. We need to validate against method calls and "second level" property access; this takes care of indexer properties automatically, since their expression type is a method call too.

Since we can build a shredder only from a MemberInfo[], I have now implemented expressions as "access indicators". I didn't realize this was necessary when I wrote my last mail. Getting the accessed member is not easy (e.g. a field access is automatically boxed), but it can be solved:

private static MemberInfo GetAccessedMember(LambdaExpression lambda)
{
    Expression body = lambda.Body;

    // If it's a field access, boxing was used; unwrap to get the field
    if ((body.NodeType == ExpressionType.Convert) || (body.NodeType == ExpressionType.ConvertChecked))
    {
        body = ((UnaryExpression)body).Operand;
    }

    // Check that the MemberExpression is valid and is a "first level" member access, e.g. not a.b.c
    MemberExpression memberExpression = body as MemberExpression;
    if ((memberExpression == null) || (memberExpression.Expression.NodeType != ExpressionType.Parameter))
    {
        return null;
    }

    return memberExpression.Member;
}
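To illustrate the two cases (a hypothetical standalone demo; Sample is an invented type, and the GetAccessedMember helper above is assumed to be in scope):

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Invented demo type
class Sample
{
    public int Field;
    public Sample Inner { get; set; }
}

// Both lambdas box their int result into object, so the body is a Convert
// node that GetAccessedMember unwraps before inspecting the member access.
Expression<Func<Sample, object>> direct = s => s.Field;
Expression<Func<Sample, object>> nested = s => s.Inner.Field;

MemberInfo member = GetAccessedMember(direct); // FieldInfo for Sample.Field
MemberInfo none   = GetAccessedMember(nested); // null: "second level" access is rejected
```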

I have also split MemberInfo and DataTable preparation into separate methods to make ToDataTable() clearer:

if (source == null) throw new ArgumentNullException("source");
if (table == null) throw new ArgumentNullException("table");

MemberInfo[] members = PrepareMemberInfos<T>(expressions);
PrepareDataTable(table, members);
var shredder = CreateShredder<T>(members);

//
// Builds rows out of elements in the sequence and
// adds them to the table.
//

table.BeginLoadData();

try
{
    foreach (var element in source)
    {
        var row = table.NewRow();
        row.ItemArray = shredder(element);
        table.Rows.Add(row);
    }
}
finally
{
    table.EndLoadData();
}

return table;

Since we do use MemberInfos internally anyway, we could also provide a ToDataTable overload accepting a predicate for MemberInfos, so I could do something like this:

testObjects.ToDataTable(m => !m.Name.StartsWith("Key"))

so the user can either filter from all MemberInfos or explicitly provide the members to be included via expressions.
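The member selection such an overload would perform might look roughly like this (FilterMembers is a hypothetical helper name, not part of either code base; it picks public instance fields and readable, non-indexer properties, then applies the caller's predicate):

```csharp
using System;
using System.Linq;
using System.Reflection;

static class MemberFilterSketch
{
    // Hypothetical helper: from all public instance members of T, keep fields
    // plus readable, non-indexer properties, narrowed by the caller's predicate.
    public static MemberInfo[] FilterMembers<T>(Func<MemberInfo, bool> predicate)
    {
        return typeof(T)
            .GetMembers(BindingFlags.Public | BindingFlags.Instance)
            .Where(m => m is FieldInfo
                     || (m is PropertyInfo p && p.CanRead && p.GetIndexParameters().Length == 0))
            .Where(predicate)
            .ToArray();
    }
}
```

The filtered MemberInfo[] could then feed straight into the existing CreateShredder/PrepareDataTable pipeline.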


Kind regards,
Johannes Rudolph

Atif Aziz

unread,
Jan 16, 2010, 8:27:38 PM1/16/10
to moreli...@googlegroups.com
Just curious, how have you done performance testing? 
Have you used a profiler?

I just used the poor man's profiler, a.k.a. System.Diagnostics.Stopwatch
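For completeness, the pattern is simply this (the LINQ sum below is an arbitrary stand-in workload, not the actual benchmark code):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Poor man's profiling: wrap the work in a Stopwatch and read the elapsed time.
var sw = Stopwatch.StartNew();
long sum = Enumerable.Range(0, 1_000_000).Sum(i => (long)i);
sw.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} ms (sum = {sum})");
```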

- Atif

Atif Aziz

unread,
Jan 17, 2010, 6:32:10 PM1/17/10
to moreli...@googlegroups.com
 this approach is a little less flexible as it requires column ordering to be exactly the same as in the source or as in the expressions provided.

Why even have this limitation? It's not a difficult problem to solve so we should just take care of it while we're at it so that there's the least element of surprise.

We need to validate against..."second level" property access

Good catch!

Since we do use MemberInfos internally anyway, we could also provide a ToDataTable overload accepting a predicate for MemberInfos.
so the user can either filter from all MemberInfos or explicitly provide the members to be included via expressions.

Not sure this would be useful in real life. If one wants to limit the members, one should have projected the right shape (via Select) prior to calling ToDataTable. I would rather that someone report a real supporting case from the field rather than dream up all the possible scenarios. I mean, if it's just an overload, we can add it anytime.

- Atif

Johannes Rudolph

unread,
Jan 18, 2010, 2:32:40 AM1/18/10
to moreli...@googlegroups.com
Why even have this limitation? It's not a difficult problem to solve so we should just take care of it while we're at it so that there's the least element of surprise.

Otherwise we can't use row-based shredding but are forced into column-based shredding, and an 8-10% performance loss is not all that much.

If one wants to limit the members, one should have projected the right shape (via Select) prior to calling ToDataTable.

Yes, I was aware of that, but I'm not sure how this affects overall performance (projection + ToDataTable()). However, in some cases (e.g. when you want to include the results of method calls) you need to project anyway.

Atif Aziz

unread,
Jan 19, 2010, 2:14:50 PM1/19/10
to moreli...@googlegroups.com
else we can't use row based shredding but are forced into column based shredding. 
8-10% performance loss is not all too much. 

Oh? :) Have a look and let me know what you think...
http://code.google.com/p/morelinq/source/detail?r=134

Atif Aziz

unread,
Jan 21, 2010, 2:52:28 PM1/21/10
to moreli...@googlegroups.com
Are we ready to merge this into the trunk?

Johannes Rudolph

unread,
Jan 21, 2010, 5:21:39 PM1/21/10
to moreli...@googlegroups.com
I think so, but I haven't had the time to review it fully yet, since I have to prepare for my exams. I will be able to do so by Saturday at the latest and will merge the code then.

Johannes Rudolph

unread,
Jan 23, 2010, 4:00:26 PM1/23/10
to moreli...@googlegroups.com
Besides some very small improvements, I found the code to be ready for merging and have just done so.

Kind regards,
Johannes Rudolph