I am the owner of the modelshredder project http://code.google.com/p/modelshredder/,
providing a ToDataTable convenience method for IEnumerables.I have
discussed merging modelshredder into morelinq with Jon Skeet. He likes
the idea, however he can't get around to review the code at the
moment. I do hereby ask you to have a look at the current code base
and let me know if there's anything that should be fixed before the
code will end up in morelinq.
Kind Regards and a Merry Christmas
Johannes Rudolph
a quick few things that I noted in the project. Hope that helps.
* At the moment, your `IEnumerable` extension is only part of the Demo
project. I imagine it will be the main use of your library (even
though it can do so much more) so it should be moved into the library
itself.
* Some of your methods lack argument checks.
* `ShredderOptions` should probably be marked `sealed` since its
members cannot be modified from derived classes.
* `*.suo` files contain only user information; they shouldn't be
checked into source control.
Kind regards, and also a Merry Christmas
Konrad Rudolph (duh)
--
You received this message because you are subscribed to the Google Groups "MoreLINQ Developers" group.
To post to this group, send email to moreli...@googlegroups.com.
To unsubscribe from this group, send email to morelinq-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/morelinq-dev?hl=en.
LINQ operators should be lightweight, remain as oblivious to inputs and outputs, encode the algorithm of the operation and delegate the rest of the decisions to the caller. Adding an operator that helps with DataTables should be implemented in the same spirit.
You might have missed MoreEnumerable/ToDataTable.cs which provides the actual extension methods
Your idea of putting data into a strongly typed datatable (which thus already has a schema) is a little against the idea of modelshredder
ShredderDelegate can be replaced by Func<T, object[]>. I don't know if that's a real benefit, however. It's the same like e.g. using a Tuple<int, string, string> instead of a (anonymous) class with three members to me. What's your take on this?
In other words, if you make Shredder non-generic, everything would still work.
But then if you inline Shred into ToDataTable then you get rid of Shredder altogether. This would give us back the implementation I showed in my first review.
In theory, I believe we can further improve this and completely remove the need for nearly all supporting code via lambda expressions, but that may have to be a topic for another post.
Your idea of putting data into a strongly typed datatable (which thus already has a schema) is a little against the idea of modelshredder
Perhaps, but I was talking about keeping ToDataTable unbiased (as an operator implementation should be in MoreLINQ) rather than satisfying the idea of modelshredder especially when you're proposing to fold the latter into MoreLINQ.
ShredderDelegate can be replaced by Func<T, object[]>. I don't know if that's a real benefit, however. It's the same like e.g. using a Tuple<int, string, string> instead of a (anonymous) class with three members to me. What's your take on this?
Not sure I've understood your point here. Func<T, object[]> is only a generic version of ShredderDelegate.
It's just a matter of how one twists the brain. A delegate is an
object and closures are just syntax sugar for getting the compiler to
generate the classes for what you would've written otherwise anyhow.
So while it all looks and sounds very functional, we're still OOP-ish
behind-the-scenes except the compiler is doing all the heavy-
lifting. ;)
I think your modelshredder has all the right ingredients and extension
points from a design stand-point. We just need to rewire the
abstractions and mechanics so we avoid introducing new types. The
supporting code and implementation like reflection, schema building
and IL generation can be more tightly integrated as the default
behavior for simpler overloads. In the end, we wouldn't be losing
anything.
- Atif
On Jan 14, 9:18 am, Johannes Rudolph <jojo.rudo...@googlemail.com>
wrote:
> >> On Mon, Jan 11, 2010 at 7:14 PM, Atif Aziz <aziza...@gmail.com> wrote:
>
> >>> Hi Johannes,
>
> >>> I checked out r108 and it builds fine now. Thanks for taking care of
> >>> this.
> >>> BTW, I wrote some of it in a hurry so I hope some of the typos or
> >>> omissions didn't make it too hard to understand. Let me know if any passage
> >>> need to be clarified or if you want me to expand on any particular aspect.
>
> >>> - Atif
>
> >>> On Mon, Jan 11, 2010 at 6:32 PM, Johannes Rudolph <
> >>> jojo.rudo...@googlemail.com> wrote:
>
> >>>> Hi Atif,
>
> >>>> I'm grateful you have found time to thoroughly review my code, it's very
> >>>> valuable for me. I am very very sorry to have commited a broken version to
> >>>> svn, I did my best to check my commit beforehands but somehow I am not
> >>>> handling the csproj files correctly. I hope it is finally fixed in r108. Not
> >>>> a very good start for me, again, I am very sorry.
>
> >>>> I need to finish some important things for school today, but I will
> >>>> adress your points tomorrow at the latest.
>
> >>>> Kind regrads,
> >>>> Johannes
>
> >>>> On Mon, Jan 11, 2010 at 1:27 PM, Atif Aziz <aziza...@gmail.com> wrote:
>
> >>>>> Hi,
>
> >>>>> I'm sorry that I'm late to respond here. I had been meaning to since
> >>>>> the original request for review but couldn't find the time until now.
>
> >>>>> I'm afraid I find the integration of ModelShredder into MoreLINQ
> >>>>> against the spirit of MoreLINQ and LINQ operators. I believe the integration
> >>>>> introduces way too many new types and muddies scope of MoreLINQ with the
> >>>>> accompanying support code (schema building, reflection and IL generation).
> >>>>> Don't get me wrong. This is not criticism against ModelShredder, its utility
> >>>>> or implementation. Rather, I'm not sure if a full integration of the code as
> >>>>> it is sitting right now in the trunk (and BTW broken in compilation) is the
> >>>>> best approach. Anyway, here's a breakdown to walk through my arguments.
>
> >>>>> The main function of concern seems to be Shred:
>
> >>>>> public DataTable Shred(IEnumerable<T> source) {
> >>>>> DataTable table = new DataTable();
> >>>>> m_SchemaBuilder.BuildTableSchema(table, m_ShredderOptions);
> >>>>> IEnumerator e =
>
> ...
>
> read more »- Hide quoted text -
>
> - Show quoted text -
In theory, I believe we can further improve this and completely remove the need for nearly all supporting code via lambda expressions, but that may have to be a topic for another post.
Yes, thats what I was trying to say the last time. It feels to me like replacing object oriented with functional programming paradigmas. This also fits better with LINQ.
Interestingly enough, performance is Ok, but not as good as it could be. I have done some basic perf testing (1 mio. records) and Shredder is almost exactly twice as fast
Do you think I should commit to a branch in svn or are you happy to share code via patches/gist?
I don't like that the most overloaded ToDataTable is now a little over 100 LoC, refactoring would be easy though.
It doesn't take indexer properties into account, failing with low level exception at runtime.
but don't use them for real access for performance reasons (see above).
but don't use them for real access for performance reasons (see above)
Still applicable?
Just curious, how have you done performance testing?
Have you used a profiler?
this approach is a little less flexible as it requires column ordering to be exactly the same as in the source or as in the expressions provided.
We need to validate against..."second level" property access
Since we do use MemberInfos internally anyway, we could also provide a ToDataTable overload accepting a predicate for MemberInfos.
so the user can either filter from all MemberInfos or explicitly provide the members to be included via expressions.
Why even have this limitation? It's not a difficult problem to solve so we should just take care of it while we're at it so that there's the least element of surprise.
If one wants to limit the members, one should have projected the right shape (via Select) prior to calling ToDataTable.
else we can't use row based shredding but are forced into column based shredding.
8-10% performance loss is not all too much.