RFC on reorganization (issue #23)

11 views
Skip to first unread message

Atif Aziz

unread,
Apr 4, 2009, 6:39:17 PM4/4/09
to moreli...@googlegroups.com
Hi everyone,

I've just created review issue #23 on the project site. I've set the ownership to Jon, but I'd like to welcome everyone's feedback. The changes are pending in a branch called reorg. I've reproduced the text of the issue below. I suppose it would be best to provide comments directly under the issue, but it wouldn't be a sin either to post something here too. So here goes:

Purpose of code changes on this branch:

Re-organize MoreLINQ so that all operators are implemented under the Enumerable class, providing several advantages over the current structure that classifies operators under their so-called category class.

When reviewing my code changes, please focus on:

The branch puts all operators under a single class called Enumerable, much like the base LINQ operators it extends. However, each operator resides in its own file and uses a partial modifier to allow the Enumerable definition to fan out. The benefits of this new structure over the current are the following:
  • Regions are not needed to superficially organize and delimit operators within a class just because the class seems to be growing big as new overloads and operators are added to it. Regions work fine as a visual navigation aid if you are in a sophisticated IDE like VS, but not when you are browsing the source through other means, such as the online Subversion browser, terminal or e-mail.

  • Categorization of the operator is more of a documentation and learning aid than an implementation concern. Once the MoreLinq.Pull namespace is imported, all types extending IEnumerable<T> are available irrespective of how the authors categorized them. The point is, it doesn't serve as aid for the user of the pull operators. In the new organization, categorization could be done (if absolutely needed) with a custom XML doc summary element.

  • A single partial Enumerable is simple than multiple category classes. There is no added benefit of the latter. Naming LINQ operators is hard enough. Proper categorization adds additional subjectivity and headache for authors. Before you can add an operator, you need to think about how to name it and categorize (the exercise may not be entirely unhealthy or useless). With the new organization, you just put it under Enumerable.

  • If an operator needs to be located, one has to first try and remember the class category, open that file and then add the operator or overload an existing at the right place. With the new organization, you just create a new file, define Enumerable as partial and add the operator implementation. For an existing operator, locating its source code is dead easy because the file is named after the operator.

  • Since each operator lies in it own file in the proposed new organization, there is less contention or risk of edit conflicts between authors working on the same source file.

  • With the proposed new organization, providing on-line links to a single operator and all its overloads is as simple as providing a URL the operator's file, as in http://tinyurl.com/MoreLinq-DistinctBy, whereas with the current organization, the best you can do is point someone to the category class and then ask them to look for the operator under there.

The are, however, two downsides, none of which seem to outweigh the aforementioned advantages:
  • All 134 tests now appear under a single fat fixture. For now, I have classified the operators by category using the Category attribute from NUnit. With this, you can still decide to run tests of only a single category (using, for example, the /include switch from nunit-console) or visually distinguish them in the GUI.

  • If someone wants to use an operator that is not implemented as an extension method or invoke it without the extension invocation syntax, then the compiler may be confused between System.Linq.Enumerable and MoreLinq.Pull.Enumerable. The conflict occurs only if the System.Linq and MoreLinq.Pull namespaces are imported in the same source file. This could be resolved in one of two ways: (i) we use MoreEnumerable instead of Enumerable for the class name in MoreLINQ or (ii) let the user of the code decide by introducing type aliases to her liking, as in:

    using LinqEnumerable = System.Linq.Enumerable;
    using MoreEnumerable = MoreLinq.Pull.Enumerable;

After the review, I'll merge this branch into the trunk.

Thanks,
Atif

Jon Skeet

unread,
Apr 5, 2009, 2:40:34 AM4/5/09
to moreli...@googlegroups.com
(Second attempt to send - this time as jonatha...@googlemail.com)

My initial thought - not having looked at the code, which I'll try to do tonight - is that naming it MoreEnumerable would be a sensible thing to do. I don't like name clashes when they're easily avoidable.

Jon

2009/4/4 Atif Aziz <aziz...@gmail.com>

Atif Aziz

unread,
Apr 5, 2009, 11:46:41 AM4/5/09
to moreli...@googlegroups.com
is that naming it MoreEnumerable would be a sensible thing to do.

On the whole I agree and this is a quick change to make once the overall structure seems like a good idea moving forward. For now, I used Enumerable to get the impression how these conflicts would appear and get tackled in code.


not having looked at the code, which I'll try to do tonight

Looking forward to the rest of the feedback.

- Atif

Jon Skeet

unread,
Apr 5, 2009, 3:22:32 PM4/5/09
to moreli...@googlegroups.com
Okay, further thoughts:

1) I suggest we have a MoreEnumerable.cs to house the XML documentation for the class itself.
2) Other files can be subservient to that (i.e. expand MoreEnumerable.cs to see them all)
3) IdentityFunc.cs in the MoreLinq namespace should probably be Functions.cs so we can have other ones.
4) I don't see the benefit of making the test class partial. Just name each test class the same as its test file. That shouldn't be confusing, and will be easier to handle.

All of this could be done post-merge should we wish.

Basically +1 for merging fairly soon. Now I have a reasonably powerful netbook I should be able to do some work on MoreLinq on the train.

Jon


2009/4/5 Atif Aziz <aziz...@gmail.com>

Atif Aziz

unread,
Apr 6, 2009, 3:40:21 AM4/6/09
to moreli...@googlegroups.com
1) I suggest we have a MoreEnumerable.cs to house the XML documentation for the class itself.

Sounds good. Although I very much like the sound of MoreEnumerable, have you also given thought to PullEnumerable? I'm asking in case at some point we imagine having a PushEnumerable too? If both, the MoreLinq.Pull and future-MoreLinq.Push namespaces end up containing MoreEnumerable then again one source file using push and pull models may have to resolve naming conflicts. Mind you, this will be very rare and only if someone uses a non-extension invocation style (as is sometimes required for LINQ members like Enumerable.Empty).


3) IdentityFunc.cs in the MoreLinq namespace should probably be Functions.cs so we can have other ones.

Perhaps we can cross that bridge when we get there? IdentityFunc is internal so I'm not sure worrying about it right now is going to make much difference. Settling on the public structure was more of a burning issue as you don't want too make too many breaking changes with projects already using MoreLINQ.


4) I don't see the benefit of making the test class partial. Just name each test class the same as its test file. That shouldn't be confusing, and will be easier to handle.

Did it for consistency for now, but you suggest having a test fixture per operator and then put its test cases in there?


All of this could be done post-merge should we wish.

Definitely!

Basically +1 for merging fairly soon.

Cool! Will merge some time later today.

- Atif

konrad_...@madrat.net

unread,
Apr 6, 2009, 4:43:59 AM4/6/09
to MoreLINQ Developers
On Apr 5, 9:22 pm, Jon Skeet <jonathan.sk...@googlemail.com> wrote:
> Okay, further thoughts:
> 1) I suggest we have a MoreEnumerable.cs to house the XML documentation for
> the class itself.
> 2) Other files can be subservient to that (i.e. expand MoreEnumerable.cs to
> see them all)

If I understand right, this requires “hacking” the csproj XML file,
right?

Anyway, I'm all in favour of the changes. They make usage much
simpler, IMHO. I'd really like to participate more but I've just
banned Windows from my system completely and terminal + VIM is really
no alternative (at least for .NET programming). MonoDevelop is still
completely unusable on OS X. I'll see what I can do though.

Konrad

Jon Skeet

unread,
Apr 6, 2009, 5:12:25 AM4/6/09
to moreli...@googlegroups.com
2009/4/6 Atif Aziz <aziz...@gmail.com>

1) I suggest we have a MoreEnumerable.cs to house the XML documentation for the class itself.

Sounds good. Although I very much like the sound of MoreEnumerable, have you also given thought to PullEnumerable? I'm asking in case at some point we imagine having a PushEnumerable too? If both, the MoreLinq.Pull and future-MoreLinq.Push namespaces end up containing MoreEnumerable then again one source file using push and pull models may have to resolve naming conflicts. Mind you, this will be very rare and only if someone uses a non-extension invocation style (as is sometimes required for LINQ members like Enumerable.Empty).

Push LINQ doesn't really have an "enumerable" type - it works a bit differently.

However, I have been a bit concerned about the "Pull" bit - I suspect it may confuse people. I wonder whether we shouldn't just move all the pull stuff into MoreLinq, and still have MoreLinq.Push when I get round to moving that. "Pull" is clearly the dominant model - I don't think we really need namespace parity. That was a mistake on my part.
 
3) IdentityFunc.cs in the MoreLinq namespace should probably be Functions.cs so we can have other ones.

Perhaps we can cross that bridge when we get there? IdentityFunc is internal so I'm not sure worrying about it right now is going to make much difference. Settling on the public structure was more of a burning issue as you don't want too make too many breaking changes with projects already using MoreLINQ.

Yup, that's fair enough.
 
4) I don't see the benefit of making the test class partial. Just name each test class the same as its test file. That shouldn't be confusing, and will be easier to handle.

Did it for consistency for now, but you suggest having a test fixture per operator and then put its test cases in there?

Yup. I think pragmatism beats consistency this time :)
 
All of this could be done post-merge should we wish.

Definitely!

Basically +1 for merging fairly soon.

Cool! Will merge some time later today.

Great.

Jon
 

Jon Skeet

unread,
Apr 6, 2009, 5:14:09 AM4/6/09
to moreli...@googlegroups.com
2009/4/6 <konrad_...@madrat.net>


On Apr 5, 9:22 pm, Jon Skeet <jonathan.sk...@googlemail.com> wrote:
> Okay, further thoughts:
> 1) I suggest we have a MoreEnumerable.cs to house the XML documentation for
> the class itself.
> 2) Other files can be subservient to that (i.e. expand MoreEnumerable.cs to
> see them all)

If I understand right, this requires “hacking” the csproj XML file,
right?

Yup - it's pretty easy though. You just add a "DependentUpon" element or something like that. I have examples in MiscUtil :)
 
Anyway, I'm all in favour of the changes. They make usage much
simpler, IMHO. I'd really like to participate more but I've just
banned Windows from my system completely and terminal + VIM is really
no alternative (at least for .NET programming). MonoDevelop is still
completely unusable on OS X. I'll see what I can do though.

It'd be nice to just check that it all works on Mono periodically at least. I'm going to be working on getting Protocol Buffers building with Mono soon...

Jon

konrad_...@madrat.net

unread,
Apr 6, 2009, 6:58:43 AM4/6/09
to MoreLINQ Developers
> It'd be nice to just check that it all works on Mono periodically at least.
> I'm going to be working on getting Protocol Buffers building with Mono
> soon...

Well, Mono turned out to be a catastrophe since the gmcs compiler
apparently treats several warnings as errors by default -- which is
not that desirable in the test code. Still, I guess the code should be
warning-free anyway. Should I check in the necessary #pragma or is
there a better solution (configuring the compiler via `/nowarn` etc.)?

Jon Skeet

unread,
Apr 6, 2009, 7:31:35 AM4/6/09
to moreli...@googlegroups.com
Once it's all in, I'll try to fix the warnings anyway - why do we have them? (I haven't checked what they are yet.)

If we really can't avoid them, it's definitely better to use #pragma selectively than to use a compiler switch. We should probably treat warnings as errors in VS too.

konrad_...@madrat.net

unread,
Apr 6, 2009, 8:03:59 AM4/6/09
to MoreLINQ Developers
> Once it's all in, I'll try to fix the warnings anyway - why do we have them?
> (I haven't checked what they are yet.)

I've taken the liberty to fix them using pragmas. The
`NullReferenceException` warnings cannot really be prevented since
they *should* be triggered for the test (although it could be argued
that these tests are kind of useless since even the compiler
recognizes the error). The other warning relates to the implementation
of `Consume`. An alternative, warning-free implementation wouldn't use
`foreach`. I don't find much benefit in it:

\\\
using (IEnumerator<T> enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
}
}
///

Konrad

Atif Aziz

unread,
Apr 6, 2009, 9:56:16 AM4/6/09
to moreli...@googlegroups.com
I wonder whether we shouldn't just move all the pull stuff into MoreLinq, and still have MoreLinq.Push when I get round to moving that. "Pull" is clearly the dominant model - I don't think we really need namespace parity.

I'll second that move any day! =)


Did it for consistency for now, but you suggest having a test fixture per operator and then put its test cases in there?
Yup. I think pragmatism beats consistency this time :)

I thought I was being pragmatic, but never mind. :) I'll let you take care of it then.


Cool! Will merge some time later today.
Great.

Merged in r82. Feel free to close issue #23.

- Atif

Atif Aziz

unread,
Apr 6, 2009, 10:00:18 AM4/6/09
to moreli...@googlegroups.com
Konrad,


I'll try to fix the warnings anyway - why do we have them? (I haven't checked what they are yet.)
I've taken the liberty to fix them using pragmas.

I think it would have been a good idea to create a new issue out of this and put the warning details in there, followed by a fix. Perhaps this should be forked to a different thread if it needs further discussion?

Jon Skeet

unread,
Apr 6, 2009, 12:46:35 PM4/6/09
to moreli...@googlegroups.com
2009/4/6 Atif Aziz <aziz...@gmail.com>

I wonder whether we shouldn't just move all the pull stuff into MoreLinq, and still have MoreLinq.Push when I get round to moving that. "Pull" is clearly the dominant model - I don't think we really need namespace parity.

I'll second that move any day! =)

Done in r83.
 
Did it for consistency for now, but you suggest having a test fixture per operator and then put its test cases in there?
Yup. I think pragmatism beats consistency this time :)

I thought I was being pragmatic, but never mind. :) I'll let you take care of it then.

 Again, done in r83.

I haven't made all the main files depend on "Enumerable" yet, nor have I renamed Enumerable to MoreEnumerable. Coming soon...

Jon

Jon Skeet

unread,
Apr 6, 2009, 12:55:37 PM4/6/09
to moreli...@googlegroups.com
2009/4/6 Jon Skeet <jonatha...@googlemail.com>

I haven't made all the main files depend on "Enumerable" yet, nor have I renamed Enumerable to MoreEnumerable. Coming soon...

Done both in r84.

Jon

Reply all
Reply to author
Forward
0 new messages