Startup performance improvement

134 views
Skip to first unread message

Peter Hsu

unread,
Jun 24, 2014, 7:29:53 PM6/24/14
to re-mot...@googlegroups.com
Hi

I am working on AspnetVNext and our current focus is to cut down our application start up time. We took a profile and identified that we are spending excessive JIT time on a few ReLinq classes' static constructors. For example SumExpressionNode. I am currently seeing initialization like this.

public static readonly MethodInfo[] SupportedMethods = new[] {
   
GetSupportedMethod (() => Queryable.Sum ((IQueryable<decimal>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<decimal?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<double>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<double?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<int>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<int?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<long>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<long?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<float>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<float?>) null)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (decimal) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (decimal?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (double) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (double?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (int) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (int?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (long) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (long?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (float) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (float?) 0)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<decimal>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<decimal?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<double>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<double?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<int>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<int?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<long>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<long?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<float>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<float?>) null)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (decimal) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (decimal?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (double) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (double?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (int) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (int?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (long) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (long?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (float) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (float?) 0))
  };


This code cause SumExpressionNode..cctor() to become huge in IL and even bigger in jitted code. Even the run performance is not ideal. On the other hand the following code would run much faster and has a tiny jit size compare to above code.

SupportedMethods = new Type[] { typeof(Queryable), typeof(Enumerable) }
       
.SelectMany(t => t.GetMethods(BindingFlags.Public | BindingFlags.Static))
       
.Where(m => "Sum" == m.Name)
       
.Select(GetRegisterableMethodDefinition)
       
.ToArray();

I understand the argument of explicitly enumerate supported method instead of reflectively scan for method to support. However, this is a big performance improvement we are talking about (very conservative estimation is at least 0.1~0.2 second by our profile). I have implemented a patch which applies this new code to expression nodes that support all methods of a specific name (All, Any, Average, Cast, Count, DefaultIfEmpty, First, Last, LongCount, Max, Min, OfType, Reverse, Single, Skip, Sum, Take). I also included the unit test so we can track any (unliky) API change for Linq.

I would appreciate it if I can get some input and we can figure out if this is a solution we can go forward with.

thanks
Peter
ScanLinqMethods.patch

Michael Ketting

unread,
Jun 25, 2014, 2:14:45 AM6/25/14
to re-mot...@googlegroups.com
Hi Peter!

Thanks for the important feedback on the startup performance and the suggested solution. I'm definitely interested in fixing this and crearted a JIRA issue: https://www.re-motion.org/jira/browse/RM-6184

First off, I'd like to understand a bit more about the actual issue (and should also write a performance repro sample to test this locally :) ) The two major changes you did (unless I missed something when I checked over the patch) was to simplify the code in the static constructors and to replace the following expression tree analysis with a reflection call to filter all methods of the Queryable and Enumerable types.

public static MethodInfo GetMethod<T> (Expression<Func<T>> wrappedCall)
   
{
     
ArgumentUtility.CheckNotNull ("wrappedCall", wrappedCall);
 
     
switch (wrappedCall.Body.NodeType)
     
{
       
case ExpressionType.Call:
         
return ((MethodCallExpression) wrappedCall.Body).Method;
       
case ExpressionType.MemberAccess:
         
var memberExpression = (MemberExpression) wrappedCall.Body;
         
var property = memberExpression.Member as PropertyInfo;
         
var method = property != null ? property.GetMethod : null;
         
if (method != null)
           
return method;
         
break;
     
}
 
     
throw new ArgumentException (string.Format ("Cannot extract a method from the given expression '{0}'.", wrappedCall.Body), "wrappedCall");
   
}

Did your analysis also show a performance problem with the way we retrived the MethodInfo from the Expression or was this simply a side-effect of your refactoring? To put it differently: From your experience, would it be possible to get the same performance improvement by moving current logic used to populate the static field to a point in  time after the type is initialized? The reason I'm asking is because I've wanted to change the "SupportedMethod" field to factory methods (but keep the existing, taxative syntax for retrieve the actual methods) for quite some time but there's never been an actual reason to do so before, except code beautification. If you don't know the answer, that's okay, too, I'll just need to performance test it then.

Best regards, Michael
    GetSupportedMethod (() =><spa
...

Peter Hsu

unread,
Jun 25, 2014, 3:14:01 PM6/25/14
to re-mot...@googlegroups.com

Thanks Michael

 

Sure, let me clarify by backing up a bit and tell you the whole story.

 

The motivation of this change goes back to our JIT analysis. We took a startup profile of our sample Mvc MusicStore app https://github.com/aspnet/MusicStore and end up seeing significant JIT time and JIT sizes on some of these classes's static constructor. (Significant on our perspective anyway)

 

Row Labels

JitTime MSec

IL Size

Native Size

Remotion.Linq.Parsing.Structure.IntermediateModel.SumExpressionNode..cctor()

15.046

4609

14480

Remotion.Linq.Parsing.Structure.IntermediateModel.AverageExpressionNode..cctor()

22.481

4609

14508

Remotion.Linq.Parsing.Structure.IntermediateModel.MaxExpressionNode..cctor()

9.569

2643

8354

Remotion.Linq.Parsing.Structure.IntermediateModel.MinExpressionNode..cctor()

11.749

2643

8354

 

I decided to take a look. I see the existing code is trying to compensate the fact that it is no trivial way to instantiate instances of MethodInfo of an existing class. This has impact on two ways.

 

1. The IL size and Native size become significant because note that delegates are compiled into ad-hoc classes. You can use ILDASM to check how big the ..cctor() gets

2. As you mentioned the computation needed to pull out the MethodInfo is significant, not to mention there was also cost in instantiating the Expression to begin with.


I got curious and extracted the code above for experiment. I end up seeing the following micro benchmark data

 

SumExperssionNode

Run time (ms)

IL Size (ms)

JIT Size (ms)

Profiled JIT Time (ms)

Expression based static constructor

15

4,609

11,586

10

Reflection based static constructor

2

130

310

3.3

 

(Note that the native size of my experiment is smaller than the first chart because my lab code is just a snapshot of SumExpressionNode)

 

I see value in terms of performance in moving  the population of SupportedMethod to a factory method. I was thinking about the same except I was hesitate to embark on such a significant code change. What I have in mind would be something like that.

void PopulateSupportedMethods()
{
       
var analyzer = new LinqMethodsAnalyzer();
        analyzer
.SetGlobalExclusion(NoUnsupportedDelegates);

        analyzer
.RegisterNode(SumExpressionNode.SupportedMethods, null, "Sum");
        analyzer
.RegisterNode(AggregateExpressionNode.SupportedMethods, AggregateWithSeed, "Aggregate");
        analyzer
.RegisterNode(FirstExpressionNode.SupportedMethods, null, "First", "FirstOrDefault");
       
//....
}

class LinqMethodsAnalyzer
{
       
class MethodScanLogic
       
{
           
public IList<MethodInfo> Target { get; set; }
           
public Predicate<MethodInfo> Inclusion { get; set; }
       
}

       
IDictionary<string, MethodScanLogic> _registry;
       
Predicate<MethodInfo> _globalInclusion;

       
public void SetGlobalExclusion(Predicate<MethodInfo> inclusionRule)
       
{
            _globalInclusion
= inclusionRule;
       
}

        // For each Linq method in TargetMethodNames, if inclusion rule applies, it should be added to the target list
       
public void RegisterNode(IList<MethodInfo> target, Predicate<MethodInfo> inclusionRule, params string[] TargetMethodNames)
       
{
           
foreach (var mName in TargetMethodNames)
           
{
                _registry
.Add(mName, new MethodScanLogic() { Target = target, Inclusion = inclusionRule });
           
}
       
}

       
public void PopulateNodes()
       
{
           
foreach (var m in new Type[] { typeof(IQueryable), typeof(IEnumerable) }.SelectMany(t => t.GetRuntimeMethods()))
           
{
               
if (m.IsStatic && m.IsPublic)
               
{
                   
MethodScanLogic localConfig;
                    _registry
.TryGetValue(m.Name, out localConfig);

                   
if (localConfig != null)
                   
{
                       
// this is where performance might get hit a bit because global rules may be expensive
                       
// because we are looking for generic types and in the case of Select, SelectMany and Where we even need
                       
// to look into the generic parameter of parameters to exclude methods that has start indicies
                       
// We can workaround this by getting rid of the global rule and making rules specific to each registry
                       
// This way we do not have to check for parameters that might not exist for some methods
                       
if ((_globalInclusion == null ? true : !_globalInclusion(m))
                           
&& (localConfig.Inclusion == null ? true : !localConfig.Inclusion(m)))
                       
{
                            localConfig
.Target.Add(m);
                       
}
                   
}
               
}
           
}
       
}
   
}


One benefit of this design is that we only scan the types once and assign them to the right nodes. I tried playing around in this direction but things got quite complicated and I am not sure if this arguably much more complex design can be justified. Therefore I decided to go with the simple route and just go for nodes that simply includes all linq methods of particular names without exclusion (inclusion) logics 



On Tuesday, June 24, 2014 11:14:45 PM UTC-7, Michael Ketting wrote:
Hi Peter!

Thanks for the important feedback on the startup performance and the suggested solution. I'm definitely interested in fixing this and crearted a JIRA issue: https://www.re-motion.org/jira/browse/RM-6184

First off, I'd like to understand a bit more about the actual issue (and should also write a performance repro sample to test this locally :) ) The two major changes you did (unless I missed something when I checked over the patch) was to simplify the code in the static constructors and to replace the following expression tree analysis with a reflection call to filter all methods of the Queryable and Enumerable types.

public static MethodInfo GetMethod<T> (Expression<Func<T>> wrappedCall)
   
{
     
ArgumentUtility.CheckNotNull ("wrappedCall", wrappedCall);
 
     
switch (wrappedCall.Body.NodeType)
     
{
       
case ExpressionType.Call:
         
return ((MethodCallExpression) wrappedCall.Body).Method;
       
case ExpressionType.MemberAccess:
         
var memberExpression = (MemberExpression) wrappedCall.Body;
         
var property = memberExpression.Member as PropertyInfo;
         
var method = property != null ? property.GetMethod : null;
         
if (method != null)
           
return method;
         
break;
     
}
 
     
throw new ArgumentException (string.Format ("Cannot extract a method from the given expression '{0}'.", wrappedCall.Body), "wrappedCall");
   
}

Did your analysis also show a performance problem with the way we retrived the MethodInfo from the Expression or was this simply a side-effect of your refactoring? To put it differently: From your experience, would it be possible to get the same performance improvement by moving current logic used to populate the static field to a point in  time after the type is initialized? The reason I'm asking is because I've wanted to change the "SupportedMethod" field to factory methods (but keep the existing, taxative syntax for retrieve the actual methods) for quite some time but there's never been an actual reason to do so before, except code beautification. If you don't know the answer, that's okay, too, I'll just need to performance test it then.

Best regards, Michael

On Wednesday, June 25, 2014 1:29:53 AM UTC+2, Peter Hsu wrote:
Hi

I am working on AspnetVNext and our current focus is to cut down our application start up time. We took a profile and identified that we are spending excessive JIT time on a few ReLinq classes' static constructors. For example SumExpressionNode. I am currently seeing initialization like this.

...

Peter Hsu

unread,
Jun 25, 2014, 6:29:54 PM6/25/14
to re-mot...@googlegroups.com
Apologies on some typos in my code because I was changing design from using exclusion rules to using inclusion rules as I wrote
 
void PopulateSupportedMethods()
{
       
var analyzer = new LinqMethodsAnalyzer();

        analyzer
.SetGlobalInclusion(NoUnsupportedDelegates);

        analyzer
.RegisterNode(SumExpressionNode.SupportedMethods, null, "Sum");
        analyzer
.RegisterNode(AggregateExpressionNode.SupportedMethods, AggregateWithNoSeed, "Aggregate");
        analyzer
.RegisterNode(AggregateFromSeedExpressionNode.SupportedMethods, AggregateWithSeed, "Aggregate");
        analyzer
.RegisterNode(FirstExpressionNode.SupportedMethods, null, "First", "FirstOrDefault");
       
//....
}


class LinqMethodsAnalyzer
{
       
class MethodScanLogic
       
{
           
public IList<MethodInfo> Target { get; set; }
           
public Predicate<MethodInfo> Inclusion { get; set; }
       
}

       
IDictionary<string, MethodScanLogic> _registry;
       
Predicate<MethodInfo> _globalInclusion;


       
public void SetGlobalInclusion(Predicate<MethodInfo> inclusionRule)
       
{

            _globalInclusion
= inclusionRule;
       
}

       
// For each Linq method in TargetMethodNames, if inclusion rule applies, it should be added to the target list
       
public void RegisterNode(IList<MethodInfo> target, Predicate<MethodInfo> inclusionRule, params string[] TargetMethodNames)
       
{
           
foreach (var mName in TargetMethodNames)
           
{
                _registry
.Add(mName, new MethodScanLogic() { Target = target, Inclusion = inclusionRule });
           
}
       
}

       
public void PopulateNodes()
       
{
           
foreach (var m in new Type[] { typeof(IQueryable), typeof(IEnumerable) }.SelectMany(t => t.GetRuntimeMethods()))
           
{
               
if (m.IsStatic && m.IsPublic)
               
{
                   
MethodScanLogic localConfig;
                    _registry
.TryGetValue(m.Name, out localConfig);

                   
if (localConfig != null)
                   
{
                       
// this is where performance might get hit a bit because global rules may be expensive
                       
// because we are looking for generic types and in the case of Select, SelectMany and Where we even need
                       
// to look into the generic parameter of parameters to exclude methods that has start indicies
                       
// We can workaround this by getting rid of the global rule and making rules specific to each registry
                       
// This way we do not have to check for parameters that might not exist for some methods
                       
if ((_globalInclusion == null ? true : !_globalInclusion(m))
                           
&& (localConfig.Inclusion == null ? true : !localConfig.Inclusion(m)))
                       
{
                            localConfig
.Target.Add(m);
                       
}
                   
}
               
}
           
}
       
}
   
}


I welcome thoughts on which implementation would be better for ReLinq 

Michael Ketting

unread,
Jun 27, 2014, 12:32:48 PM6/27/14
to re-mot...@googlegroups.com
Hello Peter!

Thanks for the ideas. I'll get back with a more detailed response over the weekend.


Best regards, Michael

On Wednesday, June 25, 2014 1:29:53 AM UTC+2, Peter Hsu wrote:
    GetSupportedMethod (() =><spa
...

Ketting, Michael

unread,
Jun 30, 2014, 3:29:44 AM6/30/14
to re-mot...@googlegroups.com

Hello Peter!

I've now setup a (very) small performance sample:

internal class Program
{
 
private static void Main (string[] args)
 
{
   
var stopwatch = Stopwatch.StartNew();
   
ExpressionTreeParser.CreateDefaultNodeTypeProvider();
    stopwatch
.Stop();
   
Console.WriteLine ("Time taken: {0}", stopwatch.Elapsed);
 
}
}


Since the code is still using reflection to find the relevant types, this sample should not support from pre-jitting, I  hope. Anyhow, with the turnk-code, I got about 70ms (release-build) of time for creating the NodeTypeProviders and after applying your patch, it dropped to 40ms.

The patch still needs a bit of adaption to our coding styles:
1. Whitespaces: We add a whitespace before the opening paranthesis, except when calling a method with an empty signature.
2. Assertion-Methods: We use the Assert.That (actual, Is.EqualTo (actual)) or Assert.That (actual, Is.True) syntax these days. I can't promise that all old tests have already been migrated, but all the newer tests are :)

For the count-check I'm thinking of reusing the previous production code used for registering the MethodInfos:

 

Assert.That (
   
AnyExpressionNode.SupportedMethods,
   
Is.Equivalent (
       
new[]
       
{

          GetSupportedMethod (() => Queryable.Any<object> (null)),

          GetSupportedMethod (() => Queryable.Any<object> (null, null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null, null))
    ));
}

That way, we can keep a compact and easily reviewed list of supported methods available. It might also be that I can then drop the then-unneeded individual tests.

You second idea is very intriguing, but yeah, it goes a bit father than what I was thinking. I'd like to keep the declaration of the supported methods for each node-type with the node type. I'm thinking something like this:

public class SumExpressionNode : ResultOperatorExpressionNodeBase

{

  public class ResultOperatorProvider : ISupportedMethodInfoProvider

  {

    public ResultOperatorProvider ()

    {

      // Default-ctor is required

    }

    public Type NodeType

    {

      get { return typeof (SumExpressionNode); }

    }

    public IEnumerable<MethodInfoGetSupportedMethodInfos ()

    {

      return ReflectionUtility.QueryableMethods.Where (mi => mi.Name == "Sum")

          .Concat (ReflectionUtility.EnumerableMethods.Where (mi => mi.Name == "Sum"));

    }

  }

  ...
}


Then I can change MethodInfoBasedNodeTypeRegistry to not look up member fields on the NodeTypes but instead simply look for and instantiate the providers. I'm also going to look into getting all the public static methods for Queryable and Enumerable just once and hold them in static fields. Anyhow, that second part can be a follow-up refactoring to the first one.


What do you think?

Best regards, Michael

--
You received this message because you are subscribed to the Google Groups "re-motion Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to re-motion-de...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Hsu

unread,
Jun 30, 2014, 5:25:38 PM6/30/14
to re-mot...@googlegroups.com
Hey Michael

Apologizes for not actively updating. In fact I am working on a couple of prototypes. I didn't think the results are ready yet because I have not finished implementing testing. I am measuring 3 scenarios using a simple test code like this:

class Program {
   
static void Main(string[] args)   {
   
var sw = new Stopwatch();
    sw
.Start();

   
LinqMethodsAnalyzer.InitializeNodes();
   
VisitAllSupportedMethods();
 
    sw
.Stop();
   
Console.WriteLine("Time elapsed msc: " + sw.ElapsedMilliseconds);
   
Console.Read();
 
}
 
 
// visit all SupportedMethods collection to make sure everything is initialized
 
public static int VisitAllSupportedMethods()
 
{
   
int i = 0;
    i
+= AggregateExpressionNode.SupportedMethods.Count();
    i
+= AggregateFromSeedExpressionNode.SupportedMethods.Count();
    i
+= AllExpressionNode.SupportedMethods.Count();
    i
+= AnyExpressionNode.SupportedMethods.Count();
    i
+= AverageExpressionNode.SupportedMethods.Count();
    i
+= CastExpressionNode.SupportedMethods.Count();
    i
+= ContainsExpressionNode.SupportedMethods.Count();
    i
+= CountExpressionNode.SupportedMethods.Count();
    i
+= DefaultIfEmptyExpressionNode.SupportedMethods.Count();
    i
+= DistinctExpressionNode.SupportedMethods.Count();
    i
+= ExceptExpressionNode.SupportedMethods.Count();
    i
+= FirstExpressionNode.SupportedMethods.Count();
    i
+= GroupByExpressionNode.SupportedMethods.Count();
    i
+= GroupByWithResultSelectorExpressionNode.SupportedMethods.Count();
    i
+= GroupJoinExpressionNode.SupportedMethods.Count();
    i
+= IntersectExpressionNode.SupportedMethods.Count();
    i
+= JoinExpressionNode.SupportedMethods.Count();
    i
+= LastExpressionNode.SupportedMethods.Count();
    i
+= LongCountExpressionNode.SupportedMethods.Count();
    i
+= MaxExpressionNode.SupportedMethods.Count();
    i
+= MinExpressionNode.SupportedMethods.Count();
    i
+= OfTypeExpressionNode.SupportedMethods.Count();
    i
+= OrderByDescendingExpressionNode.SupportedMethods.Count();
    i
+= OrderByExpressionNode.SupportedMethods.Count();
    i
+= ReverseExpressionNode.SupportedMethods.Count();
    i
+= SelectExpressionNode.SupportedMethods.Count();
    i
+= SelectManyExpressionNode.SupportedMethods.Count();
    i
+= SingleExpressionNode.SupportedMethods.Count();
    i
+= SkipExpressionNode.SupportedMethods.Count();
    i
+= SumExpressionNode.SupportedMethods.Count();
    i
+= TakeExpressionNode.SupportedMethods.Count();
    i
+= ThenByDescendingExpressionNode.SupportedMethods.Count();
    i
+= ThenByExpressionNode.SupportedMethods.Count();
    i
+= UnionExpressionNode.SupportedMethods.Count();
    i
+= WhereExpressionNode.SupportedMethods.Count();
   
return i;
 
}
}

I am getting the following results for each scenarios.

 

Cold Start

Warm Start

Trunk Code

616

144

Patch Code

357

68

Methods Analyzer

264

57


This results is run in our lab machine. The Cold Start scenario is using a cache purging utility for windows to simulate reboot. Note that I had made an assumption of not calling on my MethodInfoBasedNodeTypeRegistry
.GetRegisterableMethodDefinition(m, true) on the analyzer class.

The analyzer class look like this so far (I am sure we need to discuss the correctness of the class and fix the code style of this is something we are interested in)

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
 
namespace Remotion.Linq.Parsing.Structure.IntermediateModel
{
  public class LinqMethodsAnalyzer
  {
    static Func<MethodInfobool> ContainsSeed = m =>
    {
      var firstParamType = m.GetParameters()[1].ParameterType;
      return firstParamType.Name == "TAccumulate";
    };
 
    static Func<MethodInfobool> WithResultSelector = m =>
    {
      return m.GetParameters().Any(p => p.Name == "resultSelector");
    };
 
    static Func<MethodInfobool> NoIndexedSelector = m =>
    {
      // * Select uses the parameter name selector
      // * SelectMany uses selector for most methods but there where causes where there's both
      //   resultSelector and collectionSelector and only collectionSelector can be indexed
      // * Where uses the name predicate
      var seletorParam = m.GetParameters().SingleOrDefault(p => p.Name == "selector" || p.Name == "collectionSelector" || p.Name == "predicate");
      if (seletorParam == null)
      {
        return true;
      }
      else
      {
        var selectorType = seletorParam.ParameterType;
        // Enumerable taks a Func<...> but Querable takes an Expression<Func<...>>
        if (typeof(Expression).GetTypeInfo().IsAssignableFrom(selectorType.GetTypeInfo()))
        {
          selectorType = selectorType.GetTypeInfo().GenericTypeArguments[0];
        }
        return !typeof(int).GetTypeInfo().IsAssignableFrom(selectorType.GetTypeInfo().GenericTypeArguments[1].GetTypeInfo());
      }
    };
 
    static Func<MethodInfobool> NoEqualityComparer = m =>
    {
      return NoDelegateOfType(m, typeof(IEqualityComparer<>));
    };
 
    static Func<MethodInfobool> NoComparer = m =>
    {
      return NoDelegateOfType(m, typeof(IComparer<>));
    };
 
    static bool NoDelegateOfType(MethodInfo m, Type genericDelegateType)
    {
      foreach (var p in m.GetParameters())
      {
        var paramType = p.ParameterType;
        if (paramType.GetTypeInfo().IsGenericType &&
          genericDelegateType.GetTypeInfo().IsAssignableFrom(paramType.GetGenericTypeDefinition().GetTypeInfo()))
        {
          return false;
        }
      }
      return true;
    }
 
    public static void InitializeNodes()
    {
      var analyzer = new LinqMethodsAnalyzer();
 
      analyzer.RegisterNode(
        m => (ContainsSeed(m) ?
                AggregateFromSeedExpressionNode.SupportedMethods :
                AggregateExpressionNode.SupportedMethods),
        "Aggregate");
 
      analyzer.RegisterNode(AllExpressionNode.SupportedMethods, "All");
      analyzer.RegisterNode(AnyExpressionNode.SupportedMethods, "Any");
      analyzer.RegisterNode(AverageExpressionNode.SupportedMethods, "Average");
      analyzer.RegisterNode(CastExpressionNode.SupportedMethods, "Cast");
      analyzer.RegisterNode(ContainsExpressionNode.SupportedMethods, NoEqualityComparer, "Contains");
      analyzer.RegisterNode(CountExpressionNode.SupportedMethods, "Count");
      analyzer.RegisterNode(DefaultIfEmptyExpressionNode.SupportedMethods, "DefaultIfEmpty");
      analyzer.RegisterNode(DistinctExpressionNode.SupportedMethods, NoEqualityComparer, "Distinct");
      analyzer.RegisterNode(ExceptExpressionNode.SupportedMethods, NoEqualityComparer, "Except");
      analyzer.RegisterNode(FirstExpressionNode.SupportedMethods, "First""FirstOrDefault");
 
      analyzer.RegisterNode(
        m =>
        {
          if (NoEqualityComparer(m))
          {
            return WithResultSelector(m) ?
                    GroupByWithResultSelectorExpressionNode.SupportedMethods :
                    GroupByExpressionNode.SupportedMethods;
          }
          else
          {
            return null;
          }
        }, "GroupBy");
 
      analyzer.RegisterNode(GroupJoinExpressionNode.SupportedMethods, NoEqualityComparer, "GroupJoin");
      analyzer.RegisterNode(IntersectExpressionNode.SupportedMethods, NoEqualityComparer, "Intersect");
      analyzer.RegisterNode(JoinExpressionNode.SupportedMethods, NoEqualityComparer, "Join");
      analyzer.RegisterNode(LastExpressionNode.SupportedMethods, "Last""LastOrDefault");
      analyzer.RegisterNode(LongCountExpressionNode.SupportedMethods, "LongCount");
      analyzer.RegisterNode(MaxExpressionNode.SupportedMethods, "Max");
      analyzer.RegisterNode(MinExpressionNode.SupportedMethods, "Min");
      analyzer.RegisterNode(OfTypeExpressionNode.SupportedMethods, "OfType");
      analyzer.RegisterNode(OrderByDescendingExpressionNode.SupportedMethods, NoComparer, "OrderByDescending");
      analyzer.RegisterNode(OrderByExpressionNode.SupportedMethods, NoComparer, "OrderBy");
      analyzer.RegisterNode(ReverseExpressionNode.SupportedMethods, "Reverse");
      analyzer.RegisterNode(SelectExpressionNode.SupportedMethods, NoIndexedSelector, "Select");
      analyzer.RegisterNode(SelectManyExpressionNode.SupportedMethods, NoIndexedSelector, "SelectMany");
      analyzer.RegisterNode(SingleExpressionNode.SupportedMethods, "Single""SingleOrDefault");
      analyzer.RegisterNode(SkipExpressionNode.SupportedMethods, "Skip");
      analyzer.RegisterNode(SumExpressionNode.SupportedMethods, "Sum");
      analyzer.RegisterNode(TakeExpressionNode.SupportedMethods, "Take");
      analyzer.RegisterNode(ThenByDescendingExpressionNode.SupportedMethods, NoComparer, "ThenByDescending");
      analyzer.RegisterNode(ThenByExpressionNode.SupportedMethods, NoComparer, "ThenBy");
      analyzer.RegisterNode(UnionExpressionNode.SupportedMethods, NoEqualityComparer, "Union");
      analyzer.RegisterNode(WhereExpressionNode.SupportedMethods, NoIndexedSelector, "Where");
 
      analyzer.PopulateSupportedMethod();
 
      // Pick up member properties that CountExpressionNode supports
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(List<int>), "Count"));
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(ICollection<int>), "Count"));
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(ICollection), "Count"));
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(Array), "Length"));
 
      var arrayListType = Type.GetType("System.Collections.ArrayList"false);
      if (arrayListType != null)
      {
        var arrayListCountMethod = GetPropertyAsRegisterableMethod(arrayListType, "Count");
        System.Diagnostics.Contracts.Contract.Assert(arrayListCountMethod != null);
        if (arrayListCountMethod != null)
        {
          CountExpressionNode.SupportedMethods.Add(arrayListCountMethod);
        }
      }
 
      var arrayLongCountMethod = GetPropertyAsRegisterableMethod(typeof(Array), "LongLength");
      if (arrayLongCountMethod != null)
      {
        LongCountExpressionNode.SupportedMethods.Add(arrayLongCountMethod); ;
      }
    }
 
    static MethodInfo GetPropertyAsRegisterableMethod(Type t, string propName)
    {
      var property = t.GetRuntimeProperty(propName);
 
      // Under what condition would a property be not present? i.e. ArrayList.Count and Array.LongLength
      // System.Diagnostics.Contracts.Contract.Assert(property != null);
      if (property == null)
      {
        return null;
      }
      else
      {
        var getMethod = property.GetMethod;
        return getMethod;
        //return MethodInfoBasedNodeTypeRegistry.GetRegisterableMethodDefinition(getMethod, throwOnAmbiguousMatch: true);
      }
    }
 
    class InclusionLogic
    {
      public IList<MethodInfo> Target { getset; }
      public Func<MethodInfobool> Inclusion { getset; }
    }
 
    private IDictionary<stringInclusionLogic> _directRegistry = new Dictionary<stringInclusionLogic>();
    private IDictionary<stringFunc<MethodInfoIList<MethodInfo>>> _dispatchRegistry = new Dictionary<stringFunc<MethodInfoIList<MethodInfo>>>();
 
    // For each Linq method in TargetMethodNames, if inclusion rule applies, it should be added to the target list
    private void RegisterNode(IList<MethodInfo> target, Func<MethodInfobool> inclusionRule, params string[] TargetMethodNames)
    {
      foreach (var mName in TargetMethodNames)
      {
        _directRegistry.Add(mName, new InclusionLogic() { Target = target, Inclusion = inclusionRule });
      }
    }
    private void RegisterNode(IList<MethodInfo> target, params string[] TargetMethodNames)
    {
      RegisterNode(target, null, TargetMethodNames);
    }
 
    private void RegisterNode(Func<MethodInfoIList<MethodInfo>> dispatchLogic, params string[] TargetMethodNames)
    {
      foreach (var mName in TargetMethodNames)
      {
        _dispatchRegistry.Add(mName, dispatchLogic);
      }
    }
 
    public void PopulateSupportedMethod()
    {
      foreach (var m in new Type[] { typeof(Queryable), typeof(Enumerable) }.SelectMany(t => t.GetRuntimeMethods()))
      {
        //if (m.IsStatic && m.IsPublic)
        {
          InclusionLogic directConfig;
          _directRegistry.TryGetValue(m.Name, out directConfig);
 
          if (directConfig != null)
          {
            if (directConfig.Inclusion == null ? true : !directConfig.Inclusion(m))
            {
              //directConfig.Target.Add(MethodInfoBasedNodeTypeRegistry.GetRegisterableMethodDefinition(m, throwOnAmbiguousMatch: true));
              directConfig.Target.Add(m);
            }
          }
          else
          {
            Func<MethodInfoIList<MethodInfo>> dispatcher;
            _dispatchRegistry.TryGetValue(m.Name, out dispatcher);
 
            if (dispatcher != null)
            {
              //var target = dispatcher(MethodInfoBasedNodeTypeRegistry.GetRegisterableMethodDefinition(m, throwOnAmbiguousMatch: true));
              var target = dispatcher(m);
              if (target != null)
              {
                target.Add(m);
              }
            }
          }
        }
      }
    }
  }
}

Peter Hsu

unread,
Jul 1, 2014, 4:51:37 PM7/1/14
to re-mot...@googlegroups.com
So I am seeing quite a bit of unit tests failing, even with just the version 1 patch. Turns out it may be because unit tests makes assumption of the order of SupportedMethods. I.E. FirstExpressionNodeTest.Apply_DefaultAllowed pick the first SupportedMethod for testing. Even after I work around this constraint I am seeing it fails somewhere else. It could be quite consuming to fix all unit tests. I will just put this item on my back burner from now on.
 genericDelegateType)
    {
      foreach</s
...

Peter Hsu

unread,
Jul 1, 2014, 7:03:38 PM7/1/14
to re-mot...@googlegroups.com
I stand corrected. These unit tests version 1 patch broke was relatively to fix. I have them fixed now. However I am still have problem with the tests Analyzer broke. Potentially we can just commit to the first solution.

The thing for the version 1 patch is that it still is faster. However, we have these 2 sets of Nodes. The simple ones uses reflection to find supported method and the complex one uses explicit declaration. Thus I am not sure if we should commit to that one. Analyzer provides a much centralized logic and we can understand what methods can be support and what cannot in one single class. Not to mention that it is potentially faster.

Ketting, Michael

unread,
Jul 2, 2014, 6:03:51 PM7/2/14
to re-mot...@googlegroups.com
Hello Peter!

Thank you for your analysis of the matter.

Based on your thoughts, I believe a combination of methods 1 and 2 could be quite beneficial. Continuing from my previous mail, I recommend an iterative approach to the problem.

The first step would be to replace each SupportedMethods-field with a corresponding nested result operator providers type, utilizing a cached list of MethodInfos for Queryable and Enumerable:

public class SumExpressionNode : ResultOperatorExpressionNodeBase

{

  public class ResultOperatorProvider : ISupportedMethodInfoProvider

  {

    public ResultOperatorProvider ()

    {

      // Default-ctor is required

    }

    public Type NodeType

    {

      get { return typeof (SumExpressionNode); }

    }

    public IEnumerable<MethodInfoGetSupportedMethodInfos ()

    {

      return ReflectionUtility.QueryableMethods.Where (mi => mi.Name == "Sum")

          .Concat (ReflectionUtility.EnumerableMethods.Where (mi => mi.Name == "Sum"));

    }

  }

  ...
}


Then, in the MethodInfoBasedNodeTypeRegistry, inject a sequence of those instances instead of a sequence of types into the CreateFromTypes method.


Next, in ExpressionTreeParser, change the assembly/type lookup to a taxative instantiating and listing of the individual ResultOperatorProviders.


Finally (or first, depending on preferences), refactoring of the tests. For each node type test, we can replace the individual tests of the supported methods to a combined check:

Assert.That (
   
AnyExpressionNode.SupportedMethods,
   
Is.Equivalent (
       
new[]
       
{

          GetSupportedMethod (() => Queryable.Any<object> (null)),

          GetSupportedMethod (() => Queryable.Any<object> (null, null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null, null))
    ));
}

It would be necessary to
copy/move the GetSupportedMethod() method over to the tests into a utility class, but that's details.

When all this is done, we should have the same performance gain as would be possible via your intended Analyzer-approach (okay, the cached method-list would need to be iterated about 30 times, but that really should not matter, considering it's just 100-200 methods total in the two lists). But we would still retain a declaration of the supported methods without
adding much complexity to it.

In a follow up, we can do additional refactorings, such as naming-cleanups of the involved methods and we could also move the individual ResultOperatorProviders into a common parent-class to group them into a single file with nested types. Originally, we felt it helpful to keep them with the individual node types, but I'm not hung up on that.

I can setup a branch for this tomorrow and once the ICLA is done, I can also start applying your patches into the branch so we can discuss this on an incremental basis. (on a side note, yes, I'm already working on getting re-linq transplanted over to GitHub, but it's going to be a couple more weeks...)

What do you think of this approach?

Best regards, Michael

From: re-mot...@googlegroups.com [re-mot...@googlegroups.com] on behalf of Peter Hsu [xus...@gmail.com]
Sent: Wednesday, July 02, 2014 01:03
To: re-mot...@googlegroups.com
Subject: Re: [re-motion-dev] Re: Startup performance improvement

--

Michael Ketting

unread,
Jul 3, 2014, 2:11:59 AM7/3/14
to re-mot...@googlegroups.com
Hello Peter!

I've now created the feature branch at https://svn.re-motion.org/svn/Remotion/branches/RM-6184_ReLinq_StartupPerformance.
It also contains the (very simple) performance test to test the startup time. If you can share additional tests tailied to specifically to the starup or have recommendations on how to improve the performance test, I'm very interested. In particular, I'm curious about the time differences between our two measurements and would like to make sure that they're just du to the differences in hardware and not because my performance test isn't corretly measuring all the relevant times.

Best regards, Michael

From: Peter Hsu

Sent: Wednesday, July 02, 2014 01:03
To: re-motion-dev

    i
+= ThenByExpressionNode.SupportedMethods.<span style="color:#6
...

Peter Hsu

unread,
Jul 8, 2014, 8:59:11 PM7/8/14
to re-mot...@googlegroups.com
Hi Michael

Sorry for the late response. This sounds like a good solution. I have implemented something similar in my experiment but your design put things in better places. I would even propose we just have a centralized registry class and not have the SumExpressionNode.ResultOperatorProvider classes. One benefit is that we would then only have to scan through typeof(Enumerable) and typeof(Querable) just once.

So when I put together LinqMethodsAnalyzer, I realized that a lot of unit tests starts to fail and not sure how much more work it would involve in fixing them. I would be happy to check in the code (though it is a bit messy right now) and we can discuss more from here. I see that even the perf branch requires login. Should I sign the ICLA before commit?


    i
+= ThenByExpressionNode.SupportedMethods.<span style="color:#6
...

Michael Ketting

unread,
Jul 9, 2014, 3:09:56 AM7/9/14
to re-mot...@googlegroups.com
Hi Peter!


> Sorry for the late response.
Not a problem.


> This sounds like a good solution. I have implemented something similar in my experiment but your design put things in better places.
Thanks!


> I would even propose we just have a centralized registry class and not have the SumExpressionNode.ResultOperatorProvider classes.
> One benefit is that we would then only have to scan through typeof(Enumerable) and typeof(Querable) just once.
I can see where you're coming from. Had the same concern, thus my recommendation to scan just once over Enumerable and Queryable and cache the result in a list each. That's 300 items total for the two types spread over two lists. We would have to scan those when setting up the operator providers for each expression node type, that's just under 50 scans. I'd very much like to keep the provider-classes in the first iteration and measure the time taken. In fact, I just measuerd on my machine :)

a) To load the methods once and scan the list once with comparing the method name to a string-value takes 0.4ms (~33% for the lists, ~66% for the name-search).
b) To load the methods once and scan the list 50x with comparing the method name to a string-value takes 0.9ms (~20% for the lists, ~80% for the name-search).
c) To load the methods once into Lookups (key=name) and then perform 50 lookups takes 0.63ms (~95% for the lookup-creation), rest for the access)

While this is indeed a factor of 1.5 between options a) and c), the absolute time is still quite below 1 millisecond. I do understand that you're working to creating an incredible development experience where the restart of the system takes virtually no time at all, but I'd like to check out the first (hopefully simple) version of the code first and measure the total system performance. Then we could do the second iteration and see if we should take this further. The advantage of this approach is that we can have a working system with a very well understood transformation from where we are now. I.e. we can add the new unit-tests for the set-compariason, drop the no longer needed tests, rewrite the SupportedMethod lookup for each node type, but don't introduce any new concepts. In the end, everything should still work, and if it doesn't, we can easily check which part broke. That's why I'd like to start with a conservative approach.

I've also checked in my experiments, in case you want to look at them further (https://svn.re-motion.org/svn/Remotion/branches/RM-6184_ReLinq_StartupPerformance/Relinq/PerformanceTests).

Yes, we'd need the ICLA first, please.

Best regards, Michael
...

Peter Hsu

unread,
Jul 11, 2014, 6:25:26 PM7/11/14
to re-mot...@googlegroups.com
Finally get my changes into a somewhat presentable stage. As I am still awaiting my manager's approval for ICLA (he is currently away) I think I should just share the patch file for review. I apologize for the huge patch file and didn't break this down to more stages. I have moved the methods registry logic to one single place and move the explicit declarations to testing. I have also moved a couple of utility methods to testings only since they are no longer needed in the main code stream. Furthermore there are some tests that depend on the SupportedMethdos constants I have to fix into more generic test cases.

There are a few things I am unsure of. I think we need to test the negatives such as what methods should not be supported. I think I will add in that test case next. Also, I have noticed that the code don't support Queryable.Min and Queryable.Max. Why is that?

I haven't had a change to performance test this yet. But I will just post it here for review first.

...
LinqMethods.patch

Peter Hsu

unread,
Jul 11, 2014, 6:37:11 PM7/11/14
to re-mot...@googlegroups.com
Minor update to the patch


    i
+= LongCountExpressionNode.<span styl
...
LinqMethods.patch

Michael Ketting

unread,
Jul 12, 2014, 5:08:36 AM7/12/14
to re-mot...@googlegroups.com
Hello Peter!

Thank you for the update! I can see that you managed to find the bug in your all-in-one approach :)

I've run my simple performance test and your approach is about 50% faster (30ms vs 60ms) than the original version.

There's going to be some refactoring requests coming up, e.g. some better separation of concerns, some coding-style stuff, maybe getting rid of the odd breaking change that I might prefer to avoid by adding an overload, aligning the implementations for method-info based and method-name based lookups, etc... But I'll document those once we have the initial implementation committed in the branch.

In the tests: I noticed that the LinqMethodsAnalyzerTest is quite massive but I think if we change the syntax from

CreateLinqSample ("Max with int?",
 
(IQueryable<int?> q) => Queryable.Max (q),
 
(IEnumerable<int?> e) => Enumerable.Max (e)),

to

CreateLinqSample<int, int> ("Max with int", q => Queryable.Max (q), e => Enumerable.Max (e)),

for the straight-forward samples and remove the new lines between the single line samples, we can make this test fixture a little easier to handle.


> There are a few things I am unsure of. I think we need to test the negatives such as what methods should not be supported.

Yes. With your approach we'll definitely want that. Basically, I think every method in Enumerable and Queryable should be either in the whitelist (via CreateLinqSample) or a blacklist (e.g. "CreateNotSupportedLinqSample").


> Also, I have noticed that the code don't support Queryable.Min and Queryable.Max. Why is that?

I'm not sure I understand that question. We have the Max and Min Node types, you wrote the tests for object-based and number-based overloads, so what's missing?

I've also gone through the LinqMethodsAnalyzer and it's a very comprehensive implementation. Bear with me, it's been early in the morning when I read the code, but I believe I now understand how it all fits together. Anyhow, I spotted some bits that we could simplify and/or separate the concerns of. I'll be documenting those once we have the code in the branch and a performance baseline to work from.

Also, what would also be very helpful, would be if you could help me with setting up the right set of linq-startup performance tests. Basically, you can check out the tests I've written in the PerformanceTests.exe, but I'm not 100% sure that they cover everything that would impact the startup time. That way, I can make sure we don't have performance regressions should we do additional refactorings of the method registrations.

Best regards, Michael

    i
+= LastExpressionNode.SupportedMethods.Count();<span
...

Michael Ketting

unread,
Jul 16, 2014, 1:26:11 PM7/16/14
to re-mot...@googlegroups.com
Hello Peter!

I've had a chance to discuss the proposed LinqMethodsAnalyzer with my colleagues today and a very interesting point has been brought up:

With our present approach, the declaration of the supported methods and the code required to support the respective methods is located in the same place. From the perspective of the present-day re-linq developers, this means should we add support for an additional overload, we can see both the specification and the code in one place. This is actually a feature of our code that we'd like to keep, so long as it is in line with the new performance goals (and I think my approach from earlier in the thread should allow that). Of course, we'd need to measure.

I know that your preference is to get the specifications in a single place and I'd like to get your input on the advantage of that: Does it make using re-linq easier, i.e. paint you a picture of what your system can support, or is it mostly related getting into the code base?

I'm sorry I've brought this aspect up a bit late in the game :/

Best regards, Michael

   
Console.WriteLine("Time elapsed msc: " </s
...

Peter Hsu

unread,
Jul 21, 2014, 2:34:13 PM7/21/14
to re-mot...@googlegroups.com
Hi Michael

thanks for taking the time to review the code. Our startup performance test basically involves launching MvcMusicStore (https://github.com/aspnet/MusicStore) via a script like this

 Write-Host "Launching IIS Express" ...
 $iisexpress
=   ${env:ProgramFiles(x86)} + "\IIS Express\iisexpress.exe"
 
Write-Host "Starting web request ..."
 $timer
= [System.Diagnostics.Stopwatch]::StartNew();
 $timer
.Start();
 $process
= Start-Process -FilePath $iisexpress -ArgumentList ("/path:"+$pwd+" /clr:4.0 /port:8080") -PassThru
 $r
= Invoke-WebRequest -Uri http://localhost:8080

 $timer
.Stop();
 
 
Write-Host Status Code $r.StatusCode
 
 $elapsedTime
= "{0:N2}" -f ($timer.ElapsedMilliseconds/1000.00)
 
Write-Host "Requets Time:" $elapsedTime "seconds"

 
Read-Host "Press enter to kill host..."
 $process
.Kill();

We launch the application via iisexpress and hit the service with 1 request and measure the time for the response to coming back. The code startup scenario is performed on a freshly rebooted machine. (Note that it is easier for us because we are doing this in a lab) We are running the code against our new KRuntime/DesktopCLR and we are measuring both the cold startup (reboot system) and warm startup. Note that our concern is cloud optimization and cold start is as important to us as any other scenario because we are taking consideration of service maintenance and redeployment.

In terms of having specification in the node and code in the same place. I can think of something that can adapt to your requirement. Let me know if this is more suitable. Currently the nodes specific discovery logic is put here:

public MethodInfoBasedNodeTypeRegistry BuildMethodInfoBasedNodeTypeRegistry ()
{
 
RegisterNode (
    m
=> (ContainsSeed (m) ?
           
typeof(AggregateFromSeedExpressionNode) :
           
typeof(AggregateExpressionNode)),
   
"Aggregate");
 
 
RegisterNode (typeof (AllExpressionNode), "All");
 
RegisterNode (typeof (AnyExpressionNode), "Any");
 
RegisterNode (typeof (AverageExpressionNode), "Average");
 
RegisterNode (typeof (CastExpressionNode), "Cast");
 
RegisterNode (typeof (ContainsExpressionNode), NoEqualityComparer, "Contains");
 
RegisterNode (typeof (CountExpressionNode), "Count");
 
RegisterNode (typeof (DefaultIfEmptyExpressionNode), "DefaultIfEmpty");
 
RegisterNode (typeof (DistinctExpressionNode), NoEqualityComparer, "Distinct");
 
RegisterNode (typeof (ExceptExpressionNode), NoEqualityComparer, "Except");
 
RegisterNode (typeof (FirstExpressionNode), "First", "FirstOrDefault");

We can move those back to the node classes

public class AllExpressionNode : ResultOperatorExpressionNodeBase
 
{
   
public static Remotion.Linq.Parsing.Structure.IntermediateModel.LinqMethodsAnalyzer.InclusionLogic SupportsMethods
     
= new LinqMethodsAnalyzer.InclusionLogic ()
     
{
       
MethodNames = new[]{"All"};
     
};

Also like this

  public class AggregateExpressionNode : ResultOperatorExpressionNodeBase
 
{
   
public static Remotion.Linq.Parsing.Structure.IntermediateModel.LinqMethodsAnalyzer.InclusionLogic SupportsMethods
     
= new LinqMethodsAnalyzer.InclusionLogic ()
     
{
       
MethodNames = new[]{"Aggregate"};
       
Includes = m => !ContainsSeed(m);
     
};

Moreover

    public static Remotion.Linq.Parsing.Structure.IntermediateModel.LinqMethodsAnalyzer.InclusionLogic SupportsMethods
     
= new LinqMethodsAnalyzer.InclusionLogic (){
       
MethodNames = new []{ "Count" };
       
AdditionalMethods = GetCountMethods(); /* Implementations are in CountExpressionNode itself */
     
}

Then all LinqMethodsAnalyzer has to do is go to these node classes and look for these rule based declarations. I would actually argue that this declaration is much easier to read for a novice than the current explicit declaration. Personally it took me a long time to figure out what all these nulls really mean even now I have a had time to understand which line specifies which specific overload. Let me know how you think about this and I could fix up a new CR

thanks
Peter

Peter Hsu

unread,
Jul 22, 2014, 5:04:12 PM7/22/14
to re-mot...@googlegroups.com
New patch file. I haven't fix the styling yet.


On Monday, July 21, 2014 11:34:13 AM UTC-7, Peter Hsu wrote:
Hi Michael

CreateLinqSample<int, int> ("Max with int", q => Queryable.Max (q), e => Enumerable.<span style="color:#606
...
SupportedMethodsSpecifications.patch

Michael Ketting

unread,
Jul 23, 2014, 11:25:24 AM7/23/14
to re-mot...@googlegroups.com

Hi Peter!

Thank you for the updated patch and the great input on the performance measuring. I’ll get back to that, but the patch first :)

The SupportedMethods-field is already quite good at grouping things. I’d like to discuss the following additional extension to your concept:

I would like to move the method selection from a specification which basically acts as a filter to a get-method of the following fashion:

public class WhereExpressionNode : MethodCallExpressionNodeBase
{
  
public static IEnumerable<MethodInfoGetSupportedMethods ()
  {
    
return ReflectionUtility.EnumerableAndQueryableMethodsByName["Where"].Where (SupportedMethodSpecifications.NoIndexedSelector);
  }
}

ReflectionUtility caches the total set of methods so we only have a one-time performance hit:

public static class ReflectionUtility
{
  
private static readonly Lazy<ILookup<stringMethodInfo>> s_enumerableAndQueryableMethodsByName =
      
new Lazy<ILookup<stringMethodInfo>> (GetEnumerableAndQueryableMethodsByName);
 
  
public static ILookup<stringMethodInfoEnumerableAndQueryableMethodsByName
  {
    
get { return s_enumerableAndQueryableMethodsByName.Value; }
  }
 
  
private static ILookup<stringMethodInfoGetEnumerableAndQueryableMethodsByName ()
  {
    
var enumerableMethods = typeof (Enumerable).GetRuntimeMethods();
    
var queryableMethods = typeof (Queryable).GetRuntimeMethods();
 
    
return enumerableMethods.Concat (queryableMethods).ToLookup (mi => mi.Name);
  }
}

I would expect the performance of this code to be at least comparable to what we have in the patch*) but there would be one big advantage:

The main reason why we prefer to have the specification of the methods together with the code that has to process those methods is because they are part of the same concern. By additionally depending on the Enumerable and Queryable methods directly during the specification, we tighten this concern even further and make the code more compact. At least, I think it’s more compact and easy to read, but I’m quite interesting in reading getting your input as someone who’s not as familiar with the code.

*) Performance: I already measured the ratio between the different method-lookup strategies and the LookUp-strategy kind-of-won but I didn’t have a chance to run the entire setup against the LinqMethodsAnalyzer version.

Now, with the
GetSupportedMethods approach, we could simply put the following code into the ExpressionTreeParser:

public static CompoundNodeTypeProvider CreateDefaultNodeTypeProvider ()
{
  
var searchedTypes = typeof (MethodInfoBasedNodeTypeRegistry).GetTypeInfo().Assembly.DefinedTypes.Select (ti => ti.AsType()).ToList();
  
  
var innerProviders = new INodeTypeProvider[]
                       {
                           
MethodInfoBasedNodeTypeRegistry.CreateFromLinqAssembly(),
                           
MethodNameBasedNodeTypeRegistry.CreateFromTypes(searchedTypes)
                       };
  
return new CompoundNodeTypeProvider (innerProviders);
}


The creation of the default
MethodInfoBasedNodeTypeRegistry  could be made like this:

public sealed class MethodInfoBasedNodeTypeRegistry : INodeTypeProvider
{
  
private static readonly Lazy<IReadOnlyDictionary<TypeIReadOnlyCollection<MethodInfo>>> s_supportedMethodsByExpressionNodeTypes =
      
new Lazy<IReadOnlyDictionary<TypeIReadOnlyCollection<MethodInfo>>> (GetSupportedMethodsByExpressionNodeTypes);
 
  
public static MethodInfoBasedNodeTypeRegistry CreateFromLinqAssembly ()
  {
    
var registry = new MethodInfoBasedNodeTypeRegistry();
 
    
foreach (var methodsForType in s_supportedMethodsByExpressionNodeTypes.Value)
      registry.
Register (methodsForType.ValuemethodsForType.Key);
 
    
return registry;
  }
 
  
private static IReadOnlyDictionary<TypeIReadOnlyCollection<MethodInfo>> GetSupportedMethodsByExpressionNodeTypes ()
  {
    
var dictionary = new Dictionary<TypeIReadOnlyCollection<MethodInfo>>();
    dictionary.
Add (typeof (AggregateExpressionNode), AggregateExpressionNode.GetSupportedMethods().ToArray());
    
// other Expression Nodes
    dictionary.
Add (typeof (WhereExpressionNode), WhereExpressionNode.GetSupportedMethods().ToArray());
    
return dictionary;
  }
}

If this approach works, we would have the minimum-possible change in the codebase but should get all the readability, maintainability, and performance-gains of the individual approaches.
(Let’s talk about the unit testing strategies after we finish the details here)



What do you think?

Best regards, Michael

Peter Hsu

unread,
Jul 23, 2014, 1:20:47 PM7/23/14
to re-mot...@googlegroups.com
Thank Michael

I didn't want to scan Enumerable and Queryable multiple times and this was why I build LinqMethodsAnalyzer. We could definitely check the performance of using Reflection utility and have each class scan the linq methods separately. We can double check on the performance impact of that.


On Wednesday, July 23, 2014 8:25:24 AM UTC-7, Michael Ketting wrote:

Hi Peter!

Thank you for the updated patch and the great input on the performance measuring. I’ll get back to that, but the patch first :)

The SupportedMethods-field is already quite good at grouping things. I’d like to discuss the following additional extension to your concept:

I would like to move the method selection from a specification which basically acts as a filter to a get-method of the following fashion:

public class WhereExpressionNode : MethodCallExpressionNodeBase
{
  
public static IEnumerable<MethodInfoGetSupportedMethods ()
  {
    
return ReflectionUtility.EnumerableAndQueryableMethodsByName["Where"].Where (SupportedMethodSpecifications.NoIndexedSelector);
  }
}

<span style="font-size:11.0pt;font-family:"Calibr

...

Michael Ketting

unread,
Jul 23, 2014, 2:51:28 PM7/23/14
to re-mot...@googlegroups.com
Hello Peter!


We could definitely check the performance of using Reflection utility and have each class scan the linq methods separately. We can double check on the performance impact of that.

Great, thanks! Looking forward to your findings.

Best regards, Michael

Peter Hsu

unread,
Jul 23, 2014, 3:58:08 PM7/23/14
to re-mot...@googlegroups.com
Perhaps meanwhile you could check with your team and see if they have any concerns on this rule based methods discovery approach in general?
thanks

Michael Ketting

unread,
Jul 23, 2014, 4:27:50 PM7/23/14
to re-mot...@googlegroups.com
Hi Peter!

Are you referring to the unified design I posted last or to a different aspect? This latest design returning the sequence of MethodInfos already got our internal approval and the filter-conditions you set up via the static SupportedMethodSpecifications will also work.
If this design works out in terms of performance, the test-code could probably also remain mostly as it is in the original. I'm very much open to adding this single integration test that specifies the complete set of supported and not-supported methods via the new syntax from your patch. And whether we should rewrite the individual tests with the more concise
Is.Equivalent (MethodInfo[] {...})
constraint is really just polishing then.

Best regards, Michael

Peter Hsu

unread,
Aug 10, 2014, 6:32:23 PM8/10/14
to re-mot...@googlegroups.com
Hi Michael

I am sorry for not updating for a long time. I have not forgotten about this feature yet but I am quite busy lately because I will be taking an extended leave in September. I plan to do some work in this feature during that time. I just wanted to let you know my plan.

thanks
Peter

Fabian Schmied

unread,
Aug 12, 2014, 5:08:50 AM8/12/14
to re-motion Developers
Hi Peter,

Michael is currently on holiday, so no worries :)

Best regards,
Fabian


--

Michael Ketting

unread,
Aug 31, 2014, 12:40:18 PM8/31/14
to re-mot...@googlegroups.com
Hi Peter!

I know the idea about doing stuff while away from the office. Been busy, too, moving re-linq towards GIT. It's not done yet, but I already have an (mostly) isolated re-linq frontend, i.e. the part you're using. There's still the eager-fetching stuff in there that I want to move out and once that's done, I'll migrate the whole shebang to GIT.

In the mean time, you can find the new repository here: https://svn.re-motion.org/svn/Relinq/trunk

Please note, the build scripts for the local build aren't completely back to working order yet but should be in a couple of days... But doing a VS build and calling nuget pack on the project works :) (you'll need to pass "-Properties ExtraTags=;DocumentationFolder=bin", but the nuget command prompt would tell you anyway)

Best regards, Michael

PS: Thanks for the note, Fabian :)


On Tuesday, August 12, 2014 11:08:50 AM UTC+2, Fabian Schmied wrote:
Hi Peter,

Michael is currently on holiday, so no worries :)

Best regards,
Fabian

Michael Ketting

unread,
Sep 2, 2014, 11:46:13 AM9/2/14
to re-mot...@googlegroups.com
Hi Peter!

Good news! re-linq frontend can now be found at its new home in https://github.com/re-motion/Relinq

I hope everything does as it should. It worked on my machine and a colleague's fresh system, so I am hopeful :)

Best regards, Michael

Peter Hsu

unread,
Sep 15, 2014, 4:43:25 PM9/15/14
to re-mot...@googlegroups.com
Hi Michael 

Congratulations on moving to github.

Here's the status on our side: we are able to use ngen to eliminate jit costs on our side. This is likely going to be a solution to this cold start problem. Note that what was blocking our progress was mostly the jit time. I think even though changing code may squeeze a little more performance for startup. We would most like want to shelve this work item.

Thank you so much for helping on this issue. Let's close it for now!

Cheers

Peter

Michael Ketting

unread,
Sep 16, 2014, 2:08:20 AM9/16/14
to re-mot...@googlegroups.com
Hello Peter!

Thanks for letting us know!

Best regards, Michael

Michael Ketting

unread,
Mar 22, 2015, 12:46:56 PM3/22/15
to re-mot...@googlegroups.com
I've now gotten to rewriting the NodeTypeProvider initialization code and created a pull-request that will get merged in the next couple of days and then I do another alpha-release.
https://github.com/re-motion/Relinq/pull/3

On my system, initializing the NodeTypeProvider took 77ms before the rewrite, now it's down to 36ms. I also tested this with ngen, and the time taken for the NodeTypeProvider would go down to 13ms.

I also investigated the 36ms for additional optimizations, but it looks like that 50% of the remaining time is spent when the parameter types of the various Enumerable and Queryable methods get loaded during the filter process.

Best regards, Michael

Michael Ketting

unread,
Mar 27, 2015, 11:39:20 AM3/27/15
to re-mot...@googlegroups.com
Quick follow-up: I also tested the 2nd patch (from June 12th, 2014) and got the same timings as with the final version.

~Michael
Reply all
Reply to author
Forward
0 new messages