Startup performance improvement

134 views
Skip to first unread message

Peter Hsu

unread,
Jun 24, 2014, 7:29:53 PM6/24/14
to re-mot...@googlegroups.com
Hi

I am working on AspnetVNext and our current focus is to cut down our application start up time. We took a profile and identified that we are spending excessive JIT time on a few ReLinq classes' static constructors. For example SumExpressionNode. I am currently seeing initialization like this.

public static readonly MethodInfo[] SupportedMethods = new[] {
   
GetSupportedMethod (() => Queryable.Sum ((IQueryable<decimal>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<decimal?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<double>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<double?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<int>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<int?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<long>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<long?>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<float>) null)),
    GetSupportedMethod (() => Queryable.Sum ((IQueryable<float?>) null)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (decimal) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (decimal?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (double) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (double?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (int) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (int?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (long) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (long?) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (float) 0)),
    GetSupportedMethod (() => Queryable.Sum<object> (null, o => (float?) 0)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<decimal>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<decimal?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<double>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<double?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<int>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<int?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<long>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<long?>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<float>) null)),
    GetSupportedMethod (() => Enumerable.Sum ((IEnumerable<float?>) null)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (decimal) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (decimal?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (double) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (double?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (int) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (int?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (long) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (long?) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (float) 0)),
    GetSupportedMethod (() => Enumerable.Sum<object> (null, o => (float?) 0))
  };


This code cause SumExpressionNode..cctor() to become huge in IL and even bigger in jitted code. Even the run performance is not ideal. On the other hand the following code would run much faster and has a tiny jit size compare to above code.

SupportedMethods = new Type[] { typeof(Queryable), typeof(Enumerable) }
       
.SelectMany(t => t.GetMethods(BindingFlags.Public | BindingFlags.Static))
       
.Where(m => "Sum" == m.Name)
       
.Select(GetRegisterableMethodDefinition)
       
.ToArray();

I understand the argument of explicitly enumerate supported method instead of reflectively scan for method to support. However, this is a big performance improvement we are talking about (very conservative estimation is at least 0.1~0.2 second by our profile). I have implemented a patch which applies this new code to expression nodes that support all methods of a specific name (All, Any, Average, Cast, Count, DefaultIfEmpty, First, Last, LongCount, Max, Min, OfType, Reverse, Single, Skip, Sum, Take). I also included the unit test so we can track any (unliky) API change for Linq.

I would appreciate it if I can get some input and we can figure out if this is a solution we can go forward with.

thanks
Peter
ScanLinqMethods.patch

Michael Ketting

unread,
Jun 25, 2014, 2:14:45 AM6/25/14
to re-mot...@googlegroups.com
Hi Peter!

Thanks for the important feedback on the startup performance and the suggested solution. I'm definitely interested in fixing this and crearted a JIRA issue: https://www.re-motion.org/jira/browse/RM-6184

First off, I'd like to understand a bit more about the actual issue (and should also write a performance repro sample to test this locally :) ) The two major changes you did (unless I missed something when I checked over the patch) was to simplify the code in the static constructors and to replace the following expression tree analysis with a reflection call to filter all methods of the Queryable and Enumerable types.

public static MethodInfo GetMethod<T> (Expression<Func<T>> wrappedCall)
   
{
     
ArgumentUtility.CheckNotNull ("wrappedCall", wrappedCall);
 
     
switch (wrappedCall.Body.NodeType)
     
{
       
case ExpressionType.Call:
         
return ((MethodCallExpression) wrappedCall.Body).Method;
       
case ExpressionType.MemberAccess:
         
var memberExpression = (MemberExpression) wrappedCall.Body;
         
var property = memberExpression.Member as PropertyInfo;
         
var method = property != null ? property.GetMethod : null;
         
if (method != null)
           
return method;
         
break;
     
}
 
     
throw new ArgumentException (string.Format ("Cannot extract a method from the given expression '{0}'.", wrappedCall.Body), "wrappedCall");
   
}

Did your analysis also show a performance problem with the way we retrived the MethodInfo from the Expression or was this simply a side-effect of your refactoring? To put it differently: From your experience, would it be possible to get the same performance improvement by moving current logic used to populate the static field to a point in  time after the type is initialized? The reason I'm asking is because I've wanted to change the "SupportedMethod" field to factory methods (but keep the existing, taxative syntax for retrieve the actual methods) for quite some time but there's never been an actual reason to do so before, except code beautification. If you don't know the answer, that's okay, too, I'll just need to performance test it then.

Best regards, Michael
    GetSupportedMethod (() =><spa
...

Peter Hsu

unread,
Jun 25, 2014, 3:14:01 PM6/25/14
to re-mot...@googlegroups.com

Thanks Michael

 

Sure, let me clarify by backing up a bit and tell you the whole story.

 

The motivation of this change goes back to our JIT analysis. We took a startup profile of our sample Mvc MusicStore app https://github.com/aspnet/MusicStore and end up seeing significant JIT time and JIT sizes on some of these classes's static constructor. (Significant on our perspective anyway)

 

Row Labels

JitTime MSec

IL Size

Native Size

Remotion.Linq.Parsing.Structure.IntermediateModel.SumExpressionNode..cctor()

15.046

4609

14480

Remotion.Linq.Parsing.Structure.IntermediateModel.AverageExpressionNode..cctor()

22.481

4609

14508

Remotion.Linq.Parsing.Structure.IntermediateModel.MaxExpressionNode..cctor()

9.569

2643

8354

Remotion.Linq.Parsing.Structure.IntermediateModel.MinExpressionNode..cctor()

11.749

2643

8354

 

I decided to take a look. I see the existing code is trying to compensate the fact that it is no trivial way to instantiate instances of MethodInfo of an existing class. This has impact on two ways.

 

1. The IL size and Native size become significant because note that delegates are compiled into ad-hoc classes. You can use ILDASM to check how big the ..cctor() gets

2. As you mentioned the computation needed to pull out the MethodInfo is significant, not to mention there was also cost in instantiating the Expression to begin with.


I got curious and extracted the code above for experiment. I end up seeing the following micro benchmark data

 

SumExperssionNode

Run time (ms)

IL Size (ms)

JIT Size (ms)

Profiled JIT Time (ms)

Expression based static constructor

15

4,609

11,586

10

Reflection based static constructor

2

130

310

3.3

 

(Note that the native size of my experiment is smaller than the first chart because my lab code is just a snapshot of SumExpressionNode)

 

I see value in terms of performance in moving  the population of SupportedMethod to a factory method. I was thinking about the same except I was hesitate to embark on such a significant code change. What I have in mind would be something like that.

void PopulateSupportedMethods()
{
       
var analyzer = new LinqMethodsAnalyzer();
        analyzer
.SetGlobalExclusion(NoUnsupportedDelegates);

        analyzer
.RegisterNode(SumExpressionNode.SupportedMethods, null, "Sum");
        analyzer
.RegisterNode(AggregateExpressionNode.SupportedMethods, AggregateWithSeed, "Aggregate");
        analyzer
.RegisterNode(FirstExpressionNode.SupportedMethods, null, "First", "FirstOrDefault");
       
//....
}

class LinqMethodsAnalyzer
{
       
class MethodScanLogic
       
{
           
public IList<MethodInfo> Target { get; set; }
           
public Predicate<MethodInfo> Inclusion { get; set; }
       
}

       
IDictionary<string, MethodScanLogic> _registry;
       
Predicate<MethodInfo> _globalInclusion;

       
public void SetGlobalExclusion(Predicate<MethodInfo> inclusionRule)
       
{
            _globalInclusion
= inclusionRule;
       
}

        // For each Linq method in TargetMethodNames, if inclusion rule applies, it should be added to the target list
       
public void RegisterNode(IList<MethodInfo> target, Predicate<MethodInfo> inclusionRule, params string[] TargetMethodNames)
       
{
           
foreach (var mName in TargetMethodNames)
           
{
                _registry
.Add(mName, new MethodScanLogic() { Target = target, Inclusion = inclusionRule });
           
}
       
}

       
public void PopulateNodes()
       
{
           
foreach (var m in new Type[] { typeof(IQueryable), typeof(IEnumerable) }.SelectMany(t => t.GetRuntimeMethods()))
           
{
               
if (m.IsStatic && m.IsPublic)
               
{
                   
MethodScanLogic localConfig;
                    _registry
.TryGetValue(m.Name, out localConfig);

                   
if (localConfig != null)
                   
{
                       
// this is where performance might get hit a bit because global rules may be expensive
                       
// because we are looking for generic types and in the case of Select, SelectMany and Where we even need
                       
// to look into the generic parameter of parameters to exclude methods that has start indicies
                       
// We can workaround this by getting rid of the global rule and making rules specific to each registry
                       
// This way we do not have to check for parameters that might not exist for some methods
                       
if ((_globalInclusion == null ? true : !_globalInclusion(m))
                           
&& (localConfig.Inclusion == null ? true : !localConfig.Inclusion(m)))
                       
{
                            localConfig
.Target.Add(m);
                       
}
                   
}
               
}
           
}
       
}
   
}


One benefit of this design is that we only scan the types once and assign them to the right nodes. I tried playing around in this direction but things got quite complicated and I am not sure if this arguably much more complex design can be justified. Therefore I decided to go with the simple route and just go for nodes that simply includes all linq methods of particular names without exclusion (inclusion) logics 



On Tuesday, June 24, 2014 11:14:45 PM UTC-7, Michael Ketting wrote:
Hi Peter!

Thanks for the important feedback on the startup performance and the suggested solution. I'm definitely interested in fixing this and crearted a JIRA issue: https://www.re-motion.org/jira/browse/RM-6184

First off, I'd like to understand a bit more about the actual issue (and should also write a performance repro sample to test this locally :) ) The two major changes you did (unless I missed something when I checked over the patch) was to simplify the code in the static constructors and to replace the following expression tree analysis with a reflection call to filter all methods of the Queryable and Enumerable types.

public static MethodInfo GetMethod<T> (Expression<Func<T>> wrappedCall)
   
{
     
ArgumentUtility.CheckNotNull ("wrappedCall", wrappedCall);
 
     
switch (wrappedCall.Body.NodeType)
     
{
       
case ExpressionType.Call:
         
return ((MethodCallExpression) wrappedCall.Body).Method;
       
case ExpressionType.MemberAccess:
         
var memberExpression = (MemberExpression) wrappedCall.Body;
         
var property = memberExpression.Member as PropertyInfo;
         
var method = property != null ? property.GetMethod : null;
         
if (method != null)
           
return method;
         
break;
     
}
 
     
throw new ArgumentException (string.Format ("Cannot extract a method from the given expression '{0}'.", wrappedCall.Body), "wrappedCall");
   
}

Did your analysis also show a performance problem with the way we retrived the MethodInfo from the Expression or was this simply a side-effect of your refactoring? To put it differently: From your experience, would it be possible to get the same performance improvement by moving current logic used to populate the static field to a point in  time after the type is initialized? The reason I'm asking is because I've wanted to change the "SupportedMethod" field to factory methods (but keep the existing, taxative syntax for retrieve the actual methods) for quite some time but there's never been an actual reason to do so before, except code beautification. If you don't know the answer, that's okay, too, I'll just need to performance test it then.

Best regards, Michael

On Wednesday, June 25, 2014 1:29:53 AM UTC+2, Peter Hsu wrote:
Hi

I am working on AspnetVNext and our current focus is to cut down our application start up time. We took a profile and identified that we are spending excessive JIT time on a few ReLinq classes' static constructors. For example SumExpressionNode. I am currently seeing initialization like this.

...

Peter Hsu

unread,
Jun 25, 2014, 6:29:54 PM6/25/14
to re-mot...@googlegroups.com
Apologies on some typos in my code because I was changing design from using exclusion rules to using inclusion rules as I wrote
 
void PopulateSupportedMethods()
{
       
var analyzer = new LinqMethodsAnalyzer();

        analyzer
.SetGlobalInclusion(NoUnsupportedDelegates);

        analyzer
.RegisterNode(SumExpressionNode.SupportedMethods, null, "Sum");
        analyzer
.RegisterNode(AggregateExpressionNode.SupportedMethods, AggregateWithNoSeed, "Aggregate");
        analyzer
.RegisterNode(AggregateFromSeedExpressionNode.SupportedMethods, AggregateWithSeed, "Aggregate");
        analyzer
.RegisterNode(FirstExpressionNode.SupportedMethods, null, "First", "FirstOrDefault");
       
//....
}


class LinqMethodsAnalyzer
{
       
class MethodScanLogic
       
{
           
public IList<MethodInfo> Target { get; set; }
           
public Predicate<MethodInfo> Inclusion { get; set; }
       
}

       
IDictionary<string, MethodScanLogic> _registry;
       
Predicate<MethodInfo> _globalInclusion;


       
public void SetGlobalInclusion(Predicate<MethodInfo> inclusionRule)
       
{

            _globalInclusion
= inclusionRule;
       
}

       
// For each Linq method in TargetMethodNames, if inclusion rule applies, it should be added to the target list
       
public void RegisterNode(IList<MethodInfo> target, Predicate<MethodInfo> inclusionRule, params string[] TargetMethodNames)
       
{
           
foreach (var mName in TargetMethodNames)
           
{
                _registry
.Add(mName, new MethodScanLogic() { Target = target, Inclusion = inclusionRule });
           
}
       
}

       
public void PopulateNodes()
       
{
           
foreach (var m in new Type[] { typeof(IQueryable), typeof(IEnumerable) }.SelectMany(t => t.GetRuntimeMethods()))
           
{
               
if (m.IsStatic && m.IsPublic)
               
{
                   
MethodScanLogic localConfig;
                    _registry
.TryGetValue(m.Name, out localConfig);

                   
if (localConfig != null)
                   
{
                       
// this is where performance might get hit a bit because global rules may be expensive
                       
// because we are looking for generic types and in the case of Select, SelectMany and Where we even need
                       
// to look into the generic parameter of parameters to exclude methods that has start indicies
                       
// We can workaround this by getting rid of the global rule and making rules specific to each registry
                       
// This way we do not have to check for parameters that might not exist for some methods
                       
if ((_globalInclusion == null ? true : !_globalInclusion(m))
                           
&& (localConfig.Inclusion == null ? true : !localConfig.Inclusion(m)))
                       
{
                            localConfig
.Target.Add(m);
                       
}
                   
}
               
}
           
}
       
}
   
}


I welcome thoughts on which implementation would be better for ReLinq 

Michael Ketting

unread,
Jun 27, 2014, 12:32:48 PM6/27/14
to re-mot...@googlegroups.com
Hello Peter!

Thanks for the ideas. I'll get back with a more detailed response over the weekend.


Best regards, Michael

On Wednesday, June 25, 2014 1:29:53 AM UTC+2, Peter Hsu wrote:
    GetSupportedMethod (() =><spa
...

Ketting, Michael

unread,
Jun 30, 2014, 3:29:44 AM6/30/14
to re-mot...@googlegroups.com

Hello Peter!

I've now setup a (very) small performance sample:

internal class Program
{
 
private static void Main (string[] args)
 
{
   
var stopwatch = Stopwatch.StartNew();
   
ExpressionTreeParser.CreateDefaultNodeTypeProvider();
    stopwatch
.Stop();
   
Console.WriteLine ("Time taken: {0}", stopwatch.Elapsed);
 
}
}


Since the code is still using reflection to find the relevant types, this sample should not support from pre-jitting, I  hope. Anyhow, with the turnk-code, I got about 70ms (release-build) of time for creating the NodeTypeProviders and after applying your patch, it dropped to 40ms.

The patch still needs a bit of adaption to our coding styles:
1. Whitespaces: We add a whitespace before the opening paranthesis, except when calling a method with an empty signature.
2. Assertion-Methods: We use the Assert.That (actual, Is.EqualTo (actual)) or Assert.That (actual, Is.True) syntax these days. I can't promise that all old tests have already been migrated, but all the newer tests are :)

For the count-check I'm thinking of reusing the previous production code used for registering the MethodInfos:

 

Assert.That (
   
AnyExpressionNode.SupportedMethods,
   
Is.Equivalent (
       
new[]
       
{

          GetSupportedMethod (() => Queryable.Any<object> (null)),

          GetSupportedMethod (() => Queryable.Any<object> (null, null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null, null))
    ));
}

That way, we can keep a compact and easily reviewed list of supported methods available. It might also be that I can then drop the then-unneeded individual tests.

You second idea is very intriguing, but yeah, it goes a bit father than what I was thinking. I'd like to keep the declaration of the supported methods for each node-type with the node type. I'm thinking something like this:

public class SumExpressionNode : ResultOperatorExpressionNodeBase

{

  public class ResultOperatorProvider : ISupportedMethodInfoProvider

  {

    public ResultOperatorProvider ()

    {

      // Default-ctor is required

    }

    public Type NodeType

    {

      get { return typeof (SumExpressionNode); }

    }

    public IEnumerable<MethodInfoGetSupportedMethodInfos ()

    {

      return ReflectionUtility.QueryableMethods.Where (mi => mi.Name == "Sum")

          .Concat (ReflectionUtility.EnumerableMethods.Where (mi => mi.Name == "Sum"));

    }

  }

  ...
}


Then I can change MethodInfoBasedNodeTypeRegistry to not look up member fields on the NodeTypes but instead simply look for and instantiate the providers. I'm also going to look into getting all the public static methods for Queryable and Enumerable just once and hold them in static fields. Anyhow, that second part can be a follow-up refactoring to the first one.


What do you think?

Best regards, Michael

--
You received this message because you are subscribed to the Google Groups "re-motion Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to re-motion-de...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Hsu

unread,
Jun 30, 2014, 5:25:38 PM6/30/14
to re-mot...@googlegroups.com
Hey Michael

Apologizes for not actively updating. In fact I am working on a couple of prototypes. I didn't think the results are ready yet because I have not finished implementing testing. I am measuring 3 scenarios using a simple test code like this:

class Program {
   
static void Main(string[] args)   {
   
var sw = new Stopwatch();
    sw
.Start();

   
LinqMethodsAnalyzer.InitializeNodes();
   
VisitAllSupportedMethods();
 
    sw
.Stop();
   
Console.WriteLine("Time elapsed msc: " + sw.ElapsedMilliseconds);
   
Console.Read();
 
}
 
 
// visit all SupportedMethods collection to make sure everything is initialized
 
public static int VisitAllSupportedMethods()
 
{
   
int i = 0;
    i
+= AggregateExpressionNode.SupportedMethods.Count();
    i
+= AggregateFromSeedExpressionNode.SupportedMethods.Count();
    i
+= AllExpressionNode.SupportedMethods.Count();
    i
+= AnyExpressionNode.SupportedMethods.Count();
    i
+= AverageExpressionNode.SupportedMethods.Count();
    i
+= CastExpressionNode.SupportedMethods.Count();
    i
+= ContainsExpressionNode.SupportedMethods.Count();
    i
+= CountExpressionNode.SupportedMethods.Count();
    i
+= DefaultIfEmptyExpressionNode.SupportedMethods.Count();
    i
+= DistinctExpressionNode.SupportedMethods.Count();
    i
+= ExceptExpressionNode.SupportedMethods.Count();
    i
+= FirstExpressionNode.SupportedMethods.Count();
    i
+= GroupByExpressionNode.SupportedMethods.Count();
    i
+= GroupByWithResultSelectorExpressionNode.SupportedMethods.Count();
    i
+= GroupJoinExpressionNode.SupportedMethods.Count();
    i
+= IntersectExpressionNode.SupportedMethods.Count();
    i
+= JoinExpressionNode.SupportedMethods.Count();
    i
+= LastExpressionNode.SupportedMethods.Count();
    i
+= LongCountExpressionNode.SupportedMethods.Count();
    i
+= MaxExpressionNode.SupportedMethods.Count();
    i
+= MinExpressionNode.SupportedMethods.Count();
    i
+= OfTypeExpressionNode.SupportedMethods.Count();
    i
+= OrderByDescendingExpressionNode.SupportedMethods.Count();
    i
+= OrderByExpressionNode.SupportedMethods.Count();
    i
+= ReverseExpressionNode.SupportedMethods.Count();
    i
+= SelectExpressionNode.SupportedMethods.Count();
    i
+= SelectManyExpressionNode.SupportedMethods.Count();
    i
+= SingleExpressionNode.SupportedMethods.Count();
    i
+= SkipExpressionNode.SupportedMethods.Count();
    i
+= SumExpressionNode.SupportedMethods.Count();
    i
+= TakeExpressionNode.SupportedMethods.Count();
    i
+= ThenByDescendingExpressionNode.SupportedMethods.Count();
    i
+= ThenByExpressionNode.SupportedMethods.Count();
    i
+= UnionExpressionNode.SupportedMethods.Count();
    i
+= WhereExpressionNode.SupportedMethods.Count();
   
return i;
 
}
}

I am getting the following results for each scenarios.

 

Cold Start

Warm Start

Trunk Code

616

144

Patch Code

357

68

Methods Analyzer

264

57


This results is run in our lab machine. The Cold Start scenario is using a cache purging utility for windows to simulate reboot. Note that I had made an assumption of not calling on my MethodInfoBasedNodeTypeRegistry
.GetRegisterableMethodDefinition(m, true) on the analyzer class.

The analyzer class look like this so far (I am sure we need to discuss the correctness of the class and fix the code style of this is something we are interested in)

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
 
namespace Remotion.Linq.Parsing.Structure.IntermediateModel
{
  public class LinqMethodsAnalyzer
  {
    static Func<MethodInfobool> ContainsSeed = m =>
    {
      var firstParamType = m.GetParameters()[1].ParameterType;
      return firstParamType.Name == "TAccumulate";
    };
 
    static Func<MethodInfobool> WithResultSelector = m =>
    {
      return m.GetParameters().Any(p => p.Name == "resultSelector");
    };
 
    static Func<MethodInfobool> NoIndexedSelector = m =>
    {
      // * Select uses the parameter name selector
      // * SelectMany uses selector for most methods but there where causes where there's both
      //   resultSelector and collectionSelector and only collectionSelector can be indexed
      // * Where uses the name predicate
      var seletorParam = m.GetParameters().SingleOrDefault(p => p.Name == "selector" || p.Name == "collectionSelector" || p.Name == "predicate");
      if (seletorParam == null)
      {
        return true;
      }
      else
      {
        var selectorType = seletorParam.ParameterType;
        // Enumerable taks a Func<...> but Querable takes an Expression<Func<...>>
        if (typeof(Expression).GetTypeInfo().IsAssignableFrom(selectorType.GetTypeInfo()))
        {
          selectorType = selectorType.GetTypeInfo().GenericTypeArguments[0];
        }
        return !typeof(int).GetTypeInfo().IsAssignableFrom(selectorType.GetTypeInfo().GenericTypeArguments[1].GetTypeInfo());
      }
    };
 
    static Func<MethodInfobool> NoEqualityComparer = m =>
    {
      return NoDelegateOfType(m, typeof(IEqualityComparer<>));
    };
 
    static Func<MethodInfobool> NoComparer = m =>
    {
      return NoDelegateOfType(m, typeof(IComparer<>));
    };
 
    static bool NoDelegateOfType(MethodInfo m, Type genericDelegateType)
    {
      foreach (var p in m.GetParameters())
      {
        var paramType = p.ParameterType;
        if (paramType.GetTypeInfo().IsGenericType &&
          genericDelegateType.GetTypeInfo().IsAssignableFrom(paramType.GetGenericTypeDefinition().GetTypeInfo()))
        {
          return false;
        }
      }
      return true;
    }
 
    public static void InitializeNodes()
    {
      var analyzer = new LinqMethodsAnalyzer();
 
      analyzer.RegisterNode(
        m => (ContainsSeed(m) ?
                AggregateFromSeedExpressionNode.SupportedMethods :
                AggregateExpressionNode.SupportedMethods),
        "Aggregate");
 
      analyzer.RegisterNode(AllExpressionNode.SupportedMethods, "All");
      analyzer.RegisterNode(AnyExpressionNode.SupportedMethods, "Any");
      analyzer.RegisterNode(AverageExpressionNode.SupportedMethods, "Average");
      analyzer.RegisterNode(CastExpressionNode.SupportedMethods, "Cast");
      analyzer.RegisterNode(ContainsExpressionNode.SupportedMethods, NoEqualityComparer, "Contains");
      analyzer.RegisterNode(CountExpressionNode.SupportedMethods, "Count");
      analyzer.RegisterNode(DefaultIfEmptyExpressionNode.SupportedMethods, "DefaultIfEmpty");
      analyzer.RegisterNode(DistinctExpressionNode.SupportedMethods, NoEqualityComparer, "Distinct");
      analyzer.RegisterNode(ExceptExpressionNode.SupportedMethods, NoEqualityComparer, "Except");
      analyzer.RegisterNode(FirstExpressionNode.SupportedMethods, "First""FirstOrDefault");
 
      analyzer.RegisterNode(
        m =>
        {
          if (NoEqualityComparer(m))
          {
            return WithResultSelector(m) ?
                    GroupByWithResultSelectorExpressionNode.SupportedMethods :
                    GroupByExpressionNode.SupportedMethods;
          }
          else
          {
            return null;
          }
        }, "GroupBy");
 
      analyzer.RegisterNode(GroupJoinExpressionNode.SupportedMethods, NoEqualityComparer, "GroupJoin");
      analyzer.RegisterNode(IntersectExpressionNode.SupportedMethods, NoEqualityComparer, "Intersect");
      analyzer.RegisterNode(JoinExpressionNode.SupportedMethods, NoEqualityComparer, "Join");
      analyzer.RegisterNode(LastExpressionNode.SupportedMethods, "Last""LastOrDefault");
      analyzer.RegisterNode(LongCountExpressionNode.SupportedMethods, "LongCount");
      analyzer.RegisterNode(MaxExpressionNode.SupportedMethods, "Max");
      analyzer.RegisterNode(MinExpressionNode.SupportedMethods, "Min");
      analyzer.RegisterNode(OfTypeExpressionNode.SupportedMethods, "OfType");
      analyzer.RegisterNode(OrderByDescendingExpressionNode.SupportedMethods, NoComparer, "OrderByDescending");
      analyzer.RegisterNode(OrderByExpressionNode.SupportedMethods, NoComparer, "OrderBy");
      analyzer.RegisterNode(ReverseExpressionNode.SupportedMethods, "Reverse");
      analyzer.RegisterNode(SelectExpressionNode.SupportedMethods, NoIndexedSelector, "Select");
      analyzer.RegisterNode(SelectManyExpressionNode.SupportedMethods, NoIndexedSelector, "SelectMany");
      analyzer.RegisterNode(SingleExpressionNode.SupportedMethods, "Single""SingleOrDefault");
      analyzer.RegisterNode(SkipExpressionNode.SupportedMethods, "Skip");
      analyzer.RegisterNode(SumExpressionNode.SupportedMethods, "Sum");
      analyzer.RegisterNode(TakeExpressionNode.SupportedMethods, "Take");
      analyzer.RegisterNode(ThenByDescendingExpressionNode.SupportedMethods, NoComparer, "ThenByDescending");
      analyzer.RegisterNode(ThenByExpressionNode.SupportedMethods, NoComparer, "ThenBy");
      analyzer.RegisterNode(UnionExpressionNode.SupportedMethods, NoEqualityComparer, "Union");
      analyzer.RegisterNode(WhereExpressionNode.SupportedMethods, NoIndexedSelector, "Where");
 
      analyzer.PopulateSupportedMethod();
 
      // Pick up member properties that CountExpressionNode supports
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(List<int>), "Count"));
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(ICollection<int>), "Count"));
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(ICollection), "Count"));
      CountExpressionNode.SupportedMethods.Add(GetPropertyAsRegisterableMethod(typeof(Array), "Length"));
 
      var arrayListType = Type.GetType("System.Collections.ArrayList"false);
      if (arrayListType != null)
      {
        var arrayListCountMethod = GetPropertyAsRegisterableMethod(arrayListType, "Count");
        System.Diagnostics.Contracts.Contract.Assert(arrayListCountMethod != null);
        if (arrayListCountMethod != null)
        {
          CountExpressionNode.SupportedMethods.Add(arrayListCountMethod);
        }
      }
 
      var arrayLongCountMethod = GetPropertyAsRegisterableMethod(typeof(Array), "LongLength");
      if (arrayLongCountMethod != null)
      {
        LongCountExpressionNode.SupportedMethods.Add(arrayLongCountMethod); ;
      }
    }
 
    static MethodInfo GetPropertyAsRegisterableMethod(Type t, string propName)
    {
      var property = t.GetRuntimeProperty(propName);
 
      // Under what condition would a property be not present? i.e. ArrayList.Count and Array.LongLength
      // System.Diagnostics.Contracts.Contract.Assert(property != null);
      if (property == null)
      {
        return null;
      }
      else
      {
        var getMethod = property.GetMethod;
        return getMethod;
        //return MethodInfoBasedNodeTypeRegistry.GetRegisterableMethodDefinition(getMethod, throwOnAmbiguousMatch: true);
      }
    }
 
    class InclusionLogic
    {
      public IList<MethodInfo> Target { getset; }
      public Func<MethodInfobool> Inclusion { getset; }
    }
 
    private IDictionary<stringInclusionLogic> _directRegistry = new Dictionary<stringInclusionLogic>();
    private IDictionary<stringFunc<MethodInfoIList<MethodInfo>>> _dispatchRegistry = new Dictionary<stringFunc<MethodInfoIList<MethodInfo>>>();
 
    // For each Linq method in TargetMethodNames, if inclusion rule applies, it should be added to the target list
    private void RegisterNode(IList<MethodInfo> target, Func<MethodInfobool> inclusionRule, params string[] TargetMethodNames)
    {
      foreach (var mName in TargetMethodNames)
      {
        _directRegistry.Add(mName, new InclusionLogic() { Target = target, Inclusion = inclusionRule });
      }
    }
    private void RegisterNode(IList<MethodInfo> target, params string[] TargetMethodNames)
    {
      RegisterNode(target, null, TargetMethodNames);
    }
 
    private void RegisterNode(Func<MethodInfoIList<MethodInfo>> dispatchLogic, params string[] TargetMethodNames)
    {
      foreach (var mName in TargetMethodNames)
      {
        _dispatchRegistry.Add(mName, dispatchLogic);
      }
    }
 
    public void PopulateSupportedMethod()
    {
      foreach (var m in new Type[] { typeof(Queryable), typeof(Enumerable) }.SelectMany(t => t.GetRuntimeMethods()))
      {
        //if (m.IsStatic && m.IsPublic)
        {
          InclusionLogic directConfig;
          _directRegistry.TryGetValue(m.Name, out directConfig);
 
          if (directConfig != null)
          {
            if (directConfig.Inclusion == null ? true : !directConfig.Inclusion(m))
            {
              //directConfig.Target.Add(MethodInfoBasedNodeTypeRegistry.GetRegisterableMethodDefinition(m, throwOnAmbiguousMatch: true));
              directConfig.Target.Add(m);
            }
          }
          else
          {
            Func<MethodInfoIList<MethodInfo>> dispatcher;
            _dispatchRegistry.TryGetValue(m.Name, out dispatcher);
 
            if (dispatcher != null)
            {
              //var target = dispatcher(MethodInfoBasedNodeTypeRegistry.GetRegisterableMethodDefinition(m, throwOnAmbiguousMatch: true));
              var target = dispatcher(m);
              if (target != null)
              {
                target.Add(m);
              }
            }
          }
        }
      }
    }
  }
}

Peter Hsu

unread,
Jul 1, 2014, 4:51:37 PM7/1/14
to re-mot...@googlegroups.com
So I am seeing quite a bit of unit tests failing, even with just the version 1 patch. Turns out it may be because unit tests makes assumption of the order of SupportedMethods. I.E. FirstExpressionNodeTest.Apply_DefaultAllowed pick the first SupportedMethod for testing. Even after I work around this constraint I am seeing it fails somewhere else. It could be quite consuming to fix all unit tests. I will just put this item on my back burner from now on.
 genericDelegateType)
    {
      foreach</s
...

Peter Hsu

unread,
Jul 1, 2014, 7:03:38 PM7/1/14
to re-mot...@googlegroups.com
I stand corrected. These unit tests version 1 patch broke was relatively to fix. I have them fixed now. However I am still have problem with the tests Analyzer broke. Potentially we can just commit to the first solution.

The thing for the version 1 patch is that it still is faster. However, we have these 2 sets of Nodes. The simple ones uses reflection to find supported method and the complex one uses explicit declaration. Thus I am not sure if we should commit to that one. Analyzer provides a much centralized logic and we can understand what methods can be support and what cannot in one single class. Not to mention that it is potentially faster.

Ketting, Michael

unread,
Jul 2, 2014, 6:03:51 PM7/2/14
to re-mot...@googlegroups.com
Hello Peter!

Thank you for your analysis of the matter.

Based on your thoughts, I believe a combination of methods 1 and 2 could be quite beneficial. Continuing from my previous mail, I recommend an iterative approach to the problem.

The first step would be to replace each SupportedMethods-field with a corresponding nested result operator providers type, utilizing a cached list of MethodInfos for Queryable and Enumerable:

public class SumExpressionNode : ResultOperatorExpressionNodeBase

{

  public class ResultOperatorProvider : ISupportedMethodInfoProvider

  {

    public ResultOperatorProvider ()

    {

      // Default-ctor is required

    }

    public Type NodeType

    {

      get { return typeof (SumExpressionNode); }

    }

    public IEnumerable<MethodInfoGetSupportedMethodInfos ()

    {

      return ReflectionUtility.QueryableMethods.Where (mi => mi.Name == "Sum")

          .Concat (ReflectionUtility.EnumerableMethods.Where (mi => mi.Name == "Sum"));

    }

  }

  ...
}


Then, in the MethodInfoBasedNodeTypeRegistry, inject a sequence of those instances instead of a sequence of types into the CreateFromTypes method.


Next, in ExpressionTreeParser, change the assembly/type lookup to a taxative instantiating and listing of the individual ResultOperatorProviders.


Finally (or first, depending on preferences), refactoring of the tests. For each node type test, we can replace the individual tests of the supported methods to a combined check:

Assert.That (
   
AnyExpressionNode.SupportedMethods,
   
Is.Equivalent (
       
new[]
       
{

          GetSupportedMethod (() => Queryable.Any<object> (null)),

          GetSupportedMethod (() => Queryable.Any<object> (null, null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null)),

          GetSupportedMethod (() => Enumerable.Any<object> (null, null))
    ));
}

It would be necessary to
copy/move the GetSupportedMethod() method over to the tests into a utility class, but that's details.

When all this is done, we should have the same performance gain as would be possible via your intended Analyzer-approach (okay, the cached method-list would need to be iterated about 30 times, but that really should not matter, considering it's just 100-200 methods total in the two lists). But we would still retain a declaration of the supported methods without
adding much complexity to it.

In a follow up, we can do additional refactorings, such as naming-cleanups of the involved methods and we could also move the individual ResultOperatorProviders into a common parent-class to group them into a single file with nested types. Originally, we felt it helpful to keep them with the individual node types, but I'm not hung up on that.

I can setup a branch for this tomorrow and once the ICLA is done, I can also start applying your patches into the branch so we can discuss this on an incremental basis. (on a side note, yes, I'm already working on getting re-linq transplanted over to GitHub, but it's going to be a couple more weeks...)

What do you think of this approach?

Best regards, Michael

From: re-mot...@googlegroups.com [re-mot...@googlegroups.com] on behalf of Peter Hsu [xus...@gmail.com]
Sent: Wednesday, July 02, 2014 01:03
To: re-mot...@googlegroups.com
Subject: Re: [re-motion-dev] Re: Startup performance improvement

--

Michael Ketting

unread,
Jul 3, 2014, 2:11:59 AM7/3/14
to re-mot...@googlegroups.com
Hello Peter!

I've now created the feature branch at https://svn.re-motion.org/svn/Remotion/branches/RM-6184_ReLinq_StartupPerformance.
It also contains the (very simple) performance test to test the startup time. If you can share additional tests tailied to specifically to the starup or have recommendations on how to improve the performance test, I'm very interested. In particular, I'm curious about the time differences between our two measurements and would like to make sure that they're just du to the differences in hardware and not because my performance test isn't corretly measuring all the relevant times.

Best regards, Michael

From: Peter Hsu

Sent: Wednesday, July 02, 2014 01:03
To: re-motion-dev

    i
+= ThenByExpressionNode.SupportedMethods.<span style="color:#6
...

Peter Hsu

unread,
Jul 8, 2014, 8:59:11 PM7/8/14