[LLVMdev] Adding a custom GC safe point creation phase

18 views
Skip to first unread message

Nicolas Geoffray

unread,
Oct 31, 2011, 5:21:17 PM10/31/11
to Chris Lattner, gordonh...@mac.com, llvmdev
Hi Chris, Gordon,

Here's a patch to allow a GCStrategy to customize the places where it wants to insert safe points. I'm not sure who maintains the GC code today in LLVM (I'd be happy to take ownership, if needed).

The patch just adds up a custom safepoints flag, similar to the way the GCStrategy can customize intrinsics lowering, or roots initialization. It works pretty well, as I've tested it on VMKit. So the patch in itself should not be controversial.

Chris, I'd like to discuss with you on how we can improve the current GC framework to take advantage of metadata. The reason I add this custom safe point creation method is because it's difficult for a front-end to pass custom information to the GC framework. And I think metadata could help us in this situation. Ideally, I'd like to get to a point where we can write:

15 = load %MyPointer* %2 !safepoint

And the GCStrategy would find this metadata information in the MachineInstruction (because the safe points are created based on MachineInstruction). This solution looks very much like the !nontemporal metadata.

Unfortunately, it looks like metadata are not passed down to MachineInstruction. How could we achieve that? Should we write custom code in the Instruction -> MachineInstruction transformation to pass down that !safepoint metadata?

Thanks for any input!
Nicolas


Index: include/llvm/CodeGen/GCStrategy.h
===================================================================
--- include/llvm/CodeGen/GCStrategy.h (revision 141409)
+++ include/llvm/CodeGen/GCStrategy.h (working copy)
@@ -37,6 +37,7 @@
 #define LLVM_CODEGEN_GCSTRATEGY_H
 
 #include "llvm/CodeGen/GCMetadata.h"
+#include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/Support/Registry.h"
 #include <string>
 
@@ -68,6 +69,8 @@
     bool CustomReadBarriers;   //< Default is to insert loads.
     bool CustomWriteBarriers;  //< Default is to insert stores.
     bool CustomRoots;          //< Default is to pass through to backend.
+    bool CustomSafePoints;     //< Default is to use NeededSafePoints
+                               //  to find safe points.
     bool InitRoots;            //< If set, roots are nulled during lowering.
     bool UsesMetadata;         //< If set, backend must emit metadata tables.
     
@@ -87,7 +90,9 @@
 
     /// needsSafePoitns - True if safe points of any kind are required. By
     //                    default, none are recorded.
-    bool needsSafePoints() const { return NeededSafePoints != 0; }
+    bool needsSafePoints() const {
+      return CustomSafePoints || NeededSafePoints != 0;
+    }
     
     /// needsSafePoint(Kind) - True if the given kind of safe point is
     //                          required. By default, none are recorded.
@@ -109,6 +114,11 @@
     ///               can generate a stack map. If true, then
     //                performCustomLowering must delete them.
     bool customRoots() const { return CustomRoots; }
+
+    /// customSafePoints - By default, the GC analysis will find safe
+    ///                    points according to NeededSafePoints. If true,
+    ///                    then findCustomSafePoints must create them.
+    bool customSafePoints() const { return CustomSafePoints; }
     
     /// initializeRoots - If set, gcroot intrinsics should initialize their
     //                    allocas to null before the first use. This is
@@ -135,6 +145,7 @@
     /// which the LLVM IR can be modified.
     virtual bool initializeCustomLowering(Module &F);
     virtual bool performCustomLowering(Function &F);
+    virtual bool findCustomSafePoints(GCFunctionInfo& FI, MachineFunction& MF);
   };
   
 }
Index: lib/CodeGen/GCStrategy.cpp
===================================================================
--- lib/CodeGen/GCStrategy.cpp (revision 141409)
+++ lib/CodeGen/GCStrategy.cpp (working copy)
@@ -97,6 +97,7 @@
   CustomReadBarriers(false),
   CustomWriteBarriers(false),
   CustomRoots(false),
+  CustomSafePoints(false),
   InitRoots(true),
   UsesMetadata(false)
 {}
@@ -116,6 +117,14 @@
   return 0;
 }
 
+
+bool GCStrategy::findCustomSafePoints(GCFunctionInfo& FI, MachineFunction &F) {
+  dbgs() << "gc " << getName() << " must override findCustomSafePoints.\n";
+  llvm_unreachable(0);
+  return 0;
+}
+
+
 GCFunctionInfo *GCStrategy::insertFunctionInfo(const Function &F) {
   GCFunctionInfo *FI = new GCFunctionInfo(F, *this);
   Functions.push_back(FI);
@@ -370,6 +379,7 @@
     MCSymbol* Label = InsertLabel(*CI->getParent(), RAI, CI->getDebugLoc());
     FI->addSafePoint(GC::PostCall, Label, CI->getDebugLoc());
   }
+
 }
 
 void MachineCodeAnalysis::FindSafePoints(MachineFunction &MF) {
@@ -405,9 +415,13 @@
   
   // Find the size of the stack frame.
   FI->setFrameSize(MF.getFrameInfo()->getStackSize());
-  
   // Find all safe points.
-  FindSafePoints(MF);
+  if (FI->getStrategy().customSafePoints()) {
+    FI->getStrategy().findCustomSafePoints(*FI, MF);
+  } else {
+    FindSafePoints(MF);
+  }
   
   // Find the stack offsets for all roots.
   FindStackOffsets(MF);

Gordon Henriksen

unread,
Oct 31, 2011, 9:21:39 PM10/31/11
to Nicolas Geoffray, llvmdev
On 2011-10-31, at 17:21, Nicolas Geoffray wrote:

> Here's a patch to allow a GCStrategy to customize the places where it wants to insert safe points. I'm not sure who maintains the GC code today in LLVM (I'd be happy to take ownership, if needed).
>
> The patch just adds up a custom safepoints flag, similar to the way the GCStrategy can customize intrinsics lowering, or roots initialization. It works pretty well, as I've tested it on VMKit. So the patch in itself should not be controversial.

Seems a perfectly reasonable patch on its own merits.

I would be curious what your findCustomSafePoints implementation looks like. Perhaps it's possible to extract some of it into the framework.

> Chris, I'd like to discuss with you on how we can improve the current GC framework to take advantage of metadata. The reason I add this custom safe point creation method is because it's difficult for a front-end to pass custom information to the GC framework. And I think metadata could help us in this situation. Ideally, I'd like to get to a point where we can write:
>
> 15 = load %MyPointer* %2 !safepoint
>
> And the GCStrategy would find this metadata information in the MachineInstruction (because the safe points are created based on MachineInstruction). This solution looks very much like the !nontemporal metadata.
>
> Unfortunately, it looks like metadata are not passed down to MachineInstruction. How could we achieve that? Should we write custom code in the Instruction -> MachineInstruction transformation to pass down that !safepoint metadata?

Could such a program survive IR optimizations intact?

— Gordon
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nicolas Geoffray

unread,
Nov 1, 2011, 4:47:47 AM11/1/11
to Gordon Henriksen, llvmdev
Thanks for the review Gordon.

On Tue, Nov 1, 2011 at 2:21 AM, Gordon Henriksen <gordonh...@mac.com> wrote:
On 2011-10-31, at 17:21, Nicolas Geoffray wrote:

> Here's a patch to allow a GCStrategy to customize the places where it wants to insert safe points. I'm not sure who maintains the GC code today in LLVM (I'd be happy to take ownership, if needed).
>
> The patch just adds up a custom safepoints flag, similar to the way the GCStrategy can customize intrinsics lowering, or roots initialization. It works pretty well, as I've tested it on VMKit. So the patch in itself should not be controversial.

Seems a perfectly reasonable patch on its own merits.

I would be curious what your findCustomSafePoints implementation looks like. Perhaps it's possible to extract some of it into the framework.

The code for my findCustomSafePoints is like this:

void MyGC::findCustomSafePoints(GCFunctionInfo& FI, MachineFunction &MF) {
  for (MachineFunction::iterator BBI = MF.begin(),
                                 BBE = MF.end(); BBI != BBE; ++BBI) {
    for (MachineBasicBlock::iterator MI = BBI->begin(),
                                     ME = BBI->end(); MI != ME; ++MI) {
      if (MI->getDesc().isCall()) {
               /// Standard code need for adding a post call safe point;
      } else if (MI->getDebucLoc().getCol() == 1) {
              /// Standard code need for adding a post call safe point;
      }
  }
}

So it really looks like what we already have, except this special trick about checking the debug location. Piggy backing on the debug location is easy and works for me because:
1) Debug locations are passed from LLVM Instructions to MachineInstruction
2) I don't have column numbers in my front-end.

The instructions that get this special marker on the debug location are instructions that may trigger a segmentation fault. Because I enter a signal handler, that may invoke a collection, I need to know the roots at the faulting instruction's address.


> Chris, I'd like to discuss with you on how we can improve the current GC framework to take advantage of metadata. The reason I add this custom safe point creation method is because it's difficult for a front-end to pass custom information to the GC framework. And I think metadata could help us in this situation. Ideally, I'd like to get to a point where we can write:
>
> 15 = load %MyPointer* %2 !safepoint
>
> And the GCStrategy would find this metadata information in the MachineInstruction (because the safe points are created based on MachineInstruction). This solution looks very much like the !nontemporal metadata.
>
> Unfortunately, it looks like metadata are not passed down to MachineInstruction. How could we achieve that? Should we write custom code in the Instruction -> MachineInstruction transformation to pass down that !safepoint metadata?

Could such a program survive IR optimizations intact?

Well, we must be sure that any instruction that is a safepoint does get some metadata associated with it. If such instruction does not get this metadata, then then program would not survive. So the 'safepoint' metadata *must* be preserved across optimizations if the instruction gets emitted.
 
Nicolas


— Gordon

Gordon Henriksen

unread,
Nov 1, 2011, 5:48:05 AM11/1/11
to Nicolas Geoffray, llvmdev
On Nov 1, 2011, at 4:47 AM, Nicolas Geoffray <nicolas....@gmail.com> wrote:

Thanks for the review Gordon.

On Tue, Nov 1, 2011 at 2:21 AM, Gordon Henriksen <gordonh...@mac.com> wrote:
On 2011-10-31, at 17:21, Nicolas Geoffray wrote:

> Here's a patch to allow a GCStrategy to customize the places where it wants to insert safe points. I'm not sure who maintains the GC code today in LLVM (I'd be happy to take ownership, if needed).
>
> The patch just adds up a custom safepoints flag, similar to the way the GCStrategy can customize intrinsics lowering, or roots initialization. It works pretty well, as I've tested it on VMKit. So the patch in itself should not be controversial.

Seems a perfectly reasonable patch on its own merits.

I would be curious what your findCustomSafePoints implementation looks like. Perhaps it's possible to extract some of it into the framework.

The code for my findCustomSafePoints is like this:

void MyGC::findCustomSafePoints(GCFunctionInfo& FI, MachineFunction &MF) {
  for (MachineFunction::iterator BBI = MF.begin(),
                                 BBE = MF.end(); BBI != BBE; ++BBI) {
    for (MachineBasicBlock::iterator MI = BBI->begin(),
                                     ME = BBI->end(); MI != ME; ++MI) {
      if (MI->getDesc().isCall()) {
               /// Standard code need for adding a post call safe point;
      } else if (MI->getDebucLoc().getCol() == 1) {
              /// Standard code need for adding a post call safe point;
      }
  }
}

So it really looks like what we already have, except this special trick about checking the debug location. Piggy backing on the debug location is easy and works for me because:
1) Debug locations are passed from LLVM Instructions to MachineInstruction
2) I don't have column numbers in my front-end.

The instructions that get this special marker on the debug location are instructions that may trigger a segmentation fault. Because I enter a signal handler, that may invoke a collection, I need to know the roots at the faulting instruction's address.

I have my doubts about the robustness of this strategy, but if it's working well enough for your needs, then great.

Your patch seems appropriately factored. The alternative would seem to be calling a virtualized predicate on each machine instruction, as I don't think your debug info approach is something we would want to ‘bless’ just yet.

You may need to align your safe point machine code analysis with [GCStrategy.cpp]LowerIntrinsics::CouldBecomeSafePoint. Note that this predicate operates on IR instructions instead of machine code.

— Gordon

Nicolas Geoffray

unread,
Nov 1, 2011, 5:53:50 AM11/1/11
to Gordon Henriksen, llvmdev


2011/11/1 Gordon Henriksen <gordonh...@mac.com>

On Nov 1, 2011, at 4:47 AM, Nicolas Geoffray <nicolas....@gmail.com> wrote:

Thanks for the review Gordon.

On Tue, Nov 1, 2011 at 2:21 AM, Gordon Henriksen <gordonh...@mac.com> wrote:
On 2011-10-31, at 17:21, Nicolas Geoffray wrote:

> Here's a patch to allow a GCStrategy to customize the places where it wants to insert safe points. I'm not sure who maintains the GC code today in LLVM (I'd be happy to take ownership, if needed).
>
> The patch just adds up a custom safepoints flag, similar to the way the GCStrategy can customize intrinsics lowering, or roots initialization. It works pretty well, as I've tested it on VMKit. So the patch in itself should not be controversial.

Seems a perfectly reasonable patch on its own merits.

I would be curious what your findCustomSafePoints implementation looks like. Perhaps it's possible to extract some of it into the framework.

The code for my findCustomSafePoints is like this:

void MyGC::findCustomSafePoints(GCFunctionInfo& FI, MachineFunction &MF) {
  for (MachineFunction::iterator BBI = MF.begin(),
                                 BBE = MF.end(); BBI != BBE; ++BBI) {
    for (MachineBasicBlock::iterator MI = BBI->begin(),
                                     ME = BBI->end(); MI != ME; ++MI) {
      if (MI->getDesc().isCall()) {
               /// Standard code need for adding a post call safe point;
      } else if (MI->getDebucLoc().getCol() == 1) {
              /// Standard code need for adding a post call safe point;
      }
  }
}

So it really looks like what we already have, except this special trick about checking the debug location. Piggy backing on the debug location is easy and works for me because:
1) Debug locations are passed from LLVM Instructions to MachineInstruction
2) I don't have column numbers in my front-end.

The instructions that get this special marker on the debug location are instructions that may trigger a segmentation fault. Because I enter a signal handler, that may invoke a collection, I need to know the roots at the faulting instruction's address.

I have my doubts about the robustness of this strategy, but if it's working well enough for your needs, then great.

I have my doubts too. I would really like to see the metadata approach take over.

Nicolas
Reply all
Reply to author
Forward
0 new messages