Now that Ruby.NET 0.9 is out the door, we can perhaps start thinking
about features of the next release. I would imagine that Rails support
is something that a lot of people are interested in, so it might be
worth focusing our efforts in that direction. On the issues list there
are a couple of bugs that are blocking Rails support. Issue 5 is a
rather tricky problem regarding Ruby source file loading, so I will
spell out the situation and then open the floor for any thoughts on
this issue. In particular any wisdom about how MRI handles the tricky
cases of file loading would be appreciated.
The problem is that if you require different Ruby source files with
the same filename but in different directories, then the first file
gets loaded twice. You can test this with "dir1\file.rb" and
"dir2\file.rb" and a source file containing:
require 'dir1/file'
require 'dir2/file'
This occurs because Ruby.NET currently does not store either relative
or absolute path information for its compiled source files, either at
runtime or in the compiled assembly. When Ruby.NET encounters the
second "require", it sees that "file" is in the list of loaded Ruby
programs and loads "dir1/file" again.
I will briefly outline how Ruby.NET currently handles file loading.
When a Ruby source file gets compiled, we generate a CLR class in the
_Internal namespace called SourceFile_filename (where filename is the
filename of the Ruby source file, without path and extension). This
class is used by the Ruby.NET runtime: it executes any toplevel code,
as well as loading and initialising the Ruby classes in the source
file.
Ruby.NET keeps track of loaded Ruby source files at runtime using a
dictionary called Programs. The dictionary keys are based on the names
of the SourceFile_??? classes in the loaded assembly. Suppose that you
compile Ruby source files foo.rb and bar.rb into foo.dll. This DLL
will contain (along with a lot of other things) the CLR classes
_Internal.SourceFile_foo and _Internal.SourceFile_bar. If you then
require this DLL in a Ruby program, the Ruby runtime will load the
assembly and populate the Programs dictionary with the keys "foo" and
"bar" (which it infers from the class names SourceFile_foo and
SourceFile_bar). If you attempt to load any Ruby source file or DLL
called "foo" or "bar" - no matter where it is located on the file
system - then the existing "foo" or "bar" will be loaded again.
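A toy Ruby model of the lookup described above may make the collision easier to see. This is not Ruby.NET's runtime code (which is written in C#). `ProgramRegistry`, `load_assembly` and `lookup` are hypothetical names. The sketch assumes that the key is the class name with the `_Internal.` namespace and the `SourceFile_` prefix stripped off, and that a requested feature is reduced to its bare filename before the lookup.

```ruby
# Toy model (hypothetical names) of a Programs-style registry keyed only
# by bare filename.
class ProgramRegistry
  PREFIX = "SourceFile_"

  def initialize
    @programs = {} # key => full name of the generated class
  end

  # Register every SourceFile_??? class found in a "loaded assembly".
  def load_assembly(class_names)
    class_names.each do |full_name|
      short = full_name.split(".").last # drop the _Internal namespace
      @programs[short.delete_prefix(PREFIX)] = full_name
    end
  end

  # Directory and extension are discarded before the lookup, so any
  # feature whose bare name is "foo" finds the same entry.
  def lookup(feature)
    @programs[File.basename(feature, ".rb")]
  end
end

registry = ProgramRegistry.new
registry.load_assembly(["_Internal.SourceFile_foo", "_Internal.SourceFile_bar"])
p registry.lookup("foo")           # => "_Internal.SourceFile_foo"
p registry.lookup("other/dir/foo") # => "_Internal.SourceFile_foo" (wrong file)
```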
I hope this is not too confusing, but it should make the nature of the
problem clear: we need to store either relative or absolute path
information somewhere, either in the compiled assembly or at runtime.
The naive solution would be to encode the absolute pathnames in the
compiled _Internal.SourceFile_??? classes - but this is somewhat ugly
and raises some portability issues if you move those assemblies to
different directories on different machines.
Here are some off-the-top-of-my-head, thinking-out-loud design
questions that we might want to ponder:
* How much relative or absolute path information should we keep track
of, and where should it be stored? Perhaps we need a filename
attribute in the assembly (but bear in mind that we need to avoid
conflicts with class names in the _Internal namespace...)
* What about conflicts between any path information in the compiled
assembly, and the actual location of the file at runtime?
* How does MRI handle the tricky cases (e.g. referring to the same
file via different relative/absolute paths)?
Any thoughts would be appreciated...
Brian Blackwell
--
Research Assistant
Programming Languages and Systems
Queensland University of Technology
> * How does MRI handle the tricky cases (e.g. referring to the same
> file from different relative/absolute paths?)
Regarding the last one, requiring 'foo', 'c:/foo', 'c:\foo', and 'c:\windows\..\foo' will load the file four times.
(However, requiring 'foo' and 'foo.rb' only loads it once.)
This suggests that the fix needs to come from how the Programs dictionary is populated.
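The behaviour reported above is consistent with a "loaded" check keyed on the feature name exactly as written, with ".rb" appended if it is absent and no path normalisation. The following is a hypothetical model of that observation, not a description of MRI's actual implementation; `feature_key` is an invented name.

```ruby
# Hypothetical model of the observed behaviour: the "already loaded" key
# is the feature string as written, with ".rb" appended when missing.
# No path normalisation is performed.
def feature_key(feature)
  feature.end_with?(".rb") ? feature : feature + ".rb"
end

loaded = []
["foo", "foo.rb", "c:/foo", "c:\\foo", "c:\\windows\\..\\foo"].each do |f|
  key = feature_key(f)
  next if loaded.include?(key) # same spelling: treated as already loaded
  loaded << key                # new spelling: loaded again
end
p loaded.size # => 4
```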
I wonder if it might be possible to identify files by content rather than by name, e.g. _Internal.SourceFile_DEADBEEF.
Attached is a hacky patch that attempts to do exactly that: I've replaced filenames with SHA1 hashes. I'm not familiar enough with this area, so numerous bits are still incomplete/broken. But the trivial example works...
This breaks the ability to ship precompiled assemblies without source code (assuming we have that?). Again, I haven't grokked exactly how the resolution happens. Maybe we can cache a map from relative compile-time path to hash.
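A rough sketch of the naming idea, assuming the generated class name is simply the SHA1 of the file contents appended to `_Internal.SourceFile_`. `source_class_name` is a hypothetical helper, and this does not reproduce the attached patch. It also shows the limitation mentioned above: the name can only be computed when the source text is available.

```ruby
require "digest/sha1"

# Hypothetical helper: name the generated class after the SHA1 of the
# file's contents rather than after its filename.
def source_class_name(source_text)
  "_Internal.SourceFile_" + Digest::SHA1.hexdigest(source_text).upcase
end

a = source_class_name("puts 'from dir1'\n") # contents of dir1/file.rb
b = source_class_name("puts 'from dir2'\n") # contents of dir2/file.rb
p a == b                                        # => false
p a == source_class_name("puts 'from dir1'\n") # => true
```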
[Attachment: 0001-Identify-source-files-by-content-hash.patch, 9K]
From 0eff5c1923b90b31cdc174452275822515f41a66 Mon Sep 17 00:00:00 2001
From: Douglas Stockwell <d...@11011.net>
Date: Mon, 10 Dec 2007 12:08:32 +0900
Subject: [PATCH] Identify source files by content hash
That's an interesting suggestion. We could use the hash for generating
the name for the SourceFile_??? class, but as Johannes has pointed
out, we still need to load the same file twice if it is referenced
using different paths at runtime.
I could be wrong about this, but it seems that the runtime shouldn't
be concerned with the compile-time filenames/paths of loaded assemblies
- what is relevant at runtime is the current filename and location of
the assemblies. Of course this gets a bit tricky when there are
multiple source files compiled into a single assembly...
Brian Blackwell
On Dec 10, 1:25 pm, Douglas Stockwell <d...@11011.net> wrote:
> I wonder if it might be possible to identify files by content rather
> than by name. eg _Internal.SourceFile_DEADBEEF
On Dec 9, 2007 10:22 PM, Brian Blackwell <meaningis...@gmail.com> wrote:
> That's an interesting suggestion. We could use the hash for generating the name for the SourceFile_??? class, but as Johannes has pointed out, we still need to load the same file twice if it is referenced using different paths at runtime.
> I could be wrong about this, but it seems that the runtime shouldn't be concerned with the compiletime filename/paths of loaded assemblies - what is relevant at runtime is the current filename and location of the assemblies. Of course this gets a bit tricky when there are multiple source files compiled into a single assembly...
This is just DLL Hell circa 2007; the difference, of course, is that we're dealing with text source files instead of compiled libraries, and path+filename instead of library+progid. There are a few suggestions at http://en.wikipedia.org/wiki/DLL_hell#Solutions which might spark some ideas, but, similar to the way MSFT approached things with .NET, I certainly believe that going down the hash path is the right direction. Obviously the notion of strongly named assemblies doesn't translate well to text source files. But suppose you take the hash of the reduced filename (reduced, as in taking c:\bar, foo/bar, c:/foo/../bar, etc. and reducing them all down to bar), plus a standard separator (e.g. ':'), plus the hash of the file itself, and use the resulting concatenation as the key in a hashtable created at run time, with the source of the file as the value. You should then be able to take the hash of the reduced "require 'path/to/foo/bar'", take the hash of the file, concatenate them, and check whether the result exists in the hashtable. If it does, don't re-load it; if it doesn't, add it to the hashtable and continue.
With the above in mind, you can then replace filename in SourceFile_filename with the associated key from the hashtable, and use the same "strong name" impersonation technique to determine, each time you come across a new reference to 'bar', whether the source has already been compiled: move on if it has, or load, compile, and add another entry to the 'Programs' dictionary if it has not.
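A minimal sketch of the composite key described above, assuming "reduced" means the bare name with directories and a trailing ".rb" removed, and using SHA1 for both hashes. `reduced_name`, `composite_key` and `require_once` are hypothetical names, and the in-memory hash stands in for whatever table the runtime would actually keep.

```ruby
require "digest/sha1"

# Sketch of the suggested key (hypothetical names):
#   SHA1(reduced filename) + ":" + SHA1(file contents)
# "Reduced" is assumed to mean the bare name without directories or ".rb".
def reduced_name(path)
  path.split(%r{[\\/]}).last.sub(/\.rb\z/, "")
end

def composite_key(path, contents)
  Digest::SHA1.hexdigest(reduced_name(path)) + ":" + Digest::SHA1.hexdigest(contents)
end

# Returns true if the file should be loaded now, false if an identical
# entry is already in the table.
def require_once(loaded, path, contents)
  key = composite_key(path, contents)
  return false if loaded.key?(key)
  loaded[key] = contents
  true
end

loaded = {} # key => source text, built up at run time
p require_once(loaded, "c:/foo/../bar", "x = 1\n") # => true  (first time)
p require_once(loaded, "foo/bar.rb", "x = 1\n")    # => false (same name, same contents)
p require_once(loaded, "dir2/bar", "x = 2\n")      # => true  (different contents)
```

Note that computing this key means reading and hashing the file on every require, even when it turns out to be loaded already.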
> This is just DLL Hell circa 2007, the difference, of course, is that we're dealing with text source files instead of compiled libraries and path+filename instead of library+progid.
Actually, it's not simply source files, it's both source files and dlls.
A load or require statement may load either a source file or a dll.
If the program loads or requires, say, "foo.rb" and the corresponding directory contains a foo.dll, then that dll will be loaded, provided it is newer than both the source code file and the compiler.
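The dll-versus-source rule above could be sketched as follows. `prefer_dll?` is a hypothetical helper; it assumes that "newer" means a later file modification time and that the compiler is identified by the path of its executable. The real check may differ.

```ruby
require "tmpdir"

# Hypothetical helper: use foo.dll instead of foo.rb only when the dll
# exists and its modification time is later than both the source file's
# and the compiler's.
def prefer_dll?(source_path, compiler_path)
  dll_path = source_path.sub(/\.rb\z/, ".dll")
  return false unless File.exist?(dll_path)
  dll_time = File.mtime(dll_path)
  dll_time > File.mtime(source_path) && dll_time > File.mtime(compiler_path)
end

Dir.mktmpdir do |dir|
  src      = File.join(dir, "foo.rb")
  dll      = File.join(dir, "foo.dll")
  compiler = File.join(dir, "compiler.exe") # stand-in for the compiler binary
  [src, dll, compiler].each { |f| File.write(f, "") }
  now = Time.now
  File.utime(now, now - 60, src, compiler) # both older than the dll
  File.utime(now, now, dll)
  p prefer_dll?(src, compiler) # => true
  File.utime(now, now + 60, src)           # source edited after compiling
  p prefer_dll?(src, compiler) # => false
end
```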
Also, multiple Ruby source files may be compiled into a single dll (or exe). Only the first of these source files is loaded automatically when the assembly is loaded; the others must be loaded or required explicitly. That is, the code for these other source files is precompiled but only loaded on request, so we need a mechanism within our assemblies for mapping the original source file names to their precompiled .NET classes within that assembly. Currently this is done via a custom attribute. We don't, however, currently include the path of these source files.
The question is whether we should be using absolute or relative paths, and if relative, relative to what? The problem with absolute paths is that such Ruby.NET assemblies may be deployed to new machines at a different absolute path.
> The question is should we be using absolute or relative paths, and if relative, relative to what? The problem with absolute paths is that such Ruby.NET assemblies may be deployed to new machines at a different absolute path.
Relative, and relative to the load path, i.e. $:, which can be modified by the -I command-line option and the RUBYLIB environment variable.
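One way to derive such a relative key is sketched below, under the assumption that the file has already been resolved to an absolute POSIX-style path (Windows drive letters and backslashes would need normalising first). `feature_key_for` and the load-path directory are hypothetical. Stripping the extension lets foo.rb and foo.dll share a key, since a require may resolve to either.

```ruby
# Hypothetical helper: key a resolved file by its path relative to the
# load-path entry it lives under, without the .rb/.dll extension.
# Assumes POSIX-style absolute paths.
def feature_key_for(absolute_path, load_path)
  load_path.each do |dir|
    prefix = File.expand_path(dir) + "/"
    next unless absolute_path.start_with?(prefix)
    return absolute_path.delete_prefix(prefix).sub(/\.(rb|dll)\z/, "")
  end
  nil # not under any load-path entry
end

lib = ["/opt/ruby/lib/ruby/1.8"] # hypothetical $: contents
p feature_key_for("/opt/ruby/lib/ruby/1.8/net/http.rb", lib) # => "net/http"
p feature_key_for("/opt/ruby/lib/ruby/1.8/uri/http.rb", lib) # => "uri/http"
```

With nested load-path entries the first matching entry in the list is used, which is not necessarily the entry the file was actually found through; a real implementation would probably record the entry at resolution time instead.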
On Dec 11, 2007 11:54 PM, Wayne Kelly <w.ke...@qut.edu.au> wrote:
> This is just DLL Hell circa 2007, the difference, of course, is that we're dealing with text source files instead of compiled libraries and path+filename instead of library+progid.
> Actually, it's not simply source files, it's both source files and dlls.
> A load or require statement may load either a source file or a dll.
Thanks for the clarification, Wayne! This is quite a different problem to solve, indeed.
> If the program loads or requires say "foo.rb" and the corresponding directory contains a foo.dll, then that dll will be loaded provided it is newer than both the source code file and the compiler.
By 'compiler' I assume you mean the in-memory compiled instance of the source file?
> Also, multiple Ruby source files may be compiled into a single dll (or exe). Only the first of these source files is loaded automatically when the assembly is loaded, the others must be loaded or required explicitly. I.e., the code for these other source files is precompiled, but only loaded as requested - so we need a mechanism within our assemblies of mapping the original source file names to their precompiled .NET classes within that assembly. Currently this is done via a custom attribute. We don't however, currently include the path of these source files.
Yeah, and I don't think there could be any sane reasoning behind including the absolute path either. At least not for Ruby.NET libraries shipped as compiled assemblies rather than as the original source code. I can see why it _might_ make sense to include the absolute path when dealing with multiple assemblies/source files with the same name at runtime, but I'm not sure if making the distinction between compile time and runtime referencing would make all that much sense, either.
> The question is should we be using absolute or relative paths, and if relative, relative to what? The problem with absolute paths is that such Ruby.NET assemblies may be deployed to new machines at a different absolute path.
Yeah, that makes complete sense, and I agree. I think I need to go back to issue #5 to make sure I properly understand the issue at hand as it relates to RoR, though if anybody can help clarify things (again, as it relates specifically to running RoR via Ruby.NET) I would certainly appreciate it.
On Dec 12, 2007 12:22 AM, M. David Peterson <xmlhac...@gmail.com> wrote:
> Yeah, that makes complete sense, and I agree. I think I need to go back to issue #5 to make sure I properly understand the issue at hand as it relates to RoR, though if anybody can help clarify things (again, as it relates specifically to running RoR via Ruby.NET) I would certainly appreciate it.
> rubygems (hence rails) needs net/http, which in turn needs uri/http.
Uh, yeah, never mind. I don't think it could be any clearer than this. ;-)
So in this regard, why not use the hash of the relative URI as the key? Obviously I'm not the first to think of this, so I guess what I am really asking is why use only the file name instead of the relative path+filename?
On Dec 12, 2007 12:32 AM, M. David Peterson <xmlhac...@gmail.com> wrote:
> So in this regard, why not use the hash of the relative URI as the key? Obviously I'm not the first to think of this, so I guess what I am really asking is why use only the file name instead of the relative path+filename?
Or to ask this another way (I realize that if the DLL is shipped as is, then the relative path is almost certainly going to be meaningless): if the problem with getting RoR to run is that "rubygems (hence rails) needs net/http, which in turn needs uri/http", then, provided the relative path is accessible, using the relative path+filename would certainly be one way to fix the problem as it relates specifically to RoR, would it not?
M. David Peterson wrote:
> On Dec 12, 2007 12:32 AM, M. David Peterson <xmlhac...@gmail.com> wrote:
>> So in this regard, why not use the hash of the relative URI as the key? Obviously I'm not the first to think of this, so I guess what I am really asking is why use only the file name instead of the relative path+filename?
> Or to ask this another way (I realize that if the DLL is shipped as is, then the relative path is almost certainly going to be meaningless), if the problem with getting RoR to run has to do with "rubygems (hence rails) needs net/http, which in turn needs uri/http", then if the relative path is accessible, then using the relative path+filename would certainly be one way to fix the problem as it relates specifically to RoR, would it not?
I think there is also an issue of assembly name identity.
Given net/http and uri/http, if compiled independently both are currently named "http" - if I'm not mistaken, this causes problems at runtime.
The assembly name itself is probably unimportant as long as it is unique.
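A sketch of one way to keep assembly names unique, assuming the load-path-relative key (e.g. "net/http") is available at compile time. `assembly_name_for` is a hypothetical helper. It keeps the bare name for readability and appends a truncated SHA1 of the full key, so distinct keys get distinct names with overwhelming probability rather than with certainty.

```ruby
require "digest/sha1"

# Hypothetical naming scheme: bare name for readability, plus the first
# 12 hex digits of the SHA1 of the full load-path-relative key.
def assembly_name_for(feature_key)
  base = File.basename(feature_key)
  "#{base}_#{Digest::SHA1.hexdigest(feature_key)[0, 12]}"
end

puts assembly_name_for("net/http") # "http_" followed by 12 hex digits
puts assembly_name_for("uri/http") # same prefix, different suffix
```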
Wayne Kelly wrote:
> Also, multiple Ruby source files may be compiled into a single dll (or exe). Only the first of these source files is loaded automatically when the assembly is loaded, the others must be loaded or required explicitly. I.e., the code for these other source files is precompiled, but only loaded as requested - so we need a mechanism within our assemblies of mapping the original source file names to their precompiled .NET classes within that assembly. Currently this is done via a custom attribute. We don't however, currently include the path of these source files.
Hi Wayne,
I have some questions about compiling multiple source files into one assembly.
What is the benefit of compiling multiple source files into a single assembly?
What are the scenarios where multiple files would be compiled together?
These are probably still to be decided: How is the final name (filename) of the assembly to be decided? Should the assembly be loaded when any of the files it contains is required, or just the first/main file?
e.g. a.rb, dir/b.rb, and ../c.rb
e.g. dir/a.rb and dir2/b.rb
One option may be to name and place the assembly alongside the first/main file. All other files would be encoded in the assembly with a path relative to the first file.
This assumes we only "load" the assembly when the first file is required.
e.g. a.rb, dir/b.rb, and ../c.rb = a.dll:
  SourceFile_a
  SourceFile_dir/b
  SourceFile_../c
Then if at runtime I issue a "require 'somewhere/a'" which resolves to the a.dll from above we can populate a list of mappings by combining 'somewhere' with the relative paths stored with the compiled versions of the other included files.
list =
  'somewhere/dir/b' = a::SourceFile_dir/b
  'c' = a::SourceFile_../c
For further "require"s we would then resolve by scanning the list of mappings in addition to the filesystem.
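The mapping step above might look like this, using the `a::SourceFile_...` notation from the example (real CLR class names could not contain "/"). `EMBEDDED` and `build_mappings` are hypothetical. The sketch assumes relative paths are combined and cleaned lexically, so "somewhere" + "../c" becomes "c".

```ruby
require "pathname"

# Hypothetical data stored in a.dll: for each non-main file, its path
# relative to the main file's directory and its generated class.
EMBEDDED = {
  "dir/b" => "a::SourceFile_dir/b",
  "../c"  => "a::SourceFile_../c",
}.freeze

# Called when a require resolves to a.dll via main_feature, e.g.
# 'somewhere/a'. Paths are combined and cleaned lexically.
def build_mappings(main_feature, embedded)
  base = File.dirname(main_feature)
  embedded.each_with_object({}) do |(rel, klass), list|
    key = Pathname.new(File.join(base, rel)).cleanpath.to_s
    list[key] = klass
  end
end

list = build_mappings("somewhere/a", EMBEDDED)
p list["somewhere/dir/b"] # => "a::SourceFile_dir/b"
p list["c"]               # => "a::SourceFile_../c"
```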
> -----Original Message-----
> From: RubyDOTNET@googlegroups.com [mailto:RubyDOTNET@googlegroups.com] On Behalf Of Douglas Stockwell
> Sent: Wednesday, 12 December 2007 6:25 PM
> To: RubyDOTNET@googlegroups.com
> Subject: Re: Issues with source file loading - any thoughts?
> Hi Wayne,
> I have some questions about understanding multiple source in one assembly.
> What is the benefit of compiling multiple source files into a single assembly?
That's largely to do with providing the traditional Visual Studio project option for Ruby.NET, where projects consist of multiple source files that are compiled ahead of time into a single assembly.
The other scenario is where a single source file is compiled into a single assembly (with the same root file name). It is only in this latter scenario that I expect us to automatically load a dll rather than load and recompile the source file.
So, if a.rb, b.rb and c.rb are compiled into, say, foo.dll, then other source code would need to explicitly load or require "foo.dll" in order to use the dll. However, if a.rb requires, say, b.rb, then we should use the precompiled version in the assembly rather than searching for source file b.rb.
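The lookup order described above, precompiled classes first and the filesystem second, could be sketched like this. `resolve` and the `precompiled` table are hypothetical. The sketch assumes the precompiled entries are keyed by name without the ".rb" extension.

```ruby
# Hypothetical resolver: check classes precompiled into already-loaded
# assemblies before searching the load path for a source file.
def resolve(feature, precompiled, load_path)
  key = feature.sub(/\.rb\z/, "")
  return [:precompiled, precompiled[key]] if precompiled.key?(key)
  load_path.each do |dir|
    candidate = File.join(dir, key + ".rb")
    return [:source, candidate] if File.file?(candidate)
  end
  nil # not found anywhere
end

# Entries registered when foo.dll (built from a.rb, b.rb, c.rb) was loaded;
# the class names are illustrative.
precompiled = { "b" => "SourceFile_b", "c" => "SourceFile_c" }
p resolve("b.rb", precompiled, ["."]) # => [:precompiled, "SourceFile_b"]
```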