
ICU data file location issues


Dan Sugalski

Apr 14, 2004, 10:49:53 AM
to perl6-i...@perl.org
Just came across an interesting quirk with the current usage of
ICU--if you do it, you can't run parrot unless your current directory
is the base parrot directory. Trying it from elsewhere throws a
"string_set_data_directory: ICU data files not found" error.

Symlinking parrot's blib/ dir into the current dir works as a
workaround, but we need to do something a bit more permanent. (If
this means we need to work on an actual functioning install target,
well... that's OK too)
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Jeff Clites

Apr 14, 2004, 10:42:37 PM
to Dan Sugalski, perl6-i...@perl.org
On Apr 14, 2004, at 7:49 AM, Dan Sugalski wrote:

> Just came across an interesting quirk with the current usage of
> ICU--if you do it, you can't run parrot unless your current directory
> is the base parrot directory. Trying it from elsewhere throws a
> "string_set_data_directory: ICU data files not found" error.
>
> Symlinking parrot's blib/ dir into the current dir works as a
> workaround, but we need to do something a bit more permanent. (If this
> means we need to work on an actual functioning install target, well...
> that's OK too)

You can tell it to find its data files with something like this:

perl Configure.pl --icudatadir="`pwd`/blib/lib/icu/2.6.1"

(that will make it use an absolute path).

By default it uses a relative path. The upside of a relative path is
that you can move your whole source directory to a new location and it
will still look in the "right" place; the downside is that you have to
run from inside that directory. With an absolute path, it's the
opposite. Once we are install-ing, it won't be a problem, but meanwhile
you can choose which you like better.

(Note: This location is certainly settable at runtime--it's just a
matter of teaching parrot a new command-line parameter. Currently the
path gets flowed into a #define in config.h called
DEFAULT_ICU_DATA_DIR, which was named in anticipation of it being
overridden at runtime, with the config'd value treated only as a
fallback.)
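A minimal sketch of the override-plus-fallback arrangement described above. The flag name `--icu-data-dir=` is hypothetical (no such Parrot option is implied), and the default value is only an example. The chosen directory would then be handed to ICU (for instance through ICU's `u_setDataDirectory()`), which the sketch leaves out so that it builds without ICU.

```c
#include <stddef.h>
#include <string.h>

/* In Parrot this value is generated into config.h by Configure; the
 * relative path here is only an example default. */
#ifndef DEFAULT_ICU_DATA_DIR
#define DEFAULT_ICU_DATA_DIR "blib/lib/icu/2.6.1"
#endif

/* Hypothetical flag, chosen for illustration only. */
#define ICU_DIR_FLAG "--icu-data-dir="

/* Return the ICU data directory to use: the value of a non-empty
 * --icu-data-dir=PATH argument if one is present, otherwise the
 * compiled-in default.  The caller would hand the result to ICU before
 * any ICU data is loaded. */
const char *choose_icu_data_dir(int argc, char *argv[])
{
    const size_t flag_len = strlen(ICU_DIR_FLAG);

    for (int i = 1; i < argc; i++) {
        if (strncmp(argv[i], ICU_DIR_FLAG, flag_len) == 0
            && argv[i][flag_len] != '\0')
            return argv[i] + flag_len;       /* runtime override wins */
    }
    return DEFAULT_ICU_DATA_DIR;             /* config'd value as fallback */
}
```

An empty `--icu-data-dir=` is ignored rather than treated as "current directory", so a typo cannot silently reintroduce the run-from-the-source-dir problem.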

BTW, I think we currently have the same problem with the "runtime" dir
for finding dynamic extensions.

Is there yet a way (or really, an easy way) to access parrot's config
at runtime from inside the parrot code? I know we have
library/config.fpmc, but I didn't know if anything was reading this
from inside the parrot core. Or alternatively, should parrot have a
separate (smaller) config file for parameters like these that it will
always have to read, so that it doesn't pull in the full config on
every run?

JEff

Jarkko Hietaniemi

Apr 15, 2004, 1:20:36 AM
to perl6-i...@perl.org
> Just came across an interesting quirk with the current usage of
> ICU--if you do it, you can't run parrot unless your current directory
> is the base parrot directory. Trying it from elsewhere throws a
> "string_set_data_directory: ICU data files not found" error.
>
> Symlinking parrot's blib/ dir into the current dir works as a
> workaround, but we need to do something a bit more permanent. (If
> this means we need to work on an actual functioning install target,
> well... that's OK too)

TT (Tangentially Topical): it would be nice if Parrot could avoid as
many hardcoded paths as possible for configs, libraries, and such, so
that the Parrot installation could be relocated as freely as possible.
(Finding stuff relative to the executable/DLL would be the coolest scheme,
but that is admittedly somewhat tricky to get working cross-platform.
Environment variables are another possibility-- but that in turn raises
"interesting" security issues.)


Jeff Clites

Apr 15, 2004, 2:25:22 AM
to Jarkko Hietaniemi, perl6-i...@perl.org
On Apr 14, 2004, at 10:20 PM, Jarkko Hietaniemi wrote:

> Finding stuff relative to the executable/DLL would be the coolest scheme,
> but that is admittedly somewhat tricky to get working cross-platform.

Excellent idea. Pretty much every single resource in Cocoa applications
and frameworks on Mac OS X is located using a scheme such as this, and
I believe it all used to work correctly for OpenStep applications on
Windows, so there's a good chance it could be made to work.

For Unix platforms at least, you should be able to do this:

executablePath = isAbsolute($0) ? dirname($0) : cwd() . "/" . dirname($0)

(to mix a bunch of syntaxes)

during initialization before you've had a chance to chdir, and store
that away on the interpreter struct. That should work unless you've
gone out of your way to execute parrot with argv[0] set to something
"fake". I don't know what you'd do on Windows, but there must be
something.
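The Unix pseudocode above, written out as a small C sketch. `exe_dir_from_argv0()` is a hypothetical helper name; it assumes it runs before anything calls `chdir()`, and it deliberately gives up (returns NULL) when argv[0] is a bare name that was found via $PATH, rather than guessing.

```c
#define _XOPEN_SOURCE 700
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Directory containing the executable, computed from argv[0]:
 *   absolute argv[0] -> dirname(argv[0])
 *   relative argv[0] -> dirname(cwd + "/" + argv[0])
 * Returns a malloc'd string the caller frees, or NULL if argv[0] has no
 * '/' (it was found via $PATH, which this sketch does not search) or the
 * path is too long. */
char *exe_dir_from_argv0(const char *argv0)
{
    if (argv0 == NULL || strchr(argv0, '/') == NULL)
        return NULL;

    char full[PATH_MAX];
    if (argv0[0] == '/') {
        if (snprintf(full, sizeof full, "%s", argv0) >= (int)sizeof full)
            return NULL;
    } else {
        char cwd[PATH_MAX];
        if (getcwd(cwd, sizeof cwd) == NULL)
            return NULL;
        if (snprintf(full, sizeof full, "%s/%s", cwd, argv0) >= (int)sizeof full)
            return NULL;
    }
    /* dirname() may modify its argument, so it works on our local copy. */
    return strdup(dirname(full));
}
```

The result can still contain "." or ".." components or pass through symlinks; resolving those is a separate step.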

An embedded parrot would need to be told explicitly where to find its
resources, just by using the API that standalone parrot would call to
store this information.

JEff

Jonathan Worthington

Apr 15, 2004, 5:04:38 AM
to Jeff Clites, Jarkko Hietaniemi, perl6-i...@perl.org
"Jeff Clites" <jcl...@mac.com> wrote:
> On Apr 14, 2004, at 10:20 PM, Jarkko Hietaniemi wrote:
>
> > Finding stuff relative to the executable/DLL would be the coolest scheme,
> > but that is admittedly somewhat tricky to get working cross-platform.
>
> Excellent idea. Pretty much every single resource in Cocoa applications
> and frameworks on Mac OS X is located using a scheme such as this, and
> I believe it all used to work correctly for OpenStep applications on
> Windows, so there's a good chance it could be made to work.
>
> For Unix platforms at least, you should be able to do this:
>
> executablePath = isAbsolute($0) ? dirname($0) : cwd() . "/" . dirname($0)
>
> (to mix a bunch of syntaxes)
>
> during initialization before you've had a chance to chdir, and store
> that away on the interpreter struct. That should work unless you've
> gone out of your way to execute parrot with argv[0] set to something
> "fake". I don't know what you'd do on Windows, but there must be
> something.
>
Strangely enough, I'm in the middle of putting something like this in place
for another project... On Win32 you do:-

GetModuleFileName(NULL, buffer, buffer_size)

Passing NULL in as the first parameter returns the path to the executable
the currently executing process (e.g. Parrot in our case) was created from.
You then just need to chop off the executable name to find your path.
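A sketch of the Win32 approach plus the "chop off the executable name" step. `GetModuleFileNameA()` is the real Win32 call; `chop_exe_name()` and `win32_exe_dir()` are hypothetical helpers, and the chopping routine is written portably so it can be checked on any platform.

```c
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Chop the file name off a path in place, leaving its directory.
 * Accepts both '\\' and '/' as separators and keeps the separator for
 * roots such as "C:\\" or "/".  Returns 0, or -1 if there is no
 * separator at all. */
int chop_exe_name(char *path)
{
    char *sep = strrchr(path, '\\');
    char *fwd = strrchr(path, '/');

    if (sep == NULL || (fwd != NULL && fwd > sep))
        sep = fwd;                       /* use whichever comes last */
    if (sep == NULL)
        return -1;
    if (sep == path || (sep == path + 2 && path[1] == ':'))
        sep[1] = '\0';                   /* keep the root separator */
    else
        *sep = '\0';
    return 0;
}

#ifdef _WIN32
/* Directory of the running executable.  A NULL module handle asks for
 * the file the current process was created from. */
int win32_exe_dir(char *buf, DWORD size)
{
    DWORD n = GetModuleFileNameA(NULL, buf, size);
    if (n == 0 || n >= size)             /* error, or path truncated */
        return -1;
    return chop_exe_name(buf);
}
#endif
```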

Jonathan


Nicholas Clark

Apr 15, 2004, 6:03:14 AM
to Jeff Clites, perl6-i...@perl.org
On Wed, Apr 14, 2004 at 11:25:22PM -0700, Jeff Clites wrote:

> For Unix platforms at least, you should be able to do this:
>
> executablePath = isAbsolute($0) ? dirname($0) : cwd() . "/" . dirname($0)
>
> (to mix a bunch of syntaxes)
>
> during initialization before you've had a chance to chdir, and store
> that away on the interpreter struct. That should work unless you've
> gone out of your way to execute parrot with argv[0] set to something
> "fake". I don't know what you'd do on Windows, but there must be
> something.

I think that it can be fun on HP-UX (where for #! the kernel sets argv[0]
to the path of the script not the interpreter, despite the fact that the
script's path is going to be somewhere else in argv) and AIX (where it seems
that the kernel sets argv[0] to only the leafname of the interpreter,
rather than the full path).

But all this is from memory, and in turn for #! invocation one can always
parse the #! line to work out where the interpreter was (mmm. race
condition)
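A sketch of the #! fallback mentioned above: pull the interpreter path out of a script's first line. `shebang_interpreter()` is a hypothetical helper; it assumes the usual `#!path [args]` form, and because of the race its result should be treated as a hint rather than the truth.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Extract the interpreter path from a script's first line, e.g.
 * "#!/usr/local/bin/parrot -w\n" -> "/usr/local/bin/parrot".
 * Writes at most out_size-1 bytes plus a NUL into out.  Returns 0, or -1
 * if the line is not a #! line or the path does not fit. */
int shebang_interpreter(const char *line, char *out, size_t out_size)
{
    if (strncmp(line, "#!", 2) != 0)
        return -1;

    const char *p = line + 2;
    while (*p == ' ' || *p == '\t')          /* "#! /path" is also accepted */
        p++;

    size_t len = 0;
    while (p[len] != '\0' && !isspace((unsigned char)p[len]))
        len++;                               /* path ends at first whitespace */

    if (len == 0 || len >= out_size)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```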

Nicholas Clark

Dan Sugalski

Apr 15, 2004, 10:24:53 AM
to perl6-i...@perl.org
At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
>TT (Tangentially Topical): it would be nice if Parrot could avoid as
>many hardcoded paths as possible for configs, libraries, and such, so
>that the Parrot installation could be relocated as freely as possible.

Well, then...

Given that everyone's weighing in on this one, it seems worthy of
sane consideration. (I keep not thinking about this, as I'm used to
the nicely sane VMS logical system :)

As we've got the unpleasant issues of OSes with Really Lame schemes,
and embedders that may want to use alternate resource locations, it
seems like the right thing to do here is to make this a part of the
embedding interface and have the main parrot wrapper set it.

So, I'm thinking a few things:

1) We add a Parrot_set_library_base(char *lib_path, int length)
function to set the base library path
2) We add a Parrot_get_base_library_path() function to the
platform-specific interface so platforms can return the base path
3) Parrot itself (the main executable) has a static, global 1K buffer
in it that starts and ends with some recognizable string (like, say,
"***+++***START|" and "|END***+++***") so we can find it and
overwrite the contents if the library gets moved, for use on
platforms where the only way to put a path in is to stick it
statically in the executable.

#3, I should point out, will *only* be used on those platforms that
don't have a better scheme, and only by the
Parrot_get_base_library_path() function.
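A minimal C sketch of the three proposed pieces, under stated assumptions: none of these functions exist in Parrot yet, the `const` qualifier and int return codes are additions, the default path is an example, and `current_library_base()` is a hypothetical helper showing how an explicitly set value and the platform hook might combine.

```c
#include <stddef.h>
#include <string.h>

/* (3) A 1K block bracketed by the proposed markers, so a relocation tool
 * can find it in the executable file and rewrite the path.  Assumed
 * layout: path bytes start right after the start marker's text, and the
 * end marker fills the last 13 bytes.  In C (not C++) a char array may be
 * initialized with a literal exactly its length, omitting the NUL. */
static struct {
    char start[15];   /* "***+++***START|" */
    char path[996];   /* NUL-terminated, NUL-padded */
    char end[13];     /* "|END***+++***" */
} embedded_base = { "***+++***START|", "/usr/local/lib/parrot", "|END***+++***" };

static char library_base[996];   /* explicitly set value, if any */
static int  library_base_is_set = 0;

/* (1) Set the base library path from a pointer and a byte length.
 * Returns 0, or -1 on bad input. */
int Parrot_set_library_base(const char *lib_path, int length)
{
    if (lib_path == NULL || length < 0 || (size_t)length >= sizeof library_base)
        return -1;
    memcpy(library_base, lib_path, (size_t)length);
    library_base[length] = '\0';
    library_base_is_set = 1;
    return 0;
}

/* (2) Platform hook.  This is the fallback version for platforms with no
 * better scheme, and the only reader of the patchable buffer. */
const char *Parrot_get_base_library_path(void)
{
    return embedded_base.path;
}

/* Hypothetical helper: an explicitly set base wins, else ask the hook. */
const char *current_library_base(void)
{
    return library_base_is_set ? library_base : Parrot_get_base_library_path();
}
```

The main parrot wrapper (or an embedder) would call `Parrot_set_library_base()` at startup; everything else would only ever read the result.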

Sound sane? I can see splitting up the library base path into
sections, but I'm not sure it's worth it. Now'd be the time to argue
that, though :)

Jarkko Hietaniemi

Apr 15, 2004, 11:23:45 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski wrote:
> At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
>
>>TT (Tangentially Topical): it would be nice if Parrot could avoid as
>>many hardcoded paths as possible for configs, libraries, and such, so
>>that the Parrot installation could be relocated as freely as possible.
>
>
> Well, then...
>
> Given that everyone's weighing in on this one, it seems worthy of
> sane consideration. (I keep not thinking about this, as I'm used to
> the nicely sane VMS logical system :)

Brag :-)

(in case someone is wondering, the VMS "logicals" nicely solve this
problem, basically by each piece of software being installed into and
used/accessed through a "super environment variable"-- so basically Dan
can't understand why us others are having these problems and talk of it
as a new fancy thing :-)

Dan Sugalski

Apr 15, 2004, 11:32:39 AM
to Jarkko Hietaniemi, perl6-i...@perl.org
At 6:23 PM +0300 4/15/04, Jarkko Hietaniemi wrote:
>Dan Sugalski wrote:
>> At 8:20 AM +0300 4/15/04, Jarkko Hietaniemi wrote:
>>
>>>TT (Tangentially Topical): it would be nice if Parrot could avoid as
>>>many hardcoded paths as possible for configs, libraries, and such, so
>>>that the Parrot installation could be relocated as freely as possible.
>>
>>
>> Well, then...
>>
>> Given that everyone's weighing in on this one, it seems worthy of
>> sane consideration. (I keep not thinking about this, as I'm used to
>> the nicely sane VMS logical system :)
>
>Brag :-)

:-P

>(in case someone is wondering, the VMS "logicals" nicely solve this
>problem, basically by each piece of software being installed into and
>used/accessed through a "super environment variable"-- so basically Dan
>can't understand why us others are having these problems and talk of it
>as a new fancy thing :-)

Oh, and have I mentioned they're group and system wide, persistent,
group-protected, and leveled by protection, so they're actually safe
to trust? (So if you look for an entry in a system logical table you
can trust it, since someone needed compromise-the-world privs to set
it in the first place so you've got bigger things to worry about if
it's bad? :)

Not to, y'know, show off or anything. :)

Jeff Clites

Apr 15, 2004, 11:44:46 AM
to Nicholas Clark, perl6-i...@perl.org
On Apr 15, 2004, at 3:03 AM, Nicholas Clark wrote:

> But all this is from memory, and in turn for #! invocation one can
> always
> parse the #! line to work out where the interpreter was (mmm. race
> condition)

And a race isn't too bad here actually--even if we know the path
reliably, it's always possible to move or alter the resources which
we're trying to locate, at any time (before, during, or after launching
the process). So we need to treat them with as much skepticism as
anything else on the file system.

And for parrot-the-executable we should offer a command-line parameter
to override the location. That would give people an escape hatch for
special situations (for instance, if you are going to chroot or
something).

JEff

Jeff Clites

Apr 15, 2004, 11:35:29 AM
to Dan Sugalski, perl6-i...@perl.org
On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:

> Sound sane? I can see splitting up the library base path into
> sections, but I'm not sure it's worth it. Now'd be the time to argue
> that, though :)

Makes sense to me to just store the path--keep it simple. As long as
we've stored it away, anything using it later can chop it up into
pieces itself if it wants to--anything we could have done in splitting
it up, the consumer can do too. The only thing we really have to do is
grab the info before it's too late--before something might have
chdir'd, and before argv is either inaccessible, or could have been
overwritten.

JEff

Jeff Clites

Apr 15, 2004, 11:54:37 AM
to Dan Sugalski, perl6-i...@perl.org
On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:

> At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>> On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>
>>> Sound sane? I can see splitting up the library base path into
>>> sections, but I'm not sure it's worth it. Now'd be the time to argue
>>> that, though :)
>>
>> Makes sense to me to just store the path--keep it simple.
>

> That's what I'm thinking, but I can see wanting to have separate paths
> for parrot's low-level libraries (basically the things we need for
> parrot to run in the first place) and higher-level libraries (modules
> installed off of CPAN and whatnot).

That's true. But as long as we grab the "here's where the executable
is", we can (later) build API on top of that if we want. For instance,
we could decide that core, low-level resources will be located relative
to that path, and one of those resources will undoubtedly be a config
file of some sort, and that config file could contain the path(s) to
look for higher-level stuff. As long as we've "rescued" and stored our
location, we've sort of bootstrapped that process.
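A sketch of that bootstrapping step, assuming a hypothetical install layout (`<exe dir>/../lib/parrot/config`) and a made-up `search_path=DIR` line format for the config file; neither is an existing Parrot convention.

```c
#include <stdio.h>
#include <string.h>

#define CORE_CONFIG_REL "../lib/parrot/config"   /* hypothetical layout */
#define SEARCH_DIR_MAX  256

/* Build "<exe_dir>/<rel>" into out.  Returns 0, or -1 if it won't fit. */
int core_resource_path(char *out, size_t out_size,
                       const char *exe_dir, const char *rel)
{
    int n = snprintf(out, out_size, "%s/%s", exe_dir, rel);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Collect up to max_dirs "search_path=DIR" entries (hypothetical format)
 * naming higher-level library directories.  Other lines are ignored.
 * Returns the number of directories stored in dirs. */
int read_search_paths(FILE *cfg, char dirs[][SEARCH_DIR_MAX], int max_dirs)
{
    static const char key[] = "search_path=";
    const size_t key_len = sizeof key - 1;
    char line[1024];
    int count = 0;

    while (count < max_dirs && fgets(line, sizeof line, cfg) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';          /* drop the newline */
        if (strncmp(line, key, key_len) == 0 && line[key_len] != '\0')
            snprintf(dirs[count++], SEARCH_DIR_MAX, "%s", line + key_len);
    }
    return count;
}
```

Pointing several installs' config files at the same `search_path` directory is what would let different parrot versions share one set of high-level modules.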

(And to loop back a bit, the nice thing about bootstrapping this stuff
based on our executable's location is that it makes it a no-brainer to
have multiple, relocatable installs of parrot. And people would even be
able to have 10 different versions of parrot sitting around, but have
them all configured to share the same high-level resources.)

JEff

Dan Sugalski

Apr 15, 2004, 11:41:10 AM
to Jeff Clites, perl6-i...@perl.org
At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>
>>Sound sane? I can see splitting up the library base path into
>>sections, but I'm not sure it's worth it. Now'd be the time to
>>argue that, though :)
>
>Makes sense to me to just store the path--keep it simple.

That's what I'm thinking, but I can see wanting to have separate
paths for parrot's low-level libraries (basically the things we need
for parrot to run in the first place) and higher-level libraries
(modules installed off of CPAN and whatnot). I'm firmly in the "Don't
care" camp here, so I figured I'd open it to discussion before
enshrining the result in the API. :)

Brent 'Dax' Royal-Gordon

Apr 15, 2004, 12:05:05 PM
to Perl 6 Internals
Dan Sugalski wrote:
> 1) We add a Parrot_set_library_base(char *lib_path, int length) function
> to set the base library path
> 2) We add a Parrot_get_base_library_path() function to the
> platform-specific interface so platforms can return the base path

Works for me...

> 3) Parrot itself (the main executable) has a static, global 1K buffer in
> it that starts and ends with some recognizable string (like, say,
> "***+++***START|" and "|END***+++***") so we can find it and overwrite
> the contents if the library gets moved, for use on platforms where the
> only way to put a path in is to stick it statically in the executable.

That's pretty disgusting, but I don't know that I have a better idea.

> #3, I should point out, will *only* be used on those platforms that
> don't have a better scheme, and only by the
> Parrot_get_base_library_path() function.

System registry on Windows? /etc file on Unixen?

Actually, one thing I'd like to see is if it wasn't the library's base
path hardcoded in, but the base path of a frozen data structure or
program that encoded Parrot's settings. That would allow it to carry
the runtime library path, the paths to ICU's tables, the paths to search
for PMCs, and whatever else we can think of, without a hardcoded limit.

> Sound sane? I can see splitting up the library base path into sections,
> but I'm not sure it's worth it. Now'd be the time to argue that, though :)

--
Brent "Dax" Royal-Gordon <br...@brentdax.com>
Perl and Parrot hacker

Oceania has always been at war with Eastasia.


Dan Sugalski

Apr 15, 2004, 12:41:09 PM
to Brent 'Dax' Royal-Gordon, Perl 6 Internals
At 9:05 AM -0700 4/15/04, Brent 'Dax' Royal-Gordon wrote:

>Dan Sugalski wrote:
>>3) Parrot itself (the main executable) has a static, global 1K
>>buffer in it that starts and ends with some recognizable string
>>(like, say, "***+++***START|" and "|END***+++***") so we can find
>>it and overwrite the contents if the library gets moved, for use on
>>platforms where the only way to put a path in is to stick it
>>statically in the executable.
>
>That's pretty disgusting, but I don't know that I have a better idea.

There isn't one, alas, at least for some people.

>>#3, I should point out, will *only* be used on those platforms that
>>don't have a better scheme, and only by the
>>Parrot_get_base_library_path() function.
>
>System registry on Windows? /etc file on Unixen?

That's global. Bad idea, it messes up multiple installs of the same
version, or similar-enough versions that they're indistinguishable.

>Actually, one thing I'd like to see is if it wasn't the library's base
>path hardcoded in, but the base path of a frozen data structure or
>program that encoded Parrot's settings. That would allow it to carry
>the runtime library path, the paths to ICU's tables, the paths to search
>for PMCs, and whatever else we can think of, without a hardcoded limit.

This wouldn't be a bad thing, nope. I could see security issues--it'd
probably be better to link the config file right into parrot.

Dan Sugalski

Apr 15, 2004, 12:36:53 PM
to Jeff Clites, perl6-i...@perl.org
At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
>On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:
>
>>At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>>>On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>>
>>>>Sound sane? I can see splitting up the library base path into
>>>>sections, but I'm not sure it's worth it. Now'd be the time to
>>>>argue that, though :)
>>>
>>>Makes sense to me to just store the path--keep it simple.
>>
>>That's what I'm thinking, but I can see wanting to have separate
>>paths for parrot's low-level libraries (basically the things we
>>need for parrot to run in the first place) and higher-level
>>libraries (modules installed off of CPAN and whatnot).
>
>That's true. But as long as we grab the "here's where the executable
>is", we can (later) build API on top of that if we want.

Well, yeah, but... where the executable is ought, honestly, to be
irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that
I'll have parrot's library files hanging off of /usr/bin. And if I've
got a few hundred machines with parrot's library NFS mounted in
different places (to match conflicting vendor standards and other
whackjob breakage which is endemic in, well, the world) it really
falls down. :) Add to that you can't always figure out where Parrot
really is both because of chroot behaviour and some odd "where am I
really" problems with suid scripts in some places.

There are a couple of folks who could make your brain melt and flow
out your ears with all this stuff too.

Having the executable path as an optional way to get the info's not
necessarily a bad thing, but I think it's safe to say that it's not
The Right Thing. (If there even is one)

If nothing else this has convinced me we need a way to specify site
policy at build time for all this nonsense^Wfun. :)

Brent 'Dax' Royal-Gordon

Apr 15, 2004, 12:59:47 PM
to Dan Sugalski, Perl 6 Internals
Dan Sugalski wrote:
>>> #3, I should point out, will *only* be used on those platforms that
>>> don't have a better scheme, and only by the
>>> Parrot_get_base_library_path() function.
>>
>> System registry on Windows? /etc file on Unixen?
>
> That's global. Bad idea, it messes up multiple installs of the same
> version, or similar-enough versions that they're indistinguishable.

Good point.

>> Actually, one thing I'd like to see is if it wasn't the library's base
>> path hardcoded in, but the base path of a frozen data structure or
>> program that encoded Parrot's settings. That would allow it to carry
>> the runtime library path, the paths to ICU's tables, the paths to search
>> for PMCs, and whatever else we can think of, without a hardcoded limit.
>
> This wouldn't be a bad thing, nope. I could see security issues--it'd
> probably be better to link the config file right into parrot.

Install it with root ownership and 644 permissions, in a directory with
similar settings. (Or the system's equivalent, of course.) Then put
big blinking security warnings wherever the documentation talks about
editing that file. We can't protect sysadmins from their own idiocy.
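A sketch of refusing to read a config file that is not locked down in the way suggested, assuming a POSIX system. `open_trusted_config()` is a hypothetical helper; it checks the already-opened stream with `fstat()` instead of calling `stat()` first, so the file cannot be swapped between the check and the read. The matching check on each parent directory is left out.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Open path for reading only if it is a regular file owned by
 * trusted_uid (normally 0, i.e. root) and not writable by group or
 * others.  Returns the open stream, or NULL (with nothing left open) if
 * the file is missing or fails the check. */
FILE *open_trusted_config(const char *path, uid_t trusted_uid)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    struct stat st;
    if (fstat(fileno(f), &st) != 0
        || !S_ISREG(st.st_mode)
        || st.st_uid != trusted_uid
        || (st.st_mode & (S_IWGRP | S_IWOTH)) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}
```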

Jeff Clites

Apr 15, 2004, 1:23:06 PM
to Dan Sugalski, perl6-i...@perl.org
On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote:

> At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
>> On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:
>>
>>> At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>>>> On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>>>
>>>>> Sound sane? I can see splitting up the library base path into
>>>>> sections, but I'm not sure it's worth it. Now'd be the time to
>>>>> argue that, though :)
>>>>
>>>> Makes sense to me to just store the path--keep it simple.
>>>
>>> That's what I'm thinking, but I can see wanting to have separate
>>> paths for parrot's low-level libraries (basically the things we need
>>> for parrot to run in the first place) and higher-level libraries
>>> (modules installed off of CPAN and whatnot).
>>
>> That's true. But as long as we grab the "here's where the executable
>> is", we can (later) build API on top of that if we want.
>
> Well, yeah, but... where the executable is ought, honestly, to be
> irrelevant.

Yes, in a sense it's irrelevant, but it's the only thing that's 1:1
with a particular "copy" of parrot. It's the only thing (that I can
think of) which continues to work if you move your distro around, and
which naturally avoids problems with having multiple copies, and lets
things work even if you don't "install".

> If I've stuck Parrot in /usr/bin it seems unlikely that I'll have
> parrot's library files hanging off of /usr/bin.

Right, so you do what Mac OS X does with the java executable--you put a
symlink in /usr/bin, pointing to the real location. And your "path to
the executable" has to call realpath() or the equivalent to resolve
such symlinks (which you need to do in order for path logic to
do-the-right-thing).
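A sketch of resolving such a symlink with `realpath()` before taking the directory, assuming POSIX; `real_exe_dir()` is a hypothetical name. Given a `/usr/bin/parrot` symlink pointing at `/opt/parrot/bin/parrot`, it would return `/opt/parrot/bin`.

```c
#define _XOPEN_SOURCE 700
#include <libgen.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the directory of the real file behind
 * invoked_path, with every symlink, "." and ".." resolved.  Returns NULL
 * if the path cannot be resolved (e.g. it does not exist). */
char *real_exe_dir(const char *invoked_path)
{
    char resolved[PATH_MAX];
    if (realpath(invoked_path, resolved) == NULL)
        return NULL;
    /* dirname() may modify resolved, which is our own buffer. */
    return strdup(dirname(resolved));
}
```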

> And if I've got a few hundred machines with parrot's library NFS
> mounted in different places (to match conflicting vendor standards and
> other whackjob breakage which is endemic in, well, the world) it
> really falls down. :)

I'm not sure I get your meaning here. By "executable", I mean
standalone-parrot, not libparrot, of course. If you mean that libparrot
might end up in 100 different places, then you'll not end up with the
dynamic linker finding things properly, so you'll have a bigger problem
to solve. If you mean that standalone-parrot could end up in 100
different places, then you're going to have 100 different ways you need
to set up $PATH just to launch it, but once it's executing you'd still
be fine. Or each host will have its own separate symlink in /usr/bin to
the right location for that host, and everything will just be fine.

> Add to that you can't always figure out where Parrot really is both
> because of chroot behaviour and some odd "where am I really" problems
> with suid scripts in some places.

With chroot, frankly, you have the same problem with DLLs, and you end
up needing to have all of your necessary external resources located in
your chroot-dir so that their paths after the chroot match their paths
before. So that was a bad example on my part, really. (And, if you are
chroot-ing from within a parrot script, you're in a place where you'd
want to re-point your config dir path to match.)

But with interpreter files we could have the problem that the kernel
hides the info from us. But for bytecode files, if they're launched
like java apps are launched, with "parrot foo", then that problem
wouldn't come up.

> Having the executable path as an optional way to get the info's not
> necessarily a bad thing, but I think it's safe to say that it's not
> The Right Thing. (If there even is one)

Yeah, I don't think there's a 100% solution, but it would be nice to
have something which works 95% of the time and is flexible/convenient,
in preference to something that works 96% of the time and is less
powerful.

I think a reasonable approach would be:

1) Always allow the config location to be overridden via a command-line
parameter, and changeable from bytecode. (That lets you be 100%
unambiguous, at the cost of needing to execute parrot in a particular
way. And it's convenient for testing against a whole bunch of different
sets of configs without rebuilding.)

2a) On platforms which support it, auto-find the executable, and base
the config path on that.

2b) On platforms which don't support that (and even, as a compile-time
option for those which support it), have a compiled-in path to use.

This basically matches the API you mentioned before, and boils down to
what gets passed to Parrot_set_library_base() (or, call it
Parrot_set_configuration_base maybe) at launch time--it gets passed
either an explicitly supplied value, an inferred value, or a
compiled-in value).
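The launch-time choice in steps 1, 2a and 2b can be sketched as a small precedence function. `choose_config_base()`, the `base_source` enum, and the `prefer_compiled_in` switch (standing in for the compile-time option in 2b) are all hypothetical.

```c
#include <stddef.h>

/* Where the chosen base came from, useful for diagnostics. */
enum base_source { BASE_NONE, BASE_COMMAND_LINE, BASE_INFERRED, BASE_COMPILED_IN };

/* Pick the value to pass to Parrot_set_library_base() at launch:
 *   1)  an explicit value (command line or embedder) always wins;
 *   2a) otherwise the value inferred from the executable's location,
 *       unless the build prefers the compiled-in path;
 *   2b) otherwise the compiled-in path.
 * Any value may be NULL when that source is unavailable; source must
 * point to valid storage. */
const char *choose_config_base(const char *explicit_value,
                               const char *inferred_value,
                               const char *compiled_in_value,
                               int prefer_compiled_in,
                               enum base_source *source)
{
    if (explicit_value != NULL) {
        *source = BASE_COMMAND_LINE;
        return explicit_value;
    }
    if (inferred_value != NULL && !prefer_compiled_in) {
        *source = BASE_INFERRED;
        return inferred_value;
    }
    if (compiled_in_value != NULL) {
        *source = BASE_COMPILED_IN;
        return compiled_in_value;
    }
    if (inferred_value != NULL) {           /* preferred path missing */
        *source = BASE_INFERRED;
        return inferred_value;
    }
    *source = BASE_NONE;
    return NULL;
}
```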

JEff

Jarkko Hietaniemi

Apr 15, 2004, 1:19:14 PM
to Dan Sugalski, Jeff Clites, perl6-i...@perl.org
> Well, yeah, but... where the executable is ought, honestly, to be
> irrelevant. If I've stuck Parrot in /usr/bin it seems unlikely that
> I'll have parrot's library files hanging off of /usr/bin.

Bah. BAH, I say. The /usr/bin/parrot is of course a symlink
to, say, /platform/os/version/parrot/version/bin/parrot, and we
parse the real path, not the symlink.

> And if I've got a few hundred machines with parrot's library NFS mounted in
> different places (to match conflicting vendor standards and other
> whackjob breakage which is endemic in, well, the world) it really
> falls down. :) Add to that you can't always figure out where Parrot
> really is both because of chroot behaviour and some odd "where am I
> really" problems with suid scripts in some places.
>
> There are a couple of folks who could make your brain melt and flow
> out your ears with all this stuff too.

Yes, I was once one of those people :-)

Leopold Toetsch

Apr 15, 2004, 12:29:56 PM
to Brent 'Dax' Royal-Gordon, perl6-i...@perl.org
Brent 'Dax' Royal-Gordon <br...@brentdax.com> wrote:

> Dan Sugalski wrote:
>> "***+++***START|" and "|END***+++***") so we can find it and overwrite

> That's pretty disgusting, but I don't know that I have a better idea.

Same scheme as with fingerprint.c?

leo

Jeff Clites

Apr 15, 2004, 1:30:40 PM
to Brent 'Dax' Royal-Gordon, Perl 6 Internals
On Apr 15, 2004, at 9:05 AM, Brent 'Dax' Royal-Gordon wrote:

> Dan Sugalski wrote:
>> 3) Parrot itself (the main executable) has a static, global 1K buffer
>> in it that starts and ends with some recognizable string (like, say,
>> "***+++***START|" and "|END***+++***") so we can find it and
>> overwrite the contents if the library gets moved, for use on
>> platforms where the only way to put a path in is to stick it
>> statically in the executable.
>
> That's pretty disgusting, but I don't know that I have a better idea.

It's yucky, but it matches what's done for dynamic libs, at least on
some platforms. (That is, at build-time a library gets its
path-where-I'll-be-installed compiled into it, and apps linked against
that lib copy that path into themselves, so that at runtime the dynamic
linker searches that location, in addition to standard locations, to
find the library. And there's then a tool which lets you modify your
library to change its built-in install location without recompiling.)
So at least there's precedent.
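A sketch of the relocation tool that the marker-bracketed buffer would need, working on an in-memory copy of the executable (reading and writing the file is left to the caller). `patch_embedded_path()` is hypothetical, and it assumes the layout where the path bytes begin right after the start marker's text and stop where the end marker begins.

```c
#include <stddef.h>
#include <string.h>

#define START_MARK "***+++***START|"
#define END_MARK   "|END***+++***"

/* Find the first START_MARK ... END_MARK region in image[0..size) and
 * write new_path into the bytes between them, NUL-padding the rest so no
 * trace of the old path remains.  Returns 0, or -1 if the markers are
 * missing or the path (plus its NUL) does not fit. */
int patch_embedded_path(unsigned char *image, size_t size, const char *new_path)
{
    const size_t start_len = strlen(START_MARK);
    const size_t end_len = strlen(END_MARK);

    for (size_t i = 0; i + start_len <= size; i++) {
        if (memcmp(image + i, START_MARK, start_len) != 0)
            continue;
        size_t data = i + start_len;          /* first path byte */
        for (size_t j = data; j + end_len <= size; j++) {
            if (memcmp(image + j, END_MARK, end_len) != 0)
                continue;
            size_t room = j - data;
            size_t len = strlen(new_path);
            if (len >= room)                  /* need room for the NUL */
                return -1;
            memcpy(image + data, new_path, len);
            memset(image + data + len, 0, room - len);
            return 0;
        }
        return -1;                            /* start marker, no end */
    }
    return -1;
}
```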

>> #3, I should point out, will *only* be used on those platforms that
>> don't have a better scheme, and only by the
>> Parrot_get_base_library_path() function.
>
> System registry on Windows? /etc file on Unixen?
>
> Actually, one thing I'd like to see is if it wasn't the library's base
> path hardcoded in, but the base path of a frozen data structure or
> program that encoded Parrot's settings. That would allow it to carry
> the runtime library path, the paths to ICU's tables, the paths to
> search
> for PMCs, and whatever else we can think of, without a hardcoded limit.

The idea (for me, at least) was to specify a directory, and the config
file could be a conventional name relative to that--that lets you
locate multiple resources without having to read the config file in
order to find them. And semantically, I think of it not as the
executable's path--that just happens to be something that's 1:1 with a
particular copy of parrot. And definitely not libparrot's
path--embedded cases would have to specify the path explicitly, though
they could partially mimic the same scheme.

JEff

Dan Sugalski

Apr 15, 2004, 1:48:17 PM
to Jeff Clites, perl6-i...@perl.org
At 10:23 AM -0700 4/15/04, Jeff Clites wrote:
>On Apr 15, 2004, at 9:36 AM, Dan Sugalski wrote:
>
>>At 8:54 AM -0700 4/15/04, Jeff Clites wrote:
>>>On Apr 15, 2004, at 8:41 AM, Dan Sugalski wrote:
>>>
>>>>At 8:35 AM -0700 4/15/04, Jeff Clites wrote:
>>>>>On Apr 15, 2004, at 7:24 AM, Dan Sugalski wrote:
>>>>>
>>>>>>Sound sane? I can see splitting up the library base path into
>>>>>>sections, but I'm not sure it's worth it. Now'd be the time to
>>>>>>argue that, though :)
>>>>>
>>>>>Makes sense to me to just store the path--keep it simple.
>>>>
>>>>That's what I'm thinking, but I can see wanting to have separate
>>>>paths for parrot's low-level libraries (basically the things we
>>>>need for parrot to run in the first place) and higher-level
>>>>libraries (modules installed off of CPAN and whatnot).
>>>
>>>That's true. But as long as we grab the "here's where the
>>>executable is", we can (later) build API on top of that if we want.
>>
>>Well, yeah, but... where the executable is ought, honestly, to be irrelevant.
>
>Yes, in a sense it's irrelevant, but it's the only thing that's 1:1
>with a particular "copy" of parrot. It's the only thing (that I can
>think of) which continues to work if you move your distro around,
>and which naturally avoids problems with having multiple copies, and
>lets things work even if you don't "install".

At this point I can say I don't honestly care all that much, and most
of my worries are based on vague feelings that there are platforms
out there where finding the actual executable name is somewhere
between hard and impossible. I will, then, do the sensible thing and
just punt on this--we can work out a best practices thing and
enshrine it as the default on systems which can support it and be
done with it.

The other question, then, is do we see the need for multiple
categories of library which would want separately settable library
paths? (Don't, here, forget the potential needs of embedders such as
Apache) Once we get that thumped out I'll make the API additions.

Jeff Clites

Apr 15, 2004, 1:42:49 PM
to Dan Sugalski, Brent 'Dax' Royal-Gordon, Perl 6 Internals
On Apr 15, 2004, at 9:41 AM, Dan Sugalski wrote:

>> Actually, one thing I'd like to see is if it wasn't the library's base
>> path hardcoded in, but the base path of a frozen data structure or
>> program that encoded Parrot's settings. That would allow it to carry
>> the runtime library path, the paths to ICU's tables, the paths to
>> search
>> for PMCs, and whatever else we can think of, without a hardcoded
>> limit.
>
> This wouldn't be a bad thing, nope. I could see security issues--it'd
> probably be better to link the config file right into parrot.

There'll be the same security issue with anything located on the
filesystem--the config is not particularly worse than anything else
(DLLs, etc.). The security of anything you run is only as good as the
integrity of the filesystem used to locate the resources.
(Specifically, if I were a hacker and could compromise your system by
replacing the config, I could just as easily replace parrot itself.) But it
would be nice to "bake in" things which you can't really change without
rebuilding anyway--things like UINTVAL size, etc. Monkeying with them
after-the-fact would be a definite security risk (buffer overruns,
etc.), and wouldn't ever be useful. But stuff like finding ICU's data
files (or add-on libraries) we'd want to be easily changeable without a
rebuild. (And again, if you have to rebuild to change them, then people
will tend to keep around the tools needed to do that, which would give
a hacker the tools they need to do the same.) But we certainly need to
define/articulate a security model, no matter what approach we take.
(But my gut reaction is always against something which decreases
flexibility, and only _seems_ to increase security.)

But there are of course security issues with anything located relative
to the cwd(). (That is, if resources are located relative to the cwd,
then I can trick you into loading my copies just by getting you to
chdir into my home directory.)

JEff

Jeff Clites

Apr 15, 2004, 1:54:45 PM
to Brent 'Dax' Royal-Gordon, Perl 6 Internals
On Apr 15, 2004, at 10:30 AM, Jeff Clites wrote:

> And semantically, I think of it not as the executable's path--that
> just happens to be something that's 1:1 with a particular copy of
> parrot. And definitely not libparrot's path--embedded cases would have
> to specify the path explicitly, though they could partially mimic the
> same scheme.

I take that back--the path to the library might actually work just as
well (and may or may not be less ambiguous to find; the dynamic linker
had to find it, and may have left breadcrumbs). This is all, by the
way, exactly the NSBundle/CFBundle API from Mac OS X (and before that,
OpenStep). See:
<http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSBundle.html>.
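The "breadcrumbs" left by the dynamic linker can be queried directly on systems that provide `dladdr()` (glibc and macOS both do; it is not part of POSIX). The sketch below asks the linker which loaded object contains a given address; called from inside libparrot with the address of one of libparrot's own functions, it would report libparrot's path. `object_path_for` is a hypothetical name, older glibc versions may need `-ldl` at link time, and the reported name is whatever the linker recorded, so it can be relative depending on how the object was loaded.

```c
#define _GNU_SOURCE   /* glibc declares dladdr() and Dl_info only with this */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Return the path of the loaded object (shared library or executable)
   that contains `addr`, as recorded by the dynamic linker, or NULL.
   The string is owned by the dynamic linker; copy it if you keep it. */
const char *object_path_for(const void *addr) {
    Dl_info info;
    assert(addr != NULL);
    /* dladdr() returns 0 when addr is not inside any loaded object. */
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL;
    return info.dli_fname;
}
```

Taking the directory part of that path would then give a per-copy base directory for libparrot, in the same spirit as NSBundle.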

JEff

Jeff Clites

Apr 15, 2004, 9:58:22 PM
to Dan Sugalski, perl6-i...@perl.org
On Apr 15, 2004, at 10:48 AM, Dan Sugalski wrote:

> At this point I can say I don't honestly care all that much, and most
> of my worries are based on vague feelings that there are platforms out
> there where finding the actual executable name is somewhere between
> hard and impossible. I will, then, do the sensible thing and just punt
> on this--we can work out a best practices thing and enshrine it as the
> default on systems which can support it and be done with it.

I think it's worth trying out--if it works out, we can build on it; if
it doesn't, we can rip it out/redo it. (And, the API could probably
stay the same.)

> The other question, then, is do we see the need for multiple
> categories of library which would want separately settable library
> paths? (Don't, here, forget the potential needs of embedders such as
> Apache) Once we get that thumped out I'll make the API additions.

We should probably start simple and build, but this would make sense to
me (API names are just suggestions):

Parrot_get_configuration_base_path() -- returns the automagically
determined path, unless the corresponding
Parrot_set_configuration_base_path() had been called to set it to
something else.

We could then have individual API to pick out specific resources based
on that, but instead, this would be cleaner/simpler:

Parrot_get_path_for_resource(STRING *resource_name) -- returns the
equivalent of Parrot_get_configuration_base_path()."/".resource_name,
unless you had called Parrot_set_path_for_resource(STRING
*resource_name, STRING *path) to set the path for this particular
resource to something else. Internally, this could special case certain
resources, if needed.

This setup lets us have a stable API, but over time add to the list of
things we would look up.

So (assuming for the moment a default layout similar to what we currently
have), in-core I can call
Parrot_get_path_for_resource("library/config.pimc") and
Parrot_get_path_for_resource("runtime/parrot/dynext") to locate these
resources, by default inside of the base dir. But if I want to have a
totally funky layout (in an embedding context, or just if I'm in a
weird mood), all I need to do is explicitly call the "set" method (from
setup code or from bytecode) to re-point where I find a particular
resource.

(So the logic for that could just be to do a hash lookup for any
explicitly set values, and fall back to simple concatenation if nothing
was in the hash.)

That would all be fairly simple, yet expandable.
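A minimal C sketch of that lookup logic, assuming plain C strings instead of Parrot `STRING *`, a small fixed table instead of a real hash, and caller-supplied output buffers. The four function names are the suggestions from the message above, not an existing Parrot API, and the default base path `"."` is only a placeholder for the automatically determined one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OVERRIDES 32
#define MAX_PATH_LEN  1024

/* Placeholder for the automagically determined base path. */
static char base_path[MAX_PATH_LEN] = ".";

/* Explicitly set resource locations (a stand-in for a hash). */
static struct {
    char name[MAX_PATH_LEN];
    char path[MAX_PATH_LEN];
} overrides[MAX_OVERRIDES];
static int n_overrides = 0;

const char *Parrot_get_configuration_base_path(void) {
    return base_path;
}

void Parrot_set_configuration_base_path(const char *path) {
    snprintf(base_path, sizeof base_path, "%s", path);
}

/* Record an explicit location for one resource, replacing any earlier
   one. Returns 0 on success, -1 if the table is full. */
int Parrot_set_path_for_resource(const char *resource_name, const char *path) {
    int i;
    for (i = 0; i < n_overrides; i++)
        if (strcmp(overrides[i].name, resource_name) == 0)
            break;
    if (i == n_overrides) {              /* not seen before: new slot */
        if (n_overrides == MAX_OVERRIDES)
            return -1;
        n_overrides++;
        snprintf(overrides[i].name, sizeof overrides[i].name, "%s", resource_name);
    }
    snprintf(overrides[i].path, sizeof overrides[i].path, "%s", path);
    return 0;
}

/* Explicit override first; otherwise base_path + "/" + resource_name.
   Writes the result into out and returns out. */
char *Parrot_get_path_for_resource(const char *resource_name, char *out, size_t outlen) {
    assert(resource_name != NULL && out != NULL && outlen > 0);
    for (int i = 0; i < n_overrides; i++) {
        if (strcmp(overrides[i].name, resource_name) == 0) {
            snprintf(out, outlen, "%s", overrides[i].path);
            return out;
        }
    }
    snprintf(out, outlen, "%s/%s", base_path, resource_name);
    return out;
}
```

With nothing overridden, asking for `"runtime/parrot/dynext"` yields `<base>/runtime/parrot/dynext`; an embedder that sets a path for one resource redirects only that resource and leaves every other lookup on the default layout.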

JEff

Steve Fink

Apr 17, 2004, 2:29:34 AM
to Jeff Clites, Jarkko Hietaniemi, perl6-i...@perl.org
On Apr-14, Jeff Clites wrote:
> For Unix platforms at least, you should be able to do this:
>
> executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0)

Nope.

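use Cwd qw(getcwd);
use File::Basename qw(dirname);
use File::Spec;

sub executablePath {
    return dirname($0) if File::Spec->file_name_is_absolute($0);
    return File::Spec->catdir(getcwd(), dirname($0)) if $0 =~ m{/};
    foreach my $dir (File::Spec->path) {    # the entries of $ENV{PATH}
        return $dir if -x File::Spec->catfile($dir, $0);
    }
    return "bastard process";
}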

which is why on Linux I give up on portability and say:

return readlink("/proc/self/exe");

(ok, to match that'd need to be dirname(readlink(...)))
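The same fallback order (the Linux /proc shortcut, then argv[0], then a $PATH search) can be sketched in C for a POSIX system. `executable_dir` and `strip_to_dirname` are hypothetical names. Because a relative argv[0] is resolved against the current directory, this must run before the program calls chdir(). realpath() resolves symlinks, so a launch through a symlink reports the real binary's directory, which is one possible answer to the symlink case but not the only reasonable one. strtok() skips empty $PATH entries (which the shell treats as the cwd) and is not thread-safe.

```c
#define _POSIX_C_SOURCE 200809L  /* readlink(), realpath(), strdup() */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Cut an absolute path back to its directory, in place. */
static void strip_to_dirname(char *path) {
    char *slash = strrchr(path, '/');
    if (slash == path)
        slash[1] = '\0';           /* "/parrot" -> "/" */
    else if (slash != NULL)
        *slash = '\0';             /* "/usr/bin/parrot" -> "/usr/bin" */
}

/* Write the running executable's directory into out (PATH_MAX bytes).
   Returns 0 on success, -1 if no reliable answer was found. */
int executable_dir(const char *argv0, char *out) {
    assert(argv0 != NULL && out != NULL);
#ifdef __linux__
    /* Linux: ask the kernel. readlink() does not NUL-terminate. */
    ssize_t n = readlink("/proc/self/exe", out, PATH_MAX - 1);
    if (n > 0 && n < PATH_MAX - 1) {
        out[n] = '\0';
        strip_to_dirname(out);
        return 0;
    }
#endif
    /* argv[0] contains a slash: absolute, or relative to the cwd. */
    if (strchr(argv0, '/') != NULL) {
        if (realpath(argv0, out) == NULL)
            return -1;
        strip_to_dirname(out);
        return 0;
    }
    /* Bare name: repeat the shell's $PATH search. */
    const char *path_env = getenv("PATH");
    char *copy = path_env ? strdup(path_env) : NULL;
    if (copy == NULL)
        return -1;
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        char candidate[PATH_MAX];
        int len = snprintf(candidate, sizeof candidate, "%s/%s", dir, argv0);
        if (len < 0 || len >= (int) sizeof candidate)
            continue;
        if (access(candidate, X_OK) == 0 && realpath(candidate, out) != NULL) {
            free(copy);
            strip_to_dirname(out);
            return 0;
        }
    }
    free(copy);
    return -1;  /* the "bastard process" case */
}
```

One Linux detail worth knowing: if the binary has been unlinked since exec, the kernel reports the /proc/self/exe target with " (deleted)" appended, so the directory part may no longer exist.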

Gordon Henriksen

Apr 17, 2004, 2:17:23 PM
to Jeff Clites, perl6-i...@perl.org
On Thursday, April 15, 2004, at 02:25 , Jeff Clites wrote:

> For Unix platforms at least, you should be able to do this:
>
> executablePath = isAbsolute($0) ? dirname($0) : cwd().dirname($0)

That absolutely does not work, as already pointed out. This looks like a
reasonable reference implementation (LGPL), though:

http://www.opensource.apple.com/darwinsource/10.3/libiconv-9/libiconv/srclib/progreloc.c

On Windows and Linux, it uses Win32 and /proc to provide a robust
implementation. Otherwise, it guesses by looking at $0 and $ENV{PATH}.

My guess is that there's a more reliable (and non-portable) way to do
this on Mac OS X, since Carbon applications need to reliably open the
resource fork of the executable.

Gordon Henriksen
mali...@mac.com

Gordon Henriksen

Apr 17, 2004, 2:49:52 PM
to Jeff Clites, perl6-i...@perl.org

Ah! Indeed there is.

http://developer.apple.com/documentation/Carbon/Reference/Process_Manager/index.html

And, indeed, it is witheringly non-portable.

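#include <Carbon/Carbon.h>  /* Process Manager and CoreFoundation */
#include <stdlib.h>

ProcessSerialNumber psn = { 0, kCurrentProcess };  /* "this process" */
CFDictionaryRef dict = ProcessInformationCopyDictionary(
    &psn, kProcessDictionaryIncludeAllInformationMask);
/* Get rule: cfPath belongs to dict, so copy it out before releasing dict. */
CFStringRef cfPath = (CFStringRef) CFDictionaryGetValue(dict,
    kIOBundleExecutableKey);
/* Takes a length in UTF-16 units, not the string itself. */
CFIndex length = CFStringGetMaximumSizeForEncoding(
    CFStringGetLength(cfPath), kCFStringEncodingUTF8);
char *path = (char *) malloc(length + 1);
CFStringGetCString(cfPath, path, length + 1, kCFStringEncodingUTF8);
CFRelease(dict);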

Ahem.

I'm sure the ProcessInformationCopyDictionary API is implemented in
terms of something sane at the Darwin level, but God only knows what it
is.

Gordon Henriksen
mali...@mac.com

Gordon Henriksen

Apr 17, 2004, 10:35:31 PM
to Dan Sugalski, Jeff Clites, perl6-i...@perl.org
On Thursday, April 15, 2004, at 01:48 , Dan Sugalski wrote:

> At this point I can say I don't honestly care all that much, and most
> of my worries are based on vague feelings that there are platforms out
> there where finding the actual executable name is somewhere between
> hard and impossible. I will, then, do the sensible thing and just punt
> on this--we can work out a best practices thing and enshrine it as the
> default on systems which can support it and be done with it.
>
> The other question, then, is do we see the need for multiple categories
> of library which would want separately settable library paths?

Wouldn't it be sensible to build something robust enough to also solve
the problems of finding parrot user libraries and user resources? In
which case, a static search path is decidedly retro. It would hardly
make sense to not include, at the front of the search path, directories
relative to the PBC file trying to find its libraries or resources.*

For finding resources, one doesn't generally want to fall back to system
paths. Finding libraries is another matter.

Then there's the mention of using URLs to load resources (e.g., over
HTTP). Which seems sensible and forward-thinking to me.

Which suggests to me a linked list of resource resolvers. First one in
the chain to return a file handle to the data or PBC wins. The head of
parrot's own "system" chain would be available to be appended to any
other chains that wanted it.
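A chain of resolvers maps naturally onto a linked list of function pointers. In this sketch (all names hypothetical) each link tries to open the named resource and the first non-NULL `FILE *` wins. Only a directory-based resolver is shown; a URL-fetching resolver would just be another `try_open` function. Appending parrot's "system" chain to another chain amounts to pointing that chain's last `next` at the system chain's head.

```c
#include <assert.h>
#include <stdio.h>

typedef struct resolver resolver;
struct resolver {
    /* Try to open `name`; return NULL to pass to the next link. */
    FILE *(*try_open)(const resolver *self, const char *name);
    const char *base_dir;   /* used by the directory resolver below */
    const resolver *next;   /* NULL terminates the chain */
};

/* Directory resolver: try base_dir/name. */
static FILE *open_in_dir(const resolver *self, const char *name) {
    char path[4096];
    int len = snprintf(path, sizeof path, "%s/%s", self->base_dir, name);
    if (len < 0 || len >= (int) sizeof path)
        return NULL;
    return fopen(path, "rb");
}

/* Walk the chain; the first resolver that returns a handle wins. */
FILE *resolve(const resolver *chain, const char *name) {
    assert(name != NULL);
    for (const resolver *r = chain; r != NULL; r = r->next) {
        FILE *fp = r->try_open(r, name);
        if (fp != NULL)
            return fp;
    }
    return NULL;
}
```

For example, a PBC-relative resolver whose `next` is the system chain's head searches the PBC's own directory first and only then falls back to the system locations.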

Gordon Henriksen
mali...@mac.com

(* - The directory containing every loaded PBC file is not at all
important; consider an application like Apache+mod_parrot which is
loading multiple independent PBC files. It would be useful to allow the
administrator to install both the production PBC and a
development release of the same application on the same web server [just
at different paths], with confidence that mod_parrot won't get the two
confused. [IIS can do this. It's very cool.])

Gordon Henriksen

Apr 17, 2004, 9:18:16 PM
to Perl 6 Internals
Dan Sugalski wrote:

> Brent 'Dax' Royal-Gordon wrote:
>
>> Dan Sugalski wrote:
>>
>>> 3) Parrot itself (the main executable) has a static, global 1K buffer
>>> in it that starts and ends with some recognizable string (like, say,
>>> "***+++***START|" and "|END***+++***") so we can find it and
>>> overwrite the contents if the library gets moved, for use on
>>> platforms where the only way to put a path in is to stick it
>>> statically in the executable.
>>
>> That's pretty disgusting, but I don't know that I have a better idea.
>
> There isn't one, alas, at least for some people.

Everyone running tripwire, et al. (or simply md5sum'ing files to verify
integrity) will just love this strategy to death.


Finding resource and library files relative to the binary really is a
very good strategy. Windows is adopting the placed-near-the-binary
strategy for locating resources and libraries. It has completely
eliminated "DLL hell" for .NET programs. Mac OS 7 through X have all
used the same strategy. They have never had major problems with library
or resource location. Looks like a strong precedent and a proven
technique.

Of course, one can find pathological cases—especially on Unix, which
seems designed to thwart this sort of easy-to-administer technology:

• parrot binary unlink'd between exec and main(). (Can't happen on
Windows.)
• Launched through a symlink to the binary.
• Launched through a hard link to the binary.
• bin/ is a symlink, so ../share won't work.
• Platform can't find the binary. (Can't happen on Windows, Linux,
or Mac OS X.)
• chroot (which, in general, near-the-binary solves rather than
complicates).

But I'd say these are all heavily outweighed by the advantages. And,
in any case, it's a trivial matter at this point in design to offer
support for replacing a call to Parrot_get_bin_path() (or whatever) with
"/usr/local/bin" at configure time. That resolves all of the above. With
a loss of functionality, true, but users on platforms which can't
support this feature won't, after all, expect /opt/parrot to keep working
after it has been mv'd.


As for the security concerns of trusting anything but one's current
binary*, parrot could adopt a cryptographic solution for verifying
integrity of resource files, if anybody's really all that worried about
an errant Unicode character database.

Gordon Henriksen
mali...@mac.com

* - Is the binary itself really all that trustworthy in the first
place? If a user is executing a program through an untrusted or
compromised path, they're already putting their life in their hands, and
accessing ${bin}/../share won't make the configuration any more
trustworthy.

Gordon Henriksen

Apr 17, 2004, 10:40:02 PM
to Dan Sugalski, Jeff Clites, perl6-i...@perl.org
On Saturday, April 17, 2004, at 10:35 , Gordon Henriksen wrote:

> Which suggests to me a linked list of resource resolvers. First one in
> the chain to return a file handle to the data or PBC wins. The head of
> parrot's own "system" chain would be available to be appended to any
> other chains that wanted it.

And the more I mull this over, the more I really come up with maybe 4
slots in the search chain which are logically important. The order is up
for debate, but they all need to be in there (whenever they apply, that
is).

1. Paths relative to the PBC binary which is searching for a library.
2. Paths relative to the embedding application.
3. Paths relative to parrot itself (be that libparrot.shlib or parrot).
4. Paths to "system" libraries as specified by the administrator.

When searching for resources, only #1 should be used. Here are some
examples:


PBC File: (whatever)
Host app: /usr/local/bin/parrot
Parrot: /usr/local/lib/libparrot.shlib
Consider searches for:
icu.dat
Search path:
1. /usr/local/shared # Relative to executable

PBC File: D:\inetpub\wwwroot\example.pbchtml
Host app: C:\Apache\libexec\httpd.exe
Parrot: C:\Parrot\lib
Consider searches for:
icu.dat
mod_parrot.pbc
My::WWWUtil.pbc
Time::HiRes.pbc
Search path:
D:\inetpub\wwwroot{,\lib,\..\lib} # Relative to PBC
C:\Apache\libexec{,\lib,\..\lib} # Relative to host app
C:\CPAN\lib # System libraries
C:\Parrot\lib # Relative to parrot

PBC File: ./bin/fib
Host app: /home/me/bin/parrot
Parrot: /home/me/bin/parrot
Consider searches for:
icu.dat
Time::HiRes.pbc
fib.parrot_resource_file
One possible search path:
./bin{,/lib,/../lib} # Relative to PBC
/usr/local/lib # System libraries
/home/me/lib # Relative to parrot

The scenario which gives me a little bit of heartburn is one like this,
though:

Consider, say, an e-commerce site package. Call it OneStep::ECS. Runs
under mod_parrot in Apache. Has hooks to load plugins:

• Third-party plugins to provide connectivity to payment processing
engines (call it OneStep::VeriSignPayflow.pbc).
• First-party plugins allowing the customer to integrate his
storefront with his database (call it
MySite::OneStepECSCustomizations.pbc).

Now consider searches for VeriSign::PayflowPro.pbc, PayFlowPro.dll,
MySite::CRM.pbc, MySite::Reporting.pbc, mysite_logo.png,
Time::HiRes.pbc, libparrot.pbc, CGI.pbc....

So maybe some libraries are "hosts" and need to be included in the
search paths of libraries which are linked to them. One could even look
at libparrot that way, in which case the search path model becomes:

Paths relative to this PBC file.
Paths relative to its hosts.
Paths relative to its hosts' hosts.
Paths relative to its hosts' hosts' hosts.
...
Paths configured by the system administrator.
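That host-walking order can be sketched as a walk up a parent chain, assuming (hypothetically) that each loaded unit records its own directory and the unit that hosts it. The sketch leaves out the `{,/lib,/../lib}` suffixes and any de-duplication from the examples above; it only produces the order in which base directories would be tried.

```c
#include <assert.h>
#include <stddef.h>

/* A loaded unit: a PBC file, an embedding application, or libparrot. */
typedef struct unit {
    const char *dir;          /* directory the unit was loaded from */
    const struct unit *host;  /* NULL for the outermost host */
} unit;

/* Fill out[] with: the unit's own directory, each host's directory
   walking outward, then the administrator-configured paths.
   Returns the number of entries written (at most max_out). */
size_t build_search_path(const unit *u, const char *const *admin_paths,
                         size_t n_admin, const char **out, size_t max_out) {
    size_t n = 0;
    assert(u != NULL && out != NULL);
    for (const unit *h = u; h != NULL && n < max_out; h = h->host)
        out[n++] = h->dir;
    for (size_t i = 0; i < n_admin && n < max_out; i++)
        out[n++] = admin_paths[i];
    return n;
}
```

In the e-commerce scenario, a customization plugin hosted by OneStep::ECS, itself hosted by mod_parrot and ultimately libparrot, would search its own directory, then ECS's, then Apache's, then libparrot's, and finally the administrator's paths.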

Gordon Henriksen
mali...@mac.com

Jeff Clites

Apr 19, 2004, 5:56:02 AM
to Gordon Henriksen, Perl 6 Internals
On Apr 17, 2004, at 6:18 PM, Gordon Henriksen wrote:

> Dan Sugalski wrote:
>
>> Brent 'Dax' Royal-Gordon wrote:
>>
>>> Dan Sugalski wrote:
>>>
>>>> 3) Parrot itself (the main executable) has a static, global 1K
>>>> buffer in it that starts and ends with some recognizable string
>>>> (like, say, "***+++***START|" and "|END***+++***") so we can find
>>>> it and overwrite the contents if the library gets moved, for use on
>>>> platforms where the only way to put a path in is to stick it
>>>> statically in the executable.
>>>
>>> That's pretty disgusting, but I don't know that I have a better idea.
>>
>> There isn't one, alas, at least for some people.
>
> Everyone running tripwire, et al. (or simply md5sum'ing files to
> verify integrity) will just love this strategy to death.

It would be no different than what would happen if you had to rebuild
to change the built-in path. This just has the advantage of not
requiring a compiler or the source code.

> Of course, one can find pathological cases—especially on Unix, which
> seems designed to thwart this sort of easy-to-administer technology:
>
> • parrot binary unlink'd between exec and main(). (Can't happen on
> Windows.)
> • Launched through a symlink to the binary.
> • Launched through a hard link to the binary.
> • bin/ is a symlink, so ../share won't work.
> • Platform can't find the binary. (Can't happen on Windows, Linux, or
> Mac OS X.)
> • chroot (which, in general, near-the-binary solves rather than
> complicates).

Pick any strategy, and there will be an opportunity to thwart it.
Launch the binary, and then force-unmount the filesystem containing it
and all of its resources. That would thwart any strategy with external
resources. The point here (which you're probably agreeing with) is to
provide a solution that gives people flexibility they can use, not a
solution that will work if they are actively trying to trip it up.

> As for the security concerns of trusting anything but one's current
> binary*, parrot could adopt a cryptographic solution for verifying
> integrity of resource files, if anybody's really all that worried
> about an errant Unicode character database.

It's really no different than loading an external Perl module today, as
I see it.

> * - Is the binary itself is really all that trustworthy in the first
> place? If a user is executing a program through an untrusted or
> compromised path, they're already putting their life in their hands,
> and accessing ${bin}/../share won't make the configuration any more
> trustworthy.

Exactly. The contents of the filesystem are either secure, or they're
not.

JEff

Dan Sugalski

Apr 19, 2004, 10:09:26 AM
to Gordon Henriksen, Perl 6 Internals
At 9:18 PM -0400 4/17/04, Gordon Henriksen wrote:
>Dan Sugalski wrote:
>
>>Brent 'Dax' Royal-Gordon wrote:
>>
>>>Dan Sugalski wrote:
>>>
>>>>3) Parrot itself (the main executable) has a static, global 1K
>>>>buffer in it that starts and ends with some recognizable string
>>>>(like, say, "***+++***START|" and "|END***+++***") so we can find
>>>>it and overwrite the contents if the library gets moved, for use
>>>>on platforms where the only way to put a path in is to stick it
>>>>statically in the executable.
>>>
>>>That's pretty disgusting, but I don't know that I have a better idea.
>>
>>There isn't one, alas, at least for some people.
>
>Everyone running tripwire, et al. (or simply md5sum'ing files to
>verify integrity) will just love this strategy to death.

No, not really. This only gets done once, when the package is installed.
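A minimal sketch of the sentinel-buffer idea, assuming the marker strings from the example in this thread and a hypothetical default path. `parrot_path_buf`, `get_baked_in_path` and `patch_baked_in_path` are illustrative names only. The patch step runs on the buffer in memory here; an installer would apply the same rewrite once, at install time, to the matching bytes of the binary file. In a real build that step would live in a separate installer tool, so that the only copy of the start marker inside the parrot binary is the buffer itself (the runtime side below skips the marker by length for the same reason).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PATH_START "***+++***START|"
#define PATH_END   "|END***+++***"
#define PATH_BUF_SIZE 1024

/* The 1K buffer that lands in the binary's data segment. External
   linkage (no static, no const) keeps the compiler from folding its
   contents into code, so a later patch of the file is honored. */
char parrot_path_buf[PATH_BUF_SIZE] = PATH_START "/usr/local/parrot" PATH_END;

/* Runtime side: copy the path between the markers into out.
   Returns 0 on success, -1 if the end marker is missing or out is too small. */
int get_baked_in_path(const char *buf, char *out, size_t outlen) {
    assert(buf != NULL && out != NULL);
    const char *start = buf + (sizeof PATH_START - 1);  /* skip start marker */
    const char *end = strstr(start, PATH_END);
    if (end == NULL)
        return -1;
    size_t len = (size_t) (end - start);
    if (len + 1 > outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Install-time side: rewrite the whole fixed-size region in place, so the
   file size never changes. Returns -1 if the new path does not fit. */
int patch_baked_in_path(char *region, size_t regionlen, const char *newpath) {
    size_t need = strlen(PATH_START) + strlen(newpath) + strlen(PATH_END) + 1;
    if (need > regionlen)
        return -1;
    memset(region, 0, regionlen);
    strcpy(region, PATH_START);
    strcat(region, newpath);
    strcat(region, PATH_END);
    return 0;
}
```

The patch also illustrates the tripwire point: the binary's checksum changes exactly once, at install time, much as it would if the path had been compiled in by a rebuild.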

>Finding resource and library files relative to the binary really is
>a very good strategy.

I'm not saying it isn't, just that it's not possible on some systems.
Granted, fairly old ones, generally, but...
