finding data files in packages during development


Jon Zeppieri

Mar 4, 2016, 4:58:16 PM
to Racket Dev List
I'm working on a new version of the CLDR (localization data) packages, with the goal of reducing the amount of data that most installations will need.

The packages are broken up into:
cldr2: provides functions for accessing the data and resolving locale names
cldr2-data-core: provides the data from https://github.com/unicode-cldr/cldr-core
cldr2-data_<locale name>: one per locale, provides data from the rest of the unicode-cldr repos, but only for the locale in the package name

The cldr2 package depends on cldr2-data-core, but a user is free to install as many or as few locale-specific packages as desired.

The problem I've run into is how, at runtime, to find the data files I need in a way that works during development.

Each of the cldr2-data-* packages has a data archive at cldr2/data/json.zip from the package root directory. So, if someone wants data from the core package, we could do:

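```
(require pkg/lib)

;; Mutable hash so that pkg-directory can reuse its parse across calls.
(define PKG-CACHE (make-hash))
(define dir (pkg-directory "cldr2-data-core" #:cache PKG-CACHE))
;; Raise an exception if the package isn't installed.
(unless dir
  (error 'cldr2 "package cldr2-data-core is not installed"))
(define zip-path (build-path dir "cldr2" "data" "json.zip"))
;; ... open the archive and extract the relevant data ...
```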

And that's fine, except that it doesn't work in development when using raco link, because raco link manages collections, not packages. And I don't think that collection-file-path is useful here either, since all of these json.zip files have exactly the same collection-relative path.

What's the best way to handle this? Should I just give the zip files distinct names and use collection-file-path? Or is there a better way to handle this situation? (I'm a bit reluctant to use collection-file-path, since I think it searches the file system and so would be a bit expensive. pkg-directory needs to parse the package catalog, but it allows the results of that parse to be cached.)

-Jon




Robby Findler

Mar 4, 2016, 6:03:24 PM
to Jon Zeppieri, Racket Dev List
I think that files in collections is the way to go. You don't have to use collection-file-path, you could search the collections yourself, but I'm not sure that it would be faster. Are you seeing something specific that has bad performance?

Robby



Jon Zeppieri

Mar 4, 2016, 6:14:20 PM
to Robby Findler, Racket Dev List
On Fri, Mar 4, 2016 at 6:03 PM, Robby Findler <ro...@eecs.northwestern.edu> wrote:
> I think that files in collections is the way to go. You don't have to use collection-file-path, you could search the collections yourself, but I'm not sure that it would be faster. Are you seeing something specific that has bad performance?
>
> Robby

No, I haven't actually implemented this yet. It's just that I imagine it has to work something like this:
- find all of the possible directories that represent the "cldr2" collection
- look in each for the given path until you find it

If you had hundreds of locales installed on your machine, this would seem to be a pretty expensive way to look for a single file, especially since it can only be provided by a single package. I could default to looking for the package directory and then fall back on the collection search, but I'd still need to verify that both actually work. Tests would use different code paths in dev and in prod, which isn't great.

Or, I guess I could keep the collection search small by using different sub-collections for each locale (and a special one for the core data). I'll need to look at the implementation of collection-file-path to see if that would make a difference.
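
As a rough illustration of the sub-collection idea, here is a minimal sketch. It assumes a hypothetical layout in which each locale package installs its archive as cldr2/data/<locale>/json.zip; the helper name `locale-zip-path` is made up for this example and is not part of the cldr2 packages.

```racket
#lang racket/base

;; Hypothetical layout: cldr2/data/<locale>/json.zip, so the sub-collection
;; name alone identifies which package supplies the archive.
(define (locale-zip-path locale)
  (define p
    (collection-file-path "json.zip" "cldr2" "data" locale
                          ;; return #f instead of raising when the
                          ;; collection cannot be found
                          #:fail (λ (msg) #f)))
  ;; collection-file-path may return a path even when the file itself is
  ;; missing, so check for the file explicitly.
  (and p (file-exists? p) p))
```

A call such as `(locale-zip-path "fr")` would then produce either the path to that locale's archive or `#f` when the locale package is not installed.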

-Jon

Matthew Flatt

Mar 4, 2016, 6:19:22 PM
to Jon Zeppieri, Racket Dev List
Things that you want to access from a program should be based on
collections, not based on packages. In principle, packages don't exist at
all at run-time --- and they really don't exist at run-time for a
program bundled with `raco exe`.

A good way to register extensions via the collection layer is to use a
new "info.rkt" field. Each package can supply a "cldr2/data" collection
directory, with an "info.rkt" file defining a field that lists the
provided data files. Then you can use `find-relevant-directories` to
find all the relevant directories (i.e., all the "info.rkt" files that
define your new field).
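
A minimal sketch of that lookup, under stated assumptions: the field name `cldr2-data-files` and the helper `installed-data-files` are hypothetical, and each package's "cldr2/data" directory is assumed to contain an "info.rkt" like the one shown in the comment. Note that `find-relevant-directories` relies on information recorded by `raco setup`, so newly linked collections may need a setup pass before they show up.

```racket
#lang racket/base
;; Assumed "info.rkt" in each package's "cldr2/data" directory
;; (the field name is hypothetical):
;;
;;   #lang info
;;   (define cldr2-data-files '("json.zip"))

(require setup/getinfo)

;; Collects every data file registered through the field, resolving each
;; listed name relative to the directory of the "info.rkt" that lists it.
(define (installed-data-files)
  (for*/list ([dir (in-list (find-relevant-directories '(cldr2-data-files)))]
              [info (in-value (get-info/full dir))]
              #:when info
              [file (in-list (info 'cldr2-data-files (λ () '())))])
    (build-path dir file)))
```

Because the result is built from directories rather than collection paths, two packages can both list "json.zip" without the entries colliding.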

Another possibility is to use `copy-shared-files` in "info.rkt" to
instruct `raco setup` (and `raco pkg install`) to install files in the
"share" directory. In this case, I think the strategy that uses a new
"info.rkt" field is probably better.

One more piece of the puzzle: in the code that accesses the data files,
use `define-runtime-path-list` to build a list of all the currently
installed files. That way, `raco exe` will know to perform that
computation at build time and pull along the relevant files.
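
A hedged sketch of how that might look, assuming the same hypothetical "info.rkt" field name `cldr2-data-files`. As I read the `racket/runtime-path` documentation, the path expression is used both at run time and at compile time, which is why the bindings are also required `for-syntax` here.

```racket
#lang racket/base
(require racket/runtime-path
         setup/getinfo
         ;; the expression below is also used at phase 1, so its bindings
         ;; must be available for-syntax as well
         (for-syntax racket/base setup/getinfo))

;; `cldr2-data-files` is a hypothetical "info.rkt" field name.
;; The list holds absolute paths to every data file currently installed.
(define-runtime-path-list data-files
  (for*/list ([dir (in-list (find-relevant-directories '(cldr2-data-files)))]
              [info (in-value (get-info/full dir))]
              #:when info
              [file (in-list (info 'cldr2-data-files (λ () '())))])
    (build-path dir file)))
```

Code that needs a particular archive can then search `data-files` rather than consulting the package system at run time.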

Jon Zeppieri

Mar 4, 2016, 6:33:57 PM
to Matthew Flatt, Racket Dev List
This sounds like a good approach. Thanks, Matthew! -J

Jon Zeppieri

Mar 4, 2016, 6:51:45 PM
to Matthew Flatt, Racket Dev List
One question about this, Matthew: should I ensure that all of the data files have distinct collection-relative paths, instead of making them all cldr2/data/json.zip?

Jon Zeppieri

Mar 4, 2016, 7:35:16 PM
to Matthew Flatt, Racket Dev List
On Fri, Mar 4, 2016 at 6:51 PM, Jon Zeppieri <zepp...@gmail.com> wrote:
> One question about this, Matthew: should I ensure that all of the data files have distinct collection-relative paths, instead of making them all cldr2/data/json.zip?

Oh, never mind -- obviously I need to give them different paths. -J
 

Matthew Flatt

Mar 4, 2016, 8:28:38 PM
to Jon Zeppieri, Racket Dev List
I would have said that using the same collection-relative path is
fine for non-module files. After all, the `find-relevant-directories`
function will give you directories, not collection paths, and you can
resolve any path in an "info.rkt" relative to its directory.

But maybe I'm missing some other constraint that requires different
collection-relative paths.
