Fewer packages / more flexibility, part 2. Variation tags


Leo Gordon

Nov 18, 2019, 7:55:00 AM
to Collective Knowledge Technology
Dear CK users,

There have been many cases where a slight variation in the way
a package is downloaded/configured/compiled caused us to clone
an existing package and create another one.

Everyone knows this is a bad idea when it comes to maintainability:
for every future change we have to retrace our cloning steps
and replicate that change in every package we have cloned.

So how do we strike a balance between reproducibility and maintainability?
Can we somehow pack several variations of the same package into one entry?


        --answer=yes


Let's say, for example, that we want to install a certain library:
    ck install package --tags=lib,armcl,viascons

This command clones a particular git repository and compiles the library from there.

Now imagine that we want to perform exactly the same installation procedure,
but start by getting the source code from a different fork of the repository.
We could do something like:

    ck install package --tags=lib,armcl,viascons --env.PACKAGE_URL=http://dev.site/armnn.git --extra_tags=dev --extra_path=dev


Both extra_tags and extra_path are necessary to allow the two installed
versions to co-exist. Workable? Definitely. Simple? Well, maybe...
But what happens if we have several different dimensions to vary?
Switching several independent features of the package on and off?
Keeping track of all possible combinations of environment variables, extra_tags and extra_paths
would be a nightmare. Unless we could automate it, of course!


Variations

Why not ask for extra features by packing all of the internals into the tags?
This is exactly what we did with variations. Let's have a look at the following example:

    ck pull repo:ck-armnn
    ck load ck-armnn:package:lib-armcl-viascons | less

You can see this package has a relatively small set of "obligatory" tags
["lib", "armcl", "viascons"]. It also has a long list of "variations".
Variations are optional tags that override portions of "install_env"
when selected. Adjustments to the tags and install paths are automated.

As a result, all you need to ask for is something like:

    ck install package --tags=lib,armcl,viascons,rel.19.02,neon
or
    ck install package --tags=lib,armcl,viascons,dev,neon,opencl
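
To picture what happens behind the scenes, here is a minimal Python sketch of how selected variations might override parts of "install_env", with the extra tags and the install path suffix derived automatically. This is not CK's actual code: the dictionary layout, the "extra_env" key, the USE_NEON variable, the example.org URLs and the helper apply_variations are all assumptions made for illustration.

```python
# Hypothetical package metadata, loosely modelled on the description above.
# Only "install_env", "variations" and the obligatory tags come from the post;
# everything else (keys, variables, URLs) is invented for illustration.
package_meta = {
    "tags": ["lib", "armcl", "viascons"],
    "install_env": {
        "PACKAGE_URL": "https://example.org/stable.git",
        "USE_NEON": "OFF",
    },
    "variations": {
        "dev": {"extra_env": {"PACKAGE_URL": "https://example.org/dev.git"}},
        "neon": {"extra_env": {"USE_NEON": "ON"}},
    },
}


def apply_variations(meta, requested_tags):
    """Return (env, tags, path_suffix) after applying the requested variations."""
    env = dict(meta["install_env"])
    # Any requested tag that names a variation is treated as a selected variation.
    selected = [t for t in requested_tags if t in meta["variations"]]
    for name in selected:
        env.update(meta["variations"][name]["extra_env"])
    tags = meta["tags"] + selected      # extra tags added automatically
    path_suffix = "-".join(selected)    # assumed naming scheme for the install path
    return env, tags, path_suffix


env, tags, path_suffix = apply_variations(
    package_meta, ["lib", "armcl", "viascons", "dev", "neon"]
)
print(env)
print(tags)
print(path_suffix)
```

The point of the sketch is only that the user supplies tags, and the environment overrides, extra tags and path suffix all follow from them.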

What would you expect to happen when, say, both rel.19.02 and dev variations
are switched on?  Since they affect the same variables, and we cannot enforce
the order in a dictionary, that could lead to undefined behaviour...

Luckily, we have such cases covered now: CK automatically detects when
the user is trying to mix variations that attempt to assign different values
to the same variable, and won't let it happen.
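
The kind of check described here can be sketched in a few lines of Python. This is an illustration of the idea rather than CK's implementation: the function find_conflicts and the variable names PACKAGE_GIT_CHECKOUT and USE_NEON are hypothetical, as are the values assigned to them.

```python
def find_conflicts(variations, selected):
    """Return (variable, first_variation, second_variation) for each clash."""
    assigned = {}   # variable -> (value, name of the variation that set it)
    conflicts = []
    for name in selected:
        for var, value in variations[name].items():
            if var in assigned and assigned[var][0] != value:
                conflicts.append((var, assigned[var][1], name))
            else:
                assigned.setdefault(var, (value, name))
    return conflicts


# Hypothetical variations: two of them assign different values to one variable.
variations = {
    "rel.19.02": {"PACKAGE_GIT_CHECKOUT": "v19.02"},
    "dev": {"PACKAGE_GIT_CHECKOUT": "master"},
    "neon": {"USE_NEON": "ON"},
}

clash = find_conflicts(variations, ["rel.19.02", "dev", "neon"])
no_clash = find_conflicts(variations, ["dev", "neon"])
print(clash)
print(no_clash)
```

Because the check compares values per variable, it does not depend on the order in which the variations are stored or listed; a clash is reported either way.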


Default variations

In order to minimize clutter, we prefer some variations to be ON by default.
We mark them with "on_by_default": "yes" and they will be automatically added
to the tag list (and affect both the environment and customization dictionaries)
ONLY IF they do not come in conflict with any of the explicitly listed variations.
Otherwise, they will be silently ignored.

A good example to study is `package:dataset-imagenet-preprocessed-using-opencv`,
which performs ImageNet preprocessing and caches the results in the form of a resolvable CK environment.
This package has many variations in several orthogonal groups:
1. "universal" vs "for-mobilenet" vs "for-resnet" (different types of pre-normalization available)
2. "audit.test03" vs "unmutilated" (whether to cancel the Blue channel data for an MLPerf audit test or leave it untouched)
3. "crop.875" vs "crop.1000" (whether to crop 87.5% from the center of the image or leave it untouched)
4. "first.20" vs "last.20" vs "full" (whether to only preprocess the first 20 examples of the set, the last 20, or take the full set)
5. "side.96", "side.128", "side.160", "side.192", "side.224" (target image resolution, assuming square shape)
6. "inter.linear" vs "inter.area" (different types of interpolation when performing resize operation)


Installing a package with variations - three ways

1. The most portable, reproducible and maintainable way is to skip the package name and bundle all the desired tags and variations together:

ck install package --tags=imagenet,preprocessed,using-opencv,full,side.160

Whenever a non-default variation is mentioned in the --tags= list, it displaces the corresponding default.
So, full displaces the default first.20, and side.160 displaces the default side.224.
All other default variations (universal, unmutilated, crop.875 and inter.linear) will automatically be added.
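
The displacement rule can be worked through with a small Python model of this package. It is a sketch under assumptions, not the real package metadata: the variation names and the defaults are taken from this post, but the "env" key, the variable names (NORM, AUDIT, CROP, SUBSET, SIDE, INTER) and the resolve function are made up, and only two of the five side.* variations are listed for brevity.

```python
# Each variation sets one hypothetical variable; variations that set the same
# variable to different values form a group and conflict with one another.
variations = {
    "universal":     {"env": {"NORM": "universal"}, "on_by_default": "yes"},
    "for-mobilenet": {"env": {"NORM": "mobilenet"}},
    "for-resnet":    {"env": {"NORM": "resnet"}},
    "audit.test03":  {"env": {"AUDIT": "test03"}},
    "unmutilated":   {"env": {"AUDIT": "none"}, "on_by_default": "yes"},
    "crop.875":      {"env": {"CROP": "87.5"}, "on_by_default": "yes"},
    "crop.1000":     {"env": {"CROP": "100"}},
    "first.20":      {"env": {"SUBSET": "first.20"}, "on_by_default": "yes"},
    "last.20":       {"env": {"SUBSET": "last.20"}},
    "full":          {"env": {"SUBSET": "full"}},
    "side.160":      {"env": {"SIDE": "160"}},
    "side.224":      {"env": {"SIDE": "224"}, "on_by_default": "yes"},
    "inter.linear":  {"env": {"INTER": "linear"}, "on_by_default": "yes"},
    "inter.area":    {"env": {"INTER": "area"}},
}


def resolve(variations, explicit):
    """Add each default variation unless it clashes with an explicit one."""
    explicit_env = {}
    for name in explicit:
        explicit_env.update(variations[name]["env"])
    selected = list(explicit)
    for name, meta in variations.items():
        if meta.get("on_by_default") != "yes" or name in selected:
            continue
        clashes = any(
            var in explicit_env and explicit_env[var] != value
            for var, value in meta["env"].items()
        )
        if not clashes:          # conflicting defaults are silently ignored
            selected.append(name)
    return selected


resolved = resolve(variations, ["full", "side.160"])
print(resolved)
```

In this model first.20 and side.224 are dropped because full and side.160 set the same variables, while the remaining four defaults are added.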

We recommend using this format when publishing or documenting a package (including in a Dockerfile or .travis.yml).
This way package maintainers are free to move functionality between CK package entries when necessary
while keeping command-level compatibility. For example, tag combinations that in the past invoked different packages
may now resolve to a single merged entry with variations, and the calling command stays the same.


2. Of course we still support the format where the package name is mentioned explicitly:

ck install package:dataset-imagenet-preprocessed-using-opencv --tags=full,side.160

In this case we only need to add the non-default variations to the --tags= list.
This command also runs a bit faster, but it is less portable and harder to maintain.
Recommended for development or experimentation.


3. We also support automatic package recognition based on the current directory, which can make your command even shorter, but more cryptic and context-dependent:

ck install package --tags=full,side.160


As you might have guessed, this is the fastest and least portable way, but it is OK for package development or experimentation.


Happy haCKing!

Leo

Anton Lokhmotov

Nov 18, 2019, 11:47:22 AM
to collective...@googlegroups.com
Thanks Leo!

To elaborate a bit on the savings, one package
(ck-mlperf:package:model-tf-and-tflite-mlperf-mobilenet-v1-20180802)
now covers 32 MobileNets-v1 models (4 multipliers, 4 resolutions,
quantized/non-quantized), and another one
(ck-mlperf:package:model-tf-and-tflite-mlperf-mobilenet-v2) - 22
MobileNets-v2 models.
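
As a quick check of the MobileNets-v1 count, the three dimensions multiply out as follows. The concrete multiplier and resolution values below are an assumption (the usual MobileNet-v1 settings), since only the number of options per dimension is stated here.

```python
from itertools import product

# Assumed values; only the counts (4, 4 and 2) are given in this thread.
multipliers = ["0.25", "0.5", "0.75", "1.0"]
resolutions = ["128", "160", "192", "224"]
precisions = ["quantized", "non-quantized"]

combinations = list(product(multipliers, resolutions, precisions))
print(len(combinations))  # 4 * 4 * 2 = 32
```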

As another example, one package (ck-armnn:package:lib-armnn) now
covers many different variants of ArmNN (several production releases
or development, TF/TFLite/ONNX frontends, Neon/OpenCL/ref backends).

IIRC, future work includes supporting variations for packages that
require patches (e.g. different patches for different versions of a
library). While we already have some ad-hoc support for that, we need
to think about a proper solution.

Anton.