There's a remark in TA1 of Safeguarded AI that world-models need version control:

> TA1.2 Backend shall develop a professional-grade implementation of the Theory, yielding a distributed version control system for knowledge represented as mathematical world-models, as well as for specifications.
Given a set of dependencies (each of which may have its own dependencies), a lockfile records the transitive closure of those dependencies, pinning a particular commit or tag (with a hash) for every package in the closure.
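To make that concrete, here's a minimal sketch (in Python; the package names, graph, and hashes are all made up) of the data a lockfile records: walk the dependency graph from the direct dependencies and pin a hash for everything reachable.

```
# Hypothetical illustration: a lockfile pins a commit hash for every
# package in the transitive closure, not just the direct dependencies.

def lock(direct_deps, dep_graph, pins):
    """Return {package: commit_hash} for the transitive closure."""
    locked, stack = {}, list(direct_deps)
    while stack:
        pkg = stack.pop()
        if pkg in locked:
            continue
        locked[pkg] = pins[pkg]  # pin a specific commit/tag hash
        stack.extend(dep_graph.get(pkg, []))
    return locked

dep_graph = {"A": ["C"], "B": ["C"]}  # A and B both depend on C
pins = {"A": "9f2c1e", "B": "51ab07", "C": "e07d33"}
print(lock(["A", "B"], dep_graph, pins))
# -> pins for A, B, *and* C, since C is in the closure
```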
So `conda-lock.yml` (from the Python/conda ecosystem) is generated by constraint solving (a SAT/SMT-style solver), while others (like `yarn.lock` from JavaScript) use simpler graph algorithms. The problem these lockfiles share: if package A depends on package C >15 and package B depends on package C <14, it's unclear which version of C you want in your project. In some cases you can just install two versions of C and use them in the two different contexts, but this is finicky and doesn't always work.
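A toy brute-force check (hypothetical packages and versions; real solvers are much more sophisticated) showing why this is a constraint-satisfaction problem with no solution:

```
# The constraints C > 15 and C < 14 admit no single version of C.
constraints = {
    "A": lambda c: c > 15,  # A depends on C > 15
    "B": lambda c: c < 14,  # B depends on C < 14
}

available_versions_of_C = range(1, 30)
satisfying = [v for v in available_versions_of_C
              if all(ok(v) for ok in constraints.values())]
print(satisfying)  # [] -- no version of C satisfies both constraints
```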
`flake.lock` (from Nix) doesn't have this problem, because dependencies are fully isolated. What's more, the specification file `flake.nix` allows granular control over the transitive closure, with "follows":
```
{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs";
    otherdep.url = "github:me/somerepo";
    # force otherdep to resolve its nixpkgs input to mine:
    otherdep.inputs.nixpkgs.follows = "nixpkgs";
  };
  # ...
}
```
Since every flake has a lockfile, by default `otherdep` will pin its own commit of nixpkgs. The `follows` line patches that pin in the spec to force agreement. In practice I tend not to need it, since Nix stores each dependency at its own isolated store path, so two pins of nixpkgs can coexist without collisions.
I don't think `package.json` or `Pipfile` or `environment.yml` has anything like this.
I think a flake-like lockfile system may be sufficient for a system like Safeguarded AI.
Ozzie of QURI's counterargument is that commits are not granular enough for world-models that depend on continuous-time data streams. To make up an example that I think captures this concern: suppose one input to a world-model is the inflation rate from a websocket stream, and another is a GET request to some Wikidata repo that fires monthly. Unless you checkpoint in auxiliary code, you don't know what time t your inflation rate is from, and it's unclear how an analyst can achieve a reproducible model (a harder version of fixing random seeds in data science). If you think there was some corruption at Wikidata at time t-1, you can roll back to time t-2 under any lockfile setup, but it's unclear what a principled approach to the inflation-rate stream is if you think that stream was corrupted recently.
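One ad-hoc answer is to make the checkpointing explicit: treat each stream message as content-addressed data and log (timestamp, hash) pairs, giving the stream a lockfile-like record. A sketch of a hypothetical scheme (not any existing tool):

```
# Hypothetical stream-checkpointing scheme: each message gets a
# timestamp and a content hash, so "which inflation rate?" has a
# pinnable, verifiable answer.
import hashlib, json, time

checkpoints = []  # plays the role of a lockfile for the stream

def checkpoint(payload: dict) -> dict:
    entry = {
        "t": time.time(),
        "sha256": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
        "payload": payload,
    }
    checkpoints.append(entry)
    return entry

# e.g. pin each websocket message as it arrives:
checkpoint({"inflation_rate": 0.032})

# Rolling back to "the stream as of time t" means replaying only the
# entries with entry["t"] <= t and verifying their hashes.
```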
Track the development of Squiggle Hub (https://squigglehub.org/), which I think is building a version control system soonish.
One further reason git may not be great is that there may not be a natural notion of diff for world-models, or even for specs. Squiggle code could serve as a bridge between the pedestrian git diff and some more quantitative notion of diff, but restricting ourselves to textual diffs may be an unnecessary limitation. What are some ideas for quantitative diffs?
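One candidate: if two versions of a world-model can both be sampled, diff them as distributions rather than as text. A sketch (in Python; the two models are stand-ins) using the Wasserstein distance between Monte Carlo samples:

```
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def model_v1():
    return rng.normal(loc=2.0, scale=1.0, size=10_000)

def model_v2():  # the "edited" model shifts the mean slightly
    return rng.normal(loc=2.3, scale=1.0, size=10_000)

print(wasserstein_distance(model_v1(), model_v2()))  # ~0.3
```

KL divergence or maximum mean discrepancy would work too; the appealing property is that the diff comes with units, namely those of the model's output variable.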