Using as backing store for NixOS?


Wout Mertens

Apr 18, 2022, 6:17:54 AM
to bup-...@googlegroups.com
Hi,

I'm wondering if bup could be used to handle /nix/store of the https://nixos.org distro

Background:

/nix/store stores all the packages and config files for the distro in hashed read-only paths like /nix/store/somehash-python-3.6.5/. You can have many different python-3.6.5 paths that way, each built with a different configuration. The same build configuration will always give you the same path.

Each of those store paths depends on zero or more other store paths, and together they form a closure: the full set of paths needed to "install" a given store path.

Installing a package is just the act of copying the closure in place. Any paths that are already in the store can be skipped when copying, thanks to their hash.
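
To make the closure idea concrete, here is a minimal Python sketch. It assumes a hypothetical `references` mapping from each store path to the paths it depends on; the hashes and package versions are made up for illustration, and this is not how Nix itself is implemented.

```python
def closure(root, references):
    """Return root plus every store path reachable from it."""
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        stack.extend(references.get(path, ()))
    return seen


def paths_to_copy(root, references, already_present):
    """Paths of the closure that are not in the local store yet."""
    return closure(root, references) - set(already_present)


# Hypothetical store paths; real ones carry a long hash prefix.
python = "/nix/store/aaaa-python-3.6.5"
openssl = "/nix/store/cccc-openssl-1.0.2"
glibc = "/nix/store/bbbb-glibc-2.27"

references = {
    python: [glibc, openssl],
    openssl: [glibc],
    glibc: [],
}

# glibc is already installed, so only the other two paths need copying.
missing = paths_to_copy(python, references, already_present={glibc})
print(sorted(missing))
```

Because a path name is derived from its build inputs, checking whether the name already exists is enough to decide that a path can be skipped.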


So my questions are:

I suspect bup would be a great fit for storing this, and I wonder if git fetch could be used to grab closures efficiently.

I think that for each package I would create a backup of its closure, and name it the same as its store path.
Do I understand correctly that this will create a git branch with the packed data?

So suppose I want to get a certain python hash and I already have an older version, I can run git fetch for just that branch, and git will then only download the differing data between the old and new python closures?

Then I can mount my backing store with fuse on /nix/store, and the newly downloaded python path will be available there?

Thanks,

Wout.

Johannes Berg

Apr 19, 2022, 4:01:20 PM
to Wout Mertens, bup-...@googlegroups.com
On Mon, 2022-04-18 at 12:17 +0200, Wout Mertens wrote:
> Hi,
>
> I'm wondering if bup could be used to handle /nix/store of the
> https://nixos.org distro

I guess? But what would you want to achieve with it?

It'd be slow to access, for one, since everything larger than a few KiB
(depending on how you tune it, though current upstream bup cannot be
tuned at all in this, I have patches) will be split out into multiple
objects ...

[snip nix description]

> So my questions are:
>
> I suspect bup would be a great fit for storing this, and I wonder if
> git fetch could be used to grab closures efficiently.

FSVO efficiently, I guess? Are you looking at network transfer speed? I
think nix currently (mostly?) uses xz, while git is (currently?) limited
to gzip for compression. So any gains you'd probably have would be from
deduplicating files in similar/related closures, say files that didn't
change from one version to the next?

> I think that for each package I would create a backup of its closure,
> and name it the same as its store path.
> Do I understand correctly that this will create a git branch with the
> packed data?

Yes.

> So suppose I want to get a certain python hash and I already have an
> older version, I can run git fetch for just that branch, and git will
> then only download the differing data between the old and new python
> closures?

Yes, but not sure how much that would really save.

> Then I can mount my backing store with fuse on /nix/store, and the
> newly downloaded python path will be available there?

Yes. Though there's always the 'latest' symlink inside, so to be able to
use it you'd need to either change bup or do some symlink farm for
/nix/store.

As far as deduplicating goes ... I just tried with bup and default
settings on

66M /nix/store/drr8qcgiccfc5by09r5zc30flgwh1mbx-python3-3.7.5
56M /nix/store/66fbv9mmx1j4hrn9y06kcp73c3yb196r-python3-3.8.9
56M /nix/store/qjmp7jvig1xq8sm424nahvi4km9xwwll-python3-3.8.9
93M /nix/store/k0z9n599k02hab8qjjp3ljw065iwjcvg-python3-3.9.6
93M /nix/store/rjx617di0rrq9lpa9pmz0sz6dqhhhzkv-python3-3.9.6
68M /nix/store/s9xmbxlqxfrlxmr5nwhibj64fnzhix5i-python3-3.9.6

(all python versions I currently had), and I got

21M /tmp/test/objects/pack/pack-704448f1094f0263459a654b3bd036a86a1ff35b.pack
18M /tmp/test/objects/pack/pack-2aaf50ab98895748a1f0d276301ac5ce80a098f2.pack
3.9M /tmp/test/objects/pack/pack-3fb7b2f228fdf94a0bacfb43ac937945bd75c05a.pack
38M /tmp/test/objects/pack/pack-4f2ddc835df6e0156e6e00bf85ee8acd93f2e07b.pack
8.4M /tmp/test/objects/pack/pack-bcaf2c8fc0e4f426a13c8ae4b303aa398bfbe31c.pack
9.9M /tmp/test/objects/pack/pack-abddf072011d33ef9ddf0a75ce2f79f063758648.pack

created for saving them in the listed order, so 

drr8qcgiccfc5by09r5zc30flgwh1mbx-python3-3.7.5
took 21M dedup'ed & compressed (default settings),

66fbv9mmx1j4hrn9y06kcp73c3yb196r-python3-3.8.9
added 18M

qjmp7jvig1xq8sm424nahvi4km9xwwll-python3-3.8.9
added only 3.9M

etc.
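
As a rough tally of the figures above, the following Python sketch pairs each store path size with the pack created when it was saved. It assumes the remaining three packs map to the three python3-3.9.6 paths in the same listed order.

```python
# Sizes in M as shown in the two listings above, in save order.
store_sizes = [66, 56, 56, 93, 93, 68]
pack_sizes = [21, 18, 3.9, 38, 8.4, 9.9]

raw_total = sum(store_sizes)
packed_total = sum(pack_sizes)

for raw, packed in zip(store_sizes, pack_sizes):
    print(f"{raw:>3}M on disk -> {packed:>4}M added to the repo")

print(f"total: {raw_total}M on disk, {packed_total:.1f}M packed "
      f"({packed_total / raw_total:.0%} of the original size)")
```

That works out to 432M of store paths kept in about 99M of packs, or roughly 23% of the original size.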


So yeah, I guess you'd save some space, but probably with a massive
trade-off in access speed...

johannes

Stefan Monnier

Apr 19, 2022, 4:07:33 PM
to Johannes Berg, Wout Mertens, bup-...@googlegroups.com
> created for saving them in the listed order, so 
>
> drr8qcgiccfc5by09r5zc30flgwh1mbx-python3-3.7.5
> took 21M dedup'ed & compressed (default settings),

Side note: I thought `bup`, contrary to `git`, did not compress the data
in the pack files. Was I confused?


Stefan

Johannes Berg

Apr 19, 2022, 4:08:25 PM
to Stefan Monnier, Wout Mertens, bup-...@googlegroups.com
It does compress using zlib (I forget what the default settings are),
but not using delta "compression".

johannes

Stefan Monnier

Apr 19, 2022, 4:11:48 PM
to Johannes Berg, Wout Mertens, bup-...@googlegroups.com
Johannes Berg [2022-04-19 22:08:21] wrote:
> It does compress using zlib (I forget what the default settings are),
> but not using delta "compression".

Aahhh... makes sense, thank you.


Stefan

Wout Mertens

Apr 19, 2022, 5:13:05 PM
to Johannes Berg, bup-...@googlegroups.com
Thanks for your answers!

On Tue, Apr 19, 2022 at 10:01 PM Johannes Berg <joha...@sipsolutions.net> wrote:
> On Mon, 2022-04-18 at 12:17 +0200, Wout Mertens wrote:
> > Hi,
> >
> > I'm wondering if bup could be used to handle /nix/store of the
> > https://nixos.org distro
>
> I guess? But what would you want to achieve with it?
>
> It'd be slow to access, for one, since everything larger than a few KiB
> (depending on how you tune it, though current upstream bup cannot be
> tuned at all in this, I have patches) will be split out into multiple
> objects ...

I'd like to optimize for disk usage of many system generations, and network efficiency when upgrading. If the access times via fuse turn out to be terrible, I'm sure there's optimization possibilities.
 
Due to the nature of NixOS there are typically many very similar files stored, and as you showed, rollinghash results in substantial deduplication.
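
For readers unfamiliar with why a rolling hash deduplicates similar files, here is a toy Python sketch of content-defined chunking feeding a content-addressed store. It is an illustration under simplifying assumptions, not bup's actual hashsplit code: the window size, the mask, the plain byte sum, and all function names are made up for this example.

```python
import hashlib

WINDOW = 48            # bytes covered by the rolling sum (toy value)
MASK = (1 << 10) - 1   # cut when the low 10 bits are all ones


def split_chunks(data: bytes):
    """Yield content-defined chunks of data using a toy rolling sum."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if (rolling & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


def save(store: dict, data: bytes) -> int:
    """Add data to the store, keyed by chunk hash; return bytes newly stored."""
    added = 0
    for chunk in split_chunks(data):
        key = hashlib.sha1(chunk).hexdigest()
        if key not in store:
            store[key] = chunk
            added += len(chunk)
    return added


def pseudo_random_bytes(n: int, seed: bytes) -> bytes:
    """Deterministic filler data standing in for a real file."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


v1 = pseudo_random_bytes(200_000, b"build-a")
# A second "build" that differs only by a small insertion in the middle.
v2 = v1[:100_000] + b"configured differently" + v1[100_000:]

store = {}
first_added = save(store, v1)
second_added = save(store, v2)
print(f"first save stored {first_added} bytes, second save {second_added}")
```

Since chunk boundaries depend only on the bytes inside the window, the chunks before the insertion are identical and the boundaries after it line up again, so the second save only stores the chunks around the changed region.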

> > So my questions are:
> >
> > I suspect bup would be a great fit for storing this, and I wonder if
> > git fetch could be used to grab closures efficiently.
>
> FSVO efficiently, I guess? Are you looking at network transfer speed? I
> think nix currently (mostly?) uses xz, while git is (currently?) limited
> to gzip for compression. So any gains you'd probably have would be from
> deduplicating files in similar/related closures, say files that didn't
> change from one version to the next?

Not sure what you mean with FSVO, but yes. As you showed, if I already have 66fbv9mmx1j4hrn9y06kcp73c3yb196r-python3-3.8.9, then getting qjmp7jvig1xq8sm424nahvi4km9xwwll-python3-3.8.9 from the binary cache would only require downloading and storing 3.9MB, instead of downloading around 21MB and storing 56MB (its size in your listing).
 
> So yeah, I guess you'd save some space, but probably with a massive
> trade-off in access speed...

Thank you for checking! My expectation is that initial load times for system files will be a bit longer, but since loading those files is normally only a small fraction of a system's memory use and file system I/O, it might not have much impact?

I'd also love to see this combined with the output hashing efforts that make the hashes self-verifying. That way all you need to know is that a certain hash represents the system you want and then you can download it from anywhere without worrying. You could then make an initrd that gets passed that hash, downloads it from some cache and boots from it. Great for cloud servers.
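
A minimal Python sketch of that "download from anywhere" idea follows, assuming the content is identified by a plain SHA-256 of its bytes. This does not reproduce Nix's actual hashing scheme, and `fetch_verified` and the mirror callables are hypothetical names for illustration.

```python
import hashlib


def fetch_verified(expected_sha256: str, sources):
    """Return the first payload whose SHA-256 matches expected_sha256.

    sources is an iterable of zero-argument callables returning bytes;
    they stand in for untrusted mirrors or caches.
    """
    for fetch in sources:
        payload = fetch()
        if hashlib.sha256(payload).hexdigest() == expected_sha256:
            return payload
    raise ValueError("no source returned content matching the expected hash")


good = b"system closure bytes"
wanted = hashlib.sha256(good).hexdigest()

# The first mirror returns tampered data and is rejected by the hash check.
mirrors = [lambda: b"tampered bytes", lambda: good]
result = fetch_verified(wanted, mirrors)
print(result == good)
```

The only thing that has to be trusted is the hash itself, which is what would let an initrd fetch its system from any cache.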

Cheers,

Wout.

Stefan Monnier

Apr 19, 2022, 5:29:17 PM
to Wout Mertens, Johannes Berg, bup-...@googlegroups.com
> I'd like to optimize for disk usage of many system generations, and network
> efficiency when upgrading. If the access times via fuse turn out to be
> terrible,

Access times will likely prove problematic (e.g. for boot up and
application startup times) if /nix holds the "main local copy" of system
executables (which I assume it does). You'll probably also encounter
some bootstrap issues if that's the case (`bup fuse` will need a Python
environment to work but it won't be able to use the /nix store for that
obviously).

I wouldn't be surprised to hear that this approach suffers from other
problems (e.g. I don't know if `bup fuse` supports mmap'd files).

> I'm sure there's optimization possibilities.

It seems like an ideal candidate for CacheFS ;-)

> Due to the nature of NixOS there are typically many very similar files
> stored, and as you showed, rollinghash results in
> substantial deduplication.

You could even ask Git to repack so as to take advantage of delta
compression (not sure how much that would gain you, tho), since Bup
seems perfectly able to use such packs even if it doesn't generate
them itself.

> Not sure what you mean with FSVO, but yes.

Presumably "for some value of".


Stefan

Johannes Berg

Apr 20, 2022, 3:10:22 AM
to Stefan Monnier, Wout Mertens, bup-...@googlegroups.com
On Tue, 2022-04-19 at 17:29 -0400, Stefan Monnier wrote:
>
> Access times will likely prove problematic (e.g. for boot up and
> application startup times) if /nix holds the "main local copy" of
> system
> executables (which I assume it does). You'll probably also encounter
> some bootstrap issues if that's the case (`bup fuse` will need a
> Python
> environment to work but it won't be able to use the /nix store for
> that
> obviously).

It also needs git to read the repo (I have a non-git read path, but it
performs even worse :) )

But I suppose that could be arranged.

> I wouldn't be surprised to hear that this approach suffers from other
> problems (e.g. I don't know if `bup fuse` supports mmap'd files).
>
> > I'm sure there's optimization possibilities.
>
> It seems like an ideal candidate for CacheFS ;-)

That might not be the worst idea, actually - instead of 'bup fuse' make
it a network filesystem, and use local caching?

> > Due to the nature of NixOS there are typically many very similar
> > files
> > stored, and as you showed, rollinghash results in
> > substantial deduplication.
>
> You could even ask Git to repack so as to take advantage of delta
> compression (not sure how much that would gain you, tho), since Bup
> seems perfectly able to use such packs even if it doesn't generate
> them itself.

Indeed, it uses git to read, so that's not an issue (well, unless you
use that built-in read code I wrote, which doesn't deal with deltas),
but the way git repacks probably won't result in significant savings by
delta compression.


So I don't know, I can't really see it working well from a performance
POV, I'd think that would need some serious thought into caching, and
then you need more disk space again.

You can probably do better by using btrfs deduplication, even if bup has
better granularity. Maybe use bup (with a larger hashsplit/rolling
hash block size) for the transfer and to detect duplicate extents, and
then tell the filesystem about them?

johannes

Alek Paunov

Apr 20, 2022, 5:02:19 PM
to Wout Mertens, bup-...@googlegroups.com
On 4/18/22 13:17, Wout Mertens wrote:
>
> I'm wondering if bup could be used to handle /nix/store of the
> https://nixos.org distro
>

At the very least, there seems to be a trend toward storing things the
"bup way" (CAS over chunks produced by a rolling hash), targeting
versioning and immutability:
* Noms - OSS database based on the "prolly tree" B-tree variant [1][2],
  used for the storage layer of Dolt/DoltHub/DoltLab.
* Relational.ai - Probably a major future player in business apps/ML
  (the technical people from LogicBlox). Unfortunately, not all key
  components of their cloud DB design are OSS [3], but at least judging
  from their presentations [4], the DB has the same Dolt-like
  CAS/versioning properties.

You are deep into SQLite already [5] :-). Perhaps bup support for
libgit2 with an SQLite storage backend plus additional tables (e.g.
chunk offset indexes) would solve some of the bup VFS/fuse efficiency
issues mentioned in this thread (by offloading most of the computation
and I/O to SQLite procedures [6])?

Dreaming ahead, such a design may serve as the basis for an alternative,
bup-based storage driver for podman and friends [7] (the current
out-of-the-box drivers are overlayfs and devmapper, and user (rootless)
containers are on FUSE overlayfs anyway), and for the long-standing bup
feature request for fast "incremental" (without a full scan) bup
snapshotting of large things like DBs and VMs, based on change-tracking
sources.

Kind regards,
Alek

[1] https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-table-data/
[2] https://github.com/attic-labs/noms/blob/master/doc/intro.md
[3] https://github.com/relationalai-oss
[4] https://www.youtube.com/watch?v=WRHy7M30mM4
[5] https://github.com/StratoKit/strato-db
[6] https://github.com/facebookincubator/CG-SQL
[7] https://github.com/containers/

P.S. Related to the question (outside of the NixOS context) is the
OSTree project [8], also based on a git-like repo, which is a key
building block in e.g. Fedora CoreOS and a few immutable workstation
projects of Fedora and Red Hat, but I personally would prefer to use a
more flexible and compact bup-based solution for mass OS tree version
control.

[8] https://ostreedev.github.io/ostree/repo/

Wout Mertens

Apr 24, 2022, 6:20:50 PM
to bup-list
Hi Alek, 

thank you for the thorough overview! Reading your answer triggered me into casting a wider net and now I realize that casync is actually meant for this use case, so I'll check that out first.

As for your observations, indeed it seems that variable-size chunks are getting more popular, I am guessing that SSD storage and faster CPUs are to thank for that.

I think SQLite would indeed be great to store the chunk offset indexes of files. SQLite DBs can be single read-only files, so the DB can be embedded together with the data, and in fact a vfs could be written that reads the DB directly from a git object. Meaning it could be part of the backup, but to then also deduplicate the DB becomes a bit of a bootstrap problem.
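
As a rough sketch of what such a chunk offset index could look like, using Python's built-in sqlite3 module: the table layout, column names, and sample rows are assumptions for illustration only, not an existing bup or SQLite schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE chunk_index (
        path         TEXT    NOT NULL,
        start_offset INTEGER NOT NULL,  -- where the chunk begins in the file
        byte_length  INTEGER NOT NULL,
        chunk_id     TEXT    NOT NULL,  -- content hash naming the chunk
        PRIMARY KEY (path, start_offset)
    )
    """
)


def chunks_for_read(conn, path, start, size):
    """Return the chunks overlapping the byte range [start, start + size)."""
    return conn.execute(
        """
        SELECT start_offset, byte_length, chunk_id
        FROM chunk_index
        WHERE path = ?
          AND start_offset < ?
          AND start_offset + byte_length > ?
        ORDER BY start_offset
        """,
        (path, start + size, start),
    ).fetchall()


# Made-up chunk layout for one file.
conn.executemany(
    "INSERT INTO chunk_index VALUES (?, ?, ?, ?)",
    [
        ("bin/python3", 0, 4096, "chunk-a"),
        ("bin/python3", 4096, 8192, "chunk-b"),
        ("bin/python3", 12288, 1000, "chunk-c"),
    ],
)

# A read of 8000 bytes at offset 5000 only touches the last two chunks.
needed = chunks_for_read(conn, "bin/python3", start=5000, size=8000)
print(needed)
```

With an index like this, a FUSE read at an arbitrary offset can look up exactly the chunks it needs instead of walking the file's chunk tree from the start.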
 
Besides that, SQLite is in-process, which means that stored procedures are only an optimization inside tight loops inside queries (e.g. for function calls that calculate query values from row data), and calculations and I/O are only sped up if you manage multithreading right (just like in any application).
It also means that queries are very low lag because there's no network encode/decode step, so tons of queries is not a problem at all, yet another reason why stored procedures aren't all that necessary.

I'm now also wondering if bcachefs.org could get an extra translation layer to store variable-size chunked content-addressed storage directly. It has variable-size extents (meaning it can already figure out chunk offsets) but it doesn't have CAS.
I'd guess that if a CAS index could be maintained for the extents, and files can be split into arbitrary extents based on the rolling hash, that would be all the extra stuff needed to deduplicate, no?
(Sorry, veering pretty far from bup use case now)

Cheers,

Wout.