Heads up: Memory overflow while copying a 230GB file into upspin


Filip Filmar

Mar 6, 2024, 2:52:01 PM
to Upspin
Hi folks.

This is just a heads-up about an issue I observed recently. I have not done a full analysis yet, but I wanted to let you know about it, in case someone has ideas before I get the forensics results. (This may take some time, since the repro is lengthy.)

I tried to copy a 230GB file into upspin on a GCP Linux VM, using both `upspinfs` with `rsync` and `upspin cp`, with the same result: the file gets transferred, but at the end `upspinfs` (or `upspin`, as the case may be) uses up all the system memory (64GB RAM, no swap) and gets killed by the OOM killer.

This is what happens with `upspinfs` and `rsync`:

```
╰─>$ rsync --progress <redacted> $HOME/u/<redacted>
242,244,464,202 100% 40.18MB/s 1:35:50 (xfr#1, to-chk=0/1)
rsync: [receiver] close failed on "<redacted>": Software caused connection abort (103)
rsync error: error in file IO (code 11) at receiver.c(888) [receiver=3.2.7]
┬─[filmil@instance-3:~/b2]─[09:52:34]
╰─>$

┬─[filmil@instance-3:~/b2]─[09:48:18]
╰─>$ ./upspin/run.upspin.client.sh: line 13: 12539 Killed nice nohup upspinfs -allow_other -config=$HOME/upspin/<redacted> $HOME/u >> $HOME/run/upspin/nohup.out
```


Rodrigo Schio

Mar 6, 2024, 4:37:42 PM
to Upspin
Hi, I didn't dive deep into this, but the cause could be that the whole file is kept in memory:

```
// File is a simple implementation of upspin.File.
// It always keeps the whole file in memory under the assumption
// that it is encrypted and must be read and written atomically.
type File struct {
    name     upspin.PathName // Full path name.
    offset   int64           // File location for next read or write operation. Constrained to <= maxInt.
    writable bool            // File is writable (made with Create, not Open).
    closed   bool            // Whether the file has been closed, preventing further operations.

    // Used only by readers.
    config upspin.Config
    entry  *upspin.DirEntry
    size   int64
    bu     upspin.BlockUnpacker
    // Keep the most recently unpacked block around
    // in case a subsequent readAt starts at the same place.
    lastBlockIndex int
    lastBlockBytes []byte

    // Used only by writers.
    client upspin.Client // Client the File belongs to.
    data   []byte        // Contents of file.
}
```
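
To make the failure mode concrete, here is a sketch of the accumulate-in-memory pattern that comment describes (hypothetical names, not the real upspin client code): a Write path that grows a single backing slice needs roughly the whole file size in RAM before anything reaches the store, and 230GB does not fit in the VM's 64GB, which is consistent with the process being killed at the end of the transfer.

```go
// Sketch only: a hypothetical type illustrating the whole-file-in-memory
// write pattern. Not the actual upspin implementation.
type memFile struct {
	data   []byte // entire contents held until the file is closed and stored
	offset int64  // location of the next write
}

func (f *memFile) Write(p []byte) (int, error) {
	end := f.offset + int64(len(p))
	if end > int64(len(f.data)) {
		// Grow the backing slice to cover the write. After a 230GB
		// transfer this slice needs ~230GB of RAM, far more than the
		// 64GB available on the VM.
		grown := make([]byte, end)
		copy(grown, f.data)
		f.data = grown
	}
	copy(f.data[f.offset:end], p)
	f.offset = end
	return len(p), nil
}
```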

Eric Grosse

Mar 6, 2024, 4:58:13 PM
to Rodrigo Schio, Upspin
Right, that was the original and, to me, simplest design. We did revise the underlying implementation to split the file into blocks that can be independently encrypted and authenticated, but I'm not astonished if we missed carrying that through comprehensively to all pieces of the code.

Without disputing that this ought to be fixed, I am interested in understanding some of the 230GB context, to the extent that it can be described while respecting privacy. As an example, I have sometimes adopted the design of breaking big files into a Unix directory of smaller files. It just seemed to make things easier for editing, parallel processing, recovering from some errors, or whatever. PDF and zip are familiar examples of doing it the other way: a single big file with an index allowing random-access speedup. Maybe you have another illuminating format?


Filip Filmar

Mar 6, 2024, 5:12:06 PM
to Eric Grosse, Rodrigo Schio, Upspin
On Wed, Mar 6, 2024 at 1:58 PM Eric Grosse <gro...@gmail.com> wrote:
Without disputing that this ought to be fixed, I am interested in understanding some of the 230GB context, to the extent that can be described while respecting privacy.

The content is not confidential. It's a product of `docker save`, and is a .tar.gz of a Docker image. It is produced by https://github.com/filmil/vivado-docker. With Vivado being a monstrosity, I am not sure if the packaging could be made more palatable to upspin as-is. The whole reason I packaged it into a docker container is so that I wouldn't need to deal with its gazillion files and directories.

F

Filip Filmar

Mar 6, 2024, 5:36:12 PM
to Eric Grosse, Rodrigo Schio, Upspin
Hm, I suppose I could pre-split the archive before upload, and then reassemble it when needed. 
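
For what it's worth, a minimal sketch of that pre-split workaround, assuming the parts are written into a directory served by `upspinfs` (the 1 GiB chunk size and `part-NNNNN` naming are arbitrary choices here, not anything upspin prescribes):

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// chunkSize is an arbitrary choice; use whatever upspinfs handles comfortably.
const chunkSize = 1 << 30 // 1 GiB

// splitFile copies src into numbered part files under dstDir (for example an
// upspinfs mount), so memory use stays small regardless of the file size.
func splitFile(src, dstDir string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	for i := 0; ; i++ {
		name := filepath.Join(dstDir, fmt.Sprintf("part-%05d", i))
		out, err := os.Create(name)
		if err != nil {
			return err
		}
		n, err := io.CopyN(out, in, chunkSize)
		// Surface Close errors: with upspinfs the interesting failures
		// happen at close time, as the rsync log above shows.
		if cerr := out.Close(); cerr != nil && err == nil {
			err = cerr
		}
		if err == io.EOF {
			if n == 0 {
				os.Remove(name) // nothing left to copy; drop the empty part
			}
			return nil
		}
		if err != nil {
			return err
		}
	}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: splitfile <src> <dstDir>")
		os.Exit(2)
	}
	if err := splitFile(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Reassembly would then just be concatenating the parts in order, e.g. `cat part-* > image.tar.gz`.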

F

Eric Grosse

Mar 6, 2024, 6:22:14 PM
to Filip Filmar, Rodrigo Schio, Upspin
Thanks, perfectly reasonable. It is mostly a matter of personal taste
these days whether to "tar c; scp; tar x" or "scp -rp". In the old
days I did the former; nowadays I do the latter.

I'll look into what upspin changes we need to make to help you out.

Filip Filmar

Mar 6, 2024, 6:25:06 PM
to Eric Grosse, Rodrigo Schio, Upspin
On Wed, Mar 6, 2024 at 3:22 PM Eric Grosse <gro...@gmail.com> wrote:
Thanks, perfectly reasonable. It is mostly a matter of personal taste
these days whether to "tar c; scp; tar x" or "scp -rp". In the old
days I did the former; nowadays I do the latter.

Well, had I known then what I know now, I might have done things differently... :)

F

Albert-Jan de Vries

Mar 11, 2024, 4:37:56 AM
to Upspin
I had some memory issues with writing files, and made this (rough) change to the upspin client:

On Thursday, March 7, 2024 at 00:25:06 UTC+1, Filip Filmar wrote:

Eric Grosse

Mar 16, 2024, 6:26:21 PM
to Albert-Jan de Vries, Upspin
When we adopted chunk encryption in upspin, I regret that we did not drop Client.Get and Put in favor of only the File interface. (We should have implemented buffering and streaming chunks in Write as well as Read there.)
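
A generic sketch of that buffering idea, in terms of plain io interfaces rather than upspin's actual API; `flushBlock` below is a hypothetical stand-in for whatever would pack, encrypt, and store one block:

```go
// blockWriter is a sketch of a streaming Write: it buffers at most one
// block and hands each full block to flushBlock, so memory use is bounded
// by blockSize rather than by the size of the file.
type blockWriter struct {
	buf        []byte
	blockSize  int
	flushBlock func([]byte) error // must consume or copy the slice before returning
}

func (w *blockWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := w.blockSize - len(w.buf)
		if n > len(p) {
			n = len(p)
		}
		w.buf = append(w.buf, p[:n]...)
		p = p[n:]
		written += n
		if len(w.buf) == w.blockSize {
			if err := w.flushBlock(w.buf); err != nil {
				return written, err
			}
			w.buf = w.buf[:0]
		}
	}
	return written, nil
}

// Close flushes the final, possibly partial, block.
func (w *blockWriter) Close() error {
	if len(w.buf) == 0 {
		return nil
	}
	return w.flushBlock(w.buf)
}
```

Of course each flushed block is already in the store, which is exactly where the junk-file question below comes from.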

But now consider a streaming app S that uses the new style, compared to an app A that extravagantly spends memory accumulating all results in a byte slice and only calls Put on success. Doesn't S tend to create more junk files in upspin? If S fails partway, the blocks it has already stored become orphans, whereas A stores nothing until it succeeds. We're not great at garbage collection, so I wonder if there is a hidden cost.

Albert-Jan, since you have already implemented a form of this, I wonder if you have any experience on dealing with cleanup, or is it not an issue?


Albert-Jan de Vries

Mar 18, 2024, 7:05:14 AM
to Upspin
Hi Eric,

Thanks for your message. You're right: there is a hidden cost, and garbage collection for upspin services isn't great. I created a cronjob that runs the upspin audit commands to clean things up, and I'm working on an upspin store and dir server that is better at cleaning up the 'garbage' that is no longer needed.

I like the idea of the File interface (or maybe io.Reader and io.Writer are enough).

Regards,
-- AJ

On Saturday, March 16, 2024 at 23:26:21 UTC+1, Eric Grosse wrote:

Eric Grosse

Mar 18, 2024, 11:48:09 AM
to Albert-Jan de Vries, Upspin
Very glad to hear that; thanks!

"Correct" garbage collection of course requires the ability to see all pointers. Upspin audit can do this in a context where there is one dominant owner. In the general global case, where there are links in confidential directories to semi-public resources owned by other upspin users, this seems hard.

One possibility is a more shared version of audit that has a convention on how to publish lists of blocks in use. All readers of a storeserver would have to participate, at penalty of losing files.

Another idea, from the time of the original Plan 9 snapshot filesystem that Upspin builds on, is to have storage so cheap that garbage collection is not a pressing matter. It is never free, though, so we still want to avoid generating huge temporary files. (Hence my bringing the topic up on this thread.)

I'm unclear whether our existing cloud storage does an adequate automatic job of distinguishing cold, very cheap blocks from warmer ones with faster access and higher cost.

David Presotto

Mar 18, 2024, 12:14:04 PM
to Eric Grosse, Albert-Jan de Vries, Upspin
Garbage collection has always been our sore point. While I'd love a world with infinite free storage, that doesn't exist. I've started my storage from scratch a number of times to clean up. In addition to the problem of unknown or unreadable directory servers, our dump directories tie down storage forever. Unless you have a separate, dump-free dir and store server for temporary files, they're there forever once caught by a dump.


