My Upspin is in a "broken state" after using `upspin audit`


Rodrigo Schio

Jun 25, 2023, 7:11:19 PM
to Upspin
My setup:
OS: linux
ARCH: amd64
server: upspinserver-gcp running in the Oracle Cloud (migrated a few months ago from GCP).
storage: Google Cloud Storage.
mounted locally with upspinfs in $HOME/u.
cache: on.

A few days ago I wanted to delete some garbage from my Upspin to reduce my Google Cloud
billing costs. Before doing anything I backed up my whole user root (rodrig...@gmail.com/)
to my computer. After the backup I used `upspin audit` to delete the garbage. At first I ran
`scan-dir` before `scan-store` and it didn't work; then I read `upspin audit -help`, executed
the commands in the correct order, and it worked:

```
upspin -config=upspin/deploy/config audit scan-store
upspin audit scan-dir  rodrigos...@gmail.com
upspin audit scan-dir  rodrigosc...@gmail.com
upspin audit scan-dir rodrig...@gmail.com
upspin audit find-garbage
upspin -config=upspin/deploy/config audit delete-garbage
```
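
Before the final `delete-garbage` step, it can be useful to see how much the garbage list would remove. The following is only a minimal sketch: it assumes the garbage file written by `find-garbage` holds one `"REF" SIZE` pair per line (the layout shown in the excerpt later in this post), and the function name `garbage_summary` is made up for illustration.

```shell
# garbage_summary FILE
# Prints how many refs a garbage file lists and their total size in bytes.
# Assumption: each entry is a line of exactly two fields, "REF" SIZE.
# Lines that do not match that shape are ignored.
garbage_summary() {
    awk 'NF == 2 && $1 ~ /^".*"$/ && $2 ~ /^[0-9]+$/ { n++; total += $2 }
         END { printf "%d refs, %.0f bytes\n", n + 0, total + 0 }' "$1"
}
```

For example, `garbage_summary "$HOME/upspin/audit/garbage_upspin.schio.dev:443_1687638891"` would print the count and total for the garbage file mentioned below; a total that looks too large is a reason to stop before deleting.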

The command took some time to finish and seemed to work: it reduced the storage
size from ~6GB to ~3.5GB as expected, I was able to list and read all the files
I tried, and I could create a test file "rodrig...@gmail.com/hello.txt".

Today I tried to copy a file ("book.pdf") to "rodrig...@gmail.com/livros/" and
got an error:

```
$ cd $HOME/u/rodrig...@gmail.com/livros
$ cp /tmp/book.pdf .
cp: cannot create regular file './book.pdf': Input/output error
```

At first I was able to read the files inside `livros`, but then I noticed that was
only because of the cacheserver, so I set `cache: off` and restarted the `upspinfs` service.
After that, everything inside `livros` stopped working.

When I `ls` my root directory I got:

```
$ cd $HOME/u/rodrig...@gmail.com/
$ ls -lha
ls: cannot access 'livros': No such file or directory
total 80K
-rwx------ 1 rschio rschio   27 Nov  9  2022 Access
drwx------ 1 rschio rschio  55K Jun 25 14:29 Books
drwx------ 1 rschio rschio 6.3K Jun 25 14:29 certificados
drwx------ 1 rschio rschio 3.8K Jun  5 20:51 .config
drwx------ 1 rschio rschio 5.4K Jun 25 16:35 documentos
drwx------ 1 rschio rschio 1.6K Jun 25 16:35 exames
drwx------ 1 rschio rschio 3.1K Dec 29  2020 faculdade
drwx------ 1 rschio rschio  616 May 30  2021 Group
?????????? ? ?      ?         ?            ? livros
drwx------ 1 rschio rschio 1.1K Jun 24 18:51 Public
```

I also tried to list it with the `upspin ls` command, but got this:
```
$ upspin ls rodrig...@gmail.com/livros
upspin: ls: client.Lookup: rodrig...@gmail.com/livros:
        dir/remote("upspin.schio.dev:443").Lookup:
        dir/server.Lookup:
        store/remote("upspin.schio.dev:443").Get: fetching https://storage.googleapis.com/schio-dev-upspin/BCEBA32DD638D051FADF3C3884C9CEB0F14607FD53F7C99BFED0F254D9DF9E3D: 403 Forbidden
```

Then I went to Google Cloud Storage to check why it was returning 403: it was
because the object does not exist (it was probably deleted).

I tried some desperate things to fix the problem like creating an empty file
with name BCEBA32DD638D051FADF3C3884C9CEB0F14607FD53F7C99BFED0F254D9DF9E3D in
Cloud Storage, but it didn't work (the checksum was wrong).

I thought this ref was the `livros` directory ref or the `livros/Access` ref, which
would make it impossible to read the directory or to do a WhichAccess.

The content of "rodrig...@gmail.com/livros/Access" was:
```
read, list, create, write: rodrig...@gmail.com, rodrigosc...@gmail.com
```

In the file "$HOME/upspin/audit/garbage_upspin.schio.dev:443_1687638891" I found
the ref and, judging by the size, it is probably not the Access file:

$HOME/upspin/audit/garbage_upspin.schio.dev:443_1687638891
```
...
"BCEBA32DD638D051FADF3C3884C9CEB0F14607FD53F7C99BFED0F254D9DF9E3D" 50844
...
```

I searched the file "$HOME/upspin/audit/dir_upspin.schio.dev:443_rodrig...@gmail.com_1687717489"
for the same ref and didn't find it, but I did find the refs for both `livros` and `livros/Access`:

$HOME/upspin/audit/dir_upspin.schio.dev:443_rodrig...@gmail.com_1687717489
```
...
"37105CBAFCB681B7159252E3982F821BB21BF092A9636F7DD39BB9AF46618D71" 45662 "rodrig...@gmail.com/livros"
...
"4C34AE2E1E3CA17BCB5C58DFB5AD88E59F17627ABFC460C1269328D333A49ABA" 81 "rodrig...@gmail.com/livros/Access"
...
```
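
The manual search above can be scripted. This is a rough sketch, not part of `upspin audit`: it assumes every entry in the audit files starts with the ref in double quotes, as in the two excerpts, and the helper names `files_with_ref` and `refs_in_both` are hypothetical.

```shell
# files_with_ref REF FILE...
# Prints the names of the audit files that mention REF (quoted, as stored).
files_with_ref() {
    ref=$1
    shift
    grep -lF -- "\"$ref\"" "$@"
}

# refs_in_both GARBAGE_FILE DIR_FILE...
# Prints each ref that is listed in the garbage file AND in at least one
# directory scan file. Assumption: the ref is the first field on each line.
refs_in_both() {
    garbage=$1
    shift
    awk 'FILENAME == ARGV[1] { if ($1 ~ /^".*"$/) garbage[$1] = 1; next }
         ($1 in garbage) && !seen[$1]++ { gsub(/"/, "", $1); print $1 }' \
        "$garbage" "$@"
}
```

Running `files_with_ref` with the ref from the 403 error against the files in `$HOME/upspin/audit` would show which lists mention it. `refs_in_both` prints nothing when no ref in the garbage list is also present in the directory scans it is given.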

I noticed the same problem in one other directory; all the other directories and files
are working.

---------------
Summing up:

- First, I'm not sure the problem was caused by `upspin audit`. As I said at the beginning,
I migrated my Upspin server from one cloud to another, and it is possible I messed something up.

- I didn't lose my files because the backup and cacheserver saved me, so that's ok.

- The problem is that every time I `ls` my Upspin root dir (without cache) I get the broken directory
`livros`, and I am not able to delete it or interact with it in any way.


Is there a way to delete a DirEntry that points to an invalid block? Or is there a way to remove this
directory?

Any help is welcome, thanks.

- Rodrigo Schio

Albert-Jan de Vries

Oct 30, 2023, 11:21:17 AM
to Upspin
Hi Rodrigo,

Sorry to hear that your Upspin server is in a broken state. The upspin-audit command needs to be used with caution. I'm not sure what happened; the only thing I can imagine is that you didn't complete the dir scan for the right user?

Restoring the state is very difficult and probably means manual intervention and some coding. You didn't use a backup store when running the audit command?

Regards,
-- AJ
