Rebuild from old blobs

Matthias Teege

May 25, 2018, 8:10:24 AM
to per...@googlegroups.com
Hello,

I've copied blobs from an old camlistore server to a new one. After
running perkeepd -reindex, I can retrieve blobs with "pk get $HASH".

In the web UI I see a lot of folder icons, but if I select one I
get "No results found". I can "pk get $HASH", but searching for
"ref:$HASH" in the UI finds nothing.

How do I import the data correctly?

Thanks
Matthias

Mathieu Lonjaret

May 25, 2018, 9:53:43 AM
to per...@googlegroups.com
Hi,

Did you change anything about your config? If you're using a different
config file or config dir, did you make sure to copy over your GPG key
as well?
Do the server logs say anything relevant? What about the JavaScript
console in your browser when you try to browse through the web UI?

Regards,
Mathieu
> --
> You received this message because you are subscribed to the Google Groups "Perkeep" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to perkeep+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Matthias Teege

May 25, 2018, 11:44:11 AM
to per...@googlegroups.com
On Fri, May 25, 2018 at 03:53:20PM +0200, Mathieu Lonjaret wrote:

Hi,

> Did you change anything about your config? If you're using a different
> config file or config dir, did you make sure to copy over your GPG key
> as well?

No, I did not copy the GPG key from the old setup. My goal is to
merge three different perkeep/camlistore instances, and I assumed
that I could rebuild the index from the blobs alone. That would be nice
for a long-term archive. :)

> Do the server logs say anything relevant? What about the javascript

Looks like the reindexing stops after some blobs. The old blobs are
hashed with SHA1. The blobdir looks like this:

drwx------. 3 mt user 4096 May 25 15:57 cache
-rw-------. 1 mt user 707 May 25 10:40 GENERATION.dat
drwx------. 3 mt user 4096 May 25 15:57 packed
drwx------. 258 mt user 4096 May 25 11:18 sha1
drwx------. 15 mt user 4096 May 25 13:44 sha224
drwx------. 2 mt user 4096 May 25 15:57 sync-to-index-queue.leveldb
drwx------. 2 mt user 4096 May 25 15:57 thumbmeta.leveldb

% perkeepd -reindex -keep-going
2018/05/25 15:57:25 Starting perkeepd version master, 2018-05-24-a140b28f5d; Go go1.10.2 (linux/amd64)
2018/05/25 15:57:25 Starting to listen on http://localhost:3179
2018/05/25 15:57:25 Index wiped.
2018/05/25 15:57:25 blobpacked: checking integrity of packed blobs against index...
2018/05/25 15:57:25 blobpacked: 0 large blobs found in index, 0 missing from index
2018/05/25 15:57:25 Reindexing /index/ ...
2018/05/25 15:57:25 Rebuilding index...
2018/05/25 15:57:25 Reindexing at sha1-00010ede8ad52b7058b0f13edda52e9e507aa9a5
2018/05/25 15:57:26 Reindexing at sha1-0b39b8fb29f165151e6795605e50576e4815b080
2018/05/25 15:57:27 Reindexing at sha1-153b36dd882c715c88b074f38f8282795ef59243
2018/05/25 15:57:28 Reindexing at sha1-191022a84941213748f09578ef2da503ab704c79
2018/05/25 15:57:29 Reindexing at sha1-2504c9a027481489afd5e8e2b79ff9d704d771b0
2018/05/25 15:57:30 Reindexing at sha1-31338a3e59e19dbe72b1ddbb59650d2b406fd12b
2018/05/25 15:57:31 Reindexing at sha1-3932f036baa715a149ced9152db608ed981ead20
2018/05/25 15:57:32 Reindexing at sha1-3ff471a21a2e03e8ec0cb234f90fbd5110abe9b3
2018/05/25 15:57:33 Reindexing at sha1-4bd5fdb2ad4020aeac3cedd958cd0cdb9aab4682
2018/05/25 15:57:34 Reindexing at sha1-548adc982d90338bef6cfe91c49d4ff33691c997
2018/05/25 15:57:35 Reindexing at sha1-5e70312c9cb86fa3a37610659da298a700cfaab1
2018/05/25 15:57:36 Reindexing at sha1-6a970e17001b5ad2e6101dce2435de704bc572b4
2018/05/25 15:57:37 Reindexing at sha1-759122320b96ae9e025cff107066b4277b6f19b3
2018/05/25 15:57:38 Reindexing at sha1-7aca3dc69f464e777b422cdc616bc8998c55f6ca
2018/05/25 15:57:39 Reindexing at sha1-85f0e32697f2a06e3541390626467d225e45758b
2018/05/25 15:57:40 Reindexing at sha1-8ffe939a78c9a475ba6246e8a22946d964bfddb8
2018/05/25 15:57:41 Reindexing at sha1-974f356b03601dd477601b3d773b7ba450ad3a0f
2018/05/25 15:57:42 Reindexing at sha1-a229dc412e621bdeff74bfff6e1f80707f7df214
2018/05/25 15:57:43 Reindexing at sha1-ab87b524eea2bf4dd6e904cc2e87f7bf749481e2
2018/05/25 15:57:44 Reindexing at sha1-b43e7b2a750b4c1f49f50024f989e7be3694dd03
2018/05/25 15:57:45 Reindexing at sha1-b9e2ffd75055d922760e5c983906f0ce1c61895b
2018/05/25 15:57:46 Reindexing at sha1-c3f69a5086b391794f835ced8faba56700b2ef7e
2018/05/25 15:57:47 Reindexing at sha1-ccc22e2409bc21c1f4ae1a8d04154508507f7065
2018/05/25 15:57:48 Reindexing at sha1-d608a81e426c3a2e8757068c6741fa6f946f9a65
2018/05/25 15:57:49 Reindexing at sha1-df5e761f6edfd8413f69745c3bc306e77c4d876f
2018/05/25 15:57:50 Reindexing at sha1-e91c136ee0b6623c4b76cae7970236a80a30c918
2018/05/25 15:57:51 Reindexing at sha1-f1721c26cc21b41f41be41ef376b0238fa6a37fe
2018/05/25 15:57:52 Reindexing at sha1-fa8f698d8452a7fc752c5a36e8e7a19326cc4999
2018/05/25 15:57:53 Index rebuild complete.

That's all, but there are a lot more blobs.

% pk list | grep sha1 | wc -l
104961

% pk list | grep sha224 | wc -l
13

There is no explicit error message.
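The two counts above can also be taken in a single pass. The sketch below uses a hypothetical shell helper (count_by_hash is not a Perkeep command). It assumes that each line of pk list output begins with a blobref such as sha1-... or sha224-..., which the grep counts above suggest.

```shell
# count_by_hash (hypothetical helper): tally blobrefs read on stdin by hash name.
# Assumes every non-empty line starts with a blobref like "sha224-<hex>".
count_by_hash() {
  # Split on "-": the first field is the hash name ("sha1", "sha224", ...).
  # Empty lines have no fields and are skipped; output is sorted by hash name.
  awk -F- 'NF { n[$1]++ } END { for (h in n) print h, n[h] }' | sort
}
```

Usage: pk list | count_by_hash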

Thanks,
Matthias

Mathieu Lonjaret

May 25, 2018, 12:07:09 PM
to per...@googlegroups.com, Matthias Teege
On 25 May 2018 at 17:44, Matthias Teege <matthia...@mteege.de> wrote:
> On Fri, May 25, 2018 at 03:53:20PM +0200, Mathieu Lonjaret wrote:
>
> Hi,
>
>> Did you change anything about your config? If you're using a different
>> config file or config dir, did you make sure to copy over your GPG key
>> as well?
>
> No, I did not copy the GPG key from the old setup. My goal is to
> merge three different perkeep/camlistore instances and I assumed,
> that I can rebuild the index from the blobs only. Would be nice for
> a long term archive. :)

Your GPG identity is your proof of ownership. The blobs and files
themselves are still accessible wherever you move them, but the index
respects the ownership of any mutation made, i.e. of any claim or
permanode created. So if you're using your new perkeepd with a new GPG
key, when you're browsing with the web UI, you're asking the index to
show you the permanodes that you (i.e. the new GPG key) own. It cannot
show you the old permanodes, because you're no longer their owner.
Long story short: if you've lost your old GPG keys, you still have your
contents/files/data, but you'll have to recreate all the permanodes
for them with the new GPG key.

> 2018/05/25 15:57:51 Reindexing at sha1-f1721c26cc21b41f41be41ef376b0238fa6a37fe
> 2018/05/25 15:57:52 Reindexing at sha1-fa8f698d8452a7fc752c5a36e8e7a19326cc4999
> 2018/05/25 15:57:53 Index rebuild complete.

Yeah, I would guess these are only the blobs you created with the new
GPG identity.

> Thats all but there are a lot more blobs.
>
> % pk list | grep sha1 | wc -l
> 104961
>
> % pk list | grep sha224 | wc -l
> 13
>
> There is no explicit error message.
>
> Thanks,
> Matthias
>

Matthias Teege

May 25, 2018, 5:54:34 PM
to per...@googlegroups.com
On Fri, May 25, 2018 at 06:06:46PM +0200, Mathieu Lonjaret wrote:

Hi,

> > No, I did not copy the GPG key from the old setup. My goal is to

> Long story short, if you've lost your old GPG keys you still have your
> contents/files/data, but you'll have to recreate all the permanodes
> for them with the new GPG key.

Thank you for clarifying. Is it possible to use more than one
identity on a single perkeep instance?

> > 2018/05/25 15:57:51 Reindexing at sha1-f1721c26cc21b41f41be41ef376b0238fa6a37fe
> > 2018/05/25 15:57:52 Reindexing at sha1-fa8f698d8452a7fc752c5a36e8e7a19326cc4999
> > 2018/05/25 15:57:53 Index rebuild complete.
>
> yeah I would guess these are only the blobs your created with the new
> GPG identity.

Hmm, the new instance was almost empty and, as far as I know, uses
sha224 by default. I've copied the old identity to the new server.
Starting perkeepd gives me:

2018/05/25 23:04:31 Rebuilding index...
2018/05/25 23:04:31 Reindexing at sha1-00010ede8ad52b7058b0f13edda52e9e507aa9a5
2018/05/25 23:04:32 Reindexing at sha1-0b99792ba9b537fa3622da2346f72d5f9e8dba7f
2018/05/25 23:04:33 Reindexing at sha1-14d61d0fc34249b8eb7b7c74b7c8d348c6c74433
2018/05/25 23:04:34 Reindexing at sha1-181456b2020e2ce506fd7499f60c705f2574a87b
2018/05/25 23:04:35 Reindexing at sha1-23c4eca8533e2ccb45b9d5847e722c0c246eecd4
2018/05/25 23:04:36 Reindexing at sha1-2ef4791b2fb5f75d459c6f77293bfdbdab52bf2a
2018/05/25 23:04:37 Reindexing at sha1-37a84014711bfb20d5ddffc3512e1deba2d5eac6
2018/05/25 23:04:38 Reindexing at sha1-3f798547f2e8390059d7d9ea0794cc6c26544c0b
2018/05/25 23:04:39 Reindexing at sha1-485fd48855dbdaad37136dacd677960804559b6d
2018/05/25 23:04:40 Reindexing at sha1-52cf179d7a859e99bda3c8b60abbd1a7290ec2cf
2018/05/25 23:04:41 Reindexing at sha1-5d151bb17854d334983ab1282a1f557e09ee7574
2018/05/25 23:04:42 Reindexing at sha1-6777b372275c63f2cb5b065997f975cb146de318
2018/05/25 23:04:43 Reindexing at sha1-7321695a0a96675daea6b797f88ff09c9fa16cc6
2018/05/25 23:04:44 Reindexing at sha1-77cabbdf1c8e9b14028a7aa1b8b9b4b3c30de2d1
2018/05/25 23:04:45 Reindexing at sha1-82365f43b8decb3b8d27ad525e0b16ffed778ee6
2018/05/25 23:04:46 Reindexing at sha1-8d2be4d6240f9ba09919cf4fa25b195c2edbed2d
2018/05/25 23:04:47 Reindexing at sha1-95d6ae4d75aea7e87fb25e9c459855a2deb88aa1
2018/05/25 23:04:48 Reindexing at sha1-9f2056294c8ca7e090c773b2c1c1df4832debc44
2018/05/25 23:04:49 Reindexing at sha1-a7c8ed7152ae01550a6d2514e2a890cf3eaf1896
2018/05/25 23:04:50 Reindexing at sha1-b07eed218c799119aa90de682d396669bcd4eec5
2018/05/25 23:04:52 Reindexing at sha1-b9de4448fd0f4b0e2cfa81c1322fd948cc0ce8e9
2018/05/25 23:04:53 Reindexing at sha1-c3ef5caf71ab2b4c82e144092e205ae9f54efc92
2018/05/25 23:04:54 Reindexing at sha1-cc4b61649d4975d40fd3c52fb92c1107ca1c4e13
2018/05/25 23:04:55 Reindexing at sha1-d545959e2c0b349cb3df568190734f15abba0689
2018/05/25 23:04:56 Reindexing at sha1-decccf7b5df4ed52af02c4b4e49421b41f57ca82
2018/05/25 23:04:57 Reindexing at sha1-e70c1ffab2668845a68caff5a26a98b3564b63ba
2018/05/25 23:04:58 Reindexing at sha1-efe29d61f9e4de554c9950d7ea661e5cba61646d
2018/05/25 23:04:59 Reindexing at sha1-f8087c1297b1fa2c40ab531214fe8d5a0363e9a3
2018/05/25 23:05:00 Index rebuild complete.
2018/05/25 23:05:00 index/corpus: loading into memory...

There are more than 100,000 blobs. Shouldn't I see each blob during
reindexing?

This is the stats output from perkeep:

2018/05/25 23:39:06 index/corpus: stats: 54.149 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image), ...)

If I search I only see some objects:

$ pk search 'is:image' | grep '"blob"' | wc -l
5

$ pk search '{"camliType":"file"}' | grep '"blob"' | wc -l
200

$ pk search 'filename:*' | grep '"blob"' | wc -l
200

$ pk search '{"camliType":"permanode"}' | grep '"blob"' | wc -l
200

Matthias

Brad Fitzpatrick

May 25, 2018, 7:06:42 PM
to per...@googlegroups.com, Matthias Teege
It's just showing where it's "at" periodically. Note that the log lines are each 1 second apart.
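Brad's point can be illustrated with a throttled progress log. The loop handles every blob, but it prints the blob it is currently at only when the wall-clock second changes. This is a hypothetical shell sketch of the idea, not perkeepd's actual code. Judging by the timestamps above, the first rebuild covered roughly 105,000 blobs in about 28 seconds, so only one line per few thousand blobs reached the log. The logged refs also ascend, which suggests the enumeration runs in sorted blobref order.

```shell
# log_progress (hypothetical helper): consume one blobref per line on stdin and
# print "Reindexing at <ref>" at most once per wall-clock second, then a total.
log_progress() {
  last=""
  n=0
  while IFS= read -r ref; do
    n=$((n + 1))
    now=$(date +%s)              # current Unix time in whole seconds
    if [ "$now" != "$last" ]; then
      echo "Reindexing at $ref"  # first blob seen in this second
      last=$now
    fi
  done
  echo "Processed $n blobs."
}
```

For example, piping seq 1 200 | sed 's/^/sha1-/' into log_progress handles all 200 refs but typically prints only one or two progress lines. Calling date for every line is slow. A real implementation would check the clock in-process instead.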


Mathieu Lonjaret

May 25, 2018, 7:49:04 PM
to per...@googlegroups.com, Matthias Teege
On 25 May 2018 at 23:54, Matthias Teege <matthia...@mteege.de> wrote:
> On Fri, May 25, 2018 at 06:06:46PM +0200, Mathieu Lonjaret wrote:
>
> Hi,
>
>> > No, I did not copy the GPG key from the old setup. My goal is to
>
>> Long story short, if you've lost your old GPG keys you still have your
>> contents/files/data, but you'll have to recreate all the permanodes
>> for them with the new GPG key.
>
> Thank you for clarifying. Is it possible to use more then one
> identity on a single perkeep instance?

Not at the same time, no. Nothing prevents you from storing blobs from
different identities in the same blobserver, but you have to start a
different perkeepd for each identity (hence why you specify an
identity in the config file).
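
> This is the stats output from perkeep:
>
> 2018/05/25 23:39:06 index/corpus: stats: 54.149 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image), ...)

Yep, that looks good, no?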

> If I search I only see some objects:
>
> $ pk search 'is:image' | grep '"blob"' | wc -l
> 5
>
> $ pk search '{"camliType":"file"}' | grep '"blob"' | wc -l
> 200
>
> $ pk search 'filename:*' | grep '"blob"' | wc -l
> 200
>
> $ pk search '{"camliType":"permanode"}' | grep '"blob"' | wc -l
> 200

pk uses the client config file. Have you also set the correct identity
in your client config file?

Brad Fitzpatrick

May 26, 2018, 12:15:44 AM
to per...@googlegroups.com, Matthias Teege
On Fri, May 25, 2018 at 4:48 PM, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
On 25 May 2018 at 23:54, Matthias Teege <matthia...@mteege.de> wrote:
> On Fri, May 25, 2018 at 06:06:46PM +0200, Mathieu Lonjaret wrote:
>
> Hi,
>
>> > No, I did not copy the GPG key from the old setup. My goal is to
>
>> Long story short, if you've lost your old GPG keys you still have your
>> contents/files/data, but you'll have to recreate all the permanodes
>> for them with the new GPG key.
>
> Thank you for clarifying. Is it possible to use more then one
> identity on a single perkeep instance?

Not at the same time, no. Nothing prevents you from storing blobs from
different identities in the same blobserver, but you have to start a
different perkeepd for each identity (hence why you specify an
identity in the config file).

Well, we could support letting you list multiple identities in the config file to treat as yourself for your own search queries.

We'll probably need that over time for key rotation and other such key loss events.

File a bug to track that?


Matthias Teege

May 28, 2018, 5:14:59 AM
to per...@googlegroups.com
On Sat, May 26, 2018 at 01:48:40AM +0200, Mathieu Lonjaret wrote:

Hi,

> On 25 May 2018 at 23:54, Matthias Teege <matthia...@mteege.de> wrote:
> > On Fri, May 25, 2018 at 06:06:46PM +0200, Mathieu Lonjaret wrote:
> >
> > Thank you for clarifying. Is it possible to use more then one
> > identity on a single perkeep instance?
>
> Not at the same time, no. Nothing prevents you from storing blobs from
> different identities in the same blobserver, but you have to start a
> different perkeepd for each identity (hence why you specify an
> identity in the config file).

Ah, that's okay. I think I should use different indexes too?

> > This is the stats output from perkeep:
> >
> > 2018/05/25 23:39:06 index/corpus: stats: 54.149 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image), ...)

> Yep, that looks good, no?

Yes, perfect :) but I'm not sure if I can retrieve all images and files.

> pk uses the client config file. Have you also set the correct identity
> in your client config file?

Yes, I've checked it again:

New identity (almost empty):

$ jq -r .identity .config/perkeep/server-config.json
B18487E09E6A321F
$ jq -r .identity .config/perkeep/client-config.json ~
B18487E09E6A321F
$ perkeepd -reindex
2018/05/28 10:26:51 index/corpus: stats: 53.993 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image), ...)
$ pk search 'is:image' | jq -r '.blobs[].blob' | wc -l
1

That's OK. There is only one image for that identity.

For the old identity it looks like this:

$ jq -r .identity .config/perkeep/server-config.json
4F6E82AB
$ jq -r .identity .config/perkeep/client-config.json
4F6E82AB
$ perkeepd -reindex
2018/05/28 10:16:05 index/corpus: stats: 47.446 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image), ...)
$ pk search 'is:image' | jq -r '.blobs[].blob' | wc -l
5

I'm not sure about the "missing" 311 images.

I've checked for another identity:

$ grep -r camliSigner blobs/ | cut -d ':' -f3 | sort | uniq -c
7 "sha1-1db6fc5e7add169face657fb6e53c6595dce0948"
4462 "sha1-1db6fc5e7add169face657fb6e53c6595dce0948",
4 "sha224-3368771ade1c68d6dc2444973be24e3bc99b0083fc33af8e82f28f4c",

but there are only two.
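The two sha1-1db6... rows above differ only by a trailing comma. The value sits on a line ending in "," in some blobs and without one in others, so cut and uniq report the same signer twice. A sketch that extracts only the quoted value avoids this. count_signers is a hypothetical helper with three assumptions: schema blobs are stored as plain-text JSON containing a "camliSigner": "<blobref>" field, binary files are skipped, and blobs that exist only inside packed/ archives will not be seen.

```shell
# count_signers DIR (hypothetical helper): print "<signer blobref> <count>"
# for every camliSigner value found in plain-text files under DIR.
count_signers() {
  # -r recurse, -h no filenames, -o only the match, -I skip binary files.
  grep -rhoIE '"camliSigner": *"[^"]+"' "$1" |
    sed -E 's/.*"([^"]+)"$/\1/' |   # keep only the blobref inside the quotes
    sort | uniq -c |
    awk '{ print $2, $1 }'          # reorder as "blobref count"
}
```

Usage: count_signers blobs/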

Matthias

Mathieu Lonjaret

May 28, 2018, 10:22:29 AM
to per...@googlegroups.com
On 28 May 2018 at 11:14, Matthias Teege <matthia...@mteege.de> wrote:
> On Sat, May 26, 2018 at 01:48:40AM +0200, Mathieu Lonjaret wrote:
>
> Hi,
>
>> On 25 May 2018 at 23:54, Matthias Teege <matthia...@mteege.de> wrote:
>> > On Fri, May 25, 2018 at 06:06:46PM +0200, Mathieu Lonjaret wrote:
>> >
>> > Thank you for clarifying. Is it possible to use more then one
>> > identity on a single perkeep instance?
>>
>> Not at the same time, no. Nothing prevents you from storing blobs from
>> different identities in the same blobserver, but you have to start a
>> different perkeepd for each identity (hence why you specify an
>> identity in the config file).
>
> Ah, that's okay. I think I should use different indexes too?

Yes.

> I'm not sure about the "missing" 311 images.

OK. I suppose it could be another bug with the switch to sha224, or
the switch to using the long form of the key ID fingerprint. Let me
think about how to debug that further.
What kind of index are you using? MySQL, LevelDB, or something else?

> I've checked for another identity:
>
> $ grep -r camliSigner blobs/ | cut -d ':' -f3 | sort | uniq -c ~
> 7 "sha1-1db6fc5e7add169face657fb6e53c6595dce0948"
> 4462 "sha1-1db6fc5e7add169face657fb6e53c6595dce0948",
> 4 "sha224-3368771ade1c68d6dc2444973be24e3bc99b0083fc33af8e82f28f4c",
>
> but there are only two.
>
> Matthias
>

Mathieu Lonjaret

May 28, 2018, 10:53:15 AM
to per...@googlegroups.com
Wait, I've just realized the answer could be very simple. Did you make
permanodes for all these files and images?
The 'is:image' query looks for permanodes whose camliContent is an
image, not for images by themselves. So if you stored these 311 images
without making permanodes for them, it's normal that the index finds
them but the search query does not.

Mathieu Lonjaret

May 28, 2018, 11:04:33 AM
to per...@googlegroups.com
To specifically look for file schemas (and see whether we do find your
36974), you can do this instead:
pk search -rawquery '{"constraint": {"camliType": "file"}, "Limit": -1}'
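When counting results without jq, match the "blob" key exactly. A bare grep blob also matches the enclosing "blobs" key line. That is likely why the grep blob counts later in this thread come out one higher than the other figures (31085 file results vs. the 31084 reported by jq, and 1065 permanodes vs. 1064 in the corpus stats). Separately, the earlier searches all stopped at exactly 200 results, which is consistent with a default result limit that "Limit": -1 lifts. The helpers below are hypothetical. They assume pk search prints indented JSON with one "blob" key per line.

```shell
# rawquery_for TYPE (hypothetical helper): build a raw search query for one
# camliType with "Limit": -1 so the result list is not truncated.
rawquery_for() {
  printf '{"constraint": {"camliType": "%s"}, "Limit": -1}' "$1"
}

# count_blobs (hypothetical helper): count result entries in pk search output.
# Matches the exact "blob": key, so the enclosing "blobs": [ line is not counted.
count_blobs() {
  grep -c '"blob":'
}
```

Usage: pk search -rawquery "$(rawquery_for file)" | count_blobs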

Matthias Teege

May 28, 2018, 11:26:46 AM
to per...@googlegroups.com
On Mon, May 28, 2018 at 04:22:06PM +0200, Mathieu Lonjaret wrote:

Hi,

> On 28 May 2018 at 11:14, Matthias Teege <matthia...@mteege.de> wrote:
> > I'm not sure about the "missing" 311 images.
>
> Ok. I suppose it could be another bug with the switch to sha224, or
> the switch to using the long form of the key ID fingerprint. Let me
> think on how to debug that further.
> What kind of index are you using? MySQL, or LevelDB, or ... ?

I use a LevelDB (default) index.

Thanks,
Matthias

Matthias Teege

May 28, 2018, 11:39:09 AM
to per...@googlegroups.com
On Mon, May 28, 2018 at 05:04:11PM +0200, Mathieu Lonjaret wrote:

Hi,

> On 28 May 2018 at 16:52, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
> > On 28 May 2018 at 16:22, Mathieu Lonjaret <mathieu....@gmail.com> wrote:

> >>> I'm not sure about the "missing" 311 images.
> >
> > wait, I've just realized the answer could be very simple. Did you make
> > permanodes for all these files and images?
> > the 'is:image' query is looking for permanodes that have a
> > camliContent image, not images by themselves. So if you stored these
> > 311 images without making permanodes for them, it's normal that the
> > index finds them, but that the search query does not.
>
> To specifically look for file schemas (and see if we do find your
> 36974), you can do this instead :
> pk search -rawquery '{"constraint": {"camliType": "file"}, "Limit": -1}'

ok:

$ pk search -rawquery '{"constraint": {"camliType": "file"}, "Limit": -1}' | grep blob | wc -l
31085

$ pk search -rawquery '{"constraint": {"camliType": "permanode"}, "Limit": -1}' | grep blob | wc -l
1065

Not exactly :)
Matthias

Mathieu Lonjaret

May 28, 2018, 12:08:20 PM
to per...@googlegroups.com
On 28 May 2018 at 17:39, Matthias Teege <matthia...@mteege.de> wrote:
> On Mon, May 28, 2018 at 05:04:11PM +0200, Mathieu Lonjaret wrote:
>
> Hi,
>
>> On 28 May 2018 at 16:52, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
>> > On 28 May 2018 at 16:22, Mathieu Lonjaret <mathieu....@gmail.com> wrote:
>
>> >>> I'm not sure about the "missing" 311 images.
>> >
>> > wait, I've just realized the answer could be very simple. Did you make
>> > permanodes for all these files and images?
>> > the 'is:image' query is looking for permanodes that have a
>> > camliContent image, not images by themselves. So if you stored these
>> > 311 images without making permanodes for them, it's normal that the
>> > index finds them, but that the search query does not.
>>
>> To specifically look for file schemas (and see if we do find your
>> 36974), you can do this instead :
>> pk search -rawquery '{"constraint": {"camliType": "file"}, "Limit": -1}'
>
> ok:
>
> $ pk search -rawquery '{"constraint": {"camliType": "file"}, "Limit": -1}' | grep blob | wc -l
> 31085

I think the corpus stats line counts directories as "files" as well.
So let's see if the ~6000 discrepancy comes from there?
pk search -rawquery '{"constraint": {"camliType": "directory"}, "Limit": -1}' | jq -r '.blobs[].blob' | wc -l

> $ pk search -rawquery '{"constraint": {"camliType": "permanode"}, "Limit": -1}' | grep blob | wc -l
> 1065
>
> Not exactly :)
> Matthias
>

Simon B.

May 29, 2018, 1:25:49 AM
to per...@googlegroups.com
Now how about orphans or meta blobs?

All this ought to be part of pk geekout or pk overview or something. Some blob type and size overview.

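The 'is:image' query finds permanodes that have a camliContent image, not images by themselves.

To look for file schemas:

pk search -rawquery '{"constraint": {"camliType": "file"}, "Limit": -1}'

To count directories:

pk search -rawquery '{"constraint": {"camliType": "directory"}, "Limit": -1}' | jq -r '.blobs[].blob' | wc -l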

Matthias Teege

May 29, 2018, 6:01:44 AM
to per...@googlegroups.com
On Mon, May 28, 2018 at 06:07:57PM +0200, Mathieu Lonjaret wrote:

Hi,

> I think the corpus stats line counts directories as "files" as well.
> So let's see if the ~6000 discrepancy comes from there?
> pk search -rawquery '{"constraint": {"camliType": "directory"},
> "Limit": -1}' | jq -r '.blobs[].blob' | wc -l

it matches:

index/corpus: stats: 47.300 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image)

$ pk search -rawquery '{"constraint": {"camliType": "directory"},"Limit": -1}' | jq -r '.blobs[].blob' | wc -l
5890
$ pk search -rawquery '{"constraint": {"camliType": "file"},"Limit": -1}' | jq -r '.blobs[].blob' | wc -l
31084
$ echo '5890 31084 + p' | dc
36974
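The same reconciliation can be checked against the corpus stats line directly. stats_count is a hypothetical helper. It assumes the stats format shown in this thread, where each count is followed by a kind such as "file", "permanode", or "image", and it pulls out the number in front of the requested kind.

```shell
# stats_count KIND (hypothetical helper): read an index/corpus stats line on
# stdin and print the number written just before KIND, e.g. "36974 file".
stats_count() {
  # The number must follow a space or "(" and KIND must be followed by a
  # space, comma, or ")", so only the "<count> KIND" pair is captured.
  sed -nE "s/.*[ (]([0-9]+) $1[ ,)].*/\1/p"
}

# Directories (5890) plus files (31084) should equal the "file" total.
stats='index/corpus: stats: 47.300 MiB mem: 104975 blobs (1.686 GiB) (58510 schema (1064 permanode, 36974 file (317 image)'
total=$(printf '%s\n' "$stats" | stats_count file)
if [ $((5890 + 31084)) -eq "$total" ]; then
  echo "directory + file results match the corpus file count ($total)"
fi
```

The same call with permanode or image reads the other counts, for example when comparing them with the -rawquery results.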

Perfect! I can now entrust my data to perkeep. :-)

Thank you for your patience.
Matthias

Sandy Tse

May 29, 2018, 2:18:01 PM
to per...@googlegroups.com


Mathieu Lonjaret

May 29, 2018, 2:21:28 PM
to per...@googlegroups.com
Sorry all, I mis-clicked and marked this one as valid instead of as spam.

On 29 May 2018 at 10:50, Sandy Tse <sandy...@gmail.com> wrote:
> https://www.youtube.com/watch?v=bvikgvuOjps

Mathieu Lonjaret

May 29, 2018, 2:22:39 PM
to per...@googlegroups.com, Matthias Teege
Excellent, thanks.