Does Clair Garbage Collection delete old vulnerabilities?

Iain Duncan

May 3, 2023, 3:00:22 AM

to clair-dev

Hi,

We're having problems with the size of our database, and I came across this issue, which matches what we're seeing: a very large uo_vuln table (we have update_retention set to -1 to turn off GC):

https://github.com/quay/clair/issues/1362

The solution in the issue was to set update_retention and TRUNCATE the uo_vuln table. From the issue and the GC docs in claircore (https://github.com/quay/claircore/blob/3591883e16d41abc7348ed0059b02b1a3337a960/datastore/postgres/gc.go#L47-L55), my understanding is that Clair's garbage collection looks for the oldest update operations that were run to load vulnerabilities, then deletes each such update operation and the vulnerabilities associated with it. Is that understanding correct?

If so, does that mean that scanning an old image with old vulnerabilities would no longer find those vulnerabilities? That concern is why we turned off GC: we were worried that not all vulnerabilities would remain in the DB.

Would love a bit more detail on how the GC works so we can hopefully bring down our DB size!

Thanks,

Iain

Joseph Crosland

May 3, 2023, 11:41:23 AM

to clair-dev

To explain the current GC process, there are a couple of things to know: an update_operation is created each time an updater runs (i.e. when we deem that something has changed in a remote source), and the uo_vuln table holds the relationship between that update_operation and the rows of the vuln table.
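The relationship just described can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table names follow the thread, but every column, the `ON DELETE CASCADE` on `uo_vuln`, and the helper names `connect` and `record_update` are hypothetical, not claircore's actual schema or code.

```python
import sqlite3

# Hypothetical, heavily simplified schema: table names follow the thread, but the
# columns are assumptions for illustration, not claircore's actual definitions.
SCHEMA = """
CREATE TABLE update_operation (
    id      INTEGER PRIMARY KEY,
    updater TEXT NOT NULL,       -- which updater produced this run
    created TEXT NOT NULL        -- when the run happened
);
CREATE TABLE vuln (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE    -- stands in for the fields that identify a vuln
);
-- Join table: one row per (update operation, vulnerability) pair.
-- Assumed cascade: deleting an update_operation also deletes its uo_vuln rows.
CREATE TABLE uo_vuln (
    uo   INTEGER NOT NULL REFERENCES update_operation(id) ON DELETE CASCADE,
    vuln INTEGER NOT NULL REFERENCES vuln(id),
    PRIMARY KEY (uo, vuln)
);
"""


def connect():
    """Open an in-memory database with foreign keys enforced (off by default in SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_update(conn, updater, created, vuln_names):
    """Create a new update_operation and link every vuln seen in this run to it."""
    cur = conn.execute(
        "INSERT INTO update_operation (updater, created) VALUES (?, ?)",
        (updater, created),
    )
    uo_id = cur.lastrowid
    for name in vuln_names:
        # A vuln that already exists conflicts on the UNIQUE name and is reused.
        conn.execute("INSERT OR IGNORE INTO vuln (name) VALUES (?)", (name,))
        (vuln_id,) = conn.execute(
            "SELECT id FROM vuln WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO uo_vuln (uo, vuln) VALUES (?, ?)", (uo_id, vuln_id)
        )
    return uo_id
```

Recording two runs of a hypothetical updater, with feeds `["VULN-1", "VULN-2"]` and then `["VULN-1"]`, leaves two `vuln` rows but three `uo_vuln` rows: VULN-1 is stored once and linked to both update operations.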

When there is a change in a security feed, we pull in all the data and try to insert it again (these inserts can conflict if the vuln information is unchanged; this is expected). During this update we also associate every vuln we process, whether it already existed or not, with a newly created update_operation (through the uo_vuln table).
When the GC runs, it deletes old update operations: not all "unused" ones, but any update_operations older than the number retained by the update_retention config value. The lowest usable value is 2 update operations per updater, because the notifier needs them*.
Deleting an update_operation implicitly deletes its related uo_vuln rows.
Finally, the GC removes any orphaned vulns (vulns without a uo_vuln row).
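The update-then-GC cycle in the steps above can be modelled in a few lines of plain Python. This is a toy sketch, not claircore's implementation: the class and method names are hypothetical, "newer" is approximated by a higher update-operation id, and two behaviours are assumptions drawn from this thread (a negative update_retention disables GC, as with the -1 setting; values below 2 are treated as 2 because the notifier needs two update operations).

```python
from dataclasses import dataclass, field


@dataclass
class ToyMatcherDB:
    """In-memory stand-in for update_operation, uo_vuln and vuln (hypothetical, not claircore)."""

    next_uo: int = 1
    update_ops: dict = field(default_factory=dict)  # uo id -> updater name
    uo_vuln: set = field(default_factory=set)       # {(uo id, vuln name)}
    vulns: set = field(default_factory=set)         # vuln names

    def run_updater(self, updater, feed):
        """Create a new update operation and associate *every* vuln in the feed with it."""
        uo = self.next_uo
        self.next_uo += 1
        self.update_ops[uo] = updater
        for name in feed:
            self.vulns.add(name)          # already-known vulns are reused, not duplicated
            self.uo_vuln.add((uo, name))
        return uo

    def gc(self, update_retention):
        """Delete update operations beyond the newest `update_retention` per updater."""
        if update_retention < 0:
            return set()                  # assumption: negative value disables GC
        keep = max(update_retention, 2)   # assumption: never keep fewer than 2 (notifier)
        by_updater = {}
        for uo, updater in self.update_ops.items():
            by_updater.setdefault(updater, []).append(uo)
        expired = set()
        for ops in by_updater.values():
            ops.sort(reverse=True)        # in this toy model a higher id means newer
            expired.update(ops[keep:])
        # Step 1: delete the old update operations.
        for uo in expired:
            del self.update_ops[uo]
        # Step 2: their uo_vuln rows go with them (a cascade in a real database).
        self.uo_vuln = {(uo, v) for (uo, v) in self.uo_vuln if uo not in expired}
        # Step 3: remove orphaned vulns, i.e. vulns with no remaining uo_vuln row.
        referenced = {v for (_, v) in self.uo_vuln}
        self.vulns &= referenced
        return expired
```

Running a hypothetical updater three times, where VULN-2 drops out of the feed after the first run, and then calling `gc(2)` expires only the first update operation. VULN-2 loses its last uo_vuln row and is removed, while VULN-1 survives because every run re-associated it with a newer update operation.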

So, in answer to your question: yes, the GC will eventually delete vulnerabilities that no longer exist in external sources. This is the desired behaviour: it prevents the matcher DB from growing continually while still providing an accurate picture of what our sources are saying. Note, however, that if a vulnerability still exists in the remote source, it will keep being associated with the newest update_operation and won't be GC'd.

* The notifier needs 2 update_operations to be able to compute the delta between two updates; this delta shows which vulns were added and can be used to trigger notifications.
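Why two update operations are the minimum can be illustrated with a set difference over uo_vuln-style rows. This is a minimal sketch, not the notifier's actual implementation: the function names and the `(uo id, vuln name)` row shape are hypothetical. With only one retained update operation there would be no previous set to compare against.

```python
def vulns_for(uo_vuln, uo):
    """All vuln names linked to one update operation via (uo, vuln) rows."""
    return {vuln for (op, vuln) in uo_vuln if op == uo}


def diff_update_operations(uo_vuln, previous_uo, current_uo):
    """Return (added, removed) vulns between two update operations of one updater."""
    previous = vulns_for(uo_vuln, previous_uo)
    current = vulns_for(uo_vuln, current_uo)
    return current - previous, previous - current


# Hypothetical rows: VULN-3 appeared in the second run and VULN-2 disappeared.
rows = {(1, "VULN-1"), (1, "VULN-2"), (2, "VULN-1"), (2, "VULN-3")}
added, removed = diff_update_operations(rows, 1, 2)
print(sorted(added), sorted(removed))  # ['VULN-3'] ['VULN-2']
```

The "added" set is the part relevant to notifications as described above; the "removed" set is included only to show that the same comparison yields both directions.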

Iain Duncan

May 4, 2023, 6:24:54 AM

to clair-dev

Thanks for the detailed explanation! The part I was missing was that the updaters associate *all* the current vulnerabilities with the new update operation, so current vulnerabilities won't be deleted. Now the GC makes complete sense to me, and I'll get it turned on.