This is a very good question! Data lifecycle management is an important decision.
You have to think about what you want to do with the data that you collect. I think of the endpoint as the source of truth for all data; all we are doing is querying that data at a point in time. So the data that Velociraptor collects on the server is like a snapshot: it is temporary and useful in the short term, but it goes out of date quickly.
For example, instead of diving back through data I collected a year ago, I can just recollect it from current systems today in about 2 minutes. Not only does year-old data have little value (things tend to change quickly with a real compromise), but it also takes a lot of effort to churn through the old data, because we have to do it in a single place (the server) rather than spreading the work across the endpoints.
So maybe retaining data for a long time is not that useful. There are PII considerations too. We have artifacts such as Server.Utils.BackupS3 to back up data to cheaper storage if you really need to keep it.
You can also write an artifact that deletes old flows (collections). The majority of space is taken up by collections. A hunt is just a set of collections so the hunt itself does not store much data, just a list of collections on clients. Going through all collections and deleting old ones is easy to do with VQL.
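To make the retention idea concrete, here is a rough sketch of the selection step in Python rather than VQL. It only shows the logic of picking collections older than a cutoff and reporting them as a dry run. The `Flow` record, the `expired_flows` helper and the 90-day window are all hypothetical names and values chosen for illustration; they are not part of Velociraptor.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Flow:
    """Hypothetical stand-in for one collection (flow) record."""
    client_id: str
    flow_id: str
    created: datetime


def expired_flows(flows, max_age, now):
    """Return the flows that were created more than max_age before now."""
    cutoff = now - max_age
    return [f for f in flows if f.created < cutoff]


# Made-up sample data: one old collection and one recent collection.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
flows = [
    Flow("C.1", "F.old", now - timedelta(days=400)),
    Flow("C.1", "F.new", now - timedelta(days=2)),
]

# Dry run: report what would be deleted, without deleting anything.
for flow in expired_flows(flows, timedelta(days=90), now):
    print(f"would delete {flow.flow_id} on {flow.client_id}")
```

Doing a dry run first and reviewing the list before any real deletion seems like the safer order, since deleted collections cannot be recovered from the server.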
Thanks
Mike
Mike Cohen, Digital Paleontologist, Velocidex Enterprises