Cleaning up the nodestore_node table

Silvian Cretu

Feb 26, 2014, 4:16:08 AM
to gets...@googlegroups.com
Hi guys,
I'm using Sentry's cleanup option to keep my database in order, but the nodestore_node table keeps growing:

mysql> SELECT TABLE_NAME, table_rows, data_length, index_length, round(((data_length + index_length) / 1024 / 1024),2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema = 'sentry' and TABLE_TYPE='BASE TABLE' ORDER BY data_length DESC limit 0, 5;
+-----------------------+------------+-------------+--------------+------------+
| TABLE_NAME            | table_rows | data_length | index_length | Size in MB |
+-----------------------+------------+-------------+--------------+------------+
| nodestore_node        |    3129200 | 17075503104 |    247480320 |   16520.48 | 
| sentry_message        |    3103097 |  1272987648 |   1069940736 |    2234.39 | 
| sentry_searchtoken    |    6511913 |   354713600 |    509870080 |     824.53 | 
| sentry_eventmapping   |    3047297 |   237207552 |    424886272 |     631.42 | 
| sentry_groupedmessage |     138770 |    69320704 |     84279296 |     146.48 | 
+-----------------------+------------+-------------+--------------+------------+

Is there any way to safely truncate it? I'm using Sentry 6.4.2.1 with MySQL 5.5.33. Thanks!

David Cramer

Feb 26, 2014, 1:52:01 PM
to gets...@googlegroups.com
This is mostly a limitation of SQL databases: you can run cleanup (to remove old rows), but it won’t reclaim the space. There’s no way to reclaim the space without downtime.

We use Riak on getsentry.com to work around this, mostly because we need to store the data longer, and we also don’t want to deal with the massive amount of data in a Postgres database.
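
The point above (deleting rows does not shrink the storage on disk) can be reproduced on a small scale. What follows is only a rough sketch that uses SQLite from Python's standard library as a stand-in; it is not MySQL or Postgres, and the `node` table and the `measure_reclaim` helper are hypothetical names made up for the illustration. The same general pattern (freed pages stay inside the file until the table is rewritten) is what the thread is describing.

```python
import os
import sqlite3
import tempfile


def measure_reclaim(rows: int = 2000, payload_bytes: int = 1024) -> dict:
    """Return the database file size after insert, after DELETE, and after VACUUM."""
    sizes = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nodestore_demo.db")
        conn = sqlite3.connect(path)
        try:
            # Make sure freed pages are not returned automatically.
            conn.execute("PRAGMA auto_vacuum = NONE")
            # Hypothetical stand-in for a blob table such as nodestore_node.
            conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, data BLOB)")
            conn.executemany(
                "INSERT INTO node (data) VALUES (?)",
                ((os.urandom(payload_bytes),) for _ in range(rows)),
            )
            conn.commit()
            sizes["after_insert"] = os.path.getsize(path)

            # Deleting rows only marks their pages as free inside the file.
            conn.execute("DELETE FROM node")
            conn.commit()
            sizes["after_delete"] = os.path.getsize(path)

            # VACUUM rewrites the file without the free pages.
            conn.execute("VACUUM")
            sizes["after_vacuum"] = os.path.getsize(path)
        finally:
            conn.close()
    return sizes


if __name__ == "__main__":
    for step, size in measure_reclaim().items():
        print(f"{step}: {size} bytes")
```

In this sketch the file is no smaller after the DELETE and only shrinks once VACUUM has rewritten it, which is the step that needs exclusive access in most SQL databases.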
--
You received this message because you are subscribed to the Google Groups "sentry" group.
To unsubscribe from this group and stop receiving emails from it, send an email to getsentry+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jesse Kretschmer

Apr 1, 2014, 1:46:45 PM
to gets...@googlegroups.com
I am running out of storage on my Sentry server. It has been running for almost two months and is now taking more space than I had hoped. I read through the docs, found the cleanup option, and have since run it. As Silvian experienced, I actually lost space after running the command.

On Wednesday, February 26, 2014 10:52:01 AM UTC-8, David Cramer wrote:
This is mostly a limitation of SQL databases: you can run cleanup (to remove old rows), but it won’t reclaim the space. There’s no way to reclaim the space without downtime.

I can take some downtime on my sentry server. What can I do to reclaim the space? I'm not seeing any info in the docs.

Cheers,
Jesse

Jesse Kretschmer

Apr 1, 2014, 2:25:46 PM
to gets...@googlegroups.com
I discovered the VACUUM command for Postgres. I've just run it on my Sentry database and it did reclaim a wee bit of space. I don't know whether I've configured my setup appropriately. The cleanup process is still a bit opaque to me.

So essentially this is what I've run:
sentry cleanup
# lost about 100M of space on my server
vacuumdb -dsentry -f --username=sentry --password
# reclaimed about 300M of space on my server

I'm netting 200M of reclaimed space after cleaning stuff up. My total Postgres database for Sentry is now 4.4G. All in all, that's not too bad for the logging it's doing. I've migrated my Sentry data to a server with some more storage. This is a pet project for me right now. I've extended my LVM volume on the dev Postgres server, but I do want a little more control over storage utilization so I can reclaim space when needed.

Can someone help me understand what parameters I can use for expiring old events? It would be nice to auto-dump events older than 1 month or something similar. Perhaps I'm misunderstanding the log retention strategies.

Cheers,
jesse
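
The "auto-dump events older than 1 month" idea asked about above boils down to computing a cutoff timestamp and deleting everything older. The following is a minimal sketch of that logic only, again using SQLite from Python's standard library as a stand-in; the `event` table, the `received_at` column, and the `expire_old_events` helper are hypothetical names, and this is not Sentry's own cleanup code. For a real installation, `sentry cleanup --help` should show which retention options your version supports.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def expire_old_events(conn: sqlite3.Connection, now: datetime, max_age_days: int = 30) -> int:
    """Delete events older than max_age_days and return how many were removed."""
    cutoff = int((now - timedelta(days=max_age_days)).timestamp())
    cur = conn.execute("DELETE FROM event WHERE received_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo() -> tuple:
    """Insert events of various ages, expire them, and report (removed, remaining)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: received_at holds seconds since the Unix epoch.
    conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, received_at INTEGER)")
    now = datetime(2014, 4, 1, tzinfo=timezone.utc)
    ages_in_days = [1, 10, 29, 31, 45, 60]
    conn.executemany(
        "INSERT INTO event (received_at) VALUES (?)",
        [(int((now - timedelta(days=age)).timestamp()),) for age in ages_in_days],
    )
    removed = expire_old_events(conn, now, max_age_days=30)
    remaining = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
    conn.close()
    return removed, remaining


if __name__ == "__main__":
    print(demo())
```

Running a job like this on a schedule keeps the row count bounded, but as noted earlier in the thread, the freed space is only returned to the filesystem after a separate vacuum step.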

marcin....@gmail.com

Feb 4, 2017, 9:31:53 AM
to sentry, silvia...@gmail.com

What is nodestore used for? I've truncated 33G from this table and Sentry seems to work OK. This looks like a sort of trash bin, a bag for everything. The Sentry team should consider hiring a database architect.

marcin....@gmail.com

Feb 4, 2017, 9:45:37 AM
to sentry, silvia...@gmail.com


On Saturday, February 4, 2017 at 3:31:53 PM UTC+1, marcin....@gmail.com wrote:

What is nodestore used for? I've truncated 33G from this table and Sentry seems to work OK. This looks like a sort of trash bin, a bag for everything. The Sentry team should consider hiring a database architect.


OMG, it is a really creepy thing. It is used for event data, even for tags. I see that the whole Sentry DB model is quite amateurish and Django-ish.
Sorry for my language, but the DB layer is the most important part of every DB-based system.

The storage layer of a Sentry-like system should be split into parts like this:
  • relational, for user/project administrative data,
  • time series, for capturing events together with their data (with a specified lifetime to support data retention policies),
  • search index, for relevant and fast searching of the events,
  • optional memory cache,
  • optional key-value store, for temporary data (if needed).
The current Sentry storage depends too heavily on Django, which is a simple framework for building blogs or simple portals but does not work well for specialized projects like Sentry.

Marcin   
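
The "time series with a specified lifetime" item in the list above can be illustrated with a toy example. This is only a sketch of the general idea (group events into time buckets so that expiry drops whole buckets instead of deleting individual rows); the `DailyBucketStore` class and its methods are hypothetical names, and nothing here describes how Sentry actually stores events.

```python
from collections import defaultdict
from datetime import date, timedelta


class DailyBucketStore:
    """Toy event store that groups events by day so expiry drops whole buckets."""

    def __init__(self, retention_days: int):
        self.retention_days = retention_days
        self._buckets = defaultdict(list)  # date -> list of event payloads

    def add(self, day: date, payload: dict) -> None:
        """Append an event to the bucket for the given day."""
        self._buckets[day].append(payload)

    def expire(self, today: date) -> list:
        """Drop every bucket older than the retention window; return the dropped days."""
        cutoff = today - timedelta(days=self.retention_days)
        expired = sorted(day for day in self._buckets if day < cutoff)
        for day in expired:
            del self._buckets[day]
        return expired

    def days(self) -> list:
        """Return the days that still have a bucket, oldest first."""
        return sorted(self._buckets)


if __name__ == "__main__":
    today = date(2017, 2, 4)
    store = DailyBucketStore(retention_days=30)
    for age in (40, 31, 30, 1):
        store.add(today - timedelta(days=age), {"age_in_days": age})
    print("dropped:", store.expire(today))
    print("kept:", store.days())
```

In a database, each bucket would correspond to something like a partition or a per-period table, so that removing old data is a cheap drop rather than a row-by-row delete followed by a vacuum.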

David Cramer

Feb 4, 2017, 9:51:28 AM
to gets...@googlegroups.com, marcin....@gmail.com, silvia...@gmail.com
Please remove yourself from this user group if you can’t show some respect. We know what we’re doing, and your advice is not wanted, nor even remotely correct.

Marcin Nowak

Feb 4, 2017, 10:05:13 AM
to David Cramer, gets...@googlegroups.com, silvia...@gmail.com
On 4 February 2017 at 15:51, David Cramer <dcr...@gmail.com> wrote:
Please remove yourself from this user group if you can’t show some respect. We know what we’re doing, and your advice is not wanted, nor even remotely correct.


  1. OK, I'll remove myself. I like to talk with people who are open-minded and open to criticism and suggestions.
  2. I said that the Sentry model is not efficient, and it's true. Your Cassandra backend (more precisely, the schema and column family) is also inefficient, because it does not support an easy way to clean up storage. Cassandra will not physically remove data from SSTables without manually compacting tons of GB, unless you switch to a time-series-based model and the DTCS strategy.
  3. "We know what we’re doing, " - You're running a business, and that is clear to me. I'm talking just about the technical aspects.
  4. "and your advice is not wanted" - you may improve your product or leave room for better ones. I offered my 5 cents because I see some caveats. You're not obliged to read or implement this.

Marcin
