How to "clean" empty space in database files

59 views
Skip to first unread message

Dan Harkness

unread,
Jun 10, 2014, 11:46:37 PM6/10/14
to percona-d...@googlegroups.com
I am researching security options for a sensitive application. I am wondering if anyone is aware of any commands or tools to "wipe" (e.g. overwrite with 0s) free space in a xtradb database?

Ideally, I'd like to clean a running server (meaning both the underlying files and the memory contents), but stopping the server would be possible.

The use case here would be to eliminate all traces of a sensitive piece of data that mistakenly makes it into the database. The same tool question would apply for cleaning raid arrays as well.

Any input would be greatly appreciated.

Dan Harkness

unread,
Jul 2, 2014, 11:22:12 AM7/2/14
to percona-d...@googlegroups.com
Still hoping that someone may have some insight into this.

Here is my thinking so far:
1) Stop input sources to the database
2) Delete sensitive data from database (steps 1 & 2 could be switched)
3) Use mysqldump to perform logical dump of all db contents
4) Stop mysql
5) Securely erase all data files, logs, etc pertaining to mysql, plus all free space on the drive. Plus clean memory. (Various techniques should be available for both of these.)
6) Start mysql
7) Recreate "clean" db from the dump
8) Re-Enable input sources to the database

This seems like a technically sound solution, but it definitely results in significant downtime and is less than idea.

Can anyone a) suggest a better approach, and b) poke holes in the process?

Bill Karwin

unread,
Jul 2, 2014, 12:49:09 PM7/2/14
to percona-d...@googlegroups.com
I believe you’ve got the process right, if you need to clean sensitive data to that extent. You could have values hiding in the change buffer, undo buffer, double write buffer, and it would be hard to guarantee that you’ve overwritten every page of these unless you physically remove the ibdata1 file. If you remove that file, you'd have to dump and restore all InnoDB data as you describe (or temporarily convert InnoDB tables to MyISAM and then back after you’ve recreated a new tablespace).

One possible short-cut: 

1. Use file-per-table for all tables.
2. Stop input sources.
3. Perform a super clean shutdown (innodb_fast_shutdown=0) to force all change buffers and undo buffers to be processed.
4. Restart mysqld.
5. Use ALTER TABLE .. ENGINE=InnoDB ALGORITHM=COPY to force a table restructure, which will only copy current data (the latter option is 5.6 specific).
6. Dump metadata only (mysqldump --no-data).
7. Stop mysqld.
8. Remove ibdata1 and redo logs, but not the individual tables.
9. Move all .ibd files to somewhere safe (not the data directory, but may be on the same disk volume so mv is quick)
10. Restart mysqld.
11. Reimport the metadata you dumped, creating new empty tables.
12. Use ALTER TABLE .. DISCARD TABLESPACE for each table to remove the (empty) tablespaces.
13. Move your saved (full) .ibd files back into the data directory.
14. Reattach the tablespaces with ALTER TABLE .. IMPORT TABLESPACE.
15. Re-enable input sources.

Read more about the transportable tablespace usage here: http://dev.mysql.com/doc/refman/5.6/en/tablespace-copying.html

This should be a lot faster than the dump & restore of full data, but it’s kind of a meticulous process. You should write a script to automate as much of the steps as you can.

Caveat: I have not tested this, so I can’t guarantee that it scrubs your data in the way you need. You’ll have to test that.

--
Bill Karwin
Senior Knowledge Manager

--
You received this message because you are subscribed to the Google Groups "Percona Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to percona-discuss...@googlegroups.com.
To post to this group, send email to percona-d...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Harkness

unread,
Jul 2, 2014, 1:11:05 PM7/2/14
to percona-d...@googlegroups.com
Thanks for the feedback, Bill.
We are certainly still in the "research and design" phase with respect to the level of sanitation necessary and the process to be used. Scripting and testing would certainly be important.

Step 5 has piqued my interest. This isn't directly related to the cleaning question, but I'm curious if this would also have the effect (with files-per-table) of shrinking over-sized files (for example when large amounts of data have been dropped from a schema)?
We are currently running 5.5 but that may be a motivation to move to 5.6.

Thanks

Bill Karwin

unread,
Jul 2, 2014, 1:15:59 PM7/2/14
to percona-d...@googlegroups.com
Yes, ALTER TABLE .. ENGINE=InnoDB forces a table restructure. In the case of using file-per-table, this creates a new physical file. Then it copies only the current data rows, and rebuilds indexes from scratch. If there was a lot of wasted physical space in the .ibd file, the new file will be smaller and more compact. It’s a good way to reclaim space.

If you don’t use file-per-table, you don’t get any space back, because ibdata1 never shrinks. In fact, during the table restructure, ibdata1 expands to hold the new copy of the table. The pages containing the old version of the table are available for use by other data after the table restructure is finished, but they still occupy space.

--
Bill Karwin
Senior Knowledge Manager

娄帅

unread,
Jul 4, 2014, 5:21:10 AM7/4/14
to percona-d...@googlegroups.com
Hi, Dan Harkness,

First of all, I use innodb_file_per_table=1, then i check some tables with the following command:

SELECT DATA_LENGTH,DATA_FREE from information_schema.tables wheretable_schema='test' and table_name='t1';


Where the DATA_FREE stands for the free extends in the tablespace of the table, if the value is very large, it means there must be much empty space,
So i start to clean the table.
There are two methods to clean table:

1. ALTER TABLE t1 engine= INNODB;
But this will block access for the t1 when doing the ALTER.

2. Using pt-online-schema-change

This percona tool will not block access of t1, So i usually use the method 2.

Best wishes!


在 2014年6月11日星期三UTC+8上午11时46分37秒,Dan Harkness写道:
Reply all
Reply to author
Forward
0 new messages