7zip Splitting Files

Raelene

Aug 5, 2024, 8:54:05 AM

to temenhowon

Hi all,

I want to back up and compress around 60 GB of data (files, folders, subfolders) and split the output into 1.95 GB chunks. Is there any way to verify and "guarantee" that the compression and splitting were successful without actually doing a test extraction of the files and folders? And is there a particular method or compression algorithm that would let me open each individual split file even if one or more of the other split files were missing or corrupted?

I have tested both 7z and Zip so far: the extraction will not even begin if one of the split files is missing, and individual split files cannot be opened on their own. I really don't want to lose access to all the data if one of the split files is missing or corrupted.

As for your problem:

There are different possibilities:

1. Use a wrapper (if you want to call it that) that clusters the data into big chunks, then run 7z on each chunk individually. You won't lose everything when one file is lost.

2. Use parity-generating software afterwards. Let's say you put the data into 5 strips of 1.95 GB each (the full 60 GB would need about 31 strips of that size, but 5 keeps the example simple). Now get that parity tool up and running and tell it to calculate 20% overhead. This should result in a 6th file, which you need to store more safely. If you lose one of the other files, the 6th one can be used to restore the lost strip.

In general I would say there is probably no such thing as perfect compression combined with perfect backup recoverability. You give up either compression ratio or backup safety.
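The parity idea in option 2 can be sketched with plain XOR parity. This is only an illustration of the principle: real parity tools (PAR2, for example) use Reed-Solomon codes and can repair more than one missing file, and the names `xor_parity` and `strips` below are made up for this example.

```python
from functools import reduce


def xor_parity(strips):
    """Return the byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# Five tiny stand-ins for the five data strips (all the same length).
strips = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"]

# The "6th file": 20% overhead for five strips.
parity = xor_parity(strips)

# Simulate losing strip number 3 (index 2).
survivors = strips[:2] + strips[3:]

# XOR of the four survivors and the parity strip gives back the lost one.
recovered = xor_parity(survivors + [parity])
print(recovered == strips[2])
```

With a single XOR parity strip, exactly one lost strip can be rebuilt; losing two at once is not recoverable, which is why dedicated tools let you choose a higher overhead.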

Just try to come up with a nice trade-off and you will be set.

Oh - and for 'testing' the archive:

if you really want to test, you NEED to decompress!

You don't want some nifty test to tell you that everything is all right when it maybe isn't!

You don't need to store the files! Just decompress into some MD5 (or other hash) tool, which will compute a hash and then drop the data from memory, and compare the hash to that of the original files.

I can split the files with the split command, but the owner of the disk has Windows, so I decided to generate a multipart 7zipped file from the command line. As the original file is already compressed, I use no compression switch:
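A minimal sketch of "decompress into a hash tool", assuming the decompressed data arrives as a byte stream (for instance the standard output of an extraction command; how you obtain that stream is up to you). The helper name `hash_stream` and the sample data are illustrative only.

```python
import hashlib
import io


def hash_stream(stream, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a binary stream chunk by chunk without keeping it in memory."""
    digest = hashlib.new(algorithm)
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-ins for "the original file" and "the data coming out of the archive".
original = io.BytesIO(b"example payload")
extracted = io.BytesIO(b"example payload")

print(hash_stream(original) == hash_stream(extracted))
```

Only one chunk is held in memory at a time, so this works for data far larger than RAM. Pass `algorithm="md5"` if you want to compare against existing MD5 sums.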

Hi! Thank you very much for this tutorial. I was curious whether you know a way to split a text file by lines instead of by maximum size. We have a txt file with one record per line. When I split it by size, some of the lines get cut in half, which breaks the records and gives us false positives. Any info would be greatly appreciated!!
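One way to split on line boundaries is a few lines of script rather than an archiver option. The sketch below is an assumption about what is wanted (a fixed number of whole lines per output file); the function name `split_by_lines`, the `part_` prefix, and the file names are hypothetical.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(source, lines_per_part, prefix="part_"):
    """Write consecutive groups of whole lines to numbered files next to source."""
    source = Path(source)
    written = []
    # newline="" keeps the original line endings untouched in both directions.
    with source.open("r", encoding="utf-8", newline="") as handle:
        index = 0
        while True:
            block = list(islice(handle, lines_per_part))
            if not block:
                break
            target = source.with_name(f"{prefix}{index:04d}.txt")
            with target.open("w", encoding="utf-8", newline="") as out:
                out.writelines(block)
            written.append(target)
            index += 1
    return written


# Example call (hypothetical file name): 100000 records per output file.
# parts = split_by_lines("records.txt", 100_000)
```

Because lines are never cut, every record stays intact; the last file simply holds whatever lines are left over.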

I want to make sure this entire bunch of data is as safe as possible (against hackers that would in some way get their hands on this data, and against Google and its employees, and also in the future, i.e. if I delete this data from Google I want to be sure they won't be able to 'open' it even if they keep its backup forever).

So in this case instead of uploading all this data right away to the cloud, I will instead make one folder containing all the data I want to upload, and then I will compress this entire folder using 7-Zip and of course password-protect it using 7-Zip.

I will do this not once, but a few times, i.e. once I have the 7-Zip password-protected archive ready, I will compress it once again using 7-Zip and use a completely different password. I will do this five times. So in the end my data is compressed five times and it has been password-protected using 7-Zip by five completely different unrelated passwords. So in order to get to my data I have to extract it five times and provide five different passwords.

What I will then do is that, I will take this five-times-password-protected archive, and I will compress it once again using 7-Zip and yet a different sixth password, but in addition to that this time I will also choose to split the archive into smaller chunks.

Now I take those nine 200 MB archives and put them in one container and encrypt the container using VeraCrypt (assuming the three level cascade encryption) and then upload this container to my Google Drive.

I keep the 10th archive (the 5 MB one) on a completely different service (say on Dropbox -- and that Dropbox account is in no way connected/linked to my Google account at all) (also encrypted by VeraCrypt).

- Have I created a security theater? Or have I really made it impossible for anyone to access and extract my data? After all they have to pass one level of encryption by VeraCrypt and even after that the archives are six times password protected and one of the archives (the tenth one) is stored somewhere else!

- If someone gets access to my Google Drive and downloads all those nine archives, is there any way for them to extract the archive without having the last (the tenth) 5 MB archive? Can the data in any way be accessed with one of the split-archives missing?

- Even if someone gets their hand on all those 10 archives together and manages to bypass the VeraCrypt encryption in any way, will it be still feasible to break the six remaining passwords?

The algorithm used by 7-Zip is AES-256, which is considered secure. But if someone found a flaw in it that made it breakable, they would likely be able to break all your encryption layers with equal effort.

So either you trust the encryption algorithm used by 7-Zip, then one application would be good enough. Or you don't trust it, then you would do another encryption pass with a different algorithm. Layering the same algorithm multiple times often doesn't have as much effect as one would think, as the meet-in-the-middle attack on Triple-DES demonstrated.

Regarding splitting up an encrypted file: it is often possible to rescue some data from a 7-Zip archive even if parts of the archive are missing. 7-Zip uses AES in CBC mode, in which every 128-bit block is chained to the previous 128-bit block. Strictly speaking, CBC decryption of a block needs only that ciphertext block and the one before it, so a gap in the ciphertext by itself garbles only the first block after the gap. In practice, though, the data was compressed before it was encrypted, and the compressed stream usually cannot be decoded past a missing section, while everything before the gap can be. So if you want to prevent an attacker from decrypting the archive by withholding a part of it, you need to withhold the first chunk, not the last one.

The encryption mechanism is either safe, or it is not. You will not be more protected by encrypting multiple times. If the encryption is secure, it's useless to add additional layers of encryption. If there is a flaw in your security model, for example, your encryption keys could leak, multiple layers of encryption will not fix this flaw.

You should look at the bigger picture: Where do you save your passphrase? Who can access it? What's your backup solution if you cannot access or remember your passphrase? Is your computer safe from malware (which would render any encryption useless)? Is your solution so complex that you will never use it?

Second, it sounds like you are in need of a threat model. A threat model describes what sort of adversaries you are worried about facing. Without a threat model, all security is security theater because you're basically flailing around wildly hoping that your uncontrolled actions stop an opponent you know nothing about. Are you trying to stop your sister from reading your diary, or are you trying to hide from the mob? Have you made enemies with any three-letter agencies lately?

This is why a threat model is so essential. Consider this. It is generally accepted that an AES-256 encrypted file protected with a suitably long password is uncrackable by anything short of a government agency, and it's generally assumed that the government agencies cannot crack it either. Thus a single layer of AES-256 will most certainly protect you from anything up to the sorts of shady characters who are willing to beat you with a rubber hose until you cough up the password. There's no reason to do anything more than one layer of AES-256 unless you have a threat model to back it up.

Now why do you think that splitting the file will help you? You're using some of the strongest encryption tools on the planet already. Why would withholding one section really help? Sure, if you're up against an attacker who knows of some currently undisclosed crack of AES-256 but doesn't know how to reverse engineer a 7zip split file format, that might help. So could tin foil.

I would not say that everything you have is security theater. The first step is good: store the data in an encrypted format with a good strong password. The rest of the process, however, is security theater. Don't waste your time putting layers upon layers of encryption.

The worst-case scenario is that your overzealous effort prevents you from accessing the data because you made it 10x more difficult to get to your own data. The likelihood of a mistake in the process that renders your data inaccessible is great, and you gain no appreciable security for it.

I'm not familiar enough with 7-Zip to fully understand its capabilities. However, assuming it is only a password-protected zip file, then no, this is not secure. That said, I see some comments saying that 7-Zip offers AES-based encryption, which would be secure (assuming that no bugs exist in 7-Zip's implementation, so that it creates a true and valid AES-encrypted file).

In short, you should assume that anything that can be downloaded and attacked offline can be cracked given enough time. Assuming that your 7-Zip file is encrypted with AES-256 (again, no bugs), you would be at the mercy of the entropy of your passwords and the speed of the attacker's machine.

Instead of creating 5 or 6 levels of split archives, a single archive with a high-entropy password would be sufficient. Splitting the archive does nothing, as you only have to be able to 'unlock' the first one to be able to decrypt the chain of archives.

Ultimately, the answer to your question is that it is entirely sufficient to use 7-Zip with AES-128 or AES-256 as long as you are using a truly random and secure password with as much entropy as possible. 64 characters of uppercase, lowercase, numbers and symbols with no discernible patterns or repetition is perfectly acceptable. Just don't write down the password on a sticky note and leave it on your monitor. Use a real password manager to keep track of that 64-character password and make sure you use a nice secure master password on that password manager. You're only as secure as the least secure method: keeping your password in a plain-text file on your desktop is not secure.
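To put a number on "as much entropy as possible", here is a rough sketch that estimates the entropy of a uniformly random password and generates one. It assumes the 94 printable ASCII letters, digits and symbols as the alphabet; the function names are made up for this example.

```python
import math
import secrets
import string

# Uppercase, lowercase, digits and punctuation: 94 printable ASCII symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy in bits of a password whose characters are chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def random_password(length=64):
    """Generate a password using the operating system's secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(f"{entropy_bits(64):.1f} bits for 64 characters")
print(random_password())
```

Under these assumptions a 64-character password carries roughly 419 bits of entropy, well above the 256 bits of an AES-256 key, so the password is no longer the weakest link. The estimate only holds for randomly generated passwords, not for human-chosen ones.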