Author |
Topic: Compression and encryption (Read 646 times) |
|
cuckoo
Junior Member
Gender:
Posts: 57
|
|
Compression and encryption
« on: Sep 26th, 2008, 12:29am » |
Quote Modify
|
Given a stream of bytes, how to detect whether it is a ciphertext or it is a compressed text? It is easy to differentiate between ciphertext and natural language text, because the entropy of natural language is much lower than that of ciphertext. However, it not easy to differentiate between ciphertext and compressed text, since both encryption algorithm and compress algorithm tend to transform data into a nearly total random sequence. Any suggestion are welcome!
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Compression and encryption
« Reply #1 on: Sep 26th, 2008, 1:15am » |
Quote Modify
|
I'd expect a cipher text to be more compressible. In a sense compression and encryption are opposites; encryption increases the search space (so one text is indistinguishable from another), and compression decreases it (so with less information you can find your original text). Of course, in reality. Things are a bit more complicated. For one example file (using gpg to encrypt and bzip2 to compress) Original text size: 184147 Encrypted size: 79051 (So apparently it got compressed at the same time). Compressed size: 60352 Recompressing encrypted file: 79752 Recompressing compressed file: 60952 So both got bigger in both cases. (for completeness, compressing followed by encryption: 60522) Now, encryption of itself should not compress a message. So what does gpg do here? Does it compress the file, then encrypt it, or encrypt it first then compress it. Considering compression leaves certain patterns in the output (and one must assume the algorithm is known), they would aid in cracking the encryption. So compression has to happen after encryption. Which means the fact the encrypted file gpg returns is much smaller, that compression after encryption does work (best as I can tell from this impure example). But when confronted with a compressed encrypted file, and an compressed file of the same size, it is hard to tell which is which.
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
SMQ
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 2084
|
|
Re: Compression and encryption
« Reply #2 on: Sep 26th, 2008, 5:17am » |
Quote Modify
|
I'm afraid your intuition has failed you here, towr. A good encryption algorithm, like a good compression algorithm, should produce output which is as close to statistically random as is practically possible. This is because any redundancy remaining in the ciphertext must be due to redundancy in the original plaintext (or worse, in the key-derived material), and so leaks information. GPG compresses before encryption in part as a defense against exactly that sort of attack. I believe the -z 0 parameter will tell GPG not to precompress the text. As to the original question, the only thing I can think of is to look for markers of common compression programs in the data, or even attempt to decompress it using a few common methods. Because there is still some structure remaining after compression, it's unlikely that truly random data will be successfully decompressible, so if the data is decompressed successfully there is a high probability that it had been compressed by the corresponding algorithm. --SMQ
|
|
IP Logged |
--SMQ
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Compression and encryption
« Reply #3 on: Sep 26th, 2008, 6:32am » |
Quote Modify
|
on Sep 26th, 2008, 5:17am, SMQ wrote:I'm afraid your intuition has failed you here, towr. A good encryption algorithm, like a good compression algorithm, should produce output which is as close to statistically random as is practically possible. This is because any redundancy remaining in the ciphertext must be due to redundancy in the original plaintext (or worse, in the key-derived material), and so leaks information. GPG compresses before encryption in part as a defense against exactly that sort of attack. |
| That idea might not be as well-grounded as is generally assumed if I read http://www.iacr.org/archive/fse2002/23650264/23650264.ps correctly. It adds little to no security in most cases, and reduces it significantly in a few. So its practical advantage is only that you reduce the load on transmission. And on the other hand, reversing the order of compression and encryption is also possible in some cases, http://www.eecs.berkeley.edu/~mjohnson/papers/tcc04.pdf
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
|