Title: Compression and encryption Post by cuckoo on Sep 26th, 2008, 12:29am Given a stream of bytes, how can we detect whether it is ciphertext or compressed text? It is easy to distinguish ciphertext from natural language text, because the entropy of natural language is much lower than that of ciphertext. However, it is not easy to distinguish ciphertext from compressed text, since both encryption algorithms and compression algorithms tend to transform data into a nearly random sequence. Any suggestions are welcome! :-*
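A minimal sketch, in Python, of the entropy test described above; the sample inputs are illustrative only. Natural language typically scores around 4-5 bits per byte, while both ciphertext and well-compressed data score close to the maximum of 8, which is why entropy alone separates plaintext from the other two but not ciphertext from compressed text.

import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    # Shannon entropy of the byte stream, in bits per byte; 8.0 is the maximum.
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

print(byte_entropy(b"pease porridge hot, pease porridge cold " * 100))  # low: plain text
print(byte_entropy(os.urandom(1 << 16)))  # ~7.99: looks like ciphertext or compressed data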
Title: Re: Compression and encryption Post by towr on Sep 26th, 2008, 1:15am I'd expect a ciphertext to be more compressible. In a sense compression and encryption are opposites; encryption increases the search space (so one text is indistinguishable from another), and compression decreases it (so with less information you can find your original text). Of course, in reality things are a bit more complicated. For one example file (using gpg to encrypt and bzip2 to compress):
Original text size: 184147
Encrypted size: 79051 (so apparently it got compressed at the same time)
Compressed size: 60352
Recompressing the encrypted file: 79752
Recompressing the compressed file: 60952
(For completeness, compressing followed by encrypting: 60522.)
So both files got bigger when compressed a second time. Now, encryption by itself should not compress a message. So what does gpg do here? Does it compress the file and then encrypt it, or encrypt it first and then compress it? Since compression leaves certain patterns in its output (and one must assume the algorithm is known), those patterns would aid in cracking the encryption; so compression would have to happen after encryption. That would mean, since the encrypted file gpg returns is much smaller than the original, that compression after encryption does work (as best I can tell from this impure example). But when confronted with a compressed encrypted file and a plain compressed file of the same size, it is hard to tell which is which.
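A small reproduction of the experiment above can be done in pure Python, using os.urandom as a stand-in for ciphertext and the bz2 module in place of the bzip2 command line; the sample text and the printed sizes are illustrative, not towr's actual file.

import bz2
import os

text = b"Pease porridge hot, pease porridge cold, nine days old. " * 4000
random_like = os.urandom(len(text))   # stands in for good ciphertext

print(len(text), len(bz2.compress(text)))                # plain text shrinks a lot
print(len(random_like), len(bz2.compress(random_like)))  # "ciphertext" grows slightly

# Recompressing already-compressed output also adds a little overhead,
# matching the "both got bigger" observation above.
once = bz2.compress(text)
print(len(once), len(bz2.compress(once)))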
Title: Re: Compression and encryption Post by SMQ on Sep 26th, 2008, 5:17am I'm afraid your intuition has failed you here, towr. A good encryption algorithm, like a good compression algorithm, should produce output which is as close to statistically random as is practically possible. This is because any redundancy remaining in the ciphertext must be due to redundancy in the original plaintext (or worse, in the key-derived material), and so leaks information. GPG compresses before encryption in part as a defense against exactly that sort of attack. I believe the -z 0 parameter will tell GPG not to precompress the text. As to the original question, the only thing I can think of is to look for the markers of common compression programs in the data, or even to attempt to decompress it using a few common methods. Because some structure remains after compression, truly random data is unlikely to decompress successfully, so if the data does decompress successfully there is a high probability that it was compressed by the corresponding algorithm. --SMQ
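A sketch of SMQ's marker-plus-trial-decompression idea, using only Python's standard-library decoders; the magic bytes are the published headers for each format, and the helper name identify is made up for illustration.

import bz2
import gzip
import lzma
import os

# Published magic bytes for a few formats Python can decode natively.
MAGIC = {
    b"\x1f\x8b": ("gzip", gzip.decompress),
    b"BZh": ("bzip2", bz2.decompress),
    b"\xfd7zXZ\x00": ("xz", lzma.decompress),
}

def identify(data: bytes):
    for magic, (name, decompress) in MAGIC.items():
        if data.startswith(magic):
            try:
                decompress(data)
                return name   # marker matched and the whole stream decoded
            except Exception:
                pass          # header collision or truncated stream; keep looking
    return None  # no known marker: ciphertext, or a format not listed here

print(identify(gzip.compress(b"hello, world " * 100)))  # gzip
print(identify(os.urandom(1024)))                       # None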
Title: Re: Compression and encryption Post by towr on Sep 26th, 2008, 6:32am on 09/26/08 at 05:17:46, SMQ wrote:
It adds little to no security in most cases, and reduces it significantly in a few. So its practical advantage is only that you reduce the load on transmission. On the other hand, reversing the order of compression and encryption is also possible in some cases: http://www.eecs.berkeley.edu/~mjohnson/papers/tcc04.pdf