Author |
Topic: String Hashing (Read 2224 times) |
|
John_Gaughan
Uberpuzzler
Behold, the power of cheese!
Gender:
Posts: 767
|
|
String Hashing
« on: Jan 25th, 2004, 11:08pm » |
Quote Modify
|
This is slightly subjective and not very difficult -- what do you think is the best way to hash a string? Assume an ASCII string, without the extended character set (i.e. 7 bit). I'm interested in the general plan of attack, not "use MD5." I ask because in one of my classes this topic came up and some of the methods proposed just don't seem very good to me. By "best" I mean what method will produce good hashes and have decent (but not necessarily optimal) running time.
|
|
IP Logged |
x = (0x2B | ~0x2B) x == the_question
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: String Hashing
« Reply #1 on: Jan 26th, 2004, 1:16am » |
Quote Modify
|
I suppose you could take a long long int (64 bits), and fill it by adding characteri * primei .. I don't really know what it'll do, but I'm sure it's decent enough for some applications Or maybe long double hash = 1.0; while(character[i]) { srand(character[i]); hash*= rand(chararcter[i++]) } (supposedly since rand is reasonably uniformly distributed, hash will too) hmm.. there must be some information to be found on this somewhere..
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
John_Gaughan
Uberpuzzler
Behold, the power of cheese!
Gender:
Posts: 767
|
|
Re: String Hashing
« Reply #2 on: Jan 27th, 2004, 6:12am » |
Quote Modify
|
I think you have the right idea -- pull information from somewhere besides the contents of the string. Primes are a good direction to go in. I don't know about rand(), since this is not guaranteed to give the same numbers on different implementations given the same seed (i.e. C++ or Java, Windows or Linux, etc). Some of the ideas presented are to add up all the characters, ignoring carry/overflow bits; alternately multiply and divide characters (h = (a1a2)/a3...) doing integer division, and a couple others. Each one has weaknesses and it is possible to guess the original string. It does depend on the application. Doing a quick and computationally easy hash so you can hopefully find a shortcut when comparing two objects is one thing, and protecting /etc/passwd or /etc/shadow is something different. And doing checksums on files can be even trickier.
|
|
IP Logged |
x = (0x2B | ~0x2B) x == the_question
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: String Hashing
« Reply #3 on: Jan 27th, 2004, 7:40am » |
Quote Modify
|
If rand isn't deterministic given a seed, you can always try and use an algorithm for your own pseudo random number generator. In the end the hash should probably be as random-like as possible, as this will make it more usefull (for use in hashmaps for instance, it should be as uniform as possible modulo whatever prime you use as its size) I suppose it'd be best if the hash isn't the same for every permutation of the input string. So that means the calculation shouldn't be associative. In that sense adding or multiplying all characters (or functions of de seperate characters) is probably not a very good idea. Guessing the original string from the hash is in general quite difficult, since in most cases the strings can be any length, and hashes have a fixed length. Only if you know enough constraints can you guess the original. The problem with passwords is that they are generally short (6-10 characters), and most people like using words. So after encrypting all the words from a dictionary you can generally guess some of the passwords in a password-file.
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
John_Gaughan
Uberpuzzler
Behold, the power of cheese!
Gender:
Posts: 767
|
|
Re: String Hashing
« Reply #4 on: Jan 27th, 2004, 12:25pm » |
Quote Modify
|
on Jan 27th, 2004, 7:40am, towr wrote:If rand isn't deterministic given a seed, you can always try and use an algorithm for your own pseudo random number generator. |
| I don't think it's the algorithm so much as what prime numbers you use to turn the crank. Every random number generator I've seen is basically the one Knuth has in his book, rand = input * big prime (mod m). Quote:Only if you know enough constraints can you guess the original. |
| Not necessarily. Heuristics can give clues about the generating algorithm (bit patterns, for example), and combined with a dictionary attack on modern hardware, it is possible to crack a weak password using a weak hash in a few hours. I've seen it myself. I don't have the depth of computer science knowledge needed to do it myself, but I do understand how it is done. Again, a nice side benefit of previous security-related jobs. But I know what you're saying. I'm not disagreeing, I'm just adding my two cents.
|
|
IP Logged |
x = (0x2B | ~0x2B) x == the_question
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: String Hashing
« Reply #5 on: Jan 27th, 2004, 1:39pm » |
Quote Modify
|
on Jan 27th, 2004, 12:25pm, John_Gaughan wrote:Not necessarily. Heuristics can give clues about the generating algorithm (bit patterns, for example), and combined with a dictionary attack on modern hardware, it is possible to crack a weak password using a weak hash in a few hours. |
| That presupposes there is a weak password. If every password is a completely random string a dictionary attack won't do anything. And if the string is sufficiently larger than the hash the chance is nil you'll ever guess a password brute force (if the hashing algorithm is even half decent). Both the length, and being a word are constraints, as is 'easy to remember for people'. So given those you may be able to crack it, but without them I very much doubt it.. If you look at the unconstrained problem, things are much more difficult. I would be very interested to see anyone reconstuct a, oh let's say, 120 Mb program given only a 32 character hash. Or even a full 10-20 word sentence used as a password.. (A shame many programs only look at the first 8 characters..) Without constraints the chance is too slim you'll find a string that gives the same hash, and slimmer still it would be the original.
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
|