wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> medium >> Real-space Hash Functions
(Message started by: RandomSam on Oct 4th, 2007, 4:39pm)

Title: Real-space Hash Functions
Post by RandomSam on Oct 4th, 2007, 4:39pm
A hash function is a many-to-one mapping that is s commonly used to reduce the amount of information you have to handle, so that large sets of data can quickly be compared.  Given an arbitrarily large integer, it will return a manageably small positive integer.  Inputting the same integer into the function always yields the same output, but inputting a slightly different integer (e.g. a consecutive number) will yield a completely different output.

My question: could a hash function be devised that had an infinitely large domain?  What about an uncountable domain,  e.g. accepting any real number as its input?  It would need to have a defined output but an undefined derivative at all points; is that possible?

(My apologies if I am posting this in a silly place - let me know if I should repost elsewhere.  I almost put it in the 0.999... thread as this would have applications there!)

Title: Re: Real-space Hash Functions
Post by Aryabhatta on Oct 4th, 2007, 4:59pm
So you want a function from Real numbers to integers such that it is one to one? (i.e. there are no collisions)

Title: Re: Real-space Hash Functions
Post by towr on Oct 5th, 2007, 12:38am

on 10/04/07 at 16:39:34, RandomSam wrote:
My question: could a hash function be devised that had an infinitely large domain?
Sure, take the set of integers modulo some number. Or if you really want adjacent number in the domain to end up further apart in the range, take pn mod q for some appropriate primes p,q. That should work fairly ok.


Quote:
What about an uncountable domain, e.g. accepting any real number as its input?
I think this might be a problem; I suspect that infinitely many intervals in the domain would have to end up with the same value if you want to map it to a finite set.


Quote:
It would need to have a defined output but an undefined derivative at all points; is that possible?
I think so, but that doesn't guarantee you'd get a proper hash.
You could think along the lines of http://mathworld.wolfram.com/DirichletFunction.html

Title: Re: Real-space Hash Functions
Post by RandomSam on Oct 5th, 2007, 10:32am

on 10/04/07 at 16:59:32, Aryabhatta wrote:
So you want a function from Real numbers to integers such that it is one to one? (i.e. there are no collisions)
All hash functions are many-to-one, so there must be collisions, but if the function is difficult to reverse then these collisions will take a long time to find.


on 10/05/07 at 00:38:20, towr wrote:
I suspect that infinitely many intervals in the domain would have to end up with the same value if you want to map it to a finite set.
I agree, but can the intervals each be infinitesimally narrow?


on 10/05/07 at 00:38:20, towr wrote:
You could think along the lines of http://mathworld.wolfram.com/DirichletFunction.html
The Dirichlet function separates the rationals from the irrationals.  Given an integer hash function, I can easily enough define a rational-to-integer hash function using e.g. h(b/a):h(h(a)+b).  But I'm not sure what to do with the irrational numbers: how can I be sure whether two reals are distinct if no function can distinguish between them?

Title: Re: Real-space Hash Functions
Post by towr on Oct 5th, 2007, 11:01am

on 10/05/07 at 10:32:48, RandomSam wrote:
But I'm not sure what to do with the irrational numbers: how can I be sure whether two reals are distinct if no function can distinguish between them?
If no function can distinguish between them, then they are not distinct.
Besides, between any two distinct irrational numbers, you can find a rational one. Along the lines of: given irrational a<b take s=floor(log2(b-a)) then floor(b/2s)2s should be rational, and between a and b.. maybe.

Title: Re: Real-space Hash Functions
Post by Aryabhatta on Oct 5th, 2007, 1:31pm

on 10/05/07 at 10:32:48, RandomSam wrote:
All hash functions are many-to-one, so there must be collisions, but if the function is difficult to reverse then these collisions will take a long time to find.


Well, if we take any function from Reals to Integers, there will be some integer to which uncountably many reals map (try proving it). It might be pretty hard to resolve the collision then...

Interesting question though, probably research-worthy.

Title: Re: Real-space Hash Functions
Post by SMQ on Oct 7th, 2007, 6:13am

on 10/05/07 at 13:31:46, Aryabhatta wrote:
(try proving it)

Informally: [hide]assume the opposite -- that no integer has an uncountable infinity of real values mapped to it.  But a countable infinity (the integers) of countable infinities (the reals mapped to a given integer) is countable, a contradiction.  Therefore the assumption is false.[/hide]

--SMQ

Title: Re: Real-space Hash Functions
Post by towr on Oct 7th, 2007, 7:17am

on 10/05/07 at 13:31:46, Aryabhatta wrote:
Well, if we take any function from Reals to Integers, there will be some integer to which uncountably many reals map (try proving it). It might be pretty hard to resolve the collision then...
You'd want the distribution of reals over the integers (or (finite) subset thereof) to be fairly uniform. So I'd say typically all integers would have a uncountable set of reals mapped to them.

If we take N integers (0..N-1), you could consider mapping a real R to floor(N*[R]), where [R] is the fractional part of R. The downside is that's it's a bit predictable. And it depends on the probability distribution of the reals you expect to deal with whether that's a problem.
Although obviously there will need to be some pattern to the mapping, preferably it'd be chaotic (which also implies that a small difference between two reals means they are very likely to end up further apart).
So perhaps something like floor(N*[PR]), for some P, would be more suitable (exponentiation and modulo are common tricks to promote chaotic behaviour in functions. It's also used in pseudo random generators)

Title: Re: Real-space Hash Functions
Post by RandomSam on Oct 7th, 2007, 11:05am

on 10/07/07 at 07:17:05, towr wrote:
...perhaps something like floor(N*[PR])...
The problem with this is the "sawtooth" effect of the [R] operator: its derivative is 1 almost everywhere, so it won't make the hash function difficult to reverse.  This attack relies on differentiation, so is not possible when only integers are allowed.

on 10/07/07 at 07:17:05, towr wrote:
...preferably it'd be chaotic...
A chaotic function sounds like a brilliant idea: fractals exist which have infinite intricacy.  Are there any fractals which are discontinuous everywhere, other than those that classify types of numbers?

Title: Re: Real-space Hash Functions
Post by towr on Oct 7th, 2007, 11:25am

on 10/07/07 at 11:05:20, RandomSam wrote:
The problem with this is the "sawtooth" effect of the [R] operator: its derivative is 1 almost everywhere
N*[PR] does not have a derivative of 1 nearly everywhere, far from it. If we take P=e, then the derivative is eR nearly everywhere, which is only 1 for R=0.


Quote:
so it won't make the hash function difficult to reverse.
Ok, suppose P is e, N=100, and the function evaluates to 50, what is R? Rather a lot of options, I'd say.

Title: Re: Real-space Hash Functions
Post by RandomSam on Oct 7th, 2007, 2:10pm

on 10/07/07 at 11:25:49, towr wrote:
Ok, suppose P is e, N=100, and the function evaluates to 50, what is R? Rather a lot of options, I'd say.
There have to be a lot of options: a hash function must be many-to-one.  The problem is that all those options can easily be calculated if we are in real-space, so the hash function can be reversed. In your case, the input can be ln(0.5), ln(1.5), ln(2.5)... whereas the discrete equivalent is non-trivial to reverse with modulo arithmetic, an important property for cryptography.

EDIT: I think I can prove that the hash function must be discontinuous everywhere.  If the hash function's domain is countable, then its derivative must be 0 at all points where the function is continuous, and the function can't distinguish between almost-identical numbers.  In other words, it has a finite-precision input.  We might as well round to the nearest <insert small number> and use a conventional hash function.

If the function's domain isn't countable, then it must output an infinite amount of data.  We can't store or compare that much data, so the function is of little use.  Any subsequent truncating of the data means we have a countable function, and so finite precision.

I think that proves that a hash function won't work unless it's discontinuous everywhere.

Title: Re: Real-space Hash Functions
Post by towr on Oct 7th, 2007, 2:35pm

on 10/07/07 at 14:10:01, RandomSam wrote:
There have to be a lot of options: a hash function must be many-to-one.  The problem is that all those options can easily be calculated if we are in real-space, so the hash function can be reversed.
If you can't bring it back to a single value, what's the problem?


Quote:
whereas the discrete equivalent is non-trivial to reverse with modulo arithmetic, an important property for cryptography.
Meh, then just round the darn real, and throw it in the discrete equivalent.

Title: Re: Real-space Hash Functions
Post by rmsgrey on Oct 8th, 2007, 10:47am
How precise is the real you're starting with anyway?

Title: Re: Real-space Hash Functions
Post by RandomSam on Oct 8th, 2007, 4:24pm

on 10/08/07 at 10:47:37, rmsgrey wrote:
How precise is the real you're starting with anyway?
Finite precision numbers are easy - just multiply by the threshold of accuracy, round to an integer and use a normal hash function.  

An exact expression that is known to be in its simplest form could be typed as ASCII text and hashed, although I'm not sure how to check if a simpler form exists: h("0.5") won't equal h("1/2").

What I can't seem to devise is a function with a real-valued domain and a countable number of output states, for which it is computationally difficult to discover collisions.

Title: Re: Real-space Hash Functions
Post by towr on Oct 9th, 2007, 1:01am

on 10/08/07 at 16:24:54, RandomSam wrote:
What I can't seem to devise is a function with a real-valued domain and a countable number of output states, for which it is computationally difficult to discover collisions.
How about using a polynomial of degrees greater than 4 as an intermediate step. It would in general be difficult to find roots for it.
(Besides, I don't think this problem is any less for mapping integers to integers if all parameters of the function are known)

Title: Re: Real-space Hash Functions
Post by RandomSam on Oct 13th, 2007, 11:40am
I've found a silly function whose output is only defined for irrational input.  

http://www.mr.ethz.ch/free_cgi-bin/formula.gif?expr=%5Cmbox%7Bsilly%7D%28x%29%20%3D%20%5Cmathop%7B%5Csum%7D_%7Bn%3D0%7D%5E%7B%5Cinfty%7D%7B%5Cfrac%7B2%5E%7B-n%7D%7D%7B%5Csin%5E%7B2%7D%28%5Cpi%7B%7Dn%21x%29%7D%7D&size=300

The ideal function seems to be non-trivial to reverse (e.g. I don't know how to find an x such that a < silly(x) < b, except by brute force) so if I round the output to a finite precision, I'll have an ideal hash function.

In practice it's not possible to do the infinite sum, and the approximation using the first few terms has a well-defined slope almost everywhere.  This means it's reversible to arbitrary accuracy using numerical methods, so if I round the output, collisions can easily be produced.

EDIT: Found another function with similar properties but a finite expectation, which should give a stronger guarantee of convergence, although I'm not exactly sure what that means in this context.

http://www.mr.ethz.ch/free_cgi-bin/formula.gif?expr=%5Cmbox%7Bsillier%7D%28x%29%20%3D%20%5Cmathop%7B%5Csum%7D_%7Bn%3D0%7D%5E%7B%5Cinfty%7D%7B%5Cfrac%7B2%5E%7B-n%7D%7D%7B%5Csqrt%7B%5Bn%21x%5D%7D%7D%7D&size=300

I think that's as far as I can go with this idea!



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board