wu :: forums - Print Page


    
      
        wu :: forums
        (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
      

        riddles >> cs >> Efficient Datastructure
        
(Message started by: techforn10 on Jul 7^th, 2007, 10:00am)

Title: Efficient Datastructure
Post by techforn10 on Jul 7^th, 2007, 10:00am

Company A serves millions of requests per day. Give a efficient method to return two character string country code given IP address as input.

Title: Re: Efficient Datastructure
Post by towr on Jul 7^th, 2007, 11:16am

Well, presumedly the input and output is known in advance, so you can spend quite some power to find the most efficient mapping.
I'd be hardpressed to say a priori what that mapping will be.

Title: Re: Efficient Datastructure
Post by aki_scorpion on Jul 7^th, 2007, 11:28am

hmmm

If the input and output are known is advance and if they are limited why can we use 2 hash functions for every ip so that query will take only o(1) time.

Title: Re: Efficient Datastructure
Post by towr on Jul 7^th, 2007, 11:50am

The range of ip numbers is limited, that much is a given. But O(1) doesn't say much in itself; if each request takes a second it's too long, even if it's a constant amount of time.

We have some 2^32 IPv4 addresses, and some 200 countries.
Hmm, I wonder if there's a list somewhere, so we can actually do it, rather than only try to figure out an approach.

Title: Re: Efficient Datastructure
Post by Grimbal on Jul 7^th, 2007, 4:00pm

I think IP addresses are attributed to countries in ranges. Storing each IP address would be overkill. Maybe a Btree would be appropriate?

Title: Re: Efficient Datastructure
Post by techforn10 on Jul 8^th, 2007, 9:16am

How about using Trie??

Vague idea:
I was thinking of building a Trie of network part of IP addresses separated by 2 letter country code. So on query just traverse through the Trie matching network part of octets then just return the data in leaf node which contains country code.

Title: Re: Efficient Datastructure
Post by towr on Jul 8^th, 2007, 10:22am

The (free) geolite database from maxmind contains just under 100000 ip-range - country pairs. And boasts 98% accuracy. (Their commercial database has over 99% accuracy)
So, that's about 500 ranges per country.

A tree of some sort would seem an obvious solution. But I wonder if it can't be done with fewer costly hops through memory.

Title: Re: Efficient Datastructure
Post by mad on Mar 9^th, 2008, 3:44am

I am not very sure of this, but if it follows a classful addressing, can we use a mask to find the class the given address belongs to. Then, if it is a class A address, then search the class A tree only , if class B, search in its tree and so on. Not very sure of this approach.

But, using a trie to store it as digits from 0-9 rather than binary tree might be efficient.

Or can we store a tree of intervals and check if a given IP address falls in a given range/interval and output the corresponding country.

Title: Re: Efficient Datastructure
Post by Grimbal on Mar 12^th, 2008, 10:47am

on 03/09/08 at 03:44:18, mad wrote:

But, using a trie to store it as digits from 0-9 rather than binary tree might be efficient.

IP addresses being 4 bytes, you should rather go 256-way at each level, or maybe 16-way.

Title: Re: Efficient Datastructure
Post by Hippo on Mar 12^th, 2008, 1:18pm

on 07/07/07 at 11:28:37, aki_scorpion wrote:

so that query will take only o(1) time.

I should comment this ... there is a big difference between O(1) and o(1) ... There is no algorithm working in o(1) time. (At least you cannot call such algorithm.)