wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> cs >> Closest to the median
(Message started by: gotit on Aug 20th, 2007, 1:59am)

Title: Closest to the median
Post by gotit on Aug 20th, 2007, 1:59am
We are given an array containing n integers and a number k.We have to find the k numbers in the array that are closest to the median of the array. Can anyone suggest an algo with O(n) time complexity???

When i say closest,I mean closest by value, not by position.

Title: Re: Closest to the median
Post by towr on Aug 20th, 2007, 3:18am
I think to start you'd want to find the median, which can be done in O(n) (either stochastically, which is typically fastest, I think; or guaranteed linear using the median of medians approach)

For the next step you could e.g. think of a tree with k nodes (if k log k < n, then it's still linear); if there are less than k nodes in it, add the next node, if there are k nodes, replace the largest or smallest when appropriate by a new one.
Partial sorting is also an option.

Title: Re: Closest to the median
Post by TenaliRaman on Aug 20th, 2007, 4:27am

on 08/20/07 at 03:18:24, towr wrote:
Partial sorting is also an option.

Yeah, i would suggest doing quick sort on the median and then choosing only the halves which are closer the median everytime during sort.

-- AI

Title: Re: Closest to the median
Post by Skandh on Aug 20th, 2007, 7:09am

on 08/20/07 at 04:27:12, TenaliRaman wrote:
Yeah, i would suggest doing quick sort on the median and then choosing only the halves which are closer the median everytime during sort.

-- AI


By performing quick sort type of algo ( known as quick select, to find median) , we can find median, but what after that, because k closest numbers can be on both side of the median.
>:( >:( >:( >:(

Title: Re: Closest to the median
Post by towr on Aug 20th, 2007, 7:14am

on 08/20/07 at 07:09:30, Skandh wrote:
By performing quick sort type of algo ( known as quick select, to find median) , we can find median, but what after that, because k closest numbers can be on both side of the median.
You could get the k closest smaller than the median and the k closest larger; then somehow sort out these 2k numbers further.
(And if O(k log k) < O(n), you could just sort them without losing linearity of the overal algorithm. Once properly sorted it's easy enough to find the k closest ones.)

Title: Re: Closest to the median
Post by Grimbal on Aug 20th, 2007, 8:46am
Assuming n tends to infininty and k is constant,  sorting k elements is O(1).

Maybe you could find the n/2+k largest elements using some divide and conquer scheme that should be O(n) on average.  Then, get the 2k smallest elements from that.  Then, finding the median of the 2k central elements and identifying the k closest items to it is O(1).

Title: Re: Closest to the median
Post by KWillets on Aug 20th, 2007, 10:45am

on 08/20/07 at 07:09:30, Skandh wrote:
By performing quick sort type of algo ( known as quick select, to find median) , we can find median, but what after that, because k closest numbers can be on both side of the median.
>:( >:( >:( >:(

Do k-smallest on the absolute value of the difference from the median.

Title: Re: Closest to the median
Post by gotit on Aug 20th, 2007, 11:42am
Regarding towr's first solution of creating a k-node tree, I think he is considering of putting the actual elements of the array in the tree.What if I put the abs value of the difference of the elements and the median.In that case,we need only to compare the new elements with the largest in the tree and replace the largest with it if it is smaller.

Another question:How does the process of inserting in the k-node tree take O(klogk) time?I think it takes O(nlogk) time(in the worst case when all the n-k farthest elemets in the array occur before all the k closest elements).

Title: Re: Closest to the median
Post by towr on Aug 20th, 2007, 1:07pm

on 08/20/07 at 11:42:43, gotit wrote:
Regarding towr's first solution of creating a k-node tree, I think he is considering of putting the actual elements of the array in the tree.What if I put the abs value of the difference of the elements and the median.In that case,we need only to compare the new elements with the largest in the tree and replace the largest with it if it is smaller.
Yeah, that's better.
I really should have thought of using the absolute difference before KWillets mentioned it in the post before yours, that makes things so much simpler; and it should have been obvious.


Quote:
Another question:How does the process of inserting in the k-node tree take O(klogk) time? I think it takes O(nlogk) time(in the worst case when all the n-k farthest elements in the array occur before all the k closest elements).
Yup, right again; so I guess that was a really bad suggestion on my part :P
The obvious give away is that it should be at least O(n) because you have to consider every element in the array.

Title: Re: Closest to the median
Post by wangzhen on Sep 6th, 2007, 3:07am
My Method:
1st step: Delete the top (n-k)/2 smallest data.
2st step: Delete the top (n-k)/2 largest data.

Take the top (n-k)/2 smallest data for example:

Select a data (as pivot similar with quick sort does) randomly. Do the partition as quick sort does to let the left of pivot are all smaller than it, and the right are all larger than it. This operation are O(n).

Denote p as the current subscript (or postion) of the pivot.

Then, (p-1) means the number of the left, and
(n-p) means the number of the right.

If the (p-1)>(n-k)/2, Then Delete (n-k)/2 smallest in the left (p-1).  This is a process of recursion.
else if (p-1) == (n-k)/2, Then delete the left part and return.
else , Then Delete the left part and pivot, and Delete (n-k)/2 - p smallest in the right part. This is a process of recursion.

Calculate the complexity:(Which we define F(n))
F(n) = F(n/2) + n;

Then F(n) = F(n/4) + n/2 +n = ... = n+ n/2 +n/4 +...
We can know that the result is smaller than 2*n,  So complexity is O(n).


Similarly, We Delete the Largest (n-k)/2 in the remaining array, and eventually the final data are the answer.

Then the average complexity are O(n).
But as similar as Quick sort, this method have a complexity of O(n^2) in the most bad situation.

Title: Re: Closest to the median
Post by towr on Sep 6th, 2007, 5:26am

on 09/06/07 at 03:07:12, wangzhen wrote:
My Method:
1st step: Delete the top (n-k)/2 smallest data.
2st step: Delete the top (n-k)/2 largest data.
That might not leave the k elements closest to the median.

Consider for example the array [1,2,3,6,7] with k=3.
3 is the median, and {1,2,3} are the three closest elements; your method however would give {2,3,6}.

Title: Re: Closest to the median
Post by wangzhen on Sep 6th, 2007, 8:03pm
Thanks for towr's remind.
I have misunderstood it.

Then, I also have a new method:

suppose the array are a[].
Firstly, find the median in the array, I think this is O(n) time complexity.

Then, new an array b[] to store the values:
for(i=0; i<n; ++i)
   b[i] = abs(a[i]-median)

Then in the array b[i],
find the top k smallest (O(n)), and output them after adding median.

So, this method is also O(n).



on 09/06/07 at 05:26:09, towr wrote:
That might not leave the k elements closest to the median.

Consider for example the array [1,2,3,6,7] with k=3.
3 is the median, and {1,2,3} are the three closest elements; your method however would give {2,3,6}.


Title: Re: Closest to the median
Post by wangzhen on Sep 6th, 2007, 8:07pm
Sorry,
in the end,
not adding the median, because it is abs().

should remember the each b[i] 's origin and recover the data.



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board