wu :: forums
« wu :: forums - Industrial/Distributed Computing »

Welcome, Guest. Please Login or Register.
Dec 3rd, 2024, 11:49am

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   hard
(Moderators: william wu, Icarus, Eigenray, ThudnBlunder, towr, Grimbal, SMQ)
   Industrial/Distributed Computing
« Previous topic | Next topic »
Pages: 1 2  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Industrial/Distributed Computing  (Read 2884 times)
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Industrial/Distributed Computing  
« on: Feb 18th, 2008, 10:48am »
Quote Quote Modify Modify

I'm a student at Georgia Tech working with distributed systems. This is a unique problem I came up with in my research.
 
 
This problem came from me explaining the concept of distributed computing to some friends. I've done many experiments with it but have come up empty. I've shown various professors this problem but nobody has been able to generalize it.
 
 
 
Let's say you want to calculate the sum of 10 numbers. Let's also say it takes you 1 second to add up two numbers together.
Therefore:
10 Numbers with 1 calculator would take 9 seconds to calculate.
 
Now let's say you give a second calculator to a friend. You give the second calculator 5 calculations to calculate and yourself 5. Working simultaneously, you would take 4 seconds and your friend 4 seconds. Then you could combine these two sums (which would take 1 second) to get the overall sum in a total of 5 seconds. Therefore:
 
10 Numbers to sum with 2 calculators would take 5 seconds.
 
 
Now let's add a third calculator. Since 10 doesn't divide evenly by 3, one calculator will do more work than the other two.
Calculator A: 3 sums
Calculator B: 3 sums
Calculator C: 4 sums
 
Therefore:
Calculator A: Takes 2 seconds (must wait 1 second)
Calculator B: Takes 2 seconds (must wait 1 second)
Calculator C: Takes 3 seconds
 
Adding all the sums takes an additional 2 seconds.  
This means:
 
10 Numbers to sum with 3 calculators would take 5 seconds.
 
 
 
So here is the problem. Using 3 calculators doesn't save you any time. It's cheaper to use 2 calculators.
 
So given N numbers to sum, find the minimum C calculators to use to obtain the lowest T time.
 
A quick formula that I came up with to get the minimum time is:  
(N/C - 1) + (N-1) = N/C + N - 2
 
 
Note: These calculators work independently. Meaning they can't pass along sums for the other calculators to calculate to save time. This means if one calculator has more work than others the others can't do more work while it finishes.
Note 2: This means that the calculators do not communicate with each other. Assume that a central server sends data to nodes and waits for all of the data to be returned and then compiles all the data.
 
If I didn't explain this well please post what needs more clarification. Thanks!
« Last Edit: Feb 19th, 2008, 5:21am by k2xl » IP Logged
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #1 on: Feb 18th, 2008, 11:14am »
Quote Quote Modify Modify

on Feb 18th, 2008, 11:13am, towr wrote:

Yeah, I just noticed.. sorry about that; and I deleted it before I saw your response.. (albeit relatively long after I posted it)

 
Removed my post following it too. No problems Smiley
IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #2 on: Feb 18th, 2008, 11:19am »
Quote Quote Modify Modify

Ok, let's see if I can contribute anything useful..  
As I understand it, we want to minimize N/C - 1 + (C-1), dependent on C
So, pretending it's continuous we can take the derivative wrt C and see when it's zero.
 
-N/C2+1=0
C2=N
C=N
 
That should be in the right neighbourhood.
« Last Edit: Feb 18th, 2008, 11:21am by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #3 on: Feb 18th, 2008, 11:42am »
Quote Quote Modify Modify

Unfortunately, Sqrt(N) is incorrect.
 
Sqrt(N) is what I got when doing the derivative too.
 
But look at 10.
 
Sqrt(10) is around 3 (rounded up is 4).
 
But as we just pointed, 2 calculators can find the minimum time (cheaper than using 3)
IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #4 on: Feb 18th, 2008, 11:53am »
Quote Quote Modify Modify

I only claimed it was in the right neighbourhood, not that it was in the right house Wink
Obviously the ceiling function throws a spanner in the works.
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #5 on: Feb 18th, 2008, 12:20pm »
Quote Quote Modify Modify

Quite the horrible little function.. If we ever figure it out, consider submitting it to http://www.research.att.com/~njas/sequences/index.html
 
Here's a few values to work on.. (hopefully I haven't made any mistakes in generating it)

------------
1   1
2   1
3   1
4   2
5   2
6   2
7   2
8   2
9   3
10   2
11   3
12   3
13   3
14   3
15   3
16   4
17   3
18   3
19   4
20   4
21   3
22   4
23   4
24   4
25   5

------------

------------
26   4
27   4
28   4
29   5
30   5
31   4
32   4
33   5
34   5
35   5
36   6
37   5
38   5
39   5
40   5
41   6
42   6
43   5
44   5
45   5
46   6
47   6
48   6
49   7
50   5

------------

------------
51   6
52   6
53   6
54   6
55   7
56   7
57   6
58   6
59   6
60   6
61   7
62   7
63   7
64   8
65   6
66   6
67   7
68   7
69   7
70   7
71   8
72   8
73   7
74   7
75   7

------------

------------
76   7
77   7
78   8
79   8
80   8
81   9
82   7
83   7
84   7
85   8
86   8
87   8
88   8
89   9
90   9
91   7
92   8
93   8
94   8
95   8
96   8
97   9
98   9
99   9
100   10

------------

------------
101   8
102   8
103   8
104   8
105   9
106   9
107   9
108   9
109   10
110   10
111   8
112   8
113   9
114   9
115   9
116   9
117   9
118   10
119   10
120   10
121   11
122   9
123   9
124   9
125   9

------------

------------
126   9
127   10
128   10
129   10
130   10
131   11
132   11
133   9
134   9
135   9
136   10
137   10
138   10
139   10
140   10
141   11
142   11
143   11
144   12
145   10
146   10
147   10
148   10
149   10
150   10

------------

------------
151   11
152   11
153   11
154   11
155   12
156   12
157   10
158   10
159   10
160   10
161   11
162   11
163   11
164   11
165   11
166   12
167   12
168   12
169   13
170   10
171   11
172   11
173   11
174   11
175   11

------------

------------
176   11
177   12
178   12
179   12
180   12
181   13
182   13
183   11
184   11
185   11
186   11
187   11
188   12
189   12
190   12
191   12
192   12
193   13
194   13
195   13
196   14
197   11
198   11
199   12
200   12

------------
« Last Edit: Feb 18th, 2008, 12:20pm by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #6 on: Feb 18th, 2008, 1:21pm »
Quote Quote Modify Modify

What exactly is this sequence towr? Because the minimum increases or stays the same as  N increases
 
I have some spreadsheets, I can post them if anyone is interested.
 
One thing I remember is that there is NEVER a case in which you should use 9 calculators... 100 uses 10, 101 uses 8! Isn't that interesting?
« Last Edit: Feb 18th, 2008, 1:23pm by k2xl » IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #7 on: Feb 18th, 2008, 1:32pm »
Quote Quote Modify Modify

on Feb 18th, 2008, 1:21pm, k2xl wrote:
What exactly is this sequence towr? Because the minimum increases or stays the same as  N increases
This is a sequence of what (smallest) C give the minimum time; that's what we want to find, isn't it?
 
Quote:
I have some spreadsheets, I can post them if anyone is interested.
It can't hurt.
 
Quote:
One thing I remember is that there is NEVER a case in which you should use 9 calculators... 100 uses 10, 101 uses 8! Isn't that interesting?
The first one where you should use 9 calculators is 81..
For every N=K2, you should use K calculators..
« Last Edit: Feb 18th, 2008, 1:34pm by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #8 on: Feb 18th, 2008, 1:44pm »
Quote Quote Modify Modify

Ah right.
Btw, how did you generate your list?
IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #9 on: Feb 18th, 2008, 1:50pm »
Quote Quote Modify Modify

on Feb 18th, 2008, 1:44pm, k2xl wrote:
Ah right.
Btw, how did you generate your list?
brute force javascript
Code:

javascript:
for(i=1; i<200; i++)
{
m1=200;
m2=0;
for(j=1; j<200; j++)
{
  t = Math.ceil(i/j)+j-2;
  if(t < m1)
  {
    m1=t;
    m2=j;
  }
}
document.write(i+"  "+m2+"<br>");
}

(If you put it on one line, it'll run in your browser, or if you have linux and firefox  you can paste it as it is.)
« Last Edit: Feb 18th, 2008, 1:50pm by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #10 on: Feb 18th, 2008, 1:57pm »
Quote Quote Modify Modify

Awesome, okay. So now just to find a formula Smiley
IP Logged
Icarus
wu::riddles Moderator
Uberpuzzler
*****



Boldly going where even angels fear to tread.

   


Gender: male
Posts: 4863
Re: Industrial/Distributed Computing  
« Reply #11 on: Feb 18th, 2008, 3:30pm »
Quote Quote Modify Modify

Minimizing N/C - 1 + (C-1) might be tough enough, but I don't think this exactly describes the situation: Suppose that you have 200 numbers to add up and 100 calculators. You can distribute the 200 numbers, get your 100 results, but it would be ridiculous at this point to use a single calculator to add up the 100 numbers. Instead, you distribute the results to 50 calculators, then those results to 25, etc. So you should be minimizing
 
 
N/C - 1 +  log2 C
 
I think.
IP Logged

"Pi goes on and on and on ...
And e is just as cursed.
I wonder: Which is larger
When their digits are reversed? " - Anonymous
Eigenray
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 1948
Re: Industrial/Distributed Computing  
« Reply #12 on: Feb 18th, 2008, 5:13pm »
Quote Quote Modify Modify

It's not too hard to minimize just n/c + c:
 
Let f(x) = n/x + x, F(x) = f(x).  Since f(x) is minimized at x=n, it follows that the minimum value of F(x) is either F(n) or F(n).  If n = k2, then it's easy to see F(k) = 2k, while F(k-1) > 2k, so C=k.
 
If n is not a square, write n = k2 + r,  0 < r 2k.  Then
 
2k < f(k+1) = (k2+r)/(k+1) + (k+1) 2k+2,
 
so F(k+1) is 2k+1 when r k, and 2k+2 when r > k.  Similarly
 
2k < f(k) = (k2+r)/k + k 2k+2,
 
and so in fact F(k) = F(k+1).  It follows that F(x) F(k) for all x, and the question is simply: what is the smallest x for which F(x) = F(k)?
 
If r k, then F(x) = F(k) when f(x) = n/x + x 2k+1, or
 
k+1/2 - {k-r+1/4} x k+1/2 + {k-r+1/4},
 
so the smallest such integer x is C = k+1/2 - {k-r+1/4}.
 
Similarly, if r > k, then F(x) = F(k) when n/x + x 2k+2, and the smallest such integer x is C = k+1-{2k-r+1].
 
However, it looks like what you really want to do is recursively let F(n) = min{ n/c - 1+ F(c) : c < n }?
IP Logged
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #13 on: Feb 18th, 2008, 7:15pm »
Quote Quote Modify Modify

on Feb 18th, 2008, 3:30pm, Icarus wrote:
Minimizing N/C - 1 + (C-1) might be tough enough, but I don't think this exactly describes the situation: Suppose that you have 200 numbers to add up and 100 calculators. You can distribute the 200 numbers, get your 100 results, but it would be ridiculous at this point to use a single calculator to add up the 100 numbers. Instead, you distribute the results to 50 calculators, then those results to 25, etc. So you should be minimizing
 
 
N/C - 1 +  log2 C
 
I think.

 
Please see the note. The calculators are independent of each other. This system doesn't follow this scheme. What your suggesting is a classic distributed computing model. But many models do not function in this way.
 
The problem is basically minimizing time and then number of calculators.
IP Logged
Icarus
wu::riddles Moderator
Uberpuzzler
*****



Boldly going where even angels fear to tread.

   


Gender: male
Posts: 4863
Re: Industrial/Distributed Computing  
« Reply #14 on: Feb 18th, 2008, 10:54pm »
Quote Quote Modify Modify

I don't follow this. If the calculators cannot pass their results, then they cannot cooperatively sum the numbers at all, because the result of one cannot be passed to another. So in the end you are left with independent calculators each with the sum of their allotted portion, and no means to collect the totals together to find the grand total.
 
I suppose you can demand that only one calculator can sum up the individual totals, but the only reason I could see you doing it this way in the real world is to free up the other calculators for different tasks, while one finishes off this task (whose output is not needed by other processes until the sole remaining calculator has time to finish the job).
IP Logged

"Pi goes on and on and on ...
And e is just as cursed.
I wonder: Which is larger
When their digits are reversed? " - Anonymous
Eigenray
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 1948
Re: Industrial/Distributed Computing  
« Reply #15 on: Feb 19th, 2008, 12:47am »
Quote Quote Modify Modify

on Feb 18th, 2008, 5:13pm, Eigenray wrote:
However, it looks like what you really want to do is recursively let F(n) = min{ n/c - 1 + F(c) : c < n }?

Actually, this works out to F(n) = lg n, the same as Icarus's suggestion: n/c-1 + lg c is minimized at c=n/2 or c=n (among possibly other values, but the presence of two separate ceilings makes it a lot trickier to analyze).
IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #16 on: Feb 19th, 2008, 1:16am »
Quote Quote Modify Modify

on Feb 18th, 2008, 10:54pm, Icarus wrote:
I suppose you can demand that only one calculator can sum up the individual totals, but the only reason I could see you doing it this way in the real world is to free up the other calculators for different tasks, while one finishes off this task (whose output is not needed by other processes until the sole remaining calculator has time to finish the job).
Typically, in distributed computing (afaik) you have one node distributing the task and collecting the results. Redistributing the intermediary results for further work is often not worth the bother (Of course this doesn't fit the current model either, because here distributing tasks seems to be free)
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
Grimbal
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 7527
Re: Industrial/Distributed Computing  
« Reply #17 on: Feb 19th, 2008, 1:20am »
Quote Quote Modify Modify

on Feb 18th, 2008, 11:42am, k2xl wrote:
Unfortunately, Sqrt(N) is incorrect.
 
Sqrt(N) is what I got when doing the derivative too.
 
But look at 10.
 
Sqrt(10) is around 3 (rounded up is 4).
 
But as we just pointed, 2 calculators can find the minimum time (cheaper than using 3)

I don't understand.  With 3 calculators you need only 4 seconds, not 5.  While the 3rd calculator does its 3rd sum, the first and 2nd calculator can combine their sums.
IP Logged
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #18 on: Feb 19th, 2008, 5:20am »
Quote Quote Modify Modify

on Feb 19th, 2008, 1:20am, Grimbal wrote:

I don't understand.  With 3 calculators you need only 4 seconds, not 5.  While the 3rd calculator does its 3rd sum, the first and 2nd calculator can combine their sums.

 
Please see the note.
 
 
This model is simple.
 
 
Think of it like this:
You are a central server. You distribute data to nodes. You wait until all the nodes return you results and then you compile the results.
IP Logged
rmsgrey
Uberpuzzler
*****





134688278 134688278   rmsgrey   rmsgrey


Gender: male
Posts: 2873
Re: Industrial/Distributed Computing  
« Reply #19 on: Feb 19th, 2008, 5:21am »
Quote Quote Modify Modify

on Feb 19th, 2008, 1:20am, Grimbal wrote:

I don't understand.  With 3 calculators you need only 4 seconds, not 5.  While the 3rd calculator does its 3rd sum, the first and 2nd calculator can combine their sums.

That's ruled out by a condition stated near the end of the original post - that all calculators need to complete their initial batch of calculations before the final summation can take place...
IP Logged
Grimbal
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 7527
Re: Industrial/Distributed Computing  
« Reply #20 on: Feb 19th, 2008, 6:06am »
Quote Quote Modify Modify

Sorry, it was indeed very clear in the initial post.
 
Note to myself: always read the fine print at the end of the page.
IP Logged
Icarus
wu::riddles Moderator
Uberpuzzler
*****



Boldly going where even angels fear to tread.

   


Gender: male
Posts: 4863
Re: Industrial/Distributed Computing  
« Reply #21 on: Feb 19th, 2008, 3:51pm »
Quote Quote Modify Modify

on Feb 19th, 2008, 1:16am, towr wrote:

Typically, in distributed computing (afaik) you have one node distributing the task and collecting the results. Redistributing the intermediary results for further work is often not worth the bother (Of course this doesn't fit the current model either, because here distributing tasks seems to be free)

 
That depends on how much work is left to do. It seems to me that many tasks require intermediary results be shared even though the bulk of the computing remains. Collecting and redistributing should be a common enough approach. Only when you are near the end of the calculation does it make sense to leave the rest to the controller. Certainly here, if C is large enough, significant time can be saved by redistributing.
 
I assumed that the use of summing as the task was merely because it makes it easy to understand the process. In a more realistic application, the task involved would be considerably more complex, which is why by comparison, distributing tasks is effectively free (just like you don't consider adding and subtracting when counting flops). In that case, correlating a large number of sub-processor results can still take significant time if performed by only one processor. Unless there is a great need to free up the slave processors for other tasks, it makes sense to use as many as necessary to speed up the final calculation.
 
But this is k2xl's question, so he controls what the task is. I'll give over my grouching.
IP Logged

"Pi goes on and on and on ...
And e is just as cursed.
I wonder: Which is larger
When their digits are reversed? " - Anonymous
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Industrial/Distributed Computing  
« Reply #22 on: Feb 20th, 2008, 12:58am »
Quote Quote Modify Modify

on Feb 19th, 2008, 3:51pm, Icarus wrote:
I assumed that the use of summing as the task was merely because it makes it easy to understand the process. In a more realistic application, the task involved would be considerably more complex, which is why by comparison, distributing tasks is effectively free (just like you don't consider adding and subtracting when counting flops).
Well yes, it wouldn't make sense to distribute tasks unless it's effectively free (or at least cheaper than not distributing it). But the delegated tasks have to be pretty big for this to be the case, because communication is several orders of magnitude more costly than memory access.
I suppose the biggest issue is whether the task remains the same size (e.g. weather simulation), or gets smaller over time (adding, or other 'data aggregation'(?)). In the latter case communication very quickly becomes more costly compared to the gain from distributing.
Most of the examples I've dealt with are of the latter type, where you have to work through huge amounts of data and join the results together.
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing  
« Reply #23 on: Feb 29th, 2008, 12:26pm »
Quote Quote Modify Modify

Bumping to see if anyone wants to give another shot Smiley
IP Logged
k2xl
Newbie
*






  kiggyd12345   scratchfromstart
Email

Posts: 17
Re: Industrial/Distributed Computing   graph.JPG
« Reply #24 on: Feb 29th, 2008, 12:45pm »
Quote Quote Modify Modify

Btw, here's a spreadsheet (from data points 1-200 as posted in an earlier post)
 
http://www.mediafire.com/?nexjsg99ho4
 
 
IP Logged

Pages: 1 2  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board