Author |
Topic: Industrial/Distributed Computing (Read 2884 times) |
|
k2xl
Newbie
Posts: 17
|
|
Industrial/Distributed Computing
« on: Feb 18th, 2008, 10:48am » |
Quote Modify
|
I'm a student at Georgia Tech working with distributed systems. This is a unique problem I came up with in my research. This problem came from me explaining the concept of distributed computing to some friends. I've done many experiments with it but have come up empty. I've shown various professors this problem but nobody has been able to generalize it. Let's say you want to calculate the sum of 10 numbers. Let's also say it takes you 1 second to add up two numbers together. Therefore: 10 Numbers with 1 calculator would take 9 seconds to calculate. Now let's say you give a second calculator to a friend. You give the second calculator 5 calculations to calculate and yourself 5. Working simultaneously, you would take 4 seconds and your friend 4 seconds. Then you could combine these two sums (which would take 1 second) to get the overall sum in a total of 5 seconds. Therefore: 10 Numbers to sum with 2 calculators would take 5 seconds. Now let's add a third calculator. Since 10 doesn't divide evenly by 3, one calculator will do more work than the other two. Calculator A: 3 sums Calculator B: 3 sums Calculator C: 4 sums Therefore: Calculator A: Takes 2 seconds (must wait 1 second) Calculator B: Takes 2 seconds (must wait 1 second) Calculator C: Takes 3 seconds Adding all the sums takes an additional 2 seconds. This means: 10 Numbers to sum with 3 calculators would take 5 seconds. So here is the problem. Using 3 calculators doesn't save you any time. It's cheaper to use 2 calculators. So given N numbers to sum, find the minimum C calculators to use to obtain the lowest T time. A quick formula that I came up with to get the minimum time is: (N/C - 1) + (N-1) = N/C + N - 2 Note: These calculators work independently. Meaning they can't pass along sums for the other calculators to calculate to save time. This means if one calculator has more work than others the others can't do more work while it finishes. Note 2: This means that the calculators do not communicate with each other. Assume that a central server sends data to nodes and waits for all of the data to be returned and then compiles all the data. If I didn't explain this well please post what needs more clarification. Thanks!
|
« Last Edit: Feb 19th, 2008, 5:21am by k2xl » |
IP Logged |
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #1 on: Feb 18th, 2008, 11:14am » |
Quote Modify
|
on Feb 18th, 2008, 11:13am, towr wrote: Yeah, I just noticed.. sorry about that; and I deleted it before I saw your response.. (albeit relatively long after I posted it) |
| Removed my post following it too. No problems
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #2 on: Feb 18th, 2008, 11:19am » |
Quote Modify
|
Ok, let's see if I can contribute anything useful.. As I understand it, we want to minimize N/C - 1 + (C-1), dependent on C So, pretending it's continuous we can take the derivative wrt C and see when it's zero. -N/C2+1=0 C2=N C=N That should be in the right neighbourhood.
|
« Last Edit: Feb 18th, 2008, 11:21am by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #3 on: Feb 18th, 2008, 11:42am » |
Quote Modify
|
Unfortunately, Sqrt(N) is incorrect. Sqrt(N) is what I got when doing the derivative too. But look at 10. Sqrt(10) is around 3 (rounded up is 4). But as we just pointed, 2 calculators can find the minimum time (cheaper than using 3)
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #4 on: Feb 18th, 2008, 11:53am » |
Quote Modify
|
I only claimed it was in the right neighbourhood, not that it was in the right house Obviously the ceiling function throws a spanner in the works.
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #5 on: Feb 18th, 2008, 12:20pm » |
Quote Modify
|
Quite the horrible little function.. If we ever figure it out, consider submitting it to http://www.research.att.com/~njas/sequences/index.html Here's a few values to work on.. (hopefully I haven't made any mistakes in generating it) ------------ 1 1 2 1 3 1 4 2 5 2 6 2 7 2 8 2 9 3 10 2 11 3 12 3 13 3 14 3 15 3 16 4 17 3 18 3 19 4 20 4 21 3 22 4 23 4 24 4 25 5 ------------ | ------------ 26 4 27 4 28 4 29 5 30 5 31 4 32 4 33 5 34 5 35 5 36 6 37 5 38 5 39 5 40 5 41 6 42 6 43 5 44 5 45 5 46 6 47 6 48 6 49 7 50 5 ------------ | ------------ 51 6 52 6 53 6 54 6 55 7 56 7 57 6 58 6 59 6 60 6 61 7 62 7 63 7 64 8 65 6 66 6 67 7 68 7 69 7 70 7 71 8 72 8 73 7 74 7 75 7 ------------ | ------------ 76 7 77 7 78 8 79 8 80 8 81 9 82 7 83 7 84 7 85 8 86 8 87 8 88 8 89 9 90 9 91 7 92 8 93 8 94 8 95 8 96 8 97 9 98 9 99 9 100 10 ------------ | ------------ 101 8 102 8 103 8 104 8 105 9 106 9 107 9 108 9 109 10 110 10 111 8 112 8 113 9 114 9 115 9 116 9 117 9 118 10 119 10 120 10 121 11 122 9 123 9 124 9 125 9 ------------ | ------------ 126 9 127 10 128 10 129 10 130 10 131 11 132 11 133 9 134 9 135 9 136 10 137 10 138 10 139 10 140 10 141 11 142 11 143 11 144 12 145 10 146 10 147 10 148 10 149 10 150 10 ------------ | ------------ 151 11 152 11 153 11 154 11 155 12 156 12 157 10 158 10 159 10 160 10 161 11 162 11 163 11 164 11 165 11 166 12 167 12 168 12 169 13 170 10 171 11 172 11 173 11 174 11 175 11 ------------ | ------------ 176 11 177 12 178 12 179 12 180 12 181 13 182 13 183 11 184 11 185 11 186 11 187 11 188 12 189 12 190 12 191 12 192 12 193 13 194 13 195 13 196 14 197 11 198 11 199 12 200 12 ------------ |
|
« Last Edit: Feb 18th, 2008, 12:20pm by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #6 on: Feb 18th, 2008, 1:21pm » |
Quote Modify
|
What exactly is this sequence towr? Because the minimum increases or stays the same as N increases I have some spreadsheets, I can post them if anyone is interested. One thing I remember is that there is NEVER a case in which you should use 9 calculators... 100 uses 10, 101 uses 8! Isn't that interesting?
|
« Last Edit: Feb 18th, 2008, 1:23pm by k2xl » |
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #7 on: Feb 18th, 2008, 1:32pm » |
Quote Modify
|
on Feb 18th, 2008, 1:21pm, k2xl wrote:What exactly is this sequence towr? Because the minimum increases or stays the same as N increases |
| This is a sequence of what (smallest) C give the minimum time; that's what we want to find, isn't it? Quote:I have some spreadsheets, I can post them if anyone is interested. |
| It can't hurt. Quote:One thing I remember is that there is NEVER a case in which you should use 9 calculators... 100 uses 10, 101 uses 8! Isn't that interesting? |
| The first one where you should use 9 calculators is 81.. For every N=K2, you should use K calculators..
|
« Last Edit: Feb 18th, 2008, 1:34pm by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #8 on: Feb 18th, 2008, 1:44pm » |
Quote Modify
|
Ah right. Btw, how did you generate your list?
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #9 on: Feb 18th, 2008, 1:50pm » |
Quote Modify
|
on Feb 18th, 2008, 1:44pm, k2xl wrote:Ah right. Btw, how did you generate your list? |
| brute force javascript Code: javascript: for(i=1; i<200; i++) { m1=200; m2=0; for(j=1; j<200; j++) { t = Math.ceil(i/j)+j-2; if(t < m1) { m1=t; m2=j; } } document.write(i+" "+m2+"<br>"); } |
| (If you put it on one line, it'll run in your browser, or if you have linux and firefox you can paste it as it is.)
|
« Last Edit: Feb 18th, 2008, 1:50pm by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #10 on: Feb 18th, 2008, 1:57pm » |
Quote Modify
|
Awesome, okay. So now just to find a formula
|
|
IP Logged |
|
|
|
Icarus
wu::riddles Moderator Uberpuzzler
Boldly going where even angels fear to tread.
Gender:
Posts: 4863
|
|
Re: Industrial/Distributed Computing
« Reply #11 on: Feb 18th, 2008, 3:30pm » |
Quote Modify
|
Minimizing N/C - 1 + (C-1) might be tough enough, but I don't think this exactly describes the situation: Suppose that you have 200 numbers to add up and 100 calculators. You can distribute the 200 numbers, get your 100 results, but it would be ridiculous at this point to use a single calculator to add up the 100 numbers. Instead, you distribute the results to 50 calculators, then those results to 25, etc. So you should be minimizing N/C - 1 + log2 C I think.
|
|
IP Logged |
"Pi goes on and on and on ... And e is just as cursed. I wonder: Which is larger When their digits are reversed? " - Anonymous
|
|
|
Eigenray
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 1948
|
|
Re: Industrial/Distributed Computing
« Reply #12 on: Feb 18th, 2008, 5:13pm » |
Quote Modify
|
It's not too hard to minimize just n/c + c: Let f(x) = n/x + x, F(x) = f(x). Since f(x) is minimized at x=n, it follows that the minimum value of F(x) is either F(n) or F(n). If n = k2, then it's easy to see F(k) = 2k, while F(k-1) > 2k, so C=k. If n is not a square, write n = k2 + r, 0 < r 2k. Then 2k < f(k+1) = (k2+r)/(k+1) + (k+1) 2k+2, so F(k+1) is 2k+1 when r k, and 2k+2 when r > k. Similarly 2k < f(k) = (k2+r)/k + k 2k+2, and so in fact F(k) = F(k+1). It follows that F(x) F(k) for all x, and the question is simply: what is the smallest x for which F(x) = F(k)? If r k, then F(x) = F(k) when f(x) = n/x + x 2k+1, or k+1/2 - {k-r+1/4} x k+1/2 + {k-r+1/4}, so the smallest such integer x is C = k+1/2 - {k-r+1/4}. Similarly, if r > k, then F(x) = F(k) when n/x + x 2k+2, and the smallest such integer x is C = k+1-{2k-r+1]. However, it looks like what you really want to do is recursively let F(n) = min{ n/c - 1+ F(c) : c < n }?
|
|
IP Logged |
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #13 on: Feb 18th, 2008, 7:15pm » |
Quote Modify
|
on Feb 18th, 2008, 3:30pm, Icarus wrote:Minimizing N/C - 1 + (C-1) might be tough enough, but I don't think this exactly describes the situation: Suppose that you have 200 numbers to add up and 100 calculators. You can distribute the 200 numbers, get your 100 results, but it would be ridiculous at this point to use a single calculator to add up the 100 numbers. Instead, you distribute the results to 50 calculators, then those results to 25, etc. So you should be minimizing N/C - 1 + log2 C I think. |
| Please see the note. The calculators are independent of each other. This system doesn't follow this scheme. What your suggesting is a classic distributed computing model. But many models do not function in this way. The problem is basically minimizing time and then number of calculators.
|
|
IP Logged |
|
|
|
Icarus
wu::riddles Moderator Uberpuzzler
Boldly going where even angels fear to tread.
Gender:
Posts: 4863
|
|
Re: Industrial/Distributed Computing
« Reply #14 on: Feb 18th, 2008, 10:54pm » |
Quote Modify
|
I don't follow this. If the calculators cannot pass their results, then they cannot cooperatively sum the numbers at all, because the result of one cannot be passed to another. So in the end you are left with independent calculators each with the sum of their allotted portion, and no means to collect the totals together to find the grand total. I suppose you can demand that only one calculator can sum up the individual totals, but the only reason I could see you doing it this way in the real world is to free up the other calculators for different tasks, while one finishes off this task (whose output is not needed by other processes until the sole remaining calculator has time to finish the job).
|
|
IP Logged |
"Pi goes on and on and on ... And e is just as cursed. I wonder: Which is larger When their digits are reversed? " - Anonymous
|
|
|
Eigenray
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 1948
|
|
Re: Industrial/Distributed Computing
« Reply #15 on: Feb 19th, 2008, 12:47am » |
Quote Modify
|
on Feb 18th, 2008, 5:13pm, Eigenray wrote:However, it looks like what you really want to do is recursively let F(n) = min{ n/c - 1 + F(c) : c < n }? |
| Actually, this works out to F(n) = lg n, the same as Icarus's suggestion: n/c-1 + lg c is minimized at c=n/2 or c=n (among possibly other values, but the presence of two separate ceilings makes it a lot trickier to analyze).
|
|
IP Logged |
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #16 on: Feb 19th, 2008, 1:16am » |
Quote Modify
|
on Feb 18th, 2008, 10:54pm, Icarus wrote:I suppose you can demand that only one calculator can sum up the individual totals, but the only reason I could see you doing it this way in the real world is to free up the other calculators for different tasks, while one finishes off this task (whose output is not needed by other processes until the sole remaining calculator has time to finish the job). |
| Typically, in distributed computing (afaik) you have one node distributing the task and collecting the results. Redistributing the intermediary results for further work is often not worth the bother (Of course this doesn't fit the current model either, because here distributing tasks seems to be free)
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
Grimbal
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 7527
|
|
Re: Industrial/Distributed Computing
« Reply #17 on: Feb 19th, 2008, 1:20am » |
Quote Modify
|
on Feb 18th, 2008, 11:42am, k2xl wrote:Unfortunately, Sqrt(N) is incorrect. Sqrt(N) is what I got when doing the derivative too. But look at 10. Sqrt(10) is around 3 (rounded up is 4). But as we just pointed, 2 calculators can find the minimum time (cheaper than using 3) |
| I don't understand. With 3 calculators you need only 4 seconds, not 5. While the 3rd calculator does its 3rd sum, the first and 2nd calculator can combine their sums.
|
|
IP Logged |
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #18 on: Feb 19th, 2008, 5:20am » |
Quote Modify
|
on Feb 19th, 2008, 1:20am, Grimbal wrote: I don't understand. With 3 calculators you need only 4 seconds, not 5. While the 3rd calculator does its 3rd sum, the first and 2nd calculator can combine their sums. |
| Please see the note. This model is simple. Think of it like this: You are a central server. You distribute data to nodes. You wait until all the nodes return you results and then you compile the results.
|
|
IP Logged |
|
|
|
rmsgrey
Uberpuzzler
Gender:
Posts: 2873
|
|
Re: Industrial/Distributed Computing
« Reply #19 on: Feb 19th, 2008, 5:21am » |
Quote Modify
|
on Feb 19th, 2008, 1:20am, Grimbal wrote: I don't understand. With 3 calculators you need only 4 seconds, not 5. While the 3rd calculator does its 3rd sum, the first and 2nd calculator can combine their sums. |
| That's ruled out by a condition stated near the end of the original post - that all calculators need to complete their initial batch of calculations before the final summation can take place...
|
|
IP Logged |
|
|
|
Grimbal
wu::riddles Moderator Uberpuzzler
Gender:
Posts: 7527
|
|
Re: Industrial/Distributed Computing
« Reply #20 on: Feb 19th, 2008, 6:06am » |
Quote Modify
|
Sorry, it was indeed very clear in the initial post. Note to myself: always read the fine print at the end of the page.
|
|
IP Logged |
|
|
|
Icarus
wu::riddles Moderator Uberpuzzler
Boldly going where even angels fear to tread.
Gender:
Posts: 4863
|
|
Re: Industrial/Distributed Computing
« Reply #21 on: Feb 19th, 2008, 3:51pm » |
Quote Modify
|
on Feb 19th, 2008, 1:16am, towr wrote: Typically, in distributed computing (afaik) you have one node distributing the task and collecting the results. Redistributing the intermediary results for further work is often not worth the bother (Of course this doesn't fit the current model either, because here distributing tasks seems to be free) |
| That depends on how much work is left to do. It seems to me that many tasks require intermediary results be shared even though the bulk of the computing remains. Collecting and redistributing should be a common enough approach. Only when you are near the end of the calculation does it make sense to leave the rest to the controller. Certainly here, if C is large enough, significant time can be saved by redistributing. I assumed that the use of summing as the task was merely because it makes it easy to understand the process. In a more realistic application, the task involved would be considerably more complex, which is why by comparison, distributing tasks is effectively free (just like you don't consider adding and subtracting when counting flops). In that case, correlating a large number of sub-processor results can still take significant time if performed by only one processor. Unless there is a great need to free up the slave processors for other tasks, it makes sense to use as many as necessary to speed up the final calculation. But this is k2xl's question, so he controls what the task is. I'll give over my grouching.
|
|
IP Logged |
"Pi goes on and on and on ... And e is just as cursed. I wonder: Which is larger When their digits are reversed? " - Anonymous
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Industrial/Distributed Computing
« Reply #22 on: Feb 20th, 2008, 12:58am » |
Quote Modify
|
on Feb 19th, 2008, 3:51pm, Icarus wrote:I assumed that the use of summing as the task was merely because it makes it easy to understand the process. In a more realistic application, the task involved would be considerably more complex, which is why by comparison, distributing tasks is effectively free (just like you don't consider adding and subtracting when counting flops). |
| Well yes, it wouldn't make sense to distribute tasks unless it's effectively free (or at least cheaper than not distributing it). But the delegated tasks have to be pretty big for this to be the case, because communication is several orders of magnitude more costly than memory access. I suppose the biggest issue is whether the task remains the same size (e.g. weather simulation), or gets smaller over time (adding, or other 'data aggregation'(?)). In the latter case communication very quickly becomes more costly compared to the gain from distributing. Most of the examples I've dealt with are of the latter type, where you have to work through huge amounts of data and join the results together.
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
k2xl
Newbie
Posts: 17
|
|
Re: Industrial/Distributed Computing
« Reply #23 on: Feb 29th, 2008, 12:26pm » |
Quote Modify
|
Bumping to see if anyone wants to give another shot
|
|
IP Logged |
|
|
|
|