|
||
Title: efficient agent Post by howard roark on Jan 24th, 2009, 11:32am Suppose there is an agent which moves from location to location along an assembly line, where the locations are indexed by the integers 0, . . . ,N or 0, . . .infinity. The delay in moving from location i to location j is |i - j|. The agent is given a location to go to, does its work, then is assigned the next location (which may be the same), and so on. After it has completed its work, it can either stay there or return to location 0 before it is given the next location. Because there is significant time between assignments, there is no delay when returning to 0 between assignments. The probability of being assigned location i is pi, and each assignment is made independently from any previous assignments. Assume that the distribution is such that there is a finite expected delay when starting from location 0. A policy is a rule which tells the robot whether to stay at location i or return to location 0, for all i. An optimal policy is one which minimizes the expected delay at each location. Optimal policies need not be unique. 1)Suppose the probabilities are pi =((3/4)^i)/4 Give an optimal policy. 2)Show that for any probability distribution, there is an optimal policy of the form: the robot stays if it is at a location <= L, and returns to 0 if it is at a location > L, for some L >= 0. |
||
Title: Re: efficient agent Post by 1337b4k4 on Jan 28th, 2009, 12:25am From a position k, the expected distance from k Xk is E(Xkhttp://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/sum.gifi=0inf|k-i|pi which is http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/sum.gifi=0k(k-i)pi + http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/sum.gifi=k+1inf(i-k)pi. The difference between adjacent terms E(Xk+1) - E(Xk) = http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/sum.gif0kpi - http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/sum.gifk+1infpi = 1 - 2*http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/sum.gifk+1infpi. This value is monotonically increasing and starts out at -1+2p0 and approaches 1 as k goes to infinity. Thus, we see that E(Xk) is a value which will decrease monotonically while cdf(k) < .5, and increase monotonically afterward, with the difference between adjacent terms approaching 1. Thus we see although E(Xk) may become smaller than E(X0) for a while, but it will eventually start increasing and grow and stay larger than E(X0) at some point, x. So for those values of k less than x, it would be optimal to stay there, and for values greater than x, it would make sense to go back to 0. It seems that the only case in which multiple optimal solutions could occur is if p0 = 1/2. Then if there are k contiguous integers i starting at 1...k where pi = 0, there will be 2i+1 optimal solutions, as the difference between adjacent values of E(Xk) won't change from 0 to k+1, and an optimal strategy for every one of those integers could choose between staying or returning to 0. |
||
Powered by YaBB 1 Gold - SP 1.4! Forum software copyright © 2000-2004 Yet another Bulletin Board |