Guest guest
Posted June 20, 2008

For those of you that might be interested in the new study being conducted by Dr. Shane of Cornell University, here is his academic science paper. Some of you may not be interested in it, as it contains a huge amount of formulas, but you might enjoy the results section. This was sent to me by Dr. and it shows the complexity of unit placement. Just FYI.

Joby Berkley - Owner
Jobo's Hamburger Grill
6548 Lake Worth Blvd., # 200
Lake Worth, Texas 76135
www.joboshamburgergrill.com
Work Fax Cell
joby@...
Guest guest
Posted June 20, 2008

Approximate Dynamic Programming for Ambulance Redeployment

Journal: INFORMS Journal on Computing
Manuscript ID: draft
Manuscript Type: Original Article
Date Submitted by the Author: n/a
Complete List of Authors:
  Restrepo, Mateo; Cornell University, C.A.M. (Center for Applied Math); Cornell University, School of Operations Research and Information Engineering
  Topaloglu, Huseyin; Cornell University, School of Operations Research and Information Engineering
  , Shane; Cornell University, School of Operations Research and Information Engineering
Keywords: 067 Dynamic programming-optimal control: Applications; 272 Health care: Ambulance service; 762 Simulation: Applications; 314 Information systems: Decision support systems

Mateo Restrepo
Center for Applied Mathematics, Cornell University, Ithaca, NY 14853, USA
mr324@...

Shane G. , Huseyin Topaloglu
School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14853, USA
{sgh9,ht88}@...

April 30, 2008

Abstract

We present an approximate dynamic programming approach for making ambulance redeployment decisions in an emergency medical service system. The primary decision is where we should redeploy idle ambulances so as to maximize the number of calls reached within a given delay threshold. We begin by formulating this problem as a dynamic program. To deal with the high-dimensional and uncountable state space in the dynamic program, we construct approximations to the value functions that are parameterized by a small number of parameters. We tune the parameters of the value function approximations using simulated cost trajectories of the system. Computational experiments demonstrate the performance of the approach on emergency medical service systems in two metropolitan areas. We report practically significant improvements in performance relative to benchmark static policies.

1. Introduction

Rising costs of medical equipment, increasing call volumes and worsening traffic conditions are putting emergency medical service (EMS) managers under pressure to meet performance goals set by regulators or contracts. Ambulance redeployment is one approach that can potentially help EMS managers. Ambulance redeployment, also known as relocation, move up, system-status management or dynamic repositioning, refers to any strategy by which a dispatcher repositions idle ambulances to compensate for others that are busy, and hence, unavailable. The increasing availability of geographic information systems and the increasing affordability of computing power have created ideal conditions for bringing real-time ambulance redeployment approaches to fruitful implementation.

In this paper, we present an approximate dynamic programming (ADP) approach for making real-time ambulance redeployment decisions. We begin by formulating the ambulance redeployment problem as a dynamic program. This dynamic program involves a high-dimensional and uncountable state space, and we address this difficulty by constructing approximations to the value functions that are parameterized by a small number of parameters.
We tune the parameters of the value function approximations through an iterative, simulation-based method. Each iteration of the simulation-based method consists of two steps. In the first step, we simulate the trajectory of the greedy policy induced by the current value function approximation and collect cost trajectories of the system. In the second step, we tune the parameters of the value function approximation by solving a regression problem that fits the value function approximation to the collected cost trajectories. This yields a new set of parameters that characterizes a new value function approximation, and so we can go back and repeat the same two steps. In this respect, the idea we use closely resembles the classical policy iteration algorithm in the Markov decision process literature. In particular, the first (second) step is analogous to the policy-evaluation (policy-improvement) step of the policy iteration algorithm.

There are two streams of literature that are related to our work. The first one is the literature on ADP. A generic approach for ADP involves using value function approximations of the form Σ_{p=1}^P r_p φ_p(·), where {r_p : p = 1, . . . , P} are tuneable parameters and {φ_p(·) : p = 1, . . . , P} are fixed basis functions; see Bertsekas and Tsitsiklis (1996), (2007). There are a number of methods to tune the parameters {r_p : p = 1, . . . , P} so that Σ_{p=1}^P r_p φ_p(·) yields a good approximation to the value function. For example, temporal difference learning and Q-learning use stochastic approximation ideas in conjunction with simulated trajectories of the system to iteratively tune the parameters; see Sutton (1988), Watkins and Dayan (1992), Tsitsiklis (1994), Bertsekas and Tsitsiklis (1996), Tsitsiklis and Van Roy (1997), Si, Barto, , and Wunsch II (2004). On the other hand, the linear-programming approach for ADP finds a good set of values for the parameters by solving a large linear program whose decision variables are {r_p : p = 1, . . . , P}; see Schweitzer and Seidmann (1985), de Farias and Van Roy (2003), Adelman and Mersereau (2007). Both classes of approaches are aimed at tuning the parameters {r_p : p = 1, . . . , P} so that Σ_{p=1}^P r_p φ_p(·) yields a good approximation to the value function. The choice of the basis functions {φ_p(·) : p = 1, . . . , P}, on the other hand, is regarded as more of an art form, requiring substantial knowledge of the problem structure. Applications of ADP include inventory control (Van Roy, Bertsekas, Lee, and Tsitsiklis, 1997), inventory routing (Adelman, 2004), option pricing (Tsitsiklis and Van Roy, 2001), game playing (Yan, Diaconis, Rusmevichientong, and Van Roy, 2005; Farias and Van Roy, 2006), dynamic fleet management (Topaloglu and , 2006), and network revenue management (Adelman, 2007; Farias and Van Roy, 2007).
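To make the parametric form concrete, here is a minimal Python sketch of such a linear value function approximation. The state representation and the two toy basis functions are hypothetical stand-ins for illustration only; Section 4 of the paper describes the basis functions actually used.

    # Minimal sketch of a linear value function approximation
    # J(s, r) = sum_p r_p * phi_p(s).  The features below are toy
    # stand-ins, not the paper's basis functions.

    def num_idle_ambulances(state):
        # Hypothetical feature: count of ambulances idle at a base.
        return sum(1 for a in state["ambulances"]
                   if a["status"] == "idle at base")

    def fraction_of_horizon_left(state):
        # Hypothetical feature: remaining fraction of the planning horizon.
        return 1.0 - state["time"] / state["horizon"]

    BASIS_FUNCTIONS = [num_idle_ambulances, fraction_of_horizon_left]

    def value_approximation(state, r):
        """Evaluate J(state, r) = sum_p r[p] * phi_p(state)."""
        return sum(r_p * phi(state) for r_p, phi in zip(r, BASIS_FUNCTIONS))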
The second stream of literature that is related to our work is the literature on ambulance redeployment. One class of models involves solving integer programs in real time whenever an ambulance redeployment decision needs to be made; see Kolesar and (1974), Gendreau, Laporte, and Semet (2001), Brotcorne, Laporte, and Semet (2003), Gendreau, Laporte, and Semet (2006), Nair and -Hooks (2006), s (2007). The objective function in these integer programs generally involves a combination of backup coverage for future calls and relocation cost of ambulances. These models are usually computationally intensive, since they require solving an optimization problem every time a decision is made. As a result, a parallel computing environment is often (but not always) used to implement a working real-time system. A second class of models is based on solving integer programs in a preparatory phase. This approach provides a lookup table describing, for each number of available ambulances, where those ambulances should be deployed. Dispatchers attempt to dispatch so as to keep the ambulance configuration close to the one suggested by the lookup table; see Ingolfsson (2006), Goldberg (2007). A third class of models attempts to capture the randomness in the system explicitly, either through a dynamic programming formulation or through heuristic approaches. Berman (1981a), Berman (1981b) and Berman (1981c) are the first papers that provide a dynamic programming approach for the ambulance redeployment problem. However, these papers follow an exact dynamic programming formulation and, as is often the case, this formulation is tractable only for oversimplified versions of the problem with few vehicles and small transportation networks. Andersson (2005) and Andersson and Värbrand (2007) make the ambulance redeployment decision by using a "preparedness" function that essentially measures the capability of a certain ambulance configuration to cover future calls. The preparedness function is similar in spirit to the value function in a dynamic program, measuring the impact of current decisions on the future evolution of the system. However, the way the preparedness function is constructed is heuristic in nature.

When compared with the three classes of models described above, our approach provides a number of advantages. In contrast to the models that are based on integer programs, our approach captures the random evolution of the system over time, since it is based on a dynamic programming formulation of the ambulance redeployment problem. Furthermore, the decisions made by our approach in real time can be computed very quickly, as this requires solving only a simple optimization problem that minimizes the sum of the immediate cost and the value function approximation. In lookup table approaches, there may be more than one way to redeploy the ambulances so that the ambulance configuration over the transportation network matches the configuration suggested by the lookup table. Therefore, lookup table approaches still leave some aspects of dispatch decisions to subjective interpretation by dispatchers. Our approach, on the other hand, can fully automate the decision-making process. In traditional dynamic programming approaches, one is usually limited to very small problem instances, whereas ADP can be used on problem instances with realistic dimensions. Our approach allows working with a variety of objective functions, such as the number of calls that are not served within a threshold time standard or the total response time for the calls. Furthermore, our approach allows the possibility of constraining the frequency and destinations of ambulance relocations.
This is important, since a relocation scheme should balance improvements in service levels with the additional demands imposed on ambulance crews.

In summary, we make the following research contributions. 1) We develop a tractable ADP approach for the ambulance redeployment problem. Our approach employs value function approximations of the form Σ_{p=1}^P r_p φ_p(·) and uses sampled cost trajectories of the system to tune the parameters {r_p : p = 1, . . . , P}. Since it is based on the dynamic programming formulation of the problem, our approach is able to capture the random evolution of the system over time. 2) We develop a set of basis functions {φ_p(·) : p = 1, . . . , P} that yield good value function approximations for the ambulance redeployment problem. This opens up the possibility of using other ADP approaches, such as temporal difference learning and the linear-programming approach. 3) We provide computational experiments on EMS systems in two metropolitan areas. Our results indicate that ADP has the potential to obtain high-quality redeployment policies in real systems. They also show that our approach compares favorably with benchmark policies that are similar to those used in practice.

The remainder of this paper is organized as follows. In Section 2, we present a dynamic programming formulation for the ambulance redeployment problem. In Section 3, we describe our ADP approach. In Section 4, we discuss the basis functions that we use in our value function approximations. In Sections 5 and 6, we report computational results for two metropolitan areas. We conclude in Section 7 and point out some directions for further research.

2. Ambulance Redeployment as a Markov Decision Process

In this section, we present a dynamic programming formulation of the ambulance redeployment problem. As will shortly be clear, our model involves an uncountable state space. For an excellent account of the basic terminology, notation and fundamental results regarding dynamic programming in uncountable state spaces, we refer the reader to Bertsekas and Shreve (1978).

2.1. State Space

Two main components in the state of an EMS system at a given time are the vectors A = (a_1, . . . , a_N) and C = (c_1, . . . , c_M) containing information about the state of each ambulance and each waiting call in the system. To simplify the presentation, we assume that we do not keep more than M waiting calls in the system, possibly by diverting them to another EMS system. This is not really a restriction from a practical perspective, since M can be quite large. The state of each ambulance is given by a tuple a = (σ_a, o_a, d_a, t_a), where σ_a is the status of the ambulance, o_a and d_a are respectively the origin and destination locations of the ambulance, and t_a is the starting time of the ambulance movement. To serve a call, an ambulance first moves to the call scene and provides service at the scene for a certain amount of time. Following this, the ambulance possibly transports the patient to a hospital, and after spending some time at the hospital, the ambulance becomes free to serve another call.
Therefore, the status σ_a of an ambulance can be "off shift," "idle at base," "going to call," "at call scene," "going to hospital," "at hospital" or "returning to base." If the ambulance is stationary at location o, then we have o_a = d_a = o. If the ambulance is in motion, then t_a corresponds to the starting time of this movement. Otherwise, t_a corresponds to the starting time of the current phase in the service cycle. For example, if the status of the ambulance is "at hospital," then t_a corresponds to the time at which the ambulance arrived at the hospital. This time is kept in the state of an ambulance to be able to give a Markov formulation for the non-Markovian elements in the system, such as nonexponentially distributed service times and deterministic travel times. Similarly, for a call, we have c = (σ_c, o_c, t_c, p_c), where σ_c is the status of the call, o_c is the location of the call, t_c is the time at which the call arrived into the system and p_c is the priority level of the call. The status of a call σ_c can be "assigned to an ambulance" or "queued for service."

We model the dynamics of the system as an event-driven process. Events are triggered by changes in the status of the ambulances or by call arrivals. Therefore, the possible event types in the system are "call arrives," "ambulance goes off shift," "ambulance comes on shift," "ambulance departs for call scene," "ambulance arrives at call scene," "ambulance leaves call scene," "ambulance arrives at hospital," "ambulance leaves hospital" and "ambulance arrives at base." We assume that we can make decisions (for any ambulance, and not just the one involved in the event) only at the times of these events. This comes at the cost of some loss of optimality, since making decisions between the times of the events may improve performance. From a practical perspective, however, events are frequent enough to allow ample decision-making opportunities. By restricting our attention to the times of events, we visualize the system as jumping from one event time to another. Therefore, we can use the tuple s = (τ, e, A, C) to represent the state of the system, where τ corresponds to the current time, e corresponds to the current event type, and A and C respectively correspond to the states of the ambulances and the calls. In this case, the state trajectory of the system can be written as {s_k : k = 1, 2, . . .}, where s_k is the state of the system just after the kth event occurs. Note that the time is rolled into our state variable. Throughout the paper, we use τ(s) and e(s) to respectively denote the time and the event type when the state of the system is s. In other words, τ(s) and e(s) are the first two components of s = (τ, e, A, C).
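As a rough illustration of Section 2.1, the Python sketch below mirrors the tuples a = (σ_a, o_a, d_a, t_a), c = (σ_c, o_c, t_c, p_c) and s = (τ, e, A, C) as dataclasses. The field names and status strings are assumptions made for illustration, not the paper's implementation.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    Location = Tuple[float, float]   # (x, y) coordinates in the region R

    @dataclass
    class Ambulance:                 # a = (sigma_a, o_a, d_a, t_a)
        status: str                  # e.g. "idle at base", "going to call"
        origin: Location             # o_a: origin of the current movement
        destination: Location        # d_a: destination; equals origin when stationary
        start_time: float            # t_a: start of the current phase in the cycle

    @dataclass
    class Call:                      # c = (sigma_c, o_c, t_c, p_c)
        status: str                  # "assigned to an ambulance" or "queued for service"
        location: Location           # o_c
        arrival_time: float          # t_c
        priority: int                # p_c

    @dataclass
    class State:                     # s = (tau, e, A, C)
        time: float                  # tau(s): current time
        event: str                   # e(s): current event type, e.g. "call arrives"
        ambulances: List[Ambulance] = field(default_factory=list)
        calls: List[Call] = field(default_factory=list)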
2.2. Controls

We assume that calls are served in decreasing order of priority, and that calls within a given priority level are served in first-in-first-out order. We further assume that the closest available ambulance is dispatched to a call. This is not an exact representation of reality, but it is close enough for our purposes: We will show that ADP yields an effective redeployment strategy under this dispatch policy, and we expect that it will continue to do so for more realistic dispatch policies.

Let R(s) denote the set of ambulances that are available for redeployment when the state of the system is s. Let u_ab(s) = 1 if we redeploy ambulance a to base b when the state of the system is s, and 0 otherwise. Letting B be the set of ambulance bases, we can capture the potential reallocation decisions in the binary matrices u(s) = {u_ab(s) : a ∈ R(s), b ∈ B}. The set of feasible decisions is then

U(s) = { u(s) ∈ {0, 1}^(|R(s)| × |B|) : Σ_{b ∈ B} u_ab(s) ≤ 1 for all a ∈ R(s) }.

The constraints in this definition simply state that each ambulance that is considered for redeployment can be redeployed to at most one base. An important assumption is that the cardinality of U(s) is small, so that an optimization problem over this feasible set can be solved by enumerating over all feasible solutions, as in the sketch below. We can use different definitions of R(s) to permit different amounts of relocation. For example, all available ambulances are considered for relocation if R(s) denotes the set of ambulances that are either idle at base or returning to base. But this may lead to impractically many relocations, so we can restrict R(s), for example to ∅ unless the event e(s) corresponds to an ambulance becoming free after serving a call, in which case we can take R(s) equal to the singleton set corresponding to the newly freed ambulance. Here the ambulance crews are "on the road" when considered for relocation, so there is no additional disruption to the crews relative to the standard "return to base" policy.
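Because each ambulance in R(s) goes to at most one base, U(s) can be enumerated as all assignments of each redeployable ambulance to a base or to "stay put." The sketch below is one hypothetical way to do this; as the lead comment notes, it is only practical when |R(s)| is kept small, which is exactly the paper's assumption.

    from itertools import product

    def feasible_decisions(redeployable, bases):
        """Enumerate U(s): each ambulance in R(s) goes to at most one base.

        Each decision is a dict {ambulance: base or None}, where None means
        the ambulance is not redeployed (the all-zero row of the 0/1
        matrix u(s))."""
        options = [None] + list(bases)      # at most one base per ambulance
        for choice in product(options, repeat=len(redeployable)):
            yield dict(zip(redeployable, choice))

    # With |R(s)| ambulances and |B| bases there are (|B| + 1)^|R(s)|
    # decisions, so enumeration is practical only when R(s) is small.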
2.3. Fundamental Dynamics

Call arrivals are generated across the region R ⊂ R² according to a Poisson point process with a known arrival intensity C = {C(t, x, y) : t ≥ 0, (x, y) ∈ R}. As mentioned above, we have a fixed policy for serving calls, but our general approach does not depend on the particular form of this policy. If there are no available ambulances to serve a call, then the call is placed into a waiting queue. An ambulance serving a call proceeds through a sequence of events, including arriving at the scene, treating the patient, transporting and handing over the patient at the hospital. The main source of uncertainty in this call service cycle is the times spent between events. We assume that probability distributions for all of the event durations are known. Besides these sources of randomness, the major ingredient governing the dynamics is the decision-making of dispatchers. As a result, the complete trajectory of the system is given by {(s_k, u_k) : k = 1, 2, . . .}, where s_k is the state of the system at the time of the kth event and u_k is the decision (if any) made by the dispatcher when the state of the system is s_k. We capture the dynamics of the system symbolically by

s_{k+1} = f(s_k, u_k, X_k(s_k, u_k)),

where X_k(s_k, u_k) is a random element of an appropriate space encapsulating all the sources of randomness described above. We do not attempt to give a more explicit description of the transition function f(·, ·, ·), since this does not add anything of value to our treatment. It suffices to point out that the preceding discussion, and the fact that we can simulate the evolution of the system over time, imply that an appropriate f(·, ·, ·) exists.

2.4. Transition Costs

Along with a transition from state s_k to s_{k+1} through decision u_k, we incur a cost g(s_k, u_k, s_{k+1}). In our particular implementation, we use the cost function

g(s_k, u_k, s_{k+1}) = 1 if the event e(s_{k+1}) corresponds to an ambulance arrival at a call scene, the call in question is urgent and the response time exceeds Δ; 0 otherwise. (1)

Here Δ is a fixed threshold response time that is on the order of 10 minutes. Therefore, the cost function (1) allows us to count the number of high-priority calls whose response times exceed a threshold response time. We are interested in the performance of the system over the finite planning horizon [0, T], so we let g(s, ·, ·) = 0 whenever τ(s) > T. In our implementation, T corresponds to two weeks. We have chosen this particular cost function because of its simplicity, and also because most performance benchmarks in the EMS industry are formulated in terms of the percentage of calls that are reached within a time standard. Our approach allows other cost functions without affecting the general treatment.

2.5. Objective Function and Optimality Equation

A policy is a mapping from the state space to the action space, prescribing which action to take for each possible state of the system. Throughout the paper, we use μ(s) to denote the action prescribed by policy μ when the state of the system is s. If we follow policy μ, then the state trajectory of the system {s_k^μ : k = 1, 2, . . .} evolves according to s_{k+1}^μ = f(s_k^μ, μ(s_k^μ), X_k(s_k^μ, μ(s_k^μ))), and the discounted total expected cost incurred by starting from initial state s is given by

J_μ(s) = E[ Σ_{k=1}^∞ α^{τ(s_k^μ)} g(s_k^μ, μ(s_k^μ), s_{k+1}^μ) | s_1^μ = s ],

where α ∈ [0, 1) is a fixed discount factor. The expectation in the expression above involves the random variables {s_k^μ : k = 1, 2, . . .}, and τ(s_k^μ) is the time at which the system visits the kth state. It is well known that the policy μ* that minimizes the discounted total expected cost can be found by computing the value function through the optimality equation

J(s) = min_{u ∈ U(s)} E[ g(s, u, f(s, u, X(s, u))) + α^{τ(f(s, u, X(s, u))) − τ(s)} J(f(s, u, X(s, u))) ]   (2)

and letting μ*(s) be the minimizer of the right-hand side above; see Bertsekas and Shreve (1978). The difficulty with the optimality equation (2) is that the state variable takes uncountably many values. Even if we are willing to discretize the state variable to obtain a countable state space, the state variable is still a high-dimensional vector, and solving the optimality equation (2) through classical dynamic-programming approaches is computationally very difficult. In the next two sections, we propose a method to construct tractable approximations to the value function.
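The sketch below shows one way the cost (1) and the discounted objective J_μ could be accumulated along a single simulated trajectory. The serviced_call bookkeeping field and the HIGH priority code are assumptions, not the paper's data model.

    HIGH = 1  # assumed code for high-priority (urgent) calls

    def transition_cost(next_state, threshold):
        """Cost g of (1): one unit when an ambulance reaches an urgent
        call later than the response-time threshold Delta."""
        if next_state.event != "ambulance arrives at call scene":
            return 0.0
        call = next_state.serviced_call          # hypothetical bookkeeping field
        late = next_state.time - call.arrival_time > threshold
        return 1.0 if call.priority == HIGH and late else 0.0

    def discounted_trajectory_cost(states, alpha, threshold):
        """Accumulate sum_k alpha^tau(s_k) * g(s_k, u_k, s_{k+1}) over one run."""
        total = 0.0
        for s, s_next in zip(states, states[1:]):
            total += alpha ** s.time * transition_cost(s_next, threshold)
        return total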
The discounting in (2) may seem odd, as it implies that we are minimizing the expected discounted number of calls that are not reached within the threshold time. This is purely a computational device, in the sense that the discount factor is very helpful in stabilizing the ADP approach that we describe in the next two sections. The key issue is that the effect of relocation decisions can be small relative to the simulation noise, and discounting appears to mitigate this to some extent. This observation is in agreement with empirical observations from the case studies in Chapter 8 of Bertsekas and Tsitsiklis (1996). The increase in stability is also supported by the theoretical results in Chapter 6.2 of Bertsekas and Tsitsiklis (1996). When presenting our computational results in Sections 5 and 6, we report the undiscounted number of calls that are not reached within the threshold response time. These results indicate that although we construct the value function approximations with a view towards minimizing the expected discounted cost, the same value function approximations provide very good performance in minimizing the expected undiscounted cost.

3. Approximate Dynamic Programming

The ADP approach that we use to construct approximations to the value function is closely related to the traditional policy iteration algorithm in the Markov decision processes literature. We begin with a brief description of the policy iteration algorithm. Throughout the rest of the paper, the greedy policy induced by an arbitrary function Ĵ(·) refers to the policy that takes a decision in the set

argmin_{u ∈ U(s)} E[ g(s, u, f(s, u, X(s, u))) + α^{τ(f(s, u, X(s, u))) − τ(s)} Ĵ(f(s, u, X(s, u))) ]   (3)

whenever the state of the system is s.

Policy Iteration
Step 1. Initialize the iteration counter n to 1 and initialize J_1(·) arbitrarily.
Step 2. (Policy improvement) Let μ_n be the greedy policy induced by J_n(·).
Step 3. (Policy evaluation) Let J_{n+1}(·) = J_{μ_n}(·), where J_{μ_n}(s) denotes the expected discounted cost incurred when starting from state s and using policy μ_n.
Step 4. Increase n by 1 and go to Step 2.

In our setting the cost function g(·, ·, ·) is nonnegative and the set of feasible decisions U(s) is finite, so Proposition 9.17 in Bertsekas and Shreve (1978) applies, and hence J_n(·) converges pointwise to the solution to the optimality equation (2). The difficulty with the policy iteration algorithm is that when dealing with uncountable state spaces, it is not even feasible to store J_{μ_n}(·), let alone compute the expected discounted cost incurred by using policy μ_n. We overcome this difficulty by using value function approximations of the form

J(s, r) = Σ_{p=1}^P r_p φ_p(s).

In the expression above, r = {r_p : p = 1, . . . , P} are tuneable parameters and {φ_p(·) : p = 1, . . . , P} are fixed basis functions. The challenge is to construct the basis functions and tune the parameters so that J(·, r) is a good approximation to the solution to the optimality equation (2). To achieve this, each basis function φ_p(·) should capture some essential information about the solution to the optimality equation. In Section 4, we describe one set of basis functions that work well. Once a good set of basis functions is available, we can use the following approximate version of the policy iteration algorithm to tune the parameters {r_p : p = 1, . . . , P}.
Approximate Policy Iteration
Step 1. Initialize the iteration counter n to 1 and initialize r^1 = {r_p^1 : p = 1, . . . , P} arbitrarily.
Step 2. (Policy improvement) Let μ_n be the greedy policy induced by J(·, r^n).
Step 3. (Policy evaluation through simulation) Simulate the trajectory of policy μ_n over the planning horizon [0, T] for Q sample paths. Let {s_k^n(q) : k = 1, . . . , K(q)} be the state trajectory of policy μ_n in sample path q, and G_k^n(q) be the discounted cost incurred by starting from state s_k^n(q) and following policy μ_n in sample path q.
Step 4. (Projection) Let

r^{n+1} = argmin_{r ∈ R^P} Σ_{q=1}^Q Σ_{k=1}^{K(q)} [ G_k^n(q) − J(s_k^n(q), r) ]².

Step 5. Increase n by 1 and go to Step 2.

In Step 3 of approximate policy iteration we use simulation to evaluate the expected discounted cost incurred by policy μ_n. Therefore, {G_k^n(q) : k = 1, . . . , K(q), q = 1, . . . , Q} are the sampled cost trajectories of the system under policy μ_n. In Step 4, we tune the parameters so that the value function approximation J(·, r) provides a good fit to the sampled cost trajectories.

There is still one computational difficulty in the approximate policy iteration algorithm. When simulating the trajectory of policy μ_n in Step 3, we need to solve an optimization problem of the form (3) to find the action taken by the greedy policy induced by J(·, r^n). This optimization problem involves an expectation that is difficult to compute. We resort to Monte Carlo simulation to overcome this difficulty. In particular, if the state of the system is s and we want to find the action taken by the greedy policy induced by J(·, r^n) in this state, then we enumerate over all decisions in the feasible set U(s). Starting from state s and taking decision u, we simulate the trajectory of the system until the next event, and this provides a sample of f(s, u, X(s, u)). By using multiple samples, we estimate the expectation

E[ g(s, u, f(s, u, X(s, u))) + α^{τ(f(s, u, X(s, u))) − τ(s)} J(f(s, u, X(s, u)), r^n) ]

through a sample average. Once we estimate the expectation above for all u ∈ U(s), we choose the decision that yields the smallest value and use it as the decision taken by the greedy policy induced by J(·, r^n) when the state of the system is s. This approach is naturally subject to sampling error, but it provides good performance in practice.

Proposition 6.2 in Bertsekas and Tsitsiklis (1996) provides a performance guarantee for the approximate policy iteration algorithm. (The result is easily extended from finite to infinite state spaces.) This is an encouraging result, as it provides theoretical support for the approximate policy iteration algorithm, but its conditions are difficult to verify in practice. In particular, the result assumes that we precisely know the error induced by using regression to estimate the discounted total expected cost of a policy, and it assumes that expectations are computed exactly rather than via sampling as in our case. For this reason, we do not go into the details of this result and refer the reader to Bertsekas and Tsitsiklis (1996) for further details.
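A minimal sketch of the shape of this loop follows, with the projection of Step 4 solved as an ordinary least-squares problem via numpy. The simulator hook simulate_greedy_policy is an assumption: it stands in for Steps 2-3, returning, for one sample path, the basis-function vectors φ(s_k) of the visited states and the observed discounted costs-to-go G_k.

    import numpy as np

    def approximate_policy_iteration(r0, n_iters, n_paths, simulate_greedy_policy):
        """Skeleton of Steps 1-5 under the assumed simulator interface:
        simulate_greedy_policy(r, q) -> (list of feature vectors phi(s_k),
        list of discounted costs-to-go G_k) for sample path q."""
        r = np.asarray(r0, dtype=float)
        for n in range(n_iters):
            features, targets = [], []
            # Step 3: simulate the greedy policy induced by J(., r) on Q paths.
            for q in range(n_paths):
                phis, costs_to_go = simulate_greedy_policy(r, q)
                features.extend(phis)
                targets.extend(costs_to_go)
            # Step 4: projection -- least-squares fit of J(., r) to the
            # sampled cost trajectories.
            Phi = np.array(features)      # one row per visited state
            G = np.array(targets)
            r, *_ = np.linalg.lstsq(Phi, G, rcond=None)
        return r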
4. Basis Functions

In this section, we describe the basis functions {φ_p(·) : p = 1, . . . , P} that we use in our value function approximations. We use P = 6 basis functions, some of which are based on the queueing insights developed in Restrepo, , and Topaloglu (2007).

4.1. Baseline

The first basis function is of the form φ_1(s) = 1 − α^{T − τ(s)}. The role of this basis function is to give a very rough estimate of the discounted number of missed calls between the time associated with state s and the end of the planning horizon. In particular, if we assume that we miss a call every θ time units, then the discounted number of missed calls over the interval [τ(s), T] is

1 + α^θ + α^{2θ} + . . . + α^{⌊(T − τ(s))/θ⌋ θ} = (1 − α^{(⌊(T − τ(s))/θ⌋ + 1) θ}) / (1 − α^θ) ≈ (1 − α^{T − τ(s)}) / (1 − α^θ).

Since the denominator does not depend on the state, φ_1(s) captures the dependence of this quantity on s.

4.2. Unreachable Calls

The second basis function computes the number of waiting calls for which an ambulance assignment has been made, but the ambulance will not reach the call within the threshold response time. This is easily computed in our setting, where travel times are deterministic. In the case of stochastic travel times, we could instead use the probability that the ambulance will reach the call within the threshold response time. We do not give a precise expression for this basis function, since the precise expression is cumbersome yet straightforward to define.

4.3. Uncovered Call Rate

The third basis function captures the rate of call arrivals that cannot be reached on time by any available ambulance. To define this precisely, we need some additional notation. Recall that calls arrive across the region R ⊂ R² according to a Poisson point process with arrival intensity C = {C(t, x, y) : t ≥ 0, (x, y) ∈ R}. Partition the region R into L subregions {ρ_l : l = 1, . . . , L}, and associate a representative point or "center of mass" (x_l, y_l) with each subregion l = 1, . . . , L. The coverage of subregion l is the number of available ambulances that can reach the center of mass (x_l, y_l) within the threshold time standard. Let d(t, (x_1, y_1), (x_2, y_2)) be the travel time between points (x_1, y_1) and (x_2, y_2) when travel starts at time t. Then, letting 1(·) be the indicator function, the coverage of subregion l can be written as a function of the state of the system as

N_l(s) = Σ_{a ∈ A(s)} 1( d(τ(s), (x_a(s), y_a(s)), (x_l, y_l)) ≤ Δ ).

Here, A(s) is the set of available ambulances and (x_a(s), y_a(s)) is the location of ambulance a at time τ(s) when the system is in state s. The rate of call arrivals at time t in subregion l is

C_l(t) = ∫_{ρ_l} C(t, x, y) dx dy.

Therefore, when the state of the system is s, we can compute the rate of call arrivals that are not covered by any available ambulance by

φ_3(s) = Σ_{l=1}^L C_l(τ(s)) 1(N_l(s) = 0).
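A near-direct transcription of N_l(s) and φ_3 follows. The hooks position(a, t), travel_time(t, p, q) and rates[l](t), as well as the set of statuses counted as available, are assumptions standing in for the paper's deterministic travel-time model and arrival intensities.

    AVAILABLE_STATUSES = {"idle at base", "returning to base"}  # assumption

    def coverage(state, centers, position, travel_time, threshold):
        """N_l(s): number of available ambulances that can reach the center
        (x_l, y_l) of each subregion l within the threshold Delta."""
        avail = [a for a in state.ambulances if a.status in AVAILABLE_STATUSES]
        return [sum(1 for a in avail
                    if travel_time(state.time, position(a, state.time), c) <= threshold)
                for c in centers]

    def uncovered_call_rate(state, centers, rates, position, travel_time, threshold):
        """phi_3(s) = sum_l C_l(tau(s)) * 1(N_l(s) = 0)."""
        n = coverage(state, centers, position, travel_time, threshold)
        return sum(rates[l](state.time) for l in range(len(centers)) if n[l] == 0)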
4.4. Missed Call Rate

The previous two basis functions capture calls already received that we know we cannot reach on time, and the rate of arriving calls that cannot be reached on time because they are too far from any available ambulance. We could also fail to reach a call on time due to queueing effects from ambulances being busy with other calls. The fourth basis function represents an attempt to capture this effect. This basis function is of the form

φ_4(s) = Σ_{l=1}^L C_l(τ(s)) P_l(s),

where P_l(s) is the probability that all ambulances that could reach a call in subregion l on time are busy with other calls. We estimate {P_l(s) : l = 1, . . . , L} by treating the call service processes in different subregions as Erlang-loss systems. In Erlang-loss systems, calls arriving when all servers are busy are lost. In reality, such calls are queued and served as ambulances become free, but the time threshold is almost always missed for such calls, so counting them as lost seems reasonable. The issue that such calls impose some load on the true system but are discarded in an Erlang-loss system creates a slight mismatch, but our computational results show that this function is still highly effective as a basis function.

The Erlang-loss probability L(λ, μ, c) for a system with arrival rate λ, service rate μ and c servers is

L(λ, μ, c) = ( (λ/μ)^c / c! ) / ( Σ_{i=0}^c (λ/μ)^i / i! ).

To characterize the Erlang-loss system for subregion l, given that the state of the system is s, we need to specify the number of servers, and the arrival and service rates. Let N_l(s) be the set of available ambulances that can serve a call in subregion l within the threshold response time. Then

N_l(s) = { a ∈ A(s) : d(τ(s), (x_a(s), y_a(s)), (x_l, y_l)) ≤ Δ }.

We use c = |N_l(s)| as the number of servers in the Erlang-loss system for subregion l. Let μ_l(t) be the service rate in the loss system, i.e., the rate at which an ambulance can serve a call at time t in subregion l. It is difficult to come up with a precise value for μ_l(t). It primarily depends on the time spent at the scene of a call and any transfer time at a hospital, since travel times are usually small relative to these quantities. In our practical implementation, we use historical data to estimate the time spent at the call scenes and the hospitals and add a small padding factor to capture travel times. Finally, let Λ_l(s) be the rate of call arrivals that should be served by ambulances in the set N_l(s). Coming up with a value for Λ_l(s) is even more difficult than devising a value for μ_l(t). One option is to let Λ_l(s) = C_l(τ(s)), which is the rate of call arrivals at time τ(s) in subregion l. However, ambulances in the set N_l(s) serve not only calls in subregion l, but also calls in other subregions. To attempt to capture this, let

Λ_l(s) = Σ_{a ∈ N_l(s)} Σ_{i=1}^L C_i(τ(s)) 1( d(τ(s), (x_a(s), y_a(s)), (x_i, y_i)) ≤ Δ ),   (4)

so that Λ_l(s) reflects the total call arrival rate in subregions that are close to any of the ambulances in the set N_l(s). We then use the approximation

P_l(s) ≈ L(Λ_l(s), μ_l(τ(s)), |N_l(s)|).   (5)

There are at least two shortcomings in the estimate for Λ_l(s) in (4) and the approximation to P_l(s) in (5). First, there is double counting in the estimate of Λ_l(s). In particular, if two ambulances a_1, a_2 ∈ N_l(s) can both reach subregion ℓ within the time threshold, then the summation for Λ_l(s) counts C_ℓ(τ(s)) twice. Second, C_ℓ(τ(s)) could be counted in the demand rates for multiple subregions. To be more precise, if there are three subregions l_1, l_2, ℓ and two ambulances a_1 ∈ N_{l_1}(s), a_2 ∈ N_{l_2}(s) such that both a_1 and a_2 can reach subregion ℓ within the time threshold, then the summations for Λ_{l_1}(s) and Λ_{l_2}(s) both count C_ℓ(τ(s)). Therefore, we typically have Σ_{l=1}^L Λ_l(s) > Σ_{l=1}^L C_l(τ(s)). To overcome this problem, we dilute the call arrival rates by a factor κ. In particular, we use the call arrival rate κ C_ℓ(τ(s)) in (4) instead of C_ℓ(τ(s)), so that we may roughly have Σ_{l=1}^L Λ_l(s) = Σ_{l=1}^L C_l(τ(s)). We find a good value for κ through preliminary experimentation. Interestingly, as we will see in our computational results, it is not necessarily the case that the best choice of κ lies in (0, 1). We emphasize that κ is the only tuneable parameter in our ADP approach that requires experimentation.
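The Erlang-loss probability has a standard recursive form that avoids the large factorials in the closed-form expression above; the sketch below computes L(λ, μ, c) that way and then applies (5). The recursion is mathematically equivalent to the closed form; the convention that P_l(s) = 1 when no ambulance covers the subregion is our own assumption.

    def erlang_loss(lam, mu, c):
        """Erlang-loss probability L(lam, mu, c) = (a^c/c!) / sum_i a^i/i!
        with offered load a = lam/mu, via the standard stable recursion
        B(0) = 1, B(i) = a*B(i-1) / (i + a*B(i-1))."""
        a = lam / mu
        b = 1.0
        for i in range(1, c + 1):
            b = a * b / (i + a * b)
        return b

    def missed_probability(Lambda_l, mu_l, n_l):
        """P_l(s) of (5): Lambda_l is the (diluted) rate of (4), mu_l the
        service rate, n_l = |N_l(s)| the number of covering ambulances.
        With no covering ambulance we take the probability to be 1."""
        return erlang_loss(Lambda_l, mu_l, n_l) if n_l > 0 else 1.0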
4.5. Future Uncovered Call Rate

In certain settings, we may not be willing to redeploy ambulances that are already in transit to a particular location. In this case, from the perspective of covering future calls, the ambulance destinations are as important as their current locations. This is the motivation underlying the fifth and sixth basis functions. Our fifth basis function parallels the third one, replacing the true locations of ambulances by their destinations. The definition of this basis function is therefore identical to that of φ_3, but the configuration of the ambulances that we use to compute N_l(s) is not the current one, but rather an estimated future configuration, obtained by letting all ambulances in transit reach their destinations and all stationary ambulances remain at their current locations. To be precise, given that the system is in state s = (τ, e, A, C) with A = (a_1, . . . , a_N) and a_i = (σ_{a_i}, o_{a_i}, d_{a_i}, t_{a_i}), we define a new state

s̃(s) = (τ + 1/Σ_{l=1}^L C_l(τ), e, Ã, C) with Ã = (ã_1, . . . , ã_N) and ã_i = (σ̃_{a_i}, d_{a_i}, d_{a_i}, τ + 1/Σ_{l=1}^L C_l(τ)),

where σ̃_{a_i} is the status of ambulance a_i when it reaches its destination d_{a_i}. In this case, we can write the fifth basis function as

φ_5(s) = Σ_{l=1}^L C_l(τ(s̃(s))) 1(N_l(s̃(s)) = 0).

The time τ(s̃(s)) = τ + 1/Σ_{l=1}^L C_l(τ) is used as an approximation to the expected time of the next call arrival. The next call may arrive before or after the ambulances actually reach their destinations, but we use the time τ(s̃(s)) heuristically, simply to look into the future. The idea is that the estimated future configuration of the ambulances Ã is more likely to hold at the future time τ(s̃(s)) than at the current time τ(s).

4.6. Future Missed Call Rate

The sixth basis function, φ_6, parallels φ_4 in that it captures the rate of calls that will likely be lost due to queueing congestion. As with the fifth basis function, it uses an estimated future configuration of the ambulances. It is defined as

φ_6(s) = Σ_{l=1}^L C_l(τ(s̃(s))) P_l(s̃(s)).
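A sketch of the estimated future configuration used by φ_5 and φ_6: every ambulance jumps to its destination and the clock advances by 1/Σ_l C_l(τ), the rough estimate of the next call arrival time. The helper status_on_arrival, which maps an in-transit status to the status on arrival (e.g., "going to hospital" to "at hospital"), is an assumption.

    import copy

    def estimated_future_state(state, total_rate, status_on_arrival):
        """Build s~(s): advance the time by 1/total_rate and let every
        ambulance reach its destination; stationary ambulances stay put
        since their origin already equals their destination."""
        future = copy.deepcopy(state)
        future.time = state.time + 1.0 / total_rate
        for a in future.ambulances:
            a.status = status_on_arrival(a)   # assumed helper
            a.origin = a.destination          # stationary at destination: o_a = d_a
            a.start_time = future.time
        return future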
5. Computational Results on Edmonton

In this section, we present computational results for the EMS system in the city of Edmonton, Alberta, Canada. We begin with a description of the data set along with our assumptions.

5.1. Experimental Setup

Our data are based on the data set used in the computational experiments in Ingolfsson, Erkut, and Budge (2003). The city of Edmonton has a population of 730,000 and covers an area of around 40 × 30 km². The EMS system includes 16 ambulances, 11 bases and 5 hospitals. We assume for simplicity that all ambulances are of the same type and operate all day. An ambulance, upon arriving at a call scene, treats the patient for an exponentially distributed amount of time with mean 12 minutes. After treating the patient at the call scene, the ambulance transports the patient to a hospital with probability 0.75. The probability distribution for the hospital chosen is inferred from historical data. The time an ambulance spends at the hospital has a Weibull distribution with mean 30 minutes and standard deviation 13 minutes. The turn-out time for the ambulance crews is 45 seconds. In other words, if the ambulance crew is located at a base when it is notified of a call, then it takes 45 seconds to get ready. An ambulance crew already on the road does not incur the turn-out time. The road network that we use in our computational experiments models the actual road network at the avenue level. There are 252 nodes and 934 arcs in this road network. The travel times are deterministic and do not depend on the time of the day.

There was not enough historical data to develop a detailed model of the call arrivals, so we proceeded with a revealing, but not necessarily realistic, model. The model maintains a constant overall arrival rate, but the distribution of the location of calls changes in time. We divided the city of Edmonton into 20 × 17 subregions and assumed that the rate of call arrivals in subregion l at time t is given by

C_l(t) = C [ γ_l + δ_l sin(2πt/24) ],

where t is measured in hours. In the expression above, C, γ_l and δ_l are fixed parameters that satisfy C ≥ 0, γ_l ≥ |δ_l|, Σ_{l=1}^{340} γ_l = 1 and Σ_{l=1}^{340} δ_l = 0. We have Σ_{l=1}^{340} γ_l + Σ_{l=1}^{340} δ_l sin(2πt/24) = 1, so that we can interpret C as the total call arrival rate into the system and γ_l + δ_l sin(2πt/24) as the probability that a call arriving at time t falls in subregion l. If δ_l > 0, then the peak call arrival rate in subregion l occurs at hours {6, 30, 54, . . .}, whereas if δ_l < 0, then the peak call arrival rate in subregion l occurs at hours {18, 42, 66, . . .}. The average call arrival rate over a day in subregion l is C γ_l. We estimated C and γ_l using historical data, and C ranged from 3 to 7 calls per hour. We chose appropriate values of δ_l so that we have higher call arrival rates in the business subregions early in the day and higher call arrival rates in the residential subregions later in the day.

We implemented ADP on top of the BartSim simulation engine developed by and Mason (2004). In all of our computational experiments, we use a discount factor of α = 0.85 per day. The simulation horizon is 14 days. We use 30 sample paths in Step 3 of the approximate policy iteration algorithm. After some experimentation, we found that κ = 0.1 in the fourth and sixth basis functions gave the best results. We use the static redeployment policy as a benchmark. In particular, the static redeployment policy preassigns a base to each ambulance and redeploys each ambulance back to its preassigned base whenever it becomes free after serving a call. We found a good static redeployment policy by simulating the performance of the system under a large number of possible base assignments and choosing the base assignment that gave the best performance.
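A direct transcription of this arrival model, with a check of the stated normalization conditions; all parameter values would be estimated from data, so none are supplied here.

    import math

    def arrival_rate(l, t, C, gamma, delta):
        """C_l(t) = C * (gamma_l + delta_l * sin(2*pi*t/24)), t in hours."""
        return C * (gamma[l] + delta[l] * math.sin(2.0 * math.pi * t / 24.0))

    def check_parameters(C, gamma, delta, tol=1e-9):
        """Verify C >= 0, gamma_l >= |delta_l|, sum_l gamma_l = 1 and
        sum_l delta_l = 0, so the total arrival rate is C and
        C_l(t)/C is a probability distribution over subregions."""
        assert C >= 0
        assert all(g >= abs(d) for g, d in zip(gamma, delta))
        assert abs(sum(gamma) - 1.0) < tol and abs(sum(delta)) < tol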
5.2. Baseline Performance

In our first set of computational experiments, we test the performance of ADP under the assumption that the only time we can redeploy an ambulance is when it becomes free after serving a call. We do not redeploy ambulances at other times. Following the discussion of Section 2.2, this is equivalent to setting R(s) = ∅, except when e(s) corresponds to an ambulance becoming free after serving a call, in which case R(s) is the singleton set corresponding to the ambulance just becoming free. The motivation for using this redeployment strategy is that it minimizes the disturbance to the crews and never redeploys an ambulance that is located at an ambulance base.

Figure 1 shows the performance of ADP. The horizontal axis gives the iteration number in the approximate policy iteration algorithm and the vertical axis gives the percentage of calls not reached within the threshold response time. The plot gives the percentage of calls missed by the greedy policy induced by the value function approximation at a particular iteration. The dashed horizontal line shows the best performance attained by ADP, while the thin horizontal line shows the performance of the best static policy. ADP attains a good policy within two iterations and then bounces around this policy. The best ADP policy misses 21.9% (± 0.2%) of the calls, whereas the best static redeployment policy misses 25.3% (± 0.2%) of the calls. (These figures are undiscounted numbers of missed calls.) The ADP approach does not converge on a single policy. This lack of convergence is not a concern from a practical viewpoint, since we can simply choose the best policy that is obtained during the course of the approximate policy iteration algorithm and implement this policy in practice.

[Figure 1: Performance of ADP and the benchmark policy. Iteration number vs. average number of lost calls (% of total calls).]

To emphasize the importance of choosing the basis functions carefully, Figure 2 shows the performance of ADP when we use an arrival rate λ = C_l(τ(s)) instead of the quantity in (4) in the fourth and sixth basis functions. The best policy obtained through ADP misses 23.6% (± 0.2%) of the calls. This is in contrast with the best policy missing 21.9% (± 0.2%) of the calls in Figure 1. Furthermore, the fluctuation in the performance of the policies in Figure 2 is much more drastic when compared with Figure 1. The results indicate that choosing the basis functions carefully makes a significant difference in the performance of ADP.

[Figure 2: Performance of ADP with poorly chosen basis functions. Iteration number vs. average number of lost calls (% of total calls).]

5.3. Comparison with Random Search

For a fixed set of basis functions {φ_p(·) : p = 1, . . . , P}, a set of values for the tuneable parameters r = {r_p : p = 1, . . . , P} characterizes a value function approximation J(·, r), and this value function approximation induces a greedy policy. Therefore, a brute-force approach for finding a good set of values for the tuneable parameters is to carry out a random search over an appropriate subset of R^P and use simulation to test the performance of the greedy policies induced by the different sets of values for the tuneable parameters. To implement this idea, we first use ADP to obtain a good set of values for the tuneable parameters.
Letting {r̂_p : p = 1, . . . , P} be this set of values, we sample r = {r_p : p = 1, . . . , P} uniformly over the box [r̂_1 − ½ r̂_1, r̂_1 + ½ r̂_1] × . . . × [r̂_P − ½ r̂_P, r̂_P + ½ r̂_P] and use simulation to test the performance of the greedy policy induced by the value function approximation J(·, r). We sampled 1,100 sets of values for the tuneable parameters, and this provides 1,100 value function approximations. Figure 3 gives a histogram of the percentage of the calls missed by the greedy policies induced by these 1,100 value function approximations. The vertical lines correspond to the percentage of calls missed by the best policy obtained by ADP and by the best static redeployment policy. The figure indicates that very few (3%) of the sampled sets of values for the tuneable parameters provide better performance than the best policy obtained by ADP. On the other hand, approximately 28% of the samples provide better performance than the best static redeployment policy. The random search procedure we use is admittedly rudimentary, and one can use more sophisticated techniques to focus on the more promising areas of the search space. Nevertheless, our results indicate that when one looks at the broad landscape of the possible values for the tuneable parameters, ADP is quite effective in identifying good parameters. Moreover, the computation time required by the random search procedure is on the order of several days, whereas the computation time required by ADP is a few hours.

[Figure 3: Performance of the 1,100 greedy policies obtained through random search. Histogram of the average percentage of missed calls, with vertical lines marking the best ADP policy and the best static policy.]
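A sketch of this brute-force benchmark: sample parameter vectors uniformly from the box around the ADP solution r̂ and keep the best by simulated performance. The hook evaluate_policy, assumed to return the simulated percentage of missed calls for the greedy policy induced by r, stands in for the full simulation run.

    import random

    def random_search(r_hat, n_samples, evaluate_policy):
        """Sample r uniformly from the box
        prod_p [r_hat_p - r_hat_p/2, r_hat_p + r_hat_p/2] and return the
        sample whose induced greedy policy misses the fewest calls."""
        best_r, best_score = None, float("inf")
        for _ in range(n_samples):
            r = [rp + random.uniform(-0.5 * abs(rp), 0.5 * abs(rp))
                 for rp in r_hat]
            score = evaluate_policy(r)       # simulated % of missed calls
            if score < best_score:
                best_r, best_score = r, score
        return best_r, best_score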
5.4. Making Additional Redeployments

The computational experiments in Sections 5.2 and 5.3 allow redeployments only when an ambulance becomes free after serving a call. We now explore the possibility of improving performance by allowing additional ambulance redeployments. Define an additional event type "consider redeployment" and schedule an event of this type with a certain frequency that is detailed below. Whenever an event of this type is triggered, we consider redeploying the ambulances that are either at a base or returning to a base, so that R(s) can contain multiple ambulances at such times. The set R(s) continues to be a singleton when e(s) corresponds to an ambulance becoming free after serving a call, and at all other events, R(s) = ∅.

We use two methods to vary the redeployment frequency. In the first method, we equally space consider-redeployment events to obtain frequencies between 0 and 15 per hour. In the second method, the frequency of consider-redeployment events is fixed at 24 per hour, but we make a redeployment only when the estimated benefit from making the redeployment exceeds the estimated benefit from not making the redeployment by a significant margin. More precisely, letting ε ∈ [0, 1) be a tolerance margin and using 0̄(s) to denote the |R(s)| × |B| dimensional matrix of zeros corresponding to the decision of not making a redeployment, if we have

min_{u ∈ U(s)} E[ g(s, u, f(s, u, X(s, u))) + α^{τ(f(s, u, X(s, u))) − τ(s)} J(f(s, u, X(s, u)), r) ]
  ≤ (1 − ε) E[ g(s, 0̄(s), f(s, 0̄(s), X(s, 0̄(s)))) + α^{τ(f(s, 0̄(s), X(s, 0̄(s)))) − τ(s)} J(f(s, 0̄(s), X(s, 0̄(s))), r) ],

then we make the redeployment decision indicated by the optimal solution to the problem on the left-hand side. Otherwise, we do not make a redeployment. Larger values of ε decrease the frequency of redeployments. We vary ε between 0.1 and 0.005.

Figure 4 shows the performance improvement obtained by the additional redeployments. The horizontal axis gives the frequency of the redeployments, measured as the number of redeployments per ambulance per day. The vertical axis gives the percentage of missed calls. The dashed (solid) data line corresponds to the first (second) method of varying the redeployment frequency. From Figure 1, we miss 21.9% of the calls without making any additional redeployments. By making about 10 additional redeployments per ambulance per day, we can decrease the percentage of missed calls to 20.2%. Beyond this range, we reach a plateau and additional redeployments do not provide much improvement. Another important observation is that the second method tends to provide significantly better performance improvements with the same frequency of additional redeployments. For example, the second method reduces the percentage of missed calls to 20.2% with 10 additional redeployments per ambulance per day, whereas the first method needs 15 additional redeployments per day to reach the same level. Therefore, it appears that making redeployments only when the value function approximation signals a significant benefit is helpful in avoiding pointless redeployments.

[Figure 4: Performance of ADP as a function of the frequency of the additional redeployments (efficient frontier: improvement vs. relocations per ambulance-day), for regularly scheduled relocations at varying frequencies and for selective relocations.]
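A sketch of the second method's trigger follows: redeploy only when the best estimated value beats doing nothing by the margin ε. The hook q_value(state, u) is an assumed Monte Carlo estimate of the bracketed expectation in (3); in practice its estimates would be cached per decision rather than recomputed.

    def selective_redeployment(state, decisions, do_nothing, q_value, eps):
        """Return the greedy decision only if its estimated value improves
        on doing nothing by at least a factor (1 - eps); otherwise stay put."""
        best_u = min(decisions, key=lambda u: q_value(state, u))
        if q_value(state, best_u) <= (1.0 - eps) * q_value(state, do_nothing):
            return best_u      # estimated benefit exceeds the margin
        return do_nothing      # avoid a pointless relocation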
6. Computational Results on a Second Metropolitan Area

In this section, we present the performance of ADP on the EMS system operating in a second metropolitan area. This EMS system is also studied in s (2007). We cannot disclose the name of the metropolitan area due to confidentiality agreements.

6.1. Experimental Setup

The population of the city in question is more than 5 times that of Edmonton, and its size is around 180 × 100 km². The EMS system includes up to 97 ambulances operating during peak times, 88 bases and 22 hospitals. The turn-out times, call-scene times and hospital-transfer times are comparable to those in Edmonton. We were able to use a detailed model for determining to which hospital a patient needs to be transported. In particular, the destination hospital depends on the location of the call. Calls originating at a given location are transported to any of a small set of hospitals, usually no more than 2 or 3 out of the 22 hospitals in the system. The corresponding probabilities are inferred from historical data. We assume that all ambulances are of the same type, and a call that is not served within 8 minutes is interpreted as missed. The road network that we use in our computational experiments models the actual network at the avenue level, and there are 4,955 nodes and 11,876 arcs in this road network.

The call arrival model that we used is quite realistic. The data that we used were collected from one year of operations of the EMS system and consisted of aggregated counts of calls for each hour of the week during a whole year, for each of 100 × 100 geographic zones. Due to the irregular and non-convex shape of the metropolitan area, roughly 80% of these zones had zero total call counts and did not intervene in the dynamics. Of the remaining 20%, a significant percentage had very low hourly counts of at most 5 calls. Thus, it was necessary to apply a smoothing procedure to the lowest intensity zones so as to reduce the sampling noise. To do this, we first classified the zones into a few groups according to their average intensity along the week. Then, for the lowest intensity groups, we computed a total intensity for each hour and distributed this total intensity uniformly among the zones in the group. In this way we obtained an intensity model that combined a uniform low-intensity background with actual (accurate) counts in the highest intensity zones. The average call arrival rate was 570 calls per day and fluctuated on any day of the week from a low of around 300 calls per day to a high of around 750 calls per day.

We used the actual base assignments and ambulance shifts as a benchmark. In the EMS system, a maximum of 97 ambulances operate at noon. Out of these, 66 ambulances work all day in two 12-hour shifts, and the remaining 31 have single shifts typically ranging from 10 to 12 hours per day. The resulting shift schedule provides 1,708 ambulance-hours per day. Ambulances are redeployed to their preassigned bases whenever they become free after serving a call. We refer the reader to s (2007) for details on the ambulance shifts.
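A sketch of the smoothing applied to the low-intensity zones: pool each hour's counts across the zones of a low-intensity group and spread the pooled total uniformly across those zones. How zones are classified into groups is left out, since the paper only says it is by average weekly intensity.

    def smooth_low_intensity_zones(hourly_counts, low_zone_ids):
        """hourly_counts: dict zone_id -> list of call counts per hour of
        the week.  For the zones classified as low intensity, pool the
        counts of each hour and redistribute the total uniformly."""
        hours = len(next(iter(hourly_counts.values())))
        for h in range(hours):
            pooled = sum(hourly_counts[z][h] for z in low_zone_ids)
            share = pooled / len(low_zone_ids)
            for z in low_zone_ids:
                hourly_counts[z][h] = share
        return hourly_counts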
6.2. Baseline Performance

Figure 5 shows the performance of ADP. The interpretations of the axes in this figure are the same as those in Figures 1 and 2. The solid horizontal line plots the percentage of calls missed by the benchmark policy and the dashed horizontal line plots the best performance obtained using ADP. The solid data line plots the performance of ADP when we use $\kappa = 2$ in the fourth basis function, whereas the dashed line plots the performance when we use $\kappa = 1$. The value $\kappa = 2$ was found by experimentation to give the best policy and least fluctuations across iterations of all the values tried in the interval $(0, 2]$. For both values of $\kappa$, the best ADP policies are attained in one iteration and these policies perform very similarly. The improvements in the number of reached calls are roughly 3.0% and 4.0% in the case of $\kappa = 1$ and $\kappa = 2$. We get significantly more stable performance with $\kappa = 2$ than with $\kappa = 1$. This indicates that it is important to carefully tune the parameter $\kappa$ through some initial experimentation.

[Figure 5: Performance of ADP and the benchmark policy. Axes: iteration number vs. average number of missed calls (% of total calls); series: $\kappa = 1.0$ and $\kappa = 2.0$.]

6.3. Effect of Turn-Out Time

Recall that if the ambulance crew is stationed at a base when it is notified of a call, then it takes 45 seconds to get ready, i.e., the turn-out time is 45 seconds. On the other hand, an ambulance crew that is already on the road does not incur turn-out time. A potential argument against ambulance redeployment is that any gains are simply due to ambulance crews being on the road more often, and therefore incurring fewer turn-out time delays. To check the validity of this argument, we used ADP under the assumption that turn-out time is zero. In other words, an ambulance crew does not take any time to get ready, even if it is located at a base when it is notified of a call. Table 1 shows the results for two different call arrival regimes (633 calls per day and 846 calls per day). The third and fourth rows show the absolute and relative improvement of ADP over the benchmark strategy. The results indicate that ADP provides significant improvement over the benchmark strategy in all cases. At first sight, this improvement seems slightly smaller for the cases without turn-out time, but the relative improvement is roughly the same in all cases.

                            633 calls per day          846 calls per day
                          Turn-out     Turn-out      Turn-out     Turn-out
                          = 45 secs.   = 0 secs.     = 45 secs.   = 0 secs.
% missed by benchmark        27.5         23.0          35.5         30.7
% missed by ADP              23.9         19.3          30.5         26.3
Improvement                   3.6          3.2           4.9          4.5
Rel. improvement         3.6/27.5     3.2/23.0      4.9/35.5     4.5/30.7
                          = 0.13       = 0.139       = 0.139      = 0.145

Table 1: Effect of turn-out time on the performance gap between ADP and the benchmark policy.
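To make the turn-out-time accounting concrete, the sketch below scores a single response under the stated rules for this system: an 8-minute service standard and a 45-second turn-out time that applies only to crews stationed at a base. Setting `turn_out_secs=0` reproduces the zero-turn-out experiment; the function and argument names are illustrative.

```python
def is_missed(travel_time_secs, crew_at_base,
              turn_out_secs=45.0, threshold_secs=8 * 60):
    """Return True if the call counts as missed.

    Response time = turn-out time (incurred only when the crew is
    stationed at a base) + travel time to the call scene.
    """
    turn_out = turn_out_secs if crew_at_base else 0.0
    return turn_out + travel_time_secs > threshold_secs
```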
6.4. Varying Call Arrival Rates and Fleet Sizes

We now explore the effect of different call arrival rates and fleet sizes on the performance of ADP. We first carry out a number of runs under the experimental setup described in Section 6.1, but we vary the call arrival rates by uniformly scaling them by a constant factor. Recall that the average call arrival rate in the experimental setup in Section 6.1 is around 570 calls per day. Figure 6 shows the percentage of the calls that are missed by the best policy obtained by ADP and the benchmark strategy as a function of the average call arrival rate. The solid data line corresponds to ADP, whereas the dashed data line corresponds to the benchmark strategy. ADP provides substantial improvements over all call arrival regimes. The improvements provided by ADP are more significant when the call arrival rate is relatively high. It appears that when the call arrival rate is high, there is more room for improvement by redeploying ambulances carefully, and ADP effectively takes advantage of this greater room for improvement.

[Figure 6: Performance of ADP and the benchmark policy for different call arrival rates. Axes: average number of calls per day vs. average number of missed calls (% of total calls); series: benchmark policies and best ADP policies.]

In a second set of computational experiments, we hold the call arrival rate constant and vary the number of ambulances in the fleet. Figure 7 shows the performance of ADP and the benchmark strategy as a function of the fleet size. Since we do not have access to the base assignments used by the benchmark strategy for different fleet sizes, we modify the base assignments described in Section 6.1 heuristically. In particular, to produce a base assignment with fewer ambulances, we choose the ambulances assigned to bases serving low demand and take them off shift. Similarly, to produce a base assignment with extra ambulances, we add ambulances to the bases with the highest demand.

[Figure 7: Performance of ADP and the benchmark policy for different fleet sizes. Axes: number of ambulances vs. average number of missed calls (% of total calls); series: static policy and best ADP redeployment policy.]

The results indicate that ADP performs consistently better than the benchmark policy. An important observation from Figure 7 is that if our goal is to keep the percentage of missed calls below a given threshold, say 28%, then ADP allows us to reach this goal with roughly 5 or 6 fewer ambulances than the benchmark policy. This translates into significant cost savings in an EMS system. It is also interesting that the performance of the benchmark policy does not improve significantly beyond 97 or 98 ambulances in the fleet, whereas ADP continues to provide progressively lower missed-call rates. This partly reflects the quality of our heuristic benchmark strategy, but it also indicates that a redeployment strategy can help mitigate poor static allocations and make effective use of extra capacity.

7. Conclusions

In this paper, we formulated the ambulance redeployment problem as a dynamic program and used an approximate version of the policy iteration algorithm to deal with the high-dimensional and uncountable state space. Extensive experiments on two problem scenarios showed that ADP can provide high-quality redeployment policies. The basis functions that we constructed open up the possibility of using other approaches, such as temporal-difference learning and the linear-programming approach, to tune the parameters $\{r_p : p = 1, \ldots, P\}$. Indeed, we are currently exploring the linear-programming approach. Other future research will incorporate additional degrees of realism into our model. We plan to include stochastic travel times, multiple call priorities, other cost functions and more realistic ambulance dynamics that involve multiple ambulances serving certain calls. Incorporating these complexities may require constructing additional basis functions.
Acknowledgments

We thank Currell, Armann Ingolfsson and Mason for their advice and support, and for providing the data sets used in our computational experiments. This work was partially supported by National Science Foundation Grants DMI 0400287 and DMI 0422133.

References

Adelman, D. 2004. A price-directed approach to stochastic inventory routing. Operations Research 52 499-514.
Adelman, D. 2007. Dynamic bid-prices in revenue management. Operations Research 55 647-661.
Adelman, D., A. J. Mersereau. 2007. Relaxations of weakly coupled stochastic dynamic programs. Operations Research.
Andersson, T. 2005. Decision support tools for dynamic fleet management. Ph.D. thesis, Department of Science and Technology, Linkoepings Universitet, Norrkoeping, Sweden.
Andersson, T., P. Värbrand. 2007. Decision support tools for ambulance dispatch and relocation. Journal of the Operational Research Society 58 195-201.
Berman, O. 1981a. Dynamic repositioning of indistinguishable service units on transportation networks. Transportation Science 15.
Berman, O. 1981b. Repositioning of distinguishable urban service units on networks. Computers and Operations Research 8 105-118.
Berman, O. 1981c. Repositioning of two distinguishable service vehicles on networks. IEEE Transactions on Systems, Man, and Cybernetics SMC-11.
Bertsekas, D. P., S. E. Shreve. 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.
Bertsekas, D. P., J. N. Tsitsiklis. 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
Brotcorne, L., G. Laporte, F. Semet. 2003. Ambulance location and relocation models. European Journal of Operational Research 147 451-463.
de Farias, D. P., B. Van Roy. 2003. The linear programming approach to approximate dynamic programming. Operations Research 51 850-865.
Farias, V. F., B. Van Roy. 2006. Tetris: A study of randomized constraint sampling. G. Calafiore, F. Dabbene, eds., Probabilistic and Randomized Methods for Design Under Uncertainty. Springer-Verlag.
Farias, V. F., B. Van Roy. 2007. An approximate dynamic programming approach to network revenue management. Tech. rep., Stanford University, Department of Electrical Engineering.
Gendreau, M., G. Laporte, F. Semet. 2001. A dynamic model and parallel tabu search heuristic for real-time ambulance relocation. Parallel Computing 27 1641-1653.
Gendreau, M., G. Laporte, F. Semet. 2006. The maximal expected coverage relocation problem for emergency vehicles. Journal of the Operational Research Society 57 22-28.
Goldberg, J. B. 2007. Personal communication.
Henderson, S. G., A. J. Mason. 2004. Ambulance service planning: Simulation and data visualisation. M. L. Brandeau, F. Sainfort, W. P. Pierskalla, eds., Operations Research and Health Care: A Handbook of Methods and Applications. Handbooks in Operations Research and Management Science, Kluwer Academic, 77-102.
Ingolfsson, A. 2006. The impact of ambulance system status management. Presentation at 2006 INFORMS Conference.
Ingolfsson, A., E. Erkut, S. Budge. 2003. Simulation of single start station for Edmonton EMS. Journal of the Operational Research Society 54 736-746.
Kolesar, P., W. E. Walker. 1974. An algorithm for the dynamic relocation of fire companies. Operations Research 22 249-274.
Nair, R., E. Miller-Hooks. 2006. A case study of ambulance location and relocation. Presentation at 2006 INFORMS Conference.
Powell, W. B. 2007. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley & Sons, Hoboken, NJ.
Restrepo, M., S. G. Henderson, H. Topaloglu. 2007. Erlang loss models for the static deployment of ambulances. Submitted.
s, D. P. 2007. Optimised ambulance redeployment strategies. Master's thesis, Department of Engineering Science, University of Auckland, Auckland, New Zealand.
Schweitzer, P., A. Seidmann. 1985. Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications 110 568-582.
Si, J., A. G. Barto, W. B. Powell, D. Wunsch II, eds. 2004. Handbook of Learning and Approximate Dynamic Programming. Wiley-Interscience, Piscataway, NJ.
Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3 9-44.
Topaloglu, H., W. B. Powell. 2006. Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems. INFORMS Journal on Computing 18 31-42.
Tsitsiklis, J. N. 1994. Asynchronous stochastic approximation and Q-learning. Machine Learning 16 185-202.
Tsitsiklis, J. N., B. Van Roy. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42 674-690.
Tsitsiklis, J. N., B. Van Roy. 2001. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12 694-703.
Van Roy, B., D. P. Bertsekas, Y. Lee, J. N. Tsitsiklis. 1997. A neuro-dynamic programming approach to retailer inventory management. Proceedings of the IEEE Conference on Decision and Control.
Watkins, C. J. C. H., P. Dayan. 1992. Q-learning. Machine Learning 8 279-292.
Yan, X., P. Diaconis, P. Rusmevichientong, B. Van Roy. 2005. Solitaire: Man versus machine. Advances in Neural Information Processing Systems 17.
We tune the parameters of the value function approximations through an iterative and simulation-based method. Each iteration of the simulation-based method consists of two steps. In the first step, we simulate the trajectory of the greedy policy induced by the current value function approximation and collect cost trajectories of the system. In the second step, we tune the parameters of the value function approximation by solving a regression problem that fits the value function approximation to the collected cost trajectories. This yields a new set of parameters that characterize new value function approximations, and so, we can go back and repeat the same two steps. In this respect, the idea we use closely resembles the classical policy iteration algorithm in the Markov decision process literature. In particular, the first (second) step is analogous to the policy-evaluation (policy-improvement) step of the policy iteration algorithm.

There are two streams of literature that are related to our work. The first one is the literature on ADP. A generic approach for ADP involves using value function approximations of the form $\sum_{p=1}^{P} r_p \psi_p(\cdot)$, where $\{r_p : p = 1, \ldots, P\}$ are tuneable parameters and $\{\psi_p(\cdot) : p = 1, \ldots, P\}$ are fixed basis functions; see Bertsekas and Tsitsiklis (1996), Powell (2007). There are a number of methods to tune the parameters $\{r_p : p = 1, \ldots, P\}$ so that $\sum_{p=1}^{P} r_p \psi_p(\cdot)$ yields a good approximation to the value function. For example, temporal difference learning and Q-learning use stochastic approximation ideas in conjunction with simulated trajectories of the system to iteratively tune the parameters; see Sutton (1988), Watkins and Dayan (1992), Tsitsiklis (1994), Bertsekas and Tsitsiklis (1996), Tsitsiklis and Van Roy (1997), Si, Barto, Powell, and Wunsch II (2004). On the other hand, the linear-programming approach for ADP finds a good set of values for the parameters by solving a large linear program whose decision variables are $\{r_p : p = 1, \ldots, P\}$; see Schweitzer and Seidmann (1985), de Farias and Van Roy (2003), Adelman and Mersereau (2007). Both classes of approaches are aimed at tuning the parameters $\{r_p : p = 1, \ldots, P\}$ so that $\sum_{p=1}^{P} r_p \psi_p(\cdot)$ yields a good approximation to the value function. The choice of the basis functions $\{\psi_p(\cdot) : p = 1, \ldots, P\}$, on the other hand, is regarded as more of an art form, requiring substantial knowledge of the problem structure. Applications of ADP include inventory control (Van Roy, Bertsekas, Lee, and Tsitsiklis, 1997), inventory routing (Adelman, 2004), option pricing (Tsitsiklis and Van Roy, 2001), game playing (Yan, Diaconis, Rusmevichientong, and Van Roy, 2005; Farias and Van Roy, 2006), dynamic fleet management (Topaloglu and Powell, 2006), and network revenue management (Adelman, 2007; Farias and Van Roy, 2007).

The second stream of literature that is related to our work is the literature on ambulance redeployment. One class of models involves solving integer programs in real time whenever an ambulance redeployment decision needs to be made; see Kolesar and Walker (1974), Gendreau, Laporte, and Semet (2001), Brotcorne, Laporte, and Semet (2003), Gendreau, Laporte, and Semet (2006), Nair and Miller-Hooks (2006), s (2007). The objective function in these integer programs generally involves a combination of backup coverage for future calls and relocation cost of ambulances.
These models are usually computationally intensive, since they require solving an optimization problem every time a decision is made. As a result, a parallel computing environment is often (but not always) used to implement a working real-time system. A second class of models is based on solving integer programs in a preparatory phase. This approach provides a lookup table describing, for each number of available ambulances, where those ambulances should be deployed. Dispatchers attempt to dispatch so as to keep the ambulance configuration close to the one suggested by the lookup table; see Ingolfsson (2006), Goldberg (2007). A third class of models attempts to capture the randomness in the system explicitly, either through a dynamic programming formulation or through heuristic approaches. Berman (1981a), Berman (1981b) and Berman (1981c) represent the first papers that provide a dynamic programming approach for the ambulance redeployment problem. However, these papers follow an exact dynamic programming formulation and, as is often the case, this formulation is tractable only in oversimplified versions of the problem with few vehicles and small transportation networks. Andersson (2005) and Andersson and Värbrand (2007) make the ambulance redeployment decision by using a "preparedness" function that essentially measures the capability of a certain ambulance configuration to cover future calls. The preparedness function is similar in spirit to the value function in a dynamic program, measuring the impact of current decisions on the future evolution of the system. However, the way the preparedness function is constructed is heuristic in nature.

When compared with the three classes of models described above, our approach provides a number of advantages. In contrast to the models that are based on integer programs, our approach captures the random evolution of the system over time since it is based on a dynamic programming formulation of the ambulance redeployment problem. Furthermore, the decisions made by our approach in real time can be computed very quickly as this requires solving a simple optimization problem that minimizes the sum of the immediate cost and the value function approximation. In lookup table approaches, there may be more than one way to redeploy the ambulances so that the ambulance configuration over the transportation network matches the configuration suggested by the lookup table. Therefore, lookup table approaches still leave some aspects of dispatch decisions to subjective interpretation by dispatchers. Our approach, on the other hand, can fully automate the decision-making process. In traditional dynamic-programming approaches, one is usually limited to very small problem instances, whereas ADP can be used on problem instances with realistic dimensions. Our approach allows working with a variety of objective functions, such as the number of calls that are not served within a threshold time standard or the total response time for the calls. Furthermore, our approach allows the possibility of constraining the frequency and destinations of ambulance relocations.
This is important since a relocation scheme should balance improvements in service levels with the additional demands imposed on ambulance crews.

In summary, we make the following research contributions. 1) We develop a tractable ADP approach for the ambulance redeployment problem. Our approach employs value function approximations of the form $\sum_{p=1}^{P} r_p \psi_p(\cdot)$ and uses sampled cost trajectories of the system to tune the parameters $\{r_p : p = 1, \ldots, P\}$. Since it is based on the dynamic programming formulation of the problem, our approach is able to capture the random evolution of the system over time. 2) We develop a set of basis functions $\{\psi_p(\cdot) : p = 1, \ldots, P\}$ that yield good value function approximations for the ambulance redeployment problem. This opens up the possibility of using other ADP approaches, such as temporal difference learning and the linear-programming approach. 3) We provide computational experiments on EMS systems in two metropolitan areas. Our results indicate that ADP has the potential to obtain high-quality redeployment policies in real systems. They also show that our approach compares favorably with benchmark policies that are similar to those used in practice.

The remainder of this paper is organized as follows. In Section 2, we present a dynamic programming formulation for the ambulance redeployment problem. In Section 3, we describe our ADP approach. In Section 4, we discuss the basis functions that we use in our value function approximations. In Sections 5 and 6, we report computational results for two metropolitan areas. We conclude in Section 7 and point out some directions for further research.

2. Ambulance Redeployment as a Markov Decision Process

In this section, we present a dynamic programming formulation of the ambulance redeployment problem. As will shortly be clear, our model involves an uncountable state space. For an excellent account of the basic terminology, notation and fundamental results regarding dynamic programming in uncountable state spaces, we refer the reader to Bertsekas and Shreve (1978).

2.1. State Space

Two main components in the state of an EMS system at a given time are the vectors $A = (a_1, \ldots, a_N)$ and $C = (c_1, \ldots, c_M)$ containing information about the state of each ambulance and each waiting call in the system. To simplify the presentation, we assume that we do not keep more than $M$ waiting calls in the system, possibly by diverting them to another EMS system. This is not really a restriction from a practical perspective since $M$ can be quite large. The state of each ambulance is given by a tuple $a = (\sigma_a, o_a, d_a, t_a)$, where $\sigma_a$ is the status of the ambulance, $o_a$ and $d_a$ are the origin and destination locations of the ambulance and $t_a$ is the starting time of the ambulance movement. To serve a call, an ambulance first moves to the call scene and provides service at the scene for a certain amount of time. Following this, the ambulance possibly transports the patient to a hospital, and after spending some time at the hospital, the ambulance becomes free to serve another call.
Therefore, the status of an ambulance $\sigma_a$ can be "off shift," "idle at base," "going to call," "at call scene," "going to hospital," "at hospital" or "returning to base." If the ambulance is stationary at location $o$, then we have $o_a = d_a = o$. If the ambulance is in motion, then $o_a$ and $d_a$ are respectively the origin and destination locations of the ambulance and $t_a$ corresponds to the starting time of this movement. Otherwise, $t_a$ corresponds to the starting time of the current phase in the service cycle. For example, if the status of the ambulance is "at hospital," then $t_a$ corresponds to the time at which the ambulance arrived at the hospital. This time is kept in the state of an ambulance to be able to give a Markov formulation for the non-Markovian elements in the system, such as nonexponentially distributed service times and deterministic travel times. Similarly, for a call, we have $c = (\sigma_c, o_c, t_c, p_c)$, where $\sigma_c$ is the status of the call, $o_c$ is the location of the call, $t_c$ is the time at which the call arrived into the system and $p_c$ is the priority level of the call. The status of a call $\sigma_c$ can be "assigned to an ambulance" or "queued for service."

We model the dynamics of the system as an event-driven process. Events are triggered by changes in the status of the ambulances or by call arrivals. Therefore, the possible event types in the system are "call arrives," "ambulance goes off shift," "ambulance comes on shift," "ambulance departs for call scene," "ambulance arrives at call scene," "ambulance leaves call scene," "ambulance arrives at hospital," "ambulance leaves hospital" and "ambulance arrives at base." We assume that we can make decisions (for any ambulance, and not just the one involved in the event) only at the times of these events. This comes at the cost of some loss of optimality, since making decisions between the times of the events may improve performance. From a practical perspective, however, events are frequent enough to allow ample decision-making opportunities. By restricting our attention to the times of events, we visualize the system as jumping from one event time to another. Therefore, we can use the tuple $s = (\tau, e, A, C)$ to represent the state of the system, where $\tau$ corresponds to the current time, $e$ corresponds to the current event type, and $A$ and $C$ respectively correspond to the state of the ambulances and the calls. In this case, the state trajectory of the system can be written as $\{s_k : k = 1, 2, \ldots\}$, where $s_k$ is the state of the system just after the $k$th event occurs. Note that the time is rolled into our state variable. Throughout the paper, we use $\tau(s)$ and $e(s)$ to respectively denote the time and the event type when the state of the system is $s$. In other words, $\tau(s)$ and $e(s)$ are the first two components of $s = (\tau, e, A, C)$.

2.2. Controls

We assume that calls are served in decreasing order of priority, and within a given priority level are served in first-in-first-out order. We further assume that the closest available ambulance is dispatched to a call. This is not an exact representation of reality, but is close enough for our purposes: We will show that ADP yields an effective redeployment strategy under this dispatch policy, and we expect that it will continue to do so for more realistic dispatch policies.
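As a concrete illustration of the state variables just defined, the following sketch encodes the tuples $a = (\sigma_a, o_a, d_a, t_a)$, $c = (\sigma_c, o_c, t_c, p_c)$ and $s = (\tau, e, A, C)$ as Python dataclasses. This is only one possible encoding; the field names and string-valued statuses are our own choices, not part of the paper's formulation.

```python
from dataclasses import dataclass
from typing import Tuple

Location = Tuple[float, float]  # (x, y) coordinates in the region R

@dataclass
class Ambulance:
    """Ambulance tuple a = (sigma_a, o_a, d_a, t_a)."""
    status: str            # e.g. "idle at base", "going to call", "at hospital"
    origin: Location       # o_a (equals destination when stationary)
    destination: Location  # d_a
    start_time: float      # t_a: start of the current movement or service phase

@dataclass
class Call:
    """Call tuple c = (sigma_c, o_c, t_c, p_c)."""
    status: str            # "assigned to an ambulance" or "queued for service"
    location: Location     # o_c
    arrival_time: float    # t_c
    priority: int          # p_c

@dataclass
class State:
    """System state s = (tau, e, A, C), recorded just after each event."""
    time: float                        # tau(s)
    event: str                         # e(s), the triggering event type
    ambulances: Tuple[Ambulance, ...]  # A
    calls: Tuple[Call, ...]            # C
```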
Let $R(s)$ denote the set of ambulances that are available for redeployment when the state of the system is $s$. Let $u_{ab}(s) = 1$ if we redeploy ambulance $a$ to base $b$ when the state of the system is $s$, and 0 otherwise. Letting $B$ be the set of ambulance bases, we can capture the potential reallocation decisions in the binary matrices $u(s) = \{u_{ab}(s) : a \in R(s),\ b \in B\}$. The set of feasible decisions is then

$U(s) = \big\{ u(s) \in \{0, 1\}^{|R(s)| \times |B|} : \textstyle\sum_{b \in B} u_{ab}(s) \le 1 \ \forall\, a \in R(s) \big\}.$

The constraints in this definition simply state that each ambulance that is considered for redeployment can be redeployed to at most one base. An important assumption is that the cardinality of $U(s)$ is small so that an optimization problem over this feasible set can be solved by enumerating over all feasible solutions. We can use different definitions of $R(s)$ to permit different amounts of relocation. For example, all available ambulances are considered for relocation if $R(s)$ denotes the set of ambulances that are either idle at base or returning to base. But this may lead to impractically many relocations, so we can restrict $R(s)$, for example to $\emptyset$ unless the event $e(s)$ corresponds to an ambulance becoming free after serving a call, in which case we can take $R(s)$ equal to the singleton set corresponding to the newly freed ambulance. Here the ambulance crews are "on the road" when considered for relocation so there is no additional disruption to the crews relative to the standard "return to base" policy.
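Since the approach relies on $U(s)$ being small enough to enumerate, a direct way to realize it is to generate, for each ambulance in $R(s)$, the choice of either staying put or moving to one of the bases. The sketch below does exactly that; representing decisions as dictionaries is our own convention, and the all-`None` assignment corresponds to the no-redeployment decision $\bar{u}_0(s)$ used in the selective-redeployment test of Section 5.4.

```python
from itertools import product

def feasible_decisions(redeployable, bases):
    """Enumerate the feasible set U(s) of Section 2.2.

    Each ambulance in `redeployable` (the set R(s)) is redeployed to at
    most one base; None means the ambulance stays where it is.  This
    enumeration is practical only because |R(s)| is kept small.
    """
    options = [None] + list(bases)
    for assignment in product(options, repeat=len(redeployable)):
        yield dict(zip(redeployable, assignment))

# Example: one redeployable ambulance and three bases gives |U(s)| = 4.
decisions = list(feasible_decisions(["ambulance_7"], ["b1", "b2", "b3"]))
```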
2.3. Fundamental Dynamics

Call arrivals are generated across the region $R \subset \mathbb{R}^2$ according to a Poisson point process with a known arrival intensity $\mathcal{C} = \{C(t, x, y) : t \ge 0,\ (x, y) \in R\}$. As mentioned above, we have a fixed policy for serving calls, but our general approach does not depend on the particular form of this policy. If there are no available ambulances to serve a call, then the call is placed into a waiting queue. An ambulance serving a call proceeds through a sequence of events, including arriving at the scene, treating the patient, transporting and handing over the patient at the hospital. The main source of uncertainty in this call service cycle are the times spent between events. We assume that probability distributions for all of the event durations are known. Besides these sources of randomness, the major ingredient governing the dynamics is the decision-making of dispatchers. As a result, the complete trajectory of the system is given by $\{(s_k, u_k) : k = 1, 2, \ldots\}$, where $s_k$ is the state of the system at the time of the $k$th event and $u_k$ is the decision (if any) made by the dispatcher when the state of the system is $s_k$. We capture the dynamics of the system symbolically by

$s_{k+1} = f(s_k, u_k, X_k(s_k, u_k)),$

where $X_k(s_k, u_k)$ is a random element of an appropriate space encapsulating all the sources of randomness described above. We do not attempt to give a more explicit description of the transition function $f(\cdot, \cdot, \cdot)$, since this does not add anything of value to our treatment. It suffices to point out that the preceding discussion and the fact that we can simulate the evolution of the system over time imply that an appropriate $f(\cdot, \cdot, \cdot)$ exists.

2.4. Transition Costs

Along with a transition from state $s_k$ to $s_{k+1}$ through decision $u_k$, we incur a cost $g(s_k, u_k, s_{k+1})$. In our particular implementation, we use the cost function

$g(s_k, u_k, s_{k+1}) = \begin{cases} 1 & \text{if the event } e(s_{k+1}) \text{ corresponds to an ambulance arrival at a call scene,} \\ & \text{the call in question is urgent and the response time exceeds } \Delta \\ 0 & \text{otherwise.} \end{cases} \quad (1)$

Here $\Delta$ is a fixed threshold response time that is on the order of 10 minutes. Therefore, the cost function (1) allows us to count the number of high-priority calls whose response times exceed a threshold response time. We are interested in the performance of the system over the finite planning horizon $[0, T]$, so we let $g(s, \cdot, \cdot) = 0$ whenever $\tau(s) > T$. In our implementation, $T$ corresponds to two weeks. We have chosen this particular cost function because of its simplicity, and also because most performance benchmarks in the EMS industry are formulated in terms of the percentage of calls that are reached within a time standard. Our approach allows other cost functions without affecting the general treatment.
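The per-transition cost (1) is simple to express in code. The sketch below reuses the hypothetical `State` and `Call` classes sketched in Section 2.1 above; how the affected call is identified from the event, and which priority levels count as urgent, are implementation choices we have assumed rather than details given in the paper.

```python
def transition_cost(next_state, affected_call, threshold_secs=10 * 60,
                    horizon_end=None):
    """Cost g(s_k, u_k, s_{k+1}) of equation (1).

    Charges 1 when the triggering event is an ambulance arriving at a call
    scene, the call is urgent and the response time exceeds the threshold
    Delta; 0 otherwise, and 0 beyond the planning horizon [0, T].
    """
    if horizon_end is not None and next_state.time > horizon_end:
        return 0.0
    if next_state.event != "ambulance arrives at call scene":
        return 0.0
    response_time = next_state.time - affected_call.arrival_time
    is_urgent = affected_call.priority == 0  # assumption: 0 = highest priority
    return 1.0 if is_urgent and response_time > threshold_secs else 0.0
```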
2.5. Objective Function and Optimality Equation

A policy is a mapping from the state space to the action space, prescribing which action to take for each possible state of the system. Throughout the paper, we use $\mu(s)$ to denote the action prescribed by policy $\mu$ when the state of the system is $s$. If we follow policy $\mu$, then the state trajectory of the system $\{s_k^{\mu} : k = 1, 2, \ldots\}$ evolves according to $s_{k+1}^{\mu} = f(s_k^{\mu}, \mu(s_k^{\mu}), X_k(s_k^{\mu}, \mu(s_k^{\mu})))$ and the discounted total expected cost incurred by starting from initial state $s$ is given by

$J_{\mu}(s) = E\Big[ \sum_{k=1}^{\infty} \alpha^{\tau(s_k^{\mu})}\, g(s_k^{\mu}, \mu(s_k^{\mu}), s_{k+1}^{\mu}) \;\Big|\; s_1^{\mu} = s \Big],$

where $\alpha \in [0, 1)$ is a fixed discount factor. The expectation in the expression above involves the random variables $\{s_k^{\mu} : k = 1, 2, \ldots\}$, and $\tau(s_k^{\mu})$ is the time at which the system visits the $k$th state. It is well-known that the policy $\mu^*$ that minimizes the discounted total expected cost can be found by computing the value function through the optimality equation

$J(s) = \min_{u \in U(s)} E\big[ g(s, u, f(s, u, X(s, u))) + \alpha^{\tau(f(s,u,X(s,u))) - \tau(s)}\, J(f(s, u, X(s, u))) \big] \quad (2)$

and letting $\mu^*(s)$ be the minimizer of the right-hand side above; see Bertsekas and Shreve (1978). The difficulty with the optimality equation (2) is that the state variable takes uncountably many values. Even if we are willing to discretize the state variable to obtain a countable state space, the state variable is still a high-dimensional vector and solving the optimality equation (2) through classical dynamic-programming approaches is computationally very difficult. In the next two sections, we propose a method to construct tractable approximations to the value function.

The discounting in (2) may seem odd, as it implies that we are minimizing the expected discounted number of calls that are not reached within the threshold time. This is purely a computational device, in the sense that the discount factor is very helpful in stabilizing the ADP approach that we describe in the next two sections. The key issue is that the effect of relocation decisions can be small relative to the simulation noise, and discounting appears to mitigate this to some extent. This observation is in agreement with empirical observations from the case studies in Chapter 8 of Bertsekas and Tsitsiklis (1996). The increase in stability is also supported by the theoretical results in Chapter 6.2 of Bertsekas and Tsitsiklis (1996). When presenting our computational results in Sections 5 and 6, we report the undiscounted number of calls that are not reached within the threshold response time. These results indicate that although we construct the value function approximations with a view towards minimizing the expected discounted cost, the same value function approximations provide very good performance in minimizing the expected undiscounted cost.

3. Approximate Dynamic Programming

The ADP approach that we use to construct approximations to the value function is closely related to the traditional policy iteration algorithm in the Markov decision processes literature. We begin with a brief description of the policy iteration algorithm. Throughout the rest of the paper, the greedy policy induced by an arbitrary function $\hat{J}(\cdot)$ refers to the policy that takes a decision in the set

$\operatorname{argmin}_{u \in U(s)} E\big[ g(s, u, f(s, u, X(s, u))) + \alpha^{\tau(f(s,u,X(s,u))) - \tau(s)}\, \hat{J}(f(s, u, X(s, u))) \big] \quad (3)$

whenever the state of the system is $s$.

Policy Iteration
Step 1. Initialize the iteration counter $n$ to 1 and initialize $J_1(\cdot)$ arbitrarily.
Step 2. (Policy improvement) Let $\mu_n$ be the greedy policy induced by $J_n(\cdot)$.
Step 3. (Policy evaluation) Let $J_{n+1}(\cdot) = J_{\mu_n}(\cdot)$, where $J_{\mu_n}(s)$ denotes the expected discounted cost incurred when starting from state $s$ and using policy $\mu_n$.
Step 4. Increase $n$ by 1 and go to Step 2.

In our setting the cost function $g(\cdot, \cdot, \cdot)$ is nonnegative and the set of feasible decisions $U(s)$ is finite, so Proposition 9.17 in Bertsekas and Shreve (1978) applies and hence $J_n(\cdot)$ converges pointwise to the solution to the optimality equation (2). The difficulty with the policy iteration algorithm is that when dealing with uncountable state spaces, it is not even feasible to store $J_{\mu_n}(\cdot)$, let alone compute the expected discounted cost incurred by using policy $\mu_n$. We overcome this difficulty by using value function approximations of the form

$J(s, r) = \sum_{p=1}^{P} r_p\, \psi_p(s).$

In the expression above, $r = \{r_p : p = 1, \ldots, P\}$ are tuneable parameters and $\{\psi_p(\cdot) : p = 1, \ldots, P\}$ are fixed basis functions. The challenge is to construct the basis functions and tune the parameters so that $J(\cdot, r)$ is a good approximation to the solution to the optimality equation (2). To achieve this, each basis function $\psi_p(\cdot)$ should capture some essential information about the solution to the optimality equation. In Section 4, we describe one set of basis functions that work well.
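In code, this parametric form is just a dot product between the weight vector and the feature vector. A minimal sketch, assuming the basis functions are supplied as plain Python callables:

```python
import numpy as np

def value_approximation(state, r, basis_functions):
    """J(s, r) = sum_p r_p * psi_p(s) for fixed basis functions psi_p."""
    features = np.array([psi(state) for psi in basis_functions])
    return float(np.dot(r, features))
```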
Once a good set of basis functions is available, we can use the following approximate version of the policy iteration algorithm to tune the parameters $\{r_p : p = 1, \ldots, P\}$.

Approximate Policy Iteration
Step 1. Initialize the iteration counter $n$ to 1 and initialize $r^1 = \{r_p^1 : p = 1, \ldots, P\}$ arbitrarily.
Step 2. (Policy improvement) Let $\mu_n$ be the greedy policy induced by $J(\cdot, r^n)$.
Step 3. (Policy evaluation through simulation) Simulate the trajectory of policy $\mu_n$ over the planning horizon $[0, T]$ for $Q$ sample paths. Let $\{s_k^n(q) : k = 1, \ldots, K(q)\}$ be the state trajectory of policy $\mu_n$ in sample path $q$ and $G_k^n(q)$ be the discounted cost incurred by starting from state $s_k^n(q)$ and following policy $\mu_n$ in sample path $q$.
Step 4. (Projection) Let

$r^{n+1} = \operatorname{argmin}_{r \in \mathbb{R}^P} \Big\{ \sum_{q=1}^{Q} \sum_{k=1}^{K(q)} \big[ G_k^n(q) - J(s_k^n(q), r) \big]^2 \Big\}.$

Step 5. Increase $n$ by 1 and go to Step 2.

In Step 3 of approximate policy iteration we use simulation to evaluate the expected discounted cost incurred by policy $\mu_n$. Therefore, $\{G_k^n(q) : k = 1, \ldots, K(q),\ q = 1, \ldots, Q\}$ are the sampled cost trajectories of the system under policy $\mu_n$. In Step 4, we tune the parameters so that the value function approximation $J(\cdot, r)$ provides a good fit to the sampled cost trajectories.

There is still one computational difficulty in the approximate policy iteration algorithm. When simulating the trajectory of policy $\mu_n$ in Step 3, we need to solve an optimization problem of the form (3) to find the action taken by the greedy policy induced by $J(\cdot, r^n)$. This optimization problem involves an expectation that is difficult to compute. We resort to Monte Carlo simulation to overcome this difficulty. In particular, if the state of the system is $s$ and we want to find the action taken by the greedy policy induced by $J(\cdot, r^n)$ in this state, then we enumerate over all decisions in the feasible set $U(s)$. Starting from state $s$ and taking decision $u$, we simulate the trajectory of the system until the next event and this provides a sample of $f(s, u, X(s, u))$. By using multiple samples, we estimate the expectation

$E\big[ g(s, u, f(s, u, X(s, u))) + \alpha^{\tau(f(s,u,X(s,u))) - \tau(s)}\, J(f(s, u, X(s, u)), r^n) \big]$

through a sample average. Once we estimate the expectation above for all $u \in U(s)$, we choose the decision that yields the smallest value and use it as the decision taken by the greedy policy induced by $J(\cdot, r^n)$ when the state of the system is $s$. This approach is naturally subject to sampling error, but it provides good performance in practice.
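The outer loop of this procedure is compact enough to sketch end to end. The sketch assumes a user-supplied `simulate_policy(r)` that wraps the EMS simulator: it runs the greedy policy induced by $J(\cdot, r)$ over one sample path and returns `(state, discounted_cost_to_go)` pairs along the trajectory. The least-squares projection of Step 4 is done with numpy; everything else is a placeholder for machinery the paper describes elsewhere.

```python
import numpy as np

def approximate_policy_iteration(simulate_policy, basis_functions,
                                 n_iterations, n_sample_paths):
    """Skeleton of the approximate policy iteration algorithm of Section 3."""
    P = len(basis_functions)
    r = np.zeros(P)  # Step 1: arbitrary initial parameters
    for _ in range(n_iterations):
        # Steps 2-3: simulate the induced greedy policy over Q sample paths
        # and pool the sampled (state, cost-to-go) pairs.
        samples = []
        for _ in range(n_sample_paths):
            samples.extend(simulate_policy(r))
        # Step 4: least-squares projection of the sampled costs onto the
        # span of the basis functions.
        Psi = np.array([[psi(s) for psi in basis_functions]
                        for s, _ in samples])
        G = np.array([g for _, g in samples])
        r, *_ = np.linalg.lstsq(Psi, G, rcond=None)
    return r
```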
Proposition 6.2 in Bertsekas and Tsitsiklis (1996) provides a performance guarantee for the approximate policy iteration algorithm. (The result is easily extended from finite to infinite state spaces.) This is an encouraging result as it provides theoretical support for the approximate policy iteration algorithm, but its conditions are difficult to verify in practice. In particular, the result assumes that we precisely know the error induced by using regression to estimate the discounted total expected cost of a policy, and it assumes that expectations are computed exactly rather than via sampling as in our case. For this reason, we do not go into the details of this result and refer the reader to Bertsekas and Tsitsiklis (1996) for further details.

4. Basis Functions

In this section, we describe the basis functions $\{\psi_p(\cdot) : p = 1, \ldots, P\}$ that we use in our value function approximations. We use $P = 6$ basis functions, some of which are based on the queueing insights developed in Restrepo, Henderson, and Topaloglu (2007).

4.1. Baseline

The first basis function is of the form $\psi_1(s) = 1 - \alpha^{T - \tau(s)}$. The role of this basis function is to give a very rough estimate of the discounted number of missed calls between the time associated with state $s$ and the end of the planning horizon. In particular, if we assume that we miss a call every $\theta$ time units, then the discounted number of missed calls over the interval $[\tau(s), T]$ is

$\alpha^{\theta} + \alpha^{2\theta} + \cdots + \alpha^{\lfloor \frac{T - \tau(s)}{\theta} \rfloor \theta} = \frac{1 - \alpha^{[\lfloor \frac{T - \tau(s)}{\theta} \rfloor + 1]\theta}}{1 - \alpha^{\theta}} \approx \frac{1 - \alpha^{T - \tau(s)}}{1 - \alpha^{\theta}},$

so that, up to the constant factor $1/(1 - \alpha^{\theta})$, which is absorbed into the tuneable coefficient $r_1$, the discounted number of missed calls behaves like $1 - \alpha^{T - \tau(s)}$.

4.2. Unreachable Calls

The second basis function computes the number of waiting calls for which an ambulance assignment has been made, but the ambulance will not reach the call within the threshold response time. This is easily computed in our setting where travel times are deterministic. In the case of stochastic travel times, we could use the probability that the ambulance will reach the call within the threshold response time instead. We do not give a precise expression for this basis function, since the precise expression is cumbersome yet straightforward to define.

4.3. Uncovered Call Rate

The third basis function captures the rate of call arrivals that cannot be reached on time by any available ambulance. To define this precisely we need some additional notation. Recall that calls arrive across the region $R \subset \mathbb{R}^2$ according to a Poisson point process with arrival intensity $\mathcal{C} = \{C(t, x, y) : t \ge 0,\ (x, y) \in R\}$. Partition the region $R$ into $L$ subregions $\{\rho_l : l = 1, \ldots, L\}$ and associate a representative point or "center of mass" $(x_l, y_l)$ with each subregion $l = 1, \ldots, L$. The coverage of subregion $l$ is the number of available ambulances that can reach the center of mass $(x_l, y_l)$ within the threshold time standard. Let $d(t, (x_1, y_1), (x_2, y_2))$ be the travel time between points $(x_1, y_1)$ and $(x_2, y_2)$ when travel starts at time $t$. Then, letting $1(\cdot)$ be the indicator function, the coverage of subregion $l$ can be written as a function of the state of the system as

$N_l(s) = \sum_{a \in A(s)} 1\big( d(\tau(s), (x_a(s), y_a(s)), (x_l, y_l)) \le \Delta \big).$

Here, $A(s)$ is the set of available ambulances and $(x_a(s), y_a(s))$ is the location of ambulance $a$ at time $\tau(s)$ when the system is in state $s$. The rate of call arrivals at time $t$ in subregion $l$ is

$C_l(t) = \int_{\rho_l} C(t, x, y)\, dx\, dy.$

Therefore, when the state of the system is $s$, we can compute the rate of call arrivals that are not covered by any available ambulance by

$\psi_3(s) = \sum_{l=1}^{L} C_l(\tau(s))\, 1(N_l(s) = 0).$
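A direct transcription of $\psi_3$ into Python might look as follows. The travel-time and arrival-rate inputs are assumed to be supplied as callables, and for brevity we approximate a moving ambulance's current position by its origin; a faithful implementation would interpolate the position along its route and would use whatever definition of "available" the dispatch policy implies.

```python
def uncovered_call_rate(state, centers, travel_time, rate, delta):
    """Basis function psi_3: total arrival rate of subregions that no
    available ambulance can reach within the threshold delta.

    centers     -- list of (x, y) representative points, one per subregion
    travel_time -- travel_time(t, from_xy, to_xy), same time units as delta
    rate        -- rate(t, l): subregion arrival rate C_l(t)
    """
    available = [a for a in state.ambulances
                 if a.status in ("idle at base", "returning to base")]
    total = 0.0
    for l, center in enumerate(centers):
        covered = any(
            travel_time(state.time, a.origin, center) <= delta  # position approx.
            for a in available)
        if not covered:
            total += rate(state.time, l)
    return total
```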
4.4. Missed Call Rate

The previous two basis functions capture calls already received that we know we cannot reach on time, and the rate of arriving calls that cannot be reached on time because they are too far from any available ambulance. We could also fail to reach a call on time due to queueing effects from ambulances being busy with other calls. The fourth basis function represents an attempt to capture this effect. This basis function is of the form

$\psi_4(s) = \sum_{l=1}^{L} C_l(\tau(s))\, P_l(s),$

where $P_l(s)$ is the probability that all ambulances that could reach a call in subregion $l$ on time are busy with other calls. We estimate $\{P_l(s) : l = 1, \ldots, L\}$ by treating the call service processes in different subregions as Erlang-loss systems. In Erlang-loss systems, calls arriving when all servers are busy are lost. In reality such calls are queued and served as ambulances become free, but the time threshold is almost always missed for such calls, so counting them as lost seems reasonable. The issue that such calls impose some load on the true system but are discarded in an Erlang-loss system creates a slight mismatch, but our computational results show that this function is still highly effective as a basis function.

The Erlang-loss probability $L(\lambda, \mu, c)$ for a system with arrival rate $\lambda$, service rate $\mu$ and $c$ servers is

$L(\lambda, \mu, c) = \frac{(\lambda/\mu)^c / c!}{\sum_{i=0}^{c} (\lambda/\mu)^i / i!}.$

To characterize the Erlang-loss system for subregion $l$, given that the state of the system is $s$, we need to specify the number of servers, and the arrival and service rates. Let $N_l(s)$ be the set of available ambulances that can serve a call in subregion $l$ within the threshold response time. Then

$N_l(s) = \{a \in A(s) : d(\tau(s), (x_a(s), y_a(s)), (x_l, y_l)) \le \Delta\}.$

We use $c = |N_l(s)|$ as the number of servers in the Erlang-loss system for subregion $l$. Let $\mu_l(t)$ be the service rate in the loss system, i.e., the rate at which an ambulance can serve a call at time $t$ in subregion $l$. It is difficult to come up with a precise value for $\mu_l(t)$. It primarily depends on the time spent at the scene of a call and any transfer time at a hospital, since travel times are usually small relative to these quantities. In our practical implementation, we use historical data to estimate the time spent at the call scenes and the hospitals and add a small padding factor to capture travel times. Finally, let $\Lambda_l(s)$ be the rate of call arrivals that should be served by ambulances in the set $N_l(s)$. Coming up with a value for $\Lambda_l(s)$ is even more difficult than devising a value for $\mu_l(t)$. One option is to let $\Lambda_l(s) = C_l(\tau(s))$, which is the rate of call arrivals at time $\tau(s)$ in subregion $l$. However, ambulances in the set $N_l(s)$ serve not only calls in subregion $l$, but also calls in other subregions. To attempt to capture this, let

$\Lambda_l(s) = \sum_{a \in N_l(s)} \sum_{i=1}^{L} C_i(\tau(s))\, 1\big( d(\tau(s), (x_a(s), y_a(s)), (x_i, y_i)) \le \Delta \big), \quad (4)$

so that $\Lambda_l(s)$ reflects the total call arrival rate in subregions that are close to any of the ambulances in the set $N_l(s)$. We then use the approximation

$P_l(s) \approx L(\Lambda_l(s), \mu_l(\tau(s)), |N_l(s)|). \quad (5)$

There are at least two shortcomings in the estimate for $\Lambda_l(s)$ in (4) and the approximation to $P_l(s)$ in (5). First, there is double counting in the estimate of $\Lambda_l(s)$. In particular, if two ambulances $a_1, a_2 \in N_l(s)$ can both reach subregion $\ell$ within the time threshold, then the summation for $\Lambda_l(s)$ counts $C_\ell(\tau(s))$ twice. Second, $C_\ell(\tau(s))$ could be counted in the demand rates for multiple subregions. To be more precise, if there are three subregions $l_1, l_2, \ell$ and two ambulances $a_1 \in N_{l_1}(s)$, $a_2 \in N_{l_2}(s)$ such that both $a_1$ and $a_2$ can reach subregion $\ell$ within the time threshold, then the summations for $\Lambda_{l_1}(s)$ and $\Lambda_{l_2}(s)$ both count $C_\ell(\tau(s))$. Therefore, we typically have $\sum_{l=1}^{L} \Lambda_l(s) > \sum_{l=1}^{L} C_l(\tau(s))$. To overcome this problem, we dilute the call arrival rates by a factor $\kappa$. In particular, we use the call arrival rate $\kappa C_\ell(\tau(s))$ in (4) instead of $C_\ell(\tau(s))$ so that we may roughly have $\sum_{l=1}^{L} \Lambda_l(s) = \sum_{l=1}^{L} C_l(\tau(s))$. We find a good value for $\kappa$ through preliminary experimentation. Interestingly, as we will see in our computational results, it is not necessarily the case that the best choice of $\kappa$ lies in $(0, 1)$. We emphasize that $\kappa$ is the only tuneable parameter in our ADP approach that requires experimentation.
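The Erlang-loss probability used in (5) is a standard closed form and is straightforward to compute. A minimal sketch:

```python
from math import factorial

def erlang_loss(arrival_rate, service_rate, servers):
    """Erlang-loss probability L(lambda, mu, c): the probability that all
    c servers are busy, with offered load rho = lambda / mu."""
    rho = arrival_rate / service_rate
    numerator = rho ** servers / factorial(servers)
    denominator = sum(rho ** i / factorial(i) for i in range(servers + 1))
    return numerator / denominator
```

For systems with many servers, this direct form can overflow; a production implementation would instead use the standard recursion of the loss probability in the number of servers, but the direct form matches the formula above.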
4.5. Future Uncovered Call Rate

In certain settings, we may not be willing to redeploy ambulances that are already in transit to a particular location. In this case, from the perspective of covering future calls, the ambulance destinations are as important as their current locations. This is the motivation underlying the fifth and sixth basis functions. Our fifth basis function parallels the third one, replacing the true locations of ambulances by their destinations. The definition of this basis function is therefore identical to that of $\psi_3$, but the configuration of the ambulances that we use to compute $N_l(s)$ is not the current one, but rather, an estimated future configuration that is obtained by letting all ambulances in transit reach their destinations and all stationary ambulances remain at their current locations. To be precise, given that the system is in state $s = (\tau, e, A, C)$ with $A = (a_1, \ldots, a_N)$ and $a_i = (\sigma_{a_i}, o_{a_i}, d_{a_i}, t_{a_i})$, we define a new state $\tilde{s}(s) = (\tau + 1/\sum_{l=1}^{L} C_l(\tau),\, e,\, \tilde{A},\, C)$ with $\tilde{A} = (\tilde{a}_1, \ldots, \tilde{a}_N)$ and $\tilde{a}_i = (\tilde{\sigma}_{a_i}, d_{a_i}, d_{a_i}, \tau + 1/\sum_{l=1}^{L} C_l(\tau))$, where $\tilde{\sigma}_{a_i}$ is the status of ambulance $a_i$ when it reaches its destination $d_{a_i}$. In this case, we can write the fifth basis function as

$\psi_5(s) = \sum_{l=1}^{L} C_l(\tau(\tilde{s}(s)))\, 1(N_l(\tilde{s}(s)) = 0).$

The time $\tau(\tilde{s}(s)) = \tau + 1/\sum_{l=1}^{L} C_l(\tau)$ is used as an approximation to the expected time of the next call arrival. The next call may arrive before or after the ambulances actually reach their destinations, but we heuristically use the time $\tau(\tilde{s}(s))$ simply to look into the future. The idea is that the estimated future configuration of the ambulances $\tilde{A}$ is more likely to hold at the future time $\tau(\tilde{s}(s))$ than at the current time $\tau(s)$.

4.6. Future Missed Call Rate

The sixth basis function, $\psi_6$, parallels $\psi_4$ in that it captures the rate of calls that will likely be lost due to queueing congestion. As with the fifth basis function, it uses an estimated future configuration of the ambulances. It is defined as

$\psi_6(s) = \sum_{l=1}^{L} C_l(\tau(\tilde{s}(s)))\, P_l(\tilde{s}(s)).$

5. Computational Results on Edmonton

In this section, we present computational results for the EMS system in the city of Edmonton, Alberta, Canada. We begin with a description of the data set along with our assumptions.

5.1. Experimental Setup

Our data are based on the data set used in the computational experiments in Ingolfsson, Erkut, and Budge (2003). The city of Edmonton has a population of 730,000 and covers an area of around 40 × 30 km². The EMS system includes 16 ambulances, 11 bases and 5 hospitals. We assume for simplicity that all ambulances are of the same type and operate all day.
An ambulance, upon arriving at a call scene, treats the patient for an exponentially distributed amount of time with mean 12 minutes. After treating the patient at the call scene, the ambulance transports the patient to a hospital with probability 0.75. The probability distribution for the hospital chosen is inferred from historical data. The time an ambulance spends at the hospital has a Weibull distribution with mean 30 minutes and standard deviation 13 minutes. The turn-out time for the ambulance crews is 45 seconds. In other words, if the ambulance crew is located at a base when it is notified of a call, then it takes 45 seconds to get ready. An ambulance crew already on the road does not incur the turn-out time. The road network that we use in our computational experiments models the actual road network on the avenue level. There are 252 nodes and 934 arcs in this road network. The travel times are deterministic and do not depend on the time of the day.

There was not enough historical data to develop a detailed model of the call arrivals, so we proceeded with a revealing, but not necessarily realistic, model. The model maintains a constant overall arrival rate, but the distribution of the location of calls changes in time. We divided the city of Edmonton into 20 × 17 subregions and assumed that the rate of call arrivals in subregion $l$ at time $t$ is given by

$C_l(t) = C\big[ \gamma_l + \delta_l \sin(2\pi t / 24) \big],$

where $t$ is measured in hours. In the expression above, $C$, $\gamma_l$ and $\delta_l$ are fixed parameters that satisfy $C \ge 0$, $\gamma_l \ge |\delta_l|$, $\sum_{l=1}^{340} \gamma_l = 1$ and $\sum_{l=1}^{340} \delta_l = 0$. We have $\sum_{l=1}^{340} \gamma_l + \sum_{l=1}^{340} \delta_l \sin(2\pi t / 24) = 1$, so that we can interpret $C$ as the total call arrival rate into the system and $\gamma_l + \delta_l \sin(2\pi t / 24)$ as the probability that a call arriving at time $t$ falls in subregion $l$. If $\delta_l > 0$, then the peak call arrival rate in subregion $l$ occurs at hours $\{6, 30, 54, \ldots\}$, whereas if $\delta_l < 0$, then the peak call arrival rate in subregion $l$ occurs at hours $\{18, 42, 66, \ldots\}$. The average call arrival rate over a day in subregion $l$ is $C\gamma_l$. We estimated $C$ and $\gamma_l$ using historical data, and $C$ ranged from 3 to 7 calls per hour. We chose appropriate values of $\delta_l$ so that we have higher call arrival rates in the business subregions early in the day and higher call arrival rates in the residential subregions later in the day.

We implemented ADP on top of the BartSim simulation engine developed by Henderson and Mason (2004). In all of our computational experiments, we use a discount factor of $\alpha = 0.85$ per day. The simulation horizon is 14 days. We use 30 sample paths in Step 3 of the approximate policy iteration algorithm. After some experimentation, we found that $\kappa = 0.1$ in the fourth and sixth basis functions gave the best results. We use the static redeployment policy as a benchmark. In particular, the static redeployment policy preassigns a base to each ambulance and redeploys each ambulance back to its preassigned base whenever it becomes free after serving a call. We found a good static redeployment policy by simulating the performance of the system under a large number of possible base assignments and choosing the base assignment that gave the best performance.
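The sinusoidal arrival model above is easy to reproduce. A minimal sketch, with `gamma` and `delta` standing in for the historically estimated parameter vectors:

```python
import math

def arrival_rate(l, t, C, gamma, delta):
    """C_l(t) = C * (gamma_l + delta_l * sin(2*pi*t/24)), with t in hours.

    When sum(gamma) == 1 and sum(delta) == 0, the rates over all
    subregions sum to the constant overall rate C at every time t.
    """
    return C * (gamma[l] + delta[l] * math.sin(2.0 * math.pi * t / 24.0))
```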
We implemented ADP on top of the BartSim simulation engine developed by and Mason (2004). In all of our computational experiments, we use a discount factor of α = 0.85 per day. The simulation horizon is 14 days. We use 30 sample paths in Step 3 of the approximate policy iteration algorithm. After some experimentation, we found that κ = 0.1 in the fourth and sixth basis functions gave the best results. We use a static redeployment policy as a benchmark. In particular, the static redeployment policy preassigns a base to each ambulance and redeploys each ambulance back to its preassigned base whenever it becomes free after serving a call. We found a good static redeployment policy by simulating the performance of the system under a large number of possible base assignments and choosing the base assignment that gave the best performance.

5.2. Baseline Performance

In our first set of computational experiments, we test the performance of ADP under the assumption that the only time we can redeploy an ambulance is when it becomes free after serving a call. We do not redeploy ambulances at other times. Following the discussion in Section 2.2, this is equivalent to setting R(s) = ∅, except when e(s) corresponds to an ambulance becoming free after serving a call, in which case R(s) is the singleton set corresponding to the ambulance just becoming free. The motivation for using this redeployment strategy is that it minimizes the disturbance to the crews and never redeploys an ambulance that is located at an ambulance base.

Figure 1 shows the performance of ADP. The horizontal axis gives the iteration number in the approximate policy iteration algorithm, and the vertical axis gives the percentage of calls not reached within the threshold response time. The plot gives the percentage of calls missed by the greedy policy induced by the value function approximation at a particular iteration. The dashed horizontal line shows the best performance attained by ADP, while the thin horizontal line shows the performance of the best static policy. ADP attains a good policy within two iterations and then bounces around this policy. The best ADP policy misses 21.9% (± 0.2%) of the calls, whereas the best static redeployment policy misses 25.3% (± 0.2%) of the calls. (These figures are undiscounted numbers of missed calls.) The ADP approach does not converge to a single policy, but this lack of convergence is not a concern from a practical viewpoint, since we can simply choose the best policy obtained during the course of the approximate policy iteration algorithm and implement this policy in practice.

[Figure 1: Performance of ADP and the benchmark policy.]

To emphasize the importance of choosing the basis functions carefully, Figure 2 shows the performance of ADP when we use an arrival rate λ = C_l(τ(s)) in the fourth and sixth basis functions instead of the quantity in (4). The best policy obtained through ADP then misses 23.6% (± 0.2%) of the calls, in contrast with the best policy in Figure 1 missing 21.9% (± 0.2%) of the calls. Furthermore, the fluctuation in the performance of the policies in Figure 2 is much more drastic when compared with Figure 1. The results indicate that choosing the basis functions carefully makes a significant difference in the performance of ADP.

[Figure 2: Performance of ADP with poorly chosen basis functions.]
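The outer loop just described can be sketched as follows. This is a minimal skeleton, assuming hypothetical simulate_policy and fit_parameters hooks that stand in for the paper's simulator and parameter-fitting step; it is not the authors' implementation.

```python
def approximate_policy_iteration(r, simulate_policy, fit_parameters,
                                 n_iterations=25, n_sample_paths=30):
    """simulate_policy(r) returns (total_cost, trajectory) for one sample path
    of the greedy policy induced by J(., r); fit_parameters refits r to the
    simulated cost trajectories (both hooks are assumptions)."""
    best_r, best_cost = r, float("inf")
    for _ in range(n_iterations):
        # Step 3 of the algorithm: simulate cost trajectories of the greedy
        # policy induced by the current value function approximation.
        samples = [simulate_policy(r) for _ in range(n_sample_paths)]
        avg_cost = sum(cost for cost, _ in samples) / n_sample_paths
        # The iterates bounce around rather than converge, so retain the best
        # policy observed during the course of the algorithm.
        if avg_cost < best_cost:
            best_r, best_cost = r, avg_cost
        r = fit_parameters([traj for _, traj in samples])
    return best_r, best_cost
```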
5.3. Comparison with Random Search

For a fixed set of basis functions {ψ_p(·) : p = 1, ..., P}, a set of values r = {r_p : p = 1, ..., P} for the tuneable parameters characterizes a value function approximation J(·, r), and this value function approximation induces a greedy policy. Therefore, a brute-force approach for finding a good set of values for the tuneable parameters is to carry out a random search over an appropriate subset of R^P and use simulation to test the performance of the greedy policies induced by the different sets of values. To implement this idea, we first use ADP to obtain a good set of values for the tuneable parameters. Letting {r̂_p : p = 1, ..., P} be this set of values, we sample r = {r_p : p = 1, ..., P} uniformly over the box

    [\hat{r}_1 - \tfrac{1}{2}\hat{r}_1, \, \hat{r}_1 + \tfrac{1}{2}\hat{r}_1] \times \cdots \times [\hat{r}_P - \tfrac{1}{2}\hat{r}_P, \, \hat{r}_P + \tfrac{1}{2}\hat{r}_P]

and use simulation to test the performance of the greedy policy induced by the value function approximation J(·, r). We sampled 1,100 sets of values for the tuneable parameters, and this provides 1,100 value function approximations. Figure 3 gives a histogram of the percentage of the calls missed by the greedy policies induced by these 1,100 value function approximations. The vertical lines correspond to the percentage of calls missed by the best policy obtained by ADP and by the best static redeployment policy. The figure indicates that very few (3%) of the sampled sets of values for the tuneable parameters provide better performance than the best policy obtained by ADP. On the other hand, approximately 28% of the samples provide better performance than the best static redeployment policy.

[Figure 3: Performance of the 1,100 greedy policies obtained through random search.]

The random search procedure we use is admittedly rudimentary, and one can use more sophisticated techniques to focus on the more promising areas of the search space. Nevertheless, our results indicate that when one looks at the broad landscape of possible values for the tuneable parameters, ADP is quite effective in identifying good parameters. Moreover, the computation time required by the random search procedure is on the order of several days, whereas the computation time required by ADP is a few hours.
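A minimal sketch of this random-search benchmark, assuming a hypothetical evaluate(r) hook that simulates the greedy policy induced by J(·, r) and returns its missed-call percentage:

```python
import random

def random_search(r_hat, evaluate, n_samples=1100):
    """Sample parameter vectors uniformly over the box around the ADP solution
    r_hat (each coordinate within +/- 50% of r_hat_p) and keep the sample
    whose greedy policy misses the fewest calls."""
    best_r, best_miss = None, float("inf")
    for _ in range(n_samples):
        # abs() keeps each interval well ordered even for a negative
        # fitted parameter (an assumption on our part).
        r = [random.uniform(rp - abs(rp) / 2, rp + abs(rp) / 2) for rp in r_hat]
        miss = evaluate(r)
        if miss < best_miss:
            best_r, best_miss = r, miss
    return best_r, best_miss
```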
5.4. Making Additional Redeployments

The computational experiments in Sections 5.2 and 5.3 allow redeployments only when an ambulance becomes free after serving a call. We now explore the possibility of improving performance by allowing additional ambulance redeployments. We define an additional event type, "consider redeployment", and schedule events of this type with a certain frequency that is detailed below. Whenever an event of this type is triggered, we consider redeploying the ambulances that are either at a base or returning to a base, so that R(s) can contain multiple ambulances at such times. The set R(s) continues to be a singleton when e(s) corresponds to an ambulance becoming free after serving a call, and at all other events, R(s) = ∅.

We use two methods to vary the redeployment frequency. In the first method, we equally space the consider-redeployment events to obtain frequencies between 0 and 15 per hour. In the second method, the frequency of consider-redeployment events is fixed at 24 per hour, but we make a redeployment only when the estimated benefit from making the redeployment exceeds the estimated benefit from not making the redeployment by a significant margin. More precisely, letting ε ∈ [0, 1) be a tolerance margin and using 0̄ to denote the |R(s)| × |B|-dimensional matrix of zeros corresponding to the decision of not making a redeployment, if we have

    \min_{u \in U(s)} E\big[ g(s, u, f(s, u, X(s, u))) + \alpha^{\tau(f(s, u, X(s, u))) - \tau(s)} \, J(f(s, u, X(s, u)), r) \big]
        \le (1 - \varepsilon) \, E\big[ g(s, \bar{0}, f(s, \bar{0}, X(s, \bar{0}))) + \alpha^{\tau(f(s, \bar{0}, X(s, \bar{0}))) - \tau(s)} \, J(f(s, \bar{0}, X(s, \bar{0})), r) \big],

then we make the redeployment decision indicated by the optimal solution to the problem on the left-hand side. Otherwise, we do not make a redeployment. Larger values of ε decrease the frequency of redeployments. We vary ε between 0.005 and 0.1.
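A minimal sketch of this tolerance test, assuming a hypothetical expected_cost(state, u) hook that estimates the bracketed expectation for decision u by simulation:

```python
def consider_redeployment(state, actions, no_redeploy, expected_cost, eps=0.05):
    """Redeploy only when the best estimated discounted cost beats the
    do-nothing cost (the decision matrix of zeros) by a relative margin eps."""
    best_u = min(actions, key=lambda u: expected_cost(state, u))
    if expected_cost(state, best_u) <= (1 - eps) * expected_cost(state, no_redeploy):
        return best_u      # estimated benefit is significant: redeploy
    return no_redeploy     # otherwise, leave the ambulances alone
```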
Figure 4 shows the performance improvement obtained by the additional redeployments. The horizontal axis gives the frequency of the redeployments, measured as the number of redeployments per ambulance per day, and the vertical axis gives the percentage of missed calls. The dashed (solid) data line corresponds to the first (second) method of varying the redeployment frequency. From Figure 1, we miss 21.9% of the calls without making any additional redeployments. By making about 10 additional redeployments per ambulance per day, we can decrease the percentage of missed calls to 20.2%. Beyond this range, we reach a plateau, and additional redeployments do not provide much improvement. Another important observation is that the second method tends to provide significantly better performance improvements with the same frequency of additional redeployments. For example, the second method reduces the percentage of missed calls to 20.2% with 10 additional redeployments per ambulance per day, whereas the first method needs 15 additional redeployments per ambulance per day to reach the same level. Therefore, it appears that making redeployments only when the value function approximation signals a significant benefit is helpful in avoiding pointless redeployments.

[Figure 4: Performance of ADP as a function of the frequency of the additional redeployments (regularly scheduled vs. selective relocations).]

6. Computational Results on a Second Metropolitan Area

In this section, we present the performance of ADP on the EMS system operating in a second metropolitan area. This EMS system is also studied in s (2007). We cannot disclose the name of the metropolitan area due to confidentiality agreements.

6.1. Experimental Setup

The population of the city in question is more than 5 times that of Edmonton, and its size is around 180 × 100 km². The EMS system includes up to 97 ambulances operating during peak times, 88 bases and 22 hospitals. The turn-out times, call-scene times and hospital-transfer times are comparable to those in Edmonton. We were able to use a detailed model for determining the hospital to which a patient needs to be transported. In particular, the destination hospital depends on the location of the call. Calls originating at a given location are transported to one of a small set of hospitals, usually no more than 2 or 3 of the 22 hospitals in the system. The corresponding probabilities are inferred from historical data. We assume that all ambulances are of the same type, and a call that is not served within 8 minutes is interpreted as missed. The road network that we use in our computational experiments models the actual network at the avenue level; there are 4,955 nodes and 11,876 arcs in this road network.

The call arrival model that we used is quite realistic. The data were collected from one year of operations of the EMS system and consist of aggregated counts of calls for each hour of the week during a whole year, for each of 100 × 100 geographic zones. Due to the irregular and non-convex shape of the metropolitan area, roughly 80% of these zones had zero total call counts and played no role in the dynamics. Of the remaining 20%, a significant percentage had very low hourly counts of at most 5 calls. Thus, it was necessary to apply a smoothing procedure to the lowest-intensity zones so as to reduce the sampling noise. To do this, we first classified the zones into a few groups according to their average intensity over the week. Then, for the lowest-intensity groups, we computed a total intensity for each hour and distributed this total intensity uniformly among the zones in the group. In this way, we obtained an intensity model that combines a uniform low-intensity background with actual (accurate) counts in the highest-intensity zones. The average call arrival rate was 570 calls per day and fluctuated over the days of the week from a low of around 300 calls per day to a high of around 750 calls per day.
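The smoothing step just described can be sketched as follows; the dictionary-based zone and group representations are assumptions made for illustration, not the authors' data format.

```python
from collections import defaultdict

def smooth_low_intensity(zone_counts, group_of, low_groups):
    """zone_counts maps zone -> list of hourly counts; group_of maps
    zone -> intensity-group label.  Within each low-intensity group, pool the
    counts of all member zones hour by hour and spread the total uniformly
    across the group, leaving high-intensity zones untouched."""
    members = defaultdict(list)
    for zone, g in group_of.items():
        members[g].append(zone)
    smoothed = {z: counts[:] for z, counts in zone_counts.items()}
    for g in low_groups:
        zones = members[g]
        n_hours = len(zone_counts[zones[0]])
        for h in range(n_hours):
            total = sum(zone_counts[z][h] for z in zones)
            for z in zones:
                smoothed[z][h] = total / len(zones)  # uniform background
    return smoothed
```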
We used the actual base assignments and ambulance shifts as a benchmark. In the EMS system, a maximum of 97 ambulances operate at noon. Out of these, 66 ambulances work all day in two 12-hour shifts, and the remaining 31 have single shifts typically ranging from 10 to 12 hours per day. The resulting shift schedule provides 1,708 ambulance-hours per day. Ambulances are redeployed to their preassigned bases whenever they become free after serving a call. We refer the reader to s (2007) for details on the ambulance shifts.

6.2. Baseline Performance

Figure 5 shows the performance of ADP. The interpretations of the axes in this figure are the same as those in Figures 1 and 2. The solid horizontal line plots the percentage of calls missed by the benchmark policy, and the dashed horizontal line plots the best performance obtained using ADP. The solid data line plots the performance of ADP when we use κ = 2 in the fourth basis function, whereas the dashed data line plots the performance when we use κ = 1. The value κ = 2 was found by experimentation to give the best policy and the least fluctuation across iterations among all the values tried in the interval (0, 2]. For both values of κ, the best ADP policies are attained in one iteration, and these policies perform very similarly. The improvements in the number of reached calls are roughly 3.0% and 4.0% in the case of κ = 1 and κ = 2, respectively. We get significantly more stable performance with κ = 2 than with κ = 1. This indicates that it is important to carefully tune the parameter κ through some initial experimentation.

[Figure 5: Performance of ADP and the benchmark policy (κ = 1 and κ = 2).]

6.3. Effect of Turn-Out Time

Recall that if the ambulance crew is stationed at a base when it is notified of a call, then it takes 45 seconds to get ready; that is, the turn-out time is 45 seconds. On the other hand, an ambulance crew that is already on the road does not incur the turn-out time. A potential argument against ambulance redeployment is that any gains are simply due to ambulance crews being on the road more often, and therefore incurring fewer turn-out time delays. To check the validity of this argument, we ran ADP under the assumption that the turn-out time is zero. In other words, an ambulance crew does not take any time to get ready, even if it is located at a base when it is notified of a call. Table 1 shows the results for two different call arrival regimes (633 calls per day and 846 calls per day). The third and fourth rows show the absolute and relative improvement of ADP over the benchmark strategy. The results indicate that ADP provides significant improvement over the benchmark strategy in all cases. At first sight, this improvement seems slightly smaller for the cases without turn-out time, but the relative improvement is roughly the same in all cases.

                                633 calls per day            846 calls per day
                              Turn-out     Turn-out        Turn-out     Turn-out
                              = 45 secs.   = 0 secs.       = 45 secs.   = 0 secs.
  % of calls missed
    by benchmark                27.5         23.0            35.5         30.7
  % of calls missed
    by ADP                      23.9         19.3            30.5         26.3
  Improvement                    3.6          3.2             4.9          4.5
  Rel. improvement            3.6/27.5     3.2/23.0        4.9/35.5     4.5/30.7
                               = 0.13       = 0.139         = 0.139      = 0.145

Table 1: Effect of turn-out time on the performance gap between ADP and the benchmark policy.
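As a small illustration of how the turn-out time enters the response-time calculation, the check below flags a missed call; this is a sketch of our reading of the model, not code from the paper. The 8-minute threshold is the one stated for this metropolitan area, and setting turn_out_secs=0 reproduces the zero-turn-out experiment.

```python
def is_missed(crew_at_base, travel_time_secs, turn_out_secs=45, threshold_secs=8 * 60):
    """A crew notified at a base incurs the turn-out time before departing;
    a crew already on the road departs immediately."""
    response_secs = (turn_out_secs if crew_at_base else 0) + travel_time_secs
    return response_secs > threshold_secs
```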
6.4. Varying Call Arrival Rates and Fleet Sizes

We now explore the effect of different call arrival rates and fleet sizes on the performance of ADP. We first carry out a number of runs under the experimental setup described in Section 6.1, but vary the call arrival rates by uniformly scaling them by a constant factor. Recall that the average call arrival rate in the experimental setup in Section 6.1 is around 570 calls per day. Figure 6 shows the percentage of calls missed by the best policy obtained by ADP and by the benchmark strategy as a function of the average call arrival rate. The solid data line corresponds to ADP, whereas the dashed data line corresponds to the benchmark strategy. ADP provides substantial improvements over all call arrival regimes. The improvements provided by ADP are more significant when the call arrival rate is relatively high. It appears that when the call arrival rate is high, there is more room for improvement by redeploying ambulances carefully, and ADP effectively takes advantage of this greater room for improvement.

[Figure 6: Performance of ADP and the benchmark policy for different call arrival rates.]

In a second set of computational experiments, we hold the call arrival rate constant and vary the number of ambulances in the fleet. Figure 7 shows the performance of ADP and the benchmark strategy as a function of the fleet size. Since we do not have access to the base assignments used by the benchmark strategy for different fleet sizes, we modify the base assignments described in Section 6.1 heuristically. In particular, to produce a base assignment with fewer ambulances, we choose the ambulances assigned to bases serving low demand and take them off shift. Similarly, to produce a base assignment with extra ambulances, we add ambulances to the bases with the highest demand. The results indicate that ADP performs consistently better than the benchmark policy. An important observation from Figure 7 is that if our goal is to keep the percentage of missed calls below a given threshold, say 28%, then ADP allows us to reach this goal with roughly 5 or 6 fewer ambulances than the benchmark policy. This translates into significant cost savings in an EMS system. It is also interesting that the performance of the benchmark policy does not improve significantly beyond 97 or 98 ambulances in the fleet, whereas ADP continues to provide progressively lower missed-call rates. This partly reflects the quality of our heuristic benchmark strategy, but it also indicates that a redeployment strategy can help mitigate poor static allocations and make effective use of extra capacity.

[Figure 7: Performance of ADP and the benchmark policy for different fleet sizes.]

7. Conclusions

In this paper, we formulated the ambulance redeployment problem as a dynamic program and used an approximate version of the policy iteration algorithm to deal with the high-dimensional and uncountable state space. Extensive experiments on two problem scenarios showed that ADP can provide high-quality redeployment policies. The basis functions that we constructed open up the possibility of using other approaches, such as temporal-difference learning and the linear-programming approach, to tune the parameters {r_p : p = 1, ..., P}. Indeed, we are currently exploring the linear-programming approach. Other future research will incorporate additional degrees of realism into our model. We plan to include stochastic travel times, multiple call priorities, other cost functions and more realistic ambulance dynamics that involve multiple ambulances serving certain calls.
Incorporating these complexities may require constructing additional basis functions.

Acknowledgments

We thank Currell, Armann Ingolfsson and Mason for their advice and support, and for providing the data sets used in our computational experiments. This work was partially supported by National Science Foundation Grants DMI 0400287 and DMI 0422133.

References

Adelman, D. 2004. A price-directed approach to stochastic inventory routing. Operations Research 52 499–514.

Adelman, D. 2007. Dynamic bid-prices in revenue management. Operations Research 55 647–661.

Adelman, D., A. J. Mersereau. 2007. Relaxations of weakly coupled stochastic dynamic programs. Operations Research.

Andersson, T. 2005. Decision support tools for dynamic fleet management. Ph.D. thesis, Department of Science and Technology, Linköpings Universitet, Norrköping, Sweden.

Andersson, T., P. Värbrand. 2007. Decision support tools for ambulance dispatch and relocation. Journal of the Operational Research Society 58 195–201.

Berman, O. 1981a. Dynamic repositioning of indistinguishable service units on transportation networks. Transportation Science 15.

Berman, O. 1981b. Repositioning of distinguishable urban service units on networks. Computers and Operations Research 8 105–118.

Berman, O. 1981c. Repositioning of two distinguishable service vehicles on networks. IEEE Transactions on Systems, Man, and Cybernetics SMC-11.

Bertsekas, D. P., S. E. Shreve. 1978. Stochastic Optimal Control: The Discrete Time Case. Academic Press, New York.

Bertsekas, D. P., J. N. Tsitsiklis. 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.

Brotcorne, L., G. Laporte, F. Semet. 2003. Ambulance location and relocation models. European Journal of Operational Research 147 451–463.

de Farias, D. P., B. Van Roy. 2003. The linear programming approach to approximate dynamic programming. Operations Research 51 850–865.

Farias, V. F., B. Van Roy. 2006. Tetris: A study of randomized constraint sampling. G. Calafiore, F. Dabbene, eds., Probabilistic and Randomized Methods for Design Under Uncertainty. Springer-Verlag.

Farias, V. F., B. Van Roy. 2007. An approximate dynamic programming approach to network revenue management. Tech. rep., Stanford University, Department of Electrical Engineering.

Gendreau, M., G. Laporte, F. Semet. 2001. A dynamic model and parallel tabu search heuristic for real time ambulance relocation. Parallel Computing 27 1641–1653.

Gendreau, M., G. Laporte, F. Semet. 2006. The maximal expected coverage relocation problem for emergency vehicles. Journal of the Operational Research Society 57 22–28.

Goldberg, J. B. 2007. Personal communication.

, S. G., A. J. Mason. 2004. Ambulance service planning: Simulation and data visualisation. M. L. Brandeau, F. Sainfort, W. P. Pierskalla, eds., Operations Research and Health Care: A Handbook of Methods and Applications. Handbooks in Operations Research and Management Science, Kluwer Academic, 77–102.

Ingolfsson, A. 2006. The impact of ambulance system status management. Presentation at the 2006 INFORMS Conference.

Ingolfsson, A., E. Erkut, S. Budge. 2003. Simulation of single start station for Edmonton EMS. Journal of the Operational Research Society 54 736–746.

Kolesar, P., W. E. . 1974. An algorithm for the dynamic relocation of fire companies. Operations Research 22 249–274.
Nair, R., E. -Hooks. 2006. A case study of ambulance location and relocation. Presentation at the 2006 INFORMS Conference.

, Warren B. 2007. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley & Sons, Hoboken, NJ.

Restrepo, M., S. G. , H. Topaloglu. 2007. Erlang loss models for the static deployment of ambulances. Submitted.

s, D. P. 2007. Optimised ambulance redeployment strategies. Master's thesis, Department of Engineering Science, University of Auckland, Auckland, New Zealand.

Schweitzer, P., A. Seidmann. 1985. Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications 110 568–582.

Si, Jennie, G. Barto, Warren B. , Wunsch II, eds. 2004. Handbook of Learning and Approximate Dynamic Programming. Wiley-Interscience, Piscataway, NJ.

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3 9–44.

Topaloglu, H., W. B. . 2006. Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems. INFORMS Journal on Computing 18 31–42.

Tsitsiklis, J. N. 1994. Asynchronous stochastic approximation and Q-learning. Machine Learning 16 185–202.

Tsitsiklis, J. N., B. Van Roy. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42 674–690.

Tsitsiklis, J. N., B. Van Roy. 2001. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12 694–703.

Van Roy, , Dimitri P. Bertsekas, Yuchun Lee, N. Tsitsiklis. 1997. A neuro-dynamic programming approach to retailer inventory management. Proceedings of the IEEE Conference on Decision and Control.

Watkins, C. J. C. H., P. Dayan. 1992. Q-learning. Machine Learning 8 279–292.

Yan, X., P. Diaconis, P. Rusmevichientong, B. Van Roy. 2005. Solitaire: Man versus machine. Advances in Neural Information Processing Systems 17.