SSM makeover??? Ambulance Placement Research

June 20, 2008

For those of you that might be interested in the new study being conduted by

Cornell University Dr. Shane here is his acedemic science paper. Some

of you may night be interested in it, as it contains a huge amount of formulas,

but you might enjoy the result section. This was sent to me by Dr. and

it shows the complexity of unit placement.

Just FYI

Joby Berkley - Owner

Jobo's Hamburger Grill

6548 Lake Worth Blvd., # 200

Lake Worth, Texas 76135

www.joboshamburgergrill.com

Work

Fax

Cell

joby@...

June 20, 2008

Approximate Dynamic Programming for Ambulance RedeploymentJournal: INFORMS

Journal on ComputingManuscript ID: draft

Manuscript Type: Original Article

Date Submitted by the

Author:

n/a

Complete List of Authors: Restrepo, Mateo; Cornell University, C.A.M (Center for

Applied

Math); Cornell University, School of Operations Research and

Information Engineering

Topaloglu, Huseyin; Cornell University, School of Operations

Research and Information Engineering

, Shane; Cornell University, School of Operations

Research and Information Engineering

Keywords:

067 Dynamic programming-optimal control : Applications, 272

Health care, Ambulance service, 762 Simulation, Applications, 314

Information systems : Decision support systemshttp://joc.pubs.informs.orgINFORMS

Journal on Computing: For Review OnlyApproximate Dynamic Programming for

Ambulance

RedeploymentMateo RestrepoCenter for Applied Mathematics

Cornell University, Ithaca, NY 14853, USA

mr324@... G. , Huseyin TopalogluSchool of Operations

Research and Information Engineering

Cornell University, Ithaca, NY 14853, USA{

April 30, 2008sgh9,ht88}@... present an approximate dynamic

programming approach for making ambulance

redeployment decisions in an emergency medical service system. The primary

decision

is where we should redeploy idle ambulances so as to maximize the number of

calls reached

within a given delay threshold. We begin by formulating this problem as a

dynamic

program. To deal with the high-dimensional and uncountable state space in the

dynamic

program, we construct approximations to the value functions that are

parameterized by a

small number of parameters. We tune the parameters of the value function

approximations

using simulated cost trajectories of the system. Computational experiments

demonstrate

the performance of the approach on emergency medical service systems in two

metropolitan

areas. We report practically significant improvements in performance relative to

benchmark

static policies.Page 1 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

601. IntroductionRising costs of medical equipment, increasing call volumes and

worsening traffic conditions are

putting emergency medical service (EMS) managers under pressure to meet

performance goals set

by regulators or contracts. Ambulance redeployment is one possible approach that

can potentially

help EMS managers. Ambulance redeployment, also known as relocation, move up,

system-status

management or dynamic repositioning, refers to any strategy by which a

dispatcher repositions

idle ambulances to compensate for others that are busy, and hence, unavailable.

The increasing

availability of geographic information systems and the increasing affordability

of computing power

have finally created ideal conditions for bringing real-time ambulance

redeployment approaches

to fruitful implementation.

In this paper, we present an approximate dynamic programming (ADP) approach for

making real-time ambulance redeployment decisions. We begin by formulating the

ambulance

redeployment problem as a dynamic program. This dynamic program involves a

high-dimensional

and uncountable state space and we address this difficulty by constructing

approximations to the

value functions that are parameterized by a small number of parameters. We tune

the parameters

of the value function approximations through an iterative and simulation-based

method. Each

iteration of the simulation-based method consists of two steps. In the first

step, we simulate the

trajectory of the greedy policy induced by the current value function

approximation and collect

cost trajectories of the system. In the second step, we tune the parameters of

the value function

approximation by solving a regression problem that fits the value function

approximation to

the collected cost trajectories. This yields a new set of parameters that

characterize new value

function approximations, and so, we can go back and repeat the same two steps.

In this respect,

the idea we use closely resembles the classical policy iteration algorithm in

the Markov decision

process literature. In particular, the first (second) step is analogous to the

policy-evaluation

(policy-improvement) step of the policy iteration algorithm.

There are two streams of literature that are related to our work. The first one

is the literature

on ADP. A generic approach for ADP involves using value function approximations

of the formP

p

fixed basis functions; see Bertsekas and Tsitsiklis (1996), (2007). There

are a number

of methods to tune the parameters

p

approximation to the value function. For example, temporal difference learning

and Q-learning

use stochastic approximation ideas in conjunction with simulated trajectories of

the system

to iteratively tune the parameters; see Sutton (1988), Watkins and Dayan (1992),

Tsitsiklis

(1994), Bertsekas and Tsitsiklis (1996), Tsitsiklis and Van Roy (1997), Si,

Barto, , and

Wunsch II (2004). On the other hand, the linear-programming approach for ADP

finds a good

set of values for the parameters by solving a large linear program whose

decision variables are

2P=1 rp Ãp(Â·), where {rp : p = 1, . . . , P} are tuneable parameters and

{Ãp(Â·) : p = 1, . . . , P} are{rp : p = 1, . . . , P} so that PP=1 rp Ãp(Â·)

yields a goodPage 2 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60{

and Mersereau (2007). Both classes of approaches are aimed at tuning the

parameters rp : p = 1, . . . , P}; see Schweitzer and Seidmann (1985), de Farias

and Van Roy (2003), Adelman{rp

p

choice of the basis functions

an art form, requiring substantial knowledge of the problem structure.

Applications of ADP

include inventory control (Van Roy, Bertsekas, Lee, and Tsitsiklis, 1997),

inventory routing

(Adelman, 2004), option pricing (Tsitsiklis and Van Roy, 2001), game playing

(Yan, Diaconis,

Rusmevichientong, and Van Roy, 2005; Farias and Van Roy, 2006), dynamic fleet

management

(Topaloglu and , 2006), and network revenue management (Adelman, 2007;

Farias and

Van Roy, 2007).

The second stream of literature that is related to our work is the literature on

ambulance

redeployment. One class of models involves solving integer programs in real time

whenever an

ambulance redeployment decision needs to be made; see Kolesar and (1974),

Gendreau,

Laporte, and Semet (2001), Brotcorne, Laporte, and Semet (2003), Gendreau,

Laporte, and

Semet (2006), Nair and -Hooks (2006), s (2007). The objective

function in

these integer programs generally involves a combination of backup coverage for

future calls

and relocation cost of ambulances. These models are usually computationally

intensive, since

they require solving an optimization problem every time a decision is made. As a

result, a

parallel computing environment is often (but not always) used to implement a

working real-time

system. A second class of models is based on solving integer programs in a

preparatory phase.

This approach provides a lookup table describing, for each number of available

ambulances,

where those ambulances should be deployed. Dispatchers attempt to dispatch so as

to keep

the ambulance configuration close to the one suggested by the lookup table; see

Ingolfsson

(2006), Goldberg (2007). A third class of models attempts to capture the

randomness in

the system explicitly, either through a dynamic programming formulation or

through heuristic

approaches. Berman (1981a), Berman (1981b) and Berman (1981c) represent the

first papers that

provide a dynamic programming approach for the ambulance redeployment problem.

However,

these papers follow an exact dynamic programming formulation and, as is often

the case, this

formulation is tractable only in oversimplified versions of the problem with few

vehicles and

small transportation networks. Andersson (2005) and Andersson and Vaerband

(2007) make the

ambulance redeployment decision by using a â€œpreparednessâ€ function that

essentially measures

the capability of a certain ambulance configuration to cover future calls. The

preparedness

function is similar in spirit to the value function in a dynamic program,

measuring the impact

of current decisions on the future evolution of the system. However, the way the

preparedness

function is constructed is heuristic in nature.

When compared with the three classes of models described above, our approach

provides

a number of advantages. In contrast to the models that are based on integer

programs, our

3= 1, . . . , P} so that PP=1 rp Ãp(Â·) yields a good approximation to the

value function. The{Ãp(Â·) : p = 1, . . . , P}, on the other hand, is regarded

as more ofPage 3 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing:

For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60approach captures the random evolution of the system over time since it is

based on a dynamic

programming formulation of the ambulance redeployment problem. Furthermore, the

decisions

made by our approach in real time can be computed very quickly as this requires

solving a simple

optimization problem that minimizes the sum of the immediate cost and the value

function

approximation. In lookup table approaches, there may be more than one way to

redeploy the

ambulances so that the ambulance configuration over the transportation network

matches the

configuration suggested by the lookup table. Therefore, table lookup approaches

still leave

some aspects of dispatch decisions to subjective interpretation by dispatchers.

Our approach,

on the other hand, can fully automate the decision-making process. In

traditional dynamicprogramming

approaches, one is usually limited to very small problem instances, whereas ADP

can be used on problem instances with realistic dimensions. Our approach allows

working with a

variety of objective functions, such as the number of calls that are not served

within a threshold

time standard or the total response time for the calls. Furthermore, our

approach allows the

possibility of constraining the frequency and destinations of ambulance

relocations. This is

important since a relocation scheme should balance improvements in service

levels with the

additional demands imposed on ambulance crews.

In summary, we make the following research contributions. 1) We develop a

tractable ADP

approach for the ambulance redeployment problem. Our approach employs value

function

approximations of the form

p

tune the parameters

formulation of the problem, our approach is able to capture the random evolution

of the system

over time. 2) We develop a set of basis functions

function approximations for the ambulance redeployment problem. This opens up

the possibility

of using other ADP approaches, such as temporal difference learning and the

linear-programming

approach. 3) We provide computational experiments on EMS systems in two

metropolitan areas.

Our results indicate that ADP has the potential to obtain high-quality

redeployment policies in

real systems. They also show that our approach compares favorably with benchmark

policies

that are similar to those used in practice.

The remainder of this paper is organized as follows. In Section 2, we present a

dynamic

programing formulation for the ambulance redeployment problem. In Section 3, we

describe our

ADP approach. In Section 4, we discuss the basis functions that we use in our

value function

approximations. In Sections 5 and 6, we report computational results for two

metropolitan areas.

We conclude in Section 7 and point out some directions for further research.

4PP=1 rp Ãp(Â·) and uses sampled cost trajectories of the system to{rp : p = 1,

.. . . , P}. Since it is based on the dynamic programming{Ãp(Â·) : p = 1, . . .

, P} that yield good valuePage 4 of 26http://joc.pubs.informs.orgINFORMS Journal

on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

602. Ambulance Redeployment as a Markov Decision ProcessIn this section, we

present a dynamic programming formulation of the ambulance redeployment

problem. As will shortly be clear, our model involves an uncountable state

space. For an

excellent account of the basic terminology, notation and fundamental results

regarding dynamic

programming in uncountable states spaces, we refer the reader to Bertsekas and

Shreve (1978).2.1. State SpaceTwo main components in the state of an EMS system

at a given time are the vectors

(

and each waiting call in the system. To simplify the presentation, we assume

that we do not

keep more than

This is not really a restriction from a practical perspective since

of each ambulance is given by a tuple A =a1, . . . , aN) and C = (c1, . . . ,

cM) containing information about the state of each ambulanceM waiting calls in

the system, possibly by diverting them to another EMS system.M can be quite

large. The statea = (Â¾a, oa, da, ta), where Â¾a is the status of the

ambulance,o

time of the ambulance movement. To serve a call, an ambulance first moves to the

call scene

and provides service at the scene for a certain amount of time. Following this,

the ambulance

possibly transports the patient to a hospital, and after spending some time at

the hospital, the

ambulance becomes free to serve another call. Therefore, the status of an

ambulance

â€œoff shift,â€ â€œidle at base,â€ â€œgoing to call,â€ â€œat call scene,â€

â€œgoing to hospital,â€ â€œat hospitalâ€ or

â€œreturning to base.â€ If the ambulance is stationary at location

ambulance is in motion, then a and da are respectively origin and destination

locations of the ambulance and ta is the startingÂ¾a can beo, then we have oa =

da = o. If theta corresponds to the starting time of this movement. Otherwise,t

status of the ambulance is â€œat hospital,â€ then

arrived at the hospital. This time is kept in the state of an ambulance to be

able to give a Markov

formulation for the non-Markovian elements in the system, such as

nonexponentially distributed

service times and deterministic travel times. Similarly, for a call, we have

where

arrived into the system and

â€œassigned to an ambulanceâ€ or â€œqueued for service.â€

We model the dynamics of the system as an event-driven process. Events are

triggered

by changes in the status of the ambulances or by call arrivals. Therefore, the

possible event

types in the system are â€œcall arrives,â€ â€œambulance goes off shift,â€

â€œambulance comes on shift,â€

â€œambulance departs for call scene,â€ â€œambulance arrives at call scene,â€

â€œambulance leaves call

scene,â€ â€œambulance arrives at hospital,â€ â€œambulance leaves hospitalâ€

and â€œambulance arrives

at base.â€ We assume that we can make decisions (for any ambulance, and not

just the one

involved in the event) only at the times of these events. This comes at the cost

of some loss of

5a corresponds to the starting time of the current phase in the service cycle.

For example, if theta corresponds to the time at which the ambulancec = (Â¾c,

oc, tc, pc),Â¾c is the status of the call, oc is the location of the call, tc is

the time at which the callpc is the priority level of the call. The status of a

call Â¾c can bePage 5 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60optimality, since making decisions between the times of the events may improve

performance.

From a practical perspective, however, events are frequent enough to allow ample

decision-making

opportunities.

By restricting our attention to the times of events, we visualize the system as

jumping from

one event time to another. Therefore, we can use the tuple

of the system, where

and

the state trajectory of the system can be written as

the system just after the

Throughout the paper, we use

when the state of the system is

the tuple s = (Â¿, e,A,C) to represent the stateÂ¿ corresponds to the current

time, e corresponds to the current event type,A and C respectively correspond to

the state of the ambulances and the calls. In this case,{sk : k = 1, 2, . . .},

where sk is the state ofkth event occurs. Note that the time is rolled into our

state variable.Â¿ (s) and e(s) to respectively denote the time and the event

types. In other words, Â¿ (s) and e(s) are the first two components ofs = (Â¿,

e,A,C).2.2. ControlsWe assume that calls are served in decreasing order of

priority, and within a given priority level

are served in first-in-first-out order. We further assume that the closest

available ambulance is

dispatched to a call. This is not an exact representation of reality, but is

close enough for our

purposes: We will show that ADP yields an effective redeployment strategy under

this dispatch

policy, and we expect that it will continue to do so for more realistic dispatch

policies.

Let

the system is

is

reallocation decisions in the binary matrices

feasible decisions is thenR(s) denote the set of ambulances that are available

for redeployment when the state ofs. Let uab(s) = 1 if we redeploy ambulance a

to base b when the state of the systems, and 0 otherwise. Letting B be the set

of ambulance bases, we can capture the potentialu(s) = {uab(s) : a 2 R(s), b 2

B}. The set ofU(s) = nu(s) 2 {0, 1}|R(s)|Ã—|B| :Xb2Buab(s) Â· 1 8 a 2 R(s)o.The

constraints in this definition simply state that each ambulance that is

considered for

redeployment can be redeployed to at most one base. An important assumption is

that the

cardinality of

by enumerating over all feasible solutions.

We can use different definitions of

example, all available ambulances are considered for relocation if

ambulances that are either idle at base or returning to base. But this may lead

to impractically

many relocations, so we can restrict

to an ambulance becoming free after serving a call, in which case we can take

singleton set corresponding to the newly freed ambulance. Here the ambulance

crews are â€œon the

6U(s) is small so that an optimization problem over this feasible set can be

solvedR(s) to permit different amounts of relocation. ForR(s) denotes the set

ofR(s), for example to ? unless the event e(s) correspondsR(s) equal to thePage

6 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60roadâ€ when considered for relocation so there is no additional disruption to

the crews relative to

the standard â€œreturn to baseâ€ policy.2.3. Fundamental DynamicsCall arrivals

are generated across the region

a known arrival intensity

fixed policy for serving calls, but our general approach does not depend on the

particular form

of this policy. If there are no available ambulances to serve a call, then the

call is placed into

a waiting queue. An ambulance serving a call proceeds through a sequence of

events, including

arriving at the scene, treating the patient, transporting and handing over the

patient at the

hospital. The main source of uncertainty in this call service cycle are the

times spent between

events. We assume that probability distributions for all of the event durations

are known.

Besides these sources of randomness, the major ingredient governing the dynamics

is the

decision-making of dispatchers. As a result, the complete trajectory of the

system is given byR Â½ R2 according to a Poisson point process withC = {C(t, x,

y) : t Â¸ 0, (x, y) 2 R}. As mentioned above, we have a{(sk, uk) : k = 1, 2, . .

..}, where sk is the state of the system at the time of the kth event and ukis

the decision (if any) made by the dispatcher when the state of the system is

the dynamics of the system symbolically bysk. We capturesk+1 = f(sk, uk,Xk(sk,

uk)),where

of randomness described above. We do not attempt to give a more explicit

description of the

transition function

to point out that the preceding discussion and the fact that we can simulate the

evolution of the

system over time imply that an appropriate Xk(sk, uk) is a random element of an

appropriate space encapsulating all the sourcesf(Â·, Â·, Â·), since this does

not add anything of value to our treatment. It sufficesf(Â·, Â·, Â·) exists.2.4.

Transition CostsAlong with a transition from state

In our particular implementation, we use the cost functionsk to sk+1 through

decision uk, we incur a cost g(sk, uk, sk+1).g

<>

:(sk, uk, sk+1) =8>1 if the event

the call in question is urgent and the response time exceeds Â¢

0 otherwisee(sk+1) corresponds to an ambulance arrival at a call scene,.(1)

Here Â¢ is a fixed threshold response time that is on the order of 10 minutes.

Therefore, the

cost function (1) allows us to count the number of high priority calls whose

response times

exceed a threshold response time. We are interested in the performance of the

system over the

finite planning horizon [0, T], so let g(s, Â·, Â·) = 0 whenever Â¿ (s) > T. In

our implementation, T7Page 7 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60corresponds to two weeks. We have chosen this particular cost function because

of its simplicity,

and also because most performance benchmarks in the EMS industry are formulated

in terms of

the percentage of calls that are reached within a time standard. Our approach

allows other cost

functions without affecting the general treatment.2.5. Objective Function and

Optimality EquationA policy is a mapping from the state space to the action

space, prescribing which action to take

for each possible state of the system. Throughout the paper, we use

prescribed by policy

trajectory of the system

k

and the discounted total expected cost incurred by starting from initial state

Î¼(s) to denote the actionÎ¼ when the state of the system is s. If we follow

policy Î¼, then the state{sÎ¼: k = 1, 2, . . .} evolves according to sÎ¼+1 =

f(sÎ¼, Î¼(sÎ¼),Xk(sÎ¼, Î¼(sÎ¼)))s is given byJÎ¼(s) = Eh 1 Xk=1Â®

k

kÂ¿(sÎ¼) g(sÎ¼, Î¼(sÎ¼), sÎ¼+1) | sÎ¼1 = si,where

random variables

k

state. It is well-known that the policy

be found by computing the value function through the optimality equationÂ® 2 [0,

1) is a fixed discount factor. The expectation in the expression above involves

the{sÎ¼: k = 1, 2, . . .} and Â¿ (sÎ¼) is the time at which the system visits

the kthÎ¼Â¤ that minimizes the discounted total expected cost canJ(s) = minu

and letting

The difficulty with the optimality equation (2) is that the state variable takes

uncountably

many values. Even if we are willing to discretize the state variable to obtain a

countable state

space, the state variable is still a high-dimensional vector and solving the

optimality equation (2)

through classical dynamic-programming approaches is computationally very

difficult. In the next

two sections, we propose a method to construct tractable approximations to the

value function.

The discounting in (2) may seem odd, as it implies that we are minimizing the

expected2U(s)(Ehg(s, u, f(s, u,X(s, u))) + Â®Â¿(f(s,u,X(s,u)))âˆ’Â¿(s) J(f(s,

u,X(s, u)))i) (2)Î¼Â¤(s) be the minimizer of the right-hand side above; see

Bertsekas and Shreve (1978).discounted

computational device, in the sense that the discount factor is very helpful in

stabilizing the

ADP approach that we describe in the next two sections. The key issue is that

the effect of

relocation decisions can be small relative to the simulation noise, and

discounting appears to

mitigate this to some extent. This observation is in agreement with empirical

observations from

the case studies in Chapter 8 of Bertsekas and Tsitsiklis (1996). The increase

in stability is also

supported by the theoretical results in Chapter 6.2 of Bertsekas and Tsitsiklis

(1996). When

presenting our computational results in Sections 5 and 6, we report the

of calls that are not reached within the threshold response time. These results

indicate that

8number of calls that are not reached within the threshold time. This is purely

aundiscounted numberPage 8 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60although we construct the value function approximations with a view towards

minimizing the

expected

in minimizing the expected discounted cost, the same value function

approximations provide very good performanceundiscounted cost.3. Approximate

Dynamic ProgrammingThe ADP approach that we use to construct approximations to

the value function is closely

related to the traditional policy iteration algorithm in the Markov decision

processes literature.

We begin with a brief description of the policy iteration algorithm. Throughout

the rest of the

paper, the greedy policy induced by an arbitrary function Ë†

decision in the set

argminJ(Â·) refers to the policy that takes au

whenever the state of the system is 2U(s) (Ehg(s, u, f(s, u,X(s, u))) +

Â®Â¿(f(s,u,X(s,u)))âˆ’Â¿(s) Ë† J(f(s, u,X(s, u)))i) (3)s.Policy IterationStep 1.

Initialize the iteration counter

Step 2. (Policy improvement) Let

Step 3. (Policy evaluation) Let

cost incurred when starting from state

Step 4. Increase

In our setting the cost function

is finite, so Proposition 9.17 in Bertsekas and Shreve (1978) applies and hence

pointwise to the solution to the optimality equation (2).

The difficulty with the policy iteration algorithm is that when dealing with

uncountable state

spaces, it is not even feasible to store

incurred by using policy

of the formn to 1 and initialize J1(Â·) arbitrarily.Î¼n be the greedy policy

induced by Jn(Â·).Jn+1(Â·) = JÎ¼n(Â·), where JÎ¼n(s) denotes the expected

discounteds and using policy Î¼n.n by 1 and go to Step 2.g(Â·, Â·, Â·) is

nonnegative and the set of feasible decisions U(s)Jn(Â·) convergesJÎ¼n(Â·), let

alone compute the expected discounted costÎ¼n. We overcome this difficulty by

using value function approximationsJ(s, r) =P Xp=1rp Ãp(s).In the expression

above,

1

the parameters so that

9r = {rp : p = 1, . . . , P} are tuneable parameters and {Ãp(Â·) : p =, . . . ,

P} are fixed basis functions. The challenge is to construct the basis functions

and tuneJ(Â·, r) is a good approximation to the solution to the optimality

equationPage 9 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60(2). To achieve this, each basis function

the solution to the optimality equation. In Section 4, we describe one set of

basis functions that

work well. Once a good set of basis functions is available, we can use the

following approximate

version of the policy iteration algorithm to tune the parameters Ãp(Â·) should

capture some essential information about{rp : p = 1, . . . , P}.Approximate

Policy IterationStep 1. Initialize the iteration counter n to 1 and initialize

r1 = {r1p

Step 2. (Policy improvement) Let

Step 3. (Policy evaluation through simulation) Simulate the trajectory of policy

planning horizon [0: p = 1, . . . , P} arbitrarily.Î¼n be the greedy policy

induced by J(Â·, rn).Î¼n over the, T] for Q sample paths. Let {snk(

trajectory of policy q) : k = 1, . . . ,K(q)} be the stateÎ¼n in sample path q

and Gnk(

starting from state q) be the discounted cost incurred bysnk(

Step 4. (Projection) Letq) and following policy Î¼n in sample path q.rn+1 =

argminr2RP ( Q Xq=1K(q) Xk=1 Â£Gnk(q) âˆ’ J(snk(q), r)Â¤2).Step 5. Increase

In Step 3 of approximate policy iteration we use simulation to evaluate the

expected

discounted cost incurred by policy n by 1 and go to Step 2.Î¼n. Therefore, {Gnk(

the sampled cost trajectories of the system under policy

that the value function approximation

There is still one computational difficulty in the approximate policy iteration

algorithm.

When simulating the trajectory of policy

of the form (3) to find the action taken by the greedy policy induced by

problem involves an expectation that is difficult to compute. We resort to Monte

Carlo simulation

to overcome this difficulty. In particular, if the state of the system is

action taken by the greedy policy induced by

decisions in the feasible set

trajectory of the system until the next event and this provides a sample of

using multiple samples, we estimate the expectationq) : k = 1, . . . ,K(q), q =

1, . . . ,Q} areÎ¼n. In Step 4, we tune the parameters soJ(Â·, r) provides a

good fit to the sampled cost trajectories.Î¼n in Step 3, we need to solve an

optimization problemJ(Â·, rn). This optimizations and we want to find theJ(Â·,

rn) in this state, then we enumerate over allU(s). Starting from state s and

taking decision u, we simulate thef(s, u,X(s, u)). ByEhg(s, u, f(s, u,X(s, u)))

+ Â®Â¿(f(s,u,X(s,u)))âˆ’Â¿(s) J(f(s, u,X(s, u)), rn)ithrough a sample average.

Once we estimate the expectation above for all

the decision that yields the smallest value and use it as the decision taken by

the greedy policy

10u 2 U(s), we choosePage 10 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60induced by

sampling error, but it provides good performance in practice.

Proposition 6.2 in Bertsekas and Tsitsiklis (1996) provides a performance

guarantee for the

approximate policy iteration algorithm. (The result is easily extended from

finite to infinite state

spaces.) This is an encouraging result as it provides theoretical support for

the approximate

policy iteration algorithm, but its conditions are difficult to verify in

practice. In particular,

the result assumes that we precisely know the error induced by using regression

to estimate the

discounted total expected cost of a policy, and it assumes that expectations are

computed exactly

rather than via sampling as in our case. For this reason, we do not go into the

details of this

result and refer the reader to Bertsekas and Tsitsiklis (1996) for further

details.J(Â·, rn) when the state of the system is s. This approach is naturally

subject to4. Basis FunctionsIn this section, we describe the basis functions

function approximations. We use

insights developed in Restrepo, , and Topaloglu (2007).{Ãp(Â·) : p =

1, . . . , P} that we use in our valueP = 6 basis functions, some of which are

based on the queueing4.1. BaselineThe first basis function is of the form

give a very rough estimate of the discounted number of missed calls between the

time associated

with state

every

1 + Ã1(s) = 1 âˆ’ Â®Tâˆ’Â¿(s). The role of this basis function is tos and the

end of the planning horizon. In particular, if we assume that we miss a callÂµ

time units, then the discounted number of missed calls over the interval [Â¿

(s), T] isÂ®Âµ + Â®2 Âµ + . . . + Â®b Tâˆ’Â¿(s)Âµ

1 c Âµ =âˆ’ Â®[b Tâˆ’Â¿(s)Âµ c+1] Âµ1 âˆ’ Â® Â¼1 âˆ’ Â®Tâˆ’Â¿(s)1

..âˆ’ Â®4.2. Unreachable CallsThe second basis function computes the number of

waiting calls for which an ambulance

assignment has been made, but the ambulance will not reach the call within the

threshold

response time. This is easily computed in our setting where travel times are

deterministic. In

the case of stochastic travel times, we could use the probability that the

ambulance will reach

the call within the threshold response time instead. We do not give a precise

expression for this

basis function, since the precise expression is cumbersome yet straightforward

to define.4.3. Uncovered Call RateThe third basis function captures the rate of

call arrivals that cannot be reached on time by

any available ambulance. To define this precisely we need some additional

notation. Recall that

11Page 11 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60calls arrive across the region R Â½ R2 according to a Poisson point process

with arrival intensityC

and associate a representative point or â€œcenter of massâ€ (

The coverage of subregion

of mass (

between points (

indicator function, the coverage of subregion

system as= {C(t, x, y) : t Â¸ 0, (x, y) 2 R}. Partition the region R into L

subregions {Â½l : l = 1, . . . , L},xl, yl) with each subregion l = 1, . . . ,

L.l is the number of available ambulances that can reach the centerxl, yl)

within the threshold time standard. Let d(t, (x1, y1), (x2, y2)) be the travel

timex1, y1) and (x2, y2) when travel starts at time t. Then, letting 1(Â·) be

thel can be written as a function of the state of theNl(s) = X a2A(s)1(d(Â¿ (s),

(xa(s), ya(s)), (xl, yl)) Â· Â¢).Here,

time A(s) is the set of available ambulances and (xa(s), ya(s)) is the location

of ambulance a atÂ¿ (s) when the system is in state s. The rate of call arrivals

at time t in subregion l isCl(t) = ZÂ½lC(t, x, y) dx dy.Therefore, when the

state of the system is

not covered by any available ambulance bys, we can compute the rate of call

arrivals that areÃ3(s) =L Xl=1Cl(Â¿ (s)) 1(Nl(s) = 0).4.4. Missed Call RateThe

previous 2 basis functions capture calls already received that we know we cannot

reach on

time, and the rate of arriving calls that cannot be reached on time because they

are too far

from any available ambulance. We could also fail to reach a call on time due to

queueing effects

from ambulances being busy with other calls. The fourth basis function

represents an attempt

to capture this effect. This basis function is of the formÃ4(s) =L Xl=1Cl(Â¿

(s)) Pl(s),where

time are busy with other calls. We estimate

processes in different subregions as Erlang-loss systems. In Erlang-loss

systems, calls arriving

when all servers are busy are lost. In reality such calls are queued and served

as ambulances

become free, but the time threshold is almost always missed for such calls, so

counting them

as lost seems reasonable. The issue that such calls impose some load on the true

system but

are discarded in an Erlang-loss system creates a slight mismatch, but our

computational results

show that this function is still highly effective as a basis function.

12Pl(s) is the probability that all ambulances that could reach a call in

subregion l on{Pl(s) : l = 1, . . . , L} by treating the call servicePage 12 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60The Erlang-loss probability L(Â¸, Î¼, c) for a system with arrival rate Â¸,

service rate Î¼ and cservers isL

((Â¸, Î¼, c) =Â¸/Î¼)c/c!P

ic=0(Â¸/Î¼)i/i!.To characterize the Erlang-loss system for subregion

we need to specify the number of servers, and the arrival and service rates. Let

of available ambulances that can serve a call in subregion

Thenl, given that the state of the system is s,Nl(s) be the setl within the

threshold response time.Nl(s) = {a 2 A(s) : d(Â¿ (s), (xa(s), ya(s)), (xl, yl))

Â· Â¢}.We use

be the service rate in the loss system, i.e., the rate at which an ambulance can

serve a call at

time

on the time spent at the scene of a call and any transfer time at a hospital,

since travel times

are usually small relative to these quantities. In our practical implementation,

we use historical

data to estimate the time spent at the call scenes and the hospitals and add a

small padding

factor to capture travel times. Finally, let Â¤

by ambulances in the set

devising a value for

time c = |Nl(s)| as the number of servers in the Erlang loss system for

subregion l. Let Î¼l(t)t in subregion l. It is difficult to come up with a

precise value for Î¼l(t). It primarily dependsl(s) be the rate of call arrivals

that should be servedNl(s). Coming up with a value for Â¤l(s) is even more

difficult thanÎ¼l(t). One option is to let Â¤l(s) = Cl(Â¿ (s)), which is the

rate of call arrivals atÂ¿ (s) in subregion l. However, ambulances in the set

Nl(s) serve not only calls in subregionl

Â¤, but also calls in other subregions. To attempt to capture this letl(s) = X

a2Nl(s)L Xi=1C

so that Â¤

in the set i(Â¿ (s)) 1(d(Â¿ (s), (xa(s), ya(s)), (xi, yi)) Â· Â¢), (4)l(s)

reflects the total call arrival rate in subregions that are close to any of the

ambulancesNl(s). We then use the approximationP

There are at least 2 shortcomings in the estimate for Â¤l(s) Â¼ L(Â¤l(s), Î¼l(Â¿

(s)), |Nl(s)|). (5)l(s) in (4) and the approximation toPl(s) in (5). First,

there is double counting in the estimate of Â¤l(s). In particular, if 2

ambulancesa

Â¤

subregions. To be more precise, if there are three subregions 1, a2 2 Nl(s) can

both reach subregion ` within the time threshold, then the summation forl(s)

counts C`(Â¿ (s)) twice. Second, C`(Â¿ (s)) could be counted in the demand rates

for multiplel1, l2, ` and two ambulancesa

then the summations for Â¤1 2 Nl1(s), a2 2 Nl2(s) such that both a1 and a2 can

reach subregion ` within the time threshold,l1(s) and Â¤l2(s) both count C`(Â¿

(s)). Therefore, we typically haveP

l

factor

13L=1 Â¤l(s) > PL=1 Cl(Â¿ (s)). To overcome this problem, we dilute the call

arrival rates by aÂ·. In particular, we use the call arrival rate Â·C`(Â¿ (s))

in (4) instead of C`(Â¿ (s)) soPage 13 of 26http://joc.pubs.informs.orgINFORMS

Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60that we may roughly have

l

preliminary experimentation. Interestingly, as we will see in our computational

results, it is not

necessarily the case that the best choice of

tuneable parameter in our ADP approach that requires experimentation.PL=1 Â¤l(s)

= PL=1 Â·Cl(Â¿ (s)). We find a good value for Â· throughÂ· lies in (0, 1). We

emphasize that Â· is the only4.5. Future Uncovered Call RateIn certain settings,

we may not be willing to redeploy ambulances that are already in transit to

a particular location. In this case, from the perspective of covering future

calls, the ambulance

destinations are as important as their current locations. This is the motivation

underlying the

fifth and sixth basis functions. Our fifth basis function parallels the third

one, replacing the true

locations of ambulances by their destinations.

The definition of this basis function is therefore identical to that of

of the ambulances that we use to compute

future configuration that is obtained by letting all ambulances in transit reach

their destinations

and all stationary ambulances remain at their current locations. To be precise,

given that the

system is in state

new state

l

1

l

this case, we can write the fifth basis function asÃ3, but the

configurationNl(s) is not the current one, but rather, an estimateds = (Â¿,

e,A,C) with A = (a1, . . . , aN) and ai = (Â¾ai , oai , dai , tai ), we define

a~s(s) = (Â¿ + 1/PL=1 Cl(Â¿ ), e, ~ A,C) with ~A = (~a1, . . . ,~aN) and ~ai =

(~Â¾ai , dai , dai , Â¿ +/PL=1 Cl(Â¿ )), where ~Â¾ai is the status of ambulance

ai when it reaches its destination dai . InÃ5(s) =L Xl=1Cl(Â¿ (~s(s)))

1(Nl(~s(s)) = 0).The time

l

next call arrival. The next call may arrive before or after the ambulances

actually reach their

destinations, but we heuristically use the time

that the estimated future configuration of the ambulances

time Â¿ (~s(s)) = Â¿ + 1/PL=1 Cl(Â¿ ) is used as an approximation to the

expected time of theÂ¿ (~s(s)) simply to look into the future. The idea is~A is

more likely to hold at the futureÂ¿ (~s(s)) than at the current time Â¿ (s).4.6.

Future Missed Call RateThe sixth basis function,

lost due to queueing congestion. As with the fifth basis function, it uses an

estimated future

configuration of the ambulances. It is defined asÃ6, parallels Ã4 in that it

captures the rate of calls that will likely beÃ6(s) =L Xl=1Cl(Â¿ (~s(s)))

Pl(~s(s)).14Page 14 of 26http://joc.pubs.informs.orgINFORMS Journal on

Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

605. Computational Results on EdmontonIn this section, we present computational

results for the EMS system in the city of Edmonton,

Alberta in Canada. We begin with a description of the data set along with our

assumptions.5.1. Experimental SetupOur data are based on the data set used in

the computational experiments in Ingolfsson, Erkut,

and Budge (2003). The city of Edmonton has a population of 730,000 and covers an

area of

around 40

assume for simplicity that all ambulances are of the same type and operate all

day. An ambulance,

upon arriving at a call scene, treats the patient for an exponentially

distributed amount of time

with mean 12 minutes. After treating the patient at the call scene, the

ambulance transports the

patient to a hospital with probability 0.75. The probability distribution for

the hospital chosen

is inferred from historical data. The time an ambulance spends at the hospital

has a Weibull

distribution with mean 30 minutes and standard deviation 13 minutes. The

turn-out time for

the ambulance crews is 45 seconds. In other words, if the ambulance crew is

located at a base

when it is notified of a call, then it takes 45 seconds to get ready. An

ambulance crew already on

the road does not incur the turn-out time. The road network that we use in our

computational

experiments models the actual road network on the avenue level. There are 252

nodes and 934

arcs in this road network. The travel times are deterministic and do not depend

on the time of

the day.

There was not enough historical data to develop a detailed model of the call

arrivals so we

proceeded with a revealing, but not necessarily realistic, model. The model

maintains a constant

overall arrival rate, but the distribution of the location of calls changes in

time. We divided the

city of Edmonton into 20Ã— 30 km2. The EMS system includes 16 ambulances, 11

bases and 5 hospitals. WeÃ—17 subregions and assumed that the rate of call

arrivals in subregionl at time t is given byCl(t) = CÂ£Â°l + Â±l

sin(2Â¼t/24)Â¤,where

satisfy t is measured in hours. In the expression above, C, Â°l and Â±l are

fixed parameters thatC Â¸ 0, Â°l Â¸ |Â±l|, P340l=1 Â°l = 1 and P340l=1 Â±l = 0.

We have P340l=1 Â°l +P340l

so that we can interpret

as the probability that a call arriving at time

arrival rate in subregion

day in subregion

7 calls per hour. We chose appropriate values of

the business subregions early in the day and higher call arrival rates in the

residential subregions

later in the day.

15=1 Â±l sin(2Â¼t/24) = 1C as the total call arrival rate into the system and

Â°l + Â±l sin(2Â¼t/24)t falls in subregion l. If Â±l > 0, then the peak calll

occurs at hours {6, 30, 54, . . .}, whereas if Â±l < 0, then the peak calll

occurs at hours {18, 42, 66, . . .}. The average call arrival rate over al is

CÂ°l. We estimated C and Â°l using historical data and C ranged from 3 toÂ±l so

that we have higher call arrival rates inPage 15 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60We implemented ADP on top of the BartSim simulation engine developed by

and

Mason (2004). In all of our computational experiments, we use a discount factor

of

day. The simulation horizon is 14 days. We use 30 sample paths in Step 3 of the

approximate

policy iteration algorithm. After some experimentation, we found that

sixth basis functions gave the best results.

We use the static redeployment policy as a benchmark. In particular, the static

redeployment

policy preassigns a base to each ambulance and redeploys each ambulance back to

its preassigned

base whenever it becomes free after serving a call. We found a good static

redeployment policy

by simulating the performance of the system under a large number of possible

base assignments

and choosing the base assignment that gave the best performance.Â® = 0.85 perÂ·

= 0.1 in the fourth and5.2. Baseline PerformanceIn our first set of

computational experiments, we test the performance of ADP under the

assumption that the only time we can redeploy an ambulance is when it becomes

free after serving

a call. We do not redeploy ambulances at other times. Following the discussion

of Section 2.2,

this is equivalent to setting

free after serving a call, in which case

just becoming free. The motivation for using this redeployment strategy is that

it minimizes

the disturbance to the crews and never redeploys an ambulance that is located at

an ambulance

base.

Figure 1 shows the performance of ADP. The horizontal axis gives the iteration

number in

the approximate policy iteration algorithm and the vertical axis gives the

percentage of calls not

reached within the threshold response time. The plot gives the percentage of

calls missed by

the greedy policy induced by the value function approximation at a particular

iteration. The

dashed horizontal line shows the best performance attained by ADP, while the

thin horizontal

line shows the performance of the best static policy. ADP attains a good policy

within two

iterations and then bounces around this policy. The best ADP policy misses 21.9%

(

the calls, whereas the best static redeployment policy misses 25.3% (

figures are undiscounted numbers of missed calls.)

The ADP approach does not converge on a single policy. This lack of convergence

is not a

concern from a practical viewpoint since we can simply choose the best policy

that is obtained

during the course of the approximate policy iteration algorithm and implement

this policy in

practice.

To emphasize the importance of choosing the basis functions carefully, Figure 2

shows the

performance of ADP when we use an arrival rate

in the fourth and sixth basis functions. The best policy obtained through ADP

misses 23.6%

16R(s) = ?, except when e(s) corresponds to an ambulance becomingR(s) is the

singleton set corresponding to the ambulanceÂ± 0.2%) ofÂ±0.2%) of the calls.

(TheseÂ¸ = Cl(Â¿ (s)) instead of the quantity in (4)Page 16 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

605 10 15 20 25

21

21.5

22

22.5

23

23.5

24

24.5

25

25.5

26

Average Number of Lost Calls (% total Calls)

Iteration NumberFigure 1: Performance of ADP and the benchmark policy.

(

calls in Figure 1. Furthermore, the fluctuation in the performance of the

policies in Figure 2 is

much more drastic when compared with Figure 1. The results indicate that

choosing the basis

functions carefully makes a significant difference in the performance of ADP.Â±

0.2%) of the calls. This is in contrast with the best policy missing 21.9% (Â±

0.2%) of the5.3. Comparison with Random SearchFor a fixed set of basis functions

{Ãp(Â·) : p = 1, . . . , P}, a set of values for the tuneable parametersr

approximation induces a greedy policy. Therefore, a brute-force approach for

finding a good set

of values for the tuneable parameters is to carry out a random search over an

appropriate subset

of

sets of values for the tuneable parameters.

To implement this idea, we first use ADP to obtain a good set of values for the

tuneable

parameters. Letting = {rp : p = 1, . . . , P} characterize a value function

approximation J(Â·, r) and this value functionRP and use simulation to test the

performance of the greedy policies induced by the different{Ë†rp : p = 1, . . .

, P} be this set of values, we sample r = {rp : p = 1, . . . , P}uniformly over

the box

2

the performance of the greedy policy induced by the value function approximation

sampled 1,100 sets of values for the tuneable parameters and this provides 1,100

value function

approximations. Figure 3 gives a histogram for the percentage of the calls

missed by the greedy

policies induced by these 1,100 value function approximations. The vertical

lines correspond

to the percentage of calls missed by the best policy obtained by ADP and the

best static

redeployment policy. The figure indicates that very few (3%) of the sampled sets

of values

for the tuneable parameters provide better performance than the best policy

obtained by ADP.

17Â£Ë†r1âˆ’ 1Ë†r1, Ë†r1+ 1Ë†r1Â¤Ã—. . .Ã—Â£Ë†rP âˆ’ 1Ë†rP , Ë†rP + 1Ë†rP Â¤and

use simulation to testJ(Â·, r). WePage 17 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

605 10 15 20 25

23

24

25

26

27

28

29

30

31

32

33

34

Average Number of Lost Calls (% total Calls)

Iteration NumberFigure 2: Performance of ADP with poorly chosen basis functions.

On the other hand, approximately 28% of the samples provide better performance

than the best

static redeployment policy.

The random search procedure we use is admittedly rudimentary and one can use

more

sophisticated techniques to focus on the more promising areas of the search

space. Nevertheless,

our results indicate that when one looks at the broad landscape of the possible

values for the

tuneable parameters, ADP is quite effective in identifying good parameters.

Moreover, the

computation time required by the random search procedure is on the order of

several days,

whereas the computation time required by ADP is a few hours.5.4. Making

Additional RedeploymentsThe computational experiments in Section 5.2 and 5.3

allow redeployments only when an

ambulance becomes free after serving a call. We now explore the possibility of

improving

performance by allowing additional ambulance redeployments. Define an additional

event type

â€œconsider redeploymentâ€ and schedule an event of this type with a certain

frequency that

is detailed below. Whenever an event of this type is triggered, we consider

redeploying the

ambulances that are either at a base or returning to a base, so that

ambulances at such times. The set

an ambulance becoming free after serving a call, and at all other events,

We use two methods to vary the redeployment frequency. In the first method, we

equally

space consider-redeployment events to obtain frequencies between 0 and 15 per

hour. In the

second method, the frequency of consider-redeployment events is fixed at 24 per

hour, but we

make a redeployment only when the estimated benefit from making the redeployment

exceeds

18R(s) can contain multipleR(s) continues to be a singleton when e(s)

corresponds toR(s) = ?.Page 18 of 26http://joc.pubs.informs.orgINFORMS Journal

on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

6018 20 22 24 26 28 30 32 34 36

0

10

20

30

40

50

60

70

80

90

Average percentage of missed calls

Frequency

ADP*Static*Figure 3: Performance of the 1,100 greedy policies obtained through

random search.

the estimated benefit from not making the redeployment by a significant margin.

More precisely,

letting

matrix of zeros corresponding to the decision matrix of not making a

redeployment, if we have

argminÂ² 2 [0, 1) be a tolerance margin and using Â¯0(s) to denote the |R(s)| Ã—

|B| dimensionalu2U(s) (Ehg(s, u, f(s, u,X(s, u))) + Â®Â¿(f(s,u,X(s,u)))âˆ’Â¿(s)

J(f(s, u,X(s, u)), r)i)Â· (1 âˆ’ Â²)Ehg(s,Â¯0, f(s, u,X(s, Â¯0))) +

Â®Â¿(f(s,Â¯0,X(s,Â¯0)))âˆ’Â¿(s) J(f(s,Â¯0,X(s, Â¯0)), r)i,then we make the

redeployment decision indicated by the optimal solution to the problem on

the left-hand side. Otherwise, we do not make a redeployment. Larger values of

frequency of redeployments. We vary

Figure 4 shows the performance improvement obtained by the additional

redeployments.

The horizontal axis gives the frequency of the redeployments measured as the

number of

redeployments per ambulance per day. The vertical axis gives the percentage of

missed

calls. The dashed (solid) data line corresponds to the first (second) method of

varying the

redeployment frequency. From Figure 1 we miss 21.9% of the calls without making

any additional

redeployments. By making about 10 additional redeployments per ambulance per

day, we can

decrease the percentage of missed calls to 20.2%. Beyond this range, we reach a

plateau and

additional redeployments do not provide much improvement. Another important

observation is

that the second method tends to provide significantly better performance

improvements with

the same frequency of additional redeployments. For example, the second method

reduces the

percentage of missed calls to 20.2% with 10 additional redeployments per

ambulance per day,

whereas the first method needs 15 additional redeployments per day to reach the

same level.

19Â² decrease theÂ² between 0.1 and 0.005.Page 19 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

600 5 10 15 20 25 30

19

19.5

20

20.5

21

21.5

22

22.5

Efficient Frontier: improvement vs. relocations per ambulanceâˆ’day

Relocations per ambulanceâˆ’day

Average Number of Missed Calls (% of total calls)

Regularly scheduled relocs. var. freqs

Selective relocsFigure 4: Performance of ADP as a function of the frequency of

the additional redeployments.

Therefore, it appears that making redeployments only when the value function

approximation

signals a significant benefit is helpful in avoiding pointless redeployments.6.

Computational Results on a Second Metropolitan

AreaIn this section, we present the performance of ADP on the EMS system

operating in a second

metropolitan area. This EMS system is also studied in s (2007). We cannot

disclose the

name of the metropolitan area due to confidentiality agreements.6.1.

Experimental SetupThe population of the city in question is more than 5 times

that of Edmonton and its size is

around 180

times, 88 bases and 22 hospitals. The turn-out times, call-scene times and

hospital-transfer times

are comparable to those in Edmonton. We were able to use a detailed model for

determining to

which hospital a patient needs to be transported. In particular, the destination

hospital depends

on the location of the call. Calls originating at a given location are

transported to any of a

small set of hospitals â€“ usually no more than 2 or 3 out of the 22 hospitals

in the system. The

corresponding probabilities are inferred from historical data. We assume that

all ambulances are

of the same type and a call that is not served within 8 minutes is interpreted

as missed. The

road network that we use in our computational experiments models the actual

network on the

avenue level and there are 4,955 nodes and 11,876 arcs in this road network.

20Ã— 100 km2. The EMS system includes up to 97 ambulances operating during

peakPage 20 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60The call arrival model that we used is quite realistic. The data that we used

were collected

from one year of operations of the EMS system and consisted of aggregated counts

of calls for

each hour of the week during a whole year, for each of 100

irregular and non-convex shape of the metropolitan area, roughly 80% of these

zones had zero

total call counts and did not intervene in the dynamics. From the remaining 20%

a significant

percentage had very low hourly counts of at most 5 calls. Thus, it was necessary

to apply

smoothing procedure for the lowest intensity zones so as to reduce the sampling

noise. In order

to do this, we first classified the zones in a few groups according to their

average intensity along

the week. Then, for the lowest intensity groups, we computed a total intensity

for each hour

and then distributed this total intensity uniformly among the zones in this

group. In this way

we obtained an intensity model that combined a uniform low intensity background

with actual

(accurate) counts on the highest intensity zones. The average call arrival rate

was 570 calls per

day and fluctuated on any day of the week from a low of around 300 calls per day

to a high of

around 750 calls per day.

We used the actual base assignments and ambulance shifts as a benchmark. In the

EMS

system, a maximum of 97 ambulances operate at noon. Out of these, 66 ambulances

work all

day in two 12 hour shifts and the remaining 31 have single shifts typically

ranging from 10 to 12

hours per day. The resulting shift schedule provides 1,708 ambulance hours per

day. Ambulances

are redeployed to their preassigned bases whenever they become free after

serving a call. We

refer the reader to s (2007) for details on the ambulance shifts.Ã— 100

geographic zones. Due to the6.2. Baseline PerformanceFigure 5 shows the

performance of ADP. The interpretations of the axes in this figure are the

same as those in Figures 1 and 2. The solid horizontal line plots the percentage

of calls missed by

the benchmark policy and the dashed horizontal line plots the best performance

obtained using

ADP. The solid data line plots the performance of ADP when we use

function, whereas the dashed line plots the performance when we use

was found by experimentation to give the best policy and least fluctuations

across iterations of

all the values tried in the interval (0

For both values of

perform very similarly. The improvements in the number of reached calls are

roughly 3

4

than with

some initial experimentation.

21Â· = 2 in the fourth basisÂ· = 1. The value Â· = 2, 2].Â·, the best ADP

policies are attained in one iteration and these policies.0% and.0% in the case

of Â· = 1 and Â· = 2. We get significantly more stable performance with Â· = 2Â·

= 1. This indicates that it is important to carefully tune the parameter Â·

throughPage 21 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

602 4 6 8 10 12 14

25

30

35

40

45

50

55

Average # of missed calls (% of total number of calls)

Iteration Numberk = 1.0k = 2.0Figure 5: Performance of ADP and the benchmark

policy.6.3. Effect of Turn-Out TimeRecall that if the ambulance crew is

stationed at a base when it is notified of a call, then it takes

45 seconds to get ready, i.e., the turn-out time is 45 seconds. On the other

hand, an ambulance

crew that is already on the road does not incur turn-out time. A potential

argument against

ambulance redeployment is that any gains are simply due to ambulance crews being

on the road

more often, and therefore incurring less turn-out time delays.

To check the validity of this argument, we used ADP under the assumption that

turn-out

time is zero. In other words, an ambulance crew does not take any time to get

ready, even if it

is located at a base when it is notified of a call. Table 1 shows the results

for two different call

arrival regimes (633 calls per day and 846 calls per day). The third and fourth

rows show the

absolute and relative improvement of ADP over the benchmark strategy. The

results indicate

that ADP provides significant improvement over the benchmark strategy in all

cases. At first

sight, this improvement seems slightly smaller for the cases without turn-out

time, but the

relative improvement is roughly the same in all cases.6.4. Varying Call Arrival

Rates and Fleet SizesWe now explore the effect of different call arrival rates

and fleet sizes on the performance of

ADP. We first carry out a number of runs under the experimental setup described

in Section 6.1,

but we vary the call arrival rates by uniformly scaling them by a constant

factor. Recall that

the average call arrival rate in the experimental setup in Section 6.1 is around

570 calls per day.

Figure 6 shows the percentage of the calls that are missed by the best policy

obtained by ADP

22Page 22 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60633 calls per day

Turn out Turn out

= 45 secs. = 0 secs.

% of calls missed 27.5 23.0

by benchmark

% of calls missed 23.9 19.3

by ADP

Improvement 3.6 3.2

Rel. improvement 3.6 / 27.5 3.2 / 23.0

= 0.13 = 0.139

846 calls per day

Turn out Turn out

= 45 secs. = 0 secs.

% of calls missed 35.5 30.7

by benchmark

% of calls missed 30.5 26.3

by ADP

Improvement 4.9 4.5

Rel. improvement 4.9 / 35.5 4.5 / 30.7

= 0.139 = 0.145Table 1: Effect of turn-out time on the performance gap between

ADP and the benchmark policy.450 500 550 600 650 700 750 800 850

20

25

30

35

40

Average # of calls/day

Average # of missed calls lost (% of total number of calls)

Benchmark Policies

Best ADP policiesFigure 6: Performance of ADP and benchmark policy for different

call arrival rates.

and the benchmark strategy as a function of the average call arrival rate. The

solid data line

corresponds to ADP, whereas the dashed data line corresponds to the benchmark

strategy. ADP

provides substantial improvements over all call arrival regimes. The

improvements provided by

ADP are more significant when the call arrival rate is relatively high. It

appears that when the

call arrival rate is high, there is more room for improvement by redeploying

ambulances carefully

and ADP effectively takes advantage of this greater room for improvement.

In a second set of computational experiments, we hold the call arrival rate

constant and

vary the number of ambulances in the fleet. Figure 7 shows the performance of

ADP and

the benchmark strategy as a function of the fleet size. Since we do not have

access to the

base assignments used by the benchmark strategy for different fleet sizes, we

modify the base

assignments described in Section 6.1 heuristically. In particular, to produce a

base assignment

with fewer ambulances, we choose the ambulances assigned to bases serving low

demand and

take them off shift. Similarly, to produce a base assignment with extra

ambulances, we add

23Page 23 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

6080 85 90 95 100

20

25

30

35

40

45

Number of Ambulances

Average # of missed calls (% of total number of calls)

Static Policy

Best ADP redeployment policyFigure 7: Performance of ADP and benchmark policy

for different fleet sizes.

ambulances to the bases with the highest demand.

The results indicate that ADP performs consistently better than the benchmark

policy. An

important observation from Figure 7 is that if our goal is to keep the

percentage of the missed

calls below a given threshold, say 28%, then ADP allows us to reach this goal

with roughly 5

or 6 fewer ambulances than the benchmark policy. This translates into

significant cost savings

in an EMS system. It is also interesting that the performance of the benchmark

policy does

not improve significantly beyond 97 or 98 ambulances in the fleet, whereas ADP

continues to

provide progressively lower missed-call rates. This partly reflects the quality

of our heuristic

benchmark strategy, but it also indicates that a redeployment strategy can help

mitigate poor

static allocations and make effective use of extra capacity.7. ConclusionsIn

this paper, we formulated the ambulance redeployment problem as a dynamic

program and

used an approximate version of the policy iteration algorithm to deal with the

high-dimensional

and uncountable state space. Extensive experiments on two problem scenarios

showed that ADP

can provide high-quality redeployment policies. The basis functions that we

constructed open

up the possibility of using other approaches, such as temporal-difference

learning and the linearprogramming

approach, to tune the parameters

exploring the linear-programming approach.

Other future research will incorporate additional degrees of realism into our

model. We plan

to include stochastic travel times, multiple call priorities, other cost

functions and more realistic

24{rp : p = 1, . . . , P}. Indeed, we are currentlyPage 24 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60ambulance dynamics that involve multiple ambulances serving certain calls.

Incorporating these

complexities may require constructing additional basis

functions.AcknowledgmentsWe thank Currell, Armann Ingolfsson and

Mason for their advice and support,

and for providing the data sets used in our computational experiments. This work

was partially

supported by National Science Foundation Grants DMI 0400287 and DMI

0422133.ReferencesAdelman, D. 2004. A price-directed approach to stochastic

inventory routing.

Research

Adelman, D. 2007. Dynamic bid-prices in revenue management.

Adelman, , Adam J. Mersereau. 2007. Relaxations of weakly coupled

stochastic dynamic

programs.

Andersson, T. 2005. Decision support tools for dynamic fleet management. Ph.D.

thesis,

Department of Science and Technology, Linkoepings Universitet, Norrkoeping,

Sweden.

Andersson, T., P. Vaerband. 2007. Decision support tools for ambulance dispatch

and relocation.Operations52 499â€“514.Operations Research 55

647â€“661.Operations Research .Journal of the Operational Research Society

Berman, O. 1981a. Dynamic repositioning of indistinguishable service units on

transportation

networks.

Berman, O. 1981b. Repositioning of distinguishable urban service units on

networks.

and Operations Research

Berman, O. 1981c. Repositioning of two distinguishable service vehicles on

networks.

Transactions on Systems, Man, and Cybernetics

Bertsekas, D.P., S.E Shreve. 1978.

Academic Press, New York.

Bertsekas, D.P., J.N. Tsitsiklis. 1996.

Massachusetts.

Brotcorne, L., G. Laporte, F. Semet. 2003. Ambulance location and relocation

models.

Journal of Operations Research

de Farias, D. P., B. Van Roy. 2003. The linear programming approach to

approximate dynamic

programming.

Farias, V. F., B. Van Roy. 2006. Tetris: A study of randomized constraint

sampling. G. Calafiore,

F. Dabbene, eds.,

Springer-Verlag.

Farias, V. F., B. Van Roy. 2007. An approximate dynamic programming approach to

network

revenue management. Tech. rep., Stanford University, Department of Electrical

Engineering.

Gendreau, M., G. Laporte, S. Semet. 2001. A dynamic model and parallel tabu

search heuristic

for real time ambulance relocation.

2558 195â€“201.Transportation Science 15.Computers8

105â€“118.IEEESMC-11.Stochastic Optimal Control: The Discrete Time

Case..Neuro-Dynamic Programming. Athena Scientific, Belmont,European147

451â€“463.Operations Research 51 850â€“865.Probabilistic and Randomized Methods

for Design Under Uncertainty.Parallel Computing 27 1641â€“1653.Page 25 of

26http://joc.pubs.informs.orgINFORMS Journal on Computing: For Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60Gendreau, M., G. Laporte, S. Semet. 2006. The maximal expected coverage

relocation problem

for emergency vehicles.

Goldberg, J. B. 2007. Personal Communication.

, S. G., A. J. Mason. 2004. Ambulance service planning: Simulation and

data

visualisation. M. L. Brandeau, F. Sainfort, W. P. Pierskalla, eds.,

Health Care: A Handbook of Methods and Applications

and Management Science, Kluwer Academic, 77â€“102.

Ingolfsson, A. 2006. The impact of ambulance system status management.

Presentation at 2006

INFORMS Conference.

Ingolfsson, A., E. Erkut, S. Budge. 2003. Simulation of single start station for

Edmonton EMS.Journal of the Operational Research Society 57 22â€“28.Operations

Research and. Handbooks in Operations ResearchJournal of the Operational

Research Society

Kolesar, P., W. E. . 1974. An algorithm for the dynamic relocation of fire

companies.54 736â€“746.Operations Research

Nair, R., E. -Hooks. 2006. A case study of ambulance location and

relocation. Presentation

at 2006 INFORMS Conference.

, Warren B. 2007.

Dimensionality

Restrepo, M., S.G , H. Topaloglu. 2007. Erlang loss models for the

static deployment

of ambulances. Submitted.

s, D. P. 2007. Optimised ambulance redeployment strategies. Masterâ€™s

thesis, Department

of Engineering Science, University of Auckland, Auckland, New Zealand.

Schweitzer, P., A. Seidmann. 1985. Generalized polynomial approximations in

Markovian decision

processes.

Si, Jennie, G. Barto, Warren B. , Wunsch II, eds. 2004.

Learning and Approximate Dynamic Programming

Sutton, R. S. 1988. Learning to predict by the methods of temporal differences.

22 249â€“274.Approximate Dynamic Programming: Solving the Curses of. Wiley

& Sons, Hoboken, NJ.Journal of Mathematical Analysis and Applications 110

568â€“582.Handbook of. Wiley-Interscience, Piscataway, NJ.Machine Learning3

Topaloglu, H., W. B. . 2006. Dynamic programming approximations for

stochastic, timestaged

integer multicommodity flow problems.

Tsitsiklis, J., B. Van Roy. 1997. An analysis of temporal-difference learning

with function

approximation.

Tsitsiklis, J. N. 1994. Asynchronous stochastic approximation and 9â€“44.INFORMS

Journal on Computing 18 31â€“42.IEEE Transactions on Automatic Control 42

674â€“690.Q-learning. Machine Learning16

Tsitsiklis, J.N, B. Van Roy. 2001. Regression methods for pricing complex

American-style

options.

Van Roy, , Dimitri P. Bertsekas, Yuchun Lee, N. Tsitsiklis. 1997. A

neuro dynamic

programming approach to retailer inventory management.

on Decision and Control

Watkins, C. J. C. H., P. Dayan. 1992.

Yan, X., P. Diaconis, P. Rusmevichientong, B. Van Roy. 2005. Solitaire: Man

versus machine.185â€“202.IEEE Transactions on Neural Networks 12

694â€“703.Proceedings of the IEEE Conference.Q-learning. Machine Learning 8

279â€“292.Advances in Neural Information Processing Systems

2617.Page 26 of 26http://joc.pubs.informs.orgINFORMS Journal on Computing: For

Review Only1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

Â

Joby Berkley - Owner

Jobo's Hamburger Grill

6548 Lake Worth Blvd., # 200

Lake Worth, Texas 76135

www.joboshamburgergrill.com

Work

Fax

Cell

joby@...

Â

SSM makeover??? Ambulance Placement Research

For thoseÂ of youÂ thatÂ might be interested in the new studyÂ being conduted by

Cornell University Dr. Shane here is his acedemic science paper. Some

of you mayÂ night be interested in it, as it contains a huge amount of formulas,

butÂ you might enjoyÂ the resultÂ section. This was sent to meÂ by Dr.

and it shows the complexity of unit placement.

Â

Just FYI

Â

Joby Berkley - Owner

Jobo's Hamburger Grill

6548 Lake Worth Blvd., # 200

Lake Worth, Texas 76135

www.joboshamburgergrill.com

Work

Fax

Cell

joby@...

June 20, 2008