A* algorithm - Why should the heuristic be an underestimate?

Rosinger

unread,

Dec 12, 2002, 5:48:37 AM12/12/02

to

Hi all,

In the A* algorithm, F = G + H where H is the result returned by the
heuristic function. In many articles that I read about A*, the one
thing I can't understand is: why should the heuristic function return
an underestimate figure?

How do we know it is an underestimate in the first place, because at
any given node N, we do not know the actual (real) distance to the
goal, so how can we say underestimate if we don't know the real
figure?

Can someone please explain it in simple terms, or give me a hint.
Something is blobked for me here ...

Cheers,

Dave

[ comp.ai is moderated. To submit, just post and be patient, or if ]
[ that fails mail your article to <com...@moderators.isc.org>, and ]
[ ask your news administrator to fix the problems with your system. ]

David Kinny

unread,

Dec 12, 2002, 6:08:50 AM12/12/02

to

In <at9pi5$5gk$1...@mulga.cs.mu.OZ.AU> jros...@hotmail.com (Rosinger) writes:

>Hi all,

>In the A* algorithm, F = G + H where H is the result returned by the
>heuristic function. In many articles that I read about A*, the one
>thing I can't understand is: why should the heuristic function return
>an underestimate figure?

A* explores nodes in order of increasing total estimated cost F.
If the heuristic H is guaranteed never to overestimate then F is
a lower bound on the actual cost of a path via a given node, ie
the actual cost via that node must always be >= the estimate.
A* may then "waste time" expanding a node whose distance to the
goal is underestimated, but it will eventually discover that the
actual cost is greater than the estimate and immediately switch
to more promising nodes, hence ensuring that it finds an optimal
path (if one exists) before finding a suboptimal one.

If the heuristic overestimates then A* may find a suboptimal
path to the goal before it finds an optimal one, because an
overestimate causes it to defer exploring some node actually on
an optimal path, choosing instead to expand some more promising
node on a suboptimal path.

>How do we know it is an underestimate in the first place, because at
>any given node N, we do not know the actual (real) distance to the
>goal, so how can we say underestimate if we don't know the real
>figure?

Finding heuristics that never overestimate is certainly possible
even when we don't know the distance from a node to the goal.
For example, the constant function H=0 never overestimates,
although it provides no guidance in the search. In a 4-connected
grid the "Manhattan Distance" (sum of the absolute values of the
differences in X and Y values between the node and the goal) also
never overestimates, and does provide useful guidance.

>Can someone please explain it in simple terms, or give me a hint.
>Something is blobked for me here ...

>Cheers,

>Dave

If it's still not clear I suggest you try it out by hand on a
small example, it should rapidly become obvious what's going on.

David

Alf P. Steinbach

unread,

Dec 12, 2002, 11:27:43 PM12/12/02

to

On 12 Dec 2002 21:48:37 +1100, jros...@hotmail.com (Rosinger) wrote:

>Hi all,
>
>In the A* algorithm, F = G + H where H is the result returned by the
>heuristic function. In many articles that I read about A*, the one
>thing I can't understand is: why should the heuristic function return
>an underestimate figure?
>
>How do we know it is an underestimate in the first place, because at
>any given node N, we do not know the actual (real) distance to the
>goal, so how can we say underestimate if we don't know the real
>figure?
>
>Can someone please explain it in simple terms, or give me a hint.
>Something is blobked for me here ...

Assuming F(N) = distance from node N to a goal node, G(N) = known
smallest distance from the start to node N, H(N) = estimate of
distance from N to goal node.

If H(N) is an overestimate then if the algorithm reaches the goal
node by some other path with distance < F(N) then it will never
look at node N, because H(N) is then seemingly worse than an already
established path. And so even if the _actual_ distance via N is
optimal this path will then be overlooked. Which is incorrect.

If, on the other hand, H(N) is an underestimate then if the algorithm
reaches the goal node by some path with distance D, and the actual
distance via N is less than D, then it is guaranteed that F(N) will
also be less than D, so the algorithm will continue to look at N
(and perhaps find a path shorter than D).

Hth.,

- Alf (novice helper mode)

Pawel Cichocki

unread,

Dec 14, 2002, 12:43:20 AM12/14/02

to

On 12 Dec 2002 22:08:50 +1100, d...@OMIT.cs.mu.oz.au (David Kinny) wrote:
> In <at9pi5$5gk$1...@mulga.cs.mu.OZ.AU> jros...@hotmail.com (Rosinger) writes:
> >In the A* algorithm, F = G + H where H is the result returned by the
> >heuristic function. In many articles that I read about A*, the one
> >thing I can't understand is: why should the heuristic function return
> >an underestimate figure?
>
> A* explores nodes in order of increasing total estimated cost F.
> If the heuristic H is guaranteed never to overestimate then F is
> a lower bound on the actual cost of a path via a given node, ie
> the actual cost via that node must always be >= the estimate.
> A* may then "waste time" expanding a node whose distance to the
> goal is underestimated, but it will eventually discover that the
> actual cost is greater than the estimate and immediately switch
> to more promising nodes, hence ensuring that it finds an optimal
> path (if one exists) before finding a suboptimal one.
>
> If the heuristic overestimates then A* may find a suboptimal
> path to the goal before it finds an optimal one, because an
> overestimate causes it to defer exploring some node actually on
> an optimal path, choosing instead to expand some more promising
> node on a suboptimal path.

It might be worth noting that very often an overestimated heuristic makes the
algorithm find the way faster, even thou the solution is suboptimal.
So if you have an aplication where the path needs to be found _fast_ you might
consider an overestimated heuristic a valid option.

>> (...)