A*

From GPWiki

(Redirected from A star)

Files:GUITutorial_warn.gif The Game Programming Wiki has moved! Files:GUITutorial_warn.gif

The wiki is now hosted by GameDev.NET at wiki.gamedev.net. All gpwiki.org content has been moved to the new server.

However, the GPWiki forums are still active! Come say hello.

Pathfinding is the process of finding a path from one point to another. A* (a-star) is an algorithm that efficiently finds a shortest path in a graph. In many game-programming situations this is the best choice for pathfinding.

Contents

Prerequisites

To be able to perform A* pathfinding you need two things:

  • A graph of your 'world' (this does not have to be a separate data-structure, a grid-based world can be approached as a graph for example)
  • A method to estimate distances between points (if the points are in some kind of coordinate system, this should not be a problem)

In some worlds the first element may be slightly problematic. For example in a world that consists purely of 3d meshes, you might need to generate a separate pathfinding graph to be able to do proper pathfinding. That is beyond the scope of this article though.

Basic Method

Finding a path in itself is trivial to do: Just try all possible paths, and see which one is the shortest. This method has one important drawback... it takes time. In fact if you have a complicated graph/world and you want to do it often it takes so much time that you can forget about real-time performance. This is why more sophisticated pathfinding algorithms were invented - if you do it clever, you can still get the best path without looking at every possible path.

An important concept for efficient pathfinding is sorting paths while you are building them. While you can't really know whether a path will turn out to be any good before you finish it, you can save yourself a lot of time by not starting with the path that goes the opposite direction from where you want to go. A* uses some method (usually the distance between the endpoint of a path and the target point) to approximate how 'good' the path is. If you keep growing the current 'best' path, reevaluating which is the current best at each step, you'll arrive at the target without having to evaluate all the really bad paths out there.

Algorithm

The A* algorithm uses a few main concepts:

The open list 
This contains the nodes that currently need to be considered as possible starts for further extensions of the path. You start by placing the start node into this list.
The closed list 
This contains nodes that have had all their neighbours added to the open list, the paths leading to them are 'done' (not entirely, but I'll get back to that).
The G score 
I have no idea where the letter G came from but that's what this score is called in all explanations. Every node in the open or closed lists has a G score, it indicates the 'length' or 'weight' of the path leading to that node from the start node (low lengths are better, as you might have guessed).
The H score 
The H (heuristic) score resembles the G score except that it represents the distance from the node to the endpoint. Because the path is not complete yet you can not really know this score, and that is where the estimation method mentioned earlier comes in. H scores are 'guessed', in coordinate systems you approximate the distance to the endpoint to get an H score.

So you start with an empty closed list and just the starting point in your open list. For every node in these lists you'll want to store its G score and the node that was used to arrive at this node (to backtrack the path).

Now we have to extend the path until it reaches the endpoint. This goes as follows:

  • Pick the node in the open list for which the sum of the G and H scores is the lowest. The G scores are stored in the nodes of the open list, and the H scores can be calculated for them. If the open list is empty there is no path from the start point to the end point. We'll call the point picked P.
  • For every point that is adjacent to P but not in the closed list: If it is not yet in the open list, add it. The 'previous' node for these new nodes is P, and their G score is the G score of P plus the distance between the new node and P. If it was already in the open list, check it's current G score, and if the new G score would be less than the current one update the G score and 'previous' pointer, otherwise leave it alone. Adjacency in a graph means there is a connection between the points, in a grid it means a point is next to P (whether you include points that are diagonally next to P is up to you). If the new point happens to be the destination point, you've found your path.
  • Move P to the closed list, and start over.

If you use a grid that contains 'impassable' terrain, just ignore nodes that are not passable in the second step, they are not part of the pathing graph. The distance between nodes on a grid is rather straightforward. If you go up, down, left or right you use the size of a grid unit, if you go diagonally you use sqrt(2) times the size of a grid unit. If some terrain is 'difficult' to pass, you can increase the weight for paths on that terrain.

To find an optimal (shortest) path, you have to make sure that your distance estimation function underestimates the distance between points. For the distance estimation method on a grid where you only travel horizontal or vertical you could just take sum of the vertical and horizontal differences between the node and the endpoint. You can experiment a little with what method you use. If your method overestimates the distance the algorithm may run faster, but it will not always find the shortest path.

Optimizations

The names of closed and open list seem to suggest lists as their implementation. This is not the worst choice, but you can do better. The lists have to be searched an awful lot, and if you can make that more efficient you gain a lot of speed.

For the open list you'll need to find the node with the lowest G + H score. Firstly, you'll want to store that score in every node to make searching easier. Keeping the nodes sorted helps, and the binary heap data structure is a more efficient way of doing this in this case.

The closed list will have to be searched each time a new node is looked at - if it is already in closed list it has to be ignored. The relevant property in this search is the location of the node, in a grid you can keep your nodes sorted by location, maybe in a quadtree structure.

If you only want to search a limited area of a grid for a path (or the grid is just small) you can gain a tremendous speed advantage by just allocating all the nodes in the grid right away in a big array. This way you won't have to dynamically allocate any storage during execution, and you can very easily locate nodes in the 'closed list' - instead of a list this will just be a 'closed flag' that is set on nodes that are closed. For the open list you keep a binary heap with pointers into this node array. I'll leave the details up to you.

Links