Programmatically, the agent works backwards from the goal to build a map of potential value (reward minus costs) for each square.
The number in each square represents how hard it is to get from there to the goal: the potential reward is reduced by the sum of the costs incurred along the way from that square to the goal. This is why the program computes it backwards/recursively, starting at the goal.
By knowing how hard it is to reach the goal from any given square, the agent can choose the path that maximizes its expected share of the goal reward.
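To make the backward pass concrete, here is a minimal sketch. The grid size, goal position, reward, and step cost are all made-up numbers for illustration, not from the assignment, and I'm assuming uniform costs so a breadth-first sweep from the goal finds the cheapest path:

```python
# Backward pass: value of a square = goal reward minus the summed
# step costs along the cheapest path from that square to the goal.
# Grid dimensions, costs, and goal position are illustrative assumptions.
from collections import deque

ROWS, COLS = 3, 4
GOAL = (0, 3)
GOAL_REWARD = 1.0
STEP_COST = 0.04  # assumed cost of taking one step

def neighbors(r, c):
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            yield nr, nc

# Breadth-first from the goal: each layer is one more STEP_COST away.
value = {GOAL: GOAL_REWARD}
frontier = deque([GOAL])
while frontier:
    square = frontier.popleft()
    for nxt in neighbors(*square):
        if nxt not in value:  # first visit = cheapest path (uniform costs)
            value[nxt] = value[square] - STEP_COST
            frontier.append(nxt)

# Farthest corner is 5 steps away: 1.0 - 5 * 0.04
print(round(value[(2, 0)], 2))  # 0.8
```

Each square's number ends up being "reward at the goal, minus everything it costs to get there from here," which is exactly the map described above.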
The agent can never be sure of the future or of exactly how much reward it will receive, because the action phase is stochastic: at each step it is not guaranteed to get the outcome it intended.
So the number is the potential reward; I say "potential" because this is a stochastic environment. That reward is adjusted for the expected cost of getting from that square to the reward/goal state, and those costs are part of the problem givens.
It is not necessary for the number in a square to match the actual cost the agent incurs during real execution; it only needs to rank the actions consistently, so the agent can build a policy that says which action is best to take in each state.
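A toy sketch of how the stochastic part changes the backup: the backed-up number becomes an expected value over outcomes, and the policy just picks the action with the best one. The 80/10/10 slip model and all the numbers here are my own assumptions for illustration:

```python
# Value iteration on a 1-D corridor with a stochastic "move" action:
# the agent moves as intended 80% of the time, slips backwards 10%,
# and stays put 10%. All numbers are illustrative assumptions.
N = 5                 # squares 0..4, goal at square 4
GOAL = 4
GOAL_REWARD = 1.0
STEP_COST = 0.04
SLIP = 0.1

def clamp(i):
    return max(0, min(N - 1, i))

def expected_value(V, s, move):
    # Expected value of trying to move in direction `move` from square s.
    return (0.8 * V[clamp(s + move)]
            + SLIP * V[clamp(s - move)]
            + SLIP * V[s])

V = [0.0] * N
V[GOAL] = GOAL_REWARD
for _ in range(100):                 # sweep until the values settle
    for s in range(N):
        if s == GOAL:
            continue
        best = max(expected_value(V, s, m) for m in (-1, +1))
        V[s] = best - STEP_COST      # expected reward minus step cost

# Policy: in each square, take the action with the best expected value.
policy = [max((-1, +1), key=lambda m: expected_value(V, s, m))
          for s in range(N - 1)]
print(policy)   # every square points toward the goal: [1, 1, 1, 1]
```

Note the policy only needs the *comparison* between the two expected values to come out right; the absolute numbers in V never have to match what the agent actually experiences on any one run.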
This is my understanding, hope it helps, someone else might have a more succinct or technically correct answer.
--David