Omar,
There are a number of ways you can set up your network(s) for this problem, but if you are using publicly available code, it almost certainly isn't creating ten networks.
Instead, it has a single network with some number of layers, and then the last layer is a 10-way softmax. So, the last layer (the Softmax) is what takes the information about the image that is encoded by the lower layers, and translates that into a prediction about how likely the image is to be in class 1 (the written number "1"), class 2, (the written number "2"), class 3 (the written number "3"), etc.
---
In practice, this is more computationally efficient than training ten separate networks. If you wanted to train 10 totally separate networks, the easiest way to do that would be with a for loop that iterates through the ten classes, creates a binary variable indicating which observations are in that class, and then builds a network to predict that binary variable.
But if I were in your shoes, I'd prefer what the conventional 10-way softmax is doing.
Dan