Label images in PASCAL VOC are palette images, meaning that pixel values are indices into a palette (the palette is embedded in the PNG file). Therefore, DIGITS does not need to do any conversion when creating the dataset: labels in the dataset have exactly the same data as in the original label images. Similarly, at the output of the network, DIGITS extracts the predicted classification for every pixel to determine each pixel's class ID. DIGITS then uses the palette from the PASCAL VOC labels to render a visualization of the segmentation with the same colors. If you were to provide your own class names, DIGITS would proceed in the same way. DIGITS does not need to know how the PASCAL VOC folks created their color map because that information is already provided in the palette of each label image.
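To make this concrete, here is a small sketch using PIL and NumPy (not DIGITS code, just an illustration) showing that the raw pixel data of a palette-mode label image is already the class IDs, with the colors stored separately in the embedded palette:

```python
from PIL import Image
import numpy as np

# Build a tiny palette-mode ("P") image, mimicking a PASCAL VOC label.
label = Image.new("P", (4, 4), color=0)  # class 0 = background
label.putpixel((1, 1), 15)               # class 15 = "person" in PASCAL VOC

# Attach a palette: 256 RGB triplets, index 15 -> VOC's "person" color.
palette = [0] * 768
palette[15 * 3:15 * 3 + 3] = [192, 128, 128]
label.putpalette(palette)

# Reading the image back as an array yields the class IDs directly --
# no color decoding is needed, which is why DIGITS can use it as-is.
ids = np.array(label)
print(label.mode)   # 'P'
print(ids[1, 1])    # 15
```

The palette only matters when displaying the image; for training, the index array itself is the label.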
Alternatively, some datasets (such as SYNTHIA) have their label images in RGB. For those, you do need to tell DIGITS how to map colors to class IDs by providing a color map text file. The color map is used to create the label dataset. In any case, image segmentation datasets in DIGITS always have single-channel labels, where each pixel value is the ID of the target class.
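For the RGB case, the conversion DIGITS performs from colors to class IDs can be sketched as follows (the specific colors and classes below are illustrative, not SYNTHIA's actual definitions):

```python
import numpy as np

# Hypothetical color map, in the spirit of a color map text file:
# each RGB color is assigned a class ID.
COLOR_MAP = {
    (0, 0, 0): 0,        # void
    (128, 64, 128): 1,   # road (illustrative color)
    (64, 0, 128): 2,     # car  (illustrative color)
}

def rgb_to_ids(rgb):
    """Convert an HxWx3 RGB label image to an HxW array of class IDs."""
    ids = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, cls in COLOR_MAP.items():
        mask = np.all(rgb == np.array(color, dtype=np.uint8), axis=-1)
        ids[mask] = cls
    return ids

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = (128, 64, 128)   # a road pixel
rgb[1, 1] = (64, 0, 128)     # a car pixel
print(rgb_to_ids(rgb))
```

The result is the same kind of single-channel ID image that palette-mode datasets provide out of the box.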
I hope this helps.