Hmm, it looks like you're right. That is definitely an oversight and we should have docs there.
You can imagine all data in a Druid cluster being arranged over a timeline. Each segmentGranularity region of your timeline has a set of segments that contains the data for that time region. So if your segmentGranularity is "hour", then each hour might have 1 or 2 or any number of segments that together form a complete set. Each of those segments has the same version number but different partition numbers. The partitionNums start at 0 and go up as far as necessary.

When the broker receives a query for a particular hour, it will find the most recent segment version for that hour, then find all partitionNums associated with that highest version, then send the query to nodes that are currently serving those segments, and finally merge the results. If it finds multiple segments with the same partitionNum, it will just pick one of them, but it will query all unique partitionNums it finds. This is because it assumes that two segments with the same datasource, interval, version, and partitionNum actually contain the same data and only exist as separate segments for purposes of redundancy.
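To make the selection rule concrete, here's a minimal sketch of that broker logic in Python (the tuple data model and server names are illustrative, not Druid's actual classes):

```python
# Sketch of the broker's segment-selection rule for one
# segmentGranularity interval: pick the highest version, then query
# exactly one serving copy per unique partitionNum.
def select_segments(segments):
    """segments: list of (version, partitionNum, server) tuples."""
    if not segments:
        return []
    best_version = max(version for version, _, _ in segments)
    by_partition = {}
    for version, partition, server in segments:
        if version == best_version and partition not in by_partition:
            # Duplicates of the same partitionNum are assumed to hold
            # identical data, so any one replica will do.
            by_partition[partition] = (version, partition, server)
    return sorted(by_partition.values(), key=lambda s: s[1])

segments = [
    ("v2", 0, "historical-1"),
    ("v2", 0, "historical-2"),  # replica of partition 0, skipped
    ("v2", 1, "historical-3"),
    ("v1", 0, "historical-4"),  # superseded version, ignored
]
print(select_segments(segments))
# -> [('v2', 0, 'historical-1'), ('v2', 1, 'historical-3')]
```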
In batch indexing, the DeterminePartitions job first determines how many segments need to be created for each segmentGranularity region, and then the IndexGenerator job actually creates those segments and assigns them unique partitionNums. The historical nodes then load the segments with whatever replication factor you have configured.
In realtime indexing, each realtime node (or realtime task, in the indexing service) creates and serves segments with a single partitionNum. The shardSpecs control the manner in which those segments are advertised. There are a few different options, but most people choose linear shardSpecs, where all you need to specify is the partitionNum: {"type":"linear","partitionNum":0}.

Because of the way the brokers make queries, you can use shardSpecs both to create redundancy and to scale out ingestion. To create redundancy, create two or more realtime nodes/tasks with the same shardSpec and the same underlying data (since the broker will assume same partitionNum = same data, and will only query one of them). To scale out ingestion, create two or more realtime nodes/tasks with different shardSpecs and different underlying data (since the broker will query all of them and merge the results). You can also do both.
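As a concrete sketch of doing both at once (the task labels are just for illustration), four tasks could carry two partitions with two replicas each:

```
taskA-replica1: {"type": "linear", "partitionNum": 0}
taskA-replica2: {"type": "linear", "partitionNum": 0}   (same data as replica1)
taskB-replica1: {"type": "linear", "partitionNum": 1}
taskB-replica2: {"type": "linear", "partitionNum": 1}   (same data as replica1)
```

The broker will query one task per partitionNum (0 and 1) and merge the results, so you get 2x ingestion throughput and can lose one task per partition without losing data.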