Hi,
you can find the adaptations to the model architecture in the models/yolo.py file.
We modified the Detect / IDetect layers, which are the final layers of the YOLOv7 network for object detection.
These layers provide the output vector for each anchor, consisting of (x_center, y_center, width, height, objectness, class_probs).
We extended this vector with the distance estimate -> (x_center, y_center, width, height, objectness, class_probs, dist_estimate).
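To make the extended layout concrete, here is a minimal sketch in plain Python. It is not the code from models/yolo.py: the names `outputs_per_anchor`, `split_prediction`, and `Prediction` are hypothetical, and it assumes the distance estimate is appended as a single value at the end of the vector, as described above.

```python
from typing import NamedTuple, Sequence, Tuple


class Prediction(NamedTuple):
    """One per-anchor prediction, split into named parts (hypothetical helper)."""
    box: Tuple[float, ...]          # (x_center, y_center, width, height)
    objectness: float
    class_probs: Tuple[float, ...]  # one entry per class
    dist_estimate: float            # the extra value appended at the end


def outputs_per_anchor(num_classes: int) -> int:
    """Length of the per-anchor vector: 4 box values + objectness + classes + distance."""
    return 4 + 1 + num_classes + 1


def split_prediction(vec: Sequence[float], num_classes: int) -> Prediction:
    """Split a flat per-anchor vector into its components."""
    expected = outputs_per_anchor(num_classes)
    if len(vec) != expected:
        raise ValueError(f"expected {expected} values, got {len(vec)}")
    return Prediction(
        box=tuple(vec[0:4]),
        objectness=vec[4],
        class_probs=tuple(vec[5:5 + num_classes]),
        dist_estimate=vec[5 + num_classes],
    )
```

For a model with 80 classes this gives 86 values per anchor instead of the usual 85.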
During training and inference, sigmoid is applied to the distance estimate (line 164 for inference, for training see loss.py).
We train on the normalized distance (normalized according to the strategy set in the hyperparameters file), so there is no need to rescale the output during training.
During inference, however, we want to provide a metric estimate, so we rescale the value by inverting the normalization (see line 167).
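The sigmoid-then-rescale step can be sketched as follows. This is an illustration only: the actual normalization strategy is defined in the hyperparameters file, and here I simply assume a linear scaling by a maximum distance. The names `max_dist_m`, `normalize_distance`, `denormalize_distance`, and `decode_distance` are hypothetical and do not come from the repository.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalize_distance(dist_m: float, max_dist_m: float) -> float:
    """ASSUMED strategy: linear scaling to [0, 1], clipped at max_dist_m."""
    return min(max(dist_m / max_dist_m, 0.0), 1.0)


def denormalize_distance(norm_dist: float, max_dist_m: float) -> float:
    """Inverse of the assumed linear normalization, returns metres."""
    return norm_dist * max_dist_m


def decode_distance(raw_output: float, max_dist_m: float) -> float:
    """Inference path: sigmoid on the raw network output, then rescale to metres."""
    return denormalize_distance(sigmoid(raw_output), max_dist_m)
```

During training only the `sigmoid` step would apply, since the targets are already normalized. If the hyperparameters file selects a different strategy (for example a logarithmic one), the inverse in `denormalize_distance` has to change accordingly.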
You can inspect the commit history of the repository to see all the changes to the architecture.
The most important ones are to:
- Model Architecture (the part I just explained)
- Loss Function
- Data Loader
- Train / Test Scripts