Another example from this morning involves trains 406 and 408.
406 struck an auto at a crossing and was taken out of service. It disappeared from the API long before it disappeared from the legacy feed, but that's OK because it was out of service. (However, I would still expect it to appear in the vehiclesByRoutes calls for as long as it was showing in the legacy feed.)
Train 408, however, never went out of service. It held at Belmont for roughly an hour before moving up beside 406 to take on 406's passengers, then finished its trip. When it reached Porter, the legacy feed was reporting it as 60 minutes late. Despite never being taken out of service, 408 disappeared from the API around 9 AM while it was waiting at Belmont.
Obviously, in this example the only passengers affected by the missing train 408 data were those waiting at Porter to go to North Station, or waiting at North Station to meet an arriving passenger (both are typically small head counts). But what if it had been an outbound train instead, or an inbound train early in its trip? It appears that after X minutes without movement a train is dropped from the API completely and never returns, even if it resumes its trip after that X-minute period.
(X seems to be roughly in the range of 20 to 30 minutes.)
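To pin X down more precisely, one could log each poll of both feeds and measure how long a train had been stationary when it first went missing from the API. The following is only a rough sketch: the function name and the (timestamp, position, in_api) sample layout are my own assumptions, with the position taken from the legacy feed and in_api recording whether the train showed up in the API on that poll.

```python
from datetime import datetime, timedelta

def idle_time_at_drop(samples):
    """Return how long a train had been stationary when it first went
    missing from the API, or None if it never went missing.

    samples: time-ordered (timestamp, position, in_api) tuples for one
    train. This layout is hypothetical, not something either feed returns.
    """
    last_pos = None
    last_move = None
    for ts, pos, in_api in samples:
        if pos != last_pos:
            # The legacy feed shows a new position, so the train moved.
            last_pos = pos
            last_move = ts
        if not in_api and last_move is not None:
            # First poll where the API no longer lists the train.
            return ts - last_move
    return None
```

The result is only as precise as the polling interval, so it gives an upper bound on X rather than an exact value.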
I feel strongly that in exceptional cases where a train is delayed for a significant amount of time, it is even more important to provide data on it: at a minimum, all the data points included in the vehiclesByRoutes responses.
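I can't see how the API actually decides which trains to drop, so the following is only a sketch of the behavior I am asking for, not a description of the real implementation. The Vehicle fields, the function name, and the 30-minute default are all hypothetical. The idea is that a long idle time should flag a train as stalled rather than remove it; only going out of service should remove it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Vehicle:
    trip: str             # e.g. "408"
    in_service: bool      # False once the train is taken out of service
    last_moved: datetime  # time of the last observed position change

def vehicles_to_report(vehicles, now, stale_after=timedelta(minutes=30)):
    """Return (vehicle, is_stalled) pairs for every in-service vehicle.

    Idle time only sets the is_stalled flag; it never filters a vehicle out.
    """
    return [
        (v, now - v.last_moved >= stale_after)
        for v in vehicles
        if v.in_service
    ]
```

Under this rule a train like 408 would have stayed in the results, marked as stalled, for the whole time it held at Belmont, while 406 would have dropped out once it was taken out of service.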