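Hmm, OK, I use the CIF format rather than the JSON one. Keying on train UID, schedule start date and STP indicator, I see a handful of duplicates in the full JSON schedule but none in the CIF one (the 42046 all-null rows in the JSON output are just the lines with no JsonScheduleV1 object, so they can be ignored):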
$ zcat CIF_ALL_FULL_DAILY%2Ftoc-full.gz | jq '.JsonScheduleV1 | {schedule_start_date,CIF_stp_indicator,CIF_train_uid}' -c | sort | uniq -c | sort -n | tail
1 {"schedule_start_date":"2017-12-09","CIF_stp_indicator":"P","CIF_train_uid":"P52168"}
2 {"schedule_start_date":"2016-12-11","CIF_stp_indicator":"P","CIF_train_uid":"Y60610"}
2 {"schedule_start_date":"2016-12-15","CIF_stp_indicator":"P","CIF_train_uid":"H20523"}
2 {"schedule_start_date":"2016-12-17","CIF_stp_indicator":"P","CIF_train_uid":"H02062"}
2 {"schedule_start_date":"2016-12-17","CIF_stp_indicator":"P","CIF_train_uid":"H14818"}
2 {"schedule_start_date":"2017-01-02","CIF_stp_indicator":"P","CIF_train_uid":"P70644"}
2 {"schedule_start_date":"2017-02-05","CIF_stp_indicator":"O","CIF_train_uid":"Y69696"}
2 {"schedule_start_date":"2017-02-05","CIF_stp_indicator":"O","CIF_train_uid":"Y69703"}
2 {"schedule_start_date":"2017-05-21","CIF_stp_indicator":"P","CIF_train_uid":"H37737"}
42046 {"schedule_start_date":null,"CIF_stp_indicator":null,"CIF_train_uid":null}
$ zcat CIF_ALL_FULL_DAILY%2Ftoc-full.CIF.gz | grep ^BS | cut -c4-15,80 | sort | uniq -c | sort -n | tail
1 Y76899170114P
1 Y76902170109C
1 Y76902170109P
1 Y76903170114C
1 Y76903170114P
1 Y76904170109C
1 Y76904170109P
1 Y76907170109P
1 Y76909170114P
1 Y76911170108P
$
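
In case it helps anyone repeat the check, here is a rough sketch that prints only the keys that occur more than once, rather than eyeballing the tail of the counts. It assumes the same layout as the cut above (UID and start date in columns 4-15 of each BS record, STP indicator in column 80); the function name cif_duplicate_keys is just something I made up for illustration.

```shell
# Hypothetical helper: reads CIF records on stdin and prints each
# (train UID, start date, STP indicator) key that occurs more than once.
# Assumes fixed-width BS records with the STP indicator in column 80.
cif_duplicate_keys() {
  grep '^BS' |        # keep only Basic Schedule records
    cut -c4-15,80 |   # columns 4-9 UID, 10-15 start date, 80 STP indicator
    sort |            # uniq only compares adjacent lines, so sort first
    uniq -d           # print one copy of each repeated key
}

# Usage (empty output means no duplicate keys):
#   zcat CIF_ALL_FULL_DAILY%2Ftoc-full.CIF.gz | cif_duplicate_keys
```

Using uniq -d instead of uniq -c means the output is empty when every key is unique, which is easier to check in a script than scanning for counts above 1.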