GTFS Best Practices versus abusive implementations

Stefan de Konink

Jan 21, 2024, 8:24:47 AM
to GTFS Changes
Hello,

For over 10 years we have been providing nationwide static and real-time
feeds, and we have been participating in the standardisation discussions.
While <https://old.gtfs.org/best-practices/> and the general
recommendations are very clear, it recently came to light that some
vendors are effectively running a distributed denial-of-service attack
on our infrastructure. Distributed, you may wonder, isn't that a bold
statement? When a Diamond member of MobilityData, using many of their
assigned subnets, has accessed our static GTFS (135-250 MB) 66,000 times
since August 2023, it becomes the worst possible lead-by-example situation.

As the head of a not-for-profit foundation, I have stated more than once
that open data cannot be made available, unconditionally, to anyone that
requests it. This is opposed to the governmental standpoint that even
repetitive requests for the same data, by the same party, should be
facilitated.

With all the current wokeness around """climate change""", consider the
above: some party has the infrastructure to download and process, 66,000
times over, a timetable of less than a gigabyte holding 10 million
stop_times; at 135-250 MB per download that is roughly 9 to 16 TB of
transfer. The IP space used for just these downloads would make a small
ISP jealous. And that holds even if the file was only downloaded to
compute a checksum. We cannot be the only ones noticing this. The
reaction that came forth offered excuses: some parties did not have the
correct time configured on their server, and the party wanted to update
the timetable as soon as it changed in order to match GTFS-RT. It showed
no acknowledgement of how poorly this party uses resources given to them
at no additional cost. My "Donations are welcome" was silently ignored.
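
To be concrete about the checksum point: a consumer that merely wants to
know whether the file changed does not need to download it at all. A
minimal sketch of the idea (the URL below is a placeholder, not our
endpoint): an HTTP HEAD request returns the ETag, Last-Modified and
Content-Length headers without transferring a single byte of the
135-250 MB body.

    import urllib.request

    # Placeholder URL; substitute the producer's actual static feed location.
    FEED_URL = "https://example.org/gtfs/gtfs.zip"

    # A HEAD request returns only the response headers, never the body,
    # which is enough to decide whether a new download is needed at all.
    req = urllib.request.Request(FEED_URL, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        print("ETag:", resp.headers.get("ETag"))
        print("Last-Modified:", resp.headers.get("Last-Modified"))
        print("Content-Length:", resp.headers.get("Content-Length"))

Compare the returned ETag or Last-Modified against the previous run, and
only fetch the body when they differ.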


That party was the worst in class, but even for other "apps" we see over
8 parallel downloads of just our static feed, from the same source IP,
within a single minute.


Now you may wonder: how does this look for GTFS-RT? We have noticed that
some parties actually care: they do implement checks to first validate
whether our content has changed. But how often does a producer publish
useful updates to vehiclePositions or tripUpdates on its publication
platform? Considering the lack of progress on DIFFERENTIAL
<https://github.com/google/transit/issues/84>, which we have had
available for over 10 years, and given our file sizes, a FULL_DATASET
once a minute is more than enough. How often do you think vendors query?
We observe vendors that fetch updates every second, without any effort
to limit traffic via If-Modified-Since or If-None-Match.
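
Honouring these headers costs a consumer a few lines of code. A minimal
polling sketch (again with a placeholder URL; the exact caching policy
is up to the consumer): remember the validators from the previous
response and send them back, so that an unchanged feed costs a 304 Not
Modified of a few hundred bytes instead of the full protobuf.

    import urllib.error
    import urllib.request

    # Placeholder endpoint; substitute the producer's actual GTFS-RT URL.
    RT_URL = "https://example.org/gtfs-rt/tripUpdates"

    etag = None           # ETag header from the previous response
    last_modified = None  # Last-Modified header from the previous response

    def fetch():
        """Return the new FeedMessage bytes, or None if nothing changed."""
        global etag, last_modified
        req = urllib.request.Request(RT_URL)
        # Conditional headers: the server answers 304 when nothing changed.
        if etag:
            req.add_header("If-None-Match", etag)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                etag = resp.headers.get("ETag")
                last_modified = resp.headers.get("Last-Modified")
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code == 304:
                return None  # feed unchanged, nothing was downloaded
            raise

Run this once a minute, not once a second; with a FULL_DATASET published
every minute there is simply nothing new to fetch in between.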

Enough is enough. I would urge every producer to implement rate
limiting, and I propose we start a name-and-shame list. The following
gist is an example for haproxy that implements rate limiting in a smart
way; it may be updated in the future.

<https://gist.github.com/skinkie/f5f02582142c2a216e2487456ac8bd57>
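
For producers who do not front their feeds with haproxy, the underlying
idea can be illustrated in application code as well (a sketch of the
principle only, not the contents of the gist): count full downloads per
source IP in a sliding window and refuse with 429 Too Many Requests once
a client exceeds a sane budget, while conditional requests answered with
304 stay cheap and unthrottled.

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 3600   # length of the sliding window
    MAX_DOWNLOADS = 4       # full-feed downloads allowed per IP per window

    _hits = defaultdict(deque)  # source IP -> timestamps of recent downloads

    def allow_download(client_ip: str) -> bool:
        """True if this IP may download the full feed; False means send 429."""
        now = time.monotonic()
        hits = _hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= MAX_DOWNLOADS:
            return False
        hits.append(now)
        return True

Four full downloads per hour is already generous for a static feed.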

--
Stefan