2. Consumer-pays model
Another solution I've seen recently is the consumer-pays-for-egress model used in Amazon's release of OpenStreetMap data. Here's one link; I couldn't find the exact article explaining it: https://registry.opendata.aws/
Under this model, the dataset is openly available and queryable, meaning you can fetch just the parts you want. But if you, as a consumer, want to fetch a large amount of it, you pay the bandwidth costs incurred.
And if you consume less, you might fall under a free slab (I'm not sure whether that's the case with AWS).
Provided we retain a basic free tier, I think this takes care of a lot of problems. Now, I wouldn't want India's governments to become beholden AWS customers (the USA and others already are, and there are national security considerations that we SHOULD take seriously, not scoff at). Rather, as was done with UPI, there should be Indian-owned infrastructure and services offering the same deal. Maybe a prepaid wallet that gets debited when we exceed the free tier.
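To make the free-tier-plus-prepaid-wallet idea a bit more concrete, here's a minimal Python sketch of how the metering could work. Everything in it is hypothetical: the 5 GiB free allowance, the price of 100 paise (1 rupee) per GiB, and the `PrepaidWallet` / `charge_download` names are made up for illustration and aren't taken from AWS or any real service. Amounts are kept in integer paise and bytes so there's no floating-point rounding.

```python
from dataclasses import dataclass

GIB = 1024**3
# Hypothetical numbers for illustration only, not from any real service.
FREE_TIER_BYTES = 5 * GIB   # free allowance per billing period
PAISE_PER_GIB = 100         # 1 rupee per GiB beyond the free tier


class InsufficientBalance(Exception):
    """Raised when a download would cost more than the wallet holds."""


@dataclass
class PrepaidWallet:
    balance_paise: int      # prepaid money, in paise
    used_bytes: int = 0     # bytes served so far this period

    def charge_download(self, size_bytes: int) -> int:
        """Record a download and debit only the part beyond the free tier.

        Returns the amount charged in paise. Nothing is recorded or
        debited if the balance is too low.
        """
        free_left = max(FREE_TIER_BYTES - self.used_bytes, 0)
        billable = max(size_bytes - free_left, 0)
        # Integer ceiling division, so a partial GiB still costs something.
        cost = (billable * PAISE_PER_GIB + GIB - 1) // GIB
        if cost > self.balance_paise:
            raise InsufficientBalance(
                f"need {cost} paise, have {self.balance_paise}")
        self.balance_paise -= cost
        self.used_bytes += size_bytes
        return cost


# Usage: a consumer tops up 1000 paise (10 rupees).
wallet = PrepaidWallet(balance_paise=1000)
wallet.charge_download(4 * GIB)  # 0: fully inside the free tier
wallet.charge_download(3 * GIB)  # 200: 1 GiB was still free, 2 GiB billed
```

The casual user who stays under the free slab never pays anything. Only the heavy downloader's wallet gets debited, which is the whole point of the model.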
3. Rate limiting
Often it's not the quantity but the velocity of data scraping that inflicts high costs on the provider. If scrapers fetched data slowly overnight instead of trying to grab everything in 10 minutes during peak business hours, we might be able to make things work with the existing setup already used to serve data to the sites.
Example: the main concern for, say, Indian Railways in serving train schedule data would be that the server should not get so clogged by scraping bots that people trying to book tickets face downtime. Enforcing rate limits can help a lot here. A basic figure: allow each IP or user a maximum of 4 requests per minute. I recently implemented this using Kong Gateway and was surprised by how easy it was.
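With a gateway like Kong you get rate limiting through its plugin configuration rather than by writing code. Still, a small Python sketch shows how simple the underlying idea is. This is a sliding-log limiter enforcing the "4 requests per minute per IP or user" figure above. It's my own illustration and not how Kong implements it internally. The class name and the injectable `clock` parameter are hypothetical. The clock is there only so the behaviour can be tested without waiting a real minute.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int = 4, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # One queue of request timestamps per key (IP address or user ID).
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        """Return True if this request from `key` may proceed."""
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the server would answer HTTP 429 Too Many Requests
        hits.append(now)
        return True


# Usage with a fake clock, so we can "advance time" by hand.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=4, window=60.0, clock=lambda: fake_now[0])
first_five = [limiter.allow("203.0.113.7") for _ in range(5)]
# first_five == [True, True, True, True, False]
fake_now[0] = 60.0
limiter.allow("203.0.113.7")  # True again: the earlier hits have expired
```

A scraper that patiently sends 4 requests a minute is never blocked. It just can't hammer the server during booking hours. In a real deployment you'd let the gateway (or a shared store like Redis) keep these counters instead of an in-process dict, because the dict here never forgets old keys and isn't shared across servers.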
In many cases I suspect the people in charge had no idea rate limiting was even possible, so they went for the next option: CAPTCHA restrictions that block automated data fetching entirely. Funnily, that's a far more expensive measure than rate limiting! And now we have an arms race: scrapers crack the CAPTCHA, providers make it harder to read, and eventually humans won't be able to read it either. Maybe we can put down our weapons, take a few steps back, and communicate that there are options that work for both sides?
4. Load shifting
One load-shifting example I've seen is in my net banking: if I request long-term account statements, instead of trying to deliver the data immediately, it queues the task in the backend and tells me to carry on and come back in a few minutes to download it. Other sites email me a link when they're done gathering the requested data. We could do something similar here, shifting the server load from peak-time spikes to the "lazy" periods later, when traffic is low. The institution can stay on cheaper infrastructure, doesn't incur higher costs, and still accomplishes the goal of providing the data.
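Here's a rough Python sketch of that "accept now, build later" pattern, just to show how little is involved. The request handler only records a job and returns an ID. The expensive work runs in batches, and only inside an off-peak window. The 23:00 to 05:59 window, the `DeferredExportService` class, and its method names are all hypothetical. I'm also assuming some scheduler (cron or similar) calls `run_batch` periodically with the current hour.

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional


def is_off_peak(hour: int) -> bool:
    """Hypothetical off-peak window: 23:00 to 05:59."""
    return hour >= 23 or hour < 6


@dataclass
class Job:
    job_id: int
    params: dict
    status: str = "queued"          # moves from "queued" to "done"
    result: Optional[str] = None


class DeferredExportService:
    """Accept heavy export requests immediately, but build them later."""

    def __init__(self, build: Callable[[dict], str]) -> None:
        self._build = build         # the expensive query/export step
        self._queue: Deque[int] = deque()
        self._jobs: Dict[int, Job] = {}
        self._ids = itertools.count(1)

    def submit(self, params: dict) -> int:
        """Cheap call made at request time: record the job, return its ID."""
        job = Job(next(self._ids), params)
        self._jobs[job.job_id] = job
        self._queue.append(job.job_id)
        return job.job_id

    def status(self, job_id: int) -> str:
        return self._jobs[job_id].status

    def result(self, job_id: int) -> Optional[str]:
        return self._jobs[job_id].result

    def run_batch(self, hour: int, max_jobs: int = 100) -> int:
        """Process queued jobs, but only off-peak. Returns how many were done."""
        if not is_off_peak(hour):
            return 0
        done = 0
        while self._queue and done < max_jobs:
            job = self._jobs[self._queue.popleft()]
            job.result = self._build(job.params)
            job.status = "done"
            # A real system might email the user a download link here.
            done += 1
        return done


# Usage: the user submits during the day and downloads the next morning.
svc = DeferredExportService(build=lambda p: f"statement-{p['account']}.csv")
jid = svc.submit({"account": "XXXX1234"})
svc.run_batch(hour=14)  # 0: peak hours, nothing runs
svc.run_batch(hour=2)   # 1: off-peak, the export gets built
svc.result(jid)         # "statement-XXXX1234.csv"
```

The `max_jobs` cap is there so even the off-peak batch can't swamp the server. If someone queues up a mountain of requests, they simply take a few nights to drain.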
A mental note:
I came across a quote the other day: "A government that is expected to do everything for us will take everything from us."
Too often I see this attitude of expectation among folks, like "everything must be provided, free and openly accessible!", without considering what it all takes, or the fact that they're not the only scrapers on the planet, or that there will always be an unlimited supply of idiots hogging all the resources for no purpose other than showing off their latest Go code's concurrency stats.
I agree that we're paying taxes, but those taxes are already accounted for (very inefficiently, but yes), and demanding that they fund all this new web infrastructure to give us more free stuff leads to the same convenient outcome: an increase in our taxes. It's already happening, and it's not sustainable. I especially don't appreciate paying more taxes just for the sake of that idiot with the Go code :D.
And you can bet that if we demand that such-and-such government institution make everything freely available without caring about the details, it will do so in the most expensive and wasteful way imaginable. The solutions I've outlined above seem a lot better to me than raising my taxes and creating yet another black hole in the government budget.
I don't know exactly how we can make things work out, but if we ditched the attitude of expectation and thought more like team players, with the government as part of that team, we could go a long way.