It'll be by encryption. The scheme we're planning will use "profile dats" which represent identities. Because dats are identified by public keys, it'll be possible to encrypt messages for a profile-dat using its URL. Key distribution will be through a Web-of-Trust which is assisted by identity services. (More on that another time, but Dat provides similar protocol guarantees to Certificate Transparency, and so it makes a good foundation for identity & key-distribution services.)
It might be possible to use an access-control scheme as well, but it would require 1) users to authenticate on the network, and 2) all recipients to honor the desired access control.
I should also point out that dat URLs are unguessable, and the network encrypts the traffic and discovery such that you have to possess the URL in order to read the network traffic. That makes the URLs secret. Put another way, they are "read capabilities."
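To make the "read capability" idea concrete, here's a rough sketch rather than the actual implementation. It assumes the network-visible identifier (the "discovery key") is a keyed BLAKE2b hash of the constant `hypercore` with the dat's public key as the hash key, which is my understanding of how hypercore derives it. The random key and the `discovery_key` helper name are made up for illustration.

```python
import hashlib
import os


def discovery_key(public_key: bytes) -> bytes:
    """Derive the identifier announced on the network from the secret read key.

    Assumption: keyed BLAKE2b over the constant b"hypercore", 32-byte digest.
    The hash is one-way, so seeing the identifier does not reveal the key.
    """
    return hashlib.blake2b(b"hypercore", digest_size=32, key=public_key).digest()


# Hypothetical 32-byte public key; in a real dat this is the hex part of the URL.
public_key = os.urandom(32)
url = "dat://" + public_key.hex()

# Peers announce and look up this value, never the public key itself.
announced = discovery_key(public_key)
```

Somebody watching discovery traffic only sees `announced`. Without the URL they can't recover `public_key`, which is why possessing the URL works as the capability to read.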
Gains:
- Can read/write without an internet connection
- Also works over the local wifi
- No network latency for writes, and downloaded data is cached for fast reads as well
- User controls the dataset and can move it between applications at will
- Better privacy story with end-to-end encryption
- Bandwidth sharing (p2p)
- No ops for developers because there's no service to maintain
Losses:
- As you say, no central authority enforcing rules on what can/can't be read and written
- Data aggregated from multiple users is eventually consistent; no transactions or strict ordering are possible
- No concept of "global knowledge"; there may be updates or datasets that haven't reached your computer yet
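The last two losses can be illustrated with a toy model. This is only a sketch under my own assumptions: each user's dat is treated as an append-only list of entries, and the `Entry` fields and `merged_view` helper are hypothetical names, not part of Dat.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    author: str      # hypothetical label for the dat that holds the entry
    seq: int         # position in that author's append-only log
    claimed_ts: int  # timestamp chosen by the author; nobody verifies it
    body: str


def merged_view(logs: Dict[str, List[Entry]]) -> List[Entry]:
    """Build a reader-side view from whichever logs have been downloaded so far."""
    entries = [e for log in logs.values() for e in log]
    # There is no global sequence number, so order by claimed time and
    # break ties deterministically by author and per-log position.
    return sorted(entries, key=lambda e: (e.claimed_ts, e.author, e.seq))


alice = [Entry("alice", 0, 10, "hello"), Entry("alice", 1, 30, "anyone there?")]
bob = [Entry("bob", 0, 20, "hi alice")]

before = merged_view({"alice": alice})              # bob's dat hasn't arrived yet
after = merged_view({"alice": alice, "bob": bob})   # bob's dat has synced
```

Once bob's log syncs, his entry lands in the middle of a view the reader had already seen. Two readers with different subsets of the data will disagree until both have everything, and nothing stops an author from claiming any timestamp they like.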
You're right about the scaling profile, in that most users aren't going to be able to download billions of records. I wrote about this here: http://pfrazee.github.io/blog/achieving-scale. Short answer is, we can use dat-web crawlers to create aggregation services like the ones you suggest. So long as users do their writes to dats, they keep the advantage of the open data system that Dat provides, even if they do their reads from the services.