The host-level graph is built from links pointing from the host of one
web page to the host of another web page. The domain-level graph is
built the same way from registered domains.
The transpose of the graph contains the backlinks from host to host or,
respectively, from domain to domain. Some questions are then easy to
answer, e.g. which hosts/domains link to a host/domain of interest.
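
To illustrate the "which hosts link to a host of interest" lookup, here is a
minimal sketch. It assumes the graph comes as two tab-separated text files, a
vertices file (`id<TAB>host name`, host names in reversed domain notation) and
an edges file (`from_id<TAB>to_id`); please check the release notes of the
graph you download. The function name `backlinking_hosts` and the sample data
are made up for illustration.

```python
from typing import Iterable, List


def backlinking_hosts(vertex_lines: Iterable[str],
                      edge_lines: Iterable[str],
                      target: str) -> List[str]:
    """Return the names of all hosts that link to `target`.

    Assumed input format (tab-separated):
      vertex line: "<id>\t<host name>"
      edge line:   "<from_id>\t<to_id>"
    """
    id_to_name = {}
    target_id = None
    for line in vertex_lines:
        node_id, name = line.rstrip("\n").split("\t")
        id_to_name[node_id] = name
        if name == target:
            target_id = node_id
    if target_id is None:
        return []

    # Scan the edge list once and keep every source pointing at the target.
    sources = set()
    for line in edge_lines:
        from_id, to_id = line.rstrip("\n").split("\t")
        if to_id == target_id and from_id != target_id:
            sources.add(from_id)
    return sorted(id_to_name[s] for s in sources)


# Tiny made-up graph: hosts 0 and 2 both link to host 1.
vertices = ["0\tcom.example", "1\torg.example.www", "2\tnet.example.blog"]
edges = ["0\t1", "2\t1", "1\t0"]
print(backlinking_hosts(vertices, edges, "org.example.www"))
```

For a real graph the id-to-name map will not fit comfortably in memory on a
small machine; a two-pass variant (find the target id, collect source ids,
then resolve only those ids) would be the more practical choice.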
The graphs are compact, but they
- are not page-level,
- are notoriously incomplete (the crawls only cover a sample of the web),
- and do not include link attributes or anchor texts.
I agree that there are great use cases for a backlink index. However,
an average main crawl includes 500+ billion page-level links. That's
100 times as many rows/records as the index of the crawled pages.
Consequently, the full backlink index would be expensive to create and
download, and also not cheap to query.
For now, one way to extract backlinks is to process the WAT files.
Alternatively, you could look first for backlinking hosts/domains and
then pick pages by host/domain name.
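
As a rough sketch of the WAT route: each WAT metadata record carries a JSON
payload, and for HTML pages the extracted links are, as far as I know, found
under `Envelope` > `Payload-Metadata` > `HTTP-Response-Metadata` >
`HTML-Metadata` > `Links`. The example below assumes that layout and works on
an already parsed record; reading the WAT file itself (e.g. with a WARC
library) is left out. The function name `backlinks_from_wat_record` and the
sample record are hypothetical.

```python
import json
from typing import Dict, Iterator, Tuple
from urllib.parse import urljoin, urlsplit


def backlinks_from_wat_record(record: Dict,
                              target_host: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (source page URL, link URL, anchor text) for links to target_host.

    The key names below reflect the assumed WAT JSON layout.
    """
    envelope = record.get("Envelope", {})
    page_url = envelope.get("WARC-Header-Metadata", {}).get("WARC-Target-URI", "")
    links = (envelope.get("Payload-Metadata", {})
                     .get("HTTP-Response-Metadata", {})
                     .get("HTML-Metadata", {})
                     .get("Links", []))
    for link in links:
        href = link.get("url")
        if not href:
            continue
        # Resolve relative links against the URL of the linking page.
        absolute = urljoin(page_url, href)
        if urlsplit(absolute).hostname == target_host:
            yield page_url, absolute, link.get("text", "")


# Made-up record with one external and one relative (internal) link.
sample = json.loads("""
{"Envelope": {
  "WARC-Header-Metadata": {"WARC-Target-URI": "https://blog.example.net/post"},
  "Payload-Metadata": {"HTTP-Response-Metadata": {"HTML-Metadata": {"Links": [
    {"path": "A@/href", "url": "https://www.example.org/about", "text": "About"},
    {"path": "A@/href", "url": "/local"}
  ]}}}
}}
""")
print(list(backlinks_from_wat_record(sample, "www.example.org")))
```

Unlike the host-level graph, this keeps the page-level source URL and the
anchor text, at the price of scanning the WAT files of a whole crawl.
Restricting the scan to pages from the backlinking hosts found in the graph
is what keeps the second approach cheaper.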