Hi Michael,
I do believe Nutch keeps all the robots.txt files it comes across, though that would be in the raw (and not publicly distributed) Nutch crawler output that we have before we process it into the publicly available WARC/WAT/WET files. I also don't remember off the top of my head how easy it is to pull that information out - at the very least it would require processing the raw datasets again, which is a fairly large task.
I do feel that a dataset of robots.txt files could be a valuable resource, however, and for that very reason (and for my own enjoyment) I've played around with writing a robots.txt crawler in Go. Grabbing robots.txt files is a fun and relatively simple problem, as you don't need to worry too much about rate limiting: you make only one request per domain, and most web domains are entirely separate hosts!
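To give a flavor of why this is a pleasant concurrency problem, here's a minimal sketch of the worker-pool shape such a crawler can take. The `fetchRobots` function is stubbed out (a real version would do an HTTP GET of https://<domain>/robots.txt); the function names and worker count are just illustrative, not from any actual crawler I've written.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchRobots is a stand-in for an HTTP GET of https://<domain>/robots.txt.
// Stubbed here so the sketch is self-contained and runnable offline.
func fetchRobots(domain string) string {
	return "User-agent: *\nDisallow: /private/\n"
}

// crawlAll fetches robots.txt for each domain using a bounded pool of
// workers. Since every domain is a separate host, one request per domain
// needs no per-host rate limiting.
func crawlAll(domains []string, workers int) map[string]string {
	var mu sync.Mutex
	results := make(map[string]string, len(domains))
	jobs := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range jobs {
				body := fetchRobots(d)
				mu.Lock()
				results[d] = body
				mu.Unlock()
			}
		}()
	}
	for _, d := range domains {
		jobs <- d
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	got := crawlAll([]string{"example.com", "example.org"}, 4)
	fmt.Println(len(got))
}
```

The whole thing is embarrassingly parallel, which is exactly what makes it fun.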
For your purpose, though, I'm curious how indicative robots.txt files would actually be. To know which companies in a given industry lack a web presence (and which robots.txt files you'd want to look at), you'd likely need a list of them already, since "discovering" their absence from the Common Crawl data would be quite a hard problem. If you do have a small list you can enumerate easily, fetching the robots.txt files from their domains and parsing them with one of the many available robots.txt parsing tools would get you the results you're interested in without having to wade through hundreds of millions of domains' worth of robots.txt data.
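For the parsing step, a mature robots.txt library is the right tool (there are several for Go alone), but to show how little is involved for a quick look, here's a hedged sketch that pulls out the Disallow rules for the wildcard user-agent. It ignores Allow rules, wildcards in paths, and the subtlety that consecutive User-agent lines form one group, so treat it as illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// disallowedPaths extracts Disallow rules that apply to "User-agent: *"
// from a robots.txt body. A real parser handles Allow rules, path
// wildcards, and user-agent grouping; this is a deliberately naive sketch.
func disallowedPaths(body string) []string {
	var paths []string
	applies := false
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Strip trailing comments.
		if i := strings.Index(line, "#"); i >= 0 {
			line = strings.TrimSpace(line[:i])
		}
		lower := strings.ToLower(line)
		switch {
		case strings.HasPrefix(lower, "user-agent:"):
			agent := strings.TrimSpace(line[len("user-agent:"):])
			applies = agent == "*"
		case applies && strings.HasPrefix(lower, "disallow:"):
			if p := strings.TrimSpace(line[len("disallow:"):]); p != "" {
				paths = append(paths, p)
			}
		}
	}
	return paths
}

func main() {
	body := "User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n"
	fmt.Println(disallowedPaths(body)) // [/private/ /tmp/]
}
```

With a small list of company domains, looping over them with something like this (or, better, a proper library) would tell you fairly quickly who publishes a robots.txt at all and what they block.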
P.S. I loved the reference to Slurm Soda! All glory to the hypnotoad, of course.