Hi Colin,
Thanks for bringing this up. I'd have expected to see at least one HTTPS URL within a single WAT file, let alone in 4% of a crawl archive, so I went to investigate.
To clarify, HTTPS support was backported to the Nutch implementation that Common Crawl uses. I spent last night hunting down why HTTPS URLs were not being produced in the December crawl archives. I eventually found the cause: a single extra whitespace character in a config file that caused HTTPS URLs to be filtered out when they shouldn't have been.
As such, there won't be any HTTPS URLs in the most recent crawl archives, but the error will be fixed for the February 2015 crawl and all future crawls.
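
For anyone curious how one stray whitespace character can silently drop a whole class of URLs, here is a minimal Python sketch. It is not the actual Nutch configuration or code: the `+regex` / `-regex` rule format, the function names `parse_rules` and `accepts`, the "reject if nothing matches" default, and the placement of the extra space are all assumptions made purely for illustration.

```python
import re


def parse_rules(lines):
    """Parse hypothetical '+regex' / '-regex' lines into (accept, pattern) pairs."""
    rules = []
    for line in lines:
        if not line or line.startswith("#"):
            continue
        # The line is deliberately NOT stripped, so any stray whitespace
        # becomes part of the regular expression itself.
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL that matches no rule is rejected (assumed default)."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


good_rules = parse_rules(["+^http://", "+^https://"])
bad_rules = parse_rules(["+^http://", "+^https:// "])  # note the trailing space

url = "https://example.com/"
print(accepts(good_rules, url))  # True
print(accepts(bad_rules, url))   # False: the pattern now requires a literal space
```

In this sketch the HTTP rule is untouched, so HTTP URLs still pass and nothing looks obviously broken, which is why this kind of mistake can be easy to miss.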