AWS's solution is described in the "Compute Checksum" section of their Solution Components
page. Basically, they calculate checksums on 20 GB of data at a time. If the file has more data, a new lambda function picks up where the old one left off. That's a brilliantly simple solution, allowing them to run fixity checks on multi-terabyte files without any single worker ever violating the five minute timeout.
The checksum algorithms are currently limited to MD5 and SHA-1, as Nathan pointed out. I'm sure they can add others without much trouble.
I figured AWS and other storage providers would get to this point someday. The closer they get to offering native implementations of the services that specialty providers like APTrust and DuraCloud offer, the easier it will be for them to recruit new customers directly and disintermediate the specialty providers.
On a technical level, it forces distributed digital preservation providers (DDPs) like us to periodically reevaluate whether our own implementations are still useful, reliable, and cost-effective. Services like AWS's new fixity checker have their upsides (easy deployments, low maintenance, no infrastructure requirements) and their downsides (very unpredictable costs, reliance on opaque, closed-source third-party technologies). They also have some complexities that might not be apparent from Amazon's technical overview. For example, you still have to keep a database of all your checksums somewhere, and you still have to run some system to record the results of Amazon's fixity checks.
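For instance, here's a minimal sketch of the record-keeping you'd still own, using SQLite as a stand-in for whatever database you already run; the schema and function name are illustrative, not part of Amazon's service.

    import sqlite3
    from datetime import datetime, timezone

    def record_fixity(db_path, key, expected, computed):
        """Store one fixity-check outcome alongside the expected checksum."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            """CREATE TABLE IF NOT EXISTS fixity_results (
                   key TEXT, expected TEXT, computed TEXT,
                   matched INTEGER, checked_at TEXT)"""
        )
        conn.execute(
            "INSERT INTO fixity_results VALUES (?, ?, ?, ?, ?)",
            (key, expected, computed, int(expected == computed),
             datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        conn.close()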
If people are interested, we can discuss this in a future tech call, partner meeting, NDSA meeting, or wherever makes sense. To me, the existential questions this new service raises are more interesting than the technical questions. What value do DDPs provide over basic storage services like S3 and Glacier as those services accumulate add-ons like this one? Can the preservation community explain the value of DDPs in this evolving technical landscape to the people who decide their budgets?
Lead Developer, APTrust