Hi Erik,
That sounds very interesting. Are you planning to share the resulting dataset in one way or another?
To answer your question: I'd go for a distributed cluster of average servers, with Elasticsearch installed on every node alongside Storm, rather than a single huge machine. Something with specs equivalent to an EC2 m5.2xlarge instance would do. The only important requirement is SSD storage, as it makes quite a difference to Elasticsearch's performance.
Going distributed is probably also more interesting from a teaching perspective. It makes the setup more robust to hardware failure and, depending on how your network is configured, having more machines might be more efficient.
What are you planning to do with the documents once they are crawled? Index them for search? Store only the extracted metadata? Store the entire pages in WARC format?
Kind regards
Julien