Hi all,
We are evaluating Frontera as a candidate for large-scale crawling as part of an education program focused on data engineering. We plan to crawl a sizable portion of the public web in Switzerland.
I would be very grateful for any guidance on suitable hardware for running Frontera at this scale (e.g. whether a single server with many cores and plenty of RAM is preferable to a distributed setup).
Due to our focus on teaching, we do not plan to use cloud-based solutions.
Thanks a lot in advance,
Erik