I'm considering building a niche search engine. The goal would be to index only programming-related information, starting with a particular tech stack. We would begin with a curated list of domains and a carefully crafted set of crawling rules. Could the group give me an idea of how difficult I should expect this to be, and what the largest issues are that I should expect to run into? Is it mainly a matter of paying for bandwidth, or more of a technical/engineering problem?
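To make the approach concrete, here is a rough sketch of what I mean by a curated domain list plus crawling rules. The domains, the path patterns, and the names `CRAWL_RULES` and `should_crawl` are placeholders for illustration only, not choices we have made.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: each curated domain maps to the path patterns
# we would consider worth fetching. Entries are illustrative only.
CRAWL_RULES = {
    "docs.python.org": [re.compile(r"^/3/")],
    "stackoverflow.com": [re.compile(r"^/questions/\d+")],
}


def should_crawl(url: str) -> bool:
    """Return True if the URL is on a curated domain and matches one of its rules."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    patterns = CRAWL_RULES.get(host)
    if patterns is None:
        # Domain is not on the curated list, so skip it entirely.
        return False
    return any(p.search(parsed.path) for p in patterns)
```

A crawler would call `should_crawl` on every discovered link before queueing it, so anything outside the curated set is never fetched.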
For context, this isn't intended as a solo project, but rather as a startup. We're trying to estimate how much it would cost us and how long it would take, in order to decide whether to pursue the idea. The engine would be used by LLMs to provide them with background information, so we would make design choices around that.
I would appreciate any guidance in the matter.
Lukas