Actually, this is not the case. When a digital object with a text layer (such as a PDF) is imported into AtoM, the text layer is indexed and searchable. In fact, there is a field limiter in the Advanced search so you can restrict a search to just the text in a digital object:
However, there are a couple of things to keep in mind here.
First, just because text has been OCR'd doesn't mean the OCR is good quality - especially with handwritten or older documents! I discuss this with an example in the following thread:
Second, AtoM's database currently has a size limit on the transcript field - it is a TEXT type field in the property table, whose maximum length is currently 65,535 bytes. That would equal 65,535 characters in a single-byte encoding (such as Latin-1), but AtoM uses UTF-8, in which a character can be 1-4 bytes. So how much text that translates to depends, unfortunately, on the characters themselves. When the limit is surpassed, the transcript is simply clipped - the digital object will still be saved, but later pages in a very large PDF or text document may not in fact be searchable.
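To make the byte-vs-character distinction concrete, here is a small illustrative sketch (in Python, not AtoM's actual PHP code) showing how many characters survive a 65,535-byte clip depending on their UTF-8 width:

```python
# Illustration only: how many characters fit in a 65,535-byte TEXT column
# depends entirely on the UTF-8 byte width of the characters used.
MAX_BYTES = 65_535  # capacity of a MySQL TEXT column

def utf8_safe_truncate(text: str, max_bytes: int = MAX_BYTES) -> str:
    """Clip text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops any partial multi-byte sequence at the cut point
    return encoded[:max_bytes].decode("utf-8", errors="ignore")

ascii_text = "a" * 70_000  # 1 byte per character in UTF-8
cjk_text = "漢" * 70_000   # 3 bytes per character in UTF-8

print(len(utf8_safe_truncate(ascii_text)))  # 65535 characters survive
print(len(utf8_safe_truncate(cjk_text)))    # only 21845 characters survive
```

The same 65,535-byte budget holds roughly a third as much CJK text as Latin-1-range text, which is why no single "character count" answer exists for the transcript limit.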
Scalability in general
AtoM should be able to scale to support millions of records. I would expect to run into issues somewhere around 2 million records, but even then there are workarounds. Much of it comes down to the deployment - the resources allocated, how you distribute and scale those resources, and whether you use a 2-site deployment model, which can allow you to increase caching and reduce the load on the public-facing front end. This has been discussed in the forum recently here:
I know of at least one AtoM site (unfortunately not public, so I cannot share it) that was heavily modified locally and currently supports 9 million records - but this is an extreme example, and most users would run into major scalability issues beyond about 1-2 million records.
We also continue to add performance and scalability enhancements and code optimizations in each release. At some point we will reach the limit of what we can do without completely rewriting AtoM (the ORM included in Symfony 1.x is one of AtoM's biggest bottlenecks currently), but we are treating massive scalability as an important design factor as we evaluate options for AtoM3 - and we intend to ensure that there is an upgrade path from AtoM 2 to AtoM3 when it is finally developed.
So yes - it is not a perfect solution, and you may run into problems as you add millions of records. Many of these can be solved or worked around depending on your deployment and the resources allocated, but AtoM itself, like many web-based applications, may eventually hit its limits. I hope this will at least help you better evaluate your options and determine whether AtoM will meet your needs.