- You want to control quality of service, uptime, and failover for access to a node.
- You want to install custom components inside or near the node software (on the same server or in the same data center). Examples include custom CouchDB map/reduce queries, custom Python API interfaces, and backend data indexing and extraction software or services.
- You want to connect your node to a private network of LR data (or other data).
- You want to protect yourself and your data from the possibility that the public nodes might be turned off someday.
Beyond those reasons, the stock public LR nodes will probably serve your purpose. We plan to run them indefinitely, and we are doing our best to make sure "indefinitely" means a long time.
You are correct that a node is generally a "mirror" of another node. Nodes can replicate from each other either partially or fully. A node needs a reasonable amount of hardware, but it mainly uses CPU and RAM while it is actively working and is relatively tame when idle, so it can reasonably run in a shared or virtual environment. It does want a fair bit of RAM and a few TB of disk for the entire data set and all the indices.
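To make the partial versus full replication idea a little more concrete, here is a rough sketch that assumes the node's storage is plain CouchDB (as the CouchDB mention above suggests) and that you drive replication through CouchDB's standard `_replicate` endpoint. The URLs, the database name, and the filter name are all made up for illustration, and this may not be how a given LR node is actually wired up. The function only builds the request body; it does not contact any server.

```python
import json


def build_replication_request(source, target, filter_name=None, continuous=False):
    """Build the JSON body for a POST to CouchDB's /_replicate endpoint.

    Leaving filter_name as None asks for a full copy; passing a
    "designdoc/filtername" string asks CouchDB to copy only the documents
    that the named filter function accepts (a partial mirror).
    """
    body = {"source": source, "target": target, "continuous": continuous}
    if filter_name is not None:
        body["filter"] = filter_name
    return json.dumps(body, sort_keys=True)


# Hypothetical endpoints and names, for illustration only.
full = build_replication_request(
    "http://public-node.example.org:5984/resource_data",
    "http://localhost:5984/resource_data",
)
partial = build_replication_request(
    "http://public-node.example.org:5984/resource_data",
    "http://localhost:5984/resource_data",
    filter_name="myfilters/by_subject",  # hypothetical filter function
    continuous=True,
)
print(full)
print(partial)
```

You would POST one of these bodies to `/_replicate` on your own CouchDB instance; the filter function itself would have to exist in a design document on the source database.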
Availability of Content
You can place copyright restrictions on your content (or leave the existing ones in place). All that you put into LR is metadata that describes your content. Others consume that metadata, index it, and eventually direct users to your content. If that content is behind a paywall or registration wall, most users would not be able to access it, which today would probably cause most indexers not to consume your content.
However, we have talked with a few traditional publishers about publishing their data in a format that lets content consumers see that the license is not totally free, so they can warn their users before they click over.
You might also consider an NY Times approach, where the first 10 resources (whatever they are) are free, and after that you charge. In this way, you remain compatible with free content indexers, but intensive users are required to get a license.
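As a rough sketch of that metered approach (not anything LR provides; the `Meter` class, its method names, and the per-user counting rule are my own assumptions), the logic on your side could look something like this:

```python
from collections import defaultdict

FREE_LIMIT = 10  # the "first 10 resources are free" threshold mentioned above


class Meter:
    """Tracks which distinct resources each user has opened (hypothetical helper)."""

    def __init__(self, free_limit=FREE_LIMIT):
        self.free_limit = free_limit
        self._seen = defaultdict(set)  # user id -> set of resource URLs already counted

    def allow(self, user_id, resource_url, licensed=False):
        """Return True if the user may open the resource."""
        if licensed:
            return True  # licensed users are never metered
        seen = self._seen[user_id]
        if resource_url in seen:
            return True  # re-opening an already counted resource stays free
        if len(seen) >= self.free_limit:
            return False  # free quota used up; time to ask for a license
        seen.add(resource_url)
        return True
```

In this sketch each distinct resource counts once per user, so following the same indexed link twice does not use up the quota. A real site would also need to persist the counts and decide how to identify anonymous users.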
Does this help?
Steve