EC2 = virtual machine instances
EBS = SSD-based storage
EBS is priced at $0.10 per GB per month, and there is no pay-ahead option for EBS storage.
I use EBS SSD for:
- OS and server application software
- Apache DocumentRoot
- Temporary storage: Tomcat temp, YAS3FS cache
- Fedora: objectStore and resourceIndex
- Loading Dock: ingest staging, BagIt generation, Drupal temp, and Fedora upload directory (I place related file operations in close proximity)
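To make the storage math concrete, here's a back-of-envelope sketch at the $0.10/GB-month EBS rate. The volume sizes below are hypothetical placeholders for the categories above, not my actual allocation.

```python
# Back-of-envelope EBS cost at $0.10 per GB per month.
# Volume sizes are hypothetical examples, not my real layout.
EBS_RATE = 0.10  # USD per GB per month

volumes_gb = {
    "os-and-apps": 50,
    "documentroot": 100,
    "temp-and-cache": 200,
    "fedora-object-and-resource-index": 500,
    "loading-dock": 1000,
}

total_gb = sum(volumes_gb.values())
monthly_cost = total_gb * EBS_RATE
print(f"{total_gb} GB -> ${monthly_cost:.2f}/month, ${monthly_cost * 12:.2f}/year")
```

Swap in your own volume sizes; the point is that with EBS you pay this every month for provisioned capacity, whether or not it's full.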
EBS currently has a maximum volume size of 16TB, which is small for the long haul... so I went with S3 for an unlimited Fedora datastreamStore. (I never want to hear my team say they're out of disk space ever again, seriously.)
Besides the maximum volume size limit, EBS also requires you to plan for future capacity ahead of time; a lot of cost in my installation stems from that fact. That's why I'm so jazzed about the upcoming EFS: it grows with you. You don't have to provision EFS space ahead of time and pay for space you're not using yet. Once it's released, I will re-engineer a lot of my installation to use EFS and save money by not having to guess about the future.
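Here's a toy illustration of why pay-as-you-grow beats pre-provisioning. The rates, growth curve, and 2-year horizon are all illustrative assumptions, not real AWS quotes.

```python
# Pre-provisioned capacity vs. pay-as-you-grow, over 24 months.
# All numbers below are illustrative assumptions.
EBS_RATE = 0.10   # USD per GB-month, billed on provisioned capacity
GROW_RATE = 0.10  # hypothetical per-GB-month rate, billed on actual usage

provisioned_gb = 2000  # capacity bought up front to cover two years of growth
monthly_usage_gb = [i * 80 for i in range(1, 25)]  # data grows 80 GB per month

ebs_cost = sum(provisioned_gb * EBS_RATE for _ in monthly_usage_gb)
elastic_cost = sum(used * GROW_RATE for used in monthly_usage_gb)
print(f"pre-provisioned: ${ebs_cost:.2f}, pay-as-you-grow: ${elastic_cost:.2f}")
```

Even at the same per-GB rate, paying only for the space you actually occupy comes out to half the cost in this scenario, because the pre-provisioned volume sits mostly empty while you grow into it.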
On my schematic, there are only two EC2 VMs: 1 staff-only/ingest server and 1 public-facing server, with 1 DB server (RDS) in the center.
My schematic represents a first-generation system, and what you're seeing is akin to a dissection; a frog-like dissection of the Islandora system for the purpose of understanding. For example, I now understand Fedora's objectStore and resourceIndex don't need 1TB of storage out of the gate; I overspecced because I was ignorant of the true resource requirements. My hope is to give back to the Islandora community everything I have learned about the true resource and performance requirements for X-sized ingests.
I anticipated my first collection to be 500,000 large images; I had that number in mind when designing everything. I benchmarked using the largest images I could find, like Hubble images, and my system is absolutely up to that task. However, another collection ended up arriving on my Loading Dock first: large audio. Those files were far larger than large images and taxed the system in new ways. I'd now say that if you're planning on ingesting large audio, and by large I mean 4GB+ WAVs, then 30GB of server memory is not enough... I'm currently switching my server VMs to HVM instances, which can have the maximum amount of memory AWS offers. I'll immediately double memory from 30 to 60GB, and we'll have the option of quickly shifting to 122 or 244GB depending on the ingest task at hand.
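A rough way to sanity-check memory sizing before an ingest: estimate the working set per derivative job and multiply by how many run at once. The 4x working-set multiplier and 2 concurrent jobs below are hypothetical guesses for illustration, not measurements from my system.

```python
# Rough memory check for large-file ingest.
# The multiplier and concurrency are hypothetical assumptions, not measurements.
def fits_in_memory(file_gb: float, ram_gb: float,
                   working_multiplier: float = 4.0,
                   concurrent_jobs: int = 2) -> bool:
    """True if the combined derivative working set plausibly fits in RAM."""
    return file_gb * working_multiplier * concurrent_jobs <= ram_gb

# A 4GB WAV under these assumptions, across the memory tiers mentioned above:
for ram in (30, 60, 122, 244):
    print(f"{ram}GB RAM: {'ok' if fits_in_memory(4.0, ram) else 'too tight'}")
```

The real multiplier depends entirely on your derivative pipeline; benchmark with your actual largest files before trusting any number here.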
On the topic of on-site vs. cloud, here's a list of things I no longer have to worry about:
1. Storage acquisition and drive replacement (which truthfully are only designed to last less than 4 years...)
2. Server room infrastructure, including power, water, fire suppression, networking, etc.
3. Shopping for servers, including being surprised by vendor misrepresentations and getting stuck with underspecced servers...
4. Server replacement
5. Uptime. EC2's commitment is "at least 99.95%", and I can't do better than that myself on-site. It feels good saying that out loud, because it's the truth. We don't have three shifts of sysadmins; my requirements include a life outside of work!
If I stopped at #1, that would be enough... For a digital repository, we're not just in this for a couple of years; I'm interested in no less than a 100-year requirement. Imagine the number of magnetic storage replacement cycles over 100 years... (I'll save you the calculation: 33 replacement cycles over 100 years, if drives are replaced responsibly every 3 years.) If we really did some soul searching, we'd arrive at the conclusion that we don't meet that level of responsibility... Administrators (of the suit variety) don't understand hardware, unless hardware is down... I'm an old man in the server business, and it's always been that way. It's time for a sea change...
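The replacement-cycle arithmetic is simple enough to show in a few lines:

```python
# Magnetic storage replacement cycles over a 100-year repository horizon,
# assuming drives are replaced responsibly every 3 years.
horizon_years = 100
cycle_years = 3
cycles = horizon_years // cycle_years
print(f"{cycles} replacement cycles over {horizon_years} years")
```

Each of those cycles is a procurement, a migration, and a chance for something to go wrong; outsourcing them is the whole argument.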
In summary, it's far cheaper in the long term to let someone else worry about, and execute, timely hardware and storage replacement cycles. The truth will set us free; free to do whatever we do best (which isn't on-site hardware and storage replacement cycles).
Also, what is the cost of choosing on-site server and infrastructure work over family? Be sure to include your personal life in your considerations. Your life is a mandatory requirement.