anyone hosting their production Fedora / Islandora instance in the cloud? where?

Ernie Gillis

Sep 29, 2015, 1:33:43 PM
to islandora
Hi everyone!
I have been digging through the various threads to find answers.

The "islandora_deployments" info on GitHub [1] is a great place for some of this, but I'm still left asking the question:
     Is anyone hosting their production instance of Fedora and Islandora in the cloud? If yes, where?

Amazon Web Services seems to be a good place for doing some of the data storage (on Glacier or S3 - including DuraCloud). I have wondered about DuraCloud, but it doesn't seem to be something that would be used for live connections to Islandora (or maybe I'm mistaken). 

There may be multiple solutions for one "instance" (e.g. Fedora on DuraCloud, Drupal / Islandora on Acquia, Solr somewhere else, etc.). I'm curious what is out there and what's working. I would also encourage anyone who hasn't already to add their setup to "islandora_deployments" :)


Jared Whiklo

Sep 29, 2015, 2:21:57 PM
to isla...@googlegroups.com
We used to have our Islandora instance on Amazon. Cost became an issue.

cheers,
jared

--
Jared Whiklo
jwh...@gmail.com
--------------------------------------------------
The tragedy of Canada is that it could have had British culture,
French cuisine and American technology. Instead it got American
culture, British cuisine and French technology.

Rosemary Le Faive

Sep 30, 2015, 8:02:25 AM
to islandora
Like Ernie, I'm curious too. The pricing scales seem rather advantageous for storing large files that are infrequently accessed - like many of the Islandora instances that we run. 

-Rosie

Brad Spry

Sep 30, 2015, 10:09:08 AM
to islandora
Mr. Gillis,

UNC Charlotte's Islandora infrastructure is hosted on AWS; some technical details are here:
https://github.com/Islandora-Labs/islandora_deployments/blob/master/UNC_Charlotte.md

On the topic of cost, great AWS cost savings can be secured by paying ahead for EC2 virtual machines and RDS databases.   By paying ahead three years, the cost effectively becomes "buy one year, get two years free".   Your purchasing department should be receptive when presented with such substantive cost savings. It also is helpful to speak in terms purchasing departments already understand: Servers = Fixed Assets.
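
To make the "buy one year, get two years free" framing concrete, here is a minimal sketch of the arithmetic. The dollar amounts are hypothetical placeholders, not real AWS prices; they are chosen only so that the three-year upfront payment equals one year at the on-demand rate.

```python
def upfront_savings(on_demand_per_year: float, upfront_three_year: float) -> float:
    """Fraction saved by one three-year upfront payment versus three years on demand."""
    on_demand_total = 3 * on_demand_per_year
    return 1 - upfront_three_year / on_demand_total

# Hypothetical figures: the upfront payment equals one year of on-demand cost.
print(f"{upfront_savings(3000.0, 3000.0):.0%}")  # prints "67%"
```

Plugging in real quotes for your own instance types would show how close the actual discount comes to that two-thirds figure.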

Here is a document I authored with some compelling AWS cost information:
https://docs.google.com/document/d/11SYkbbf3qhO_jFlz-dckkLohI4DWpn3RDRgwiH8j8Zg/edit?usp=sharing

AWS is also on the cusp of releasing a new storage technology, EFS, which can be used to replace the pricey, plan-ahead, fixed-size EBS SSD. And if the EFS release price is really right, I'm open to replacing S3 with EFS.


Sincerely,

Brad Spry
UNC Charlotte
Atkins Library

Ernie Gillis

Sep 30, 2015, 12:17:29 PM
to islandora
Thanks Rosemary and Brad .... side topic "Mr. Gillis is my dad" ;) but I digress haha

Brad, your info is awesome! I do have a question about the cost breakdown versus the "System Schematic" PDF (it was something I was looking at rather closely when I started this thread).

For the cost breakdown, you're looking at the EC2. Do you have one in reference to the EBS? Also, for EBS, your schematic looks like you're using some EBS instances for live / production interaction (as opposed to cold or dark storage). In short, I'm looking for a sense of scope: how many different services are you using, and what is your usage like?

I don't believe I'm going to scale as broadly, but I'm definitely at a turning point in thinking about my next step. On-site implementation would be costly, but all my numbers for cloud solutions (for both processing and storage needs) seem to reach that same dollar value just 3 years in. That's neither good nor bad; it's more about having answers on the perceived pros and cons for those dealing with the numbers later.

Brad Spry

Sep 30, 2015, 4:22:45 PM
to islandora
EC2 = virtual machine instances
EBS = SSD-based storage

EBS is priced at $0.10 per GB, per month.   There is no pay ahead option for EBS storage.
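
As a rough illustration of what that rate means in practice, the sketch below multiplies it out. The 1,000 GB volume and the three-year period are example inputs, not figures from this installation, and the price is the 2015 figure quoted above.

```python
EBS_CENTS_PER_GB_MONTH = 10  # $0.10 per GB-month, as quoted above (2015)

def ebs_cost_usd(provisioned_gb: int, months: int = 1) -> float:
    """Cost in USD of keeping `provisioned_gb` provisioned for `months` months."""
    return provisioned_gb * EBS_CENTS_PER_GB_MONTH * months / 100

print(ebs_cost_usd(1000))      # one month of a 1,000 GB volume: 100.0
print(ebs_cost_usd(1000, 36))  # the same volume held for three years: 3600.0
```

Working in whole cents keeps the arithmetic exact until the final division.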

I use EBS SSD for:
  • OS and server application software
  • Apache DocumentRoot
  • Temporary storage: Tomcat temp, YAS3FS cache
  • Fedora: objectStore and resourceIndex
  • Loading Dock: ingest staging, BagIt generation, Drupal temp, and Fedora upload directory (I place related file operations in close proximity)

EBS has a current maximum volume size limit of 16TB, which is small for the long haul...   I went with S3 for an unlimited depth Fedora datastreamStore. (I never want to hear my team say they're out of disk space ever again, seriously)

Besides the maximum volume size limit, EBS also requires you to plan for the future ahead of time; a lot of cost in my installation stems from this fact. That's why I'm so jazzed about the upcoming EFS: it grows with you. You don't have to provision EFS space ahead of time and pay for space you're not using yet. Once released, I will re-engineer a lot of my installation to use EFS and save money by not having to guess about the future.
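
The difference between paying for provisioned space and paying for used space can be sketched as follows. Two assumptions are baked in and should be read as such: both models reuse the $0.10 per GB-month EBS price from this thread (the EFS price is not given here), and the growth curve is made up purely for illustration.

```python
# Assumption: both billing models use the $0.10 per GB-month EBS price quoted
# in this thread; the EFS price is not stated here.
CENTS_PER_GB_MONTH = 10

def provisioned_cost_usd(provisioned_gb: int, months: int) -> float:
    """Pay for the full provisioned size every month, used or not."""
    return provisioned_gb * CENTS_PER_GB_MONTH * months / 100

def pay_as_used_cost_usd(used_gb_by_month: list[int]) -> float:
    """Pay only for the space actually occupied in each month."""
    return sum(used_gb_by_month) * CENTS_PER_GB_MONTH / 100

# Hypothetical growth: 100 GB in month 1, adding 100 GB a month up to 1,000 GB.
usage = [100 * month for month in range(1, 11)]

print(provisioned_cost_usd(1000, len(usage)))  # fixed 1,000 GB volume: 1000.0
print(pay_as_used_cost_usd(usage))             # billed on actual use: 550.0
```

With these made-up inputs, the gap between the two totals is the cost of guessing the final size up front.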

On my schematic, there are only two EC2 VMs:  1 staff-only/ingest server and 1 public facing server.   And 1 DB server in the center (RDS).

My schematic represents a first generation system and what you're seeing is akin to a dissection; a frog-like dissection of the Islandora system for the purpose of understanding.  For example, I now understand Fedora's objectStore and resourceIndex don't need 1TB of storage out of the gate; I overspecced because I was ignorant about the true resource requirements.   My hope is to help the Islandora community by giving back everything I have learned about the true resource and performance requirements for X-sized ingest.

I anticipated my first collection to be 500,000 large images; I had that number in mind in designing everything. I benchmarked using the largest images I could find, like Hubble images, and my system is absolutely up to the task. However, another collection ended up arriving on my Loading Dock first: large audio. Large audio was way larger than large images and really taxed the system in new ways. I'd now say that if you're planning on ingesting large audio, and by large I mean 4GB+ WAVs, then 30GB of server memory is not enough... I'm currently switching my server VMs to HVMs, which means I'm moving to servers which can have the maximum amount of memory AWS offers. I'll immediately double memory from 30 to 60GB, but we'll have the option of quickly shifting to 122 or 244GB of memory depending on the ingest task at hand.

On the topic of on-site vs. cloud, here's a list of things I no longer have to worry about:

1. Storage acquisition and drive replacement (which truthfully are only designed to last less than 4 years...)
2. Server room infrastructure, including power, water, fire suppression, networking, etc.
3. Shopping for servers, including being surprised by vendor misrepresentations and getting stuck with underspecced servers...
4. Server replacement
5. Uptime. EC2 is "at least 99.95%"; I can't do better myself on-site. It feels good saying that out loud, because it's the truth. We don't have three shifts of sys admins; my requirements are a life outside of work!
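
For a sense of scale on the uptime figure in item 5, here is a small sketch that converts an availability percentage into allowed downtime. It assumes the percentage is applied over a full 365-day year, which is a simplification; the actual measurement window is not stated in this thread.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years

def max_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the given availability."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100

print(f"{max_downtime_hours(99.95):.2f} hours/year")  # prints "4.38 hours/year"
```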

If I stopped at #1 that would be enough... For a digital repository, we're not just in this for a couple of years. I'm interested in no less than a 100 year requirement. Imagine the number of magnetic storage replacement cycles over 100 years... (I'll save you a calculation: 33 replacement cycles during 100 years, if replaced responsibly every 3 years). If we really did some soul searching, we'd conclude that we don't meet that level of responsibility... Administrators (of the suit variety) don't understand hardware, unless hardware is down... I'm an old man in the server business; it's always been that way. It's time for a sea change...
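
The replacement-cycle count above is simple integer division, sketched here so other intervals can be tried. The 4-year interval in the second call is only an example input, taken from the "less than 4 years" drive lifetime mentioned in item 1.

```python
def replacement_cycles(horizon_years: int, replace_every_years: int) -> int:
    """Number of complete hardware replacement cycles within the planning horizon."""
    return horizon_years // replace_every_years

print(replacement_cycles(100, 3))  # every 3 years over 100 years: 33
print(replacement_cycles(100, 4))  # every 4 years over 100 years: 25
```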

In summary, it's far cheaper in the long term to let someone else worry about and execute timely hardware and storage replacement cycles. The truth will set us free, free to do whatever we do best (which isn't on-site hardware and storage replacement cycles).

Also, what is the cost of choosing on-site server and infrastructure work over family?   Be sure to include your personal life in your considerations.  Your life is a mandatory requirement.

<B


Carell Jackimiek

Oct 1, 2015, 2:31:01 PM
to islandora
Hello,

I just wanted to add to Jared Whiklo's reply. At the University of Manitoba, we had our Islandora instance on Amazon for a couple of years. At that time it was supported by Discovery Garden. Unfortunately, in addition to a growing storage expense, we reached a storage limit for our EC2 instance. At that time the Logical Volume Manager (LVM) had a volume limit of 1TB and, for a single server, a maximum of 24 volumes. Discovery Garden has since let me know that Amazon now allows SSD storage volumes of up to 16TB, so a single instance can scale up to 384TB. If using Fedora 4, multiple EC2 instances are possible so this would not be a limitation, but we are still on Fedora 3. We only used Glacier for long term storage.
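
The per-instance ceilings mentioned above follow from multiplying the volume count by the per-volume limit. A minimal sketch using only the numbers from this post (24 volumes, 1TB then, 16TB now):

```python
MAX_VOLUMES_PER_INSTANCE = 24  # the single-server limit quoted above

def max_instance_storage_tb(volume_limit_tb: int,
                            max_volumes: int = MAX_VOLUMES_PER_INSTANCE) -> int:
    """Upper bound on storage for one instance, in TB."""
    return volume_limit_tb * max_volumes

print(max_instance_storage_tb(1))   # the old 1TB-per-volume limit: 24
print(max_instance_storage_tb(16))  # the 16TB-per-volume limit: 384
```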

Carell.

Kun Lin

Apr 25, 2018, 12:14:21 PM
to islandora
Has anyone tried a cloud provider other than Amazon? Any OpenStack hosting provider?