Linux installation guide feedback

mmikit...@gmail.com

Nov 6, 2014, 2:19:55 PM
to ica-ato...@googlegroups.com
Hello,

I have been working through the Linux installation guide (see https://www.accesstomemory.org/en/docs/2.1/admin-manual/installation/linux/), and I have some feedback for you:
  1. Please provide Apache installation instructions in addition to Nginx (I suspect that the majority of people use Apache).
  2. Move the php-fpm configuration to a separate section, e.g., "Performance tuning/further optimizations". This does not seem necessary for a basic install.
  3. In general, related to the previous point, the system, and by extension the installation guide, seems over-engineered for high-volume sites. I suspect that a very small number of people require such features, and the majority of people just want a minimal, low-complexity system. You can put the performance/optimization add-ons into another section.
  4. The "Create the database" step recommends using the "root" username, which is a big security no-no. A separate user with minimal permissions should be created.
  5. Related to point #4, the password should be given at a prompt instead of at the command invocation, to avoid adding it to the shell history (or other logs).
  6. When I unpacked the installation tarball, a number of files and directories were world-readable and writable, notably in the config dir, which contains credentials. The default permissions should be adjusted, and users should be guided through steps on how to establish basic security, beyond ensuring that the firewall is properly configured.
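
As a rough illustration of the permissions check described in point 6, the following Python sketch walks a directory tree and reports entries that any user on the system can write to. The function name `find_world_writable` is an assumption made for illustration; it is not part of AtoM or its documentation.

```python
import os
import stat


def find_world_writable(root):
    """Return sorted paths under root whose mode grants write access to 'other'."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat() inspects the entry itself and does not follow symlinks.
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # symlink permission bits are not meaningful on Linux
            if info.st_mode & stat.S_IWOTH:
                flagged.append(path)
    return sorted(flagged)
```

Running it against the unpacked config directory (wherever that lives on a given install) would list the files that need tightening, for instance with `chmod o-w`.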
All the best,
matt

Dan Gillean

Nov 6, 2014, 8:13:09 PM
to ica-ato...@googlegroups.com
Hi there, 

Thank you so much for your input. We value this kind of feedback from our community, and are always looking for ways we can improve both the security and usability of our application, and its documentation. We've prepared some responses to your comments below.


Please provide Apache installation instructions in addition to Nginx (I suspect that the majority of people use Apache)

That's a great idea! We hope someone in our community hears you, contributes some docs, and is willing to maintain them.

At Artefactual, we use Nginx because we have found that its configuration is simpler and the consumption of resources is lower - however, we also understand that Apache is a widely used web server. Our developers are very familiar with Nginx, and it is what we use in both development and production - for this reason, we are comfortable offering public documentation and support in our user forum for it. Our developers are less familiar with the Apache web server, and we worry about being unable to offer support for a feature we document - or being able to reliably document and test installation in multiple environments over time. 

We have always strived to keep everything about AtoM open - the code is open source, we supply comprehensive free documentation, and we provide free user support via the User Forum. However, as a small company that is still trying to encourage broader participation from skilled developers in the community, we are limited in how much time and resources we can commit to maintaining information about multiple installation configurations and other deployment decisions. 

Despite this, we would definitely like to see a place for it. At the moment, we are in the process of preparing a new wiki for AtoM, which will consolidate and replace several older and out-of-date wikis and resources from our 1.x branch. In this new wiki, we hope to create a Community Resources section, where community-created resources (including documentation for alternate installation configurations) can be shared, discussed, and collectively maintained. We hope that when we have this resource ready, you might consider contributing installation documentation using Apache!

In the meantime, we do take pull requests, and our project is on GitHub (https://github.com/artefactual/atom-docs). We have also prepared a section which outlines some best practices for AtoM documentation, along with more information about Sphinx, the documentation platform we are using. See: 


Move the php-fpm configuration to a separate section e.g., "Performance tuning/further optimizations". This does not seem necessary for a basic install.

Do you mean the whole php-fpm section or just the part of the configuration that relates to performance, e.g. opcache.*?

We consider php-fpm to be an important dependency in AtoM at the moment. php5-fpm also works well with Apache. Perhaps, because you are trying to install in a different environment, you have a different conception of what you need? php5-fpm is also replaceable with mod_php if you prefer - though this will only work with Apache, and we haven't tested this at Artefactual.

In any case, again: if you have concrete suggestions you'd like to help us implement, we welcome pull requests! This is probably the easiest way to show us what you mean, and we can always discuss specifics directly on the pull request.


In general, related to the previous point, the system, and by extension, the installation guide seems over-engineered for high volume sites. I suspect that a very small number of people require such features, and the majority of people just want a minimal, low-complexity system. You can put the performance/optimization add-ons into another section.

What features do you mean, specifically? If you're talking about components like APC or Elasticsearch, I'm afraid these are core dependencies in AtoM at the moment. Other non-essential dependencies, used for handling digital objects in AtoM, are already marked as optional.

There are two things we feel it is important to remember. One, we have some users who are small archives with small holdings - but we also have users who are trying to scale the application to handle hundreds of thousands, or even millions, of records. We need to strike a balance between the two. Experienced sysadmins can make their own decisions about what they need to implement to support their intended usage, or what they would like to change. 

The other thing to remember is that AtoM has evolved iteratively over the last 7 years, in spurts. Our community-driven business model has been successful in keeping AtoM a viable and constantly improving project - but it also means that it is very difficult for us to find sponsors who are willing to help us fund the work necessary for back-end clean-up and code maintenance over time. There are many ways we would like to optimize and improve AtoM at all levels - most notably, getting it off of the Symfony 1.x framework - but without sponsoring institutions to help us make this possible, we are only able to do bits and pieces at a time.

Because of this, the code base is, at core, largely the same as it was in 2007 - and as the software's features and functionality have grown, these have been appended to the original code base. The resulting environment and its dependencies do have much room for improvement, we know! But without either sponsorship or significant help from developers in our community, we're unable to take on radical rewrites of the code base to simplify the dependencies and clean up the code.


The "Create the database" step recommends using the "root" username which is a big security no-no. A separate user with minimal permissions should be created.

We're open to this possibility. One thing to point out is that many of our users, including those doing the installation, are often archivists - not experienced system administrators. There are people out there with whole careers dedicated to MySQL analysis - we are sure that experts could improve this process significantly. Our initial hope was to keep these steps as simple as possible for a wide variety of users. 

We would like to point out that we are simply using root to create the database - afterwards, we immediately recommend, for security purposes, that another user be created:
Then, in the following section, we encourage the end user to use this new, non-root user when setting up the web installer:
Ultimately, the MySQL user you end up using does need CREATE permissions, so they can create new tables. Unfortunately, CREATE permissions include both creating tables and creating databases. 
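
To make the idea of a dedicated, limited MySQL user more concrete, here is a minimal Python sketch that builds the SQL an administrator might run. It is an illustration under stated assumptions, not the procedure from the AtoM docs: the database and user name `atom`, the helper names, and the privilege list are all hypothetical, and the exact privileges AtoM needs (beyond CREATE, mentioned above) are not spelled out in this thread. Scoping the grant to a single database should keep CREATE from applying to other databases on the server.

```python
import getpass
import re

# Assumption: an illustrative privilege set. The thread only confirms that
# CREATE is required; adjust the rest to what the application really needs.
PRIVILEGES = ("SELECT", "INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER", "INDEX")

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def build_user_sql(database, user, password, host="localhost"):
    """Build SQL that creates a dedicated user limited to one database."""
    for value in (database, user, host):
        if not _IDENTIFIER.match(value):
            raise ValueError(f"unsupported characters in {value!r}")
    # Refuse rather than escape: this sketch does not attempt SQL escaping.
    if "'" in password or "\\" in password:
        raise ValueError("password must not contain quotes or backslashes")
    return [
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}';",
        f"GRANT {', '.join(PRIVILEGES)} ON `{database}`.* TO '{user}'@'{host}';",
    ]


def prompt_and_build(database="atom", user="atom"):
    """Read the new user's password at a prompt so it never reaches shell history."""
    password = getpass.getpass(f"Password for new MySQL user {user!r}: ")
    return build_user_sql(database, user, password)
```

Because `prompt_and_build` reads the password with `getpass`, it is neither echoed to the terminal nor typed as part of a shell command, which also speaks to the password-at-a-prompt concern raised in this thread.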

If you have further suggestions on how we might improve this - again, we welcome a pull request! 


Related to point #4, the password should be given at a prompt instead of at the command invocation to avoid adding it to the history (or other logs).

That's a great point. One of our developers has prepared a pull request on that section of the documentation, and we welcome your feedback on it before we merge it, so we can be sure that we're addressing the points you've raised:


When I unpacked the installation tarball, a number of files and directories were world-readable and writable, notably in the config dir, which contains credentials. The default permissions should be adjusted, and users should be guided through steps on how to establish basic security, beyond ensuring that the firewall is properly configured.

Another great catch, and something we intend to look into. This will take us a bit longer to fully investigate and address, so we have filed an issue ticket here: 

Thank you again for your input! We value this kind of feedback greatly. Let us know if you have further comments, questions, or concerns. And please do consider contributing back to our documentation! 

Cheers,

Dan Gillean, MAS, MLIS
AtoM Product Manager / Systems Analyst,
Artefactual Systems, Inc.
604-527-2056
@accesstomemory


mmikit...@gmail.com

Nov 7, 2014, 10:08:56 AM
to ica-ato...@googlegroups.com
Hi Dan,

Thank you for the quick and thorough response! I'll try to address your response in order.

1. RE: Apache installation instructions: I can contribute documentation, but I do not know whether I should wait for the new wiki or contribute to the GitHub repo. My preference is to write it once, so please advise on the best course of action.

2. RE: php-fpm section: This may be fallout from the focus on Nginx. I am used to seeing minimal LAMP projects say something like:
  1. sudo apt-get install libapache2-mod-php5
  2. Edit php.ini with necessary security parameters (register globals, magic quotes, etc.)
  Then, if someone wants to fine tune the system, there is a separate section/guide. There could even be a separate section for using php-fpm with Apache.

3. RE: degree of system and documentation complexity: We can address the php-fpm issue mentioned in #2 with a separate Apache section, though I do have concerns about the system architecture. Is there a reason for using APC and not native PHP and optimizing the data structures where needed? And, although Propel provides the benefit of an abstracted ORM and multiple DB support, I think that it adds undue complexity to your organization in the form of documentation and support, given that the installation guide recommends MySQL, and I suspect that if someone needs a MySQL database, he or she will find a way to obtain one. Furthermore, I cannot proceed with the installation due to a non-existent "propel" database (this is being addressed in another thread, https://groups.google.com/forum/#!topic/ica-atom-users/Cii7aUUSs-g).

4. RE: Using root in the "Create database" step: I overlooked the section on creating a non-root user, so thank you for mentioning that!

5. RE: Supplying the password at the prompt: I just commented on the pull request.

6. RE: Directory/file permissions in the tarball: Thank you for posting the issue.

I trust this helps.

All the best,
matt

Jesús García Crespo

Nov 7, 2014, 12:26:21 PM
to ica-ato...@googlegroups.com
Hi Matt,

On Fri, Nov 7, 2014 at 7:08 AM, <mmikit...@gmail.com> wrote:
3. RE: degree of system and documentation complexity: Is there a reason for using APC and not native PHP and optimizing the data structures where needed?

Yes. APC - namely the data cache part of it - allows us to save the output of expensive operations and share the results across multiple request/response contexts. You could have the fastest piece of code in place and still find it convenient to cache its results to speed up your request handlers. It also exposes cache invalidation options, like time-to-live or manual invalidation.
Additionally, APC makes your code faster by generating intermediate code to save you from parsing your PHP scripts (aka opcache). That's especially useful in giant frameworks like Symfony, where the number of files and classes is over a couple of thousand. PHP made opcache optional, which seems like a bad idea to me. Think of Python, Ruby, Java or C# web frameworks: their developers take all that for granted, as it's frequently a non-optional core functionality.
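
The data-cache idea described above (store the result of an expensive operation, expire it after a time-to-live, or invalidate it by hand) can be sketched in a few lines of Python. This is only an illustration of the semantics, not AtoM's cache layer or the APC API: the class `TTLCache` and its methods are hypothetical names, and unlike APC this cache lives inside a single process rather than being shared across requests.

```python
import time


class TTLCache:
    """A tiny in-process cache with per-entry time-to-live and manual invalidation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def set(self, key, value, ttl):
        """Store value under key for ttl seconds."""
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key, default=None):
        """Return the cached value, or default if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return default
        return value

    def delete(self, key):
        """Manually invalidate a key; missing keys are ignored."""
        self._entries.pop(key, None)

    def get_or_compute(self, key, compute, ttl):
        """Return the cached value, calling compute() only on a miss."""
        sentinel = object()
        value = self.get(key, sentinel)
        if value is sentinel:
            value = compute()
            self.set(key, value, ttl)
        return value
```

A request handler would wrap its slow call in `get_or_compute`, so repeated requests within the time-to-live reuse the stored result instead of recomputing it.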

APC was not ported to PHP 5.5. Its user-land cache is now provided by APCu, and opcode caching is handled by Zend OPcache, which claims to be faster and better designed. Our docs reflect these changes and we upgraded our code accordingly. Curiously, OPcache is now compiled by default but disabled in the configuration.
APC/APCu is not distributed, though: it's not possible to share the cache across hosts. However, our cache layer is compatible with memcache in case you have more than one application front-end and you want to share things like user sessions. We didn't mention that in our docs, for simplicity. So you can start small in a single-node environment, but you can scale too.

Regarding other components like Elasticsearch: in our 1.x releases (still available), our search engine was a PHP-based implementation of Lucene written by Zend. We had all kinds of performance, concurrency and resource consumption issues that led us to look for a replacement. Elasticsearch is blazing fast and distributable, plus it offers innumerable search APIs like facets, suggestions, etc.

We knew that adding extra services to the stack was going to make things more complicated, but we try to alleviate that by having better docs and having more people in the community giving technical support.


--
Jesús García Crespo,
Software Engineer, Artefactual Systems Inc.
http://www.artefactual.com | +1.604.527.2056

Dan Gillean

Nov 7, 2014, 1:53:05 PM
to ica-ato...@googlegroups.com
Hi Matt,

Thank *you* for taking the time to share your feedback with us - it's exactly this that makes us love working in open source, and something we want to encourage more in our community, which is an interesting mix of cultural heritage workers of various stripes, and developers.

As for the documentation - I think for now, it might be best to hold off until we have the new wiki up. I will re-post on this thread when it's available! It might be a bit - there's a lot of content to migrate, review, rewrite, merge, and clean up, and all this work is unsponsored - meaning we do it whenever we don't have client work that is taking priority. But we're committed to improving access to our resources, consolidating all the old namespaces and sites, and updating our documentation wherever and whenever possible, so it will happen!

Thanks again for your feedback,

Dan Gillean, MAS, MLIS
AtoM Product Manager / Systems Analyst,
Artefactual Systems, Inc.
604-527-2056
@accesstomemory
