What is actually stored in the database and storage folder?

msim...@gmail.com

Oct 2, 2014, 4:06:57 AM
to hippo-c...@googlegroups.com
Hi everyone,

We're finishing off a new Hippo Community-powered website for a new product, set up to use a Postgres database, and I've been unable to find a clear answer to a question I have. I wonder if you can help me.

What exactly is stored in:
  1. The (Postgres) database
  2. The repo.path storage folder

So far, whenever we've added new components or made more significant hst configuration changes (in local and development environments), I've had to delete both the contents of the database and the storage folder before restarting for them to appear. What exactly am I deleting?

Hypothetically, if I were to delete these in production prior to an update (that will recreate them), what would I be losing?

Thanks for your help.

Mark.

Bert Leunis

Oct 2, 2014, 4:25:01 AM
to hippo-c...@googlegroups.com
On Thu, Oct 2, 2014 at 10:06 AM, <msim...@gmail.com> wrote:
Hi everyone,

We're finishing off a new Hippo Community-powered website for a new product, set up to use a Postgres database, and I've been unable to find a clear answer to a question I have. I wonder if you can help me.

What exactly is stored in:
  1. The (Postgres) database
  2. The repo.path storage folder
When running Hippo, the JCR data is persisted in some storage facility. Jackrabbit offers several options, such as MySQL and Postgres. When you run Hippo locally using the archetype, the default facility is an H2 database that is stored in a folder on your file system. The location of that folder can be specified using the repo.path property.

I think you may be interested in getting some more info about working with storage and configuration changes by reading the info here:


So far, whenever we've added new components or made more significant hst configuration changes (in local and development environments), I've had to delete both the contents of the database and the storage folder before restarting for them to appear. What exactly am I deleting?

Hypothetically, if I were to delete these in production prior to an update (that will recreate them), what would I be losing?

Thanks for your help.

Mark.

Jeroen Reijn

Oct 2, 2014, 4:35:04 AM
to hippo-c...@googlegroups.com
Hi Mark,

Bert explained most of it, but in general (if properly set up) all data is stored in the Postgres database.

When not running locally, the only thing left in the repo.path folder is the Lucene index. The Lucene index is used for searching the repository (or running queries from the HST, etc.).

With regard to your question about deleting content, I don't think it is necessary, and it's definitely not how you would want to move forward.

In the general case you should be able to leave the data in Postgres and the storage on the file system.

I think it would be worthwhile to read the following two pages:


The second page describes how you can instruct the repository to reload the changes you made (locally) and have the system reload them once the changes are deployed in a different environment.

Cheers,

Jeroen


--
Jeroen Reijn
Hippo

Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 101 Main Street, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

http://about.me/jeroenreijn

msim...@gmail.com

Oct 2, 2014, 6:09:06 AM
to hippo-c...@googlegroups.com
Thanks Bert, Jeroen.

Ok, that makes more sense. I've set up Postgres using the instructions on the site, which it's definitely using, and already set repo.path to point at a folder that will be automatically backed up (although it sounds like it wouldn't be the end of the world if it wasn't).

So, going by the linked "Reload on startup initialize items" page, it sounds like I just need to add the "hippo:reloadonstartup" property to the sections of hippoecm-extension.xml I want to automatically reload, and make sure repo.bootstrap=true. Any harm in adding it to all of the sections?
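
As a minimal sketch of how these two settings might be handed to the repository at startup: both `repo.path` and `repo.bootstrap` are passed here as JVM system properties appended to `CATALINA_OPTS`. The storage location `/var/hippo/storage`, the `REP_OPTS` variable name, and the use of a `setenv.sh`-style script are assumptions for illustration, not details confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical storage location; point this at the folder that gets backed up.
REPO_PATH="/var/hippo/storage"

# Hand both settings to the JVM as system properties.
# REP_OPTS is an illustrative variable name, not a required one.
REP_OPTS="-Drepo.path=${REPO_PATH} -Drepo.bootstrap=true"

# Append to whatever options Tomcat already receives.
CATALINA_OPTS="${CATALINA_OPTS:-} ${REP_OPTS}"
export CATALINA_OPTS

echo "${CATALINA_OPTS}"
```

With bootstrapping enabled like this, only the initialize items that carry the reload flag would be reloaded; everything else stays as it is in the repository.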

I'll stick to the module-wide version ID for now - I found a forum post about changing "<Implementation-Build>${buildVersion}</Implementation-Build>" to use ${timestamp} which apparently works better for git users.

We have plenty of opportunities to test this reload in various environments before launching in a week or so.

Thanks for your help,
Mark.

Bert Leunis

Oct 2, 2014, 6:26:30 AM
to hippo-c...@googlegroups.com
On Thu, Oct 2, 2014 at 12:09 PM, <msim...@gmail.com> wrote:
Thanks Bert, Jeroen.

Ok, that makes more sense. I've set up Postgres using the instructions on the site, which it's definitely using, and already set repo.path to point at a folder that will be automatically backed up (although it sounds like it wouldn't be the end of the world if it wasn't).

So, going by the linked "Reload on startup initialize items" page, it sounds like I just need to add the "hippo:reloadonstartup" property to the sections of hippoecm-extension.xml I want to automatically reload, and make sure repo.bootstrap=true. Any harm in adding it to all of the sections?
Harm maybe not, but I can think of some disadvantages:
- The startup will take more time.
- If configuration was changed in the repository and you did not apply that change to your bootstrap data, you will revert that change.
- Any developer that looks at hippoecm-extension.xml later will not know if the reload flag was set for a particular reason.

I would set the flag only if you really need it.

msim...@gmail.com

Oct 2, 2014, 6:28:42 AM
to hippo-c...@googlegroups.com
Ok, thanks Bert, makes sense.

Jasper Floor

Oct 2, 2014, 7:03:34 AM
to hippo-c...@googlegroups.com
You can add a version to an initialize item. Then it will only be reloaded if the version number is newer than the registered one. I think it actually makes sense to put a version number on each item, since that shows when it was added. That is not strictly necessary if you use SCM, but I would call it absolutely necessary if you use reloadonstartup (not technically required, but certainly a best practice). Just setting reloadonstartup will cause the item to be reloaded every time, as Bert says. That is probably not what you want.

mvg,
Jasper

Bert Leunis

Oct 2, 2014, 7:18:38 AM
to hippo-c...@googlegroups.com
On Thu, Oct 2, 2014 at 1:03 PM, Jasper Floor <j.f...@onehippo.com> wrote:
You can add a version to an initialize item.
Hey Jasper, he already said that he chose the other option ("I'll stick to the module-wide version ID for now") ;-)

Jasper Floor

Oct 2, 2014, 7:48:34 AM
to hippo-c...@googlegroups.com
On Thu, Oct 2, 2014 at 1:18 PM, Bert Leunis <b.le...@onehippo.com> wrote:
On Thu, Oct 2, 2014 at 1:03 PM, Jasper Floor <j.f...@onehippo.com> wrote:
You can add a version to an initialize item.
Hey Jasper, he already said that he chose the other option ("I'll stick to the module-wide version ID for now") ;-)

Sorry, I was trying to explain that I think it is a bad idea. I should've started with "you should" rather than "you can".

mvg,
Jasper

msim...@gmail.com

Oct 3, 2014, 12:56:16 PM
to hippo-c...@googlegroups.com
Ok thanks all, yeah, we've already had some dev issues with bootstrapped files overwriting changes made in the repo, at least I know what it is now!

I have another related question. Our environment infrastructure calls for a single CMS instance serving content for multiple Site instances, on different servers and different Tomcat instances.

Even though they're all using a shared remote database, unsurprisingly, the Site instances on a different server to the CMS are complaining about the missing "storage" repository.

How can I set up multiple distributed Sites to use a single CMS? I had hoped it was as simple as using a shared database but it seems not.

Thanks again,
Mark.

Jeroen Reijn

Oct 3, 2014, 2:54:23 PM
to hippo-c...@googlegroups.com
Hi Mark,

see my comments inline.

On Fri, Oct 3, 2014 at 6:56 PM, <msim...@gmail.com> wrote:
Ok thanks all, yeah, we've already had some dev issues with bootstrapped files overwriting changes made in the repo, at least I know what it is now!

I have another related question. Our environment infrastructure calls for a single CMS instance serving content for multiple Site instances, on different servers and different Tomcat instances.

Even though they're all using a shared remote database, unsurprisingly, the Site instances on a different server to the CMS are complaining about the missing "storage" repository.

How can I set up multiple distributed Sites to use a single CMS? I had hoped it was as simple as using a shared database but it seems not.

Well yes, in essence you are right, and that should be almost all there is to it. However, each repository instance keeps its own storage folder for its own Lucene index.

So in your case it should indeed be as simple as sharing the same db. See also our Linux installation guide.

Now, for the Lucene index you need to specify a location to store the index (also known as the storage directory).

If you've followed our installation manual [1], there is a section that describes context.xml.

In that file you can specify where the lucene index is stored.

<Parameter name="repository-directory" value="${catalina.base}/../repository" override="false"/>

In essence you should also be able to use the exact same repository.xml configuration file.

The only important thing to remember is that, if you cluster a number of repositories, each of them needs its own Jackrabbit cluster node id. In the manual this is specified in setenv.sh and is generated dynamically.

CLUSTER_ID="$(whoami)-$(hostname -f)"
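
A rough sketch of how that id might then reach the repository, assuming it is passed as the Jackrabbit system property `org.apache.jackrabbit.core.cluster.node_id` via `CATALINA_OPTS`. The `REP_OPTS` variable name and the fallback to plain `hostname` are assumptions added for illustration; the actual setenv.sh in the manual may be laid out differently.

```shell
#!/bin/sh
# Build a cluster node id that is unique per user and host.
# Assumption: fall back to plain `hostname` where `-f` is not supported.
CLUSTER_ID="$(whoami)-$(hostname -f 2>/dev/null || hostname)"

# Pass the id to Jackrabbit as a system property.
# REP_OPTS is an illustrative variable name.
REP_OPTS="-Dorg.apache.jackrabbit.core.cluster.node_id=${CLUSTER_ID}"

# Append to whatever options Tomcat already receives.
CATALINA_OPTS="${CATALINA_OPTS:-} ${REP_OPTS}"
export CATALINA_OPTS

echo "${CATALINA_OPTS}"
```

Because the id is derived from the user and host name, the same script could be deployed unchanged to the CMS server and to every Site server while still giving each node a distinct id.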

msim...@gmail.com

Oct 3, 2014, 5:16:45 PM
to hippo-c...@googlegroups.com
Hi Jeroen, 

I'll take a more detailed look at this when I get back in on Monday, but this looks very useful, thanks. I expected it might involve some changes of this sort - if I have any more questions on Monday I'll be in touch!

All the best,
Mark.