ElasticSearch failing after moving AtoM to new disk

Creighton Barrett

Mar 25, 2015, 12:25:52 PM
to ica-ato...@googlegroups.com
Hello,

We accidentally crashed our site after running out of disk space. Our IT office moved the site to a new disk with more space, but we cannot get Elasticsearch to finish rebuilding the index. We have restarted Apache and cleared the cache, but the rebuild keeps failing.

I'm told that "It could be a space issue, application is stored in /appl – elastic search cache is in /var ." Do we actually need to move the cache for the index to work? Any ideas?

This is our Elasticsearch cluster status after the index rebuild failed:

[root@findingaids elasticsearch]# curl -XGET 'http://localhost:9200/_cluster/health/?level=shards&pretty=true'
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "indices" : {
    "atom" : {
      "status" : "yellow",
      "number_of_shards" : 4,
      "number_of_replicas" : 1,
      "active_primary_shards" : 4,
      "active_shards" : 4,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 4,
      "shards" : {
        "0" : { "status" : "yellow", "primary_active" : true, "active_shards" : 1, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1 },
        "1" : { "status" : "yellow", "primary_active" : true, "active_shards" : 1, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1 },
        "2" : { "status" : "yellow", "primary_active" : true, "active_shards" : 1, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1 },
        "3" : { "status" : "yellow", "primary_active" : true, "active_shards" : 1, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1 }
      }
    }
  }
}

We're running AtoM 2.1.1 right now. I've also been given an error log that I can pass along if it helps. Sorry if this isn't entirely clear! Our sys admin who is most familiar with this application is on vacation.

Thanks,

Creighton

Jesús García Crespo

Mar 25, 2015, 12:44:33 PM
to ica-ato...@googlegroups.com
Hi Creighton,

Based on the output of the cluster health API that you sent, your ES cluster has only one node available while your "atom" index is configured with one replica per shard ("number_of_replicas" : 1). Those replica shards have no second node to live on, so the health status of your cluster won't become green until you either add another node or drop the replica count to zero.

You can tweak the configuration of your indices using the indices API. In particular, take a look at: http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html.
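For example, a sketch of dropping the "atom" index's replica count to zero through that API (host and port are assumptions, taken from the curl command earlier in the thread):

```shell
# Set number_of_replicas to 0 on the "atom" index so no replica
# shards are expected; a single-node cluster can then report green.
curl -XPUT 'http://localhost:9200/atom/_settings' -d '{
  "index": {
    "number_of_replicas": 0
  }
}'

# Verify: health should now show "green" and 0 unassigned shards.
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
```

number_of_replicas is a dynamic setting, so this takes effect without reindexing; the shard count itself cannot be changed on a live index.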

However, I don't see why search:populate would fail under these circumstances. I reproduced the same scenario by changing the number of replicas on my index, and search:populate still worked for me. Is search:populate reporting any meaningful error in your terminal? We've seen cases in the past where the search:populate task fails because one of the SQL queries used to fetch data from the database failed, which is unrelated to the state of the Elasticsearch cluster.

Regards,



--
You received this message because you are subscribed to the Google Groups "ICA-AtoM Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ica-atom-user...@googlegroups.com.
To post to this group, send email to ica-ato...@googlegroups.com.
Visit this group at http://groups.google.com/group/ica-atom-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/ica-atom-users/f71ec8e0-9b36-4372-9cf0-039dc712fd56%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Jesús García Crespo,
Software Engineer, Artefactual Systems Inc.
http://www.artefactual.com | +1.604.527.2056

Margaret Vail

Mar 25, 2015, 1:11:43 PM
to ica-ato...@googlegroups.com
Hi Jesús,

Below is the error we are getting in elasticsearch.log. At the moment search:populate is not giving any meaningful error - it just prints the number 1 and stops.

Earlier, before I deleted the Elasticsearch cache, we were getting an Elasticsearch error:
/atom/QubitInformationObject/107377 caused UnavailableShardsException[[atom][2] Primary shard is not active or isn't assigned is a known node. Timeout: [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@27508241]

elasticsearch.log error:

[2015-03-25 12:45:10,845][DEBUG][action.search.type       ] [Snowfall] [atom][3], node[AZDLTwo2SzmPHLKX-5EY7g], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@504c41bc] lastShard [true]
org.elasticsearch.search.SearchParseException: [atom][3]: from[-1],size[10]: Parse Failure [Failed to parse source [{"size":"10","facets":{"languages":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"i18n.languages","size":10}},"levels":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"levelOfDescriptionId","size":10}},"mediatypes":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"digitalObject.mediaTypeId","size":10}},"digitalobjects":{"query":{"term":{"hasDigitalObject":true}},"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}}},"repos":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"repository.id","size":10}},"places":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"places.id","size":10}},"subjects":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"subjects.id","size":10}},"creators":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"creators.id","size":10}},"names":{"facet_filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}},"terms":{"field":"names.id","size":10}}},"sort":[{"i18n.nl.title.untouched":"asc"}],"query":{"bool":{"must":[{"term":{"creators.id":"270274"}},{"match_all":{}}]}},"filter":{"bool":{"must":[{"term":{"publicationStatusId":160}}]}}}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:681)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:537)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:509)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:264)
        at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
        at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
        at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.search.SearchParseException: [atom][3]: from[-1],size[10]: Parse Failure [No mapping found for [i18n.nl.title.untouched] in order to sort on]
        at org.elasticsearch.search.sort.SortParseElement.addSortField(SortParseElement.java:210)
        at org.elasticsearch.search.sort.SortParseElement.addCompoundSortField(SortParseElement.java:141)
        at org.elasticsearch.search.sort.SortParseElement.parse(SortParseElement.java:86)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:665)
        ... 9 more
[2015-03-25 12:45:10,847][DEBUG][action.search.type       ] [Snowfall] All shards failed for phase: [query]


Thank you for your help!

Margaret

Jesús García Crespo

Mar 25, 2015, 2:04:08 PM
to ica-ato...@googlegroups.com
Hi Margaret,

It sounds like you need to tell Elasticsearch that the cluster has just one node. I would try the following in /etc/elasticsearch/elasticsearch.yml on your ES server (note the file uses YAML "key: value" syntax; the "es." prefix is only used when these settings are passed as Java system properties):

node.data: true
node.local: true
discovery.zen.ping.multicast.enabled: false
index.number_of_shards: 1
index.number_of_replicas: 0

In AtoM, either in config/search.yml or plugins/arElasticSearchPlugin/config/search.yml, make sure the index configuration is also set to one shard and zero replicas, then run "php symfony cc". It doesn't make sense to have more than one shard if you are not planning to distribute your index across multiple nodes. With that done, and after rebuilding the index with search:populate, I would expect the problem to be solved. Good luck!
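Put together, the sequence might look something like this (the AtoM install path is an assumption; the thread mentions the application lives in /appl, so adjust to your layout):

```shell
# Restart Elasticsearch so the new elasticsearch.yml settings
# take effect.
sudo service elasticsearch restart

cd /appl/atom                 # hypothetical install path; adjust to yours
php symfony cc                # clear the Symfony cache
php symfony search:populate   # rebuild the Elasticsearch index
```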

PS: It's perfectly fine to have a cluster with just a single node; in AtoM's case in particular it's not a big deal at all, because ES is not treated as a persistent store.


Creighton Barrett

Mar 25, 2015, 4:01:30 PM
to ica-ato...@googlegroups.com
Thanks so much, Jesús! That worked and the index has successfully been rebuilt.

Margaret Vail

Mar 26, 2015, 8:29:06 AM
to ica-ato...@googlegroups.com
I just want to share what I did that solved the problem, in case anyone else is experiencing something similar.

I followed Jesús' advice and re-checked our MySQL error logs.

We were getting the following MySQL error:

150324 12:05:38  InnoDB: ERROR: the age of the last checkpoint is 9433838,
InnoDB: which exceeds the log group capacity 9433498.
InnoDB: If you are using big BLOB or TEXT rows, you must set the
InnoDB: combined size of log files at least 10 times bigger than the
InnoDB: largest such row.


When I investigated the error further, I found this solution on Stack Exchange: http://dba.stackexchange.com/questions/16510/mysql-innodb-innodb-error-the-age-of-the-last-checkpoint-is-innodb-which-exc

STEP 01) Change the following in /etc/my.cnf

[mysqld]
innodb_log_buffer_size          = 32M
innodb_buffer_pool_size         = 3G
innodb_log_file_size            = 768M

STEP 02) service mysql stop

STEP 03) rm -f /var/lib/mysql/ib_logfile*

STEP 04) service mysql start

This will rebuild the following files

  • /var/lib/mysql/ib_logfile0
  • /var/lib/mysql/ib_logfile1
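The steps above can be condensed into one sequence (a sketch based on the Stack Exchange answer; service names and paths may differ on your distribution, and the rm step is destructive, so only run it after a clean MySQL shutdown):

```shell
# InnoDB refuses to start if the existing ib_logfile* files don't
# match the configured innodb_log_file_size, so the old redo logs
# must be removed after a clean shutdown; MySQL recreates them at
# the new size on startup.
sudo service mysql stop
sudo rm -f /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile1
sudo service mysql start
```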

After this change, I cleared the cache and attempted to rebuild the index. This time it was successful!

Thanks again for your help!!