Elasticsearch heap error and watermark exceeded


LaAlice123456789

Jan 9, 2017, 5:33:03 AM
to obiba-users
Hello,

When uploading large tables (roughly 1.5 million rows, roughly 7 columns) to Opal, I used to get a Java heap error and the import was terminated before completion. So I assigned a minimum and maximum heap size of 3G. That worked, but now I get:
2017-01-09 11:26:12,509 [elasticsearch[Clown][management][T#3]] INFO  org.elasticsearch.cluster.routing.allocation.decider - [Clown] low disk watermark [85%] exceeded on [8099xfmxQAuepGcwKLsq2w][Clown] free: 2gb[14.4%], replicas will not be assigned to this node
Does this affect Opal or the data integrity in any way? I only have this one node, so the shards cannot be relocated. Also, I can't really figure out what takes up this much space; it can't be the data. Are there any large files that are not really needed and could be deleted?

I'm using the newest Opal version, PostgreSQL for the data and MongoDB for the identifiers database.

Yannick Marcon

Jan 9, 2017, 5:54:47 AM
to obiba...@googlegroups.com
Hi,

Elasticsearch is quite verbose about system metrics, and in your case the messages are still at INFO level. This one says that you should pay attention to your server's disk usage.
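
For reference, the check behind that log line can be approximated in a few lines of Python. This is a rough sketch, not Elasticsearch's own code: the function names `exceeds_low_watermark` and `check_path` and the default path are assumptions made for illustration. Only the 85% threshold and the 14.4% free figure come from the log message quoted earlier in the thread.

```python
import shutil


def exceeds_low_watermark(free_bytes: int, total_bytes: int,
                          watermark_pct: float = 85.0) -> bool:
    """Return True when the used share of the disk is above the watermark."""
    used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
    return used_pct > watermark_pct


def check_path(path: str = "/") -> bool:
    """Apply the same check to the file system that holds `path`."""
    usage = shutil.disk_usage(path)
    return exceeds_low_watermark(usage.free, usage.total)


if __name__ == "__main__":
    # 14.4% free means 85.6% used, which is above the 85% low watermark.
    print(exceeds_low_watermark(free_bytes=144, total_bytes=1000))
```

Running `check_path` on the directory that holds the Elasticsearch data would show whether the node is still above the threshold after freeing space.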

Elasticsearch is used for two purposes in Opal:
* data dictionary (variable) search: the indexing is automatically triggered whenever a table is updated. In your case 7 variables is not much...
* data search: indexing has to be triggered (or scheduled) by the user. Have you run the data indexer for this imported table? You should expect (depending on the data types) the ES index to be about 10 times the original dataset size.
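
As a back-of-the-envelope check of the 10x rule of thumb, the following Python sketch estimates whether a data index would fit in the remaining free space. The helper names and the 200 MB example table are made up for illustration; only the factor of 10 and the 2 GB of free space come from this thread.

```python
def estimated_index_bytes(dataset_bytes: int, factor: int = 10) -> int:
    """Rough estimate of the ES index size for a dataset of the given size."""
    return dataset_bytes * factor


def fits_on_disk(dataset_bytes: int, free_bytes: int, factor: int = 10) -> bool:
    """True when the estimated index fits in the remaining free space."""
    return estimated_index_bytes(dataset_bytes, factor) <= free_bytes


if __name__ == "__main__":
    MB = 1024 ** 2
    GB = 1024 ** 3
    # Hypothetical 200 MB table against the 2 GB of free space from the log.
    print(fits_on_disk(200 * MB, 2 * GB))
```

The estimate ignores the watermark itself, so in practice the index would need to fit with some margin to spare.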

Elasticsearch is used for searching only, not for storing data. You could even delete the ES index manually without affecting the Opal data storage.

The data are stored in PostgreSQL/MongoDB. Are these databases running on the same server as Opal? If so, any application that uses up all the resources (this could be ES) could affect the integrity of the databases.

Hope this helps
Yannick


--
You received this message because you are subscribed to the Google Groups "obiba-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to obiba-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

LaAlice123456789

Jan 16, 2017, 4:58:53 AM
to obiba-users
Hi,
I assigned more space to Opal, and now the error is gone.
I also triggered the indexing and everything worked.
Why could ES affect the integrity of the data just because it runs on the same server?
I also have another question: is there a way to query only certain data in R, for example only the data with entity_id x in table y, without loading the whole table and then working on the resulting data.frame?

Yannick Marcon

Jan 16, 2017, 5:45:47 AM
to obiba...@googlegroups.com
Hi,

Yes. Generally speaking, all applications running on the same machine share the resources of that server; if one application fills up the disk, it could have an unpredictable impact on the data integrity of the other applications.

As for your second question: yes, all the Opal content is accessible through web services. Unfortunately, the opal R package does not expose a function that gets the values of an entity in a table. But you could write your own function if you wish, and I can help if you are interested in that.
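
A minimal sketch of such a custom function, in Python rather than R, could start by building the web service URL for one entity. The path layout `/ws/datasource/<name>/table/<table>/valueSet/<id>` is an assumption here and should be checked against the Opal REST documentation for your version; the function name `value_set_url`, the host, the datasource, the table and the identifier are placeholders.

```python
from urllib.parse import quote


def value_set_url(base_url: str, datasource: str, table: str,
                  entity_id: str) -> str:
    """Build the (assumed) Opal web service URL for one entity's values."""
    parts = ["ws", "datasource", quote(datasource, safe=""),
             "table", quote(table, safe=""),
             "valueSet", quote(entity_id, safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    print(value_set_url("https://opal.example.org", "my_datasource", "y", "x"))
```

The resulting URL would then be requested with an authenticated HTTP GET; authentication and response parsing are left out of the sketch.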

Regards
Yannick


Larissa Pusch

Jan 17, 2017, 2:45:25 AM
to obiba...@googlegroups.com
Hi,
OK, couldn't we do something like this?
get_data <- function(datasource, variables, pattern, option) {
  # variables: the columns; pattern: the requested pattern;
  # option: "like", "not like" or "exact"
  if (option == "exact") {
    # try:   if PostgreSQL or another SQL database: SELECT * FROM table WHERE variable = pattern
    # catch: do the same with a MongoDB query
  }
}
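
The idea in the pseudocode above can be tried out with a parameterized SQL query. The sketch below is in Python and uses an in-memory SQLite database as a stand-in; it does not talk to Opal's PostgreSQL or MongoDB storage, and the table name `y`, the column names and the helpers `get_data` and `demo_connection` are illustrative assumptions.

```python
import sqlite3

# Map the requested option to an SQL operator.
OPERATORS = {"exact": "=", "like": "LIKE", "not like": "NOT LIKE"}


def get_data(conn: sqlite3.Connection, table: str, variable: str,
             pattern: str, option: str = "exact") -> list:
    """Return the rows of `table` whose `variable` matches `pattern`."""
    if option not in OPERATORS:
        raise ValueError(f"unknown option: {option!r}")
    # Identifiers cannot be bound as parameters, so accept only plain names.
    for name in (table, variable):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    sql = f'SELECT * FROM "{table}" WHERE "{variable}" {OPERATORS[option]} ?'
    return conn.execute(sql, (pattern,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Create a tiny in-memory table to query against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE y (entity_id TEXT, value TEXT)")
    conn.executemany("INSERT INTO y VALUES (?, ?)",
                     [("x", "a"), ("x2", "b"), ("z", "c")])
    return conn


if __name__ == "__main__":
    print(get_data(demo_connection(), "y", "entity_id", "x"))
```

Binding the pattern as a parameter instead of pasting it into the SQL string keeps the query safe when the pattern comes from user input.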
 
Or isn't it possible to access the database directly? That is, does it have to be a query that goes through Opal, and if it does, how would I do that?
 
Regards
Larissa
 
Sent: Monday, January 16, 2017 at 11:45 AM
From: "Yannick Marcon" <yannick...@obiba.org>
To: "obiba...@googlegroups.com" <obiba...@googlegroups.com>
Subject: Re: Elasticsearch heap error and watermark exceeded