"Too many open files" error

Iman Mirzadeh

Sep 14, 2022, 12:01:22 PM
to bleve
Hi,

Thanks for developing Bleve. I am currently using it as the search engine for my API. I have multiple crawlers that periodically parse RSS feeds, and whenever there is a new post, it is added to the search index.

However, I have run into a strange issue, probably caused by files not being closed properly. After a while, I hit the "too many open files" error and my process gets killed by the OS. I see thousands of open files, and running lsof -p <pid> gives me:

COMMAND     PID USER   FD      TYPE   DEVICE  SIZE/OFF     NODE NAME
api     2496725 iman   50r      REG    252,1       6945  1290300 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982c1.zap
api     2496725 iman   51r      REG    252,1      29234  1290302 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982c4.zap
api     2496725 iman   52r      REG    252,1      30213  1290303 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982c7.zap
api     2496725 iman   53r      REG    252,1      28480  1290337 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982ca.zap
api     2496725 iman   54r      REG    252,1      28126  1290301 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982cc.zap
api     2496725 iman   55r      REG    252,1      33065  1290338 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982ce.zap
api     2496725 iman   56r      REG    252,1      54425  1290339 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982d0.zap
api     2496725 iman   57r      REG    252,1       5774  1290341 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982d2.zap
api     2496725 iman   58r      REG    252,1      52884  1290320 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982d4.zap
api     2496725 iman   59r      REG    252,1      15320  1290344 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982d7.zap
api     2496725 iman   60r      REG    252,1      22573  1290343 /api/bleve/62f4ae4f0bd2bed0ee3f2883/store/0000000982d9.zap
....

At first glance, it looks like a leak on my side, where I fail to close the index. However, the issue only appears after Bleve has been in use for a long time (e.g., more than a few hours). Traffic is roughly constant during these hours and my code does not change, but for some reason, after a while, these open files never get closed, and eventually my process gets killed.

Any suggestions/ideas on why this is happening? 

----------- My code ------- 
Here's the code I'm using for indexing documents. I do not have any other function in my code that talks to Bleve besides these:


// GetOrCreateIndex opens the index at indexName, creating it first
// if the path does not exist yet.
func GetOrCreateIndex(indexName string) (bleve.Index, error) {
    exists, err := pathExists(indexName)
    if err != nil {
        return nil, err
    }
    if !exists {
        return bleve.New(indexName, bleve.NewIndexMapping())
    }
    return bleve.Open(indexName)
}

// The rest of the code calls this function directly.
func IndexDocument(d SearchDocument) error {
    indexName := bleveIndexName(d.UserID)
    index, err := GetOrCreateIndex(indexName)
    if err != nil {
        return err
    }
    // Defer Close only after the error check: index is nil when opening
    // fails, and calling Close on it would panic.
    defer index.Close()
    // Add the document to the index.
    return index.Index(d.ID, d)
}

Abhi Dangeti

Sep 14, 2022, 12:20:07 PM
to bleve
What version of bleve are you using?
We've had some fixes in the recent past that address file reference leaks that could cause the issue you've run into.

I'd urge you to pick up the latest version available if possible - v2.3.4.

Iman Mirzadeh

Jan 19, 2023, 7:03:53 PM
to bleve
Hi,

Sorry that I missed your response. For some reason, after opening this discussion, I couldn't find it in the group.

At the time I submitted this question, I was using v2.3.4, but the issue is still there with v2.3.6. 

For my application, I have a cron job that periodically indexes incoming documents in bursts. For instance, I may get ~100-200 separate documents in less than a second, and when I try to index them, I run into the "too many open files" error.

Is there a way to control the persister/merger through some options? If so, is there a code example or documentation?
I couldn't find anything except a few GH issues like this one:
* https://github.com/blevesearch/bleve/issues/1344
 
I don't care about performance for now. All I want is to index many documents sequentially, one by one (not in bulk mode).

Thanks :)