[Dspace-tech] Too many Open Files, DSpace 1.3.2


Philip Wong

Aug 24, 2015, 3:53:16 PM
to dspac...@lists.sourceforge.net, lbphilip
We are running 1.3.2. We have been getting frequent "Too many open files"
errors lately. They usually happen after we save an abstract field. There is
no error during the save, but when we try to search for the record again, the
file error appears, and it turns out that the abstract field was not inserted
successfully.

Checking the list, it seems there can be several different causes of "Too many
open files":

  1. hit by harvesters
  2. unclosed file stream used in thumbnail
  3. large index file size (> 6 MB?) and running out of file handles

For our site it is unlikely to be 1, as a robots.txt is in place and there is
no trace of robots in the Tomcat access logs. It should not be 2, as we are
not generating thumbnails. Can it be 3? We are not doing full-text indexing;
however, we have inserted many abstracts lately while building the E-theses
collection.

We have
search.maxfieldlength = 10000

That should be enough for an abstract, which is usually several paragraphs.
The total number of items in our DSpace is less than 3,000. For the index
files, the file sizes in /dspace/search are:

   33030091 Jul  5 16:17 _2tb.cfs
        419 Jul 12 15:59 _2tb.del
    1104996 Jul 12 12:29 _2z6.cfs
     125381 Jul 12 14:27 _2zq.cfs
     122881 Jul 12 14:51 _30a.cfs
     156762 Jul 12 15:09 _30u.cfs
     102205 Jul 12 15:33 _31e.cfs
      65357 Jul 12 15:40 _31y.cfs
      67755 Jul 12 15:59 _32i.cfs
      7666 Jul 12 15:59 _32k.cfs
         4 Jul 12 15:59 deletable
       101 Jul 12 15:59 segments

Is this too big? How can one avoid exceeding the 6 MB limit, if there really
is a limit?

When the file error occurred, our catalina log recorded the following message:

2006-07-12 16:00:21 StandardWrapperValve[edit-item]: Servlet.service() for servlet edit-item threw exception
java.io.FileNotFoundException: /dspace/search/_32i.cfs (Too many open files)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
        at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:376)
        at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java:405)
        at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
        at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:63)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:104)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:94)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:122)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
        at org.dspace.search.DSIndexer.unIndexContent(DSIndexer.java:134)
        at org.dspace.search.DSIndexer.unIndexContent(DSIndexer.java:127)
        at org.dspace.search.DSIndexer.reIndexContent(DSIndexer.java:168)
        at org.dspace.app.webui.servlet.admin.EditItemServlet.processUpdateItem(EditItemServlet.java:565)

Any hints to this problem? Thank you.

Philip Wong
Library Systems
CityU of Hong Kong

Mark Diggory

Aug 24, 2015, 3:53:17 PM
to Philip Wong, dspac...@lists.sourceforge.net
Hello Philip,

On Jul 12, 2006, at 7:00 AM, Philip Wong wrote:

> [...]
>
> We have
> search.maxfieldlength = 10000
>
> That should be enough for an abstract, which is usually several paragraphs.

Note that this variable sets the maximum number of "terms" Lucene will index per field; if it is absent from the config, DSpace defaults it to Integer.MAX_VALUE. The value is not the character length of your abstract but the number of words (terms) indexed from it.
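
For illustration, a rough sketch against the Lucene 1.4-era API that DSpace
1.3.x bundles (the path, analyzer, and field name are illustrative, not
DSpace's actual indexing code):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class MaxFieldLengthSketch {
        public static void main(String[] args) throws Exception {
            // false = open the existing index rather than create a new one
            IndexWriter writer =
                new IndexWriter("/dspace/search", new StandardAnalyzer(), false);

            // Caps the number of terms indexed per field, not characters;
            // tokens beyond the cap are silently dropped at index time.
            writer.maxFieldLength = 10000;

            Document doc = new Document();
            doc.add(Field.Text("abstract", "...the abstract text..."));
            writer.addDocument(doc);
            writer.close();
        }
    }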

> [...]
>
> Is this too big? How can one avoid exceeding the 6 MB limit, if there
> really is a limit?

I'm unsure what limit you are referring to. Can you elaborate?

I would suspect it's more a matter of when your index was last optimized. Optimizing reduces the number of segments used to represent the index, and thus the number of open file handles. Are you running index-all on a regular basis? A simple test would be to rerun "index-all" if you haven't lately.
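
In Lucene 1.4 terms, the optimize step amounts to roughly the following
sketch (the path is illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class OptimizeSketch {
        public static void main(String[] args) throws Exception {
            // false = open the existing index rather than create a new one
            IndexWriter writer =
                new IndexWriter("/dspace/search", new StandardAnalyzer(), false);
            writer.optimize(); // merge all segments into one; fewer files to hold open
            writer.close();
        }
    }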


Mark R. Diggory
~~~~~~~~~~~~~
DSpace Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology


Philip Wong

Aug 24, 2015, 3:54:06 PM
to dspac...@lists.sourceforge.net, lbphilip
Hi Mark,
 
Thanks for your reply.
 
We have raised search.maxfieldlength from 10000 to 20000.
 
As advised, we also run a cron job that does index-all daily. However, after adding 300+ abstracts (over several days), we ran into the "Too many open files" error again.
 
We are running Linux Enterprise 3.0. After the error occurred, I used "lsof" to check the open files and found 991 lines like the following:
 
     jsvc       998   dspace   25r   REG        3,3 35728995  12157167 /dspace/search/_2tb.cfs (deleted)
 
Of the 991 lines, 482 do not have "(deleted)" at the end:
 
     jsvc       998   dspace 1006r   REG        3,3 36988949  12157125 /dspace/search/_2tb.cfs

Were these files still open? Why weren't they closed?
 
After restarting Tomcat, these "search" files were gone.
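
If I understand lsof correctly, the "(deleted)" lines are descriptors our JVM
still holds on segment files that a later merge has already unlinked, which
would point at searchers/readers being opened but never closed. A
hypothetical sketch of that pattern and its fix (this is not DSpace's actual
servlet code; the path and field name are made up for illustration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;

    public class SearchSketch {
        public static int countHits(String queryText) throws Exception {
            // A searcher opened per request holds descriptors on the .cfs
            // segment files; if it is never closed, they linger (showing as
            // "(deleted)" once the segments are merged away) until the JVM
            // exits.
            IndexSearcher searcher = new IndexSearcher("/dspace/search");
            try {
                Hits hits = searcher.search(
                    QueryParser.parse(queryText, "default", new StandardAnalyzer()));
                return hits.length();
            } finally {
                searcher.close(); // releases the underlying reader and its handles
            }
        }
    }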
 
>> I'm unsure what limit you are referring to. Can you elaborate?
 
I was referring to the exchange posted by Jim Downing and MacKenzie Smith back on 26 and 27 August 2004; the subject was "Lucene compound file format". They talked about the 6 MB index file limit and solutions. But since there were 991 open files, it seems our problem is not related to the 6 MB index size limit.
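
From what I gathered in that thread, the compound (.cfs) format itself exists
to conserve handles by packing each segment's many files into one. In the
Lucene 1.4 API it is toggled on the writer, roughly (path illustrative):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class CompoundFileSketch {
        public static void main(String[] args) throws Exception {
            IndexWriter writer =
                new IndexWriter("/dspace/search", new StandardAnalyzer(), false);
            writer.setUseCompoundFile(true); // one .cfs per segment instead of many files
            writer.optimize();               // rewrite the index in compound form
            writer.close();
        }
    }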
 
I wonder if other sites also have this problem.
 
Thanks,
Philip
CityU of HK Library
 
> Date: Wed, 12 Jul 2006 10:07:33 -0400
> From: Mark Diggory <mdig...@MIT.EDU>
> Subject: Re: [Dspace-tech] Too many Open Files, DSpace 1.3.2
> To: Philip Wong <lbph...@cityu.edu.hk>
> Cc: dspac...@lists.sourceforge.net
> Message-ID: <D4DDA6B0-791B-402E...@mit.edu>
> Content-Type: text/plain; charset="us-ascii"
> ...
> ...