Hi, I am pretty new to Whoosh. I was trying to index some documents in
a directory. When I call commit() I get an IOError. My script prints some
extra output, which I have included below to make the problem easier to follow.
Enter the index directory name:: myindex
enter the source of documents:: /home/emil/workspace/python/project/index/documents
indexing....
ALLAboutSpyware.txt
ConvertingMoviesToPspFormat.txt
BacktrackingEMAILMessages.txt
BitTorrentTutorials.txt
ABasicGuidetotheInternet.txt
Traceback (most recent call last):
  File "index.py", line 31, in <module>
    writer.commit()
  File "/usr/local/lib/python2.7/dist-packages/whoosh/filedb/filewriting.py", line 534, in commit
    self.generation, self.segment_number, new_segments)
  File "/usr/local/lib/python2.7/dist-packages/whoosh/filedb/fileindex.py", line 99, in _write_toc
    stream = storage.create_file(tempfilename)
  File "/usr/local/lib/python2.7/dist-packages/whoosh/filedb/filestore.py", line 79, in create_file
    fileobj = open(path, mode)
IOError: [Errno 2] No such file or directory: 'myindex/_MAIN_1.toc.1331625897.72'
This is my code.
from whoosh.fields import Schema, TEXT, KEYWORD, ID, STORED
from whoosh.analysis import StemmingAnalyzer
from whoosh import index
import os,os.path
import codecs

schema = Schema(path=ID(unique=True,stored=True),content=TEXT)
dir = raw_input('Enter the index directory name:: ')
#print dir
if not os.path.exists(dir):
    print 'creating dir', dir, '...'
    os.mkdir(dir)
myindex = index.create_in(dir,schema)
writer = myindex.writer()
doc_source_path = str(raw_input('enter the source of documents:: '))
#os.chdir("/home/emil/workspace/python/project/index/documents")
os.chdir(doc_source_path)
print 'indexing....'
for file in os.listdir("."):
    #print file
    filename = "/home/emil/workspace/python/project/index/documents/" + str(file)
    fileobj=open(filename,'rb')
    text=fileobj.read()
    #f = codecs.open(filename, 'r', encoding='utf-8')
    #body = f.read()
    #print body
    print unicode(file)
    writer.add_document(path=unicode(filename),content=unicode(text))
writer.commit()
Can you guys please tell me what is the reason for this?
On Tue, 13 Mar 2012, emil wrote:
> Hi, I am pretty new to whoosh. I was trying to index some documents in a
> directory. When I do commit() I get an IOError. I am displaying some
> extra outputs too to make it readable.
>
> Enter the index directory name:: myindex
> enter the source of documents:: /home/emil/workspace/python/project/index/documents
You supply a relative path to the index directory, and then you chdir(). After that, 'myindex' is resolved relative to the new working directory, so the path no longer points at the index you created:
> os.chdir(doc_source_path)
If you have to allow entering a relative path on the command line, use os.path.join with the current working directory (os.getcwd()) to convert it to an absolute path before calling chdir() and handing it to Whoosh.
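A minimal sketch of that suggestion follows. It does not use Whoosh at all; it only shows why the relative path stops working after chdir() and how resolving it first avoids the problem. The helper name to_absolute and the throwaway temporary directories are assumptions made for illustration, not part of the original script.

```python
import os
import shutil
import tempfile


def to_absolute(user_path, base_dir=None):
    """Resolve a possibly relative path against base_dir (default: current directory).

    Hypothetical helper for illustration only.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    # os.path.join returns user_path unchanged when it is already absolute.
    return os.path.normpath(os.path.join(base_dir, user_path))


# Throwaway directories standing in for the project and documents folders.
start_dir = os.getcwd()
work_dir = os.path.realpath(tempfile.mkdtemp())
docs_dir = os.path.join(work_dir, "documents")
os.mkdir(docs_dir)
os.chdir(work_dir)
try:
    index_dir = to_absolute("myindex")  # resolve BEFORE changing directory
    os.mkdir(index_dir)
    os.chdir(docs_dir)
    # The relative name is now looked up inside docs_dir, where it does not exist.
    relative_ok = os.path.exists("myindex")
    # The absolute path still points at the directory created earlier.
    absolute_ok = os.path.exists(index_dir)
finally:
    os.chdir(start_dir)
    shutil.rmtree(work_dir)
```

In the original script, the resolved path (rather than the raw input) would be the value passed to index.create_in().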
On Mar 13, 6:17 am, Chris Wilson <ch...@aptivate.org> wrote:
> If you have to allow entering a relative path on the command line, use
> os.path.join to convert it to an absolute path before handing it to
> Whoosh.
The other option is to simply call os.path.abspath() on the input path
to get an absolute path. Obviously, do this before calling
chdir ;-).
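As a rough sketch of the abspath() approach, the snippet below also builds the document paths with os.path.join instead of changing directory, so no chdir() is needed at all. The function name collect_documents and the demo files are hypothetical; the Whoosh calls from the original script are left out so the example runs on its own.

```python
import os
import tempfile


def collect_documents(source_dir):
    """Return absolute paths of the regular files in source_dir, without chdir().

    Hypothetical helper for illustration only.
    """
    source_dir = os.path.abspath(source_dir)
    paths = []
    for name in sorted(os.listdir(source_dir)):
        full_path = os.path.join(source_dir, name)
        if os.path.isfile(full_path):
            paths.append(full_path)
    return paths


# Throwaway demo directory with two small files.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("sample text")

# Absolute form of the user-supplied index directory name (not created here).
index_dir = os.path.abspath("myindex")
doc_paths = collect_documents(demo_dir)
```

Each entry in doc_paths can then be opened and passed to writer.add_document() as in the original loop, and index_dir stays valid whatever the working directory is.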
I have a (very) small demo app at http://code.google.com/p/pyopensearch/
which may be worth checking out, as it covers indexing and searching as
a complete working example (don't be put off by the JSON stuff, you
can ignore it). I try to keep it small so that it is easy to grok. It
doesn't attempt to cover updating the index.