Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 18 by caj...@gmail.com: CEDICT reading problem
http://code.google.com/p/cjklib/issues/detail?id=18
What steps will reproduce the problem?
1. import cjklib.dictionary
2. d = cjklib.dictionary.CEDICT(databaseUrl='sqlite:////path/to/your/cedict.db')
3. d.getAll()
The call above should return all entries in the CEDICT database. However, an
AttributeError is raised while formatting this record:
卡拉OK|卡拉OK|ka3 la1 O K|/karaoke (loanword)/
The problem is that the reading is not standard Pinyin, so the reading
conversion fails. SingleColumnAdapter.format therefore returns None, and
NonReadingEntityWhitespace.format raises the exception when it tries to call
split on that None value.
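The failure chain described above can be sketched in isolation. This is only an illustration, not cjklib code: convert_reading and normalise_whitespace are hypothetical stand-ins for the two formatters, and the Pinyin check is deliberately crude.

```python
def convert_reading(reading):
    """Hypothetical stand-in for the reading conversion step.

    Returns None when a syllable does not look like numbered Pinyin,
    mimicking a formatter that swallows the conversion error.
    """
    syllables = reading.split(' ')
    # Crude assumption: a numbered Pinyin syllable ends in a tone digit 1-5.
    if all(s[-1:] in '12345' for s in syllables):
        return reading
    return None


def normalise_whitespace(reading):
    """Hypothetical stand-in for a later formatter that assumes a string."""
    return ' '.join(reading.split())


def format_entry(reading):
    # The second formatter receives whatever the first one returned.
    return normalise_whitespace(convert_reading(reading))


try:
    format_entry('ka3 la1 O K')
    error = None
except AttributeError as exc:
    # Raised because None has no split method.
    error = exc
```

A well-formed reading such as 'ka3 la1' passes through both steps unchanged, while the CEDICT reading for 卡拉OK ends up in error.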
The problem exists in the SVN trunk version (Rev: 446). I am using Ubuntu Linux
11.04.1 LTS.
I suggest either fixing such records in the installcjkdict script, or fixing
the formatter in the dictionary module so that it can handle such records. My
hotfix:
(line 126):
    def format(self, string):
        toReading = self.toReading or self.fromReading
        try:
            return self._readingFactory.convert(string, self.fromReading,
                toReading, sourceOptions=self.sourceOptions,
                targetOptions=self.targetOptions)
        except (exception.DecompositionError, exception.CompositionError,
                exception.ConversionError):
            # wighack
            return string
            #return None
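The effect of the hotfix, returning the unconverted string instead of None, can be tried out without cjklib. In this minimal sketch, ConversionError, convert and format_with_fallback are hypothetical names invented for the example; only the try/except fallback pattern is taken from the hotfix.

```python
class ConversionError(Exception):
    """Hypothetical stand-in for cjklib's conversion exceptions."""


def convert(reading):
    """Hypothetical converter that rejects syllables without a tone digit."""
    for syllable in reading.split():
        if syllable[-1] not in '12345':
            raise ConversionError(syllable)
    return reading


def format_with_fallback(reading):
    try:
        return convert(reading)
    except ConversionError:
        # Same idea as the hotfix: keep the original reading on failure,
        # so later formatters still receive a string.
        return reading


result = format_with_fallback('ka3 la1 O K')
```

With this fallback the non-standard reading is passed on as-is, so a later call to split works on it. The trade-off is that unconverted readings end up mixed with converted ones in the output.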