AttributeError: Attempting to block with an index predicate without indexing records

Efrem Braun

Jul 17, 2019, 2:01:20 PM
to open source deduplication
Hello,

I started using dedupe a few days ago, and so far I'm a big fan. Thanks for building this great program!

As mentioned in https://github.com/dedupeio/dedupe/issues/655, I ran into an issue that I'd like some help debugging. I was running the csv_example.py example program on my own, larger dataset. I started with a smaller version of the dataset (10,000 rows), trained a Dedupe object up to a certain point, and then checked how it performed. Whenever I wanted to improve the accuracy, I would delete the settings file, keep the training file, and add more matches/non-matches to the training file.

This worked well until I switched to a larger version of my dataset (100,000 rows). At first I tried to reuse the training file I had made with the smaller dataset, but at the deduper.readTraining(f) step I got the error in the subject line of this email. I then deleted the original training file, retrained from scratch on the larger dataset, and repeated the same process (check the accuracy, then build on the training file to improve it), but I kept running into the same error.
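The retraining loop described above can be sketched roughly as follows. This is a minimal sketch, assuming the dedupe 1.x methods that csv_example.py calls (`sample`, `readTraining`, `train`, `writeTraining`, `writeSettings`); the helper name `retrain`, its parameters, and the optional `label` callback are hypothetical and only there for illustration.

```python
import os


def retrain(deduper, data, training_file, settings_file, label=None):
    """Retrain from scratch while reusing previously labeled pairs.

    `deduper` is assumed to be a dedupe.Dedupe-like object (dedupe 1.x API).
    `label` is a hypothetical hook for interactive labeling,
    e.g. dedupe.consoleLabel.
    """
    # Delete the old settings file so the model is learned again.
    if os.path.exists(settings_file):
        os.remove(settings_file)

    # Draw candidate pairs from the (possibly larger) dataset.
    deduper.sample(data, 15000)

    # Keep the training file: load the labels from earlier sessions.
    if os.path.exists(training_file):
        with open(training_file, "rb") as f:
            deduper.readTraining(f)  # the step that raised the error

    # Optionally add more matches/non-matches interactively.
    if label is not None:
        label(deduper)

    deduper.train()

    # Persist the accumulated labels and the learned settings.
    with open(training_file, "w") as tf:
        deduper.writeTraining(tf)
    with open(settings_file, "wb") as sf:
        deduper.writeSettings(sf)
```

In the failing case, the training file already exists on the second and later runs, so the `readTraining` branch is the one that executes.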

Any idea why this might be? My traceback is shown below.

Thanks!
-Efrem


Traceback (most recent call last):
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/predicates.py", line 147, in __call__
  doc_id = self.index._doc_to_id[doc]
AttributeError: 'NoneType' object has no attribute '_doc_to_id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "csv_example_mod.py", line 146, in <module>
  deduper.readTraining(f)
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/api.py", line 651, in readTraining
  self.markPairs(training_pairs)
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/api.py", line 723, in markPairs
  self.active_learner.mark(examples, y)
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/labeler.py", line 350, in mark
  def __len__(self):
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/labeler.py", line 200, in fit_transform
  self._old_dupes = dupes
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/training.py", line 31, in learn
  dupe_cover = Cover(self.blocker.predicates, matches)
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/training.py", line 393, in __init__
  self._cover(predicates, pairs)
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/training.py", line 401, in _cover
  in enumerate(pairs)
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/training.py", line 402, in <setcomp>
  if (set(predicate(record_1)) &
File "/home/anaconda3/lib/python3.7/site-packages/dedupe/predicates.py", line 149, in __call__
  raise AttributeError("Attempting to block with an index "
AttributeError: Attempting to block with an index predicate without indexing records
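
The two chained tracebacks are consistent with a pattern like the one below, where a predicate whose index was never built catches the low-level AttributeError and raises a more descriptive one. This is a simplified illustration and not dedupe's actual source: the class name `IndexPredicateSketch` is hypothetical, and only the attribute name `_doc_to_id` and the two error messages are taken from the traceback.

```python
class IndexPredicateSketch:
    """Hypothetical stand-in for a predicate that needs an index."""

    def __init__(self):
        # No records have been indexed yet, so there is no index object.
        self.index = None

    def __call__(self, doc):
        try:
            # Fails with "'NoneType' object has no attribute '_doc_to_id'"
            # while self.index is still None.
            doc_id = self.index._doc_to_id[doc]
        except AttributeError:
            # Raising inside the except block produces the
            # "During handling of the above exception..." chain.
            raise AttributeError("Attempting to block with an index "
                                 "predicate without indexing records")
        return (doc_id,)
```

Calling `IndexPredicateSketch()("some record")` raises the second AttributeError, with the first one attached as its `__context__`.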