Hi Team,
I am facing an issue with the state of the deduper object after serializing and deserializing it.
I am trying to replace the interactive dedupe.console_label(deduper) flow with APIs built on uncertain_pairs() and mark_pairs().
Steps I followed:
1) Initialize the deduper object ("deduper = dedupe.Dedupe(fields)")
2) Initialize training ("deduper.prepare_training(data)")
3) Serialize the deduper object to a pickle file
4) Deserialize the file back into a deduper object
5) Get uncertain pairs from the deduper object ("deduper.uncertain_pairs()")
6) Mark the labeled pairs on the deduper object ("deduper.mark_pairs()")
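For reference, the steps above can be sketched roughly as follows. This is only a minimal sketch, not my exact code: the helper names (save_deduper, load_deduper, label_round) and the is_match callback are hypothetical and used for illustration, and the dedupe calls from steps 1 and 2 are left as comments so the snippet runs on its own.

```python
import pickle

# Steps 1-2 (as in the list above, shown as comments only):
#   deduper = dedupe.Dedupe(fields)
#   deduper.prepare_training(data)


def save_deduper(deduper, path):
    """Step 3: serialize the prepared deduper object to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(deduper, f)


def load_deduper(path):
    """Step 4: deserialize the pickle file back into a deduper object."""
    with open(path, "rb") as f:
        return pickle.load(f)


def label_round(deduper, is_match):
    """Steps 5-6: fetch the uncertain pairs and mark each one.

    is_match is a hypothetical callback standing in for the end user's
    decision; it takes one record pair and returns True or False.
    """
    labeled = {"match": [], "distinct": []}
    for pair in deduper.uncertain_pairs():
        key = "match" if is_match(pair) else "distinct"
        labeled[key].append(pair)
    deduper.mark_pairs(labeled)
    return labeled
```

With these helpers, steps 1 to 3 end with save_deduper(deduper, path), and steps 4 to 6 are load_deduper(path) followed by label_round(deduper, is_match).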
I am creating REST APIs using the FastAPI framework.
Steps 1 to 3 run in one POST request that initializes the model.
Steps 4 to 6 run when the end user labels pairs.
But while marking pairs, it fails with the stack trace below:
NoIndexError
AssertionError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\dedupe\predicates.py in __call__(self, record, **kwargs)
201 try:
--> 202 assert self.index is not None
203 except AssertionError:
AssertionError:
During handling of the above exception, another exception occurred:
NoIndexError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\dedupe\api.py in mark_pairs(self, labeled_pairs)
1232 try:
-> 1233 self.active_learner.mark(examples, y)
1234 except dedupe.predicates.NoIndexError as e:
~\Anaconda3\lib\site-packages\dedupe\labeler.py in mark(self, pairs, y)
392
--> 393 self.fit_transform(self.pairs, self.y)
394
~\Anaconda3\lib\site-packages\dedupe\labeler.py in fit_transform(self, pairs, y)
397 for learner in self.learners:
--> 398 learner.fit_transform(pairs, y)
399
~\Anaconda3\lib\site-packages\dedupe\labeler.py in fit_transform(self, pairs, y)
163 if new_uncovered:
--> 164 self.current_predicates = self.block_learner.learn(dupes, recall=1.0)
165 self._cached_labels = None
~\Anaconda3\lib\site-packages\dedupe\training.py in learn(self, matches, recall, candidate_types)
41 comparison_cover = self.comparison_cover
---> 42 match_cover = self.cover(matches)
43
~\Anaconda3\lib\site-packages\dedupe\training.py in cover(self, pairs)
143 for predicate in self.blocker.predicates:
--> 144 coverage = frozenset(
145 i
~\Anaconda3\lib\site-packages\dedupe\training.py in <genexpr>(.0)
146 for i, (record_1, record_2) in enumerate(pairs)
--> 147 if (set(predicate(record_1)) & set(predicate(record_2, target=True)))
148 )
~\Anaconda3\lib\site-packages\dedupe\predicates.py in __call__(self, record, **kwargs)
203 except AssertionError:
--> 204 raise NoIndexError(
205 "Attempting to block with an index "
NoIndexError: Attempting to block with an index predicate without indexing records
During handling of the above exception, another exception occurred:
UserWarning Traceback (most recent call last)
<ipython-input-12-9f1956935b66> in <module>
----> 1 deduper.mark_pairs({'match': deduper.uncertain_pairs(), 'distinct':[]})
~\Anaconda3\lib\site-packages\dedupe\api.py in mark_pairs(self, labeled_pairs)
1233 self.active_learner.mark(examples, y)
1234 except dedupe.predicates.NoIndexError as e:
-> 1235 raise UserWarning(
1236 (
1237 "The record\n"
UserWarning: The record