dedupe.console_label to API


Satish Patil

Jun 22, 2022, 3:04:19 AM
to open source deduplication
Hi Team,

I'm facing an issue with the deduper object's state when serializing and deserializing it.

I'm trying to turn the interactive dedupe.console_label(deduper) workflow into REST APIs built on uncertain_pairs() and mark_pairs().
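In dedupe, mark_pairs() takes a dict with a 'match' list and a 'distinct' list, each holding (record_1, record_2) pairs, as in the call at the bottom of the traceback. An API that replaces console_label therefore has to turn the end user's answers back into that shape. A minimal sketch, assuming the client sends one 'y'/'n' answer per pair in the same order as uncertain_pairs() returned them; build_labeled_pairs is a hypothetical helper, not part of dedupe:

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
RecordPair = Tuple[Record, Record]


def build_labeled_pairs(
    pairs: Sequence[RecordPair], answers: Sequence[str]
) -> Dict[str, List[RecordPair]]:
    """Hypothetical helper: convert 'y'/'n' answers into the
    {'match': [...], 'distinct': [...]} dict that mark_pairs() expects."""
    if len(pairs) != len(answers):
        raise ValueError("expected exactly one answer per pair")
    labeled: Dict[str, List[RecordPair]] = {"match": [], "distinct": []}
    for pair, answer in zip(pairs, answers):
        label = answer.strip().lower()
        if label == "y":
            labeled["match"].append(pair)
        elif label == "n":
            labeled["distinct"].append(pair)
        # Any other answer (e.g. 'u' for unsure) leaves the pair unlabeled
    return labeled


pairs = [
    ({"name": "Acme Inc"}, {"name": "ACME Incorporated"}),
    ({"name": "Acme Inc"}, {"name": "Apex LLC"}),
]
print(build_labeled_pairs(pairs, ["y", "n"]))
```

Skipping unsure pairs mirrors console_label's "(u)nsure" option: they are simply not sent to mark_pairs().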

Steps I followed:
1) Initialized the deduper object ("deduper = dedupe.Dedupe(fields)")
2) Prepared training ("deduper.prepare_training(data)")
3) Serialized the object to a pickle file

4) Deserialized the file back into a deduper object
5) Got uncertain pairs from the deduper object ("deduper.uncertain_pairs()")
6) Marked the labeled pairs on the deduper object ("deduper.mark_pairs()")
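The traceback ends in an index predicate whose index is None after the object was reloaded. One plausible explanation, stated as an assumption about dedupe's internals rather than something confirmed in this thread: index predicates build an in-memory index of the records during prepare_training(), and that index is not written out when the object is pickled. The toy class below (hypothetical, not dedupe code) reproduces that pattern with the standard pickle module: __getstate__ drops the index, so after a round trip the predicate fails the same "index is not None" check.

```python
import pickle
from typing import Dict, Optional, Set


class NoIndexError(Exception):
    """Hypothetical stand-in for dedupe's NoIndexError."""


class ToyIndexPredicate:
    """Hypothetical stand-in for an index-based blocking predicate (not dedupe code)."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.index: Optional[Dict[str, Set[int]]] = None

    def index_records(self, records: Dict[int, Dict[str, str]]) -> None:
        # Build a token -> record-id index in memory (the "training" state)
        self.index = {}
        for rec_id, rec in records.items():
            for token in rec[self.field].lower().split():
                self.index.setdefault(token, set()).add(rec_id)

    def __call__(self, record: Dict[str, str]) -> Set[int]:
        # Mirrors the `assert self.index is not None` check in the traceback
        if self.index is None:
            raise NoIndexError(
                "Attempting to block with an index predicate without indexing records"
            )
        ids: Set[int] = set()
        for token in record[self.field].lower().split():
            ids |= self.index.get(token, set())
        return ids

    def __getstate__(self) -> dict:
        # Treat the index as a rebuildable cache: drop it when pickling
        state = self.__dict__.copy()
        state["index"] = None
        return state


records = {1: {"name": "Acme Inc"}, 2: {"name": "Apex LLC"}}
pred = ToyIndexPredicate("name")
pred.index_records(records)                  # like steps 1-2: state built in memory
print(pred({"name": "acme corp"}))           # {1}

restored = pickle.loads(pickle.dumps(pred))  # like steps 3-4: pickle round trip
try:
    restored({"name": "acme corp"})          # like steps 5-6: blocking now fails
except NoIndexError as exc:
    print("after unpickling:", exc)
```

If this is what happens inside dedupe, the pickle file itself is fine; the missing piece is the record index, which only exists in the process that ran prepare_training().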

I am building REST APIs with the FastAPI framework:
Steps 1 to 3 run in a single POST request that initializes the model.
Steps 4 to 6 run in the requests that collect labels from the end user.
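With that split, one way to avoid the pickle round trip entirely is to keep the live object in the API process's memory between the "initialize" request and the "label" requests, keyed by a session id. A minimal, framework-agnostic sketch: SessionStore and its methods are hypothetical names, and in FastAPI they would simply be called from the two route handlers. It assumes a single worker process; with several uvicorn/gunicorn workers each process has its own dict, so a labeling request could land on a worker that never built the model.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class SessionStore:
    """Hypothetical in-process registry of live model objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def create(self, factory: Callable[[], Any]) -> str:
        # factory would build the deduper and call prepare_training(data)
        obj = factory()
        session_id = uuid.uuid4().hex
        with self._lock:
            self._objects[session_id] = obj
        return session_id

    def get(self, session_id: str) -> Any:
        with self._lock:
            if session_id not in self._objects:
                raise KeyError(f"unknown session {session_id!r}")
            return self._objects[session_id]

    def drop(self, session_id: str) -> None:
        # Call once training is finished to free the memory
        with self._lock:
            self._objects.pop(session_id, None)


store = SessionStore()

# "Initialize model" request: build once and keep the object in memory.
sid = store.create(lambda: {"indexed": True})  # placeholder for the real deduper

# Later "label pairs" requests: reuse the same live object, no unpickling.
model = store.get(sid)
print(sid, model)
```

The trade-off is that the state is lost on restart and is not shared across processes, so this fits a single-worker labeling service rather than a scaled deployment.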

But while marking pairs, it fails with the stack trace below:

NoIndexError

AssertionError                            Traceback (most recent call last)
~\Anaconda3\lib\site-packages\dedupe\predicates.py in __call__(self, record, **kwargs)
    201 try:
--> 202     assert self.index is not None
    203 except AssertionError:

AssertionError:

During handling of the above exception, another exception occurred:

NoIndexError                              Traceback (most recent call last)
~\Anaconda3\lib\site-packages\dedupe\api.py in mark_pairs(self, labeled_pairs)
   1232 try:
-> 1233     self.active_learner.mark(examples, y)
   1234 except dedupe.predicates.NoIndexError as e:

~\Anaconda3\lib\site-packages\dedupe\labeler.py in mark(self, pairs, y)
    392
--> 393 self.fit_transform(self.pairs, self.y)
    394

~\Anaconda3\lib\site-packages\dedupe\labeler.py in fit_transform(self, pairs, y)
    397 for learner in self.learners:
--> 398     learner.fit_transform(pairs, y)
    399

~\Anaconda3\lib\site-packages\dedupe\labeler.py in fit_transform(self, pairs, y)
    163 if new_uncovered:
--> 164     self.current_predicates = self.block_learner.learn(dupes, recall=1.0)
    165     self._cached_labels = None

~\Anaconda3\lib\site-packages\dedupe\training.py in learn(self, matches, recall, candidate_types)
     41 comparison_cover = self.comparison_cover
---> 42 match_cover = self.cover(matches)
     43

~\Anaconda3\lib\site-packages\dedupe\training.py in cover(self, pairs)
    143 for predicate in self.blocker.predicates:
--> 144     coverage = frozenset(
    145         i

~\Anaconda3\lib\site-packages\dedupe\training.py in <genexpr>(.0)
    146         for i, (record_1, record_2) in enumerate(pairs)
--> 147         if (set(predicate(record_1)) & set(predicate(record_2, target=True)))
    148     )

~\Anaconda3\lib\site-packages\dedupe\predicates.py in __call__(self, record, **kwargs)
    203 except AssertionError:
--> 204     raise NoIndexError(
    205         "Attempting to block with an index "

NoIndexError: Attempting to block with an index predicate without indexing records

During handling of the above exception, another exception occurred:

UserWarning                               Traceback (most recent call last)
<ipython-input-12-9f1956935b66> in <module>
----> 1 deduper.mark_pairs({'match': deduper.uncertain_pairs(), 'distinct':[]})

~\Anaconda3\lib\site-packages\dedupe\api.py in mark_pairs(self, labeled_pairs)
   1233     self.active_learner.mark(examples, y)
   1234 except dedupe.predicates.NoIndexError as e:
-> 1235     raise UserWarning(
   1236         (
   1237             "The record\n"

UserWarning: The record
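The last frame shows mark_pairs() converting the NoIndexError into a UserWarning, so the labels never reach the learner. If the model has to survive between requests (or across worker processes), one alternative is to persist the labels rather than the deduper and rebuild the object from the data plus the saved labels. dedupe 2.x also provides write_training() and a training_file argument to prepare_training() for saving and reloading labels; the stdlib sketch below does not depend on that file format and uses its own hypothetical append_labels() helper and JSON layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Sequence, Tuple

RecordPair = Tuple[Dict[str, Any], Dict[str, Any]]


def append_labels(path: Path, new: Dict[str, Sequence[RecordPair]]) -> Dict[str, List[list]]:
    """Hypothetical helper: merge newly labeled pairs into a JSON file shaped
    like {'match': [...], 'distinct': [...]} and return the accumulated labels."""
    labels: Dict[str, List[list]] = {"match": [], "distinct": []}
    if path.exists():
        labels = json.loads(path.read_text(encoding="utf-8"))
    for key in ("match", "distinct"):
        # JSON has no tuples, so each (record_1, record_2) pair is stored as a list
        labels[key].extend(list(pair) for pair in new.get(key, []))
    path.write_text(json.dumps(labels), encoding="utf-8")
    return labels


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "labels.json"
    # First labeling request
    append_labels(path, {"match": [({"name": "Acme Inc"}, {"name": "ACME Incorporated"})]})
    # A later request, possibly handled by a different worker process
    total = append_labels(path, {"distinct": [({"name": "Acme Inc"}, {"name": "Apex LLC"})]})
    print(len(total["match"]), len(total["distinct"]))  # 1 1
```

Each labeling request would then build a fresh deduper, call prepare_training(data), and feed it the accumulated labels (converted back to tuples) instead of unpickling an old object. The cost is re-running prepare_training() per request, which samples the data again.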

