Split dataset for each rule, drill down from category to product

Zheng Tzer Lee

Jun 20, 2019, 11:54:02 AM
to mlxtend
I am facing an issue with too many products, so I am running mlxtend on 'product category' instead of 'product' itself. However, using product categories produces rules that cannot be implemented, because each category covers a wide range of products, so the rules make no practical sense.

Therefore my question: suppose I first run mlxtend on categories and get 10 rules, where the first rule is 'Hair Shampoo, Conditioner ==> Body Shampoo'. I then prepare a new dataset containing only [Hair Shampoo, Conditioner, Body Shampoo] and run association rule mining again on ONLY that [Hair Shampoo, Conditioner, Body Shampoo] dataset, i.e. one dataset per rule.

Is my approach on the right path to obtain detailed, workable association rules at the product level? If it is wrong, could any of you point out which part I got wrong and how to get workable rules for a large number of products?
thanks!!!
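
For concreteness, a minimal sketch of the category-level first pass described above. The baskets, category names, and thresholds are placeholders, and it assumes each transaction is a list of category names:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Placeholder baskets: each row is the set of categories bought together.
category_transactions = [
    ['Hair Shampoo', 'Conditioner', 'Body Shampoo'],
    ['Hair Shampoo', 'Conditioner'],
    ['Body Shampoo', 'Toothpaste'],
]

# One-hot encode the baskets into a boolean DataFrame (one column per category).
te = TransactionEncoder()
df = pd.DataFrame(te.fit(category_transactions).transform(category_transactions),
                  columns=te.columns_)

# Mine frequent category itemsets and derive rules from them.
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)
category_rules = association_rules(frequent_itemsets,
                                   metric="confidence", min_threshold=0.6)
print(category_rules[['antecedents', 'consequents', 'support', 'confidence']])

The 'antecedents' and 'consequents' columns hold frozensets of category names, which makes it straightforward to pull out the categories behind any single rule.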

Sebastian Raschka

Jun 21, 2019, 1:15:42 PM
to Zheng Tzer Lee, mlxtend
Hi there,

I am not sure what the difference between product and product category is.

Is "product" something like "Shampoo from Brand X" whereas product category is more general, like "Shampoo"? Or is it even broader?

I think the general approach, i.e.,

1) creating a dataset of product categories
2) getting rules for these product categories
3) constructing a new dataset that expands the product categories from the rules of interest into fine-grained products
4) running association rule mining on the dataset from 3)

would generally make sense.
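
A rough sketch of steps 3) and 4) under this reading: the product_to_category mapping, the product baskets, and the thresholds below are hypothetical placeholders, and the commented-out lines show how the categories of one rule could be pulled from a rules DataFrame such as category_rules from the first pass.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical mapping from each fine-grained product to its category.
product_to_category = {
    'Brand X Shampoo 500ml': 'Hair Shampoo',
    'Brand Y Conditioner 200ml': 'Conditioner',
    'Brand Z Shower Gel': 'Body Shampoo',
    'Brand W Toothpaste': 'Toothpaste',
}

# Placeholder product-level baskets.
product_transactions = [
    ['Brand X Shampoo 500ml', 'Brand Y Conditioner 200ml', 'Brand Z Shower Gel'],
    ['Brand X Shampoo 500ml', 'Brand Y Conditioner 200ml'],
    ['Brand Z Shower Gel', 'Brand W Toothpaste'],
]

# Categories involved in the rule of interest; with a rules DataFrame from the
# first pass this could be, e.g.:
#   rule = category_rules.iloc[0]
#   rule_categories = set(rule['antecedents']) | set(rule['consequents'])
rule_categories = {'Hair Shampoo', 'Conditioner', 'Body Shampoo'}

# Step 3: keep only the products whose category appears in that rule.
filtered = [
    [p for p in basket if product_to_category.get(p) in rule_categories]
    for basket in product_transactions
]
filtered = [basket for basket in filtered if len(basket) >= 2]

# Step 4: re-run frequent itemset mining and rule generation at product level.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(filtered).transform(filtered), columns=te.columns_)
itemsets = apriori(df, min_support=0.3, use_colnames=True)
product_rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(product_rules[['antecedents', 'consequents', 'support', 'confidence']])

Repeating this for each rule of interest yields one small product-level dataset per category rule, as described in the original question.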

> I am facing an issue with too many products,

Btw what is the particular issue with too many products? Is it that it's too slow/too big to fit into memory or is it that the results are not good (no strong rules)?

Best,
Sebastian

Sebastian Raschka

Jun 21, 2019, 3:05:30 PM
to Zheng Tzer Lee, mlxtend
Oh I see, so it's probably running out of memory or so. Have you tried the fpgrowth algorithm instead of apriori in mlxtend? It's much more efficient for large datasets and gives the same results. It's currently only available from the master-dev branch, but you can install it via pip already:

pip install git+git://github.com/rasbt/mlxtend.git
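
Once installed, fpgrowth takes the same one-hot encoded DataFrame as apriori and returns frequent itemsets in the same format, so it can be swapped in directly; a minimal sketch with placeholder baskets and arbitrary thresholds:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Same input format as apriori: a one-hot encoded boolean DataFrame.
transactions = [
    ['Hair Shampoo', 'Conditioner', 'Body Shampoo'],
    ['Hair Shampoo', 'Conditioner'],
    ['Body Shampoo', 'Toothpaste'],
]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# fpgrowth avoids apriori's explicit candidate generation, which is what tends
# to blow up when there are many distinct products.
frequent_itemsets = fpgrowth(df, min_support=0.3, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)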

Best,
Sebastian

> On Jun 21, 2019, at 12:42 PM, Zheng Tzer Lee <zhen...@gmail.com> wrote:
>
> hi Sebastian,
>
> I tried it with [products] on Kaggle; the kernel keeps restarting when it gets to generating the rules.
> It works fine with [category].
>
> I posted it here to see whether anyone has tried this approach for large datasets, because it is just my guess.
> Is this the correct way, or is there a better solution?
>
> thanks