Efficient way to loop through large numpy arrays with scikit-fuzzy


Rebecca Rossi

Nov 10, 2016, 4:07:36 PM
to scikit-fuzzy
I think you could help with a question involving scikit-fuzzy that I posted to Stack Overflow.


I have two input values, one from each of two datasets that are hundreds of thousands to millions of rows long, that I feed into a scikit-fuzzy function to obtain a single output value. Eventually, I obtain an output numpy array with the same length as the input datasets.

I'm trying to pinpoint why this script takes hours to complete. I know that looping through each value in the two datasets and performing the calculation takes a long time, but I'm curious whether any part of the scikit-fuzzy function is slowing the processing. For example, does the precision of the fuzzy membership functions and/or of the input data increase or decrease the processing speed? So:

Is
    s['l'] = fuzz.trapmf(s.universe, [0, 0, 5, 7])

faster than 

    s['l'] = fuzz.trapmf(s.universe, [0.00000, 0.00000, 5.00569, 7.04568])
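
For concreteness, here is a minimal timing sketch of that comparison (the universe below is just an illustrative stand-in for s.universe):

    import timeit

    import numpy as np
    import skfuzzy as fuzz

    # Illustrative stand-in for s.universe
    universe = np.arange(0, 12.01, 0.01)  # 1201 points

    # Time both anchor-point variants over many repetitions
    t_round = timeit.timeit(
        lambda: fuzz.trapmf(universe, [0, 0, 5, 7]), number=10000)
    t_precise = timeit.timeit(
        lambda: fuzz.trapmf(universe, [0.00000, 0.00000, 5.00569, 7.04568]),
        number=10000)
    print(t_round, t_precise)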

JDWarner

Nov 10, 2016, 11:08:27 PM
to scikit-fuzzy
This year has been incredibly busy. These forums or GitHub are the best places for assistance; sorry for the StackOverflow silence.

At first glance, the StackOverflow answer has a good idea: start by refactoring the setup out of the innermost loop, so that inside the loop you only set new inputs and call the ControlSystemSimulation. However, your largest gains will probably come from ensuring your universes are as sparse as possible.
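
Here's a rough sketch of that loop-hoisting refactor (the membership functions, rules, and data below are placeholders for yours):

import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# --- one-time setup, hoisted out of the loop ---
a = ctrl.Antecedent(np.arange(0, 70.1, 0.1), 'a')
b = ctrl.Antecedent(np.arange(0, 6.01, 0.01), 'b')
c = ctrl.Consequent(np.arange(0, 12.01, 0.01), 'c')

# Placeholder membership functions and rules; substitute your own
a['low'] = fuzz.trapmf(a.universe, [0, 0, 16, 30])
a['high'] = fuzz.trapmf(a.universe, [16, 30, 70, 70])
b['low'] = fuzz.trapmf(b.universe, [0, 0, 0.1, 1])
b['high'] = fuzz.trapmf(b.universe, [0.1, 1, 6, 6])
c['low'] = fuzz.trimf(c.universe, [0, 0, 6])
c['high'] = fuzz.trimf(c.universe, [6, 12, 12])

rules = [ctrl.Rule(a['low'] & b['low'], c['low']),
         ctrl.Rule(a['high'] | b['high'], c['high'])]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))

# --- hot loop: only set inputs and compute ---
data_a = np.random.uniform(0, 70, 1000)  # stand-ins for your two columns
data_b = np.random.uniform(0, 6, 1000)
out = np.empty(len(data_a))
for i in range(len(data_a)):
    sim.input['a'] = data_a[i]
    sim.input['b'] = data_b[i]
    sim.compute()
    out[i] = sim.output['c']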

As written, you have 

a = ctrl.Antecedent(np.arange(0, 70.1, 0.1), 'a')
b = ctrl.Antecedent(np.arange(0, 6.01, 0.01), 'b')
c = ctrl.Consequent(np.arange(0, 12.01, 0.01), 'c')

which are lengths 701, 601, and 1201, respectively.  Every calculation applies against these universes.

Importantly, as of Scikit-Fuzzy 0.2 there is no difference in results or accuracy for piecewise-linear membership functions if only the critical points are included.  Linear interpolation is used to fill in the gaps, so all you need are the anchor points; these need not be evenly spaced.  So, instead of the above, try

a = ctrl.Antecedent([0, 3, 6, 16, 24, 30, 45, 70], 'a')
b = ctrl.Antecedent([0, 0.01, 0.02, 0.03, 0.05, 0.1, 0.12, 6.], 'b')
c = ctrl.Consequent([0, 0.01, 0.02, 0.04, 0.05, 0.1, 0.2, 12], 'c')

which now have lengths of 8, 8, and 8.  
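
A quick way to convince yourself the results are unchanged (the trapezoid here is arbitrary, with corners drawn from the sparse universe):

import numpy as np
import skfuzzy as fuzz

dense = np.arange(0, 70.1, 0.1)                      # 701 points
sparse = np.asarray([0., 3, 6, 16, 24, 30, 45, 70])  # 8 points

# Same trapezoid sampled on both universes
mf_dense = fuzz.trapmf(dense, [0, 3, 16, 30])
mf_sparse = fuzz.trapmf(sparse, [0, 3, 16, 30])

# Interpolated membership agrees at arbitrary query points
for q in np.linspace(0, 70, 29):
    assert np.isclose(fuzz.interp_membership(dense, mf_dense, q),
                      fuzz.interp_membership(sparse, mf_sparse, q))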

When I run this on my workstation, I see a single pass through the slimmed-down FIS taking 1.7 ms with sparse universes, versus 10 ms with the non-sparse universes as written. So this should gain you an additional factor of at least 5.

Beyond that point, it's worth noting that every simulation run is independent.  If you have multiple cores, you could create separate ControlSystemSimulation objects for each core and start cranking; this should scale well.
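
Here's a rough parallelization sketch along those lines (the stand-in FIS inside build_simulation is illustrative; swap in your real antecedents, consequent, and rules):

import numpy as np
import skfuzzy as fuzz
from multiprocessing import Pool
from skfuzzy import control as ctrl

def build_simulation():
    # Minimal stand-in FIS; substitute your own setup here
    a = ctrl.Antecedent(np.asarray([0., 35, 70]), 'a')
    b = ctrl.Antecedent(np.asarray([0., 3, 6]), 'b')
    c = ctrl.Consequent(np.asarray([0., 6, 12]), 'c')
    a['low'] = fuzz.trimf(a.universe, [0, 0, 70])
    a['high'] = fuzz.trimf(a.universe, [0, 70, 70])
    b['low'] = fuzz.trimf(b.universe, [0, 0, 6])
    b['high'] = fuzz.trimf(b.universe, [0, 6, 6])
    c['low'] = fuzz.trimf(c.universe, [0, 0, 12])
    c['high'] = fuzz.trimf(c.universe, [0, 12, 12])
    rules = [ctrl.Rule(a['low'] & b['low'], c['low']),
             ctrl.Rule(a['high'] | b['high'], c['high'])]
    return ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))

def run_chunk(chunk):
    sim = build_simulation()  # each worker gets its own simulation
    out = np.empty(len(chunk))
    for i, (x, y) in enumerate(chunk):
        sim.input['a'] = x
        sim.input['b'] = y
        sim.compute()
        out[i] = sim.output['c']
    return out

if __name__ == '__main__':
    # Stand-in data; each row pairs up the two input columns
    data = np.column_stack([np.random.uniform(0, 70, 10000),
                            np.random.uniform(0, 6, 10000)])
    with Pool(4) as pool:  # one chunk per core
        result = np.concatenate(pool.map(run_chunk, np.array_split(data, 4)))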

There is likely additional opportunity for optimization in the code itself, but first try making your universes as sparse as possible.

Regards,

Josh
