Recently I observed strange behavior in the 'Preprocess' tab of ADAMS Investigator and Weka Explorer compared to the 'Select attributes' tab.
I loaded the 'contact-lenses' dataset, which has 5 attributes, into ADAMS Investigator and Weka Explorer and applied PrincipalComponents from both the 'Preprocess' tab and the 'Select attributes' tab (PrincipalComponents + Ranker), using the default settings in both cases.
Here are my questions:
1- Why does the number of attributes increase after applying the filter, when it is expected to decrease?
2- Why do the Preprocess tab and the Select attributes tab give different results?
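One piece of background for question 1: PCA works on numeric columns, and contact-lenses is entirely nominal. As far as I know, Weka's PrincipalComponents converts nominal attributes into binary indicator columns before the analysis (NominalToBinary-style coding: k columns for a k-valued attribute, a single column for a two-valued one); treat that coding as an assumption. The minimal sketch below only counts the columns that coding would produce; the function names are hypothetical.

```python
# Hypothetical sketch: how many numeric columns PCA sees after nominal
# attributes are coded as binary indicators. Assumes NominalToBinary-style
# coding: a k-valued attribute -> k columns, a two-valued attribute -> 1 column.

# Attribute labels of the contact-lenses dataset (all nominal).
CONTACT_LENSES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle-prescrip": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear-prod-rate": ["reduced", "normal"],
    "contact-lenses": ["soft", "hard", "none"],  # class attribute
}


def indicator_columns(labels):
    """Columns produced for one nominal attribute under the assumed coding."""
    return 1 if len(labels) == 2 else len(labels)


def expanded_width(attributes, exclude=()):
    """Total numeric columns after coding, skipping attributes listed in `exclude`."""
    return sum(indicator_columns(labels)
               for name, labels in attributes.items() if name not in exclude)


if __name__ == "__main__":
    # The four input attributes only.
    print(expanded_width(CONTACT_LENSES, exclude=("contact-lenses",)))
    # If the class attribute were coded as well.
    print(expanded_width(CONTACT_LENSES))
```

Under this assumption the four input attributes become six numeric columns, so PCA can return more components than there were original attributes.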
--
You received this message because you are subscribed to the Google Groups "The ADAMS Flow User mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theadamsflow-u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/theadamsflow-user/b70848e0-f292-42ad-9ff0-f766b479ebfa%40waikato.ac.nz.
I now get the same result after selecting 'No class' in the 'Preprocess' tab, and after setting 'varianceCovered' of PrincipalComponents to 0.95 with numToSelect set to -1 (Ranker settings) in the 'Select attributes' tab. This seems to solve the problem. Following the same principle, reducing the variance to cover to 0.5 yields fewer features, which is the behavior generally expected from PCA. I assumed the default settings of PrincipalComponents would be enough to reduce the number of features, but it turns out the filter still needs tweaking, at least for some datasets. Am I right?
Yes.
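The effect of lowering the variance to cover can be pictured as a cumulative rule over the sorted eigenvalues. This is a minimal sketch of how I read 'varianceCovered' (keep the smallest number of components whose share of the total variance reaches the threshold), not Weka's source code; `components_to_keep` is a hypothetical name and the eigenvalues are made up.

```python
# Hypothetical sketch of a "variance covered" rule (not Weka's source code):
# keep the smallest number of components whose eigenvalues, largest first,
# add up to at least the requested fraction of the total variance.


def components_to_keep(eigenvalues, variance_covered):
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    if total <= 0:
        return 0
    cumulative = 0.0
    for k, value in enumerate(values, start=1):
        cumulative += value
        if cumulative / total >= variance_covered:
            return k
    return len(values)


if __name__ == "__main__":
    # Made-up eigenvalues for five components, for illustration only.
    eigenvalues = [2.0, 1.5, 1.0, 1.0, 0.5]
    print(components_to_keep(eigenvalues, 0.95))  # 5
    print(components_to_keep(eigenvalues, 0.5))   # 2
```

When the variance is spread fairly evenly, a 0.95 threshold keeps almost every component, while 0.5 keeps only the first few.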
Could you explain why PrincipalComponents increases the number of attributes for certain datasets, while for others it works as expected, even without selecting 'No class' in the Preprocess tab?
It depends very much on the dataset. Try it with the iris or body datasets; there you will end up with fewer attributes. You tend to get a reduction in the number of attributes when correlated attributes are present, since a few components can then account for most of the variance.
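The point about correlated attributes can be checked on synthetic data. This is a minimal pure-Python sketch under stated assumptions: PCA is done on the correlation matrix (standardized attributes), the number of components follows the cumulative "variance covered" rule, eigenvalues come from a textbook Jacobi rotation routine rather than whatever Weka uses internally, and the data are randomly generated, not iris or body. All function names are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    z = [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]
    return [[sum(zi[a] * zi[b] for zi in z) / (n - 1) for b in range(d)] for a in range(d)]


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    d = len(a)
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(d) for q in range(d) if p != q)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(d):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(d):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(d)), reverse=True)


def components_to_keep(eigenvalues, variance_covered):
    """Smallest k whose top-k eigenvalues cover the requested variance fraction."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= variance_covered:
            return k
    return len(eigenvalues)


def make_data(correlated, n=500, seed=1):
    """Synthetic 4-attribute data: independent, or two pairs driven by shared factors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        if correlated:
            f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
            rows.append([f1 + rng.gauss(0, 0.1), f1 + rng.gauss(0, 0.1),
                         f2 + rng.gauss(0, 0.1), f2 + rng.gauss(0, 0.1)])
        else:
            rows.append([rng.gauss(0, 1) for _ in range(4)])
    return rows


if __name__ == "__main__":
    for correlated in (False, True):
        eig = jacobi_eigenvalues(correlation_matrix(make_data(correlated)))
        label = "correlated" if correlated else "independent"
        print(label, [round(v, 3) for v in eig], components_to_keep(eig, 0.95))
```

With independent attributes each eigenvalue is close to 1, so covering 95% of the variance needs all four components; with two pairs of strongly correlated attributes, two components are enough.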
Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, Hamilton, NZ
Mobile +64 22 190 2375
https://profiles.waikato.ac.nz/peter.reutemann
http://www.data-mining.co.nz/