How to analyze specific attributes of a data set

Dario Martinez

May 27, 2024, 4:38:19 PM

to python-weka-wrapper

Hello, I have a time series from which I extract some attributes through a mining process. I save them in a data set and then analyze them with Weka using classification algorithms. Now I am going to include more attributes in that same data set for another purpose. Within this total data set, how could I access a particular attribute, or several of them, and save them in another data set, so that I can analyze them as an independent time series?

Peter Reutemann

May 27, 2024, 4:47:52 PM

to python-we...@googlegroups.com

In Weka, you typically apply filters to modify your dataset, either by
attributes/columns or instances/rows.
If you want to remove attributes, then you can use the
weka.filters.unsupervised.attribute.Remove filter. For removing
instances, you can use the
weka.filters.unsupervised.instance.RemoveRange filter.

The "subset" convenience method of the "Instances" Python class
applies the Remove/RemoveRange filters internally to generate the
requested subset:
https://fracpete.github.io/python-weka-wrapper3/weka.core.html?highlight=subset#weka.core.dataset.Instances.subset

Here are some examples of the "subset" method being applied:
https://github.com/fracpete/python-weka-wrapper3-examples/blob/32f24bc4f62079ee8265799250c8e7ec45271d98/src/wekaexamples/core/dataset.py#L94

Having said all that, you can always configure the FilteredClassifier
meta-classifier to work off the original dataset and simply apply the
appropriate filters to the data (like Remove). That way, you don't
have to regenerate intermediate datasets whenever the original dataset
changes, e.g., due to data being added.
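
A minimal sketch of the two approaches described above (a standalone Remove filter and a FilteredClassifier wrapping one), assuming python-weka-wrapper3 is installed and the JVM has already been started. The helper names (keep_only_options, extract_attributes, build_filtered_classifier) and the choice of J48 as base classifier are hypothetical, chosen for illustration only. The Remove filter expects 1-based attribute indices in its -R option, and -V inverts the selection so the listed attributes are kept rather than deleted.

```python
def keep_only_options(keep_indices):
    """Build Remove filter options that keep only the given 0-based columns."""
    if not keep_indices:
        raise ValueError("at least one attribute index is required")
    # -R takes 1-based indices; -V inverts the selection (keep, not delete).
    one_based = sorted({i + 1 for i in keep_indices})
    return ["-R", ",".join(str(i) for i in one_based), "-V"]


def extract_attributes(data, keep_indices):
    """Return a new Instances object that holds only the selected columns.

    `data` is a weka.core.dataset.Instances object; the JVM must be running.
    """
    from weka.filters import Filter  # lazy import: needs python-weka-wrapper3

    remove = Filter(classname="weka.filters.unsupervised.attribute.Remove",
                    options=keep_only_options(keep_indices))
    remove.inputformat(data)
    return remove.filter(data)


def build_filtered_classifier(keep_indices,
                              base="weka.classifiers.trees.J48"):
    """Wrap a base classifier so the Remove filter is applied on the fly.

    keep_indices should include the class attribute, otherwise the
    classifier has nothing left to predict. J48 is only an example choice.
    """
    from weka.classifiers import Classifier, FilteredClassifier
    from weka.filters import Filter

    fc = FilteredClassifier()
    fc.filter = Filter(classname="weka.filters.unsupervised.attribute.Remove",
                       options=keep_only_options(keep_indices))
    fc.classifier = Classifier(classname=base)
    return fc
```

With a dataset loaded via load_any_file, extract_attributes(data, [0, 2]) would return a copy containing only the 1st and 3rd attributes, which could then be written out with save_any_file from weka.core.converters. The FilteredClassifier variant leaves the original dataset untouched.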

Cheers, Peter
--
My Open Source Blog - http://open.fracpete.org

Dario Martinez

May 27, 2024, 5:21:01 PM

to python-we...@googlegroups.com

So I understand that to work separately with the new attributes of the data set, I would have to remove the rest of the attributes and then perform the calculations with the new attributes that remain. In any case, I need to be able to refer to a particular attribute of the resulting data set in order to perform calculations on and analyze certain elements of it. For example: Calculation = Attribute_A[0] - Attribute_A[10]. That is, I want to be able to access any element of that attribute's time series and use it in my calculations; the value inside the "[ ]" represents the position in the time series of the element that I want to analyze. How could I achieve this?

--
You received this message because you are subscribed to the Google Groups "python-weka-wrapper" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-weka-wra...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python-weka-wrapper/CAHoQ12%2BxamZVtVmHttp1f1uksQE2Bb8%2B%2BwnazYjLQ4amqaB9wg%40mail.gmail.com.

Peter Reutemann

May 27, 2024, 7:17:40 PM

to python-we...@googlegroups.com

OK, I see what you're trying to do. If you are just trying to perform
some basic calculations, you might be able to just apply the
AddExpression filter to the data, without having to change/remove
anything:
https://weka.sourceforge.io/doc.dev/weka/filters/unsupervised/attribute/AddExpression.html

Alternatively, if the attributes are numeric, you could use the
"values(int)" method of the "Instances" class. This method returns
the internal values of the specified column (0-based index) as a numpy
array. In the case of numeric attributes, the internal values are the
same as the actual numeric values.

Cheers, Peter

Dario Martinez

May 27, 2024, 8:05:42 PM

to python-we...@googlegroups.com

Yes, the data is numeric, of type double. With the last approach you mention, I would no longer have to separate the data; I would only have to access the column that needs to be analyzed. Regarding the "values(int)" method, could you give me an example of how to use it, no matter how simple? It is not clear to me how to specify the column and the element within it. Excuse my ignorance, but I am very new to Python; I am more used to C, though I can already see that Python is widely used. Greetings


Peter Reutemann

May 27, 2024, 8:13:32 PM

to python-we...@googlegroups.com

Here is an example that retrieves the 1st and 3rd columns from the iris
dataset and outputs the results of multiplying their values:

import weka.core.jvm as jvm
from weka.core.converters import load_any_file

jvm.start()

data = load_any_file("/home/fracpete/development/datasets/uci/iris.arff",
                     class_index="last")
# 1st column/sepallength
col1 = data.values(0)
# 3rd column/petallength
col3 = data.values(2)

for i in range(len(col1)):
    print(col1[i] * col3[i])

jvm.stop()

Cheers, Peter
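
The calculation asked about earlier in the thread (Attribute_A[0] - Attribute_A[10]) could be sketched along the same lines. This is only an illustration: the function name element_difference is hypothetical, and the only thing it assumes about "data" is that it offers the "values(int)" method described above, returning the column as an indexable sequence of numbers.

```python
def element_difference(data, col_index, first, second):
    """Return series[first] - series[second] for one numeric column.

    `data` is expected to be a weka.core.dataset.Instances object (or any
    object with a compatible values(int) method); col_index, first and
    second are all 0-based.
    """
    series = data.values(col_index)  # one column as an array-like
    n = len(series)
    for pos in (first, second):
        # Fail early with a clear message instead of relying on negative
        # indices silently wrapping around to the end of the series.
        if not 0 <= pos < n:
            raise IndexError(f"position {pos} is outside 0..{n - 1}")
    return float(series[first] - series[second])
```

For the iris data loaded as in the example above, element_difference(data, 0, 0, 10) would subtract the 11th sepallength value from the 1st. Coming from C, the indexing works the same way as with arrays, starting at 0.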

Dario Martinez

May 27, 2024, 8:46:48 PM

to python-we...@googlegroups.com

OK, thank you very much. With this it is much clearer to me.

