CSV file has been converted to ARFF but can't be used


Aran Joseph

May 11, 2020, 1:38:39 PM
to python-weka-wrapper
Hello sir,

I tried to convert the file to ARFF, but I get an error. I also tried using the CSV file directly, but that failed as well.

I have attached the file and am still curious why this doesn't work. Oddly, when I open the CSV file in Google Sheets, the tfidf values are not shown as decimals, unlike when I open it in Excel (https://docs.google.com/spreadsheets/d/1WSNCAbF1DPIs0A2DQ4mZhTocb8CXkxPJPD08ITKfH94/edit?usp=sharing).

What causes this error?

Thank you.
tfsmall.arff

Gichuhi Haron

May 11, 2020, 2:33:59 PM
to python-weka-wrapper
Hello Aran,
A quick fix for your dataset was to delete the first column, which only duplicated the row numbering. Here are your two files; try opening them.
tfsmall.arff
tfsmall.csv

Peter Reutemann

May 11, 2020, 4:47:11 PM
to python-weka-wrapper
> A quick fix for your dataset was to delete the first column, which only duplicated the row numbering. Here are your two files; try opening them.

A few more comments on your data:
All columns in a CSV file must have a column header (1st row; used as
attribute names) and these column headers must be unique. Your CSV
file is missing a column header in the first column. Hence removing
the first column worked.
In your ARFF file, instead of using the column headers as attribute
names, you've pushed them into the first data row for some reason.
As a result, the "verified" value in the first data row cannot be
read, since it is not a numeric value and you defined that attribute
as numeric.
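
The two header rules above (every column has a header, and headers are
unique) can be checked before handing a CSV file to Weka. The following
is only a minimal sketch using Python's standard csv module; the
function name header_problems and the sample data are made up for
illustration and are not part of python-weka-wrapper.

```python
import csv
import io
from collections import Counter


def header_problems(csv_text):
    """Return a list of human-readable problems found in the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    problems = []
    # An empty cell in the first row means that column has no attribute name.
    for index, name in enumerate(header):
        if not name.strip():
            problems.append(f"column {index + 1} has no header")
    # Attribute names must be unique, so report any name used more than once.
    counts = Counter(name.strip() for name in header if name.strip())
    for name, count in counts.items():
        if count > 1:
            problems.append(f"header {name!r} appears {count} times")
    return problems


# Hypothetical data shaped like the problem described above:
# the first column (row numbers) has no header.
sample = ",tfidf,verified\n0,0.12,1\n1,0.40,0\n"
print(header_problems(sample))
```

An empty result means the header row passes both checks; it says
nothing about whether the data rows themselves are valid.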

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Aran Joseph

May 14, 2020, 7:16:15 PM
to python-weka-wrapper
Thank you, sir! I also want to ask about data splitting. save_file reports a missing argument when I try to save the loaded data to a CSV file. I have some questions:

- What argument should I put in save_file?
- What is the difference between classification=False and classification=True (from the experiments section)?
- What does preserve_order do, and should it always be used?

Aran Joseph

May 14, 2020, 7:16:24 PM
to python-weka-wrapper
Thank you for helping!

Peter Reutemann

May 14, 2020, 7:40:59 PM
to python-weka-wrapper
[Please don't post images, they disappear in plain text emails]

> Thank you, sir! I also want to ask about data splitting. save_file reports a missing argument when I try to save the loaded data to a CSV file. I have some questions:
>
>
> - What argument should I put in save_file?

The "result" parameter always expects a string pointing to an ARFF
file. If you want to have a CSV file, you need to convert the ARFF
file after the experiment has finished.
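
One way to do that conversion is to load the ARFF file and save it
again with a CSV saver. This is a sketch, assuming python-weka-wrapper
and a JVM are installed; the helper names csv_path_for and
convert_arff_to_csv are made up for illustration. Note that the
converter's save_file method takes the dataset as its first argument
and the output file name as its second.

```python
import os


def csv_path_for(arff_path):
    """Return the same path with its extension replaced by .csv."""
    root, _ = os.path.splitext(arff_path)
    return root + ".csv"


def convert_arff_to_csv(arff_path):
    """Load an ARFF file through Weka and save it again as CSV."""
    # Imported lazily so csv_path_for works without a Weka installation.
    import weka.core.jvm as jvm
    from weka.core.converters import Loader, Saver

    jvm.start()
    try:
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        saver = Saver(classname="weka.core.converters.CSVSaver")
        out_path = csv_path_for(arff_path)
        # save_file takes the dataset first, then the output file name.
        saver.save_file(data, out_path)
        return out_path
    finally:
        # The JVM cannot be restarted in the same process after stopping,
        # so in a longer script start and stop it once around all Weka work.
        jvm.stop()
```

Calling convert_arff_to_csv on the file name that was passed as
"result" would then write a CSV file next to it.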

> - What is the difference between classification=False and classification=True (from the experiments section)?

It depends whether your class attribute is numeric (= regression
problem, classification=False) or nominal (= classification problem,
classification=True).
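
The rule above can be written down as a small helper that looks at how
the class attribute is declared in the ARFF header. This is only an
illustrative sketch; is_classification is a made-up name, and it works
on the declared type string rather than on a loaded Weka dataset.

```python
def is_classification(arff_type):
    """Guess the `classification` flag from an ARFF attribute type string."""
    declared = arff_type.strip().lower()
    # A nominal attribute lists its labels in braces, e.g. {yes,no}.
    if declared.startswith("{") and declared.endswith("}"):
        return True
    # numeric, real and integer are all numeric attribute types in ARFF.
    if declared in ("numeric", "real", "integer"):
        return False
    # Other types (e.g. string, date) are deliberately not handled here.
    raise ValueError(f"unsupported class attribute type: {arff_type!r}")


print(is_classification("{yes,no}"))  # nominal class
print(is_classification("numeric"))   # numeric class
```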

> - What's the function of preserve_order and should it always be used?

If "true", no randomization of the data is performed before splitting
it. Whether to use it depends on your dataset: if you have ordered
data, you will most likely want to have randomization turned on (i.e.,
preserve_order turned off). In some cases, people have dedicated
train/test sets; by not randomizing the data and choosing the correct
split percentage, they can act as if they still have these two
datasets, despite only providing the combined dataset.
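
The effect of preserve_order can be illustrated without Weka. The
sketch below is a plain Python stand-in, not Weka's implementation:
the function percentage_split, its rounding and its seed handling are
assumptions made for the example.

```python
import random


def percentage_split(rows, percentage, preserve_order=False, seed=1):
    """Split rows into (train, test), optionally shuffling first."""
    rows = list(rows)
    if not preserve_order:
        # Randomize the row order so an ordered dataset is mixed up.
        random.Random(seed).shuffle(rows)
    # The first `percentage` percent of the rows become the training set.
    train_size = round(len(rows) * percentage / 100)
    return rows[:train_size], rows[train_size:]


# Hypothetical combined dataset: a dedicated train set followed by a test set.
combined = ["train1", "train2", "train3", "test1"]
train, test = percentage_split(combined, 75, preserve_order=True)
print(train, test)
```

With preserve_order=True and a percentage of 75, the first three rows
end up in the training set and the last row in the test set, which
recovers the original dedicated train/test sets from the combined data.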