Comparing a .csv file with a .arff file

Riju Singhal

Mar 3, 2014, 12:11:58 AM

to wekamooc...@googlegroups.com

Hi!

I performed a simple test. I opened the file weather.nominal in Weka, clicked Edit, selected all the data in the viewer that opened, and copied it. Then I opened an Excel sheet, pasted the copied data, and saved it as a .csv under a different file name. When I opened this new file in Weka, it did not produce exactly the same result as the .arff: the attribute values seemed to have their order reversed. In the attached picture, the graphs on the left are from the .arff file provided with Weka, and the ones on the right are from the .csv that I made. Can anyone please tell me why this difference appears and how to sort it out? I'm using Windows 8, 64-bit.

Thanks,

Riju Singhal.

Dot csv vs Dot arff.tif

Alan Wikid

Mar 3, 2014, 10:18:05 PM

to wekamooc...@googlegroups.com

Hello Riju!

The difference is only the colour.

If the class of the first instance (the first data line) of your csv file were "yes", the colours would be the same. Try moving an instance whose class is "yes" to the top of the file (open the csv file in a text editor). In any case, the colour is only a visual label and does not indicate a real problem.

P.S.:

In an ARFF file, the colour order follows the order of the values in this line: @attribute play {yes, no}. Of course, this depends on which attribute is chosen as the class attribute.
In a CSV file, the class value of the first instance ("yes" or "no") takes the first colour.
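
The two orderings described above can be illustrated with a minimal Python sketch. It is not Weka's own code: it assumes, as this thread describes, that values read from a CSV are ordered by first appearance, which may differ between Weka versions. The function names and the three-row sample are hypothetical.

```python
import csv
import io

def declared_order(attribute_line):
    """Return nominal values in the order an ARFF @attribute line declares them."""
    inside = attribute_line[attribute_line.index("{") + 1 : attribute_line.index("}")]
    return [value.strip() for value in inside.split(",")]

def first_seen_order(csv_text, column):
    """Return the distinct values of one CSV column in order of first appearance.

    Assumption: this mimics the CSV behaviour described in the thread.
    """
    seen = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[column] not in seen:
            seen.append(row[column])
    return seen

# Hypothetical sample: the first instance has class "no".
csv_text = "outlook,play\nsunny,no\nsunny,no\novercast,yes\n"

arff_order = declared_order("@attribute play {yes, no}")
csv_order = first_seen_order(csv_text, "play")

print(arff_order)  # order taken from the ARFF declaration
print(csv_order)   # order taken from the data rows
```

With these inputs the ARFF order starts with "yes" and the CSV order starts with "no", which would explain the swapped colours.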

Best Regards

Alan


Bhoomin Pandya

Mar 3, 2014, 10:20:49 PM

to wekamooc...@googlegroups.com

Hi Riju:

I checked, and your observation is correct. I do not think it is because of Windows 8. It has more to do with the difference between the order in which values are written in the .arff file and the order in which WEKA reads the values from the .csv.

Leo Fernandez

Mar 3, 2014, 11:36:59 PM

to wekamooc...@googlegroups.com

As Alan has observed, the difference is in the colour assignment, which follows the order of 'yes' and 'no' in the line @attribute play {yes, no}.

To check, after opening your .csv file in Weka, save it as a .arff file and then compare the original weather.nominal.arff with the newly created .arff file. You will see differences in the order of the values listed in the @attribute lines.

The result is the same; only the colour assignment is different.
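
The comparison can also be done programmatically. The following Python sketch lists the nominal attributes whose value order differs between two ARFF files. The two header texts are hypothetical stand-ins for the original file and the re-saved one, not actual Weka output, and the parser assumes every attribute is nominal (declared with braces).

```python
def attribute_lines(arff_text):
    """Collect the @attribute declarations from ARFF text, in file order."""
    return [line.strip() for line in arff_text.splitlines()
            if line.strip().lower().startswith("@attribute")]

def parse(line):
    """Split a nominal @attribute line into (name, [values])."""
    name = line.split()[1]
    inside = line[line.index("{") + 1 : line.index("}")]
    return name, [value.strip() for value in inside.split(",")]

def differing_attributes(a_text, b_text):
    """Names of attributes present in both texts whose value order differs."""
    a = dict(parse(line) for line in attribute_lines(a_text))
    b = dict(parse(line) for line in attribute_lines(b_text))
    return [name for name in a if name in b and a[name] != b[name]]

# Hypothetical headers, shortened to two attributes for illustration.
original = """@relation original
@attribute outlook {sunny, overcast, rainy}
@attribute play {yes, no}
@data
sunny,no
"""
resaved = """@relation resaved
@attribute outlook {sunny,overcast,rainy}
@attribute play {no,yes}
@data
sunny,no
"""

changed = differing_attributes(original, resaved)
print(changed)
```

Here only the play attribute is reported, because its values are the same but listed in a different order.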

Best,

Leo
Using WEKA on Ubuntu 12.04

brooke.herbert

Mar 4, 2014, 5:36:22 PM

to wekamooc...@googlegroups.com

Hi Riju,

When Weka reads a .csv file, it takes the first row as the attribute names.

To make the .csv file represent the original dataset, you should delete the first column (instance numbers) and insert a new first row whose cells contain the values “outlook”, “temperature”, “humidity”, “windy”, and “play”, before loading it into Weka.

I tested this out myself with just copy and paste, and the issue was resolved by opening the csv in a simple text editor and making these changes. Weka then reads this file just the same as the .arff.
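
For anyone who would rather script those two changes than make them by hand, here is a minimal Python sketch. It assumes the pasted file has the instance number in the first column and no header row, as described above. The function name fix_pasted_csv and the three sample rows are only illustrative.

```python
import csv
import io

# Attribute names of the weather dataset, as listed above.
HEADER = ["outlook", "temperature", "humidity", "windy", "play"]

def fix_pasted_csv(pasted_text):
    """Drop the first column (instance numbers) and prepend a header row."""
    reader = csv.reader(io.StringIO(pasted_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for row in reader:
        if row:  # skip blank lines
            writer.writerow(row[1:])
    return out.getvalue()

# Sample rows in the shape described: number first, then the five values.
pasted = (
    "1,sunny,hot,high,FALSE,no\n"
    "2,sunny,hot,high,TRUE,no\n"
    "3,overcast,hot,high,FALSE,yes\n"
)

fixed = fix_pasted_csv(pasted)
print(fixed)
```

To use it on a real file, read the file's text into pasted and write fixed back out under a new name before loading it into Weka.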

Let us know how you get on!

Kind regards,
Brooke

james....@unb.ca

Mar 14, 2014, 12:59:02 PM

to wekamooc...@googlegroups.com

This looks to be a bigger problem when you are testing using a 'supplied test set'. If, for example, your test file does not read in the attribute values in the same order, WEKA will complain about compatibility. Not a huge issue with a 'yes/no' attribute where you can easily line up the first rows of the train and test file to match, but what if we have an attribute containing state? It would be unreasonable to have to line up the first 51 instances. Advice welcome!
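
One possible workaround, offered as a sketch rather than an established Weka recipe, is to convert both CSV files to ARFF yourself and give them the same explicit @attribute declarations, so the value order no longer depends on the order of the rows. The Python example below assumes all attributes are nominal and that values contain no spaces or commas (such values would need quoting in ARFF). The to_arff helper, the VALUES table, and the sample data are hypothetical.

```python
import csv
import io

def to_arff(csv_text, relation, nominal_values):
    """Build ARFF text from CSV text using a fixed, shared value order.

    nominal_values maps each attribute name to its full list of values.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lines = ["@relation " + relation, ""]
    for name, values in nominal_values.items():
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[name] for name in nominal_values))
    return "\n".join(lines) + "\n"

# One shared declaration used for both files (illustrative values).
VALUES = {
    "outlook": ["sunny", "overcast", "rainy"],
    "play": ["yes", "no"],
}

train_csv = "outlook,play\nsunny,no\novercast,yes\n"
test_csv = "outlook,play\nrainy,yes\n"

train_arff = to_arff(train_csv, "train", VALUES)
test_arff = to_arff(test_csv, "test", VALUES)
print(train_arff)
print(test_arff)
```

Because both headers come from the same VALUES table, an attribute with 51 values would only need its list written once instead of lining up the first 51 instances of each file.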