Novice and Layperson: help opening my .csv file into deducer

djschil...@gmail.com

Jul 15, 2016, 9:26:57 PM

to Deducer

Hi there!

I am a complete layperson with coding, and a near-novice with statistics (I have only taken one stats course for my B.S. in psych and have some experience with SPSS).

I am having difficulty understanding how to import my .csv file into Deducer. When I try, the preview looks pretty wonky, and the output shows the following error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  line 2 did not have 249 elements

I have attached the file that I am trying to upload.

Thanks!

DSchil_Thesis_SurveyData copy.csv

Ian Fellows

Jul 15, 2016, 9:32:58 PM

to ded...@googlegroups.com

Pretty funky header row you have in there. Works for me if you delete the first row from the data file.

Best,

Ian
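If you would rather not edit the file by hand, read.csv's skip argument tells R to ignore a number of lines at the top of the file. This is a minimal sketch, assuming the problem row sits on a single physical line (a quoted cell that spans several lines would need more lines skipped). The CSV content below is hypothetical and held in a string so the example runs on its own.

```r
# Hypothetical file contents: line 1 is the problematic header row,
# line 2 holds usable column names, and the rest is data.
csv_text <- c(
  '"<div>Long consent text, with commas, goes here</div>"',
  'Age,State',
  '34,OH',
  '29,PA'
)

# skip = 1 drops the first physical line; the next line becomes the header.
survey <- read.csv(text = csv_text, skip = 1, stringsAsFactors = FALSE)

str(survey)
```

For a real file, pass the file name in place of text = csv_text.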

Tom Hopper

Jul 16, 2016, 11:18:29 AM

to ded...@googlegroups.com

Your problem is that long legal agreement in the first cell of the header row. Here’s the long explanation of what’s going on:

R (and therefore Deducer) expects data files like this to have a specific format:

* The data has to be in tabular format, with a separator between cells and an end-of-line separating rows.

* The separator can be a comma, semicolon, tab or whitespace. (There’s some additional flexibility here, but this is a good start.)

* Every row must have the same number of columns (i.e. the same number of separators).

* Every column must have the same number of rows.

* If there is a header, the first row of the data file contains the header.

* After the header row, every row in a given column must have the same data type. So if you have a mix of numbers and letters in a column, the whole column will be read as text (character or factor), and you can’t do math on the numbers.
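The "same number of columns in every row" rule is most likely what the "line 2 did not have 249 elements" error is complaining about. Base R's count.fields() reports how many fields each line has, which makes the offending lines easy to find. This is a minimal sketch on a small made-up CSV held in a string; for a real file you would pass the file name instead of the connection.

```r
# A made-up CSV with one short row (line 3 has only two fields).
csv_text <- c(
  "id,age,state",
  "1,34,OH",
  "2,29",
  "3,41,PA"
)

con <- textConnection(csv_text)
# Number of comma-separated fields on each line (commas inside quotes are not counted).
n_fields <- count.fields(con, sep = ",")
close(con)

print(n_fields)   # [1] 3 3 2 3

# Lines whose field count differs from the header line's.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)  # [1] 3
```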

You can learn more by reading R's help pages: ?read.csv and ?data.frame

File encoding is important, because R reads files using a default encoding (which depends on your platform and locale) that might not match your file, and a mismatch causes errors or garbled characters when reading the file.
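To see why the encoding matters, here is a minimal sketch that decodes the same bytes two ways with base R's iconv(). It assumes your platform's iconv recognises the name "macintosh" for Mac OS Roman (check iconvlist() if it does not). In that encoding, the bytes 0xD2 and 0xD3 are curly double quotes.

```r
# Raw bytes as they might sit in the file: 0xD2, "Yes", 0xD3.
raw_text <- rawToChar(as.raw(c(0xd2, 0x59, 0x65, 0x73, 0xd3)))

# Decoded as Mac OS Roman (assumed name "macintosh"): the bytes become curly quotes.
as_mac <- iconv(raw_text, from = "macintosh", to = "UTF-8")

# Decoded as Latin-1, a wrong guess: the same bytes become accented capital O's.
as_latin1 <- iconv(raw_text, from = "latin1", to = "UTF-8")

print(as_mac)
print(as_latin1)
```

When reading a whole file, read.csv's fileEncoding argument performs this kind of conversion as the file is read.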

In addition, character strings should ideally be enclosed in quotes: one quote at the start of the string and one at the end. Inside R code, a quote that is part of a string has to be escaped with a backslash, as in \" (see ?Quotes; capitalization matters). Inside a .csv file read with read.csv, the convention is instead to double the quote ("").
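A small illustration of the backslash escape in R code, next to the alternative of switching the outer quote type. Nothing here is specific to Deducer.

```r
# Escaping double quotes inside a double-quoted R string.
escaped <- "She said \"hi\""

# The same string written with single quotes as the outer delimiters.
single <- 'She said "hi"'

identical(escaped, single)  # TRUE
nchar(escaped)              # 13: the backslashes are not part of the string
cat(escaped, "\n")          # prints: She said "hi"
```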

Looking at your csv, the first row starts with

"<div style=""text-align:

and runs on through a whole bunch of legal mumbo jumbo. Until you get to the part that reads “,Age,State,” you’re in a single, very long, header cell. The cell contains quotes that have been escaped by doubling them (""), and it contains non-ASCII characters (smart quotes, at least) that seem to be encoded using Mac OS Roman.

So to open this file correctly, you would need to make sure R reads the doubled quotes ("") as literal quotes rather than expecting backslash escapes (read.csv already does this for comma-separated files), and provide the correct file encoding (see ?file and ?iconv). This means you have to load the file from the R command line

my_df <- read.csv("DSchil_Thesis_SurveyData.csv", sep = ",", quote = "\"", fileEncoding = "macintosh")

rather than using the nice point-and-click interface in Deducer’s “Open Data” dialog. Note that I haven’t actually tried the above, so it may need some tweaking to get it working. The file encoding in particular may need to change depending on the platform you’re using.
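To check that read.csv copes with doubled quotes on its own, here is a minimal sketch with a made-up header row imitating the one in the attached file: the first cell is quoted, contains doubled quotes and a comma, and is followed by ordinary column names. The content is hypothetical and is read from a string with text = so that no file or encoding is involved.

```r
# Hypothetical header row imitating the problem cell, plus one data row.
csv_text <- c(
  '"<div style=""text-align:center"">I agree, see terms</div>",Age,State',
  '1,34,OH'
)

# check.names = FALSE keeps the long header text as-is instead of
# converting it into a syntactic R name.
df <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

ncol(df)      # 3: the comma inside the quoted cell does not split it
names(df)[1]  # the doubled quotes ("") come back as single quotes (")
```

For the real file, the equivalent is the read.csv call with the file name and fileEncoding = "macintosh" shown earlier in this message.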