READ_CSV issues

130 views
Skip to first unread message

KH

unread,
Mar 18, 2019, 4:11:50 PM3/18/19
to idl-pvwave
Hello,

I have been noticing a few issues when trying to use READ_CSV.  The first occurs when there are several blank cells in the first row of the csv file.  For example, if you read the attached example file (created in Excel) with 16 columns, the program only finds 15 of the data columns, but does identify all 16 tags.  The second issue with with the tag names.  I've noticed in more than one csv file created in Excel that the string length of the first tag name contains 3 extra characters that are converted to underscores by IDL_VALIDNAME.  I've provided some simple code that highlights the issues I am seeing.  Any insight, advice or suggested work arounds would be greatly appreciated.

Thanks,
Kim


  F = 'TEST.csv'
  CDATA = READ_CSV(F, N_TABLE_HEADER=1, TABLE_HEADER=TAGS, COUNT=COUNT, MISSING_VALUE=-99999) ; Read the file

  TAGS = STRSPLIT(TAGS,',',/EXTRACT)
  HELP, CDATA
  
  PRINT, 'Number of FIELDS = ' + STRING(N_TAGS(CDATA))
  PRINT, 'Number of TAGS = ' + STRING(N_ELEMENTS(TAGS))
  
  PRINT, TAGS 
  PRINT, IDL_VALIDNAME(STRCOMPRESS(TAGS),/CONVERT_ALL)
  PRINT, STRLEN(TAGS(0))
  PRINT, STRMID(TAGS(0),0,2)
TEST.csv

markus.sc...@gmail.com

unread,
Mar 19, 2019, 12:53:36 PM3/19/19
to idl-pvwave
Hi Kim,

I cannot replicate your problem, because after your second line I get:

% READ_CSV: Illegal subscript range: XDATA.

Reading the docs it seems to me you mixed up HEADER & TABLE_HEADER.
With the following the read-in works for me:

CDATA = READ_CSV(F, HEADER=TAGS,MISSING_VALUE=-99999)

After that you don't have to apply STRSPLIT to TAGS.

This way I don't get the problem with the first tag.

As to the non-printable characters in the first tag in your version, use

print, byte(tags[0])

and then look up in an extended ASCII table.

If you cannot get READ_CSV to work, this should do:

nlines=file_lines(F)
lines=strarr(nlines)
openr,lun,F,/get_lun
readf,lun,lines
free_lun,lun
cstrings=(strsplit(lines,/extract,',',/preserve_null)).toArray()
tags=cstrings[0,*]
tags_valid=IDL_VALIDNAME(STRCOMPRESS(TAGS),/CONVERT_ALL)
defaults=list('','',0l,0l,0l,0d,0l,0d,0l,0l,0l,'','','',0l,0l)
oha=orderedhash(tags_valid,defaults.map(lambda('x:replicate(x,'+string(nlines-1)+')')))
cdata=oha.toStruct()
for i=0,n_elements(tags)-1 do cdata.(i)=cstrings[1:-1,i]
isString=defaults.map(lambda('x:isa(x,/string)'))
foreach w,isString.where(0) do cdata.(w)[where(cstrings[1:-1,w] eq '',/null)]=-99999

The only drawback of this code is, that you have to set the defaults manually.

Good luck, Markus

PS: Given that your IDL behaves differently than mine, here my version:

IDL> !version
{
    "ARCH": "x86_64",
    "OS": "linux",
    "OS_FAMILY": "unix",
    "OS_NAME": "linux",
    "RELEASE": "8.5",
    "BUILD_DATE": "Jul  7 2015",
    "MEMORY_BITS": 64,
    "FILE_OFFSET_BITS": 64
}

Ben C

unread,
Mar 19, 2019, 3:56:00 PM3/19/19
to idl-pvwave
Markus's suggestion does resolve the issue. 

BTW, Markus is experiencing a READ_CSV bug that was only present in IDL 8.5.x. It won't happen in any other versions (% READ_CSV: Illegal subscript range: XDATA.)


KH

unread,
Mar 19, 2019, 5:19:41 PM3/19/19
to idl-pvwave
Thank you for the feedback.  Yes, the issue was I was using TABLE_HEADER instead of HEADER for the TAGS variable.  I also discovered that if the csv has all text strings, that no tags will be returned and instead will be placed into the first row of the output data.  Now that I have a better understanding of how it works, I think I can make it work for my purpose.

On a side note, one other frustrating part I discovered was that the default missing value for any numeric field is 0, which can easily result in significant errors if the user is not aware of this "feature".  Using -99999 seems to be working, but I have tested it with all data types.

Best regards,
Kim
Reply all
Reply to author
Forward
0 new messages