OpenRefine does not bring in all rows of CSV file


Karl Stutzman

Mar 28, 2014, 5:36:57 PM
to openr...@googlegroups.com
I am a newbie. I have tried to read as much documentation as possible, but it's entirely possible I'm missing something obvious.

I am now using OpenRefine 2.6.

File A is a CSV file with 8043 lines. When I bring it into OpenRefine it only has 7006 rows. I have made sure blank lines are eliminated in the file using Notepad++. I have saved it as a text file and with various encodings (although I probably didn't exhaust all possibilities).
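A gap like this (8043 lines in the file, 7006 rows after import) is what one would see if some quoted fields contain line breaks, since a CSV parser then joins several physical lines into one row. That is only one possible cause, not a confirmed diagnosis for this file. A minimal Python sketch for checking it with the standard `csv` module follows; the function name and the sample data are made up for illustration.

```python
import csv
import io


def count_lines_and_records(text):
    """Return (physical_lines, csv_records) for a string of CSV text."""
    # Physical lines: what a text editor's line counter would report.
    physical_lines = len(text.splitlines())
    # csv.reader keeps a newline inside a quoted field as part of that field,
    # so one parsed record can span several physical lines.
    reader = csv.reader(io.StringIO(text, newline=""))
    csv_records = sum(1 for _ in reader)
    return physical_lines, csv_records


# Illustrative data: the second record has a line break inside a quoted field.
sample = 'id,note\n1,"first line\nsecond line"\n2,plain\n'
print(count_lines_and_records(sample))
```

For a real file, read it with `open(path, newline="", encoding=...)` and pass the contents to the function. If the two counts differ, embedded line breaks or unbalanced quotation marks are worth looking for.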

File B is a fixed-width field text file with 120176 lines. When I bring it into OpenRefine it has 120175 rows (the first line being the header obviously). So I know it should work :)

I previously tried Google Refine 2.5. I was able to create a project with File A in version 2.5 but it was corrupted and I couldn't open it. I think there must be something wrong with the file.

Thanks for any tips.

Thad Guidry

Mar 28, 2014, 7:54:22 PM
to openr...@googlegroups.com
Try loading it with the line-based importer rather than the CSV importer?



Owen Stephens

Mar 28, 2014, 8:32:38 PM
to openr...@googlegroups.com
Just in case (as it has caught me out before): check that the number you are seeing after bringing the file into Refine is 'rows', not 'records'. The latter can be less than the number of rows in the file, and in that case switching to the 'rows' view would give the full count.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: ow...@ostephens.com
Telephone: 0121 288 6936

Karl Stutzman

Mar 31, 2014, 9:35:04 AM
to openr...@googlegroups.com
Thanks for your help.

Good news: Line based importer brings in all my rows. 
Bad news: The CSV file uses quotation marks to escape commas inside a field. The OpenRefine command Edit Column-->Split into several columns does not respect that quoting when splitting by comma. There is probably a regular-expression way to split this and I'm just not adequately caffeinated to figure it out.
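To illustrate the quoting problem outside OpenRefine, here is a small Python sketch comparing a plain split on commas with a quote-aware parse using the standard `csv` module. The sample line is invented for illustration and is not taken from the file discussed here.

```python
import csv

# Illustrative line: the middle field contains a comma and is therefore quoted.
line = 'Smith,"Goshen, Indiana",1950'

# A plain split ignores the quotation marks and breaks the field in two.
naive = line.split(",")

# csv.reader honours the quoting and returns the field intact.
quote_aware = next(csv.reader([line]))

print(len(naive), naive)
print(len(quote_aware), quote_aware)
```

The plain split yields four pieces, while the quote-aware parse yields the intended three fields.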

My workaround for the time being is to open the original CSV file in MS Excel (the file did not originate in Excel), save it as tab-delimited text, then import that into OpenRefine. I would be open to more elegant solutions.
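The same CSV-to-tab-delimited conversion can also be done without Excel. The following is a minimal Python sketch using the standard `csv` module; the function name `csv_to_tsv` and the sample data are hypothetical, and it assumes the fields themselves contain no tab characters.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert comma-separated text to tab-separated text, honouring quotes."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO()
    # With a tab delimiter, a field containing a comma no longer needs quoting.
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Illustrative data only.
sample = 'name,place,year\nSmith,"Goshen, Indiana",1950\n'
print(csv_to_tsv(sample))
```

For a file on disk, the same reader and writer can be wrapped around files opened with `newline=""` and a suitable `encoding`.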
Karl Stutzman
Anabaptist Mennonite Biblical Seminary Library

Tom Morris

Mar 31, 2014, 10:03:40 AM
to openr...@googlegroups.com
Did you try Owen's suggestion of switching between record mode (the default) and row mode to make sure you were seeing all the rows that were imported?  If the first column isn't fully populated, Refine's default behavior is to interpret the indented rows as part of a pseudo-record.
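As a rough illustration of why the record count can be lower than the row count, here is a simplified Python model of that grouping. It is a sketch of the idea described above, not Refine's actual implementation; the function name and sample rows are hypothetical.

```python
def group_into_records(rows):
    """Group rows so that a row with a blank first cell joins the previous record."""
    records = []
    for row in rows:
        first_cell = row[0] if row else ""
        starts_record = first_cell is not None and str(first_cell).strip() != ""
        # Assumption: the very first row always opens a record, even if blank.
        if starts_record or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records


# Illustrative data: four rows, but only two have a value in the first column.
rows = [["A", "x"], ["", "y"], ["", "z"], ["B", "w"]]
records = group_into_records(rows)
print(len(rows), "rows,", len(records), "records")
```

In this model the four sample rows collapse into two records, which is the kind of difference that switching between the two views would reveal.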

If you're going to go through Excel as an intermediary, you can just export in Excel format, which Refine reads directly, and you won't need to worry about splitting lines after import.

Tom