importing and combining several files

1,764 views
Skip to first unread message

peter.m...@hmri.org.au

unread,
Apr 20, 2017, 9:16:24 PM4/20/17
to OpenRefine
Hi all new to OpenRefine

have imported 3 identically formated CSV files with data for 3 consecutive time periods,

OR imports these in 3 groups of colums with a seperate record for each.

how would I merge the data so that data with the same column identifier is lined up (in excel you would highlight all the blank cells, delete and move the block of data to the left to align it)

thanks

Peter


IDCJDW2124-201702-csv.openrefine.tar.gz

Ettore Rizza

unread,
Apr 21, 2017, 5:03:58 AM4/21/17
to OpenRefine
Hi Peter,

Correct me if I'm wrong. So, you have a csv like this :

ID,A,B,C
1,value1, value2, value3
2,value4, value5, value6
3,value7, value8, value9

Another one like this :

ID,D,E,F
1,value10, value11, value12
2,value13, value14, value15
3,value16, value17, value18

And you want a result like this, right ?

ID,A,B,C, D, E, F
1,value1, value2, value3, 1,value10, value11, value12
2,value4, value5, value6,value13, value14, value15
3,value7, value8, value9, value16, value17, value18

John Little

unread,
Apr 21, 2017, 9:57:19 AM4/21/17
to openr...@googlegroups.com
Hi Peter:

I looked at your OR project file and it looks to me like you want to append each file, one below the next, each with the same column headers into a single OR project.

There is actually a very useful facility to do this if you create your OR project from a single Excel file with multiple worksheets.  Make sure each CSV file is a separate worksheet in a single Excel file.  Save the Excel file.  Then open the Excel file in OpenRefine with the "Create Project" option.  After you identify the new excel file (the one with three worksheets), you'll be looking at the data preview screen.  In the bottom of half of the screen you should see that the "Parse Data As" option is already defaulted to Excel.  Looking slightly to your right you should see a "Worksheets to Import" header.  Immediately under that you'll want to select (check) each worksheet you want to import.  OR will take care of the rest.  

I'm not aware of a similar option with multiple raw CSV files.  

Hope that helps.

--John

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

peter.m...@hmri.org.au

unread,
Apr 21, 2017, 8:09:19 PM4/21/17
to OpenRefine
Thx,

Not quite.

first file looks like

    Record ID    Age  Sex    Month
        1              23    M       Jan
        2              45    F        Jan

the second 


    Record ID    Age  Sex    Month
       3             23    M       Feb
       4             45    F        Feb


the third


    Record ID    Age  Sex    Month
       5             45    M       Mar
      6              67     F       Mar

When I import 3 files together they are there but in seperate columns.

Can open refine recognise that the files contain the same data model and append the second to the first and the third to the first two and so on


This is quite a common type of business case where data is collected every month as a CSV, however you want to be able to create a single longitudinal reference file

John Little describes and option, which would require importing the data from each CSV into different worksheets as a first step and then importing the one CSV into OpenRefine




On Friday, April 21, 2017 at 7:03:58 PM UTC+10, Ettore Rizza wrote:
Hi Peter,

Correct me if I'm wrong. So, you have a csv like this :

ID,A,B,C
1,value1, value2, value3
2,value4, value5, value6
3,value7, value8, value9Th

Ettore Rizza

unread,
Apr 22, 2017, 3:29:41 AM4/22/17
to OpenRefine
Oh ok, you want to append each file to each other. But that's exactly the default behavior of Open Refine. There should be no problemas you can see in this screencast. The three CSV files I select are your three tables above. Open Refine does not respect the order you describe, but it does not matter since you can then sort the ID column as you want.


Owen Stephens

unread,
Apr 27, 2017, 12:11:59 PM4/27/17
to OpenRefine
Did Ettore's solution work for you?

peter.m...@hmri.org.au

unread,
Apr 28, 2017, 8:14:10 PM4/28/17
to OpenRefine
I did try it with the same data in three worksheets in one excel workbook,  OR imported and appended the data where the columns had the same ID. Have yet to try with  different CSV files,  I had thought there was an issue but perhaps I didnt try hard enough.

Peter

Owen Stephens

unread,
May 3, 2017, 3:34:53 AM5/3/17
to OpenRefine
Thanks for the update Peter - glad you were able to make this work using an excel workbook

Owen
Reply all
Reply to author
Forward
0 new messages