Error: duplicate header fields found - Best practice workflow for CSV uploads

2,557 views
Skip to first unread message

patrick...@mac.com

unread,
Dec 13, 2015, 12:10:50 PM12/13/15
to ResourceSpace

Hi I am currently trying to upload quite large batches to a locally hosted copy of resourcespace. I suspect there is something wrong in my method, but can't find where I am going wrong. Would be grateful if anyone could point me in the right direction.

I uploaded photos in batches of a few hundred, with arbitrary field data. Then tried to Upload CSV. Getting "Error: duplicate header fields found" message. Documentation said this could be because there are a different number of headers, rather than a duplicate. Tried downloading CSV for files I had uploaded, changing arbitrary  data for correct data, then upload to database. Still getting the same error. Surely if the "search results metadata" file is the basis of the file I am trying to upload, then the headers should be identical? I tried with and without the Resource ID(s) in case this could be the cause of the problem. Any pointers would be much appreciated,

Thanks,
Patrick.

This is the error message I am getting:
“Info: Found correct field headers for resource_type 1(Photo)
Info: Override resource_type 1(Photo) is valid
Error: duplicate header fields found”

This is the part of the documentation that I am referring to:
“Troubleshooting Tips

Error: duplicate header fields found

This error means that the system has compared the total number of header columns with the number of distinct header columns and found that they do not match. This is intended to detect multiple columns with the same column header name, but could also indicate that you have extra unused columns in your CSV with blank header values.”

Julie Swierczek

unread,
Dec 15, 2015, 10:40:39 AM12/15/15
to ResourceSpace
Patrick,

I believe I ran into this problem before.  I have found that I run into problems when using Excel spreadsheets saved as CSV files. So, I use an extra step in my process, but I stopped running into problems once I did.  (Of course, your mileage may vary.)

First, I create the spreadsheet in Excel.  Then I save as a CSV (Comma delimited), not the CSV (MS-DOS) or CSV (Macintosh) options you'll see in the Excel 'save as' dialog box. I then open that CSV in Libre Office Calc to preview the columns there. (This is just a quick check to see if the columns appear to have the right data in them, or if there are any strange characters appearing anywhere.) Occasionally I will find that the columns have been messed up, usually due to commas within fields.  I save it as a CSV file again, but I use the 'edit filters' checkbox under the file format drop-down in Calc, and in the dialog box, I change the text formatting to Unicode (UTF-8) and I select the option to quote the content of the text type cells. (I do this even if the CSV looked okay when I previewed it in Calc.)  Then I upload the CSV to ResourceSpace. 

I've found this helped solve several problems where Excel was doing something weird with my metadata.  The main issue I found repeatedly is that when saving as a CSV in Excel, sometimes some of the data ended up in the next column to the right. I think this created a situation where there would be an extra column with data in it at the right, but there would be no header to that column. (It would also make my metadata go into the wrong fields.)  If I opened the CSV in Excel, everything looked fine.  But when I opened it in Calc, I could see the problem.  Also, I found (from working with the ResourceSpace support staff), that sometimes Excel saves empty columns at the end of the spreadsheet, which makes it look like there is data in those columns, and those columns have no header.  Calc does not do this.  It only saves the columns that have data in them.

(For anyone asking why I just didn't create the spreadsheet in Calc to begin with, I don't remember exactly. There were some techniques in Excel that didn't work in Calc, so I used Excel to create the spreadsheet and then used Calc to fix any problems.)

I would then upload the CSV file I saved from Calc.  I would add it to its own collection, at least until I was happy with the results of this whole upload process.  I would preview some random files.  After I had done the CSV upload, then I would upload the resources. I used the Team Center>Manage Resources>Replace Resource Batch option. I have a column called "Filename" in all of my CSV files, so I choose that from the menu of fields for matching.  Then I upload my resources and review a few of them again to make sure everything looks right.  Once I am satisfied with the results, I can delete the temporary collection I made while uploading the resources (if desired.)

So, I would recommend trying to save the CSV from some program other than Excel.  (If you are not using Excel, then I don't know what to tell you - you might have to look for settings or methods to only save certain columns.  One method might be to look at the CSV file in a text editor to see if there are extra commas at the end of each line that would indicate that blank columns are being saved, and then figure out how to rewove those extra commas.)

I hope some of that helps you find a solution.

Julie

Patrick C

unread,
Dec 18, 2015, 6:37:30 AM12/18/15
to ResourceSpace
Julie,
Thank you for a detailed and considered reply. I have tried your suggestions and you were right. It appears Excel was changing the data. I had removed commas with replace, and saved as UTF8, but Excel had done something strange to the CSV. It might be useful to someone that I was using a quite old version of MS Office (Mac).

I now use libra Office Calc as you suggested. Some data had moved to another column. I deleted all extra columns & rows & re-saved the file.

For uploading files, I now do it your way. Upload CSV, then use Team Centre > Manage Resources > Replace Resource Batch option, to import Images.

I may automat some of this process in Text Wrangle, but it works!

Thanks again
Patrick



 
Reply all
Reply to author
Forward
0 new messages