TSV to RDF issue


Felipe Alejandro Bergallo Corral

Apr 20, 2020, 12:37:50 PM
to VIVO Tech
Hey. Long time no see. I apologize for not participating around here for quite some time; I was tied up in other projects until the company I work at decided to come back around to VIVO. I've been assigned to VIVO again, probably to the dismay of those who know me, and we're upgrading to VIVO 1.11, if I recall correctly. So far the changes have been few and haven't impeded my work. That is, at least, until I tried to upload some tab-separated .csv files (technically TSVs, though VIVO handles them under CSV2RDF and offers tab separation as a selectable option). VIVO doesn't seem to recognize any of the tabs and treats all the columns I sent as a single column, as if I hadn't selected the "tabbed" option.
Reviewing the HTML of the form, the "comma-separated" option does seem to be the one marked as selected in the code. I didn't suspect this much, since I can still change the selection by hand, but just to be sure I tried replacing all the commas in my file with semicolons, then all the tabs with commas, and uploading the result as a regular CSV, thinking that might be the issue.
Turns out I was wrong on that front: the upload then threw an error about a character or combination of characters (likely the semicolon, or a combination of it and something else).

So, considering how long it has been since I've laid hands on VIVO, or Vitro, or even Solr, I would like to know in what way I am being silly, because I know I must be.

Thank you for your time,
Felipe A.

Brian Lowe

Apr 20, 2020, 2:25:00 PM
to vivo...@googlegroups.com

Hi Felipe,

 

It’s not just you. I just tried loading a tab-delimited text file and observed the same problem. I can load a comma-separated file without issue. Just make sure it’s really delimited with commas (and not semicolons, which some European locale settings in Excel force on you) and that values that might contain internal commas are quoted with ASCII double quote characters.
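As a stopgap until the tab option works, a tab-delimited file could be converted to a properly quoted comma-separated one before upload. The following is only a minimal sketch: the class and method names (TsvToCsv, convertLine, quoteIfNeeded) are hypothetical and not part of VIVO or Vitro, it assumes the input fields contain no embedded tabs or line breaks, and it assumes the loader accepts the common convention of doubling embedded double quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of VIVO/Vitro: converts one tab-delimited
// line into a comma-separated line, quoting fields only where needed.
final class TsvToCsv {

    /** Wrap a field in ASCII double quotes if it contains a comma, quote, or line break. */
    static String quoteIfNeeded(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Assumption: embedded quotes are escaped by doubling them.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Assumes the line has no quoted fields and no tabs inside values. */
    static String convertLine(String tsvLine) {
        String[] fields = tsvLine.split("\t", -1); // -1 keeps trailing empty fields
        List<String> out = new ArrayList<>();
        for (String field : fields) {
            out.add(quoteIfNeeded(field));
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        // prints: "Smith, John",Professor,Cornell
        System.out.println(convertLine("Smith, John\tProfessor\tCornell"));
    }
}
```

Only fields that actually need protection are quoted, so values without commas pass through unchanged.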

 

It looks like the tab separator support got broken somewhere along the line, possibly during a switch to a more modern CSV parsing library. If you want to apply a local fix to the Vitro Java code (Vitro/api/src/main/java/…), it works for me after I edit the file edu.cornell.mannlib.vitro.webapp.utils.Csv2Rdf.java to change lines 69-71 to

 

CSVParser cReader = new CSVParser(new InputStreamReader(fis),
        CSVFormat.DEFAULT.withDelimiter(separatorChar)
                         .withQuote(quoteChar));

 

Instead of

 

CSVParser cReader = new CSVParser(new InputStreamReader(fis),
        CSVFormat.DEFAULT.withRecordSeparator(separatorChar)
                         .withQuote(quoteChar));

 

The withRecordSeparator() call should be changed to withDelimiter().
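To illustrate why the wrong setting produces the "everything is one column" symptom: a field delimiter splits values within a row, while a record separator splits one row from the next. If the tab is never registered as the field delimiter, the parser presumably keeps looking for commas and finds none in a tab-delimited row. The sketch below is a simplified, JDK-only stand-in for a CSV parser, not the Commons CSV implementation; DelimiterDemo and splitRecord are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a CSV parser (hypothetical, JDK only) showing the
// effect of parsing a tab-delimited row with the wrong field delimiter.
final class DelimiterDemo {

    /** Split one record into fields on `delimiter`, honouring `quote`-wrapped fields. */
    static List<String> splitRecord(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == quote) {
                if (inQuotes && i + 1 < record.length() && record.charAt(i + 1) == quote) {
                    current.append(quote); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;  // opening or closing quote
                }
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString()); // end of one field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field of the record
        return fields;
    }

    public static void main(String[] args) {
        String row = "Jane Doe\tProfessor\tCornell";
        // Delimiter left at comma: the whole row is a single field. Prints 1.
        System.out.println(splitRecord(row, ',', '"').size());
        // Delimiter set to tab: three fields. Prints 3.
        System.out.println(splitRecord(row, '\t', '"').size());
    }
}
```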

 

I’ll open a JIRA issue.

 

Brian

 

From: vivo...@googlegroups.com <vivo...@googlegroups.com> On Behalf Of Felipe Alejandro Bergallo Corral
Sent: Monday, April 20, 2020 19:38
To: VIVO Tech <vivo...@googlegroups.com>
Subject: [vivo-tech] TSV to RDF issue

