Reading of .rnk files indicates incorrect format


R L

Oct 3, 2022, 12:09:30 AM
to gsea-help
I know this comes up a lot, but after going through a number of posts that identify problems, I am still left with a file that I cannot load into the analysis. I am trying to load a ranked list of gene identifiers, tab-separated, with no column headers. The file just lists the ID and log2 score, as shown below.
UBXN8    25.3537038
PDLIM1    30.7123881
SNAP23    23.75606637
AIP    24.82411406
C11orf58    25.48169931
APBB1    25.23614323
PGRMC1    27.58715693
DFFA    26.05708857
EIF3F    22.66858235
On every attempt to upload the file list.rnk (no hidden extension), I get the same error (shown below).

The file was created as tab delimited text export from Excel and renamed from .txt to .rnk. Any help would be appreciated.


There were errors: ERROR(S) #:1
Parsing trouble
java.lang.IllegalArgumentExcepti ...

---- Stack Trace ----
# of exceptions: 1
------Unknown file format: D:\Dropbox\@-Work\Perl Script Samples\BioID\list.rnk no known Parser for ext: ------
java.lang.IllegalArgumentException: Unknown file format: D:\Dropbox\@-Work\Perl Script Samples\BioID\list.rnk no known Parser for ext:
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:782)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserFactory.read(ParserFactory.java:736)
    at org.gsea_msigdb.gsea/edu.mit.broad.genome.parsers.ParserWorker.doInBackground(ParserWorker.java:53)
    at java.desktop/javax.swing.SwingWorker$1.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at java.desktop/javax.swing.SwingWorker.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)

list.rnk

Anthony Castanza

Oct 3, 2022, 1:16:09 PM
to gsea...@googlegroups.com
Hello,

I'm not entirely sure what might be going wrong here. Special characters in the file path can cause issues with GSEA recognising files properly. I noticed that the file path you were using, D:\Dropbox\@-Work\Perl Script Samples\BioID\list.rnk, has several; the @-Work folder in particular might be problematic. Could you try a different path without this folder in it? If that doesn't work, we can take a look at the file itself confidentially if you send it to the gsea...@broadinstitute.org address.
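
For anyone who wants to check a path before loading it, here is a minimal Python sketch. The whitelist of "safe" characters (letters, digits, underscore, dot) is an assumption chosen to be conservative; it is not GSEA's actual rule, and the helper name `risky_path_parts` is hypothetical.

```python
import re
from pathlib import PureWindowsPath

# Conservative whitelist of characters per path component.
# This is an assumption for illustration, not GSEA's documented behaviour.
SAFE_PART = re.compile(r"^[A-Za-z0-9_.]+$")


def risky_path_parts(path: str) -> list[str]:
    """Return the path components that contain characters outside the whitelist."""
    pure = PureWindowsPath(path)
    # Skip the drive/root component (for example 'D:\\'), which always
    # contains a colon and a backslash.
    return [
        part
        for part in pure.parts
        if part != pure.anchor and not SAFE_PART.match(part)
    ]


if __name__ == "__main__":
    original = r"D:\Dropbox\@-Work\Perl Script Samples\BioID\list.rnk"
    for part in risky_path_parts(original):
        print("possibly problematic folder or file name:", part)
```

If any component is reported, moving the file to a directory with plain names is a cheap thing to try first.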

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego


Anthony Castanza

Oct 3, 2022, 1:18:58 PM
to gsea...@googlegroups.com
My apologies, I missed that you already attached that .rnk file. I was able to load the file on my system without any issues, so I definitely think it is an issue with the file path.
Additionally, I noticed that you only had 643 genes in the data. Unless you simply truncated the list to send it to us, note that GSEA expects ranking information for all expressed genes, not just genes that pass arbitrary cutoffs for fold change, significance, or similar.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego

Reza MMR

Oct 3, 2022, 2:09:07 PM
to gsea...@googlegroups.com
Thank you for your email.
I tried a different file path and got the error below. 


Screen Shot 2022-10-03 at 12.07.04 PM.png

David Eby

Oct 3, 2022, 4:02:32 PM
to gsea...@googlegroups.com
Hi,

As Anthony pointed out, the list.rnk file attached to your first message loads fine in the latest version of GSEA.  Are you trying to load a different RNK and seeing this error?  What version of GSEA are you using?

This error indicates that GSEA found a non-numeric value in a position where it expects a number (the second column).  Check the second column of your file for any text value.  Look out especially for any blank value (like all spaces); earlier versions of GSEA may not have dealt with these well.
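
A check along these lines can be scripted. The following is a minimal Python sketch, assuming the two-column, tab-separated layout described at the top of the thread (ID, then score, no header); the function name `find_bad_rnk_lines` is hypothetical, and the rules here are a plausible approximation rather than GSEA's own parser logic.

```python
import math


def find_bad_rnk_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for every line that is not 'ID<TAB>number'."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "":
            continue  # assumption: completely empty lines are harmless
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, f"expected 2 tab-separated fields, found {len(fields)}")
            )
            continue
        try:
            value = float(fields[1])
        except ValueError:
            # Catches text values and blank scores such as a run of spaces.
            problems.append((number, f"non-numeric score {fields[1]!r}"))
            continue
        if not math.isfinite(value):
            problems.append((number, f"non-finite score {fields[1]!r}"))
    return problems


if __name__ == "__main__":
    # The file name is the one used in this thread; adjust as needed.
    with open("list.rnk", encoding="utf-8") as handle:
        for number, reason in find_bad_rnk_lines(handle.read()):
            print(f"line {number}: {reason}")
```

Lines where the separator is spaces rather than a tab are reported as having one field, which is a common side effect of copying values through a text editor.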

Regards,

Ralf L

Oct 4, 2022, 11:40:10 AM
to gsea...@googlegroups.com
Hello David and Anthony (I assume you are both getting this),
First of all, thank you for the fast response, and my apologies that my Google account only shows my initials; I forgot about that.
It is running now. I placed the file in a directory with a straightforward name (C:\Data). Since I was running the standalone version on my PC, it did not occur to me that it might not be able to read directory names with hyphens or other non-alphanumeric characters.

As to the reason for the truncated list: the data are the result of a BioID mass-spec-based proteomics analysis. These are all the targets that were identified with the membrane-bound protein as bait. Our current control is apparently going to the cytoplasm due to processing differences, and does not make for as good a control as we had hoped. Total hits are 2200, but the generic cytosol targeting makes up the bulk of them.
I assume that for the rank file, only a single list of IDs can be put in. Otherwise we would enter both the targeted and untargeted data, with zero values as needed for the normalized log2 data. I assume an .rnk file with two data columns and 2200 rows, but a lot of zero (and hence identical) values, would not work?


Ralf




Anthony Castanza

Oct 5, 2022, 2:04:21 PM
to gsea...@googlegroups.com
Hi Ralf,

This isn't really the type of experiment that GSEA was designed to handle. You can certainly try with the truncated list, but this will likely result in a large number of sets being thrown away because they fall below the minimum size threshold, and some sets that lose a large percentage of their genes may no longer accurately reflect their annotations. A large number of zero values isn't desirable for GSEA either, as tied values end up arbitrarily ordered, which can also affect enrichment calculations. GSEA Preranked also wouldn't be able to accept a ranked list with two columns of values. Your best bet is probably to proceed with your original design; however, I'd recommend some additional validation of the gene sets after they've been restricted to the input list. I don't know how much more help we'd be able to offer here, but if you have additional questions, feel free to reach out and we'll try our best.
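
One rough way to gauge both effects before running the analysis is sketched below in Python. The function names are hypothetical, gene sets are assumed to be available as a plain dictionary of name to gene list, and the `min_size=15` default is an assumption that should be matched to whatever minimum is configured in your GSEA run.

```python
from collections import Counter


def restrict_gene_sets(gene_sets, ranked_ids, min_size=15):
    """Restrict each gene set to the ranked IDs and split by a minimum size.

    Returns (kept, dropped): two dicts mapping set name to the overlapping genes.
    min_size is an assumed threshold; use the value from your own settings.
    """
    present = set(ranked_ids)
    kept, dropped = {}, {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & present
        target = kept if len(overlap) >= min_size else dropped
        target[name] = overlap
    return kept, dropped


def count_tied_scores(scores):
    """Count entries whose score is shared with at least one other entry."""
    counts = Counter(scores)
    return sum(n for n in counts.values() if n > 1)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from any real gene set collection.
    toy_sets = {"SET_A": ["g1", "g2", "g3"], "SET_B": ["g1", "x", "y"]}
    kept, dropped = restrict_gene_sets(toy_sets, ["g1", "g2", "g3"], min_size=2)
    print("kept:", sorted(kept), "dropped:", sorted(dropped))
    print("tied entries:", count_tied_scores([0.0, 0.0, 1.5, 2.0, 0.0]))
```

Comparing each kept set's overlap against its original size gives a quick sense of which sets have lost most of their members and deserve a closer look.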

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego