uploading NA in proteomics data

Fernando Benito

Jun 23, 2017, 5:39:22 AM

to transmart-discuss

Hello.

Until now we have uploaded proteomics data to tranSMART by converting all NA values to 0. An NA value means that the protein wasn't found in that subject. We put "ZERO_MEANS_NO_INFO=Y" in the config file. But these zeros can be confused with measurements whose value really is 0 (we use log2).

I have tried with "ALLOW_MISSING_ANNOTATIONS=Y" but I am still receiving this error:

(with transmart-batch)

> java -jar ./build/libs/transmart-batch-1.1-SNAPSHOT-capsule.jar -n -p ./studies/PESA/proteomics.params

2017-06-23 10:12:46,582 [main] [ERROR] o.s.b.c.s.AbstractStep - Encountered an error executing step secondPass in job proteomicsDataLoadJob

groovy.lang.MissingMethodException: No signature of method: static java.lang.Double.isNaN() is applicable for argument types: (null) values: [null]

Possible solutions: isNaN(), isNaN(double), isCase(java.lang.Object), is(java.lang.Object), isCase(java.lang.Number), any()

at groovy.lang.MetaClassImpl.invokeStaticMissingMethod(MetaClassImpl.java:1495) [groovy-all-2.3.6.jar:2.3.6]

at groovy.lang.MetaClassImpl.invokeStaticMethod(MetaClassImpl.java:1481) [groovy-all-2.3.6.jar:2.3.6]

at org.codehaus.groovy.runtime.callsite.StaticMetaClassSite.call(StaticMetaClassSite.java:50) ~[groovy-all-2.3.6.jar:2.3.6]

at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45) [groovy-all-2.3.6.jar:2.3.6]

at java_lang_Double$isNaN$1.call(Unknown Source) ~[na:na]

at org.transmartproject.batch.highdim.datastd.FilterNaNsItemProcessor.process(FilterNaNsItemProcessor.groovy:15) ~[transmart-batch-1.1-SNAPSHOT-capsule.jar:na]

at org.transmartproject.batch.highdim.datastd.FilterNaNsItemProcessor.process(FilterNaNsItemProcessor.groovy) ~[transmart-batch-1.1-SNAPSHOT-capsule.jar:na]

at org.springframework.batch.item.support.CompositeItemProcessor.processItem(CompositeItemProcessor.java:61) ~[spring-batch-infrastructure-3.0.1.RELEASE.jar:3.0.1.RELEASE]

...

at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108) [groovy-all-2.3.6.jar:2.3.6]

at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112) [groovy-all-2.3.6.jar:2.3.6]

at org.transmartproject.batch.startup.RunJob.main(RunJob.groovy:63) [transmart-batch-1.1-SNAPSHOT-capsule.jar:na]

2017-06-23 10:12:46,583 [main] [INFO] o.t.b.b.LogCountsStepListener - READ: 0, WRITTEN: 0, SKIPPED: 0

2017-06-23 10:12:46,616 [main] [WARN] o.t.b.b.BetterExitMessageJobExecutionListener - Exit description: MissingMethodException: No signature of method: static java.lang.Double.isNaN() is applicable for argument types: (null) values: [null]

Possible solutions: isNaN(), isNaN(double), isCase(java.lang.Object), is(java.lang.Object), isCase(java.lang.Number), any()

2017-06-23 10:12:46,619 [main] [INFO] o.s.b.c.l.s.SimpleJobLauncher - Job: [FlowJob: [name=proteomicsDataLoadJob]] completed with the following parameters: [{STUDY_ID=PESA, ALLOW_MISSING_ANNOTATIONS=Y, SKIP_UNMAPPED_DATA=Y, SRC_LOG_BASE=2, MAP_FILENAME=/home/fbenito/projects/transmart-batch/./studies/PESA/proteomics/proteomics_subject_sample_mapping.txt, ZERO_MEANS_NO_INFO=N, run.date=1498202571217, SECURITY_REQUIRED=Y, DATA_FILE=/home/fbenito/projects/transmart-batch/./studies/PESA/proteomics/proteomics_data.txt, LOG_BASE=2, DATA_TYPE=L, TOP_NODE=\Private Studies\PESA\}] and the following status: [FAILED]

Any idea how to keep the NAs in the data without getting this error during the upload?

Natalia Boukharov

Jun 23, 2017, 1:47:15 PM

to transmart-discuss

If the protein is not found, you can just drop it. The protein will then be missing for that subject. Another approach is to replace NA with an appropriately low value. I assume that "0" corresponds to a raw value of "1" (log2 of 1 is 0). For NA you can use -6.6 (raw value of about 0.01), -9.9 (raw value of about 0.001), or a value corresponding to 1/2 or 1/4 of the lowest raw value in your assay. This could be the better approach, because subjects with NA values will still contribute to the statistics. But you don't want to make the value too low, as that could artificially increase variance at the low end.

What to do with NAs mostly depends on how many NAs you have, what kind of experiment you are loading, and how you are going to analyze it. If you are loading this data as low dimensional, you can always "flag" NAs by creating a sample annotation folder where you annotate NAs as "Below LLOQ, replaced with ...". Then you can remove these samples from your analysis if needed.
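As a rough illustration of the "fraction of the lowest raw value" idea, here is a minimal Python sketch. It assumes the log2 values are already in memory as rows of numbers with NA represented as None; the function name and the example numbers are made up for illustration and are not part of tranSMART or transmart-batch. Taking a fraction of the lowest raw value is the same as adding log2(fraction) to the lowest log2 value.

```python
import math


def impute_log2_na(rows, fraction=0.5):
    """Replace None (NA) in rows of log2 values.

    The replacement is log2(fraction * lowest raw value in the assay),
    which equals min(observed log2 values) + log2(fraction).
    """
    observed = [v for row in rows for v in row if v is not None]
    if not observed:
        raise ValueError("no observed values to derive a replacement from")
    replacement = min(observed) + math.log2(fraction)
    return [[replacement if v is None else v for v in row] for row in rows]


# Hypothetical log2 intensities: two proteins (rows), two subjects (columns).
assay = [[3.0, None],
         [1.0, 5.0]]

print(impute_log2_na(assay))        # lowest log2 value is 1.0, so NA becomes 0.0
print(impute_log2_na(assay, 0.25))  # NA becomes -1.0
```

Whether 1/2, 1/4, or a fixed floor is appropriate depends on the assay, as described above.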

Fernando Benito

Jun 27, 2017, 10:54:06 AM

to transmart-discuss

Thanks for the answer, Natalia.

The point is that I want proteins without a value not to be taken into account. But the transmart-batch uploader crashes if any observation is left blank.

After uploading with ZERO_MEANS_NO_INFO=Y, when I create a heatmap I can see that the zero values are still taken into account. I cannot see the point of ZERO_MEANS_NO_INFO=Y.

Wibo Pipping

Aug 7, 2017, 8:50:43 AM

to transmart-discuss

Hi Fernando,

The ZERO_MEANS_NO_INFO flag should exclude the 0 values when loading raw data. The documentation on the high-dimensional parameter files (such as the one for proteomics data) states that it only works when loading the raw data type. So the behaviour of the data loader depends not only on the ZERO_MEANS_NO_INFO flag but also on the DATA_TYPE flag. When you set DATA_TYPE to L (log transformed), ZERO_MEANS_NO_INFO is skipped. Judging from the log you posted, the params file you use has DATA_TYPE set to L. What you are trying to do is not possible in the current setup.

A workaround is available, but it requires you to transform your own data back to raw format, set all of the data points you want to exclude to 0 (zero), and adjust your params file. This lets tranSMART handle the log transformation and exclude the zero values. I recommend using log base 2 for this, as transmart-batch only supports the log2 transform step from raw to log transformed. This means all your data needs to be converted to raw format by computing 2^<value_in_data> (2 to the power of the value in the data). If you have not removed the NAs, it should be fairly easy to set all the missing values to 0 after the transformation.
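The conversion described above could be sketched in Python roughly as follows. This is only an illustration: it assumes each cell is read as text and that missing values are written as "NA" or left empty, and the function name is hypothetical. Reading and writing the actual data file is left out because it depends on your file layout.

```python
def log2_to_raw(cell, na_tokens=("NA", "")):
    """Convert one log2 cell (as text) back to a raw value.

    Missing values become 0 so that ZERO_MEANS_NO_INFO=Y can exclude them.
    The NA spellings in na_tokens are an assumption about the input file.
    """
    text = cell.strip()
    if text in na_tokens:
        return 0.0
    return 2.0 ** float(text)


# Hypothetical cells from one row of a log2 data file.
cells = ["0", "NA", "3", "-1"]
print([log2_to_raw(c) for c in cells])  # [1.0, 0.0, 8.0, 0.5]
```

Note that a genuine log2 value of 0 becomes a raw value of 1, so after the conversion it can no longer be confused with a missing value, which is written as 0.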

Example (part of) params file:

DATA_FILE = ....

MAP_FILENAME = ...

DATA_TYPE=R

ZERO_MEANS_NO_INFO=Y

The ALLOW_MISSING_ANNOTATIONS and SKIP_UNMAPPED_DATA flags cover the cases where the platform does not contain all of the annotations that appear in the data, or where the data does not contain values for all of the annotations in the platform. If you have a row of data for each probe/peptide/annotation, you should not have to use these two flags.

Fernando Benito

Aug 9, 2017, 6:15:01 AM

to transmart-discuss

Thanks for the answer.

I will try the workaround. It makes a lot of sense to me.
