Error in executing task.EnhanceDataset and Project Status

Bhavesh Sanghvi

Jun 16, 2008, 3:56:57 AM

to openb...@googlegroups.com

Hi,

I was trying to execute EnhanceDataset task. However, I found that a command of the following form is not executable (please assume that my extras home folder is C:\openbiomind-extras_0.60\extras and I am executing the command from any RANDOM location, say C:\dev)

java -cp C:\openbiomind-extras_0.60\extras\openbiomind-bin_0.61.jar;C:\openbiomind-extras_0.60\extras\ task.EnhanceDataset -d C:\openbiomind-extras_0.60\extras\datafiles\varm126.tab -e C:\temp\output.txt -ontologyAssociationFile C:\openbiomind-extras_0.60\extras\datafiles\pir\gene2pir.varm126.txt -ontologyDescriptionFile C:\openbiomind-extras_0.60\extras\datafiles\pir\pir2desc.txt

The error that I get is:

Exception in thread "main" java.lang.NullPointerException

at task.PipelineParameters.<init>(PipelineParameters.java:145)

at task.EnhanceDataset.main(EnhanceDataset.java:110)

Further analysis showed that the cause is the use of absolute paths for -ontologyAssociationFile and -ontologyDescriptionFile. Absolute paths work perfectly fine for the -e and -d arguments. When I looked into PipelineParameters.java, I found that these files are read differently from the -e and -d files. The tutorial does not mention any special consideration for this. I think that -d, -ontologyAssociationFile and -ontologyDescriptionFile are for existing files (ideally they can be anywhere, if I am not wrong) and -e is a new file that will be created (I ask for a directory and a new file name). Is this fine?

I also want to know how I should capture errors. Are only errors displayed on standard output? Must I read the errors from standard output? I am asking because I am executing the task using ProcessBuilder and need to rely on the various outputs generated by the task (files, standard output, etc.).
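
For readers following along, here is a minimal sketch of how the command shown above might be assembled for ProcessBuilder. The class name EnhanceDatasetCommand and its build method are hypothetical and are not part of OpenBiomind or the GUI project; only the task.EnhanceDataset class name and the option names (-d, -e, -ontologyAssociationFile, -ontologyDescriptionFile) come from the command quoted above. Passing every argument as its own list element avoids quoting problems with paths that contain spaces.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the argument list for task.EnhanceDataset. */
final class EnhanceDatasetCommand {

    /**
     * Builds the command as a list of separate arguments.
     * The two ontology files are optional; pass null to omit them.
     */
    static List<String> build(String jarPath, String extrasDir,
                              String inputDataset, String outputDataset,
                              String associationFile, String descriptionFile) {
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");
        cmd.add("-cp");
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        cmd.add(jarPath + File.pathSeparator + extrasDir);
        cmd.add("task.EnhanceDataset");
        cmd.add("-d");
        cmd.add(inputDataset);
        cmd.add("-e");
        cmd.add(outputDataset);
        if (associationFile != null) {
            cmd.add("-ontologyAssociationFile");
            cmd.add(associationFile);
        }
        if (descriptionFile != null) {
            cmd.add("-ontologyDescriptionFile");
            cmd.add(descriptionFile);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Paths taken from the example command in this message.
        String extras = "C:\\openbiomind-extras_0.60\\extras";
        List<String> cmd = build(
                extras + "\\openbiomind-bin_0.61.jar",
                extras + "\\",
                extras + "\\datafiles\\varm126.tab",
                "C:\\temp\\output.txt",
                null,
                null);
        for (String part : cmd) {
            System.out.println(part);
        }
    }
}
```

The resulting list can be handed directly to `new ProcessBuilder(cmd)`.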

My current status is that I am done with the following:

1. Base Layout

2. Preference Page

a. Task won’t execute unless preferences are set

3. Wizard for Enhance Dataset task with error check

a. Wizard displays progress monitor

b. Will open all files in various tabs, and files will be listed in a tree (not yet done)

i. Names of files to open (non-null) are being shown in console for now

4. Able to execute Enhance Dataset when -d and -e arguments are specified. The file is created (but not yet opened)

5. All message strings (except for command string) are externalized

6. Tried to cover minimal documentation at almost all places

The code can be downloaded from http://code.google.com/p/openbiomind-gui/source/checkout and the project can be imported into Eclipse 3.4 RC3 (requires Java 1.6). I am planning to generate a build as soon as I am able to open files in the editor (so that building the project is not needed).

--

Thanks,

Bhavesh Sanghvi

Lúcio de Souza Coelho

Jun 16, 2008, 1:36:44 PM

to openb...@googlegroups.com

On Mon, Jun 16, 2008 at 4:56 AM, Bhavesh Sanghvi
<bsan...@cs.iastate.edu> wrote:
> Hi,

Hi Bhavesh,

(...)

> Further analysis showed that the cause is the use of absolute paths for
> -ontologyAssociationFile and -ontologyDescriptionFile. Absolute paths work
> perfectly fine for the -e and -d arguments. When I looked into
> PipelineParameters.java, I found that these files are read differently from
> the -e and -d files. The tutorial does not mention any special consideration
> for this. I think that -d, -ontologyAssociationFile and
> -ontologyDescriptionFile are for existing files (ideally they can be
> anywhere, if I am not wrong) and -e is a new file that will be created (I
> ask for a directory and a new file name). Is this fine?

(...)

The files loaded by PipelineParameters are the default ontology
association files, which are included in the extras package. The
-ontologyAssociationFile/-ontologyDescriptionFile options are there
for specifying "custom" ontology files, when the user wants to use a
non-default gene ontology. (The extras package contains ontology files
for PIR and GO, but the user may want to use ontologies based on KEGG
or another project, or even "handcrafted" ontologies made specifically
for the problem at hand.)

As for the -e option, it specifies the name of the file that is the
output of the EnhanceDataset command. It will be a regular dataset
file like the input data file specified by the -d option, but it will
have additional features (i.e., rows) added by the dataset enhancement
process.

In a new version of the tutorial to be released in the near future (I
hope :), I will include startup and troubleshooting sections talking
specifically about those file dependencies set up in the properties
file, for I have seen that they are a frequent problem among new
users. I have even considered including the must-have data files
directly into the jar package (inelegant as that may sound to some),
in order to prevent a lot of headaches for users; besides, their
inclusion in the extras package does not look right either, for the
system depends on them. BTW, suggestions about other solutions will be
most welcome.

> I also want to know how I should capture errors. Are only errors displayed
> on standard output? Must I read the errors from standard output? I am asking
> because I am executing the task using ProcessBuilder and need to rely on the
> various outputs generated by the task (files, standard output, etc.).

(...)

Unfortunately, all error messages currently go to the standard
output, since OpenBiomind was first developed as a CLI-only toolkit.
And since we agreed that the GUI should be a wrapper to the CLI, I
guess that the least invasive way to deal with errors would indeed be
checking standard output.

Bhavesh Sanghvi

Jun 16, 2008, 3:12:33 PM

to openb...@googlegroups.com

Hi Lúcio,

I still have the following queries:
1. Regarding -ontologyAssociationFile and -ontologyDescriptionFile, my
question is: can they be absolute paths? -d and -e accept absolute file paths
and work fine. However, if absolute paths are given for the optional
parameters, a NullPointerException occurs. (Please look at
http://imagebin.org/21122. Here, I've allowed the user to select an absolute
path for the original dataset. The enhanced dataset can be specified as a
combination of a destination directory and a new file name. I am interested
in allowing absolute paths for the remaining two as well. If absolute paths
are not possible, must these be relative? If yes, then relative to what?
Please also look at the preference page at http://imagebin.org/21123 - the
GUI will ask the user to configure these preferences before executing a task.)

2. Is it fine if I display whatever comes to standard output as an error? I
mean, is standard output only used for errors?

--
Best,
Bhavesh Sanghvi

Murilo Saraiva de Queiroz

Jun 16, 2008, 3:39:24 PM

to openb...@googlegroups.com

Regarding the error messages: error messages should be sent to standard error, and progress reports / information messages to standard output. If the current code doesn't behave like that, Lúcio should modify it.

If the current messages alone aren't enough, I discussed with Lúcio the possibility of having a command-line flag to enable special formatting of the messages sent to standard output/error. This flag would provide more easily parseable information about progress, errors, etc., to be used by the GUI.

--
Murilo Saraiva de Queiroz, MSc.
Senior Software Engineer
http://www.vettalabs.com
http://www.tecnologiainteligente.com.br

Lúcio de Souza Coelho

Jun 16, 2008, 3:41:12 PM

to openb...@googlegroups.com

On Mon, Jun 16, 2008 at 4:12 PM, Bhavesh Sanghvi
<bsan...@cs.iastate.edu> wrote:
>

> Hi Lúcio,
>
> I still have the following queries:
> 1. Regarding -ontologyAssociationFile and -ontologyDescriptionFile, my
> question is: can they be absolute paths? -d and -e accept absolute file paths
> and work fine. However, if absolute paths are given for the optional
> parameters, a NullPointerException occurs. (Please look at
> http://imagebin.org/21122. Here, I've allowed the user to select an absolute
> path for the original dataset. The enhanced dataset can be specified as a
> combination of a destination directory and a new file name. I am interested
> in allowing absolute paths for the remaining two as well. If absolute paths
> are not possible, must these be relative? If yes, then relative to what?
> Please also look at the preference page at http://imagebin.org/21123 - the
> GUI will ask the user to configure these preferences before executing a task.)

Ok, now I got it - I have just tested calling EnhanceDataset with
absolute and relative paths and indeed the command crashes with
absolute paths. That's a bug (and an odd one BTW), and thanks for
discovering it! The system should allow both relative and absolute
paths. So, I'll fix that, release a new .jar and notify you.
Meanwhile, while developing the interface, you can assume that both
relative and absolute paths are allowed.

> 2. Is it fine if I display whatever comes to standard output as an error? I
> mean, is standard output only used for errors?

(...)

No, standard output is primarily meant for progress messages when
running a command.

Lúcio de Souza Coelho

Jun 16, 2008, 3:48:12 PM

to openb...@googlegroups.com

On Mon, Jun 16, 2008 at 4:39 PM, Murilo Saraiva de Queiroz
<mur...@gmail.com> wrote:
> Regarding the error messages: error messages should be sent to standard
> error, and progress reports / information messages to standard output. If
> current code doesn't behave like that, Lúcio should modify it.

(...)

Sorry for the inaccuracy in my statement: exception handling
throughout the OpenBiomind code does indeed send error messages to
System.err. In my previous message I meant to say that, in the end,
those error messages will appear along with all other text output in the
terminal during CLI usage.

Bhavesh Sanghvi

Jun 16, 2008, 7:55:49 PM

to openb...@googlegroups.com

Hi,

1. Regarding standard output and standard error: Referring to the other
e-mails by Murilo and Lúcio, I’ll capture both of them and show them
distinctly (in separate views, or in the same view with separate colors).
2. Regarding using relative and absolute paths, I'll wait for the new JAR.
Meanwhile, I think that the current implementation will start working after
the update. I'll disable those fields until then.
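
One way the two streams could be captured separately is sketched below, under stated assumptions. ProcessOutputCapture, Result, run and drain are hypothetical names that do not come from the OpenBiomind or GUI code; the JDK calls (ProcessBuilder, Process.getInputStream, Process.getErrorStream) are standard. Each stream is drained on its own thread, because a child process can block once an unread pipe buffer fills up.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs a command and keeps stdout and stderr apart. */
final class ProcessOutputCapture {

    /** Exit code plus the lines written to each stream. */
    static final class Result {
        final int exitCode;
        final List<String> stdout;
        final List<String> stderr;

        Result(int exitCode, List<String> stdout, List<String> stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    static Result run(List<String> command) {
        try {
            // redirectErrorStream is false by default, so the streams stay separate.
            Process process = new ProcessBuilder(command).start();
            List<String> out = new ArrayList<String>();
            List<String> err = new ArrayList<String>();
            Thread outReader = drain(process.getInputStream(), out);
            Thread errReader = drain(process.getErrorStream(), err);
            int exitCode = process.waitFor();
            // join() ensures both readers have finished before the lists are read.
            outReader.join();
            errReader.join();
            return new Result(exitCode, out, err);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the task", e);
        }
    }

    /** Starts a thread that copies every line of the stream into the sink. */
    private static Thread drain(InputStream stream, List<String> sink) {
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(stream))) {
                String line;
                while ((line = in.readLine()) != null) {
                    sink.add(line);
                }
            } catch (IOException e) {
                sink.add("[stream read failed: " + e.getMessage() + "]");
            }
        });
        reader.start();
        return reader;
    }

    public static void main(String[] args) {
        // "java -version" is used only as a stand-in for an OpenBiomind task.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        Result result = run(Arrays.asList(javaBin, "-version"));
        System.out.println("exit code: " + result.exitCode);
        for (String line : result.stdout) System.out.println("OUT: " + line);
        for (String line : result.stderr) System.out.println("ERR: " + line);
    }
}
```

A GUI could then route `result.stdout` to a progress view and `result.stderr` to an error view, or show both in one view with different colors.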

--
Best,
Bhavesh Sanghvi

-----Original Message-----
From: openb...@googlegroups.com [mailto:openb...@googlegroups.com] On
Behalf Of Lúcio de Souza Coelho
Sent: Monday, June 16, 2008 2:41 PM
To: openb...@googlegroups.com
Subject: Re: Error in executing task.EnhanceDataset and Project Status

Bhavesh Sanghvi

Jul 4, 2008, 6:08:56 PM

to openb...@googlegroups.com, Lúcio de Souza Coelho, Murilo Saraiva de Queiroz

Hi,

I am wondering if this issue
(http://code.google.com/p/openbiomind/issues/detail?id=2) has been fixed?
The same problem also occurs for the -testDataset argument of
task.DatasetTransformer (as I suspected earlier, it probably occurs for
almost all absolute paths; I think it affects those values that can be read
from pipeline.properties as well as from the command line).
The GUI does not work properly if these arguments are specified.

--
Best,
Bhavesh Sanghvi

Lúcio de Souza Coelho

Jul 4, 2008, 6:24:38 PM

to Bhavesh Sanghvi, openb...@googlegroups.com, Murilo Saraiva de Queiroz

Hi Bhavesh,

I apologize for being so late in dealing with that - I was stuck in other activities and only yesterday started to look at this problem.

It seems that the bug is caused by the use of ClassLoader to get system resources such as files, a way of loading used for many files throughout OpenBiomind. That ClassLoader approach was implemented precisely to free OpenBiomind from a hardcoded directory structure and let the user place the files wherever he or she deems best; but for some reason it does not work when the user *wants* to use an absolute path. I am still investigating how to deal with that, hopefully in a way that does not bring back a hardcoded directory structure.
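
A sketch of one possible way to accept both kinds of path follows. It is not taken from the OpenBiomind source; ResourceLocator and open are hypothetical names. ClassLoader.getSystemResourceAsStream looks a name up against the classpath entries and returns null when nothing matches, so an absolute file system path will typically yield null, and using that null without a check would lead to a NullPointerException like the one reported. The sketch tries the file system first and falls back to the classpath only afterwards.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: opens a location as a plain file or, failing that, as a classpath resource. */
final class ResourceLocator {

    static InputStream open(String location) throws IOException {
        // 1. Absolute paths, and relative paths that exist under the working directory.
        File file = new File(location);
        if (file.isFile()) {
            return new FileInputStream(file);
        }
        // 2. Names that are relative to an entry on the classpath.
        InputStream in = ClassLoader.getSystemResourceAsStream(location);
        if (in == null) {
            // Fail with a descriptive error instead of a later NullPointerException.
            throw new FileNotFoundException(
                    "Not found on the file system or on the classpath: " + location);
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ResourceLocator <path-or-resource-name>");
            return;
        }
        try (InputStream in = open(args[0])) {
            System.out.println("Opened " + args[0] + ", first byte: " + in.read());
        }
    }
}
```

Checking the file system first keeps the freedom to place files anywhere, while classpath lookup continues to work for files shipped alongside the jar.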

Bhavesh Sanghvi

Jul 6, 2008, 9:38:58 PM

to openb...@googlegroups.com, Murilo Saraiva de Queiroz

Hi Lúcio,

I will wait for the fix.

--

Thanks,

Bhavesh Sanghvi

Lúcio de Souza Coelho

Jul 8, 2008, 4:56:03 PM

to openb...@googlegroups.com

Hi Bhavesh,

I've finally fixed the bug in EnhanceDataset and uploaded the corresponding new version of the OpenBiomind jar to the project site. (There are a number of other changes, such as the RFE functionalities, so I numbered that version 0.70.)

However, so far I could not reproduce the same error with the -testDataset option of DatasetTransformer. Maybe it is specific to your JVM version or operating system, so I'll try a test under the same conditions.

Bhavesh Sanghvi

Jul 8, 2008, 6:26:14 PM

to Lúcio de Souza Coelho, openb...@googlegroups.com

Hi Lúcio,

Thanks. I tried openbiomind-bin_0.70.jar and it works absolutely fine. You can close that issue (I do not have the option available with me).

Strangely, I too was not able to reproduce the problem with -testDataset of DatasetTransformer on the older version of the jar. Maybe last time I made some other mistake that caused the NullPointerException. Anyway, since the new jar is working fine for both tasks (and their options), I am fine :)

I have one other question though. I had read that Google Code does not allow you to delete/rename old releases. However, you removed the older version of the jar. Can you please tell me how you did that? Because of this limitation, I am releasing the builds on Murilo’s FTP site.

--

Thanks,

Bhavesh Sanghvi

Lúcio de Souza Coelho

Jul 8, 2008, 6:43:35 PM

to openb...@googlegroups.com

Just putting this on the list too, I forgot to use "reply all", sorry...

---------- Forwarded message ----------
From: Lúcio de Souza Coelho <luc...@gmail.com>
Date: Tue, Jul 8, 2008 at 7:42 PM
Subject: Re: Error in executing task.EnhanceDataset and Project Status

To: Bhavesh Sanghvi <bsan...@cs.iastate.edu>

On Tue, Jul 8, 2008 at 7:26 PM, Bhavesh Sanghvi <bsan...@cs.iastate.edu> wrote:
(...)

> Thanks. I tried openbiomind-bin_0.70.jar and it works absolutely fine. You can close that issue (I do not have the option available with me).
>
>
>
> Strangely, I too was not able to reproduce the problem with -testDataset of DatasetTransformer on the older version of the jar. Maybe last time I made some other mistake that caused the NullPointerException. Anyway, since the new jar is working fine for both tasks (and their options), I am fine :)

(...)

Great! Thanks for the update!

> I have one other question though. I had read that Google Code does not allow you to delete/rename old releases. However, you removed the older version of the jar. Can you please tell me how you did that? Because of this limitation, I am releasing the builds on Murilo's FTP site.

(...)

Actually I just deprecated the older version. You can still get it by
just going to the "Downloads" tab of the project site and doing a
search on "Deprecated Downloads" instead of the default "Current
Downloads".
