Hi,
The command line should take a database of known variants as a sorted and compressed vcf.gz file (plus its .tbi tabix index).
From the SnpSift page:
java -jar SnpSift.jar annotate dbSnp132.vcf variants.vcf > variants_annotated.vcf
# in our case the command looks more like this
java -jar SnpSift.jar annotate dbSnp132.vcf.gz variants.vcf > variants_annotated.vcf
Important: SnpSift annotate command has different strategies depending on the input VCF file:
- Uncompressed VCF: If the file is not compressed, it creates an index in memory to optimize search. This assumes that both the database and the input VCF files are sorted by position, since it is required by the VCF standard (chromosome sort order can differ between files). (NOT OK for us, read below.)
- Compressed, Tabix indexed: It uses the tabix index to speed up annotations.
Note: the uncompressed input is not OK on our server, where memory is too limited, so we need the compressed and indexed approach.
The command itself does not (and cannot) include the tbi file, which is picked up implicitly, but the index is required for the command to succeed.
I tried to create a separate file parameter for the index without using it in the command, but this is not allowed.
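To illustrate the implicit lookup: a minimal sketch of a pre-flight check, assuming the index is expected next to the database under the name <database>.tbi (the name tabix itself writes). The function name has_tabix_index is hypothetical and not part of SnpSift or tabix.

```shell
# Hypothetical helper: succeed only if the database and its tabix index
# are both present. Assumes the index is named <database>.tbi and sits
# in the same folder as the database.
has_tabix_index() {
    db=$1
    [ -f "$db" ] && [ -f "$db.tbi" ]
}

# Possible use before launching the annotation:
#   has_tabix_index dbSnp132.vcf.gz || echo "missing dbSnp132.vcf.gz.tbi" >&2
```

A check like this would fail early with a clear message instead of letting the annotate step fail later.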
Could such a behaviour be implemented, so that the user provides (by upload and/or URL, so that we can also use online dbSNP databases):
* the vcf.gz file as argument #1
* the vcf.gz.tbi file as argument #2
The command would include only the first argument, but both files would be linked or copied to the job folder so that the command finds them both, as required.
The fix by Guy for such situations is to make a Perl wrapper, but it is really a pain to develop a Perl script to handle this sole exception (which is often seen in variant / VCF analysis).
Could we instead make the command two-part, with a neutral first part only meant to link the tbi into the job folder?
ln -s <index.file> . && java -jar <libdir>/SnpSift.jar annotate <database.file> <input.file> > <output.file>
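One caveat with a plain "ln -s <index.file> .": if the uploaded index is stored under an unrelated name, the link will not match the database name. Below is a rough sketch of a staging step that links both files under matching names, assuming the index must be called <database name>.tbi. The function stage_db_with_index and its arguments are hypothetical and only meant to show the idea.

```shell
# Hypothetical staging helper (not part of any existing tool).
# Links the database and its index into the job folder so that
# <job_dir>/<name> and <job_dir>/<name>.tbi sit side by side.
# Assumes db and idx are absolute paths, otherwise the symlinks dangle.
stage_db_with_index() {
    db=$1; idx=$2; job_dir=$3
    name=$(basename "$db")
    ln -s "$db" "$job_dir/$name" &&
    ln -s "$idx" "$job_dir/$name.tbi" &&
    printf '%s\n' "$job_dir/$name"   # path to hand to the annotate command
}

# Possible use (paths are placeholders):
#   staged=$(stage_db_with_index /data/db.vcf.gz /data/upload_42.dat "$PWD") &&
#   java -jar SnpSift.jar annotate "$staged" variants.vcf > variants_annotated.vcf
```

The index is renamed through the link only, so the stored files stay untouched.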
Thanks in advance for any suggestion.
Stephane