Can Variant Transforms work with any Variant Call Format (VCF) file, or only particular VCF files?


hze...@gmail.com

Jan 19, 2021, 8:31:44 PM
to GCP Life Sciences Discuss
I got an error: "1 was not ignored".

The Variant Transforms tutorial on GCP returns this error.

Secondly, I don't understand the statements below:
gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp *.vcf \
    gs://BUCKET/vcf/ 

gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -R \
    VCF_FILE_DIRECTORY/ \
    gs://BUCKET/vcf/ 

gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -n -R \
    VCF_FILE_DIRECTORY \
    gs://BUCKET/vcf/ 


Thanks in advance.
Regards,
Haroon
Paul Grosu

Jan 20, 2021, 12:36:14 PM
to GCP Life Sciences Discuss
Hi Haroon,

Could you please elaborate on the following two questions regarding "i got error of 1 was not ignored":

1) What exactly is the error that you are experiencing?

2) What steps did you take, so that one could replicate them and see the same error?

Regarding gsutil, here is a description of the options you are using:

-m  Causes commands like 'cp' to run operations in parallel (multi-threaded/multi-process). Note that the splitting of large files into parts is controlled separately, by the parallel_composite_upload_threshold setting.

-o  Overrides a value from the boto configuration file for this invocation only.

-n  When specified, existing files at the destination are not replaced (no-clobber).

-R  Enables directories and subdirectories to be copied recursively.
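
For reference, the threshold passed with -o corresponds to an entry in the boto configuration file (typically ~/.boto); setting it there makes the -o flag unnecessary. A sketch, assuming a standard gsutil setup:

```
[GSUtil]
parallel_composite_upload_threshold = 150M
```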

So taking all that into account here's what is happening with your commands:

1) gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp *.vcf \
    gs://BUCKET/vcf/ 

   Here you are copying all the local .vcf files to the Cloud Storage location you own, "gs://BUCKET/vcf/", assuming you first created that bucket and path. Files larger than 150 megabytes are split into parts, transferred in parallel, and recomposed on the Google side. You can think of buckets roughly as directories, and the objects they contain as files.
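As a quick sanity check before running command 1, you can preview which local files the *.vcf glob will match. This is purely local; gsutil is not invoked:

```shell
# Preview the local files that the "*.vcf" glob in command 1 would upload.
# The -e test guards against the glob not matching anything.
for f in *.vcf; do
  [ -e "$f" ] && echo "would upload: $f"
done
```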

2) gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -R \
    VCF_FILE_DIRECTORY/ \
    gs://BUCKET/vcf/ 

    Here you are copying files from VCF_FILE_DIRECTORY and all its subdirectories (-R, recursively), transferring them to Google with the same composite-upload behavior as above.

3) gsutil -m -o 'GSUtil:parallel_composite_upload_threshold=150M' cp -n -R \
    VCF_FILE_DIRECTORY \
    gs://BUCKET/vcf/ 

   Here you are adding -n (no-clobber) so that files that already exist in your bucket at Google are not overwritten; otherwise this behaves like the command in step 2 above.
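The -n flag behaves like the local cp -n: an existing destination file is left untouched. A local illustration of the same semantics, with plain cp standing in for gsutil cp (nothing here touches Cloud Storage):

```shell
# Illustrate non-overwrite (-n) semantics locally with cp.
mkdir -p src dst
echo "new contents" > src/sample.vcf
echo "old contents" > dst/sample.vcf
# cp -n skips the copy because dst/sample.vcf already exists.
# (|| true: some cp versions exit nonzero when they skip a file.)
cp -n src/sample.vcf dst/ || true
cat dst/sample.vcf   # still "old contents": the existing file was not replaced
```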

Here are a few links that describe this process in more detail:

Hope it helps,
Paul