BaseSpaceCLI 1.4.0 fails uploads for biosamples with multiple FASTQ datasets

Ning

Aug 19, 2021, 2:08:30 AM
to basespace-developers
I have a biosample with three paired-end FASTQ datasets, six files in total. These FASTQ datasets were all generated from the same library.

When I try to upload these FASTQ datasets into one single biosample using the BaseSpaceCLI 1.4.0, I get an error about incorrect ReadNums. However, if I try to upload these same datasets via the BaseSpace Sequence Hub web UI, the upload completes perfectly (100.00% reads passing filter, too).

Is this a known bug? If so, where can I subscribe for updates to its fix? If not, is the BaseSpaceCLI team able to reproduce it? I can share more information via private email.

---

The BaseSpaceCLI commands I tried:
  • bs upload dataset --project <project id> --biosample-name SAMPLE \
        SAMPLE_S1_L002_R1_003.fastq.gz \
        SAMPLE_S1_L002_R2_003.fastq.gz \
        SAMPLE_S1_L003_R1_002.fastq.gz \
        SAMPLE_S1_L003_R2_002.fastq.gz \
        SAMPLE_S1_L004_R1_001.fastq.gz \
        SAMPLE_S1_L004_R2_001.fastq.gz
  • bs upload dataset --project <project id> --biosample-name SAMPLE \
        --recursive .
  • for r1 in $(ls *R1*.fastq.gz); do
        r2=$(echo $r1 | sed "s/R1/R2/g")
        bs upload dataset \
            --project <project id> \
            --biosample-name=SAMPLE $r1 $r2
    done
The error message is: ERROR: *** error in validation: Read(R1): Incorrect ReadNum ***, followed by panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0x72b926].

The last command, the one with the bash for loop, does not trigger the segmentation fault, but it still fails for two out of three FASTQ datasets (the other commands fail for all FASTQ datasets).

Nick Vinckier

Aug 19, 2021, 3:30:38 AM
to basespace-developers on behalf of Ning
All these commands should work; it's just the filenames that are a bit off. The CLI expects all files to end with _001.fastq.gz. I believe the files ending in _002 and _003 are what is throwing things off.

If you change everything to _001 and still get errors, can you report back on what those errors are?
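
A minimal sketch of the suggested rename, assuming the filenames follow the SAMPLE_S1_L00X_R1_00N.fastq.gz pattern from the original post. The helper names to_001_name and preview_renames are hypothetical and not part of BaseSpaceCLI; the second one only prints the mv commands so they can be reviewed before anything is renamed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rewrite the trailing _NNN before .fastq.gz to _001.
# Names that do not match the pattern are printed unchanged.
to_001_name() {
    printf '%s\n' "$1" | sed -E 's/_[0-9]{3}\.fastq\.gz$/_001.fastq.gz/'
}

# Hypothetical helper: dry run that prints one mv command per file
# in the current directory whose name would change.
preview_renames() {
    local f new
    for f in *.fastq.gz; do
        [ -e "$f" ] || continue          # skip the literal pattern if nothing matches
        new=$(to_001_name "$f")
        [ "$f" = "$new" ] || printf 'mv -- %s %s\n' "$f" "$new"
    done
}
```

Running preview_renames in the directory holding the FASTQ files lists the proposed renames. Because the three datasets in this thread use different lanes (L002, L003, L004), the renamed files should not collide, but that is worth checking in the printed output first.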

Ning

Aug 19, 2021, 4:45:11 AM
to basespace-developers
Thanks for the suggestion!

I understand the trailing three digits before the file extension to be the flow cell index (see "FASTQ File Upload Requirements"). Inspection of each FASTQ dataset showed that they came from different flow cells, so I gave each a different index.
  1. In any case, I tried renaming each file to end in _001.fastq.gz as you suggested, and still got the same error.
  2. In addition, I have just successfully uploaded a biosample with two FASTQ datasets, one with flow cell index 1 (ending in _001.fastq.gz) and the other with index 2 (ending in _002.fastq.gz), with no errors at all.

Nick Vinckier

Aug 19, 2021, 9:22:16 AM
to basespace-developers on behalf of Ning
Is it just the files ending in 003 that give this error then, if you try uploading each pair individually?

Nick Vinckier

Aug 19, 2021, 9:30:56 AM
to basespace-developers on behalf of Ning
Thinking about this more, this may indicate a mismatch between the read number in the filename and the read number in the read headers within the FASTQ files themselves.

If you can upload all with the GUI though, that is unexpected.
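
One way to test the filename-versus-header idea is sketched below. It assumes the headers use the Illumina layout in which the first line looks like "@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index", so the read number is the first colon-separated value after the space. Older headers that end in /1 or /2 are not handled. The function names are hypothetical, and this is not how BaseSpaceCLI itself validates files.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read number (1 or 2) taken from the _R1_/_R2_ part of the filename.
read_num_from_filename() {
    printf '%s\n' "$1" | sed -nE 's/.*_R([12])_[0-9]{3}\.fastq\.gz$/\1/p'
}

# Hypothetical helper: read number taken from a header line, assuming the second
# whitespace-separated field is "<read>:<filtered>:<control>:<index>".
read_num_from_header() {
    printf '%s\n' "$1" | awk '{ split($2, parts, ":"); print parts[1] }'
}

# Compare the two for one gzipped FASTQ file, using only its first header line.
check_read_num() {
    local file=$1 header expected actual
    header=$(gzip -dc -- "$file" | head -n 1)
    expected=$(read_num_from_filename "$(basename -- "$file")")
    actual=$(read_num_from_header "$header")
    if [ "$expected" = "$actual" ]; then
        echo "OK $file (read $actual)"
    else
        echo "MISMATCH $file: filename says R$expected, header says $actual"
        return 1
    fi
}
```

A loop such as `for f in *.fastq.gz; do check_read_num "$f"; done` reports each file in turn. Only the first record is inspected, so a clean result rules out a systematic mismatch but not one buried deeper in the file.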

What OS are you using? 

Also, I would suggest reaching out to techs...@illumina.com for assistance as well.



Ning

Aug 19, 2021, 10:21:41 AM
to basespace-developers
The third command in my original post does upload each pair individually, and in that case only the pair with flow cell index 1 (ending in _001.fastq.gz) uploads successfully.

However, I have checked all the reads using FastQValidator and FastqPairedEndValidator, and checked the heads and tails of each file. As you mentioned, they also upload successfully with the GUI, so I don't think there are any obvious problems with the contents of the files.

Ubuntu, version something-or-other, and I will reach out to Illumina support soon. I just wanted to make sure I wasn't missing anything obvious.
