buildindex: works for some genomes, but others give "-1" in the .files file


Peter Hoyt

Mar 16, 2020, 11:50:39 AM
to Subread
I'm running Rsubread 2.0.1 with R 3.6.1 on Windows 10. When I use buildindex on one particular genome (Felis catus), the index's `.files` file looks wrong; other genomes I have tried work fine. In the `cat9.files` output, the third column contains "-1" starting with the 16th chromosome, and every chromosome after that is also "-1". Here is the call I used, followed by the resulting `cat9.files`:
```
library(Rsubread)
ref <- "cat9.fa"  # reference FASTA path must be a character string
buildindex(basename = "cat9", reference = ref)
```

```
NC_018723.3    .//subread-index-sam-012480-798846    123
NC_018724.3    .//subread-index-sam-012480-798846    245559744
NC_018725.3    .//subread-index-sam-012480-798846    419481211
NC_018726.3    .//subread-index-sam-012480-798846    564729488
NC_018727.3    .//subread-index-sam-012480-798846    775916970
NC_018728.3    .//subread-index-sam-012480-798846    933438341
NC_018729.3    .//subread-index-sam-012480-798846    1085329585
NC_018730.3    .//subread-index-sam-012480-798846    1231923099
NC_018731.3    .//subread-index-sam-012480-798846    1457896081
NC_018732.3    .//subread-index-sam-012480-798846    1621392114
NC_018733.3    .//subread-index-sam-012480-798846    1740720952
NC_018734.3    .//subread-index-sam-012480-798846    1832196116
NC_018735.3    .//subread-index-sam-012480-798846    1930464506
NC_018736.3    .//subread-index-sam-012480-798846    2028365162
NC_018737.3    .//subread-index-sam-012480-798846    2092767041
NC_018738.3    .//subread-index-sam-012480-798846    -1
NC_018739.3    .//subread-index-sam-012480-798846    -1
NC_018740.3    .//subread-index-sam-012480-798846    -1
NC_001700.1    .//subread-index-sam-012480-798846    -1
NC_018741.3    .//subread-index-sam-012480-798846    -1
```
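
To check which contigs are affected in a given index, a small R helper can read the `.files` file and list the entries whose offset is -1. This is a hypothetical sketch: `find_missing_offsets` is not an Rsubread function, and it assumes the three whitespace-separated columns shown above (contig name, temporary FASTA path, byte offset).

```r
# Hypothetical helper (not part of Rsubread): list contigs whose offset
# in a Subread ".files" file is -1. Assumes three whitespace-separated
# columns: contig name, temporary FASTA path, byte offset.
find_missing_offsets <- function(files_path) {
  idx <- read.table(files_path, header = FALSE, sep = "",
                    col.names = c("contig", "fasta", "offset"),
                    colClasses = c("character", "character", "numeric"),
                    stringsAsFactors = FALSE)
  # Keep only the contigs whose recorded offset is -1
  idx$contig[idx$offset == -1]
}

# Example usage (file name follows basename = "cat9"):
# find_missing_offsets("cat9.files")
```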

There are no errors or warnings from Rsubread. I've tried the NCBI genome, the Ensembl genome, and a genome I concatenated myself from the individual chromosome files. However, with a different genome (e.g. Anolis carolinensis) the .files file is correct.

Has anyone seen this before, or can anyone help?

Peter

Yang LIAO

Mar 17, 2020, 4:29:59 PM
to Subread

Thanks for the bug report. The ".files" file in a Subread index is deprecated, and its content isn't used by any task. This file used to record the offset of each contig in the input FASTA file, but we later changed our strategy: we now create a temporary FASTA file ("subread-index-sam-012480-798846" in your case) and delete it after the index is built. This makes the offsets in the ".files" file meaningless, because they describe byte positions in a temporary file that no longer exists.
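
For illustration, the kind of per-contig byte offset this file used to record can be computed with a short R sketch. The function name `contig_offsets` is hypothetical, and the exact definition Subread used is an assumption here: the sketch reports the byte position where each contig's sequence begins, just after its `>` header line, and assumes every line ends in a single `\n`. It reads the whole file into memory, so it is only meant for small test files.

```r
# Hypothetical sketch: 0-based byte offset at which each contig's
# sequence starts (right after its ">" header line) in a FASTA file.
# Assumption: lines end with a single "\n" (Unix line endings).
contig_offsets <- function(fasta_path) {
  lines <- readLines(fasta_path)
  # Bytes per line, including the trailing newline; "+ 1" makes this a
  # double, so cumulative sums do not overflow R's 32-bit integers
  line_bytes <- nchar(lines, type = "bytes") + 1
  # Byte offset at which each line starts
  line_start <- c(0, cumsum(line_bytes))[seq_along(lines)]
  is_header <- startsWith(lines, ">")
  # Contig name = first word after ">"
  contig <- sub("^>(\\S+).*$", "\\1", lines[is_header], perl = TRUE)
  # Sequence begins immediately after the header line
  offsets <- line_start[is_header] + line_bytes[is_header]
  setNames(offsets, contig)
}
```

Using doubles for the running total is the design point here: a byte count kept in a signed 32-bit integer cannot go past 2,147,483,647.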


I checked the code and found that the "-1" offsets are caused by a 31-bit offset limit on Windows: offsets larger than 2,147,483,647 bytes (2^31 - 1) cannot be recorded. This does not affect any other part of the index, so read mapping should work fine with this index.
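
The failure mode can be illustrated with a hypothetical R sketch, assuming the offset is held in a signed 32-bit integer whose maximum is 2^31 - 1 = 2,147,483,647 (`.Machine$integer.max` in R). Because offsets grow monotonically through the file, once one contig starts past that limit, every later contig is also reported as -1, which matches the listing in the original post. `record_offset` is an illustrative function, not Subread code, and 2200000000 is a made-up offset beyond the limit.

```r
# Hypothetical recorder (not Subread code): returns -1 for any offset
# that does not fit in a signed 32-bit integer, mirroring the "-1"
# entries in the .files file.
record_offset <- function(offset) {
  ifelse(offset > .Machine$integer.max, -1, offset)
}

# Two offsets from the listing, plus a made-up one past the limit
offsets <- c(1832196116, 2092767041, 2200000000)
record_offset(offsets)  # returns c(1832196116, 2092767041, -1)
```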

Peter Hoyt

Mar 17, 2020, 5:14:10 PM
to Subread
Thanks very much for your reply and for checking whether this affects my downstream analyses. I very much appreciate it.

Peter