Hi Robert,
The block_size_megabases and the -K option are related in purpose but do different things.
block_size_megabases splits the FASTA reference into chunks of the specified size, in megabases.
-K sets how many input bases bwa processes per batch. Fixing this number makes the batching, and therefore the results, independent of the number of threads.
In your command you specified a value of 200 Gigabases, which is still decently large.
The default -K value is 100 Megabases, which matches what BWA itself uses by default when running with 10 threads (10 Megabases per thread).
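To make the arithmetic concrete, here is a small Python sketch of how the batch size comes about. It assumes bwa mem's built-in behaviour of 10 Megabases per thread when -K is not given; the function name and constant are made up for illustration and are not part of bwa or NGLess.

```python
# Assumption: without -K, bwa mem reads 10 Mbases per thread in each batch.
BWA_BASES_PER_THREAD = 10_000_000


def bwa_batch_size(threads, fixed_k=None):
    """Return the number of input bases processed per batch.

    fixed_k mimics passing -K: when set, the thread count no longer matters.
    """
    if fixed_k is not None:
        return fixed_k
    return BWA_BASES_PER_THREAD * threads


# Without -K the batch size follows the thread count.
print(bwa_batch_size(4))
print(bwa_batch_size(10))
# With -K 100000000 the batch size is the same for any thread count.
print(bwa_batch_size(4, fixed_k=100_000_000))
print(bwa_batch_size(32, fixed_k=100_000_000))
```

With -K fixed, a run with 4 threads and a run with 32 threads see the reads in identically sized batches, which is why the option is used for reproducibility.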
See https://github.com/CCDG/Pipeline-Standardization/issues/2 for additional context about -K.
Note that changing -K would also affect the results, in yet another way that is different from splitting the reference via block_size_megabases.
These options essentially split/chunk the data. The impact comes from how they affect the statistics of what is considered the "best hit" for any given sequence.
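As a rough illustration of the reference-splitting side, the Python sketch below groups sequences into blocks by cumulative length. This is only an assumed greedy scheme to show the idea; it is not NGLess's actual implementation, and the function and sequence names are hypothetical.

```python
def split_into_blocks(seq_lengths, block_size_megabases):
    """Group (name, length) pairs into blocks of at most the given size.

    A sequence longer than the limit still gets a block of its own.
    Illustrative only: a simple greedy scheme, not NGLess's code.
    """
    limit = block_size_megabases * 1_000_000
    blocks, current, current_size = [], [], 0
    for name, length in seq_lengths:
        # Close the current block if adding this sequence would exceed the limit.
        if current and current_size + length > limit:
            blocks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += length
    if current:
        blocks.append(current)
    return blocks


# Hypothetical reference with three sequences and a 1 Megabase block size.
reference = [("seq_a", 600_000), ("seq_b", 500_000), ("seq_c", 300_000)]
print(split_into_blocks(reference, 1))
```

Here seq_a ends up in a different block from seq_b and seq_c. A read that matches both seq_a and seq_b is then scored against each block separately, so what counts as its "best hit" can differ from a run against the unsplit reference.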
Hope this helps,
Renato
--
Dr. Renato Alves (PhD),
EMBL Bio-IT community & project manager -
https://bio-it.embl.de | Staff Association Representative
EMBL Heidelberg, Germany
ORCID: 0000-0002-7212-0234
Github/Gitlab: @unode | 🐦 @renato_alvs | 🦣 @renato...@mstdn.science