Hi Neil,
Thank you for your prompt feedback.
- The two fasta files I used (chosen at random for testing purposes) are public. Each one is listed below at two locations so that people can pick whichever is faster for them. The files are updated regularly, but the versions I used for my original email should still be there for a couple of months or so:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz
https://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
https://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
As far as my experience goes, SciTE (like other programs) can open, view and edit these two files (once decompressed) without issues, and I have never come across any NUL characters in them over the many years I have been using them with SciTE and other programs.
Unfortunately I won't be able to count the NUL characters any time soon (despite your helpful and clear instructions), so I prefer to give full details of what I did, in the hope that someone with experience can give advice.
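In case it helps whoever looks into this, here is a minimal Python sketch that should count the NUL (0x00) bytes in a file. It reads the file in fixed-size chunks, so even a 2.5 GB file does not need to fit in memory. The function name count_nul_bytes, the script name and the 1 MiB chunk size are my own illustrative choices, not part of SciTE or any tool mentioned here:

```python
import sys


def count_nul_bytes(path, chunk_size=1 << 20):
    """Count NUL (0x00) bytes in a file, reading it in fixed-size chunks.

    Hypothetical helper for illustration; not part of SciTE.
    """
    total = 0
    with open(path, "rb") as f:  # binary mode: no newline or encoding translation
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            total += chunk.count(b"\x00")
    return total


if __name__ == "__main__":
    # Usage (assumed script name): python count_nul.py UP.fasta
    for name in sys.argv[1:]:
        print(f"{name}: {count_nul_bytes(name)} NUL bytes")
```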
- To create a test file larger than 2,000,000,000 bytes (so that I could test SciTE with large files for the first time), I concatenated these two fasta files at the Windows command prompt with:
copy /b uniprot_sprot_varsplic.fasta+uniprot_sprot.fasta UP.fasta
SciTE and other programs can open, view and edit this UP.fasta file without issues, and I have not come across any NUL characters in it, so I assume the "copy /b" command did no harm in that respect. (I have been merging files of this type this way for many years and have never had any issue whatsoever with how I used them afterwards.)
Next, I made eight copies of UP.fasta and merged them, again with "copy /b", to obtain the 2,510,747,416-byte test file I mentioned in my original email.
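For anyone who wants to rebuild the test file without "copy /b", the same byte-for-byte concatenation can be sketched in Python. This assumes "copy /b" simply appends the files' raw bytes in order, which is what binary mode is meant to do. concat_binary and the output name UP8.fasta are hypothetical names of my own:

```python
import shutil


def concat_binary(sources, dest, chunk_size=1 << 20):
    """Append the raw bytes of each source file to dest, in order.

    Intended to mirror `copy /b a+b dest` on Windows (assumption: no bytes
    are added or translated). Hypothetical helper for illustration.
    """
    with open(dest, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out, chunk_size)  # streams; never loads a whole file


# Usage (UP8.fasta is a hypothetical name for the final test file):
# concat_binary(["uniprot_sprot_varsplic.fasta", "uniprot_sprot.fasta"], "UP.fasta")
# concat_binary(["UP.fasta"] * 8, "UP8.fasta")
```

As a sanity check, the final file should be exactly eight times the size of UP.fasta, which implies UP.fasta was 2,510,747,416 / 8 = 313,843,427 bytes.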
- The machine I used has 32 GB of RAM.
- Some of the fasta headers (a fasta header is the text after the ">" character up to the first line break) can be long, considerably longer than the 60-character lines of the sequence area (the sequence area is the text between one header line and the next, excluding the header lines themselves).
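To put numbers on how long the headers get, a sketch along these lines should report the longest header and the longest sequence line in bytes, reading the file one line at a time. line_length_stats is a hypothetical helper of mine; header length follows the definition above (text after ">" up to the line break), and line endings are not counted:

```python
def line_length_stats(path):
    """Return (longest header, longest sequence line) in bytes.

    Header length counts the text after ">" up to the line break;
    LF or CRLF line endings are not counted. Hypothetical helper.
    """
    max_header = 0
    max_seq = 0
    with open(path, "rb") as f:
        for line in f:  # binary iteration splits on b"\n", one line at a time
            line = line.rstrip(b"\r\n")
            if line.startswith(b">"):
                max_header = max(max_header, len(line) - 1)  # exclude the ">"
            else:
                max_seq = max(max_seq, len(line))
    return max_header, max_seq


# Usage: print(line_length_stats("UP.fasta"))
```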
With this information it should be possible to try to reproduce my issue (the Windows and SciTE for Windows details are in my original email).
I hope this additional information helps in figuring out whether there is room for improvement in how I use SciTE with this type of large file.
Thanks again
Best Regards
Emanuele