closest and fisher returning errors for genome file having no valid entries


anand...@gmail.com

Sep 4, 2018, 4:09:19 PM
to bedtools-discuss
For both closest and fisher, bedtools reports that the genome file has no valid entries.

I get these errors on both a Linux cluster and my local Mac OS X machine.

OS and bedtools version info for the two machines, respectively:

aksrao@farm:~/PhytozomeV12/GenomeAssembly_PhytozomeV12$ uname -a 
Linux farm.cse.ucdavis.edu 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
aksrao@farm:~/PhytozomeV12/GenomeAssembly_PhytozomeV12$ bedtools --version
bedtools v2.25.0

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ uname -a
Darwin AnandandDeepa-2.local 15.2.0 Darwin Kernel Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64
AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ bedtools --version
bedtools v2.27.1 # installed just last night

1. closest
with any of the following syntax variants
AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ closestBed -d -a Acoerulea_322_v3_Rep_phmmert_l20.GFF3 -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.GFF3
ERROR: Sort order was unspecified, and file Acoerulea_322_v3_Vs_Hel_phmmert_l20.GFF3 is not sorted lexicographically.
       Please rerun with the -g option for a genome file.
       See documentation for details.

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ closestBed -d -a Acoerulea_322_v3_Rep_phmmert_l20.GFF3 -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.GFF3 -g Acoerulea_322_v3.fa.shIDscleaned-up 
Error: The genome file Acoerulea_322_v3.fa.shIDscleaned-up has no valid entries. Exiting.

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ closestBed -d -a Acoerulea_322_v3_Rep_phmmert_l20.GFF3 -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.GFF3 -g Acoerulea_322_v3.fa.shIDs
Error: The genome file Acoerulea_322_v3.fa.shIDs has no valid entries. Exiting.

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ closestBed -d -a Acoerulea_322_v3_Rep_phmmert_l20.bed -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.bed -g Acoerulea_322_v3.fa.shIDs
Error: The genome file Acoerulea_322_v3.fa.shIDs has no valid entries. Exiting.

2. fisher
same type of error with any of the following syntax variants
AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ bedtools fisher -a Acoerulea_322_v3_Rep_phmmert_l20.bed -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.bed -g Acoerulea_322_v3.fa.shIDs
Error: The genome file Acoerulea_322_v3.fa.shIDs has no valid entries. Exiting.

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ bedtools fisher -a Acoerulea_322_v3_Rep_phmmert_l20.bed -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.bed -g Acoerulea_322_v3.fa.shIDscleaned-up 
Error: The genome file Acoerulea_322_v3.fa.shIDscleaned-up has no valid entries. Exiting.

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ bedtools fisher -a Acoerulea_322_v3_Rep_phmmert_l20.GFF3 -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.GFF3 -g Acoerulea_322_v3.fa.shIDs
Error: The genome file Acoerulea_322_v3.fa.shIDs has no valid entries. Exiting.

AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ bedtools fisher -a Acoerulea_322_v3_Rep_phmmert_l20.GFF3 -b Acoerulea_322_v3_Vs_Hel_phmmert_l20.GFF3 -g Acoerulea_322_v3.fa.shIDscleaned-up 
Error: The genome file Acoerulea_322_v3.fa.shIDscleaned-up has no valid entries. Exiting.
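
Regarding the first closestBed error above ("not sorted lexicographically"): one possible way to sort a GFF3 file by sequence name and then by start coordinate is sketched below. This is only a sketch. The function name sort_gff3 is made up for illustration, it assumes tab-separated GFF3 with the start coordinate in column 4, and it moves every line beginning with "#" to the top of the output. LC_ALL=C is used so that the name comparison is plain byte order.

```shell
# Hypothetical helper: sort GFF3 features by sequence name, then by start.
# Lines starting with "#" are emitted first, in their original order.
sort_gff3() {
  # 1) comment and directive lines, unchanged
  awk '/^#/' "$1"
  # 2) feature lines: column 1 in byte order, column 4 numerically
  awk '!/^#/ && NF' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k4,4n
}
```

Usage would look like `sort_gff3 in.gff3 > sorted.gff3`. For BED files the start is in column 2, so the usual advice is `sort -k1,1 -k2,2n in.bed > sorted.bed`.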


What can I do differently? Is there any way I can process the GFF3 or BED or genome multifasta file differently, and/or something to do with the syntax?

The GFF3 files were generated in house by parsing the output of a bioinformatics tool.
They were converted to BED using gff2bed:
AnandandDeepa-2:GenomeAssembly_PhytozomeV12 anand$ gff2bed --version
convert2bed -i gff
  version:  2.4.35
  author:   Alex Reynolds

Links to those files:
genome (with shortened IDs) = https://ufile.io/ntuuc
genome (with shortened IDs and alternating headers and sequences) = https://ufile.io/ceuv5

Thanks in advance.

Aaron Quinlan

Sep 4, 2018, 5:27:17 PM
to anand...@gmail.com, bedtools...@googlegroups.com
Hi there,

Could you share the contents of Acoerulea_322_v3.fa.shIDscleaned-up via “head Acoerulea_322_v3.fa.shIDscleaned-up”?  bedtools is asking for a genome file to tell it the order of the chromosomes in your input files.  Genome files are defined here: https://bedtools.readthedocs.io/en/latest/content/general-usage.html?highlight=genome%20files#genome-file-format
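
For reference, a genome file in this sense is a two-column, tab-separated list of sequence names and lengths, not the FASTA itself. A minimal sketch of deriving one from a multi-FASTA with awk follows. The function name fasta_to_genome and the file names in the usage note are made up for illustration, and the sketch assumes the sequence ID is the first whitespace-delimited token on each header line.

```shell
# Hypothetical helper: print "<name><TAB><length>" for each FASTA record.
fasta_to_genome() {
  awk '
    /^>/ {
      # flush the previous record before starting a new one
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($1, 2)   # drop the leading ">"
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")    # ignore stray blanks and carriage returns
      len += length($0)
    }
    END { if (name != "") printf "%s\t%d\n", name, len }
  ' "$1"
}
```

Usage would look like `fasta_to_genome genome.fa > genome.sizes`. Lines come out in FASTA order; if the input files are sorted differently, the genome file would need to be reordered to match.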

Thanks,
Aaron

Anand K S Rao

Sep 6, 2018, 8:05:51 AM
to Aaron Quinlan, bedtools...@googlegroups.com
Hi Aaron et al.,

SOLVED OLD ERROR
To make the two columns of my genome lengths file easier to read, I had put 3 tab separators between the chromosome ID and the chromosome length.
When I changed this to a single tab separator, things worked as expected. Sorry about the mix-up on my end, and thanks for your help.
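
A quick way to catch this kind of separator problem before running bedtools is sketched below. Both function names are made up for illustration. The check assumes a valid genome file line is exactly a name, one tab, and an integer length. The normalizer assumes sequence names contain no spaces, since it splits on any run of whitespace.

```shell
# Hypothetical check: exit non-zero if any line is not "<name><TAB><integer>".
check_genome_file() {
  awk -F'\t' '
    NF != 2 || $1 == "" || $2 !~ /^[0-9]+$/ {
      printf "line %d: expected <name><TAB><length>, got %d field(s)\n", NR, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Hypothetical fix: collapse any run of blanks between the two columns
# into a single tab (assumes names contain no whitespace).
normalize_genome_file() {
  awk 'NF >= 2 { print $1 "\t" $2 }' "$1"
}
```

Usage would look like `check_genome_file genome.sizes || normalize_genome_file genome.sizes > genome.fixed.sizes`.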

MEANING OF WARNING?
However, my closest and fisher runs, which DO run now, print warnings, as shown below from the STDOUT.
Is there reason to look into and try to fix my input GFF3 files? genometools' gt gff3validator reported "input is valid GFF3" for both of these GFF3 files.

So I am not sure what these warnings are about. Should I worry or not? Please advise. Thanks!

closestBed -d -a Acoerulea_322_v3_Hel_phmmert_l20_REsorted.gff3 -b Acoerulea_322_v3_Rep_phmmert_l20_REsorted.gff3 -g Acoerulea_322_v3.fa.shIDscleaned-up_IDs_SeqLen

Several output lines
.
.
.
.
***** WARNING: File Acoerulea_322_v3_Rep_phmmert_l20_REsorted.gff3 has inconsistent naming convention for record:
scaffold_8 phmmert_PIF1_PF05970.13 ORF 194959 195993 0 + 0 ID=ORF80_phmmert_PIF1_PF05970.13
.
.
.
More output lines
.
.
.
***** WARNING: File Acoerulea_322_v3_Rep_phmmert_l20_REsorted.gff3 has inconsistent naming convention for record:
scaffold_8 phmmert_PIF1_PF05970.13 ORF 194959 195993 0 + 0 ID=ORF80_phmmert_PIF1_PF05970.13

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

bedtools fisher -a Acoerulea_322_v3_Hel_phmmert_l20_REsorted.gff3 -b Acoerulea_322_v3_Rep_phmmert_l20_REsorted.gff3 -g Acoerulea_322_v3.fa.shIDscleaned-up_IDs_SeqLen
***** WARNING: File Acoerulea_322_v3_Hel_phmmert_l20_REsorted.gff3 has inconsistent naming convention for record:
scaffold_8 phmmert_Helitron_like_N_PF14214.5 ORF 192792 193016 6.4e-08 + 2 ID=ORF254_phmmert_Helitron_like_N_PF14214.5

# Number of query intervals: 375
# Number of db intervals: 777
# Number of overlaps: 25
# Number of possible intervals (estimated): 197997
# phyper(25 - 1, 375, 197997 - 375, 777, lower.tail=F)
# Contingency Table Of Counts
#_________________________________________
#           |  in -b       | not in -b    |
#     in -a | 25           | 350          |
# not in -a | 752          | 196870       |
#_________________________________________
# p-values for fisher's exact test
left right two-tail ratio
1 8.3946e-23 8.3946e-23 18.700
***** WARNING: File Acoerulea_322_v3_Hel_phmmert_l20_REsorted.gff3 has inconsistent naming convention for record:
scaffold_8 phmmert_Helitron_like_N_PF14214.5 ORF 192792 193016 6.4e-08 + 2 ID=ORF254_phmmert_Helitron_like_N_PF14214.5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
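
The thread does not establish what triggers the "inconsistent naming convention" warning. Assuming it concerns the sequence names in column 1 (for example a mix of differently styled names, or names absent from the genome file), one hedged first check is to list the names that a GFF3 or BED file uses but the genome file does not. The function name names_missing_from_genome is made up for illustration, and the sketch assumes a non-empty, tab-separated genome file.

```shell
# Hypothetical diagnostic: print each column-1 name found in an annotation
# file ($2) that does not appear in the genome file ($1). Each name is
# printed once, in order of first appearance.
names_missing_from_genome() {
  awk -F'\t' '
    NR == FNR { known[$1] = 1; next }    # first file: genome names
    /^#/ || NF == 0 { next }             # skip comments and blank lines
    !($1 in known) && !seen[$1]++ { print $1 }
  ' "$1" "$2"
}
```

Usage would look like `names_missing_from_genome genome.sizes a.gff3`. Empty output means every name in the annotation file is present in the genome file, in which case the warning would need another explanation.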

Links to the 3 input files are shown below if you wish to replicate the warnings. Thanks a ton!

Best,
Anand

On Tue, Sep 4, 2018 at 8:42 PM Anand K S Rao <anand...@gmail.com> wrote:
Hi Aaron,

Thanks for reminding me that the -g (genome) option expects a file of chromosome lengths, not the actual multi-FASTA file.

That took care of the old errors, but now I get a new error.

AnandandDeepa-2:ForUpload anand$ bedtools intersect -d -a Acoerulea_322_v3_Rep_phmmert_l20_gt_Sort_Tidy_UNcommented.bed -b Acoerulea_322_v3_Hel_phmmert_l20_gt_Sort_Tidy_UNcommented.bed -g Acoerulea_322_v3.fa.shIDscleaned-up_IDs_SeqLen_4_bedtools
***** ERROR: too many digits/characters for integer conversion in string . Exiting...

AnandandDeepa-2:ForUpload anand$ bedtools fisher -a Acoerulea_322_v3_Rep_phmmert_l20_gt_Sort_Tidy_UNcommented.bed -b Acoerulea_322_v3_Hel_phmmert_l20_gt_Sort_Tidy_UNcommented.bed -g Acoerulea_322_v3.fa.shIDscleaned-up_IDs_SeqLen_4_bedtools
***** ERROR: too many digits/characters for integer conversion in string . Exiting...

There are links to the 3 input files here

-a = a.bed = Acoerulea_322_v3_Rep_phmmert_l20_gt_Sort_Tidy_UNcommented.bed


-b = b.bed = Acoerulea_322_v3_Hel_phmmert_l20_gt_Sort_Tidy_UNcommented.bed

-g = genome.sizes = Acoerulea_322_v3.fa.shIDscleaned-up_IDs_SeqLen_4_bedtools

Are the chromosome sizes too long, or is it something to do with formatting of some column(s) in my *.bed files?
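
The cause of this error is not resolved in the thread. Since the string quoted in the message is empty, one guess (an assumption, not a confirmed diagnosis) is that a field expected to hold an integer is empty or malformed. A minimal sketch for checking that columns 2 and 3 of a BED file are plain non-negative integers follows. The function name check_bed_coords is made up for illustration.

```shell
# Hypothetical check: exit non-zero if any BED record lacks integer
# start/end values in columns 2 and 3. Header and comment lines are skipped.
check_bed_coords() {
  awk -F'\t' '
    /^#/ || /^track/ || /^browser/ || NF == 0 { next }
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      printf "line %d: bad start/end: [%s] [%s]\n", NR, $2, $3
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Usage would look like `check_bed_coords a.bed`. A trailing carriage return from Windows line endings in a three-column file would also be flagged, because the end field would no longer be all digits.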

Thank you,
Anand
