anvi-merge error, but only sometimes: "...the samples you are trying to merge has zero hits"

Jessica Jarett

Jun 29, 2017, 4:51:49 PM
to Anvi'o
Hi all, I'm using anvi'o to map a moderately large number of samples (~200) to a combined assembly. Everything seems to run smoothly until I try to merge the profiles. On the exact same input data, only one of the following commands works, though it seems to me they should both work. I have used a subset of the samples as a test case, where this command works:

cp -r individual_profiles/profile-BTTG* troubleshooting/
# n=6 profiles: BTTGT, BTTGU, BTTGW, BTTGX, BTTGY, BTTGZ
anvi-merge troubleshooting/profile-*/PROFILE.db -o merge_star -c combined_assembly/combined.db -S merge_star 1>merge_star.log 2>&1 &
# works


Using the full dataset, I try this command, but it doesn't work. Note that the BTTG* samples that worked above are included in the error message here.

anvi-merge individual_profiles/profile-B*/PROFILE.db -o merge_fulldata2 -c combined_assembly/combined.db -S merge_fulldata2 1>merge_fulldata2.log 2>&1 &

cat merge_fulldata2.log

profiler_version .............................: 20
output_dir ...................................: /global/projectb/scratch/jkjarett/Cone_Pool/merge_fulldata2
sample_id ....................................: merge_fulldata2
description ..................................: None
profile_db ...................................: /global/projectb/scratch/jkjarett/Cone_Pool/merge_fulldata2/PROFILE.db
merged .......................................: True
contigs_db_hash ..............................: 5e74c381
num_runs_processed ...........................: 99
merged_sample_ids ............................: BTTGT, BTTGU, BTTGW, BTTGX, BTTGY, BTTGZ, BTTHA, BTTHB, BTTHC, BTTHG, BTTHH, BTTHO, BTTHP, BTTHS, BTTHT, BTTHU, BTTHW, BTTHX, BTTHY, BTTHZ, BTTNA, BTTNB, BTTNC, BTTNG, BTTNH, BTTNN, BTTNO, BTTNP, BTTNS, BTTNT, BTTNU, BTTNW, BTTNX, BTTNY, BTTNZ, BTTOA, BTTON, BTTOO, BTTOP, BTTOS, BTTPZ, BTTST, BTTSW, BTTSY, BTTTG, BTTTN, BTTTU, BTTTZ, BTTUH, BTTUN, BTTUS, BTTUW, BTTUZ, BTTWA, BTTXB, BTTXG, BTTXN, BTTXO, BTTXT, BTTXU, BTTXW, BTTXY, BTTYB, BTTYG, BTTYP, BTTYT, BTTYY, BTTZG, BTTZN, BTTZO, BTTZT, BTTZW, BTTZX, BTTZZ, BTUAC, BTUOS, BTUOT, BTUOU, BTUOW, BTUPB, BTUPG, BTUPH, BTUPN, BTUPO, BTUPP, BTUPW, BTUPZ, BTUSB, BTUSG, BTUSO, BTUSP, BTUSX, BTUSY, BTUTA, BTUTC, BTUTN, BTUTO, BTUTP, BTUTS
total_reads_mapped ...........................: 32216245, 4878761, 5261841, 18346150, 8890945, 22527761, 3010034, 15276990, 9373405, 17841525, 3591214, 7353484, 67354704, 7471467, 8073401, 5242689, 18984229, 9905873, 3484317, 9719885, 16919388, 4963680, 8756556, 17951447, 3906653, 21018276, 5673721, 13937349, 10744767, 8997943, 5459626, 9709282, 4323760, 3897438, 3160474, 13365216, 19350561, 17671150, 17140318, 10676333, 8568174, 7645132, 6417557, 5073629, 8918588, 9085721, 5789320, 27023619, 9454370, 11017736, 5319486, 6772670, 5670726, 10481079, 9691397, 18660285, 15883869, 9333428, 8052697, 15388632, 16791879, 6235694, 6361056, 7200435, 22293846, 5103321, 10532170, 0, 23188511, 5990172, 7713927, 13560733, 8051598, 4802347, 6447218, 9303584, 16500668, 14063453, 21951553, 21496921, 18053505, 11536498, 13594281, 10309904, 9743916, 12063318, 8970746, 13974895, 8615827, 9454530, 14147180, 11207512, 9433531, 11425034, 13499633, 7696531, 7310166, 9315273, 7324643
cmd_line .....................................: /global/homes/j/jkjarett/miniconda2/envs/py3/bin/anvi-merge individual_profiles/profile-BTTGT/PROFILE.db individual_profiles/profile-BTTGU/PROFILE.db individual_profiles/profile-BTTGW/PROFILE.db individual_profiles/profile-BTTGX/PROFILE.db individual_profiles/profile-BTTGY/PROFILE.db individual_profiles/profile-BTTGZ/PROFILE.db individual_profiles/profile-BTTHA/PROFILE.db individual_profiles/profile-BTTHB/PROFILE.db individual_profiles/profile-BTTHC/PROFILE.db individual_profiles/profile-BTTHG/PROFILE.db individual_profiles/profile-BTTHH/PROFILE.db individual_profiles/profile-BTTHO/PROFILE.db individual_profiles/profile-BTTHP/PROFILE.db individual_profiles/profile-BTTHS/PROFILE.db individual_profiles/profile-BTTHT/PROFILE.db individual_profiles/profile-BTTHU/PROFILE.db individual_profiles/profile-BTTHW/PROFILE.db individual_profiles/profile-BTTHX/PROFILE.db individual_profiles/profile-BTTHY/PROFILE.db individual_profiles/profile-BTTHZ/PROFILE.db individual_profiles/profile-BTTNA/PROFILE.db individual_profiles/profile-BTTNB/PROFILE.db individual_profiles/profile-BTTNC/PROFILE.db individual_profiles/profile-BTTNG/PROFILE.db individual_profiles/profile-BTTNH/PROFILE.db individual_profiles/profile-BTTNN/PROFILE.db individual_profiles/profile-BTTNO/PROFILE.db individual_profiles/profile-BTTNP/PROFILE.db individual_profiles/profile-BTTNS/PROFILE.db individual_profiles/profile-BTTNT/PROFILE.db individual_profiles/profile-BTTNU/PROFILE.db individual_profiles/profile-BTTNW/PROFILE.db individual_profiles/profile-BTTNX/PROFILE.db individual_profiles/profile-BTTNY/PROFILE.db individual_profiles/profile-BTTNZ/PROFILE.db individual_profiles/profile-BTTOA/PROFILE.db individual_profiles/profile-BTTON/PROFILE.db individual_profiles/profile-BTTOO/PROFILE.db individual_profiles/profile-BTTOP/PROFILE.db individual_profiles/profile-BTTOS/PROFILE.db individual_profiles/profile-BTTPZ/PROFILE.db individual_profiles/profile-BTTST/PROFILE.db 
individual_profiles/profile-BTTSW/PROFILE.db individual_profiles/profile-BTTSY/PROFILE.db individual_profiles/profile-BTTTG/PROFILE.db individual_profiles/profile-BTTTN/PROFILE.db individual_profiles/profile-BTTTU/PROFILE.db individual_profiles/profile-BTTTZ/PROFILE.db individual_profiles/profile-BTTUH/PROFILE.db individual_profiles/profile-BTTUN/PROFILE.db individual_profiles/profile-BTTUS/PROFILE.db individual_profiles/profile-BTTUW/PROFILE.db individual_profiles/profile-BTTUZ/PROFILE.db individual_profiles/profile-BTTWA/PROFILE.db individual_profiles/profile-BTTXB/PROFILE.db individual_profiles/profile-BTTXG/PROFILE.db individual_profiles/profile-BTTXN/PROFILE.db individual_profiles/profile-BTTXO/PROFILE.db individual_profiles/profile-BTTXT/PROFILE.db individual_profiles/profile-BTTXU/PROFILE.db individual_profiles/profile-BTTXW/PROFILE.db individual_profiles/profile-BTTXY/PROFILE.db individual_profiles/profile-BTTYB/PROFILE.db individual_profiles/profile-BTTYG/PROFILE.db individual_profiles/profile-BTTYP/PROFILE.db individual_profiles/profile-BTTYT/PROFILE.db individual_profiles/profile-BTTYY/PROFILE.db individual_profiles/profile-BTTZG/PROFILE.db individual_profiles/profile-BTTZN/PROFILE.db individual_profiles/profile-BTTZO/PROFILE.db individual_profiles/profile-BTTZT/PROFILE.db individual_profiles/profile-BTTZW/PROFILE.db individual_profiles/profile-BTTZX/PROFILE.db individual_profiles/profile-BTTZZ/PROFILE.db individual_profiles/profile-BTUAC/PROFILE.db individual_profiles/profile-BTUOS/PROFILE.db individual_profiles/profile-BTUOT/PROFILE.db individual_profiles/profile-BTUOU/PROFILE.db individual_profiles/profile-BTUOW/PROFILE.db individual_profiles/profile-BTUPB/PROFILE.db individual_profiles/profile-BTUPG/PROFILE.db individual_profiles/profile-BTUPH/PROFILE.db individual_profiles/profile-BTUPN/PROFILE.db individual_profiles/profile-BTUPO/PROFILE.db individual_profiles/profile-BTUPP/PROFILE.db individual_profiles/profile-BTUPW/PROFILE.db 
individual_profiles/profile-BTUPZ/PROFILE.db individual_profiles/profile-BTUSB/PROFILE.db individual_profiles/profile-BTUSG/PROFILE.db individual_profiles/profile-BTUSO/PROFILE.db individual_profiles/profile-BTUSP/PROFILE.db individual_profiles/profile-BTUSX/PROFILE.db individual_profiles/profile-BTUSY/PROFILE.db individual_profiles/profile-BTUTA/PROFILE.db individual_profiles/profile-BTUTC/PROFILE.db individual_profiles/profile-BTUTN/PROFILE.db individual_profiles/profile-BTUTO/PROFILE.db individual_profiles/profile-BTUTP/PROFILE.db individual_profiles/profile-BTUTS/PROFILE.db -o merge_fulldata2 -c combined_assembly/combined.db -S merge_fulldata2
clustering_performed .........................: True


Config Error: It seems at least one of the samples you are trying to merge has zero hits. Here
              is a list of all samples and number of mapped reads they have: "BTUPB":         
              21,496,921, "BTTYB": 6,361,056, "BTTGZ": 22,527,761, "BTUTO": 7,310,166,        
              "BTTOS": 10,676,333, "BTTGT": 32,216,245, "BTUSG": 8,615,827, "BTTSY":          
              5,073,629, "BTTNP": 13,937,349, "BTTZT": 7,713,927, "BTTTU": 5,789,320, "BTTHO":
              7,353,484, "BTTXN": 15,883,869, "BTUPN": 13,594,281, "BTTXB": 9,691,397,        
              "BTTOP": 17,140,318, "BTTOA": 13,365,216, "BTTUN": 11,017,736, "BTUSB":         
              13,974,895, "BTTOO": 17,671,150, "BTTUS": 5,319,486, "BTTON": 19,350,561,       
              "BTTWA": 10,481,079, "BTUSP": 14,147,180, "BTUSX": 11,207,512, "BTTNT":         
              8,997,943, "BTTZO": 5,990,172, "BTTZX": 8,051,598, "BTUAC": 6,447,218, "BTTTN": 
              9,085,721, "BTUPP": 9,743,916, "BTTHC": 9,373,405, "BTTNH": 3,906,653, "BTTNO": 
              5,673,721, "BTTHZ": 9,719,885, "BTTHP": 67,354,704, "BTUPZ": 8,970,746, "BTTZN":
              23,188,511, "BTTNU": 5,459,626, "BTTZG": 0, "BTUTP": 9,315,273, "BTTNS":        
              10,744,767, "BTTXT": 8,052,697, "BTTXY": 6,235,694, "BTTHA": 3,010,034, "BTTHH":
              3,591,214, "BTUTS": 7,324,643, "BTTPZ": 8,568,174, "BTTUH": 9,454,370, "BTTNX": 
              4,323,760, "BTTTZ": 27,023,619, "BTUTC": 13,499,633, "BTTZW": 13,560,733,       
              "BTTGU": 4,878,761, "BTTZZ": 4,802,347, "BTTYG": 7,200,435, "BTTHB": 15,276,990,
              "BTUSY": 9,433,531, "BTTXW": 16,791,879, "BTTYY": 10,532,170, "BTTHG":          
              17,841,525, "BTTXG": 18,660,285, "BTUPO": 10,309,904, "BTTNY": 3,897,438,       
              "BTTHS": 7,471,467, "BTTGY": 8,890,945, "BTTNC": 8,756,556, "BTTGX": 18,346,150,
              "BTTHX": 9,905,873, "BTUOU": 14,063,453, "BTTUW": 6,772,670, "BTTHU": 5,242,689,
              "BTUTA": 11,425,034, "BTTXO": 9,333,428, "BTTUZ": 5,670,726, "BTTGW": 5,261,841,
              "BTUPH": 11,536,498, "BTTTG": 8,918,588, "BTTHY": 3,484,317, "BTTSW": 6,417,557,
              "BTTNZ": 3,160,474, "BTTHW": 18,984,229, "BTTNG": 17,951,447, "BTUOT":          
              16,500,668, "BTUPW": 12,063,318, "BTUTN": 7,696,531, "BTTYP": 22,293,846,       
              "BTTHT": 8,073,401, "BTUOS": 9,303,584, "BTTNN": 21,018,276, "BTUOW":           
              21,951,553, "BTTNW": 9,709,282, "BTTXU": 15,388,632, "BTUSO": 9,454,530,        
              "BTTST": 7,645,132, "BTTYT": 5,103,321, "BTTNB": 4,963,680, "BTUPG": 18,053,505,
              "BTTNA": 16,919,388.  



I don't really know what "hits" means in this context. All of the samples have reads mapped to the combined assembly (i.e., .bam and .bai files are >0 size). I can share the data as needed, just let me know. How can I get this merge command to work for all my samples? Thanks!

Jessica Jarett

Jun 29, 2017, 4:58:12 PM
to Anvi'o
Meant to include the version too; here it is:

anvi-profile --version
Anvi'o version ...............................: 2.3.2
Profile DB version ...........................: 20
Contigs DB version ...........................: 8
Pan DB version ...............................: 5
Samples information DB version ...............: 2
Genome data storage version ..................: 1
Auxiliary data storage version ...............: 3
Anvi'server users data storage version .......: 1

python --version
Python 3.5.2 :: Continuum Analytics, Inc.

Michael Lee

Jun 29, 2017, 5:59:12 PM
to an...@googlegroups.com
Hey there, Jessica,

It looks like one of your samples didn't have any reads recruited, so I think 'hits' here means mapped reads. A BAM file can still have a nonzero size even when no reads mapped, since the header alone takes up space.

I'm not sure which sample it is, but the anvi'o output lists the number of mapped reads for each sample, and one of them is 0. If you find that sample and remove it, things might work!

-mike 
--
Anvi'o Paper: https://peerj.com/articles/1319/
Project Page: http://merenlab.org/projects/anvio/
Code Repository: https://github.com/meren/anvio
---
You received this message because you are subscribed to the Google Groups "Anvi'o" group.
To unsubscribe from this group and stop receiving emails from it, send an email to anvio+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/anvio/2a264132-4393-4bbe-b2a7-51a3d3a9ff9c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
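Because the error lists every sample rather than only the failing one, the single zero is easy to miss among ~100 counts. A minimal Python sketch, assuming the `"SAMPLE": 1,234,567` formatting shown in the Config Error above, pulls the counts out of the message and reports only the zeros. `zero_hit_samples` is a hypothetical helper, not part of anvi'o.

```python
import re

# Matches '"SAMPLE": 21,496,921' pairs as printed in the Config Error above;
# \s* also spans the line wraps anvi'o inserts into long messages.
PAIR = re.compile(r'"(\w+)":\s*(\d+(?:,\d{3})*)')


def zero_hit_samples(error_text):
    """Return the sorted names of samples reported with zero mapped reads."""
    counts = {name: int(number.replace(",", ""))
              for name, number in PAIR.findall(error_text)}
    return sorted(name for name, count in counts.items() if count == 0)
```

Run on the saved log, e.g. `zero_hit_samples(open("merge_fulldata2.log").read())`, it should report only BTTZG for the run above.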

Mike Lee

Jun 29, 2017, 6:04:25 PM
to Anvi'o
Ah, actually it's sample BTTZG.

A. Murat Eren

Jun 29, 2017, 6:40:17 PM
to an...@googlegroups.com
Mike is correct. Anvi'o does not merge profiles if they don't contain at least one read :/ This was a control to make sure people don't try to merge failed profiling attempts.

We are hoping to change this behavior going forward, but as of today, if a profile is empty, it shouldn't be merged with others.


Best,

--

A. Murat Eren (meren)
http://merenlab.org :: gpg
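With ~200 profiles, editing the `anvi-merge` command line by hand is error-prone. A minimal sketch, assuming each profile sits in a directory named `profile-<SAMPLE>` as in this thread, expands the same glob used above and drops any sample in an exclusion set before assembling the arguments; it uses only the `-o`, `-c` and `-S` flags already shown in the thread. `build_merge_command` is a hypothetical helper, and it prints the command rather than running it.

```python
import glob
import os
import shlex

PREFIX = "profile-"


def build_merge_command(profile_glob, exclude, output_dir, contigs_db, name):
    """Return anvi-merge argv for every matching PROFILE.db not in `exclude`.

    Assumes paths like individual_profiles/profile-BTTZG/PROFILE.db, so the
    sample name is the parent directory minus the 'profile-' prefix.
    """
    kept = []
    for path in sorted(glob.glob(profile_glob)):
        folder = os.path.basename(os.path.dirname(path))
        sample = folder[len(PREFIX):] if folder.startswith(PREFIX) else folder
        if sample not in exclude:
            kept.append(path)
    return ["anvi-merge", *kept, "-o", output_dir, "-c", contigs_db, "-S", name]


if __name__ == "__main__":
    argv = build_merge_command(
        "individual_profiles/profile-B*/PROFILE.db",
        exclude={"BTTZG"},  # the zero-read sample identified in this thread
        output_dir="merge_fulldata2",
        contigs_db="combined_assembly/combined.db",
        name="merge_fulldata2",
    )
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The same exclusion set is a convenient place to park a suspect profile (such as BTUPP later in this thread) while checking whether the remaining samples merge cleanly.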

Jessica Jarett

Jun 29, 2017, 7:14:23 PM
to Anvi'o
You're right, that one slipped by. I've been looking at these for a while; guess I'm going cross-eyed! It's a good safeguard. I was just thrown off by seeing ALL the samples listed there instead of just the offending one. Thanks!

Jessica Jarett

Jun 29, 2017, 7:42:31 PM
to Anvi'o
Ok, one more issue. After removing the sample that didn't map, I get this error when merging, and the resulting profile doesn't load in interactive mode. I saw someone with a similar issue and the recommendation was to re-profile everything, but that is rather painful with 200 samples. Any other suggestions before I try that? :(


Traceback (most recent call last):
  File "/global/homes/j/jkjarett/miniconda2/envs/py3/bin/anvi-merge", line 50, in <module>
    merger.MultipleRuns(args).merge()
  File "/global/homes/j/jkjarett/miniconda2/envs/py3/lib/python3.5/site-packages/anvio/merger.py", line 367, in merge
    self.merge_split_coverage_data()
  File "/global/homes/j/jkjarett/miniconda2/envs/py3/lib/python3.5/site-packages/anvio/merger.py", line 247, in merge_split_coverage_data
    coverages_dict = sample_split_coverage_values.get(split_name)
  File "/global/homes/j/jkjarett/miniconda2/envs/py3/lib/python3.5/site-packages/anvio/auxiliarydataops.py", line 120, in get
    self.is_known_split(split_name)
  File "/global/homes/j/jkjarett/miniconda2/envs/py3/lib/python3.5/site-packages/anvio/auxiliarydataops.py", line 112, in is_known_split
    raise HDF5Error('The database at "%s" does not know anything about "%s" :(' % (self.file_path, split_name))
anvio.errors.HDF5Error: 

HDF5 Error: The database at "bad-profiles/profiles/profile-BTUPP/AUXILIARY-DATA.h5" does not
            know anything about "c_000000000201_split_00001" :(  

A. Murat Eren

Jun 29, 2017, 7:47:24 PM
to an...@googlegroups.com
This is not good news. Can you send back the output of this command, please:

    sqlite bad-profiles/profiles/profile-BTUPP/PROFILE.db 'select * from atomic_data_splits limit 10;'

Jessica Jarett

Jun 29, 2017, 8:05:10 PM
to Anvi'o
Is sqlite supposed to be included with anvio? I get "command not found" with that one.

Jessica Jarett

Jun 29, 2017, 8:06:24 PM
to Anvi'o
Got it, it just needs sqlite3 instead:

sqlite3 bad-profiles/profiles/profile-BTUPP/PROFILE.db 'select * from atomic_data_splits limit 10;'
c_000000000175_split_00001|0|0|0|1|1|0|0|0|c_000000000175
c_000000000374_split_00001|0|0|0|1|1|0|0|0|c_000000000374
c_000000000126_split_00001|0|0|0|1|1|0|0|0|c_000000000126
c_000000000095_split_00003|0|0|0|1|1|0|0|0|c_000000000095
c_000000000235_split_00001|0|0|0|1|1|0|0|0|c_000000000235
c_000000000202_split_00002|0|0|0|1|1|0|0|0|c_000000000202
c_000000000044_split_00003|0|0|0|1|1|0|0|0|c_000000000044
c_000000000084_split_00001|0|0|0|1|1|0|0|0|c_000000000084
c_000000000018_split_00002|0|0|0|1|1|0|0|0|c_000000000018
c_000000000431_split_00001|0|0|0|1|1|0|0|0|c_000000000431
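When neither `sqlite` nor `sqlite3` is on the PATH, Python's standard-library `sqlite3` module can run the same query against a PROFILE.db. The sketch below also checks whether a given split (for instance the one named in the HDF5 error) is listed in `atomic_data_splits`; it assumes, based on the output above, that the split name is the first column, and relies on no column names. Both helper names are hypothetical.

```python
import sqlite3
from contextlib import closing
from urllib.parse import quote


def _connect_read_only(path):
    # mode=ro opens the existing file without write access; unlike a plain
    # connect(), it will not silently create an empty database on a typo.
    return sqlite3.connect("file:{}?mode=ro".format(quote(path)), uri=True)


def atomic_split_rows(profile_db, limit=10):
    """Python equivalent of the sqlite3 query above."""
    with closing(_connect_read_only(profile_db)) as conn:
        return conn.execute(
            "SELECT * FROM atomic_data_splits LIMIT ?", (limit,)
        ).fetchall()


def profile_knows_split(profile_db, split_name):
    """Return True if `split_name` appears in atomic_data_splits.

    Assumes the split name is the first column, as in the output above.
    """
    with closing(_connect_read_only(profile_db)) as conn:
        rows = conn.execute("SELECT * FROM atomic_data_splits")
        return any(row[0] == split_name for row in rows)
```

Calling `profile_knows_split("bad-profiles/profiles/profile-BTUPP/PROFILE.db", "c_000000000201_split_00001")` would show whether the split missing from AUXILIARY-DATA.h5 is at least recorded in PROFILE.db, which may help separate a damaged auxiliary file from a profile that never saw that split.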

A. Murat Eren

Jun 29, 2017, 8:26:21 PM
to an...@googlegroups.com
This is weird, Jessica. Something bad might have happened during the profiling of this sample.

Can you send the resulting file please:

python -c "import anvio.auxiliarydataops as x; j = x.AuxiliaryDataForSplitCoverages('bad-profiles/profiles/profile-BTUPP/AUXILIARY-DATA.h5', None, ignore_hash=True); print('\n'.join(j.split_names_in_db))" | gzip > SPLITS_IN_AUXILIARY.txt.gz


I know it is long, and I will be surprised if it works, too :)


Jessica Jarett

Jun 29, 2017, 8:30:05 PM
to Anvi'o
:-/
I can definitely try re-profiling this one, I'll start it now.

python -c "import anvio.auxiliarydataops as x; j = x.AuxiliaryDataForSplitCoverages('bad-profiles/profiles/profile-BTUPP/AUXILIARY-DATA.h5', None, ignore_hash=True); print('\n'.join(j.split_names_in_db))" | gzip > SPLITS_IN_AUXILIARY.txt.gz
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'AuxiliaryDataForSplitCoverages' object has no attribute 'split_names_in_db'

A. Murat Eren

Jun 29, 2017, 9:00:05 PM
to an...@googlegroups.com
Oh, I apologize. The command I sent didn't work because I am using a more recent version of anvi'o than v2.3.2 :/

Please try merging without this sample first, to check that this error is not common to every other sample. If that is the case, I would need access to one of your profile directories and your contigs database to debug this. But this is definitely not an error we are familiar with.

Jessica Jarett

Jun 30, 2017, 4:29:31 PM
to Anvi'o
Great, it's just that one. I still need to re-merge with the re-profiled version of that sample, but the rest of them work. Thank you! 