Discrepency between 3D emperor plot and make_2D_plot.py on jackknifing ellipsoid

jincheng.wang1986

unread,

May 20, 2015, 4:37:27 PM5/20/15

to qiime...@googlegroups.com

Dear Guys,

I was using jackknifed_beta_diversity.py trying to get uncertainty estimates of my beta diversity and I got the emperor plot with good results.
jackknifed_beta_diversity.py -i sorted_otu_table_smd.biom -t treeGTRGAMMA.tre -m AF_Mapping_All.txt -o JACK/ -e 18000
The problem is when I tried to produce the 2D plots as well. No matter how I change the ellipsoid settings, I could only see tiny or none ellipsoid on my plots. I am not sure why this happens.
make_2d_plots.py -i JACK/unweighted_unifrac/pcoa -m AF_Mapping_All.txt -b 'Treatment' --ellipsoid_opacity=0.5 --ellipsoid_method=IQR -o TEST/
Please take a look at attached files.
Thank you very much!

Jincheng

3D.svg

treeGTRGAMMA.tre

PC1_vs_PC3_plot.png

PC1vsPC2plot.eps.gz

PC1vsPC3plot.eps.gz

PC3_vs_PC2_plot.png

PC3vsPC2plot.eps.gz

PC1_vs_PC2_plot.png

AF_Mapping_All.txt

sorted_otu_table_smd.biom

Jai Ram Rideout

unread,

May 20, 2015, 7:30:22 PM5/20/15

to qiime...@googlegroups.com

Hi Jincheng,

I'm able to reproduce the issue in QIIME 1.8.0 and 1.9.0. I'm not sure why the ellipsoids are so much smaller in the 2D plots vs. Emperor. I've contacted the Emperor developers to see if they have any ideas about this discrepancy.

Best,

Jai

--

---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jincheng Wang

unread,

May 21, 2015, 10:58:46 AM5/21/15

to qiime...@googlegroups.com

Hi Jai,

Thank you for your response. I tried several methods to walk around this problem but not being able to. The transform_coordinates_matrices.py in QIIME seems not working right. I always got errors, not sure why.

So I decide to go simple and seems not to work either. I was wondering if in the 3d interactive plot that emperor produced, we could edit the axis. It seems the function in that tab is disabled for some reason. Do you know if we could enable it? What my thoughts are just kill one axis in the 3d plot.

Thanks!

Jincheng

Jincheng Wang

unread,

May 21, 2015, 11:00:54 AM5/21/15

to qiime...@googlegroups.com

Or is there a way we could change the camera angle in the emperor plot?

On Wed, May 20, 2015 at 7:30 PM, Jai Ram Rideout <jai.r...@gmail.com> wrote:

Message has been deleted

Jincheng Wang

unread,

May 21, 2015, 9:53:52 PM5/21/15

to qiime...@googlegroups.com

Dear Jai,

Thank you very much!
Regarding to the transform_coordinate_matrices.py, firstly it cannot be used for my jackknifing PC files. Always got following error message:
Error in transform_coordinate_matrices.py: No filepaths match pattern/name ''. All patterns must be matched at least once.

The scripts I used:

transform_coordinate_matrices.py -i pcoa/pcoa_unweighted_unifrac_rarefaction_18000_0.txt, pcoa/pcoa_unweighted_unifrac_rarefaction_18000_1.txt -o test2/

I don't know why, the filepaths should be right because I used tab-completion. I then thought it might be because the format of the file which is different from the PC file created use the beta_diversity scripts. Consequently, I changed all the formats to be identical to beta_diversity.py output file, still no luck, same error messages.

I then tried to apply the scripts in the documentation:

transform_coordinate_matrices.py -i unweighted_unifrac_pc.txt,weighted_unifrac_pc.txt -o proscrustes_output/
and got another error messages:

Traceback (most recent call last):
File "/usr/local/bin/transform_coordinate_matrices.py", line 187, in <module>
    main()
File "/usr/local/bin/transform_coordinate_matrices.py", line 149, in main
    max_dimensions=num_dimensions)
File "/usr/local/lib/python2.7/dist-packages/qiime/transform_coordinate_matrices.py", line 134, in get_procrustes_results
    ord_res_1 = OrdinationResults.read(coords_f1)
File "/usr/local/lib/python2.7/dist-packages/skbio/io/_registry.py", line 700, in read
    return skbio_io_read(fp, into=cls, format=format, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/skbio/io/_registry.py", line 601, in read
    format, fmt_kwargs = sniff(fp, cls=into, mode=mode)
File "/usr/local/lib/python2.7/dist-packages/skbio/io/_registry.py", line 531, in sniff
    % str(fp))
skbio.io._exception.UnrecognizedFormatError: Cannot guess the format for <open file 'unweighted_unifrac_pc.txt', mode 'U' at 0x60aed20>.

This is where I decided to give up.

Thanks!

Jincheng

On Thu, May 21, 2015 at 8:23 PM, Jai Ram Rideout <jai.r...@gmail.com> wrote:

Hello,

We're having an internal discussion among the QIIME developers about this discrepancy -- someone will follow up in the next couple of days.

Can you please provide more details about the issues you're having with transform_coordinate_matrices.py?

I don't think there is a way to edit the axes in Emperor when viewing jackknifed plots. I believe there are plans to add 2D support to Emperor in the future. There's an open feature request here; you might consider commenting on that to indicate that you'd find this a useful feature.

Best,
Jai

Jai Ram Rideout

unread,

May 21, 2015, 8:24:00 PM5/21/15

to qiime...@googlegroups.com

Hello,

We're having an internal discussion among the QIIME developers about this discrepancy -- someone will follow up in the next couple of days.

Can you please provide more details about the issues you're having with transform_coordinate_matrices.py?

I don't think there is a way to edit the axes in Emperor when viewing jackknifed plots. I believe there are plans to add 2D support to Emperor in the future. There's an open feature request here; you might consider commenting on that to indicate that you'd find this a useful feature.

Best,

Jai

On Thu, May 21, 2015 at 8:00 AM, Jincheng Wang <jincheng...@gmail.com> wrote:

Antonio González Peña

unread,

May 22, 2015, 11:06:59 AM5/22/15

to Qiime Forum

Hi Jincheng,

First of all thank you for reporting this.

Now, the algorithm to create the jackknifed plots is:
# Create the reference PCoA
1. Compute distance matrix of the raw OTU table (without subsampling)
2. Compute PCoA on the previous distance matrix - this becomes the
"reference PCoA"
# Create the iterative PCoAs
3. Sub-sample the OTU table to depth D, N times
4. Compute individual distance matrix of N OTU tables
5. Compute PCoA of N distance matrices
6. (optional - this is the difference, more below) Transform the N
PCoA agains the reference
# Plot
7. Calculate the variance between reference and the N PCoAs
8. Plot the variance ellipsoids around the reference centroids

As stated in the previous list the difference between what you are
seeing in make_2d_plots.py and make_emperor.py is step 6. This step is
intended to "fix" the PCoA artifacts that _could_ result in different
orientation of the same samples. Remember that in PCoA the direction
of the axis is not formally defined (as it is a simple reflection) and
it could change based on floating point values, which could be the
result of hardware/OS specific configurations. This means, the same
distance matrix could produce the same PCoA but rotated if ran in
different hardware/OS.

The currently "accepted" methods to solve the artifacts are: (1) do
nothing (make_emperor.py) and assume that the samples are different
enough and you are doing enough iterations to mitigate the _possible_
PCoA artifacts, (2) check the signs of the reference (directions) and
fix them in the iterations (make sure they are in the same
orientation), and (3) use procrustes (make_2d_plots.py) to fix not
only orientation but also the shape of the iterative PCoA. As you can
imagine, all of them have pros and cons and you should decide which
one is the best.

Finally, in the current implementations we only have one of the
previous list of transformations so the plan is to add all of them
into make_emperor.py (see issue
https://github.com/biocore/emperor/issues/402) and as Jai said we are
also planning in adding 2D plotting into make_emperor.py.

Hope this clarifies your question.

Antonio

Jai Ram Rideout

unread,

May 22, 2015, 11:34:53 AM5/22/15

to qiime...@googlegroups.com

Thanks Antonio for following up with details about the discrepancy between make_2d_plots.py and make_emperor.py!

Jincheng, to answer your latest questions about transform_coordinate_matrices.py:

Your first command has a space between the two input files. The filepaths should be separated by a comma, without a space:

-i pcoa/pcoa_unweighted_unifrac_rarefaction_18000_0.txt,pcoa/pcoa_unweighted_unifrac_rarefaction_18000_1.txt

Your second command indicates that your input files aren't in the expected format. Can you please send me unweighted_unifrac_pc.txt so I can take a look?

Please also send the output of running print_qiime_config.py.

Thanks,

Jai

Jincheng Wang

unread,

May 22, 2015, 2:35:14 PM5/22/15

to qiime...@googlegroups.com

Thanks Jai.

I thought about the space between sample names, but even if I delete the space, I still got the same error.

I also attached unweighted_unifrac_pc.txt with this email and the output of config in below.

Again please ignore me if this took too much time to figure it out. I have found a walk-around as I mentioned in previuos email.

Thank you all again for help.

Jincheng

System information
==================
         Platform:    linux2
   Python version:    2.7.3 (default, Dec 18 2014, 19:10:20) [GCC 4.6.3]
Python executable:    /usr/bin/python

QIIME default reference information
===================================
For details on what files are used as QIIME's default references, see here:
https://github.com/biocore/qiime-default-reference/releases/tag/0.1.1

Dependency versions
===================
          QIIME library version:    1.9.0
           QIIME script version:    1.9.0
qiime-default-reference version:    0.1.1
                  NumPy version:    1.9.1
                  SciPy version:    0.15.1
                 pandas version:    0.15.2
             matplotlib version:    1.4.3
            biom-format version:    2.1.3
                   h5py version:    2.4.0 (HDF5 version: 1.8.4)
                   qcli version:    0.1.1
                   pyqi version:    0.3.2
             scikit-bio version:    0.2.3
                 PyNAST version:    1.2.2
                Emperor version:    0.9.51-dev
                burrito version:    0.9.0
       burrito-fillings version:    Installed.
              sortmerna version:    SortMeRNA version 2.0, 29/11/2014
              sumaclust version:    SUMACLUST Version 1.0.00
                  swarm version:    Swarm 1.2.19 [Feb 25 2015 16:12:06]
                          gdata:    Installed.

QIIME config values
===================
For definitions of these settings and to learn how to configure QIIME, see here:
http://qiime.org/install/qiime_config.html
http://qiime.org/tutorials/parallel_qiime.html

                     blastmat_dir:    /qiime_software/blast-2.2.22-release/data
      pick_otus_reference_seqs_fp:    /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                         sc_queue:    all.q
      topiaryexplorer_project_dir:    None
     pynast_template_alignment_fp:    /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set_aligned/85_otus.fasta
                  cluster_jobs_fp:    start_parallel_jobs.py
pynast_template_alignment_blastdb:    None
assign_taxonomy_reference_seqs_fp:    /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                     torque_queue:    friendlyq
                    jobs_to_start:    1
            denoiser_min_per_core:    50
assign_taxonomy_id_to_taxonomy_fp:    /usr/local/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt
                         temp_dir:    /tmp/
                     slurm_memory:    None
                      slurm_queue:    None
                      blastall_fp:    /qiime_software/blast-2.2.22-release/bin/blastall
                 seconds_to_sleep:    1

unweighted_unifrac_pc.txt

Jincheng Wang

unread,

May 22, 2015, 2:39:12 PM5/22/15

to qiime...@googlegroups.com

Ok the below reply seems not to go through because of too big of my attachment. I paste them below again:

Hi Antonio,

Thank you very much for your clarification.

I understood the general process, but I have confusions on Step 2, how to computer PCoA coordinates. From a rough thinking, actually the coordinates of at least N+1 points are arbitrary selected, where N is the dimension of the space (total number of principle components) that we assume our sample points to be in. By saying that, if we have m samples, we could at most get m-1 principle components. I assume the coordinates of my samples from this step should be the reference PCoA and it should be what recorded in the unweighted_unifrac_pc.txt from beta_diversity.py. However, in the attached file there are 12 vectors (should have only 11) and with the 12th vector suspiciously close to zero.

But when doing the jackknife beta diversity analysis in QIIME, it seems it did not use this concept. It does not use a reference PCoA and from somewhere I saw, it seems it randomly selected one from the subsamples as reference, which might be problematic. Because from my perspective, I will use the raw OTU table to get a reference PCoA. Here even I said raw OTU, this OTU is actually subsampled from the original OTU table to assure the same sample depth of all samples. Then I will do a subsampling of the raw OTU at depth of 75% of the OTU I used for reference multiple times per sample as a measure of uncertainty.

But I find it is more complicated that even assume we do as what QIIME is doing (randomly select one PCoA as reference), the reference PCoA selected in the jackknife beta diversity analysis did not match any of the PCoA of my jackknifed subsamples. I found this when I open the emperor plots of my jackknife beta diversity analysis and use my browser's source code function and found the coordinates used to create the plot. I found that set of coordinates did not match any of my subsamples. So I am a little confused.

Also in step 6, you mentioned calculating variance. But how do you define variance. my original thinking is that as previously mentioned, since PCoA from a distance matrix should assume coordinates for many data points, PCoA for different samples are likely not in the same space (coordinate system), and to make them comparable, we have to transform them according to one coordinate system (reference). When making this transformation, one has to assume the data points from the same sample should be close. I guess this is what you mean by fix the PCoA artifacts. After we got all subsamples in the same PCoA coordinate systems, we could get a variation of one sample. Is this what QIIME is doing?

Regarding to make_2D, I still not sure why it is so different, I thought it is just that we create 3D plot and make projections on each 2D space. But please ignore me if this is too complicate to explain. And this is exactly what I do as a walk-around of the current discrepancy. I took the coordinates from the emperor webpage as I mentioned previously, and plot PC1vsPC2 use x, y values and use width and heights value to plot the ellipsoid. I am doing Ok. See attached file Inline image 1

unweighted_unifrac_pc.txt

Justin Kuczynski

unread,

May 22, 2015, 4:36:38 PM5/22/15

to qiime...@googlegroups.com

Hi folks, good to see the deep investigation of PCoA plots and ellipsoids. Let me see if I can comment on a few of these.

On Fri, May 22, 2015 at 11:39 AM, Jincheng Wang <jincheng...@gmail.com> wrote:

Ok the below reply seems not to go through because of too big of my attachment. I paste them below again:

Hi Antonio,

Thank you very much for your clarification.

I understood the general process, but I have confusions on Step 2, how to computer PCoA coordinates. From a rough thinking, actually the coordinates of at least N+1 points are arbitrary selected, where N is the dimension of the space (total number of principle components) that we assume our sample points to be in. By saying that, if we have m samples, we could at most get m-1 principle components. I assume the coordinates of my samples from this step should be the reference PCoA and it should be what recorded in the unweighted_unifrac_pc.txt from beta_diversity.py. However, in the attached file there are 12 vectors (should have only 11) and with the 12th vector suspiciously close to zero.

Right, the 12th dimension should contain only rounding / floating point arithmetic errors.

But when doing the jackknife beta diversity analysis in QIIME, it seems it did not use this concept. It does not use a reference PCoA and from somewhere I saw, it seems it randomly selected one from the subsamples as reference, which might be problematic. Because from my perspective, I will use the raw OTU table to get a reference PCoA. Here even I said raw OTU, this OTU is actually subsampled from the original OTU table to assure the same sample depth of all samples. Then I will do a subsampling of the raw OTU at depth of 75% of the OTU I used for reference multiple times per sample as a measure of uncertainty.

there is a way to use a reference PCoA I'm fairly certain, see the following from make_emperor.py -h :

"""

...

Jackknifed PCoA plot with a master coordinates file: Passing a master coordinates file (--master_pcoa) will display the ellipsoids centered by the samples in this file:

make_emperor.py -i unweighted_unifrac_pc -s unweighted_unifrac_pc/pcoa_unweighted_unifrac_rarefaction_110_5.txt -m Fasting_Map.txt -o jackknifed_with_master

...

"""

But I find it is more complicated that even assume we do as what QIIME is doing (randomly select one PCoA as reference), the reference PCoA selected in the jackknife beta diversity analysis did not match any of the PCoA of my jackknifed subsamples. I found this when I open the emperor plots of my jackknife beta diversity analysis and use my browser's source code function and found the coordinates used to create the plot. I found that set of coordinates did not match any of my subsamples. So I am a little confused.

not sure what you mean here, perhaps include the file used as reference?

Also in step 6, you mentioned calculating variance. But how do you define variance. my original thinking is that as previously mentioned, since PCoA from a distance matrix should assume coordinates for many data points, PCoA for different samples are likely not in the same space (coordinate system), and to make them comparable, we have to transform them according to one coordinate system (reference). When making this transformation, one has to assume the data points from the same sample should be close. I guess this is what you mean by fix the PCoA artifacts. After we got all subsamples in the same PCoA coordinate systems, we could get a variation of one sample. Is this what QIIME is doing?

Yes, pretty much. They're not in the same coordinate system, but one can still compute similarities between clouds of points. The artifacts probably refer to the arbitrary sign of each axis though - that's a different thing.

Regarding to make_2D, I still not sure why it is so different, I thought it is just that we create 3D plot and make projections on each 2D space. But please ignore me if this is too complicate to explain. And this is exactly what I do as a walk-around of the current discrepancy. I took the coordinates from the emperor webpage as I mentioned previously, and plot PC1vsPC2 use x, y values and use width and heights value to plot the ellipsoid. I am doing Ok. See attached file

Great. I don't see the difference you refer to, perhaps some screen shots? But it does look like you've got some good plots.

Best!

Jai Ram Rideout

unread,

May 22, 2015, 7:11:55 PM5/22/15

to qiime...@googlegroups.com

Hi Jincheng,

To answer your question about transform_coordinate_matrices.py not recognizing the input file you attached:

This file appears to have been generated with an older version of QIIME (pre-1.9.0) than you have installed (1.9.0). If you rerun the commands with 1.9.0 that you used to generate your principal coordinates files it should work.

-Jai

Reply all

Reply to author

Forward