Weighting by precipitation

50 views
Skip to first unread message

Eduardo Mariano

unread,
May 21, 2021, 11:39:30 AM5/21/21
to IsoriX
Dear colleagues,

I'm a postdoctoral researcher from Brazil and would like to thank you for providing "isoriX" for the R community. I'm an agronomist working ~12 years with stable isotopes (mainly 15N, focused on 15N-labeled fertilizers to trace their fate on the plant-soil system). Recently I moved to a lab where the natural abundance of 13C and 15N is used for ecological and forensic studies. Therefore, I'm working on a project to detect food fraud (i.e., to find out the possible place of origin of food products, such as coffee, soybean, etc) by using d18O and d2H. The "isoriX" R package saved my day since building an isoscape from scratch is definitely tricky. I already built an isoscape for 2H and 18O via "isoriX" for the Brazilian territory. However, I'm stuck on how to produce an isoscape weighted by precipitation amount (which is essential in Brazil due to the high variation throughout the year). I already read the excellent presentation for summer school, but I really don't know how to perform the aggregation by myself. I noticed an R script on Github about the weighting procedure but seems outdated since some functions are not present in the current version of "isoriX". Could you help me with that?

All the best,

Eduardo Mariano

Leonard Wassenaar

unread,
May 22, 2021, 8:43:19 AM5/22/21
to iso...@googlegroups.com, Leonard Wassenaar
Hi Eduardo - at IAEA we have a new high resolution mean annual precipitation amount weighted isotope surface for South America (see RCWIP in 2021 hydrological processes).  These files are huge, and we’d need to know what format you want (geotif or ascii).  Len 
--
You received this message because you are subscribed to the Google Groups "IsoriX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isorix+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/isorix/d366721f-6d78-4cf4-872f-3c5eec3fba3fn%40googlegroups.com.

Eduardo Mariano

unread,
May 22, 2021, 9:30:59 AM5/22/21
to iso...@googlegroups.com
Hi Leonard,
Many thanks for your reply! I'm happy to hear that IAEA produces isoscapes for stable isotopes (2H and 18O). Regarding the file format, I usually work with geotiff files within R, but ascii is also an option. I'm not a GIS expert, so your advice is more than appreciated about the file format to chose :)
Cheers,
Eduardo



--
Doutor em Solos e Nutrição de Plantas 
Escola Superior de Agricultura "Luiz de Queiroz"
Universidade de São Paulo
Brasil
Telefone: (19)98805-4348

Doctorate in Soil Science and Plant Nutrition
Luiz de Queiroz College of Agriculture
University of São Paulo
Brazil
Mobile: +55 1998805-4348

Alexandre Courtiol

unread,
May 22, 2021, 9:54:24 AM5/22/21
to iso...@googlegroups.com
Hi Eduardo,

You should try using the IAEA isoscape (Thanks Len!), but the issue is that you won't be able to perform an assignment on it that accounts for several important sources of uncertainty.
Since IsoriX is not just using the mean isoscape but also the prediction variance of the isoscape, the prediction covariances..., it should capture those uncertainties properly.
On the other hand, for now at least, you cannot include many predictors in IsoriX to get a very tailored isoscape which IAEA may offer.
As usual, I would recommend you to try and compare many options available out there.
If you get the same results, then you will feel confident that your results are true, which should be a must in forensic.
If you don't get the same results, comparing the results should point toward the assumptions that make a big difference and you will have learned something about your system.

Now the core question: how to do precipitation amount weighted isoscapes in IsoriX.
You have 2 options:

1). create one isoscape per month and then aggregate these isoscapes in a way that accounts for monthly precipitation amounts.
This is what is described and done in the example of the function isomultiscape() in IsoriX.
My feeling is that this way is superior to option 2 although it has never been investigated properly.
The benefits are that all is already coded in IsoriX: you would use the functions prepcipitate(), isomultifit() & isomultiscape().
The big drawback is that you cannot do calibrations based on such isoscapes (it is tricky and I have not yet coded it).
So that means that either you would need to do the calibration yourself, which means that you will probably won't be able to propagate the full uncertainty from the isoscape into the calibration and assignments.
Or you have to build an isoscape directly on coffee/soybean... from known origins instead of precipitation water.

2). follow the usual workflow but bypass the very first step: prepsources() and instead prepare by hand a dataset similarly as those produce by this function, e.g.:
                source_ID mean_source_value var_source_value n_source_value   lat  long elev
1                  ARKONA         -60.99231         247.8767            134 54.67 13.43   42
2                  ARTERN         -61.00653         510.5720            199 51.37 11.29  164
You would need to prepare it by hand because when computing mean_source_value and var_source_value, instead of doing a simple mean and variance, you would do a precipitation weighted mean and variance.
I could modify prepsources() to do that automatically, but I have not so far.
To compute the weighted means, you just need to use the native R function weighted.mean(x, w, ...).
To compute the weighted variance, you could simply use the function Hmisc::wtd.var(), check the help as there are different methods and options you could use.
I would either use normwt = TRUE or method = "ML" and I don't think it should make much difference.
So with this option you do need to perform a little bit of data preparation, but then everything should be easy.

Again, ideally, do both and compare :-)

If you don't know how to implement some suggestions, feel free to ask for more details, but that should hopefully already point you in the right direction.

Best,

Alex





--
You received this message because you are subscribed to the Google Groups "IsoriX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isorix+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/isorix/d366721f-6d78-4cf4-872f-3c5eec3fba3fn%40googlegroups.com.


--

Eduardo Mariano

unread,
May 22, 2021, 10:40:16 AM5/22/21
to IsoriX
Hi Alex,

Thank you so much for your instruction and comments. I'll follow your recommendation and compare different isoscapes (I also found H2 and 18O isoscapes from WaterIsotopes.com - probably built with isoMAP) before deciding which way to go. An important question: which parameters should I consider when comparing these isoscapes?
As related to isoriX, I think that option 2, although more time-consuming, allows me to calibrate known samples, isn't it? Unfortunately, I'm still getting coffee and soy samples, so the calibration of the isoscape will be done later. Since soybean is an annual plant (growing from ~Oct to ~Feb here in Brazil), I think that monthly isoscapes would be the best scenario for predictions about the place of origin, while coffee is a perennial crop and an annual isoscape would fit well. Please feel free to correct me.

Eduardo

Alexandre Courtiol

unread,
May 22, 2021, 11:06:28 AM5/22/21
to iso...@googlegroups.com
Re Eduardo,

There is no rule about how to compare isoscape.
I would plot them and look at them all.
Ultimately, the best test would be to compare the assignments for a not-so-small number of coffee and soy samples from known origines.
For this assessment to be valid, it is important that these samples are NOT used during the calibration step.
You should just treat them as real samples to assign and use those to measure the accuracy of your entire workflow.
The best would be that those samples come from different places and encompass the largest possible range of isotopic values.
So perhaps, if you have enough calibration samples, leave 20% of calibration samples aside to do such a test.

Once the best method is identified, you can put them back in for refining the calibration and take the benefit of all your data.

About the need for calibration: yes, unless you have calibration samples offering a coverage as good as the precipitation data, you will have to build the isoscape on precipitation water and not on coffee or soy directly.
That means you will need to calibrate your data.

Which months to consider?
I am afraid that this is a dark art that ideally should be rooted in physiological knowledge.
The question I guess is which precipitation events contributed to the atoms of O and H found in your sample...?
From what I have seen during summer schools and via interactions with other colleagues, I came to think that people have a very poor knowledge about that and that clearly research would be needed on this topic.
So I would check the robustness of your results varying a little the time windows around what seems sensible.
What you suggest may be sensible but I know nothing on the physiology of plants growing in Brazil.
I wonder for instance if when the seeds develop would be more important than when the plants grow.
Again, compare what different options tell you.
Perhaps uncertainty will be much more reduced under some scenarios than others...

If anyone else on this list knows more on that, please feel free to share!

++








Eduardo Mariano

unread,
May 23, 2021, 6:07:20 PM5/23/21
to IsoriX
Thanks again for your detailed answer. Regarding the months to be used for predictions, understanding plant physiology is key to avoid biased results. Although grains are produced in the reproductive phase of soybean (R1), metabolites containing oxygen formed in earlier stages of the plant can be remobilized to the grain. It is a tricky exercise and for sure future research must be done. In addition, since I'm using plant tissues (grains), I need to correct d2H and d18O values to match those with the source (precipitation) due to the fractionation that may occur in several biochemical processes and remobilization. Probably a Craig-Gordon model will be needed in this sense for 18O. It's not so trivial, but someone needs to do LOL... In the example found in isoriX using bats relative to deuterium, the non-exchangeable H in the fur is used to match with the d2H of the soil water (source). I'm afraid that this procedure is virtually impossible for plants, where a prior correction is required. At least for 18O.
Leonard: could you send me the URL to download the isoscapes for South America? I only found isoscapes created in 2013 at the IAEA website. Thanks in advance.

Cheers,

Eduardo

Alexandre Courtiol

unread,
May 24, 2021, 2:55:06 AM5/24/21
to iso...@googlegroups.com
Hi Eduardo,
In IsoriX the calibration step assumes a linear relationship (y = ax + b) between the isotopes in the tissue and those in the environment.
I don't know what would be needed for plants, but if the empirical relationship is not far from something linear, it should work.
Otherwise, please let us know since perhaps we could include something in IsoriX to ease the workflow for plant people :-)
++


Eduardo Mariano

unread,
May 24, 2021, 5:29:56 PM5/24/21
to iso...@googlegroups.com
Alex,

I'm trying to estimate isoscapes of amount-weighted precipitation following your suggestion n.2. However, the "weighted mean" column has missing values (NA) due to the lack of precipitation values for an entire site or only when some values are missing. Thus, I subsequently got errors when calculating the weighted variance. Is there any way to solve this? I'm wondering if suppressing NAs would be a good idea since 33% of the sites would be removed. Please find attached my R code as well as raw data.  

E

Isoscape-18O-optimized.R
Wiser_BulkData.csv

Alexandre Courtiol

unread,
May 25, 2021, 3:32:18 AM5/25/21
to iso...@googlegroups.com
Hi Eduardo,

1. You need to identify the country-year combinations for which there is no recorded precipitations at all:

library(dplyr)
GNIPData %>%
  group_by(source_ID, year) %>%
  summarise(precip = all(is.na(precip))) %>%
  filter(precip) %>%
  select(-precip) -> not_doable

2. You need to remove those with anit_join() and as it turns out that any NA in the weight makes weighted.mean() to produce NA, I used HMisc::wtd.mean() instead:

GNIPDataEUagg <- GNIPData %>%
  anti_join(not_doable) %>%
  group_by(source_ID, lat, long, elev) %>%
  summarise(mean_source_value = Hmisc::wtd.mean(source_value, weights = precip, normwt = TRUE, na.rm = TRUE),
            var_source_value = Hmisc::wtd.var(source_value, weights = precip, normwt = TRUE, na.rm = TRUE),
            n_source_value = length(source_value)) %>%
  select(source_ID, mean_source_value, var_source_value, n_source_value, lat, long, elev)

That seems to work:

> GNIPDataEUagg
# A tibble: 28 x 7
# Groups:   source_ID, lat, long [28]

   source_ID                    mean_source_value var_source_value n_source_value    lat  long  elev
   <fct>                                    <dbl>            <dbl>          <int>  <dbl> <dbl> <dbl>
 1 BELEM                                    -2.72             8.80            232  -1.43 -48.5    24
 2 BELEM PIRACICABA                         -3.38             3.40             22  -1.43 -48.5    24
 3 BELO HORIZONTE-CDTN                      -6.11             9.30             91 -19.9  -44.0   857
 4 BENJAMIN CONSTANT                        -6.47             2.19              4  -4.38 -70.0    65
 5 BETANIA - FAZENDA CONCEICAP.             -4.89             5.51             11  -8.18 -37.9   490
 6 BETANIA - MET.STATION                    -3.77             9.85             12  -8.26 -38.0   450
 7 BOA VISTA                                -4.74            11.2               6   2.83 -60.7    78
 8 BRASILIA (AIRPORT)                       -4.85             9.47            136 -15.8  -47.9  1061
 9 CACHIMBO                                 -3.29             9.54             13  -5.98 -39.6   555
10 CAMPO GRANDE                             -6.50            10.5              64 -20.5  -54.7   572
# … with 18 more rows

I did not check things after that, since I prefer to stick to the issue you raised.

Small remark: I would not load "plyr" which is largely outdated and conflicts with tidyverse (which loads dplyr).

++


Eduardo Mariano

unread,
May 25, 2021, 10:40:52 AM5/25/21
to iso...@googlegroups.com
Thanks, Alex! Your code worked like a charm. Exactly, the plyr package conflicts with dplyr in some functions (.e.,g, summarize). I forgot to remove it.
Cheers,

Reply all
Reply to author
Forward
0 new messages