[R] Extracting values from an ecdf (empirical cumulative distribution function) curve

Manoranjan Muthusamy

Oct 31, 2013, 8:25:46 AM
to r-h...@r-project.org
Hi R users,

I am a new user, still learning the basics of R. Is there any way to extract
the y (or x) value for a known x (or y) value from an ecdf (empirical
cumulative distribution function) curve?

Thanks in advance.
Mano.

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Rui Barradas

Oct 31, 2013, 5:53:28 PM
to Manoranjan Muthusamy, r-h...@r-project.org
Hello,

As for the problem of finding y given the ecdf and x, it's very easy,
just use the ecdf:

f <- ecdf(rnorm(100))

x <- rnorm(10)
y <- f(x)

If you want to get the x corresponding to given y, use linear interpolation.

inv_ecdf <- function(f){
  x <- environment(f)$x
  y <- environment(f)$y
  approxfun(y, x)
}

g <- inv_ecdf(f)
g(0.5)
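
A minimal sketch of both lookups on a small fixed sample, so the results can be checked by hand. The sample values are chosen for illustration only and are not the poster's data. It also shows that the interpolated function is only an approximate inverse: the ecdf is a step function, so mapping a probability to x and back does not always return the same probability.

```r
# Small fixed sample (illustrative values, not from the original question).
X <- c(101, 103, 107, 111)
f <- ecdf(X)

# Forward lookup: proportion of observations <= each query value.
p <- f(c(100, 103, 110))

# Inverse lookup by linear interpolation between the ecdf's knots,
# following the inv_ecdf() idea above.
inv_ecdf <- function(f) {
  x <- environment(f)$x   # sorted unique data values
  y <- environment(f)$y   # cumulative proportions at those values
  approxfun(y, x)
}
g <- inv_ecdf(f)
q <- g(c(0.5, 0.625))

# Round trip: g(0.625) lies between two data points, so f() steps
# back down to the proportion at the lower one.
round_trip <- f(q)
```

Here `p` should hold 0, 0.5 and 0.75, `q` should hold 103 and 105, and `round_trip` should be 0.5 for both elements.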


Hope this helps,

Rui Barradas

Manoranjan Muthusamy

Oct 31, 2013, 9:18:26 PM
to Rui Barradas, r-h...@r-project.org
Thank you, Barradas. It works when finding y, but when I tried to find x
using interpolation for a known y it gives 'NA' (for whatever y value). I
couldn't find out the reason. Any help is really appreciated.

Thanks,
Mano

William Dunlap

Oct 31, 2013, 9:48:25 PM
to Manoranjan Muthusamy, Rui Barradas, r-h...@r-project.org
> it gives 'NA' (for whatever y value).

What 'y' values were you using? inv_f maps probabilities (in [0,1]) to
values in the range of the original data, x, but it will have problems for
a probability below 1/length(x) because the original data didn't tell
you anything about the ecdf in that region.

> X <- c(101, 103, 107, 111)
> f <- ecdf(X)
> inv_f <- inv_ecdf(f)
> inv_f(seq(0, 1, by=1/8))
[1] NA NA 101 102 103 105 107 109 111
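
If the NA values for small probabilities are unwanted, two possible workarounds are sketched below on the same four-point sample. Neither was proposed in the thread; both are assumptions about what a caller might prefer. The first clamps out-of-range probabilities to the nearest end point via the `rule = 2` argument of `approxfun()`. The second uses `quantile()` with `type = 1`, which the R documentation describes as the inverse of the empirical distribution function, so it returns only observed data values and does no interpolation.

```r
X <- c(101, 103, 107, 111)
f <- ecdf(X)
probs <- seq(0, 1, by = 1/8)

# Option 1: keep the linear interpolation, but return the closest
# data extreme for probabilities outside the range of the knots.
inv_clamped <- approxfun(environment(f)$y, environment(f)$x, rule = 2)
a <- inv_clamped(probs)

# Option 2: type-1 sample quantiles (step-function inverse of the ecdf).
b <- quantile(X, probs, type = 1, names = FALSE)
```

With option 1 the two leading NA values become 101 and the remaining values are unchanged; with option 2 every result is one of the original observations.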

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

Duncan Mackay

Oct 31, 2013, 10:40:14 PM
to Manoranjan Muthusamy, R
Hi

There is a print method for ecdf

So print(f) should give you an idea of what is going on

See ?ecdf
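
A short sketch of ways to look inside an ecdf object, assuming a small made-up sample. Besides `print()`, the `knots()` function returns the jump locations, and `ls()` on the function's environment lists the names the object actually stores.

```r
X <- c(101, 103, 107, 111)   # illustrative sample
f <- ecdf(X)

print(f)                     # the call plus a summary of the x values
k <- knots(f)                # jump locations: sorted unique data values
vars <- ls(environment(f))   # names stored with the function, incl. x and y
```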

HTH

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

Manoranjan Muthusamy

Nov 1, 2013, 7:37:37 AM
to William Dunlap, dulc...@bigpond.com, r-h...@r-project.org
Thanks, Bill & Duncan. Actually I tried values which are inside the defined
region. Please find the extracted script below.

> xnew<-rlnorm(seq(0,4000000,10000), meanlog=9.7280055, sdlog=2.0443945)
> f <- ecdf(xnew)
> y <- f(x)
> y1<-f(2000000) ## finding y for a given xnew value of 2000000
> y1
[1] 0.9950125 ## It works.

> inv_ecdf <- function(f){
+ xnew <- environment(f)$xnew
+ y <- environment(f)$y
+ approxfun(y, xnew)
+ }
## Interpolation to find xnew for a known y value.

> g <- inv_ecdf(f)
> g(0.9950125)
[1] NA
> g(0.99) ## It doesn't
[1] NA
> g(0.5)
[1] NA ## again
> g(0.2)
[1] NA ## and again


I am stuck here. Any help is appreciated.

Mano.

William Dunlap

Nov 1, 2013, 10:54:13 AM
to Manoranjan Muthusamy, dulc...@bigpond.com, r-h...@r-project.org
You are not using the inv_ecdf function that Rui sent. His was

inv_ecdf_orig <- function (f)
{
  x <- environment(f)$x
  y <- environment(f)$y
  approxfun(y, x)
}
(There is no 'xnew' in the environment of f.)
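
A sketch of how the problem can be checked directly, assuming a simulated sample with the same parameters as in the question (the seed and the sample size of 401 are arbitrary choices here). The ecdf stores its knots under the names `x` and `y` regardless of what the input vector was called, so `environment(f)$xnew` is NULL. With NULL as the second argument, `approxfun()` appears to fall back to interpolating its first argument against the index 1, 2, ..., which would explain why every query below 1 came back as NA.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
xnew <- rlnorm(401, meanlog = 9.7280055, sdlog = 2.0443945)
f <- ecdf(xnew)

missing_name <- environment(f)$xnew   # NULL: no such name is stored
stored_x     <- environment(f)$x      # sorted unique values of xnew

# Reproduces the NA from the question.
bad <- approxfun(environment(f)$y, missing_name)
bad_result <- bad(0.5)

# Using the stored names gives a finite value inside the data range.
g <- approxfun(environment(f)$y, stored_x)
med <- g(0.5)
```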

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

Manoranjan Muthusamy

Nov 1, 2013, 1:41:45 PM
to William Dunlap, r-h...@r-project.org
Yeah, now it works. Thanks a lot, William and everyone who helped me. This
forum is really helpful for beginners like me. :)

Mano.

