Random sampling like pandas.DataFrame.sample()


Emma Worthington

May 5, 2020, 11:39:36 AM
to xarray
Hi,
Hopefully this is self-explanatory. I'm looking for a function that resamples a Dataset randomly along a dimension (e.g., time), and returns n values.

pandas.DataFrame.sample() seems to do exactly what I want in pandas.

Is there an equivalent in xarray? Searching has revealed nothing.

thanks,
Emma

Chuan-Yuan Hsu

May 5, 2020, 12:21:29 PM
to xarray
How about generating random indices with np.random.randint?

df.isel(time=np.random.randint(0, df.time.size, n))

I believe this will give you n integers drawn from a uniform distribution over the range from 0 up to (but not including) your time size, which isel then uses as positions along time.
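A minimal sketch of the suggestion above, assuming a toy Dataset with a "time" dimension (the variable name and values are made up for illustration). It uses rng.integers from NumPy's Generator API, which plays the same role as np.random.randint here; note that positions can repeat, so this is sampling with replacement.

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for the real data (hypothetical values).
ds = xr.Dataset(
    {"temperature": ("time", np.arange(10.0))},
    coords={"time": np.arange(10)},
)

n = 4
rng = np.random.default_rng(seed=0)  # seeded only to make the sketch repeatable

# Integer positions drawn uniformly from [0, size); repeats are possible.
positions = rng.integers(0, ds.sizes["time"], size=n)

# isel with an integer array picks those positions along "time".
sample = ds.isel(time=positions)
```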


Chuan-Yuan Hsu

--
You received this message because you are subscribed to the Google Groups "xarray" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xarray+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xarray/d4d7a342-ed78-421a-8fa6-94bf27583c98%40googlegroups.com.

Val Schmidt

May 5, 2020, 1:31:01 PM
to xar...@googlegroups.com
This is a good suggestion. 

One more thing: be careful if you care about sampling the same value more than once. If so, you need to check that your randomly generated integers are unique. The pandas method you posted has a flag for this, but here you'll have to do it yourself.
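One possible way to do that check, sketched with plain NumPy (the dimension length and sample count below are hypothetical): compare the number of distinct positions with the number drawn.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
size, n = 10, 6  # hypothetical dimension length and sample count

positions = rng.integers(0, size, size=n)

# np.unique drops repeats, so a shorter result means some position was drawn twice.
has_repeats = np.unique(positions).size < positions.size
```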

-Val

Stephan Hoyer

May 5, 2020, 1:32:56 PM
to xarray
numpy.random.choice(df.time.size, size=n, replace=False) would let you sample without replacement.
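As a rough sketch of how this could be wrapped up, assuming a toy Dataset again: sample_along_dim below is a hypothetical helper, not an xarray function, and it uses the Generator version of choice rather than numpy.random.choice.

```python
import numpy as np
import xarray as xr


def sample_along_dim(obj, dim, n, replace=False, seed=None):
    """Hypothetical helper, not part of xarray: pick n random positions along dim."""
    rng = np.random.default_rng(seed)
    # With replace=False every position is drawn at most once.
    positions = rng.choice(obj.sizes[dim], size=n, replace=replace)
    return obj.isel({dim: positions})


# Toy dataset standing in for the real data (hypothetical values).
ds = xr.Dataset(
    {"temperature": ("time", np.arange(10.0))},
    coords={"time": np.arange(10)},
)
subset = sample_along_dim(ds, "time", n=4, seed=0)
```

With replace=False, choice raises a ValueError if n is larger than the length of the dimension, so there is no need for a separate uniqueness check.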

Emma Worthington

May 6, 2020, 4:30:10 AM
to xarray
Thanks everyone - numpy.random.choice is what I used, but I was just being lazy and hoping there was an xarray function to let me save a few lines of code.


Ondřej Grover

May 6, 2020, 9:39:56 AM
to xar...@googlegroups.com, Jakub Seidl
Dear all,

Following up on this discussion, I would like to point out that my colleague Jakub Seidl and I are developing a library called xr-random (https://github.com/smartass101/xr-random), which wraps various scipy.stats rvs methods and arch bootstrap classes for convenient MC-like error propagation calculations (that's our primary use case, but there could be more), with automatic broadcasting and parallelization with Dask. It is still a work in progress, but usable for most purposes.
I suppose that, for instance, Emma's use case can be done like so:

from xrrandom import bootstrap_samples
samples = bootstrap_samples(ds, 'time', n)  # by default does an IIDBootstrap, others are available

We intend to eventually merge at least the scipy.stats part into xr-scipy.
Comments, ideas, PRs, etc. are very welcome!

Regards,
Ondrej G.
 
