Hi,
I would suggest patching the last few lines of rescaleData as follows:
- d2 = data-offset
- d2 *= scale
- data = d2.astype(dtype)
+ data = (data.astype(dtype) - offset) * scale
The reason is that d2 *= scale can trigger a RuntimeWarning: if data happens to be a uint16 array, for example, multiplying it in place by a large float64 scale overflows, even though we want to cast the data to float64 immediately afterwards anyway.
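For concreteness, here is a minimal standalone sketch of the proposed version (rescale_data is a simplified stand-in for pyqtgraph's rescaleData, with names trimmed for the example):

import numpy as np

def rescale_data(data, scale, offset, dtype=np.float64):
    # Cast to the target float dtype first; all subsequent arithmetic
    # happens in float64, so an integer input such as uint16 can no
    # longer overflow or warn along the way.
    return (data.astype(dtype) - offset) * scale

print(rescale_data(np.array([0, 1, 65535], dtype=np.uint16), scale=1e6, offset=2))
# -> approximately [-2.0e+06, -1.0e+06, 6.5533e+10]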
As a side note, while I haven't tried it, I would guess that the weave implementation of rescaleData can't be faster than a simple numpy expression ("(data.astype(dtype) - offset) * scale") if it must invoke a compiler every time...
On Friday, September 6, 2013 9:10:01 PM UTC-7, Luke Campagnola wrote:
> I am hoping to replace weave with a small library of pre-built C functions that would support most platforms while not requiring any compiler (and still falling back to pure python when necessary). Not yet sure how feasible this is..
On Saturday, September 7, 2013 at 2:59:26 PM UTC+8, Guillaume Poulin wrote:
> On Saturday, September 7, 2013 at 12:10:01 PM UTC+8, Luke Campagnola wrote:
>> I am hoping to replace weave with a small library of pre-built C functions that would support most platforms while not requiring any compiler (and still falling back to pure python when necessary). Not yet sure how feasible this is..
> It's easily feasible; it's what I do in one of my projects: https://github.com/gpoulin/pybeem/tree/folding. I have two modules, beem.experiment._pure_python and beem.experiment._pure_c, and the import is done in beem/experiment/experiment.py: if importing beem.experiment._pure_c fails, beem.experiment._pure_python is used instead.
> However, distribution then starts to get more complex, because you need a binary for each target platform and each target Python. I would suggest only distributing binaries for Windows and relying on the system compiler for all *nix, with a simple "python setup.py build" (which also lets the user easily set custom CFLAGS).
Already for Windows, that's 4 binaries: win32 Python 2.7, win32 Python 3.3, win64 Python 2.7, win64 Python 3.3.
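For what it's worth, a minimal sketch of that import-fallback pattern, with hypothetical module names (_functions_c / _functions_py stand in for the _pure_c / _pure_python pair above):

# mypackage/__init__.py -- hypothetical layout
try:
    # Prefer the pre-built C extension when it is importable...
    from . import _functions_c as _functions
except ImportError:
    # ...and silently fall back to the pure-Python implementation.
    from . import _functions_py as _functions

rescale_data = _functions.rescale_data  # callers see the same API either way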
#include <stdlib.h>

int main(void) {
    int i;
    char *data = malloc(100000000);
    for (i = 0; i < 100000000; i++) {
        data[i] = (((i + 5) * 7) % 10) - 2;
    }
    free(data);
    return 0;
}
import time, timeit, numpy as np
from scipy import weave

def rescale_np(data, scale, offset):
    return (data - offset) * scale

def rescale_weave(data, scale, offset):  # basically copied from rescaleData
    size = data.size
    output = np.empty(data.shape, dtype=data.dtype)
    code = r"""
    for (int i = 0; i < size; ++i) {
        output[i] = ((double)data[i] - offset) * scale;
    }"""
    weave.inline(code, ["data", "size", "output", "offset", "scale"],
                 compiler="gcc", extra_compile_args=['-march=native -mtune=native -O3'])
    return output

if __name__ == "__main__":
    code = "rescale_weave(np.random.random((25,2,10)), 10, 1)"
    number = 100000
    timer = time.clock  # process time
    print(timeit.repeat("np.random.random((25,2,10))", setup="import numpy as np",
                        number=number, timer=timer))  # approx. time required to generate the data
    print(timeit.repeat(code.replace("weave", "np"),
                        setup="import numpy as np; from __main__ import rescale_np",
                        number=number, timer=timer))  # numpy version
    print(timeit.repeat(code,
                        setup="import numpy as np; from scipy import weave; "
                              "from __main__ import rescale_weave; " + code,
                        number=number, timer=timer))  # weave version; the function is called once during setup so that compilation time is subtracted

# Second version: the same benchmark, but with the weave loop parallelized via OpenMP.

import time, timeit, numpy as np
from scipy import weave

def rescale_np(data, scale, offset):
    return (data - offset) * scale

def rescale_weave(data, scale, offset):  # basically copied from rescaleData
    size = data.size
    output = np.empty(data.shape, dtype=data.dtype)
    code = r"""
    #pragma omp parallel for
    for (int i = 0; i < size; ++i) {
        output[i] = ((double)data[i] - offset) * scale;
    }"""
    weave.inline(code, ["data", "size", "output", "offset", "scale"],
                 compiler="gcc", extra_compile_args=['-march=native -mtune=native -O3 -fopenmp'],
                 headers=['<omp.h>'], extra_link_args=['-lgomp'])
    return output

if __name__ == "__main__":
    code = "rescale_weave(np.random.random((25,2,10)), 10, 1)"
    number = 100000
    timer = time.clock  # process time
    print(timeit.repeat("np.random.random((25,2,10))", setup="import numpy as np",
                        number=number, timer=timer))  # approx. time required to generate the data
    print(timeit.repeat(code.replace("weave", "np"),
                        setup="import numpy as np; from __main__ import rescale_np",
                        number=number, timer=timer))  # numpy version
    print(timeit.repeat(code,
                        setup="import numpy as np; from scipy import weave; "
                              "from __main__ import rescale_weave; " + code,
                        number=number, timer=timer))  # weave version; the function is called once during setup so that compilation time is subtracted
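For comparison, here is a pure-numpy variant (a sketch, not from the thread) that casts once up front and then works in place, avoiding the extra temporaries created by (data - offset) * scale:

import numpy as np

def rescale_np_inplace(data, scale, offset, dtype=np.float64):
    out = data.astype(dtype)  # single copy, already in the target dtype
    out -= offset             # in place: no further temporaries
    out *= scale
    return out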
On Friday, September 6, 2013 9:10:01 PM UTC-7, Luke Campagnola wrote:
> On Mon, Sep 2, 2013 at 10:07 PM, Antony Lee <anton...@berkeley.edu> wrote:
>> The reason is that sometimes, d2 *= scale will trigger a RuntimeWarning, e.g. if data happens to be a uint16 and the multiplication by a large scale (that is a float64) triggers a RuntimeWarning, even though we will want to cast the data to float64 immediately after anyways.
>> As a side note, while I haven't tried it, I would guess that the weave implementation of rescaleData can't be faster than a simple numpy expression ("(data.astype(dtype) - offset) * scale") if it must invoke a compiler every time...
> Weave is significantly faster--it only invokes the compiler once per dtype and stores the compiled modules to disk.
> I am hoping to replace weave with a small library of pre-built C functions that would support most platforms while not requiring any compiler (and still falling back to pure python when necessary). Not yet sure how feasible this is..
Indeed I misunderstood the way weave.inline works. Still...
[ snip ]
Output:
[1.0582530498504639, 1.0574111938476562, 1.0580999851226807] # data generation
[2.238456964492798, 2.2196199893951416, 2.217461109161377] # numpy
[2.6102700233459473, 2.6073598861694336, 2.6155309677124023] # weave
Once you subtract the time spent calling np.random.random, it seems that the weave solution is ~33% slower... or did I miss something?
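(Working the numbers from the output above: subtracting the ~1.06 s data-generation baseline leaves roughly 2.61 - 1.06 = 1.55 s for weave versus 2.22 - 1.06 = 1.16 s for numpy, and 1.55 / 1.16 is about 1.34, i.e. ~33% slower.)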
Traceback (most recent call last):
File "test.py", line 2, in <module>
from scipy import weave
File "/usr/lib64/python3.3/site-packages/scipy/weave/__init__.py", line 23, in <module>
raise ImportError("scipy.weave only supports Python 2.x")
ImportError: scipy.weave only supports Python 2.x
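Incidentally, a small guard (a sketch, not something from the thread) would let the benchmark script degrade gracefully on Python 3 instead of dying on this ImportError:

try:
    from scipy import weave  # Python 2 only
except ImportError:
    weave = None  # on Python 3, skip the weave benchmarks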
You don't need to reshape the vector to use weave; that's where you lose precious milliseconds. Using the code below, I'm getting:
[0.6200000000000001, 0.6099999999999999, 0.6200000000000001] # data generation
[1.3499999999999996, 1.3500000000000005, 1.3499999999999996] # numpy
[1.1100000000000003, 1.1199999999999992, 1.120000000000001] # weave
On Thursday, September 12, 2013 at 10:39:50 AM UTC+8, Luke Campagnola wrote:
> This is great!
> I recommend we move development discussions over to github. I opened an issue here:
> Luke
You confuse me. Where should we put the pull request? At https://github.com/lcampagn/pyqtgraph or https://github.com/pyqtgraph/inp?
Is there a reason to create a new project and not simply a new branch for inp?
On Thursday, September 12, 2013 at 1:01:15 PM UTC+8, Luke Campagnola wrote:
> On Thu, Sep 12, 2013 at 12:50 AM, Guillaume Poulin <poulin.g...@gmail.com> wrote:
> It just seemed like the thing to do :)
>> You confuse me. Where should we put the pull request? At https://github.com/lcampagn/pyqtgraph or https://github.com/pyqtgraph/inp?
> I think perhaps a new topic branch within github.com/pyqtgraph/inp would be best, but I'm not totally clear on github's intended organization..
>> Is there a reason to create a new project and not simply a new branch for inp?
> Creating a project allows the possibility of more team-management in the future, and gives pyqtgraph an unambiguous canonical repository.
So you want to use pyqtgraph/inp for development and lcampagn/pyqtgraph for mirroring? I just have the impression that lcampagn/pyqtgraph is not so useful.
Creating a project pyqtgraph/pyqtgraph with a branch "stable" and a branch "inp" would have done the trick. But you are the boss and... I don't really care as long as I know where to push. So I'm gonna fork pyqtgraph/inp and push back there.
On Thu, Sep 12, 2013 at 1:38 AM, Guillaume Poulin <poulin.g...@gmail.com> wrote:
> So you want to use pyqtgraph/inp for development and lcampagn/pyqtgraph for mirroring? I just have the impression that lcampagn/pyqtgraph is not so useful.
I think lcampagn/pyqtgraph will stick around as my personal repository, where I can push junk without it going into the main branches. If it turns out to be not useful, then I'll delete it to avoid more confusion.
> Creating a project pyqtgraph/pyqtgraph with a branch "stable" and a branch "inp" would have done the trick. But you are the boss and... I don't really care as long as I know where to push. So I'm gonna fork pyqtgraph/inp and push back there.
With this I was just trying to mimic the current dev/inp structure that exists on launchpad, but if that's counterintuitive then I'm perfectly happy to go with your suggestion. I'm really just making a lot of guesses here; probably I need to find some other well-organized projects to compare to.