Functional programming techniques in protein biochemistry analysis, talk by Gregory Benison
An NMR spectrum is effectively a mapping from frequencies to intensities. Through some magic it is possible to produce two-dimensional spectra in which peaks correspond to pairings of what would be peaks in corresponding one-dimensional spectra.
By applying transformations such as convolutions, transpositions, and diagonal projections to these spectra, one can suppress noise and find the frequency of an arrangement of atoms, given a related arrangement whose frequency is already known.
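Treating a spectrum as a function makes these transformations easy to compose. Below is a minimal Python sketch that assumes spectra are plain functions from frequency to intensity. The talk did not present this implementation. The Lorentzian peak shape, the sampled offsets and kernels, and the `outer` construction are all hypothetical choices for illustration. `outer` in particular is a toy: it pairs every peak with every other peak, which a real 2D experiment does not do.

```python
from typing import Callable, Sequence, Tuple

# A 1D spectrum maps frequency -> intensity; a 2D spectrum maps (f1, f2) -> intensity.
Spectrum1D = Callable[[float], float]
Spectrum2D = Callable[[float, float], float]


def lorentzian(center: float, width: float, height: float = 1.0) -> Spectrum1D:
    """A single idealized peak (hypothetical line shape, for illustration)."""
    def s(f: float) -> float:
        x = (f - center) / width
        return height / (1.0 + x * x)
    return s


def outer(a: Spectrum1D, b: Spectrum1D) -> Spectrum2D:
    """Toy 2D spectrum with a peak at every pairing of a peak in a with a peak in b."""
    return lambda f1, f2: a(f1) * b(f2)


def transpose(s: Spectrum2D) -> Spectrum2D:
    """Swap the two frequency axes."""
    return lambda f1, f2: s(f2, f1)


def diagonal_projection(s: Spectrum2D, offsets: Sequence[float]) -> Spectrum1D:
    """Project onto the diagonal f1 == f2 by summing along the perpendicular
    (anti-diagonal) direction at the sampled offsets."""
    return lambda f: sum(s(f + d, f - d) for d in offsets)


def convolve(s: Spectrum1D, kernel: Sequence[Tuple[float, float]]) -> Spectrum1D:
    """Discrete convolution with a kernel given as (shift, weight) pairs."""
    return lambda f: sum(w * s(f - shift) for shift, w in kernel)
```

Nothing is computed until a point is requested, as in `convolve(diagonal_projection(transpose(spec), offsets), kernel)(4.2)`. That laziness keeps memory use down. It is also why the same underlying points can end up being evaluated many times.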
The problem is that this involves a lot of computation. Lazy evaluation is essential to keep memory use in check, but it might also result in the same values being computed over and over.
Perhaps a cache with the correct design could help.
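The sketch below shows both effects. A lazily evaluated transform that re-reads neighbouring points recomputes its input repeatedly, and a memoizing wrapper removes that repetition. `counted`, `memoize`, and `smooth` are hypothetical helpers, and the squared index stands in for an expensive computation. The unbounded dict here trades memory back for time. A bounded cache, such as `functools.lru_cache(maxsize=...)`, would be closer to the "correct design" the notes ask about.

```python
from typing import Callable, Dict

Spectrum = Callable[[int], float]  # sample index -> intensity


def counted(f: Spectrum) -> Spectrum:
    """Wrap f so the number of underlying evaluations is recorded in .calls."""
    def g(i: int) -> float:
        g.calls += 1
        return f(i)
    g.calls = 0
    return g


def memoize(f: Spectrum) -> Spectrum:
    """Unbounded cache: each sample is computed at most once."""
    cache: Dict[int, float] = {}
    def g(i: int) -> float:
        if i not in cache:
            cache[i] = f(i)
        return cache[i]
    return g


def smooth(s: Spectrum) -> Spectrum:
    """Lazy 3-point moving average; neighbouring outputs share inputs."""
    return lambda i: (s(i - 1) + s(i) + s(i + 1)) / 3.0


raw = counted(lambda i: float(i * i))
plain = smooth(raw)
_ = [plain(i) for i in range(1, 101)]
print(raw.calls)   # 300: every input point is read three times

raw2 = counted(lambda i: float(i * i))
cached = smooth(memoize(raw2))  # build the cache once, outside the loop
_ = [cached(i) for i in range(1, 101)]
print(raw2.calls)  # 102: indices 0..101, each computed once
```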
Leif referenced logic programming as a domain specialized for finding solutions in large search spaces.
A typical input spectrum could contain 10^7 double values. The file size could be 50 - 100 MB.
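The quoted file size is consistent with storing each value as an 8-byte IEEE 754 double. This assumes uncompressed storage with negligible header overhead:

```latex
10^{7}\ \text{values} \times 8\ \tfrac{\text{bytes}}{\text{value}} = 8 \times 10^{7}\ \text{bytes} \approx 80\ \text{MB}\ (\approx 76\ \text{MiB})
```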
Maglev can handle large volumes of data accessed concurrently using software transactional memory. Could it scale high enough to handle the number-crunching required?
Daniel proposed that this might be a job for custom hardware. Another example of finding a signal amid noise is a GPS receiver, which has to separate one satellite's signal from those of many others. That problem is handled well by FPGAs.
Special chip support or GPU use could be helpful. GPUs are really good at parallel floating-point operations.
In some situations memory bandwidth is enough of a bottleneck that it is worth compressing data in memory and taking the CPU hit of decompressing it for processing.
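Here is a minimal Python sketch of the idea using the standard-library `zlib` and `array` modules. Block-wise compression and `level=1` (fast, light compression) are assumptions. Whether this pays off depends on the data. Long runs of identical values, such as zero-filled regions, compress well, while noisy floating-point samples often barely compress at all.

```python
import array
import zlib
from typing import Sequence


def compress_block(values: Sequence[float]) -> bytes:
    """Pack doubles into raw bytes and deflate them (fast setting)."""
    return zlib.compress(array.array("d", values).tobytes(), level=1)


def decompress_block(blob: bytes) -> array.array:
    """Inflate a block back into an array of doubles for processing."""
    out = array.array("d")
    out.frombytes(zlib.decompress(blob))
    return out


def block_sum(blob: bytes) -> float:
    """Example processing step: pay the decompression cost, then compute."""
    return sum(decompress_block(blob))


values = [0.0] * 10_000 + [1.5, 2.5]
blob = compress_block(values)
print(len(values) * 8, len(blob))  # raw bytes vs. compressed bytes
print(block_sum(blob))             # 4.0
```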
Jillian suggests applying wavelet analysis to the input spectra and applying transformations to the result. Wavelet analysis would produce a more compact representation of the input data. There are some papers out there on signal recovery via wavelet analysis.
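To make the idea concrete, here is one level of the Haar transform, the simplest wavelet. Hard thresholding of the detail coefficients serves as a crude denoising step, and the zeroed coefficients are what make the representation more compact. This is a sketch, not the method from the papers Jillian mentioned. The threshold value is arbitrary. A real analysis would use several levels and a better-suited wavelet, for example through the PyWavelets library.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform; len(x) must be even.
    Returns (approximation, detail) coefficients, each half the input length."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Exact inverse of haar_step."""
    x: List[float] = []
    for a, d in zip(approx, detail):
        x.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return x


def threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Hard thresholding: zero coefficients with magnitude <= t (treated as noise)."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


signal = [1.0, 1.2, 5.0, 5.1, 0.9, 1.1, 1.0, 1.0]
approx, detail = haar_step(signal)
denoised = inverse_haar_step(approx, threshold(detail, 0.5))
print([round(v, 3) for v in denoised])  # pairwise averages survive; small wiggles are gone
```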
Can the problem be divided into pieces, perhaps using MapReduce?
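Here is a sketch of a MapReduce-shaped computation over a 1D spectrum, using the task of finding the strongest point. In the map step, each chunk is processed independently, so that step could be farmed out to separate processes or machines. In the reduce step, the partial results are combined pairwise. The chunk size and the choice of task are illustrative assumptions. Transformations that look at neighbouring points, such as convolution, would need chunks with overlapping borders.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Chunk = Tuple[int, Sequence[float]]  # (offset of chunk in full spectrum, values)
Peak = Tuple[float, int]             # (intensity, global index)


def split(values: Sequence[float], size: int) -> List[Chunk]:
    """Cut a non-empty spectrum into chunks, remembering each chunk's offset."""
    return [(i, values[i:i + size]) for i in range(0, len(values), size)]


def map_chunk(chunk: Chunk) -> Peak:
    """Map step: the strongest point in one chunk."""
    offset, vals = chunk
    j = max(range(len(vals)), key=vals.__getitem__)
    return (vals[j], offset + j)


def reduce_pair(a: Peak, b: Peak) -> Peak:
    """Reduce step: keep the stronger of two partial results."""
    return a if a >= b else b


def strongest_point(values: Sequence[float], size: int) -> Peak:
    # map() here could be swapped for a parallel map over worker processes.
    return reduce(reduce_pair, map(map_chunk, split(values, size)))


print(strongest_point([0.1, 0.5, 0.2, 3.0, 0.4, 0.9, 2.9], 3))  # (3.0, 3)
```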
In any case, it is useful to narrow down the problem before putting too much energy into a solution. Hooking up a simple cache and watching the hit/miss ratio might give a hint as to whether any kind of caching solution could work.
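In Python, `functools.lru_cache` already records hits and misses, so a first experiment can be as small as the sketch below. `point`, the cache size, and the access pattern are hypothetical stand-ins for the real lazily evaluated spectrum and workload.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def point(i: int) -> float:
    """Stand-in for an expensive, lazily computed spectrum point."""
    return float(i) ** 0.5


def hit_ratio() -> float:
    """Fraction of lookups served from the cache so far."""
    info = point.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


point.cache_clear()
for i in range(100):
    point(i % 10)          # only 10 distinct points, each requested 10 times
print(point.cache_info())  # hits=90, misses=10
print(hit_ratio())         # 0.9
```

If the hit ratio stays low on a realistic workload, caching of any kind is unlikely to help much.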
Another way to reduce the input size could be to build a k-d tree of local maxima, as opposed to filtering out all frequencies below a given intensity. Or you could sample the input at reduced resolution.
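Here is a sketch of the peak-picking idea for a 2D spectrum sampled on a grid. It finds strict local maxima, then indexes them in a 2D k-d tree so that, for example, the peak nearest a given frequency pair can be found without scanning every point. Several simplifying assumptions apply: the strict-maximum rule means plateaus are not reported, there is no intensity threshold, and points are given in grid coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # (row, column) grid coordinates


def local_maxima(grid: Sequence[Sequence[float]]) -> List[Point]:
    """Grid points strictly greater than every existing neighbour (8-connected)."""
    rows, cols = len(grid), len(grid[0])
    peaks: List[Point] = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [grid[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, rows))
                          for cc in range(max(c - 1, 0), min(c + 2, cols))
                          if (rr, cc) != (r, c)]
            if all(grid[r][c] > n for n in neighbours):
                peaks.append((r, c))
    return peaks


@dataclass
class KDNode:
    point: Point
    axis: int  # 0 splits on row, 1 splits on column
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced 2D k-d tree by splitting at the median, alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], target: Point, best: Optional[Point] = None) -> Optional[Point]:
    """Nearest stored point to target (squared Euclidean distance)."""
    if node is None:
        return best

    def d2(p: Point) -> int:
        return (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2

    if best is None or d2(node.point) < d2(best):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only search the other side if the splitting line is closer than the best so far.
    if diff * diff < d2(best):
        best = nearest(far, target, best)
    return best


grid = [[0.0] * 5 for _ in range(5)]
grid[1][1], grid[3][4] = 5.0, 7.0
tree = build(local_maxima(grid))
print(nearest(tree, (0, 0)))  # (1, 1)
```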
It is hard to store matrices in a way that makes multiplication as efficient as possible. You can store a matrix as a list of rows or as a list of columns, but neither is optimal: multiplication walks the rows of one operand and the columns of the other, so one of them is always accessed with a large stride. You can get better results by using space partitioning to store sub-matrices recursively.
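One common way to realise the recursive layout is Z-order (Morton) order, sketched below. The talk did not say which scheme was meant. Interleaving the bits of the row and column index places every 2x2 block, every 4x4 block, and so on, in one contiguous run of memory. A recursive block multiplication then works on contiguous quadrants. The sketch assumes a square matrix whose side is a power of two.

```python
from typing import List, Sequence


def morton_index(row: int, col: int, bits: int = 16) -> int:
    """Interleave bits: column bits go to even positions, row bits to odd ones."""
    idx = 0
    for b in range(bits):
        idx |= ((col >> b) & 1) << (2 * b)
        idx |= ((row >> b) & 1) << (2 * b + 1)
    return idx


def to_morton(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten an n x n matrix (n a power of two, n <= 2**16) into Z-order."""
    n = len(matrix)
    flat = [0.0] * (n * n)
    for r in range(n):
        for c in range(n):
            flat[morton_index(r, c)] = matrix[r][c]
    return flat


m = [[4 * r + c for c in range(4)] for r in range(4)]
print(to_morton(m))
# top-left quadrant first: [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```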
A problem with using multiple threads is that you could end up thrashing the cache more than a single thread would, resulting in more cache misses. What would be nice sequential memory access in a single thread could become much messier when threads interleave.
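One common mitigation is to give each thread a contiguous block of the data rather than interleaving work item by item. The sketch below only shows the two index assignments, using hypothetical helper names. It does not measure anything, and CPython threads running pure-Python code would not show the effect anyway. With the strided split, neighbouring elements, which often share a cache line, belong to different threads, and this can cause false sharing when those threads write. With the contiguous split, each thread walks its own block sequentially.

```python
from typing import List


def interleaved(n: int, workers: int) -> List[List[int]]:
    """Worker w handles indices w, w + workers, w + 2*workers, ... (strided)."""
    return [list(range(w, n, workers)) for w in range(workers)]


def contiguous(n: int, workers: int) -> List[List[int]]:
    """Worker w handles one contiguous block; block sizes differ by at most one."""
    size, extra = divmod(n, workers)
    blocks: List[List[int]] = []
    start = 0
    for w in range(workers):
        end = start + size + (1 if w < extra else 0)
        blocks.append(list(range(start, end)))
        start = end
    return blocks


print(interleaved(8, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(contiguous(8, 2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```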
The final suggestion of the night: what about quantum computing?
--
You received this message because you are subscribed to the Google Groups "pdxfunc" group.
To post to this group, send email to pdx...@googlegroups.com.
To unsubscribe from this group, send email to pdxfunc+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pdxfunc?hl=en.