New subscriber to mailing list


Anand Rao

Mar 11, 2023, 2:22:31 AM
to DIYbio
Hi all,

I am a new subscriber to this mailing list. I'm an experimental biologist with some experience in bioinformatics. Currently I have an ongoing project at the community lab close to me. So if you are in the DMV area (USA), I might be able to help you out with answers to lab access questions you may have.

I look forward to stimulating discussions and productive collaborations with you.

Speaking of collaborations, for a project I have in mind, I am looking for a research partner with experience and expertise in digital signal processing. If that is you, please ping me back.

Thanks,
Anand

Dan Kolis

Mar 11, 2023, 11:11:05 AM
to diy...@googlegroups.com

Welcome. I would remark, though, that you have to name what you're trying to do in order to solicit help.

Secret projects generally do not attract collaborators!

There isn't a lot of traffic on the blog, but it's basically always quite informed stuffola... both the stimulus and the response(s), really.

Regards,
Daniel B. Kolis

 

Anand Rao

Mar 12, 2023, 12:36:19 PM
to DIYbio
Greetings!

Thanks for your response and asking for details, Daniel.

There's nothing secret about my project. This has been done previously by research groups in academia, but there are no off-the-shelf tools available for such analyses that can be downloaded, installed, and executed.

Here's a simplified outline of what it is I am trying to achieve, indicating what skillsets I wish to leverage from my collaborator(s):

1. DNA sequences can be 'encoded' in various ways - e.g. one-hot encoding, integer encoding, etc.
2. These encodings can then be treated as digital signals and transformed using techniques such as the Fast Fourier Transform, the discrete wavelet transform, etc.
3. Such digital signals can be aligned for comparison in ways similar to pairwise or multiple sequence alignment of nucleic acid or protein sequences.
4. Furthermore, such encoded and transformed DNA sequences may be aligned, clustered, used to identify genomic 'features' of interest, or even used to classify unknown sequences, among other such bioinformatic applications.
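
To make steps 1 and 2 concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not the method of any cited paper: the integer mapping A=0, C=1, G=2, T=3 is arbitrary, the function names are hypothetical, the DFT is the naive O(n^2) form rather than a true FFT, and the wavelet step is a single level of the simple Haar transform.

```python
import cmath
import math

BASES = "ACGT"  # assumed alphabet; ambiguous bases such as N are not handled


def integer_encode(seq):
    """Map each base to an integer (A=0, C=1, G=2, T=3); the mapping is arbitrary."""
    return [BASES.index(b) for b in seq.upper()]


def one_hot_encode(seq):
    """Return four 0/1 indicator tracks, one per base."""
    seq = seq.upper()
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in BASES}


def dft_power(signal):
    """Naive discrete Fourier transform; returns |X_k|^2 for every frequency k."""
    n = len(signal)
    power = []
    for k in range(n):
        x_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        power.append(abs(x_k) ** 2)
    return power


def haar_step(signal):
    """One level of the Haar wavelet transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


if __name__ == "__main__":
    seq = "ATGCATGCATGC"  # toy sequence, not real data
    tracks = one_hot_encode(seq)
    print([round(p, 3) for p in dft_power(tracks["A"])])
    print(haar_step([float(v) for v in integer_encode(seq)]))
```

One design point worth noting: the integer encoding imposes an artificial ordering on the bases, which the one-hot tracks avoid at the cost of four signals per sequence.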

If you have any or all of the skillsets, especially if you come from the electronics digital signal processing field, then please connect with me at this email address. Thanks in advance.

Cheers,
Anand K.S. Rao, Ph.D.


Dan Kolis

Mar 12, 2023, 1:00:11 PM
to DIYbio
Hi Anand,

I'm glad I asked, not just to inform me, but this gets the idea(s) out for everybody.

Going from a time series to an FFT is one of the few algorithms I haven't been able to apply well! When I was 'young' I tried it and always ran out of dynamic range; it was always miscalibrated and was more like a 'tone detector' than a histogram. As a hint of when that was, it was in Fortran on a Data General Eclipse.

-anyway-

I'm assuming you propose a transform into the frequency domain, and that matching the results somehow is within your scope of work on this.
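
For readers following along, one very simple reading of "transform, then match" can be sketched as below. This is an assumption-laden illustration, not what the project necessarily intends: it sums the power spectra of the four base-indicator tracks and compares two equal-length sequences by Euclidean distance. The names `indicator`, `power_spectrum` and `spectral_distance` are hypothetical.

```python
import cmath
import math


def indicator(seq, base):
    """0/1 track marking where `base` occurs in `seq`."""
    return [1.0 if s == base else 0.0 for s in seq.upper()]


def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum, |X_k|^2 for k = 0..n-1."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))) ** 2
        for k in range(n)
    ]


def total_spectrum(seq):
    """Sum of the four per-base power spectra."""
    spectra = [power_spectrum(indicator(seq, b)) for b in "ACGT"]
    return [sum(column) for column in zip(*spectra)]


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between summed power spectra (equal lengths assumed)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length sequences")
    sa, sb = total_spectrum(seq_a), total_spectrum(seq_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)))


if __name__ == "__main__":
    print(spectral_distance("AAAAAAAA", "ATATATAT"))
    print(spectral_distance("ATGCATGC", "TGCATGCA"))  # circular shift of the first
```

Because the power spectrum discards phase, a circularly shifted copy of a sequence scores as (numerically almost) identical. Whether that is a feature or a loss of positional information depends on the question being asked.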

Offhand, I don't think that seems like it will do much; but obviously you are on a larger safari than the one assumption I made here. I do also like to be wrong. I am sure it is worth doing.

The scope of "DIY" here logically includes a lot of comp sci as opposed to wet chemistry, obviously.

Especially since this forum has relatively little traffic (sadly)... how about posting more here? I can maybe do something helpful. If it's burdensome volume-wise, we can continue elsewhere. This gets the nuts and bolts of what you're considering out to others.

I could maybe write some programs, etc. Who knows.

One aspect is that it is surely good for base pairs to be considered as actionable objects, including their X, Y, Z positions, methylation, etc., not just a string of five characters... Wrapping on histones, etc.: all sorts of epigenetics need more attention to 3D xNA. This "If it's not making a protein, it's junk" idea is NOT DEAD !!!

Q1: Do you think an end-user-accessible notation combining B.P. and Cartesian positions in some duality motif is desirable but weirdly absent?

Regs,
Daniel B. Kolis

my ref: Anand, diy, nafl, 12 Mar 2023





Anand Rao

Mar 13, 2023, 6:11:15 AM
to DIYbio
Hi Daniel,

Thanks for your reply.

point 1.
I note your point that DNA encoding should not be over-simplified. And I agree in principle with this.
However, the type of encoding used will depend on the question being asked.
And for the purpose of my research question, I don't think anything beyond already utilized encoding(s) need to be implemented - i.e. no need to re-invent the proverbial (research) wheel.

point 2.
This project is neither a safari nor an expedition.
Previous work is linked below, as research paper citations and as the bodies of work of two separate research groups.

point 3.
Due to the real possibility of nauseatingly detailed discussions ensuing between us, your suggestion about moving future email discussions off the forum is a good one, unless anyone else expresses interest in being included in their reply to my reply here.

Being a molecular geneticist and genomicist, I find that performing digital signal processing is way outside my skillset, i.e. this is NOT a DIYbio project for me. Therefore, I am looking for collaborator(s) with a skillset complementary to mine. I hope that provides complete context for this email thread thus far.

Lin, J., Wei, J., Adjeroh, D. et al. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform. BMC Bioinformatics 19, 165 (2018). https://doi.org/10.1186/s12859-018-2155-9

Kamal, N.A.M.; Bakar, A.A.; Zainudin, S. Optimization of Discrete Wavelet Transform Feature Representation and Hierarchical Classification of G-Protein Coupled Receptor Using Firefly Algorithm and Particle Swarm Optimization. Appl. Sci. 2022, 12, 12011. https://doi.org/10.3390/app122312011

Farman Ali, Omar Barukab, Ajay B Gadicha, Shruti Patil, Omar Alghushairy, Akram Y. Sarhan, "DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform", Computational Intelligence and Neuroscience, vol. 2022, Article ID 2987407, 8 pages, 2022. https://doi.org/10.1155/2022/2987407

https://scholar.google.co.in/citations?user=LJR5DfUAAAAJ&hl=en

https://scholar.google.com/citations?user=YzfenUAAAAAJ&hl=en

I look forward to your response. Thanks in advance.

Best,
Anand K.S. Rao, Ph.D.

Sylvian Hemus

Mar 13, 2023, 10:15:16 AM
to diy...@googlegroups.com
A bit tangential to the topic at hand, but I just wanted to recommend org-mode to Anand if he finds he cites in emails frequently. With org-cite you can post cites and bibliographies quickly and easily, and send them as txt or HTML email.


Dan Kolis

Mar 13, 2023, 1:59:17 PM
to DIYbio
Thanks for the Document URLs .... This one seemed comprehensible to me, though all of them are fairly straightforward really.

https://www.researchgate.net/publication/5260702_Accelerating_and_Focusing_Protein-Protein_Docking_Correlations_Using_Multi-Dimensional_Rotational_FFT_Generating_Functions


TV MPEG-4 compression uses a similar technique to compress video. It breaks a field of vision into zones; each has a datum in the middle, and sine and cosine waves ripple out from the centroid. A delta between the two matches up zones to see which way they're moving. Then, often, the update is just a new X Y datum. It's something like 20:1 compression and is slightly lossy.
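
As a rough illustration of the "cosine ripples per zone" idea: MPEG-family codecs are generally described as applying a block-based discrete cosine transform (DCT) or a close integer approximation of it. The sketch below is a plain, unnormalised one-dimensional DCT-II on a single block. It is an assumption-level toy with a hypothetical function name, not codec code.

```python
import math


def dct2(block):
    """Unnormalised 1-D DCT-II: X_k = sum_t x_t * cos(pi/N * (t + 0.5) * k)."""
    n = len(block)
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    flat = [5.0] * 8                      # a perfectly uniform zone
    ramp = [float(v) for v in range(8)]   # a smooth gradient
    print([round(c, 3) for c in dct2(flat)])
    print([round(c, 3) for c in dct2(ramp)])
```

For a uniform block only the first (DC) coefficient is non-zero, which is the basic reason smooth image regions compress well: most coefficients can be dropped or coarsely quantised.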

AlphaFold 2 discards absolute positioning of proteins entirely early on; all deltas from features are used, so there is no magic 'best' facing angle. The last steps to undo that are full of nasty, complex tradeoffs.

For investigations of most life science questions, M.D. for small subsets of molecules is probably always worth doing, for instance before even speculating on wet chemistry. So perhaps most frequently protein-B.P. investigations are as much a process of rapidly discarding no-go subsets... ?

When any things 'fit', the conjugates in some enumeration are perfect fits, like i + j stuff. Simplifying investigations to pairs is pretty dangerous to deep comprehension. I don't know of a matching algo for greater than pairs, but there is a measure called the Minkowski distance that I think might apply. What I suggest is that context-sensitive detection is perhaps best for discarding complete no-gos and making candidates queue up for M.D. once identified. I think if the slope of the Minkowski-derived tray is flat, the matches, in whatever context makes up the independent variable vectors, are good fits.
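
For reference, the Minkowski distance of order p between two equal-length vectors x and y is (sum_i |x_i - y_i|^p)^(1/p); p = 1 gives the Manhattan distance and p = 2 the Euclidean distance. A minimal sketch follows, with a hypothetical function name and plain lists as the assumed input.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    u, v = [0.0, 0.0], [3.0, 4.0]
    print(minkowski_distance(u, v, p=1))  # Manhattan
    print(minkowski_distance(u, v, p=2))  # Euclidean
```

Note that this is still a measure between two vectors at a time. Applying it to more than pairs would need something extra on top, for example a matrix of all pairwise distances, and that step is not shown here.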

It seems like the papers take various aspects and leap into comparisons with previous benchmarks without any exposed speculation on why such a thing should work. 

If you decide to try to implement an FFT in a chipset optimized for such a thing, let me know. Maybe I can throw in on some aspect of that specifically. I don't quite 'get' why the trig radiating out of the centroid of an image is better than other numerical methods either. But the compression is done in real time without unusual semiconductors... just chips with a few specialised instructions here and there.

Maybe an interactive IDE might help the usability of these programs. For example, in the last steps of AlphaFold's shape determination, going from a position in a list to a rendering is a fussy step.

Regards,
Daniel B. Kolis
 

John Griessen

Mar 27, 2023, 1:33:11 PM
to diy...@googlegroups.com
On 3/13/23 11:59, Dan Kolis wrote:
> I don't know of a matching algo for greater than pairs, but there is an algo called Minkowski distances I think might apply.

Are you thinking of ways to capture 3D between-molecule interactions/ionic bondings/reactions, or probabilities of reactions? What if the research data already cited doesn't contain much of it? How would experiments get more of it?