Orca resolution question

Noura Maziak

Nov 8, 2023, 9:38:18 AM
to Orca Users
Hello Dr. Zhou, 

Wanted to first say thanks for developing such nice tools! 

I'm very keen to use Orca to train on some of our Micro-C data. I'm working on the fly genome, and while the genome size is considerably smaller, some of the data we have is really high resolution (less than 200 bp).

I am new to this, but I was wondering if you have any suggestions for making Orca train and predict at higher resolutions that fit our data better?

In the meantime I have made the spy files, cool files, and the genome memmap. 

All the best, 
Noura Maziak 

Jian Zhou

Nov 8, 2023, 12:23:47 PM
to Noura Maziak, Orca Users
Hi Noura,

Thank you for your interest in Orca! I would recommend first training the first-stage Orca model (1Mb input, 4kb resolution) on your data to see if it fits your needs. I often find that even with high-resolution data, most regions do not reach the highest resolution (e.g. for human and mouse Micro-C), so 4kb still suffices for most regions. It is certainly possible to adjust the resolution and input size in the model, though it will involve more tweaking and model development work.
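To put rough numbers on the resolution trade-off mentioned above, the following sketch counts the bins along one axis of the contact map for a given input window and bin size. The function name `bins_per_side` and the 1 kb and 200 bp comparisons are illustrative assumptions, not part of Orca.

```python
def bins_per_side(window_bp: int, resolution_bp: int) -> int:
    """Number of bins along one axis of a square contact map."""
    if window_bp % resolution_bp != 0:
        raise ValueError("window must be a multiple of the resolution")
    return window_bp // resolution_bp


window = 1_000_000  # 1 Mb input, as in the first-stage model

# Compare the 4 kb setting with two finer (hypothetical) bin sizes.
for res in (4_000, 1_000, 200):
    n = bins_per_side(window, res)
    print(f"{res:>5} bp bins: {n} x {n} = {n * n:,} matrix entries")
```

At a fixed 1 Mb window, going from 4 kb to 200 bp bins grows the target matrix 400-fold, which is likely one reason a higher-resolution model would need a smaller input window or further model development.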

Best,
Jian

--
You received this message because you are subscribed to the Google Groups "Orca Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orca-users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/orca-users/195ea2e2-9f93-484d-8124-c8ceacd28c69n%40googlegroups.com.

Noura Maziak

Jan 9, 2024, 3:26:10 PM
to Orca Users
Hello Jian, 

Thank you so much for your reply! I have been testing model training and everything is working fine. I have a few questions and clarifications; I'm sorry if the information was already there and I missed it.

1) For running the training with the SWA option, did you first train the model without it and then rerun the training with the SWA option on top? I saw that there is no .optimizer output with --swa (I was testing model training on a set number of sequences with and without auxiliary data).

2) For the blacklisted regions, is it just in bedfile format? 

3) For test_holdout and validation_holdout, are the chromosomes listed there excluded from training?

And last, some of the loops/features I'm interested in are visible at 1-2kb but get a bit lost at 4kb resolution. I would be more than willing to test/try out in the background while training the current model. If you are interested, I would also be more than glad to set up a meeting. 

All the best,
Noura

Jian Zhou

Jan 16, 2024, 6:28:37 PM
to Noura Maziak, Orca Users
Sorry for the delayed response. I have included my responses below.

On Tue, Jan 9, 2024 at 2:26 PM Noura Maziak <nm...@cornell.edu> wrote:
1) For running the training with the SWA option, did you first train the model without it and then rerun the training with the SWA option on top? I saw that there is no .optimizer output with --swa (I was testing model training on a set number of sequences with and without auxiliary data).

Yes, exactly. Rerunning with SWA provides a slight improvement after training plateaus.
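For readers unfamiliar with SWA (stochastic weight averaging): the idea is to continue training from an already converged checkpoint and average the weights visited along the way. The sketch below shows only the averaging step on plain Python lists. It is a generic illustration, not Orca's training code, and `update_average` and the snapshot values are hypothetical.

```python
def update_average(avg, new, n_averaged):
    """Fold one more weight snapshot into an equal-weight running mean."""
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, new)]


# Hypothetical weight snapshots taken after training has plateaued.
snapshots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]

# Start from the first snapshot, then fold in the rest one at a time.
swa_weights = snapshots[0]
for n, weights in enumerate(snapshots[1:], start=1):
    swa_weights = update_average(swa_weights, weights, n)

print(swa_weights)  # element-wise mean of the three snapshots
```

In PyTorch, `torch.optim.swa_utils.AveragedModel` provides this kind of averaging; whether Orca uses it internally is not stated in this thread.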

2) For the blacklisted regions, is it just in bedfile format? 

blacklisted_region takes a string as input, "hg38" or "hg19", and uses the BED files that come included (we use the ENCODE blacklisted regions).

3) For test_holdout and validation_holdout, are the chromosomes listed there excluded from training?

Yes, any chromosomes listed in test or validation are not used for training.
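The holdout behavior described here amounts to a simple partition of chromosomes. A minimal sketch, assuming fly chromosome names and an arbitrary choice of holdouts (both hypothetical, and not taken from Orca's configuration):

```python
# Hypothetical chromosome names and holdout choices for a fly genome.
all_chroms = ["chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX"]
test_holdout = ["chr3R"]
validation_holdout = ["chr2R"]

# Anything listed in either holdout is excluded from training.
held_out = set(test_holdout) | set(validation_holdout)
train_chroms = [c for c in all_chroms if c not in held_out]

# Sanity check: the test and validation sets should not overlap.
assert not set(test_holdout) & set(validation_holdout)

print(train_chroms)
```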
 
And last, some of the loops/features I'm interested in are visible at 1-2kb but get a bit lost at 4kb resolution. I would be more than willing to test/try out in the background while training the current model. If you are interested, I would also be more than glad to set up a meeting. 
 
I would like to help but we do have quite limited capacity in helping with this. If it's fine, can you tell me more about your scientific problem and why it would benefit from higher resolution? I will try to see if it is something that we can help with. Feel free to email me directly at jian...@utsouthwestern.edu

