karyotype file


meryem x

Jul 5, 2017, 3:24:04 AM
to Circos
Hello,
I'm trying to use Circos to compare my genome, but I have no clue how to start. How do I generate the input files? How can I convert FASTA and GBK files to a karyotype file?
Can anyone help me?

Meriem


Martin Krzywinski

Jul 14, 2017, 7:57:27 PM
to circos-data-...@googlegroups.com
Your question is very vague and without more detail it's not possible to answer specifically.

The karyotype file defines the name and size of sequences that the image will use. If your fasta file contains many sequences and you want each of these sequences to be represented by an axis, then you'll need to parse out the sequence length from the fasta files. If the sequences have headers, e.g. seq1 seq2 ... then your karyotype file would look like

chr - seq1 1 0 1000 black
chr - seq2 2 0 500 black
...

for seq1 length of 1000 and seq2 length of 500.
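
As a rough illustration of the step above, here is a minimal Python sketch that reads a FASTA stream, measures each sequence, and prints karyotype lines in the format shown. The function names (`fasta_lengths`, `karyotype_lines`) are made up for this example and are not part of Circos. It assumes plain-text FASTA in which the sequence ID is the first word of each header line; GBK files are not handled here.

```python
import io


def fasta_lengths(handle):
    """Yield (sequence_id, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Assumption: the ID is the first whitespace-delimited word of the header.
            words = line[1:].split()
            name, length = (words[0] if words else "unnamed"), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def karyotype_lines(handle, color="black"):
    """Build one 'chr - ID LABEL START END COLOR' line per sequence."""
    return [
        f"chr - {name} {label} 0 {length} {color}"
        for label, (name, length) in enumerate(fasta_lengths(handle), start=1)
    ]


# Small in-memory demo; for a real file use: with open("genome.fa") as handle: ...
demo = io.StringIO(">seq1 first contig\nACGTACGTAC\nACGTA\n>seq2\nACGT\n")
for row in karyotype_lines(demo):
    print(row)
```

Redirect the printed lines to a file and point the `karyotype` parameter in your Circos configuration at it.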

Once you've done that, you're ready to draw data. It's not clear what you mean by "compare my genome" ... to what? 

For example, if you have some annotation and for each sequence position bin of width 10 you have some value X, then you would create a file like this

seq1 0 9 0.75
seq1 10 19 0.25
seq1 20 29 1.75
...

and then e.g. a histogram track to draw these data.
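
A data file like the one above could be generated along these lines. This is only a sketch: it assumes you start from one numeric value per sequence position and want the mean of each bin, which is just one possible meaning for the value X. The function name `binned_track` is hypothetical.

```python
def binned_track(seq_id, values, width=10):
    """Return 'ID START END VALUE' rows, one per bin of `width` positions.

    Assumption: `values` holds one number per position and each bin is
    summarized by its mean. The last bin may be shorter than `width`.
    """
    rows = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]
        end = start + len(chunk) - 1  # inclusive end, as in "0 9", "10 19"
        mean = sum(chunk) / len(chunk)
        rows.append(f"{seq_id} {start} {end} {mean:.2f}")
    return rows


# Demo: 25 positions on seq1 give bins 0-9, 10-19 and a short final bin 20-24.
for row in binned_track("seq1", [0.5] * 10 + [1.0] * 10 + [2.0] * 5):
    print(row)
```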

If you want to draw connections between positions, you'd create a file of coordinate pairs

seq1 0 50 seq2 100 200
...

Here region 0-50 on seq1 is connected to 100-200 on seq2. What this connection means is up to you -- could be sequence similarity or otherwise.
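
If your coordinate pairs come out of some other analysis, a short script can write them in this layout. The sketch below is illustrative only: `link_rows` and the optional `sizes` check are names invented for this example. The check simply guards against regions that run past the sequence length given in the karyotype file.

```python
def link_rows(pairs, sizes=None):
    """Return 'ID1 START1 END1 ID2 START2 END2' rows for a links file.

    `pairs` is an iterable of ((id, start, end), (id, start, end)) tuples.
    `sizes` is an optional {id: length} dict used to validate coordinates.
    """
    rows = []
    for region_a, region_b in pairs:
        for seq_id, start, end in (region_a, region_b):
            if start < 0 or end < start:
                raise ValueError(f"bad region on {seq_id}: {start}-{end}")
            if sizes is not None and end > sizes[seq_id]:
                raise ValueError(f"region {start}-{end} exceeds {seq_id} size")
        rows.append(" ".join(str(field) for field in region_a + region_b))
    return rows


# Demo using the sizes from the karyotype example (seq1 = 1000, seq2 = 500).
pairs = [(("seq1", 0, 50), ("seq2", 100, 200))]
for row in link_rows(pairs, sizes={"seq1": 1000, "seq2": 500}):
    print(row)
```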

m




Martin Krzywinski
science + art



M~

Apr 9, 2018, 4:31:33 PM
to Circos
In case it helps anyone else get started with a karyotype file from a known assembly that isn't already in the circos directory: 

Go to the UCSC genome browser and find the right table for your organism. 

In my case: 

Go to: https://genome.ucsc.edu/cgi-bin/hgTables

Clade = Vertebrate

genome = Zebrafish

assembly = Zv9/DanRer7

group = All Tables

table = chromInfo


Click on "get output" and save the content as a text file. This will at least give you the chromosome info from the assembly and the correct size of the chromosomes.
You'll still have to modify the file to fit the format that Circos expects: add the "chr -" columns, set the id column (I changed "chr" to "dr" for Danio rerio), add a column of '0' for the start positions, and replace the last column with the matching chromosome color name (chr1, chr2, ...). You can either remove pieces of the assembly that you aren't interested in visualizing (like scaffolds or unassembled regions), or suppress them via Circos later.

The start of my karyotype file ends up looking like this: 

#Zebrafish Zv9 (DanRer7) assembly downloaded 4-5-18
#removed unassembled, scaffolding, and mitochondria sequence
#chrom id label start end color
chr - dr1 1 0 60348388 chr1
chr - dr2 2 0 60300536 chr2
chr - dr3 3 0 63268876 chr3

etc.

I'll also note that the order of the chromosomes in the file DOES matter: initially this data output had the chromosomes listed as chr7, chr5, chr3, chr4 etc., and so they were drawn in that order by default in Circos (rather than chr1, chr2, chr3, chr4, chr5, chr6, chr7 ...). So again, you can either re-order them in your karyotype file, like I did here, or set Circos to change the display order later.
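
For anyone who would rather not do these edits by hand, the steps above can be sketched in Python. This assumes the saved table is tab-separated with the chromosome name in the first column and its size in the second, and that the chromosomes you want are named chr1, chr2, and so on; check your own download before relying on it. The function name `chrominfo_to_karyotype` and the `prefix` argument are made up for this example.

```python
import re


def chrominfo_to_karyotype(text, prefix="dr"):
    """Convert saved chromInfo text into sorted Circos karyotype lines.

    Assumptions: tab-separated columns with name first and size second;
    lines starting with '#' are headers; only names of the form chr<number>
    are kept, so chrM, scaffolds and unplaced pieces are dropped.
    """
    chroms = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        match = re.fullmatch(r"chr(\d+)", fields[0])
        if match:
            chroms.append((int(match.group(1)), int(fields[1])))
    chroms.sort()  # numeric order: 1, 2, 3, ... rather than file order
    return [f"chr - {prefix}{n} {n} 0 {size} chr{n}" for n, size in chroms]


# Demo with rows in scrambled order, plus entries that should be dropped.
sample = (
    "#chrom\tsize\tfileName\n"
    "chr3\t63268876\tplaceholder\n"
    "chr1\t60348388\tplaceholder\n"
    "chrM\t16596\tplaceholder\n"
    "chr2\t60300536\tplaceholder\n"
)
for row in chrominfo_to_karyotype(sample):
    print(row)
```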

M~

Wayne

Apr 9, 2018, 11:46:53 PM
to Circos
I'd like to tag M~'s revival of this thread with another option for when your organism isn't included in the Circos directory.

I wrote a Python script that I think does this. You just need to determine the URL to give the script; how to do that is described [here](https://github.com/fomightez/sequencework/tree/master/circos-utilities). Look for the paragraph that begins with "To determine the URL to feed the script".
Several ways to get and run the script are demonstrated in a notebook that can be read in a nicely rendered form [here](https://nbviewer.jupyter.org/github/fomightez/sequencework/blob/master/circos-utilities/demo%20UCSC_chrom_sizes_2_circos_karyotype%20script.ipynb). The code for the script and the demo notebook itself are available [here](https://github.com/fomightez/sequencework/tree/master/circos-utilities). 

It may not get the chromosome order perfectly right, so see the end of M~'s note about that, but the main parts should work, I believe.

The best part is that it is super easy to run that script right in your browser via the MyBinder system, with no need for you to install anything. Just remember to download the result to your local machine at the end. (In fact, I am planning a separate post on running Circos in your browser using that system with no set-up or installation needed. If you are really curious right now, go to https://github.com/fomightez/circos-binder/blob/master/README.md; the basics work, but I want to add a lot more `getting started` content.) The beginning of the demo points you to where you can find a `launch binder` button. Click it in your browser to get an active environment where any of the code in the demo notebook will work.

Wayne


