IgDiscover database


irene

Feb 15, 2018, 11:13:22 AM
to IgDiscover
Hello, 
does anybody know how creating the database for IgDiscover works if we download the V, D and J segments from IMGT? The final FASTA files have to be saved as V.fasta, D.fasta and J.fasta, but what about the light chain? Do we save these as additional FASTA files, and do they have to be named in a particular way?

thanks, 
ib

Martin Corcoran

Feb 15, 2018, 12:17:00 PM
to irene, IgDiscover
Hi Irene,

We usually keep our libraries separate.
One folder for the heavy chains, one for kappa and one for lambda.
In each folder there should be three files:

V.fasta
J.fasta
D.fasta

This applies to the kappa and lambda folders as well, so you will need to make a dummy D.fasta file for the light chain databases,

for example one containing just a single sequence like

DUMMY.fasta
CCCCCC

The reason for this is that the IgBLAST module requires a D.fasta file or it will not proceed (even in the case of light chains that do not contain Ds).

It will not output the Ds but it is required to stop the program crashing.
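
As a rough illustration of the dummy file described above, here is a minimal Python sketch that writes a placeholder D.fasta into a light chain folder. The function name, the folder name "IGK" in the usage line, and the ">DUMMY" header line are my assumptions (a FASTA record normally starts with a ">" line); only the file name D.fasta and the CCCCCC sequence come from the message above.

```python
from pathlib import Path


def write_dummy_d(folder, name="DUMMY", sequence="CCCCCC"):
    """Write a one-record placeholder D.fasta into a light chain folder.

    `name` and the '>' header line are assumptions for illustration;
    the sequence is the placeholder suggested in the thread.
    """
    folder = Path(folder)
    # Create the folder if it does not exist yet.
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "D.fasta"
    path.write_text(f">{name}\n{sequence}\n")
    return path
```

Something like `write_dummy_d("IGK")` and `write_dummy_d("IGL")` would then give each light chain folder the D.fasta it needs.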

Anyway, it is important to get the database files in the correct structure for the program to work, which means reducing the IMGT header to just the allele name.

In other words you want to change this:

>L10057|IGHV7-4-1*01|Homo sapiens|F|V-REGION|95..388|294 nt|1| | | | |294+0=294| | |
caggtgcagctggtgcaatctgggtctgagttgaagaagcctggggcctcagtgaaggtt
tcctgcaaggcttctggatacaccttcactagctatgctatgaattgggtgcgacaggcc
cctggacaagggcttgagtggatgggatggatcaacaccaacactgggaacccaacgtat
gcccagggcttcacaggacggtttgtcttctccttggacacctctgtcagcacggcatat
ctgcagatctgcagcctaaaggctgaggacactgccgtgtattactgtgcgaga

into this:

>IGHV7-4-1*01
caggtgcagctggtgcaatctgggtctgagttgaagaagcctggggcctcagtgaaggtt
tcctgcaaggcttctggatacaccttcactagctatgctatgaattgggtgcgacaggcc
cctggacaagggcttgagtggatgggatggatcaacaccaacactgggaacccaacgtat
gcccagggcttcacaggacggtttgtcttctccttggacacctctgtcagcacggcatat
ctgcagatctgcagcctaaaggctgaggacactgccgtgtattactgtgcgaga
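
If it helps to see the header change spelled out, here is a minimal Python sketch of the same transformation. It is not the edit_imgt_file.pl script, just an illustration; the function names are made up, and it assumes the allele name is always the second "|"-separated field of the IMGT header, as in the example above.

```python
def simplify_imgt_header(line):
    """Reduce an IMGT FASTA header to '>' plus the allele name.

    Assumes the allele name is the second '|'-separated field.
    Sequence lines and headers without '|' are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    fields = line[1:].split("|")
    if len(fields) < 2:
        return line
    return ">" + fields[1].strip()


def simplify_imgt_fasta(text):
    """Apply simplify_imgt_header to every line of a FASTA string."""
    lines = [simplify_imgt_header(line) for line in text.splitlines()]
    return "\n".join(lines) + "\n"
```

Reading a downloaded file, passing its contents through `simplify_imgt_fasta`, and writing the result out as V.fasta (or D.fasta, J.fasta) would give the short-header format shown above.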

There is a Perl program called edit_imgt_file.pl that can do this with each of your files downloaded from IMGT (download the IMGT sequences that do not have gaps!)

The program is available here.
ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/

Do this for all the various database files (heavy chain Vs, Ds and Js, light chain Vs and Js) and place them in a single folder for each Ig type (one folder for IGH, with J.fasta, V.fasta and D.fasta files, and one folder for IGK and one for IGL)
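
To double-check the layout before running anything, a small Python sketch like the following could list which files are still missing. The function name and the folder names IGH, IGK and IGL sitting under one common root are assumptions for illustration; use whatever folder names you actually chose.

```python
from pathlib import Path

# The three files each database folder should contain.
REQUIRED = ("V.fasta", "D.fasta", "J.fasta")


def find_missing_files(root, chains=("IGH", "IGK", "IGL")):
    """Return {chain: [missing file names]} for incomplete folders.

    `root` and the chain folder names are hypothetical; folders that
    contain all three files are left out of the result.
    """
    root = Path(root)
    missing = {}
    for chain in chains:
        absent = [f for f in REQUIRED if not (root / chain / f).is_file()]
        if absent:
            missing[chain] = absent
    return missing
```

An empty result means every folder has its V.fasta, D.fasta and J.fasta in place.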

Now you should be ready to use them in IgDiscover.

Regards,

Martin

