FASTA to BAM for IGV visualization


Joyce Wang

Nov 18, 2013, 7:45:45 PM
to igv-...@googlegroups.com
Hi all,

Does anyone know how to make a .fasta file into a .bam file so that we can incorporate two reference genomes into IGV?

Thanks 
Joyce

Stéphane Plaisance

Nov 19, 2013, 2:09:11 AM
BEDTools has a bamToFastq tool that does half of the job; you can then keep the first two lines of every four-line record to extract the FASTA, like in
S
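
A minimal sketch of the "keep the first two lines" step, assuming the FASTQ written by bamToFastq uses strict four-line records (header, sequence, "+", qualities) with no wrapped sequences and no blank lines. The function name fastq_to_fasta and the example reads are illustrative only, not part of BEDTools. Note that this goes from FASTQ to FASTA, not from FASTA to BAM.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        position = i % 4
        if position == 0:
            # FASTQ headers start with '@'; FASTA headers start with '>'.
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected '@' header, got {line!r}")
            yield ">" + line[1:]
        elif position == 1:
            # The sequence line is kept unchanged.
            yield line
        # Positions 2 ('+' separator) and 3 (quality string) are dropped.


# Made-up reads, only to show the call.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCC", "+", "FFFF"]
print("\n".join(fastq_to_fasta(example)))
```

For a real file, the same function can be fed an open file handle instead of the example list.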
--
 
---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Joyce Wang

Nov 19, 2013, 11:12:36 AM
Stéphane, thanks for your reply. The website tells me how to convert FASTQ to FASTA...?
Do you know how to convert .fasta to .bam?

Thanks
Joyce



Stéphane Plaisance

Nov 19, 2013, 1:51:14 PM
Hi Joyce,

FASTA represents sequence only, while SAM/BAM is intended to add the mapping location for that sequence as well as the base-call quality scores. If you have neither the coordinates to map the sequence nor the qualities, there is little value in converting your FASTA into SAM/BAM.

It is probably possible to construct a BAM record with no coordinates, as is done for unmapped reads, but I do not know what you could do with such data.
If you are still interested, you will need some awk or perl, and you will have to read the SAM documentation to figure out what to put in the additional fields.


good luck
Stephane
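
For anyone who does want records "with no coordinate like for unmapped reads", here is a rough sketch in Python rather than awk or perl. It assumes each FASTA entry becomes one unmapped SAM record, following the SAM specification's conventions for unavailable fields: FLAG 4 (segment unmapped), "*" for RNAME, CIGAR, RNEXT and QUAL, and 0 for POS, MAPQ, PNEXT and TLEN. The function names read_fasta and fasta_to_unmapped_sam are made up for this example, and sequences are assumed to contain only letters.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence); the name is the header up to the first whitespace."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else "unnamed"), []
        elif name is not None:
            chunks.append(line)  # wrapped sequence lines are joined
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_unmapped_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM text lines: a minimal header, then one unmapped record per sequence."""
    yield "@HD\tVN:1.6\tSO:unsorted"
    for name, seq in read_fasta(lines):
        # QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        yield "\t".join([name, "4", "*", "0", "0", "*", "*", "0", "0", seq, "*"])


# Made-up contigs, only to show the call.
demo = [">contig1 example", "ACGT", "ACGT", ">contig2", "GGCC"]
print("\n".join(fasta_to_unmapped_sam(demo)))
```

The resulting text can be saved to a file and converted to BAM with samtools view -b. As noted above, such records carry no coordinates, so a genome browser has nothing to place on a track.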

Joyce Wang

Nov 19, 2013, 4:54:07 PM
Thanks Stephane, this is going to be fun! >_<

Have a nice day!
Joyce

Amit Kumar

Jan 21, 2014, 4:13:44 AM
Hi

I urgently need to convert a fata file to either BAM or FASTQ format.
I have even tried online converters, but without success. I need your help to convert my files.
I hope for a positive response from you.

Amit Kumar

Jan 21, 2014, 4:14:23 AM
*FASTA files, I mean to say.

Keith Mewis

Nov 6, 2014, 7:10:12 PM
Hi Joyce,

Did you manage to figure this out? I have some assembled metagenomic data in FASTA format (thus no reference sequences to align it to) and would like to view it with a tool that only takes BAM files.

Any help would be appreciated!
Keith

Joyce Wang

Nov 6, 2014, 8:28:48 PM
Hi Keith,

Sorry I never figured this one out. Is this new data from an Illumina sequencer? 
I was working with a scaffold sequence (literally 1 really long contig). If you can explain your situation a bit more, maybe other igv users can help you and I'll check with my colleagues tomorrow :)

Cheers
Joyce


Jim Robinson

Nov 6, 2014, 8:49:04 PM
Hi,

I'm at a loss to understand what it is you want to visualize. A sketch might be helpful.

Jim


Keith Mewis

Nov 6, 2014, 8:56:07 PM
Hi Jim and Joyce,

There is a new tool by Bernard Henrissat (http://bioinformatics.oxfordjournals.org/content/early/2014/10/28/bioinformatics.btu716.short) that will predict PULs (a genomic locus/operon) based on sequence data. It uses JBrowse and will accept BAM file inputs. I have an assembly that I would like to use in this tool that is currently in a .fasta file, and hence not recognized by JBrowse.

I know a .bam file contains more information than a .fasta (alignment information, quality scores maybe?) but given this is a metagenomic assembly (75,000 contigs of length ranging from 2kb to 85kb, assembled from my environment of interest), I wouldn't know what to align it to. I'd be fine with a .bam file with "empty" (or equivalent) fields of the information not found in the .fasta.

Any help would be appreciated!
Keith

Jim Robinson

Nov 6, 2014, 9:27:07 PM
Keith,

The problem is that a FASTA file contains only sequence, so the most you can ever see from a FASTA file is a string of characters. You can actually "load" this file in IGV as a genome via "Genome > Load from File...". I don't know what file format the PULs take (I didn't see that from a quick scan of the paper), but they are in essence annotations of the reference, so you would load these annotations from the "file" menu after defining your reference genome by loading the FASTA from the "genome" menu. Does that make sense?

If you could post a small example "PULs" file I might be able to assist further.

Jim


Keith Mewis

Nov 6, 2014, 11:45:09 PM
Thanks for the assistance, Jim!

The JBrowse format says it accepts .bam files; the only mention of using my own data in that paper comes at the very end of section 3.2: "Finally, the JBrowse engine also allow loading the user's own expression data, such as short-reads from BAM files." I'm not super familiar with .bam files, but I think they're files that describe alignments to a reference genome, no? In my case, it is metagenomic data from forest soils (mostly bacterial), so there is no defined reference genome to align it to. I'm also not familiar with IGV, but I would guess that when I load a .fasta into IGV it performs some sort of gene prediction to be able to display information on those tracks.

I just noticed it will accept gff3 files. I used Prodigal to make a .gff file of my data and will try that. I really appreciate your assistance and effort to help!

Regards,
Keith

Jim Robinson

Nov 7, 2014, 12:22:18 AM
Hi,

If you have a GFF file, then by definition you have a reference sequence. Assuming the reference you used to create the GFF is a FASTA file, load that into IGV first from the "genome" menu, then load the gff3 from the "file" menu.
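
One thing that may be worth checking before loading both files: the sequence names in column 1 of the GFF generally have to match the sequence names in the FASTA, otherwise the browser cannot place the features. Below is a small sketch of such a check. It assumes the FASTA name is the header text up to the first whitespace (the usual convention) and that the GFF is tab-delimited; the function names and example records are hypothetical.

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names: header text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gff_seqids(lines: Iterable[str]) -> Set[str]:
    """Collect the seqid (column 1) of every feature line in a GFF."""
    seqids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # optional embedded sequence section; no features follow it
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_seqids(fasta_lines: Iterable[str], gff_lines: Iterable[str]) -> Set[str]:
    """Return GFF seqids that have no sequence of the same name in the FASTA."""
    return gff_seqids(gff_lines) - fasta_names(fasta_lines)


# Made-up records, only to show the call.
demo_fasta = [">contig_1 length=2000", "ACGTACGT", ">contig_2", "GGCCGGCC"]
demo_gff = [
    "##gff-version 3",
    "contig_1\tProdigal\tCDS\t1\t6\t.\t+\t0\tID=1_1",
    "contig_9\tProdigal\tCDS\t1\t6\t.\t+\t0\tID=9_1",
]
print(sorted(unmatched_seqids(demo_fasta, demo_gff)))
```

An empty result means every annotated sequence has a counterpart in the FASTA.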


Keith Mewis

Nov 7, 2014, 1:30:27 AM
Hmmm, if by reference sequence you mean my original fasta, then yes. I input my assembled metagenome (original fasta file) into Prodigal and it predicts ORFs and outputs a .gff file. I understand that the .gff does not contain sequence data though. I still do not have a "reference genome" (for example a sequenced isolate genome from NCBI or something) to which my original metagenome reads (or assembly in this case) were aligned to create a .bam file.

Regardless, it appears the .gff file doesn't work in this case; perhaps it needs different annotation to have the PULs show up. I have emailed the author of the paper to ask if what I'm trying to do is possible.

Once again, thank you very much for your help! You are too kind :)

Jim Robinson

Nov 7, 2014, 6:38:38 AM
Hi,

Yes, I see the confusion in terms. In the context of a genome browser we should really speak of a reference sequence, not a "genome"; the terminology is a little loose here.

If you want to zip and email your FASTA and GFF files to us, I will look at them. You can send them to igv-team (at) broadinstitute.org. If it's too large for email, just send a sample of each file. Also, if you can describe what you mean by "doesn't work", perhaps with a screenshot, that would be helpful.

I will be traveling until Monday so further responses might be delayed.

Jim