FASTA to BAM for IGV visualization

Joyce Wang

Nov 18, 2013, 7:45:45 PM

to igv-...@googlegroups.com

Hi all,

Does anyone know how to make a .fasta file into a .bam file so that we can incorporate two reference genomes into IGV?

Thanks

Joyce

Stéphane Plaisance

Nov 19, 2013, 2:09:11 AM

to igv-...@googlegroups.com

BEDTools has a bamToFastq tool that does half of the job; you can then keep the first two lines of every four-line FASTQ record to extract the FASTA, as in

http://edwards.sdsu.edu/labsite/index.php/robert/289-how-to-convert-fastq-to-fasta
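The "keep the first two lines of every record" step can be sketched as below. This is a minimal illustration, assuming strict four-line FASTQ records (no wrapped sequence or quality lines); the function name `fastq_to_fasta` is hypothetical and not part of BEDTools.

```python
# Sketch: FASTQ -> FASTA by keeping lines 1 and 2 of each four-line record.
# Assumes strict four-line records (header, sequence, '+', qualities).


def fastq_to_fasta(fastq_lines):
    """Return FASTA lines built from a list of FASTQ lines."""
    fasta = []
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Header line: FASTQ starts with '@', FASTA with '>'
            if not line.startswith("@"):
                raise ValueError(f"expected '@' header at line {i + 1}")
            fasta.append(">" + line[1:])
        elif i % 4 == 1:
            # Sequence line is copied unchanged
            fasta.append(line)
        # Lines 3 ('+') and 4 (quality string) are dropped
    return fasta


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCC", "+", "####"]
    print("\n".join(fastq_to_fasta(example)))
```

Note that this only goes from FASTQ to FASTA; it does not produce a BAM.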

S

--

---
You received this message because you are subscribed to the Google Groups "igv-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to igv-help+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Joyce Wang

Nov 19, 2013, 11:12:36 AM

to igv-...@googlegroups.com

Stéphane, thanks for your reply. The website explains how to convert FASTQ to FASTA...?

Do you know how to convert .fasta to .bam?

Thanks

Joyce

Stéphane Plaisance

Nov 19, 2013, 1:51:14 PM

to igv-...@googlegroups.com

Hi Joyce,

FASTA represents sequence only, while SAM/BAM adds the mapping location of each sequence as well as the base-call quality scores. If you have neither the coordinates to map the sequence nor the qualities, there is little value in turning your FASTA into SAM/BAM.

It is probably possible to construct a BAM record with no coordinates, as is done for unmapped reads, but I do not know what you could do with such data.

If you are still interested, you will need some awk or Perl, and you should read the SAM documentation to figure out what to put in the additional fields.

samtools specs: v1.4 link
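For illustration, here is what such coordinate-less records could look like, written in Python rather than awk or Perl. This is a minimal sketch, assuming plain FASTA input and using the first word of each header as the read name; the function names are hypothetical. Each sequence becomes an unmapped record (FLAG 4) with the "not available" placeholders (`*` or `0`) in every column FASTA cannot fill, including QUAL.

```python
# Sketch: wrap each FASTA sequence in an unmapped SAM record (FLAG 4).
# Assumes plain FASTA; QNAME is the first whitespace-delimited word of the header.


def read_fasta(lines):
    """Yield (name, sequence) pairs; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else "unnamed"  # hypothetical fallback name
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_unmapped_sam(fasta_lines):
    """Return SAM lines: a minimal @HD header plus one unmapped record per sequence."""
    sam = ["@HD\tVN:1.6\tSO:unsorted"]
    for name, seq in read_fasta(fasta_lines):
        # Mandatory columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
        # FLAG 4 = unmapped; '*' / 0 = not available; QUAL '*' = no base qualities.
        fields = [name, "4", "*", "0", "0", "*", "*", "0", "0", seq or "*", "*"]
        sam.append("\t".join(fields))
    return sam


if __name__ == "__main__":
    example = [">contig1 some description", "ACGTAC", "GT", ">contig2", "TTGA"]
    print("\n".join(fasta_to_unmapped_sam(example)))
```

Saved to a file, the output can be converted with `samtools view -b -o contigs.bam contigs.sam`. As noted above, these records carry no position, so a genome browser has nowhere to draw them.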

good luck

Stephane

Joyce Wang

Nov 19, 2013, 4:54:07 PM

to igv-...@googlegroups.com

Thanks Stephane, this is going to be fun! >_<

Have a nice day!

Joyce

Amit Kumar

Jan 21, 2014, 4:13:44 AM

to igv-...@googlegroups.com

Hi

I urgently need to convert a fata file to either BAM or FASTQ format.

I have even tried online conversion tools, but I could not succeed. I need your help to convert my files.

I hope for a positive response.

Amit Kumar

Jan 21, 2014, 4:14:23 AM

to igv-...@googlegroups.com

*fasta files, I mean.

Keith Mewis

Nov 6, 2014, 7:10:12 PM

to igv-...@googlegroups.com

Hi Joyce,

Did you manage to figure this out? I have some assembled metagenomic data in FASTA format (thus no reference sequences to align them to) and would like to view them with a tool that only takes BAM files.

Any help would be appreciated!

Keith

Joyce Wang

Nov 6, 2014, 8:28:48 PM

to igv-...@googlegroups.com

Hi Keith,

Sorry I never figured this one out. Is this new data from an Illumina sequencer?

I was working with a scaffold sequence (literally 1 really long contig). If you can explain your situation a bit more, maybe other igv users can help you and I'll check with my colleagues tomorrow :)

Cheers

Joyce

To view this discussion on the web visit https://groups.google.com/d/msgid/igv-help/06e3f61a-ae8b-4ddd-a68d-3827baeed120%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jim Robinson

Nov 6, 2014, 8:49:04 PM

to igv-...@googlegroups.com

Hi,

I'm at a loss to understand what it is you want to visualize. A sketch might be helpful.

Jim

Keith Mewis

Nov 6, 2014, 8:56:07 PM

to igv-...@googlegroups.com

Hi Jim and Joyce,

There is a new tool by Bernard Henrissat (http://bioinformatics.oxfordjournals.org/content/early/2014/10/28/bioinformatics.btu716.short) that will predict PULs (a genomic locus/operon) based on sequence data. It uses JBrowse and will accept BAM file inputs. I have an assembly that I would like to use in this tool that is currently in a .fasta file, and hence not recognized by JBrowse.

I know a .bam file contains more information than a .fasta (alignment information, quality scores maybe?) but given this is a metagenomic assembly (75,000 contigs of length ranging from 2kb to 85kb, assembled from my environment of interest), I wouldn't know what to align it to. I'd be fine with a .bam file with "empty" (or equivalent) fields of the information not found in the .fasta.
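A BAM with "empty" fields would contain only unmapped records, which have no position, so a browser such as JBrowse or IGV would have nothing to draw. One possible workaround, not proposed elsewhere in this thread and untested with the PUL tool, is to treat each contig as its own reference and emit one record per contig that covers it end to end. A minimal sketch, assuming plain FASTA with unique first-word names; the function names are hypothetical.

```python
# Sketch: make each contig its own reference and "align" it to itself end to end.
# Workaround only; assumes plain FASTA with unique first-word names.


def read_fasta(lines):
    """Yield (name, sequence) pairs; wrapped sequence lines are joined."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else "unnamed"  # hypothetical fallback name
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_self_mapped_sam(fasta_lines):
    """Return SAM lines with one @SQ per contig and one full-length record per contig."""
    # Skip empty sequences: @SQ LN must be >= 1 and a 0M CIGAR is meaningless.
    contigs = [(name, seq) for name, seq in read_fasta(fasta_lines) if seq]
    # Records follow @SQ order, each at POS 1, so the file is coordinate-sorted.
    sam = ["@HD\tVN:1.6\tSO:coordinate"]
    sam += [f"@SQ\tSN:{name}\tLN:{len(seq)}" for name, seq in contigs]
    for name, seq in contigs:
        # FLAG 0 = mapped, forward strand; RNAME = the contig itself; POS 1 (1-based);
        # MAPQ 255 = unavailable; CIGAR <len>M spans the whole contig; QUAL '*'.
        fields = [name, "0", name, "1", "255", f"{len(seq)}M", "*", "0", "0", seq, "*"]
        sam.append("\t".join(fields))
    return sam


if __name__ == "__main__":
    example = [">contig1", "ACGTAC", ">contig2", "TTGA"]
    print("\n".join(fasta_to_self_mapped_sam(example)))
```

The result could then go through `samtools view -b -o contigs.bam contigs.sam` and `samtools index contigs.bam`, with the original FASTA as the matching reference. Each contig would simply show up as one read spanning its full length; no alignment information is gained.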

Any help would be appreciated!

Keith

Jim Robinson

Nov 6, 2014, 9:27:07 PM

to igv-...@googlegroups.com

Keith,

The problem is that a FASTA file contains only sequence, so the most you can ever see from it is a string of characters. You can actually load this file in IGV as a genome via "Genome > Load from File...". I don't know what file format the PULs come in (I didn't see it in a quick scan of the paper), but they are in essence annotations of the reference, so you would load these annotations from the "File" menu after defining your reference genome by loading the FASTA from the "Genome" menu. Does that make sense?
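To make the "FASTA as a genome" idea concrete: a browser reads a FASTA reference through a samtools-style `.fai` index, one row per sequence giving its name, length, byte offset of the first base, bases per line and bytes per line (IGV can usually create this index itself; `samtools faidx contigs.fa` also writes it). For a metagenome, each contig simply becomes its own sequence in the index. A minimal sketch of how those rows are computed, assuming every sequence line of a record has the same length except the last and that there are no blank lines; `build_fai` is a hypothetical name.

```python
# Sketch: compute samtools-style .fai rows (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH).
# Assumes uniform line lengths within a record (except the last) and no blank lines.


def build_fai(fasta_bytes):
    """Return a list of (name, length, offset, line_bases, line_width) tuples."""
    rows = []
    name = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte offset of the current line within the file
    for raw in fasta_bytes.splitlines(keepends=True):
        if raw.startswith(b">"):
            if name is not None:
                rows.append((name, length, offset, line_bases, line_width))
            name = raw[1:].split()[0].decode()  # first word of the header
            length = line_bases = line_width = 0
            offset = pos + len(raw)  # first base starts right after the header line
        elif name is not None:
            bases = len(raw.rstrip(b"\r\n"))
            if line_bases == 0:
                line_bases, line_width = bases, len(raw)  # width includes newline bytes
            length += bases
        pos += len(raw)
    if name is not None:
        rows.append((name, length, offset, line_bases, line_width))
    return rows


if __name__ == "__main__":
    data = b">contig1\nACGT\nAC\n>contig2\nGG\n"
    for row in build_fai(data):
        print("\t".join(map(str, row)))
```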

If you could post a small example "PULs" file I might be able to assist further.

Jim

Keith Mewis

Nov 6, 2014, 11:45:09 PM

to igv-...@googlegroups.com

Thanks for the assistance, Jim!

The JBrowse format says it accepts .bam files; the only mention of using my own data in that paper comes at the very end of section 3.2: "Finally, the JBrowse engine also allow loading the user's own expression data, such as short-reads from BAM files." I'm not super familiar with .bam files, but I think they're files that describe alignments to a reference genome, no? In my case, it is metagenomic data from forest soils (mostly bacterial), so there are no defined reference genomes to align them to. I'm also not familiar with IGV, but I would guess that when I load a .fasta into IGV it performs some sort of gene prediction to be able to display information on those tracks.

I just noticed it will accept gff3 files. I used Prodigal to make a .gff file of my data and will try that. I really appreciate your assistance and effort to help!

Regards,

Keith

Jim Robinson

Nov 7, 2014, 12:22:18 AM

to igv-...@googlegroups.com

Hi,

If you have a GFF file, then by definition you have a reference sequence. Assuming the reference you used to create the GFF is a FASTA file, load that first into IGV from the "Genome" menu, then load the GFF3 from the "File" menu.
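The reason a GFF implies a reference: column 1 of every GFF3 feature line (the seqid) names the sequence its coordinates refer to, and a browser can only place a feature if a sequence with exactly that name was loaded from the FASTA. A minimal sanity check, assuming the FASTA sequence name is the first word of each header (as samtools faidx treats it); the function names are hypothetical, and a name mismatch is only one possible reason an annotation track shows nothing.

```python
# Sketch: list GFF3 seqids (column 1) that do not match any FASTA sequence name.
# Assumes FASTA names are the first word after '>' in each header.


def fasta_names(fasta_lines):
    """Return the set of sequence names declared in FASTA headers."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_seqids(gff_lines):
    """Return the set of seqids used by GFF3 feature lines."""
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:  # GFF3 feature lines have nine tab-separated columns
            ids.add(cols[0])
    return ids


def missing_seqids(fasta_lines, gff_lines):
    """Seqids referenced by the GFF that the FASTA does not define, sorted."""
    return sorted(gff_seqids(gff_lines) - fasta_names(fasta_lines))


if __name__ == "__main__":
    fasta = [">contig_1 len=4", "ACGT", ">contig_2", "GG"]
    gff = [
        "##gff-version 3",
        "contig_1\tProdigal\tCDS\t1\t4\t.\t+\t0\tID=1_1",
        "contig_9\tProdigal\tCDS\t1\t3\t.\t-\t0\tID=9_1",
    ]
    print(missing_seqids(fasta, gff))
```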

Keith Mewis

Nov 7, 2014, 1:30:27 AM

to igv-...@googlegroups.com

Hmmm, if by reference sequence you mean my original fasta, then yes. I input my assembled metagenome (original fasta file) into Prodigal, which predicts ORFs and outputs a .gff file. I understand that the .gff does not contain sequence data, though. I still do not have a "reference genome" (for example, a sequenced isolate genome from NCBI) against which my original metagenome reads (or, in this case, my assembly) could be aligned to create a .bam file.

Regardless, it appears the .gff file doesn't work in this case; perhaps it needs different annotations for the PULs to show up. I have emailed the author of the paper to ask whether what I'm trying to do is possible.

Once again, thank you very much for your help! You are too kind :)

Jim Robinson

Nov 7, 2014, 6:38:38 AM

to igv-...@googlegroups.com

Hi,

Yes, I see the confusion in terms. In the context of a genome browser we should really speak of a reference sequence, not a "genome"; the terminology is a little loose here.

If you want to zip and email your fasta and gff files to us, I will look at them. You can send them to igv-team (at) broadinstitute.org. If they are too large for email, just send a sample of each file. Also, if you can describe what you mean by "doesn't work", perhaps with a screenshot, that would be helpful.

I will be traveling until Monday so further responses might be delayed.

Jim