Download Spades Assembler

Keena Wiegert

Jul 11, 2024, 1:43:24 AM

to cordesofo

How to Download and Use SPAdes Assembler

SPAdes - St. Petersburg genome assembler - is an assembly toolkit that contains various assembly pipelines for different types of sequencing data. It was originally developed for de novo assembly of bacterial and viral genomes from single-cell or isolate samples, but it has been extended to support metagenomic, plasmid, transcriptomic, and biosynthetic gene cluster assembly as well. SPAdes can also perform hybrid assembly using short reads (Illumina or IonTorrent) and long reads (PacBio, Oxford Nanopore, or Sanger). SPAdes is one of the most widely used assemblers in the field, and it has several advantages over other assemblers, such as:

It can handle complex repeat structures and large genome variations.

It can produce high-quality assemblies with low error rates and high gene completeness.

It can assemble genomes from low-coverage or unevenly distributed data.

It can assemble multiple genomes from mixed samples.

It can assemble novel sequences that are not present in reference genomes.

In this article, I will show you how to download and use SPAdes assembler for your own genome assembly projects. I will cover the following topics:

How to download SPAdes binaries or source code for Linux or Mac.

How to verify your installation and run a self-test.

How to provide input data and command line options for different assembly pipelines.

How to evaluate the output files and statistics.

By the end of this article, you should be able to perform de novo genome assembly using SPAdes with confidence and ease. Let's begin!

Downloading SPAdes

The first step is to download SPAdes from its official website: http://cab.spbu.ru/software/spades/. You can choose to download either the pre-compiled binaries or the source code, depending on your operating system and preference. The latest version of SPAdes is 3.15.5, which was released on July 14th, 2022 under the GPLv2 license.

Downloading SPAdes binaries for Linux

If you are using a Linux system (64-bit only), you can download the pre-compiled binaries from the website. The file name is SPAdes-3.15.5-Linux.tar.gz. You can use the following command to download it:

wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5-Linux.tar.gz

Alternatively, you can use a web browser to download it manually. After downloading, you need to extract the file using the following command:

tar -xzf SPAdes-3.15.5-Linux.tar.gz

This will create a folder named SPAdes-3.15.5-Linux, which contains the executable files and other resources for SPAdes.

Downloading SPAdes binaries for Mac

If you are using a Mac system (64-bit only), you can download the pre-compiled binaries from the website as well. The file name is SPAdes-3.15.5-Darwin.tar.gz. You can use the following command to download it:

wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5-Darwin.tar.gz

Alternatively, you can use a web browser to download it manually. After downloading, you need to extract the file using the following command:

tar -xzf SPAdes-3.15.5-Darwin.tar.gz

This will create a folder named SPAdes-3.15.5-Darwin, which contains the executable files and other resources for SPAdes.

Downloading SPAdes source code

If you prefer to compile SPAdes from source code, or if you are using a different operating system, you can download the source code from the website as well. The file name is SPAdes-3.15.5.tar.gz. You can use the following command to download it:

wget http://cab.spbu.ru/files/release3.15.5/SPAdes-3.15.5.tar.gz

Alternatively, you can use a web browser to download it manually. After downloading, you need to extract the file using the following command:

tar -xzf SPAdes-3.15.5.tar.gz

This will create a folder named SPAdes-3.15.5, which contains the source code and other resources for SPAdes.
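The three download commands above differ only in the platform suffix of the archive name. As a rough sketch, a small helper can build the URL from a version and a platform name. The function name spades_url is made up for this example, and the URL pattern is simply the one used in the commands above, so check it against the website before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build the archive URL following the pattern used above.
# $1 = version (e.g. 3.15.5)
# $2 = platform: Linux, Darwin, or empty for the source archive
spades_url() {
    version=$1
    platform=${2:-}
    base="http://cab.spbu.ru/files/release${version}"
    if [ -n "$platform" ]; then
        echo "${base}/SPAdes-${version}-${platform}.tar.gz"
    else
        echo "${base}/SPAdes-${version}.tar.gz"
    fi
}

# Example (commented out so the snippet has no side effects):
# wget "$(spades_url 3.15.5 Linux)"
```

Keeping the version in one place makes it easier to switch to a newer release later.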

To compile SPAdes from source code, you need to have some prerequisites installed on your system, such as CMake, GCC, Python 2 or 3, zlib, bzip2, and Boost libraries. You can check the detailed instructions on how to install these prerequisites on the SPAdes website: http://cab.spbu.ru/software/spades/#prereq. Once you have installed the prerequisites, you can use the following commands to compile SPAdes:

cd SPAdes-3.15.5
./spades_compile.sh

This will create an executable file named spades.py in the bin folder.
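Before compiling, it can save time to check that the command line tools among the prerequisites are actually installed. The sketch below is one possible way to do that; the function name missing_tools is hypothetical, and it only checks for executables on your PATH. Libraries such as zlib, bzip2, and Boost cannot be detected this way.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (the tool names are an assumption about a typical setup):
# missing_tools cmake g++ make python3
```

If the function prints nothing, all the listed tools were found.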

Installing SPAdes

After downloading and extracting (or compiling) SPAdes, you need to install it on your system. The installation process is very simple and straightforward. You just need to add the bin folder of SPAdes to your system's PATH variable, so that you can run SPAdes from any directory.

Installing SPAdes on Linux

If you are using a Linux system, you can add the bin folder of SPAdes to your PATH variable by editing your .bashrc file (or equivalent) in your home directory. You can use the following command to open the file with a text editor (such as nano):

nano ~/.bashrc

Then, add the following line at the end of the file (replace /path/to/SPAdes-3.15.5-Linux/bin with the actual path of your SPAdes bin folder):

export PATH=$PATH:/path/to/SPAdes-3.15.5-Linux/bin

Save and close the file, and then run the following command to apply the changes:

source ~/.bashrc

You can now run SPAdes from any directory by typing spades.py.

Installing SPAdes on Mac

If you are using a Mac system, you can add the bin folder of SPAdes to your PATH variable by editing your .bash_profile file (or equivalent) in your home directory. You can use the following command to open the file with a text editor (such as nano):

nano ~/.bash_profile

Then, add the following line at the end of the file (replace /path/to/SPAdes-3.15.5-Darwin/bin with the actual path of your SPAdes bin folder):

export PATH=$PATH:/path/to/SPAdes-3.15.5-Darwin/bin

Save and close the file, and then run the following command to apply the changes:

source ~/.bash_profile

You can now run SPAdes from any directory by typing spades.py.
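One small caveat with a plain export line is that every time the startup file is sourced again, the same folder is appended to PATH once more. A minimal sketch of a guard against that is shown below. The function name add_to_path is hypothetical, and the path in the example comment is the same placeholder used above.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;  # append and export
    esac
}

# Example line for your startup file (placeholder path):
# add_to_path /path/to/SPAdes-3.15.5-Linux/bin
```

The surrounding colons in the pattern make sure only a whole PATH entry matches, not a prefix of a longer folder name.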

Verifying SPAdes installation and running a self-test

After installing SPAdes, you should verify that it works properly on your system. You can do this by running a self-test that comes with SPAdes. The self-test will run SPAdes on a small dataset and check if the output matches the expected results.

To run the self-test, call spades.py with the --test option. If the bin folder is already in your PATH, you can do this from any directory. Otherwise, go to the bin folder of your SPAdes installation first (replace the path below with the actual one):

cd /path/to/SPAdes-3.15.5-Linux/bin

Then, you can run the self-test by typing:

./spades.py --test

This will launch SPAdes in test mode and run it on a small dataset of E. coli reads. The test will take a few minutes to complete, and it will generate some output files in a folder named spades_test. You should see something like this at the end of the test:

========= TEST PASSED CORRECTLY.

This means that SPAdes ran successfully and produced the correct output. If you see any errors or warnings, you should check the log file (spades.log) for more details and troubleshoot the problem.
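If you run the self-test from a script, you may want to check the result automatically instead of reading the log by eye. The following is only a sketch: the function name spades_test_passed is made up, the log location spades_test/spades.log is assumed from the folder and file names mentioned above, and the match is deliberately loose because the exact wording of the success line may differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the given log file contains
# a "test passed" line (matched case-insensitively).
spades_test_passed() {
    grep -qi "test passed" "$1"
}

# Example (assumed log location):
# spades_test_passed spades_test/spades.log && echo "self-test OK"
```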

Running SPAdes

Now that you have installed and verified SPAdes, you are ready to use it for your own genome assembly projects. To run SPAdes, you need to provide some input data and some command line options for different assembly pipelines.

Providing input data

The input data for SPAdes are sequencing reads from one or more samples. SPAdes can handle various types of reads, such as:

Illumina paired-end (PE) or mate-pair (MP) reads.

IonTorrent PE or MP reads.

PacBio single-molecule real-time (SMRT) reads.

Oxford Nanopore MinION or GridION reads.

Sanger reads.

Mixed reads from different sources.

You need to specify the type and format of your input reads using different command line options. The most common options are:

Option	Description
-1 <filename>	The file name with forward PE reads (in FASTQ or FASTA format).
-2 <filename>	The file name with reverse PE reads (in FASTQ or FASTA format).
--s1 <filename>	The file name with unpaired reads (in FASTQ or FASTA format).
--pacbio <filename>	The file name with PacBio SMRT reads (in FASTQ or FASTA format).
--nanopore <filename>	The file name with Oxford Nanopore reads (in FASTQ or FASTA format).
--sanger <filename>	The file name with Sanger reads (in FASTQ or FASTA format).
--pe1-12 <filename>	The file name with interlaced forward and reverse PE reads (in FASTQ or FASTA format).
--mp1-12 <filename>	The file name with interlaced forward and reverse MP reads (in FASTQ or FASTA format).

You can use multiple options to provide reads from different sources or libraries. For example, if you have PE reads from Illumina and SMRT reads from PacBio, you can use the following options:

-1 illumina_pe_1.fastq -2 illumina_pe_2.fastq --pacbio pacbio_smrt.fastq
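Forward and reverse PE files are expected to contain the same number of reads, so a quick sanity check before a long run can catch a truncated download. The sketch below assumes uncompressed FASTQ files with exactly four lines per read; gzipped files would have to be decompressed first, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count reads in an uncompressed FASTQ file,
# assuming exactly 4 lines per record.
fastq_read_count() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Hypothetical helper: return 0 when both files hold the same number of reads.
pairs_match() {
    [ "$(fastq_read_count "$1")" -eq "$(fastq_read_count "$2")" ]
}

# Example:
# pairs_match illumina_pe_1.fastq illumina_pe_2.fastq || echo "read counts differ"
```

This only compares counts. It does not verify that the reads appear in the same order in both files.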

You can also use the --dataset <filename> option to provide a YAML file that describes your input data in more detail. For example, you can specify the type and orientation of each library, and which files contain its left, right, interlaced, or single reads. You can find more information on how to create a YAML file on the SPAdes website: http://cab.spbu.ru/software/spades/#dataset.
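To give an idea of what such a file looks like, here is a sketch that writes a description of a single paired-end library. The function name write_dataset_yaml is hypothetical, and the layout follows my reading of the SPAdes manual, so please compare it with the manual's own example before using it. The manual's examples use full paths to the read files.

```shell
#!/bin/sh
# Hypothetical helper: write a one-library dataset description.
# $1 = output YAML file, $2 = forward reads, $3 = reverse reads
write_dataset_yaml() {
    cat > "$1" <<EOF
[
  {
    orientation: "fr",
    type: "paired-end",
    left reads: ["$2"],
    right reads: ["$3"]
  }
]
EOF
}

# Example (placeholder paths):
# write_dataset_yaml my_dataset.yaml /full/path/illumina_pe_1.fastq /full/path/illumina_pe_2.fastq
# spades.py --dataset my_dataset.yaml -o my_project
```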

Choosing command line options for different assembly pipelines

The next step is to choose the appropriate command line options for the assembly pipeline that suits your data and goal. SPAdes has several assembly pipelines for different types of data, such as:

--sc: Single-cell assembly pipeline for bacterial genomes from single-cell (MDA) data.

--meta: Metagenomic assembly pipeline for mixed microbial communities.

--plasmid: Plasmid assembly pipeline for plasmid detection and extraction.

--rna: Transcriptomic assembly pipeline for RNA-Seq data.

--isolate: Isolate assembly pipeline for bacterial or viral genomes from isolate samples.

--moleculo: Moleculo assembly pipeline for long synthetic reads from Moleculo technology.

--bio: Biosynthetic gene cluster assembly pipeline (biosyntheticSPAdes) for secondary metabolite gene clusters.

You can use one of these options to run the corresponding pipeline, or you can omit them to run the default pipeline, which is suitable for most cases. For example, if you want to assemble a bacterial genome from single-cell data, you can use the following option:

--sc

If you want to assemble a metagenomic sample from mixed reads, you can use the following option:

--meta

If you want to assemble a transcriptome from RNA-Seq data, you can use the following option:

--rna

In addition to these pipeline options, you can also use some other options to customize your assembly process, such as:

-k <value>: The k-mer size to use for assembly. You can specify a single value (e.g. -k 21) or a comma-separated list of values (e.g. -k 21,33,55). The default value is auto, which means that SPAdes will choose the optimal k-mer size based on your data.

-t <value>: The number of threads to use for assembly. The default value is 16.

-m <value>: The amount of RAM to use for assembly in GB. The default value is 250.

--careful: The option to run SPAdes in careful mode, which will reduce the number of mismatches and short indels in the resulting assembly.

--only-assembler: The option to run only the assembly module of SPAdes, skipping read error correction.

--continue: The option to resume a previously interrupted run of SPAdes from the last available checkpoint.

You can find more information on the available command line options on the SPAdes website: http://cab.spbu.ru/software/spades/#manual.
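If you set -k by hand, note that the SPAdes manual requires all values to be odd, less than 128, and listed in ascending order. A minimal sketch of a check for these rules is shown below; the function name valid_kmers is hypothetical, and the check is not part of SPAdes itself.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a comma-separated k-mer list contains
# only odd values below 128 in strictly ascending order.
valid_kmers() {
    prev=0
    for k in $(echo "$1" | tr ',' ' '); do
        case $k in
            ''|*[!0-9]*) return 1 ;;      # not a non-negative integer
        esac
        [ $((k % 2)) -eq 1 ] || return 1  # must be odd
        [ "$k" -lt 128 ] || return 1      # must be below 128
        [ "$k" -gt "$prev" ] || return 1  # must be larger than the previous value
        prev=$k
    done
    [ "$prev" -gt 0 ]                     # reject an empty list
}

# Example:
# valid_kmers 21,33,55 && spades.py -k 21,33,55 -1 illumina_pe_1.fastq -2 illumina_pe_2.fastq -o my_project
```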

Evaluating SPAdes output

After running SPAdes, you will get some output files and statistics in a folder named after your project. For example, if you run SPAdes with the following command:

./spades.py -1 illumina_pe_1.fastq -2 illumina_pe_2.fastq --pacbio pacbio_smrt.fastq -o my_project

You will get a folder named my_project, which contains the following files and subfolders:

File or subfolder	Description
spades.log	The log file that records the progress and status of SPAdes.
params.txt	The file that contains the parameters and options used for SPAdes.
dataset.info	The file that contains the information about the input data.
corrected/	The subfolder that contains the error-corrected reads.
mismatch_corrector/	The subfolder that contains the mismatch-corrected contigs and scaffolds.
K21/ K33/ K55/ .../	The subfolders that contain the intermediate assemblies for each k-mer size.
scaffolds.fasta	The final assembly file that contains the scaffolds (sequences with gaps).
contigs.fasta	The final assembly file that contains the contigs (sequences without gaps).
assembly_graph.fastg	The final assembly graph file in FASTG format.
scaffolds.paths	The file that contains the paths in the assembly graph that correspond to the scaffolds.
contigs.paths	The file that contains the paths in the assembly graph that correspond to the contigs.
spades.yaml	The file that contains the summary statistics and quality metrics of the final assembly.
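For a quick first look at contigs.fasta or scaffolds.fasta, you do not need any extra tools. The sketch below computes the N50 of a FASTA file, that is, the length L such that sequences of length L or longer cover at least half of the total assembly length. The function names fasta_lengths and n50 are hypothetical, and for a real report a dedicated tool is the better choice.

```shell
#!/bin/sh
# Hypothetical helper: print the length of every sequence in a FASTA file,
# one number per line. Multi-line sequences are summed up.
fasta_lengths() {
    awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
         END { if (seen) print len }' "$1"
}

# Hypothetical helper: print the N50 of a FASTA file.
n50() {
    fasta_lengths "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]            # cumulative length, longest first
                if (run >= half) { print lens[i]; exit }
            }
        }'
}

# Example:
# n50 my_project/contigs.fasta
```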

To evaluate the quality and accuracy of your assembly, you can look at some of these output files and statistics. For example, you can check the following metrics:

The number and length of scaffolds and contigs. You can use tools like QUAST or MetaQUAST to generate a comprehensive report on these metrics.

The N50 and NG50 values of scaffolds and contigs. These are measures of the contiguity of your assembly; NG50 is computed relative to the expected genome size rather than the total assembly length. Higher values generally indicate a more contiguous assembly.

Q: How do I cite SPAdes?

A: You can cite SPAdes using the following reference: Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology. 2012 May;19(5):455-77. doi: 10.1089/cmb.2012.0021. You can also use the BibTeX format:

@article{bankevich2012spades,
  title={SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing},
  author={Bankevich, Anton and Nurk, Sergey and Antipov, Dmitry and Gurevich, Alexey A and Dvorkin, Mikhail and Kulikov, Alexander S and Lesin, Vladislav M and Nikolenko, Sergey I and Pham, Son and Prjibelski, Andrey D and Pyshkin, Alexey V and Sirotkin, Alexander V and Vyahhi, Nikolay and Tesler, Glenn and Alekseyev, Max A and Pevzner, Pavel A},
  journal={Journal of Computational Biology},
  volume={19},
  number={5},
  pages={455--477},
  year={2012},
  publisher={Mary Ann Liebert Inc}
}

Q: How do I get help or report a bug for SPAdes?

A: You can get help or report a bug for SPAdes by contacting the developers via email or GitHub. The email address is spades....@cab.spbu.ru. The GitHub repository is https://github.com/ablab/spades. You can also check the FAQ section on the SPAdes website for some common questions and answers: http://cab.spbu.ru/software/spades/#faq.

Q: How do I update SPAdes to the latest version?

A: You can update SPAdes to the latest version by downloading the new binaries or source code from the SPAdes website: http://cab.spbu.ru/software/spades/. To see which version you currently have installed, run spades.py --version and compare it with the latest release listed on the website.

Q: How do I uninstall SPAdes from my system?

A: You can uninstall SPAdes from your system by deleting the SPAdes folder and removing it from your PATH variable. You can also delete any output files or folders that you have created with SPAdes.
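Besides deleting the export line from your startup file, you can also drop the folder from PATH in the current shell session. The following is just a sketch of one way to do it; the function name path_without is hypothetical, and the path in the example is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print a PATH-style string with every exact occurrence
# of directory $1 removed. Uses $2 if given, otherwise the current PATH.
path_without() {
    printf '%s\n' "${2:-$PATH}" | tr ':' '\n' | grep -vxF -- "$1" | paste -s -d: -
}

# Example (placeholder path):
# PATH=$(path_without /path/to/SPAdes-3.15.5-Linux/bin)
# export PATH
```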
