How to download annotations for all JASPAR CORE profiles?

carina...@gmail.com

Mar 23, 2020, 11:24:28 PM

to JASPAR Q&A Forum

Hello,

I would like to download the annotations for all JASPAR CORE profiles. Something like this for each profile:

Name: TGA1A

Matrix ID: MA0129.1

Class: Basic leucine zipper factors (bZIP)

Family:

Collection: CORE

Taxon: Plants

Species: Nicotiana sp.

Data Type: SELEX

Validation: 10561063

Uniprot ID: P14232

Source:

Comment:

Is there a way to do it?

Best,

Paola

Anthony Mathelier

Mar 24, 2020, 3:50:34 AM

to carina...@gmail.com, JASPAR Q&A Forum

Dear Paola,

All of this information can be retrieved programmatically. For instance, you can use our Biopython JASPAR module (see https://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc218) or our REST API (see https://academic.oup.com/bioinformatics/article/34/9/1612/4747882).

Best
AM

-- 
Anthony
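
As a minimal sketch of the REST API route for a single profile, the R snippet below formats one parsed record into the layout requested in the question. The field names (name, matrix_id, class, family, collection, tax_group, species, type, pubmed_ids, uniprot_ids) are the ones listed later in this thread. The mapping of "Data Type" to type and "Validation" to pubmed_ids is an assumption, and format_profile and collapse_field are hypothetical helper names. The example record is built by hand from the values in the question rather than fetched, so it only approximates what the API returns.

```r
# Collapse a possibly multi-valued or missing field into a single string.
collapse_field <- function(x, sep = "::") {
  if (length(x) == 0) return("")
  paste(unlist(x), collapse = sep)
}

# Format one parsed profile record as "Label: value" lines.
# The label-to-field mapping is an assumption, not an official JASPAR layout.
format_profile <- function(info) {
  fields <- c(
    "Name"       = collapse_field(info$name),
    "Matrix ID"  = collapse_field(info$matrix_id),
    "Class"      = collapse_field(info$class),
    "Family"     = collapse_field(info$family),
    "Collection" = collapse_field(info$collection),
    "Taxon"      = collapse_field(info$tax_group),
    "Species"    = collapse_field(info$species$name),
    "Data Type"  = collapse_field(info$type),
    "Validation" = collapse_field(info$pubmed_ids),
    "Uniprot ID" = collapse_field(info$uniprot_ids)
  )
  paste0(names(fields), ": ", fields)
}

# Hand-built stand-in for a parsed API record (values copied from the question).
example_info <- list(
  name        = "TGA1A",
  matrix_id   = "MA0129.1",
  class       = list("Basic leucine zipper factors (bZIP)"),
  family      = list(),
  collection  = "CORE",
  tax_group   = "Plants",
  species     = list(name = "Nicotiana sp."),
  type        = "SELEX",
  pubmed_ids  = list("10561063"),
  uniprot_ids = list("P14232")
)

profile_lines <- format_profile(example_info)
cat(profile_lines, sep = "\n")

# With network access and the jsonlite package, a live record could be used instead:
# info <- jsonlite::fromJSON("http://jaspar.genereg.net/api/v1/matrix/MA0129.1/?format=json")
# cat(format_profile(info), sep = "\n")
```

Multi-valued fields are joined with "::", the same separator used in the R script later in this thread.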

Jaime Castro

Mar 24, 2020, 10:44:26 AM

to JASPAR Q&A Forum

Hi Paola

Below is the code I used to extract these features in R using the API.

The code is tricky because the API returns nested lists, so it requires a lot of data manipulation, but it works.

Change the taxon variable if you want to use it with other taxa.

Let us know if this works for you.

Jaime

##################################################################

library("dplyr")
library("data.table")
library("jsonlite")
library("purrr")

taxon <- "Vertebrates"
jaspar.url <- paste0("http://jaspar.genereg.net/api/v1/matrix/?page_size=800&collection=CORE&tax_group=", taxon, "&version=latest&format=json")
result <- fromJSON(jaspar.url)

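## Fetch the full record of every profile.
## simplify = FALSE keeps a named list (one entry per matrix ID)
## instead of letting sapply() collapse the result into a matrix.
all.profiles.info <- sapply(result$results$matrix_id, function(id){

  indiv.jaspar.url <- paste0("http://jaspar.genereg.net/api/v1/matrix/", id, "/?format=json")
  indiv.mat.info <- fromJSON(indiv.jaspar.url)

  indiv.mat.info

}, simplify = FALSE)

## Fields:
# names(all.profiles.info[[1]])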
#
# [1] "pubmed_ids" "description" "family" "pfm" "tax_group" "matrix_id" "sequence_logo" "remap_tf_name"
# [9] "pazar_tf_ids" "versions_url" "collection" "base_id" "class" "tffm" "tfe_ids" "name"
# [17] "tfbs_shape_id" "uniprot_ids" "sites_url" "species" "alias" "version" "unibind" "type"
# [25] "symbol"

## Vertebrates: 746 profiles
all.profiles.info.subset <- map(all.profiles.info, `[`, c("name", "matrix_id", "class", "family", "tax_group", "species", "type", "pubmed_ids", "uniprot_ids"))

## Species is a nested list with two entries: name and tax_id.
## They must be processed separately
species.df <- data.frame( species = do.call(rbind, lapply(map(all.profiles.info.subset, c("species", "name")), paste, collapse = "::") ))
tax.id.df <- data.frame( tax_id = do.call(rbind, lapply(map(all.profiles.info.subset, c("species", "tax_id")), paste, collapse = "::") ))

## Family/Class/Uniprot_ids may contain two or more entries (e.g., dimers), therefore, they must be processed separately
family.df <- data.frame( family = do.call(rbind, lapply(map(all.profiles.info.subset, "family"), paste, collapse = "::") ))
class.df <- data.frame( class = do.call(rbind, lapply(map(all.profiles.info.subset, "class"), paste, collapse = "::") ))
uniprot.df <- data.frame( uniprot_ids = do.call(rbind, lapply(map(all.profiles.info.subset, "uniprot_ids"), paste, collapse = "::") ))

## Some profiles contain two PubMed IDs (a mistake made when we curated the database). To avoid problems, we concatenate them.
## This problem must be fixed in future releases.
pubmed.df <- data.frame( pubmed_ids = do.call(rbind, lapply(map(all.profiles.info.subset, "pubmed_ids"), paste, collapse = "::") ))

## Convert list to data.frame
all.profiles.info.subset <- map(all.profiles.info.subset, `[`, c("name", "matrix_id", "tax_group", "type"))
all.profiles.info.tab <- rbindlist(all.profiles.info.subset)

## Concat all the data.frames
all.profiles.info.tab.clean <-
  cbind(all.profiles.info.tab, species.df, tax.id.df, class.df, family.df, uniprot.df, pubmed.df) %>%
  dplyr::select(name, matrix_id, class, family, tax_group, species, tax_id, type, uniprot_ids, pubmed_ids)

fwrite(all.profiles.info.tab.clean, sep = "\t", file = paste0("Jaspar_2020_", taxon,"_table.tab"))

Ruo Kery

Nov 13, 2020, 5:51:37 AM

to JASPAR Q&A Forum

Hi Paola,

With the JASPAR2020 R package, extracting the annotations for all TFs takes only a few steps.

## retrieve all profiles for CORE + vertebrates as follows

library(JASPAR2020)
library(TFBSTools)

opts <- list()
opts[["tax_group"]] = "vertebrates"
opts[["collection"]] = "CORE"

JASPAR_PFMatrixList = getMatrixSet(JASPAR2020, opts)

## JASPAR_PFMatrixList is a PFMatrixList object; take one profile from it as x

x <- JASPAR_PFMatrixList[[1]]

## extract the matrix ID as follows

ID(x)

## extract the TF name as follows

name(x)

## extract the remaining annotations (tags) as follows

tags(x)
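
To go from one profile to a table covering all of them, the accessors above can be applied to every element of the list. The sketch below is one possible way to do that, not an official recipe: tags_to_table is a hypothetical helper, the tag names it looks for (family, tax_group, species, type, acc, medline) are assumptions about what tags() returns for JASPAR2020 profiles, and any tag a profile lacks simply becomes an empty string. The toy input uses made-up IDs and names so the helper can be tried without Bioconductor installed; the live part only runs if JASPAR2020 and TFBSTools are available.

```r
# Build one table from parallel vectors of IDs and names plus a list of tag lists.
# Multi-valued tags are joined with "::"; tags missing from a profile become "".
tags_to_table <- function(ids, tf_names, tag_list,
                          wanted = c("family", "tax_group", "species",
                                     "type", "acc", "medline")) {
  columns <- lapply(setNames(wanted, wanted), function(tag) {
    vapply(tag_list,
           function(tg) paste(unlist(tg[[tag]]), collapse = "::"),
           character(1))
  })
  data.frame(matrix_id = unname(ids), name = unname(tf_names), columns,
             stringsAsFactors = FALSE, row.names = NULL)
}

# Toy input with made-up (hypothetical) IDs, names and tags.
toy_tags <- list(
  list(family = "Family 1", tax_group = "vertebrates", type = "SELEX"),
  list(family = c("Family 1", "Family 2"), tax_group = "vertebrates", type = "ChIP-seq")
)
toy_table <- tags_to_table(c("XX0001.1", "XX0002.1"),
                           c("TF_A", "TF_B::TF_C"),
                           toy_tags)
print(toy_table)

# Live usage, only attempted when both Bioconductor packages are installed.
if (requireNamespace("JASPAR2020", quietly = TRUE) &&
    requireNamespace("TFBSTools", quietly = TRUE)) {
  library(JASPAR2020)
  library(TFBSTools)
  opts <- list(tax_group = "vertebrates", collection = "CORE")
  pfm_list <- as.list(getMatrixSet(JASPAR2020, opts))
  annotation_table <- tags_to_table(
    ids      = vapply(pfm_list, ID, character(1)),
    tf_names = vapply(pfm_list, name, character(1)),
    tag_list = lapply(pfm_list, tags)
  )
  print(head(annotation_table))
}
```

The TF class is not stored in the tags; in TFBSTools it should be available through matrixClass(x), which could be added as a further column in the same way.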
