rTASSEL ABH and numeric format

Eduardo Beche

Apr 11, 2024, 4:34:56 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Is it possible in rTASSEL to create ABH and numeric files from a genotype file, similar to what TASSEL does?

Brandon Monier

Apr 11, 2024, 12:48:10 PM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi Eduardo,

Yes. After you import your genotype data, you can call R's base as.matrix() function on the genotype table. This converts the genotype table into a numeric R matrix encoded as 0, 1, and 2, where each value is the number of alternate alleles found in that observation (0 = homozygous ref, 1 = het, 2 = homozygous alt). See the example below:

library(rTASSEL)

g <- readGenotypeTableFromPath("your/path/here")

gNum <- as.matrix(g)
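
Since the question asks for a numeric file rather than an in-memory matrix, here is a minimal sketch of writing such a matrix to disk with base R. It uses a small toy matrix in place of the real gNum so it runs without rTASSEL. The sample names, marker names, row/column orientation, and output file name are assumptions for illustration only.

```r
# Toy stand-in for the matrix returned by as.matrix(g).
# Sample and marker names here are hypothetical.
gNum <- matrix(
    c(0, 1,  2,
      2, NA, 0),
    nrow = 2, byrow = TRUE,
    dimnames = list(
        c("sample_1", "sample_2"),
        c("snp_1", "snp_2", "snp_3")
    )
)

# Write a tab-delimited file. col.names = NA leaves an empty header cell
# above the row names so the header lines up with the data columns.
numFile <- file.path(tempdir(), "example_numeric_output.txt")
write.table(gNum, file = numFile, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back to confirm that it round-trips.
gCheck <- as.matrix(read.delim(numFile, row.names = 1, check.names = FALSE))
```

Missing calls are written as NA by default. The `na` argument of write.table() can be changed if another program expects a different missing-value code.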

For ABH file creation, we do not have a wrapper built for that plugin, but we can directly access it through the Java API for now. See below for an example script:

## LOAD LIBRARIES ----
library(rTASSEL) # to get TASSEL API
library(rJava)   # lower level Java to R bridge code (comes with rTASSEL)


## PARAMETERS ----
# Input genotype data
inputFile    <- "example_input.hmp.txt"

# Output filename for results (will be written as CSV)
outFile      <- "example_abh_output.csv"

# A plain text file containing the name of all samples from "parent A" as they
# are found in the input filename. One name per line.
parentA      <- "parent_a_file.txt"

# A plain text file containing the name of all samples from "parent B" as they
# are found in the input filename. One name per line.
parentB      <- "parent_b_file.txt"

# Output format. Can only be "c", "i", or "r".
# If "c", output will be A, H,   B for parent A, het, and parent B.
# If "i", output will be 0, 1,   2 for parent A, het, and parent B.
# If "r", output will be 0, 0.5, 1 for parent A, het, and parent B.
#
# NOTE: This field is not required. If absent, the default is "c" - output
#       will be in the form of A,H,B.
outputFormat <- "c"


## PROCEDURE ----
# API reference path to plugin
gToAbhRef <- "net.maizegenetics.analysis.data.GenosToABHPlugin"

# Prepare output format (c, i, or r)
outputEnumFormat <- switch(
    EXPR = outputFormat,
    "c"  = J(gToAbhRef)$OUTPUT_CHECK$c,
    "i"  = J(gToAbhRef)$OUTPUT_CHECK$i,
    "r"  = J(gToAbhRef)$OUTPUT_CHECK$r
)
if (is.null(outputEnumFormat)) {
    stop("Format can only be of type: 'c', 'i', or 'r'")
}

# Load genotype data, extract the underlying Java genotype table, and wrap it
# in a TASSEL DataSet so it can be passed to the plugin
myGeno <- readGenotypeTableFromPath(inputFile)@jGenotypeTable
input <- rJava::J("net/maizegenetics/plugindef/DataSet")
input <- input$getDataSet(myGeno)

# Set plugin parameters
genoToAbhPlugin <- .jnew(gToAbhRef)
genoToAbhPlugin$outfile(outFile)
genoToAbhPlugin$parentA(parentA)
genoToAbhPlugin$parentB(parentB)
genoToAbhPlugin$outputFormat(outputEnumFormat)

# Run plugin (results will be in output file path)
myDS <- genoToAbhPlugin$runPlugin(input)
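
The two parent files used in the script are plain text with one sample name per line. As a rough sketch, they could be prepared and sanity-checked from R with base functions before running the plugin. All sample names below are hypothetical placeholders, and allSamples stands in for the sample names that actually appear in your genotype file.

```r
# Hypothetical sample names; replace with the names used in your genotype file.
parentANames <- c("parentA_rep1", "parentA_rep2")
parentBNames <- c("parentB_rep1")

# Write one name per line to each parent file.
parentA <- file.path(tempdir(), "parent_a_file.txt")
parentB <- file.path(tempdir(), "parent_b_file.txt")
writeLines(parentANames, parentA)
writeLines(parentBNames, parentB)

# Hypothetical vector holding every sample name in the genotype file.
allSamples <- c("parentA_rep1", "parentA_rep2", "parentB_rep1", "progeny_1")

# Check that every listed parent name is actually present in the data.
listedParents <- c(readLines(parentA), readLines(parentB))
missingNames  <- setdiff(listedParents, allSamples)
if (length(missingNames) > 0) {
    stop("Parent names not found in genotype data: ",
         paste(missingNames, collapse = ", "))
}
```

Checking the names up front helps catch spelling or case mismatches between the parent files and the genotype file.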

Best,
Brandon M.