Hello everyone,
I would like to ask a question about something I encountered using RCy3.
I have a network I would like to visualize under multiple circumstances, using multiple (20+) datasets.
I have written an RCy3 script which reads in the base network and visualizes it. I have set up a loop which iterates over my data files, and adds them to the network with the loadTableData command. Essentially, these are just gene identifiers with a p.value.
What my script does/I would like it to do:
- read in dataset file
- highlight the IDs in the network, select their first neighbours
- create a new subnetwork out of this selection
- visualize the p.value using a continuous mapping for size and colour
- save a snapshot
- revert to the base network
- delete added columns
My problem is that when I add/delete columns, something happens to the p.values: I get back values and IDs that aren't even in my original input files.
Here's an example. This is one of my input files:
Node name p p-corr
SL1344_2856 None 6.3101e-6 3.7861e-5
SL1344_3325 None 5.9315e-6 3.7861e-5
SL1344_3323 None 5.9315e-6 3.7861e-5
SL1344_0069 None 4.3396e-6 3.7861e-5
SL1344_0057 None 1.0797e-3 3.2392e-3
SL1344_0055 None 1.0797e-3 3.2392e-3
SL1344_1325 None 9.0257e-4 3.2392e-3
SL1344_0056 None 7.2570e-4 3.2392e-3
SL1344_0071 None 1.9829e-3 5.2878e-3
SL1344_0075 None 2.5288e-3 6.0690e-3
SL1344_0072 None 4.5240e-3 9.8705e-3
SL1344_1591 None 5.8982e-3 1.1796e-2
SL1344_4525 None 6.7170e-3 1.2401e-2
SL1344_3358 None 2.8175e-2 4.8300e-2
SL1344_0074 None 4.2529e-2 6.8047e-2
SL1344_0076 None 5.0603e-2 7.1440e-2
SL1344_0070 None 5.0603e-2 7.1440e-2
SL1344_3433 None 5.8645e-2 7.8193e-2
SL1344_0077 None 6.6333e-2 7.9600e-2
SL1344_0068 None 6.6333e-2 7.9600e-2
SL1344_0067 None 7.3993e-2 8.4563e-2
SL1344_1169 None 2.8198e-1 2.9424e-1
SL1344_0079 None 2.7181e-1 2.9424e-1
What I get as the output in Cytoscape for the same file, after loading the table through RCy3 with the loadTableData command:
shared name p p.corr
SL1344_1695 0.35802 0.46542
SL1344_1169 0.28198 0.37598
SL1344_4525 0.14795 0.20245
SL1344_0610 0.1103 0.15502
SL1344_1747 0.1103 0.15502
SL1344_0608 0.088909 0.13598
SL1344_0611 0.081517 0.12845
SL1344_0067 0.073993 0.12412
SL1344_0068 0.066333 0.12412
SL1344_0182 0.073993 0.12412
SL1344_0183 0.066333 0.12412
SL1344_3875 0.037434 0.12412
SL1344_3081 0.073993 0.12412
SL1344_3302 0.042529 0.12412
SL1344_2856 0.07371 0.12412
SL1344_3303 0.042529 0.12412
SL1344_1591 0.027056 0.12412
SL1344_4008 0.069442 0.12412
SL1344_3358 0.0062351 0.040528
SL1344_0675 0.0022262 0.019293
SL1344_3433 0.0014047 0.018262
SL1344_3469 5.72E-04 0.014864
SL1344_3332 1.54E-04 0.0080026
Not only are they different IDs, but they have completely different p.corr values as well. In fact, if I grep my input file directory for any of these values I get no hits back.
If I import the table manually I get the correct values back.
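One caveat about the grep check above: the input files store values in scientific notation (e.g. 2.8198e-1), while Cytoscape shows plain decimals (0.28198), so a literal text search for the displayed value can miss a number that is actually present. A minimal sketch of a numeric search instead, assuming tab-separated files like the one shown; `find_value`, `demo_file` and the tolerance are hypothetical names and choices for illustration, not part of RCy3:

```r
# Hypothetical helper: list the files in which `column` contains `value`,
# compared numerically (relative tolerance) rather than as text.
find_value <- function(files, column, value, tol = 1e-6) {
  hits <- vapply(files, function(f) {
    tab <- read.csv(f, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
    col <- tab[[column]]
    is.numeric(col) && any(abs(col - value) <= tol * abs(value), na.rm = TRUE)
  }, logical(1))
  names(hits)[hits]
}

# Small demo file with one row copied from the example input above
demo_file <- tempfile(fileext = ".tsv")
writeLines(c("Node\tname\tp\tp-corr",
             "SL1344_1169\tNone\t2.8198e-1\t2.9424e-1"), demo_file)

# Literal text search for the decimal form, as grep would do
text_hit <- any(grepl("0.28198", readLines(demo_file), fixed = TRUE))

# Numeric search for the same value
numeric_hit <- find_value(demo_file, "p", 0.28198)
```

In the real directory, `files` would be the vector returned by list.files(), and `column` would be "p" or "p.corr" as named by read.csv.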
Here is my script:
library(RCy3)
library(RColorBrewer)

# -------- Setting up --------
setwd("/Users/olbeim/Documents/ContextTool_analysis/2_pipeline/visualization/")
salmonet <- read.csv("test_network.csv", header = TRUE, sep = ',', stringsAsFactors = FALSE)
nodelist <- unique(c(salmonet$Node_1_locus_tag, salmonet$Node_2_locus_tag))
nameTable <- read.csv("/Users/olbeim/Documents/ContextTool_analysis/2_pipeline/lt_to_genename", sep = '\t', header = TRUE, stringsAsFactors = FALSE)

# build the node and edge tables
nodes <- data.frame(id = nodelist,                        # locus tags
                    stringsAsFactors = FALSE)
edges <- data.frame(source = salmonet$Node_1_locus_tag,
                    target = salmonet$Node_2_locus_tag,
                    layer = salmonet$Layer,               # numeric
                    stringsAsFactors = FALSE)

createNetworkFromDataFrames(nodes, edges, title = "SalCom network", collection = "SalCom") # load network
loadTableData(nameTable, data.key.column = "LT", table.key.column = "name") # add gene names
setNodeColorDefault('#C0C0C0')            # set to gray
setNodeLabelMapping(table.column = 'Gene') # set label to gene name
fitContent()                              # make sure everything fits

# -------- CHAT output pcorr size -------
files = list.files("~/Documents/ContextTool_analysis/3_output/CHAT_output/nodes25/", pattern = "salcom_*", recursive = FALSE, full.names = TRUE) # read data files

for (i in files) {
  nameTable <- read.csv(file = i, sep = '\t', header = TRUE, stringsAsFactors = FALSE)
  loadTableData(nameTable, data.key.column = "Node", table.key.column = "shared name") # add data table to network
  selectNodes(nameTable$Node, by.col = "id", preserve.current.selection = FALSE)
  setNodeSizeMapping('p.corr', c(min(nameTable$p.corr), max(nameTable$p.corr)), c(75, 20))
  setNodeColorMapping('p.corr', c(min(nameTable$p.corr), max(nameTable$p.corr)), c('#5577FF', '#FFFFFF'))
  selectFirstNeighbors(direction = "any")
  createSubnetwork(nodes = "selected",
                   subnetwork.name = i)
  clearSelection()
  fitContent()
  exportImage(filename = paste(i, '_analysis.JPEG', sep = ""), resolution = 72, zoom = 500)
  setCurrentNetwork(52) # revert to original network using its SUID
  # delete columns before reading in the next data file
  deleteTableColumn(column = "Node")
  deleteTableColumn(column = "p")
  deleteTableColumn(column = "p.corr")
}
I assume something goes wrong when I'm adding the tables. I realize this has been a long post; I am just trying to be as transparent about my problem as I possibly can.
I would really appreciate it if someone could help me out with this. I am really struggling to understand why those IDs and p-values show up when (a) they shouldn't, and (b) some of them don't exist in my input files at all.
I wonder, is it possible that it has something to do with the kind of information (e.g. string, float etc.) the columns are set to by default?
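To make the column-type question concrete, here is a minimal sketch of how the R side can be inspected before anything is sent to Cytoscape. It assumes a tab-separated file shaped like the example above; `demo_file`, `tab`, `col_classes` and `dup_keys` are hypothetical names, and the two rows are copied from the example input. Note that read.csv, with its default check.names = TRUE, renames the "p-corr" header to "p.corr".

```r
# Write a two-row demo file in the same layout as the example input
demo_file <- tempfile(fileext = ".tsv")
writeLines(c("Node\tname\tp\tp-corr",
             "SL1344_2856\tNone\t6.3101e-6\t3.7861e-5",
             "SL1344_1169\tNone\t2.8198e-1\t2.9424e-1"), demo_file)

# Read it the same way the loop in the script does
tab <- read.csv(demo_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Column names after R's name checking ("p-corr" becomes "p.corr")
col_names <- names(tab)

# Class of each column as R parsed it
col_classes <- sapply(tab, class)

# Index of the first duplicated key, or 0 if every Node ID is unique
dup_keys <- anyDuplicated(tab$Node)

print(col_names)
print(col_classes)
```

Here the key column is read as character and both p-value columns as numeric, so at least on the R side of this example the types look as expected; the same check could be run on each real input file inside the loop.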
Best wishes,
Marton