Help with PPI network construction, differentially expressed genes, and using cytoHubba (total beginner).

Caitlin Gibbons

Nov 20, 2021, 7:57:39 PM
to cytoscape-helpdesk
Hi all.

I am working on a course project and I need help. I used GEO2R to obtain three tab-delimited Excel files containing differentially expressed genes (DEGs). Since these DEGs are in three separate files, I am assuming I could "simply" copy and paste them into another column, then copy a sufficient number of genes to create a PPI network that doesn't look like an impenetrable ball of yarn ;)

I would then like to obtain the interaction file for use with cytoHubba in an effort to identify the top 10 or so genes, assuming they will be of maximal clinical interest.

I am using a Windows 10 machine and Cytoscape 3.9.0 and all apps are current.

Thank you.

~Caitlin

Scooter Morris

Dec 2, 2021, 11:21:08 AM
to cytoscape-helpdesk
Hi Caitlin,

Well, at the risk of taking away the fun you'll have figuring this out yourself (course project and all...) we'll get you started. So, assuming you have three different GEO datasets making up the 3 files, yes, you could certainly paste them together, or you could just import them directly into Cytoscape. That would involve taking all of the genes, then using those genes to generate a PPI network, then importing your expression data on top of that. And yes, you'll definitely get a hairball from that.

A better approach would be to put all of your data into one spreadsheet and then filter your data so that only genes "of interest" are included. Of course, the definition of "of interest" depends on the biological question. For example, are you interested in genes that have correlated expression across all three datasets, or is there a more focused question around a particular disease or pathway? In any case, you would certainly start by discarding any genes that aren't significant (e.g. keep only p < 0.05) and any whose fold change is not substantial (e.g. keep only values below -1 or above 1). That should trim down the list of genes to a more reasonable number. Then, depending on your biological question, you might want to further filter out any genes that aren't significant across all three experiments.

Once you get a smaller set of genes, you can then use stringApp or IntActApp or your favorite connection to a PPI database to pull down a PPI network.
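If you'd rather script the filtering than do it by hand in a spreadsheet, here is a rough sketch in Python (standard library only). It assumes each tab-delimited file has columns named `Gene.symbol`, `adj.P.Val`, and `logFC`; those names are an assumption, so check the headers in your own files and pass different names if needed. The function names and the toy tables are made up for illustration and are not real data.

```python
import csv
import io


def significant_genes(handle, gene_col="Gene.symbol", p_col="adj.P.Val",
                      fc_col="logFC", p_cutoff=0.05, fc_cutoff=1.0):
    """Return the gene symbols in one tab-delimited table that pass both cutoffs.

    Column names are assumptions; adjust them to match your files.
    """
    keep = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        gene = (row.get(gene_col) or "").strip()
        if not gene:
            continue  # skip rows with no gene annotation
        try:
            p_value = float(row[p_col])
            fold_change = float(row[fc_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        # Keep genes that are significant AND change by more than the cutoff
        # in either direction (below -1 or above 1 with the defaults).
        if p_value < p_cutoff and abs(fold_change) > fc_cutoff:
            keep.add(gene)
    return keep


def shared_genes(handles, **cutoffs):
    """Return the genes that pass the cutoffs in every table."""
    gene_sets = [significant_genes(handle, **cutoffs) for handle in handles]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy tables standing in for the three files (made-up values).
header = "Gene.symbol\tadj.P.Val\tlogFC\n"
toy_a = header + "GENE_A\t0.001\t2.3\nGENE_B\t0.20\t3.0\nGENE_C\t0.01\t-1.8\n"
toy_b = header + "GENE_A\t0.004\t1.6\nGENE_C\t0.03\t-1.2\nGENE_D\t0.01\t0.4\n"
toy_c = header + "GENE_A\t0.02\t1.1\nGENE_C\t0.04\t-2.0\nGENE_B\t0.01\t2.0\n"

shared = shared_genes([io.StringIO(text) for text in (toy_a, toy_b, toy_c)])
print(sorted(shared))  # ['GENE_A', 'GENE_C']
```

With real files you would pass opened file handles, e.g. `open("file1.tsv", newline="")`, instead of the `io.StringIO` objects (the file name here is a placeholder). Requiring a gene to pass in all three tables is only one possible definition of "of interest"; taking the union instead would be a one-line change if that suits your question better.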

I would recommend going through this tutorial: https://cytoscape.org/cytoscape-tutorials/protocols/rna-seq-data-analysis/#/ to get started.

-- scooter