excel

Shiaoching Gong

Mar 16, 2022, 12:03:50 PM

to gen...@soe.ucsc.edu

Good morning,

Could someone please let me know how to transfer the following transcription factors to an Excel file?

Thanks,

Shiaoching

Do you know how to get rid of the unreal transcription factors and keep the strong ones? The TFs above are too many to be true. I need to focus on a few important ones. I also need to know how to put them in an Excel file.

Thanks so much for your time and help.

Best,

Shiaoching

Brian Lee

Mar 16, 2022, 4:04:10 PM

to Shiaoching Gong, gen...@soe.ucsc.edu

Dear Shiaoching,

Thank you for using the UCSC Genome Browser. It may help to know we have an archive of previously answered questions. Here is a link to the previous answers that include some of your own questions: https://groups.google.com/u/1/a/soe.ucsc.edu/g/genome/search?q=Shiaoching

You can search this archive by entering terms of interest. Please take a moment to review our archived list of previously answered mailing list questions before sending multiple similar questions, as you might find help from other investigators who have asked similar questions, or even find the resources shared previously that could answer your current question.

From your questions, you have a gene, MEF2C, and you want to know how it is regulated by transcription factors. From the screenshots shared in one email you are looking at human data, and from another email you are also looking at the mouse genome and the Mef2C gene, seeking the transcription factor binding sites (TFBS) in the mouse genome. In another message you mention tracks you have found on human (TFBS Clusters), and you share that you are interested in narrowing them down to a "strong" set and putting that set in an Excel file.

I will start with the human TFBS clusters track, sharing an option for filtering on score to subselect some items and how to output this data into a file that Excel can load. Then for the mouse question I'll share the JASPAR TFBS data as a source (available on both the mouse and human genomes), and how it can also be put into a file that Excel can load.

PART1 Human TFBS clusters track

How to start from scratch to find tracks

Here are the step-by-step directions; please follow along in the browser as you read them. First, in a new browser tab, reset your browser: go to the main page, https://genome.ucsc.edu, open the "Genome Browser" menu, and select "Reset All User Settings." Next go to the "Genomes" tab and select "Human GRCh38/hg38". Then find the "hide all" button just below the main view and click it to hide all default tracks. Next go to the "Genome Browser" menu and select "Track Search." Here you can search for items of interest. Enter "TF" for transcription factor and then click the boxes for the "TF Clusters" and "JASPAR Transcription Factors" tracks. Clicking a box should automatically change that track's Visibility column from "hide"; set both tracks to "pack" and then click the "View in Browser" button.

How to view data in the region of interest and adjust display

Now you will be in the main browser view and have these two tracks on hg38 with transcription factor information. You can then enter your gene of interest, MEF2C, and click the "go" button on the far right. There will be several potential matches; the assumption is that you want MEF2C (ENST00000424173.6) at chr5:88718241-88904257, so click that link. Note you can adjust the view by using the "zoom in" and "zoom out" buttons. Click the zoom out "1.5x" button to see the entire MEF2C gene region.

After following the steps above, here is a session that you can open in a separate browser tab to check that your results match: http://genome.ucsc.edu/s/brianlee/hg38_MEF2C_zoomout

Now you can tailor this view if you wish. For instance, you can click the long grey box on the far left side of the display next to the top gene track to arrive at the "GENCODE V39 Track Setting" page. On this page you can look at "Tagged Sets:" and click the box for "MANE only", which limits the display to transcripts with Matched Annotation from NCBI and EMBL-EBI (MANE). Then click "submit"; the gene track will now show fewer transcripts, most likely the ones of interest. Next you can click the grey box on the left-hand side of the TF Clusters track to arrive at the "TF Clusters Track Settings" page. Here you could use the "Filter by factor" option to reduce the displayed TFs to a subset of specific interest. Also, take a moment here to read the Track Description page, which explains many things about this data. In particular, read the details in the "Methods" section about how the "score" value is calculated. You can use the score to filter the TFBS to a stronger set as desired.

How to export the TFBS Clustered Data on the Table Browser to a file that will load in Excel.

While on the "TF Clusters Track Settings" page, which you can reach by clicking the left-side grey box when browsing the track, find and click the "view table schema" link; this will open a new tab. The schema page shows the fields for this table of data along with the table name, "Primary Table: encRegTfbsClustered". When you are on a schema description page like this, you can go to the blue "Tools" option in the top menu bar and select "Table Browser". This opens the Table Browser with the table you were just viewing already selected. If you ever need to find it manually, set "group: Regulation", "track: TF Clusters", and "table: encRegTfbsClustered" on hg38. Note too that the position will be prefilled with the region you were browsing, chr5:88,671,736-88,950,761 in this case, which is the view after zooming out 1.5x around the gene.
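For anyone curious where the prefilled position comes from, here is a minimal Python sketch of the arithmetic. It is not the browser's own code: the function names `zoom_out` and `format_position` are made up for illustration, and the rounding rule is an assumption that happens to reproduce the numbers in this example, so other regions could differ by a base.

```python
def zoom_out(start, end, factor=1.5):
    """Widen a 1-based, inclusive range around its midpoint by `factor`.

    Illustrative only; the rounding here is an assumption, not the
    Genome Browser's documented behavior.
    """
    width = end - start + 1
    new_width = int(width * factor + 0.5)  # round half up
    # Keep the midpoint (start + end) / 2 as close to the original as possible.
    new_start = max(1, (start + end - new_width + 1) // 2)
    new_end = new_start + new_width - 1
    return new_start, new_end


def format_position(chrom, start, end):
    """Format a range the way the position box displays it."""
    return f"{chrom}:{start:,}-{end:,}"


# MEF2C (ENST00000424173.6) coordinates from the search result above.
print(format_position("chr5", *zoom_out(88718241, 88904257)))
# chr5:88,671,736-88,950,761
```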

At this point, with encRegTfbsClustered selected in the Table Browser for the position around the gene of interest, you can go to the "output filename" box and put in a name like "MEF2C_TfbsClustered.csv". Be absolutely sure to also select the radio button next to "csv (for excel)" before you click "get output", so that the downloaded file really is a .csv file. The resulting MEF2C_TfbsClustered.csv file can then be opened in Excel. Note that the first column will be "bin", an internal number used in our system that you may not want. If you don't want that column, or other columns, you can change the "output format" in the Table Browser from "all fields from selected table" to "selected fields from primary and related fields", whereupon clicking "get output" brings up a new screen that lets you pick only the fields of interest, or even add fields from related tables (such as hg38.factorbookMotifCanonical in this case).
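If the file has already been downloaded with all fields, the "bin" column can also be dropped afterwards. The following is only a sketch using Python's standard csv module: the helper name `drop_columns` is hypothetical, the column names are assumed from the schema page, the sample rows are made-up placeholders rather than real browser output, and the handling of a leading "#" on the header is a guess about how the header line may be written.

```python
import csv
import io


def drop_columns(csv_text, unwanted=("bin",)):
    """Return csv_text without the columns named in `unwanted`."""
    reader = csv.reader(io.StringIO(csv_text))
    # Assumption: the header line may start with '#'; strip it if present.
    header = [h.lstrip("#").strip('"') for h in next(reader)]
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([row[i] for i in keep])
    return out.getvalue()


# Made-up placeholder rows, NOT real data from the track.
SAMPLE = (
    "#bin,chrom,chromStart,chromEnd,name,score\n"
    "1,chr5,100,200,TF_A,1000\n"
    "1,chr5,300,450,TF_B,512\n"
)

print(drop_columns(SAMPLE))
```

To use it on the real download, read the file into a string first, for example `open("MEF2C_TfbsClustered.csv", newline="").read()`, and write the returned text to a new .csv file.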

How to reduce the number of items in the TFBS Clustered Data on the Table Browser

If you wish to reduce the number of items in the output from the encRegTfbsClustered table in the Table Browser for the position around the gene of interest, you can use the "filter" option. Go back to the step before "get output" on the Table Browser and click the "create" button next to "filter" to arrive on a new page. Here you will see the fields from the encRegTfbsClustered table again, with many options for selecting on them. Go to the "score is" row, change the dropdown option from "ignored" to ">" for greater than, and then enter a number like 700 on the far right (score is between 0 and 1000, as explained on the referenced Track Description page). Click "submit" to leave the filter page and then "get output"; the newly downloaded MEF2C_TfbsClustered.csv file will only have items with a score above 700, which removes many of the items.
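The same cutoff can be applied to an unfiltered download after the fact. This is a minimal sketch, assuming the file has a plain header row containing a "score" column; the helper name `filter_by_score` is hypothetical and the sample rows are made-up placeholders. It uses a strict greater-than comparison to mirror the ">" option in the filter dropdown.

```python
import csv
import io


def filter_by_score(csv_text, min_score=700):
    """Keep only rows whose score is strictly greater than min_score."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["score"]) > min_score]


# Made-up placeholder rows, NOT real data from the track.
SAMPLE = (
    "chrom,chromStart,chromEnd,name,score\n"
    "chr5,100,200,TF_A,700\n"
    "chr5,300,450,TF_B,701\n"
    "chr5,500,640,TF_C,350\n"
)

for row in filter_by_score(SAMPLE):
    print(row["name"], row["score"])
```

Because the comparison is strict, a row with a score of exactly 700 is dropped, just as it would be by the ">" filter.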

Please spend some time on your own trying these steps. You can save your selections for the future by using the "My Data" and "My Sessions" menu. To save sessions, you will need to create an account with an email, but this will allow you to stop and create as many snapshots of your steps as you wish. At any moment in your selection process you can go to "My Data" and then "My Sessions" and enter a new session name that describes your steps. With saved sessions you can revisit in the future what you have done in the past, and you can also click a "details" link to paste in notes about the steps (such as the ones shared here) for your session. Saved sessions also let you amend some of your selections so you can find different results. Here is a session from making some of these steps: http://genome.ucsc.edu/s/brianlee/hg38_MEF2C_TableSelections

PART2 Mouse mm10 JASPAR TFBS data

Review the above "How to start from scratch to find tracks" and "How to view data in the region of interest and adjust display" sections, and instead of using hg38 go to "GRCm38/mm10" and then use "TF" in the track search step. Set the "JASPAR Transcription Factors" track on mm10 to "pack", then search "Mef2C" and select "Mef2c (ENSMUST00000198199.4) at chr13:83504034-83663343" from the search results, and zoom out "1.5x" as well. Next click the grey box for the track to reach the "JASPAR Transcription Factors Track Settings" page and click the "Schema" link on the far right. The Schema page shows the fields for the "Primary Table: jaspar2022", and from this page you can use the Tools menu to arrive at the Table Browser with this table automatically selected for you. There, click the "create" button next to "filter:", change the "score is" field from "ignored" to ">" for greater than, put in a value such as 700, and click submit. Then enter a name like "mm10Jaspar_Mef2c_score700.csv" in the "output filename:" field, be sure to select the button next to "csv (for excel)", and click "get output" to download the results. Open the resulting file in Excel.
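Even after filtering, the same factor can appear many times in the region. One possible way to get a shorter list to focus on is to keep the best score and the number of hits per factor. This is only a sketch, not something the Table Browser does for you: the helper name `summarize_by_factor` is hypothetical, it assumes you have already pulled (name, score) pairs out of the downloaded file, and the sample values are made-up placeholders.

```python
from collections import defaultdict


def summarize_by_factor(rows):
    """Return (name, best_score, hit_count) tuples, highest score first."""
    best = {}
    counts = defaultdict(int)
    for name, score in rows:
        counts[name] += 1
        if name not in best or score > best[name]:
            best[name] = score
    # Sort by best score descending, then by name for a stable order.
    return sorted(
        ((name, best[name], counts[name]) for name in best),
        key=lambda item: (-item[1], item[0]),
    )


# Made-up placeholder values, NOT real JASPAR scores.
SAMPLE_ROWS = [("TF_A", 812), ("TF_B", 905), ("TF_A", 760), ("TF_C", 905)]

for name, best_score, hits in summarize_by_factor(SAMPLE_ROWS):
    print(name, best_score, hits)
```

A high score or a high hit count is only a starting point for choosing factors to look at more closely; the Track Description page explains what the score actually measures.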

It is highly recommended that you make sessions as you step through these processes. The session links you create act as bookmarks back to familiar steps in case you need to return to them in the future. And if you make minor adjustments, or pick other data tables, these session links will let you return to intermediate steps.

Also, sessions can be shared, and made public, where the details about steps can be viewed. In fact, here is one of the mouse steps to check only after you try doing the above steps on your own: https://genome.ucsc.edu/cgi-bin/hgPublicSessions?search=Shiaoching

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further public questions, please send new questions to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum to help others find answers to similar questions. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu, which is a private internal list to our support team.

All the best,
