downloading a table with cell line information

Eric Foss

Feb 10, 2015, 4:03:41 PM
to gen...@soe.ucsc.edu
Dear UCSC Genome Browser, 

You have a page with information about many cell lines here: 


Can I download this as a table? 

Thank you. 

Eric

Brian Lee

Feb 10, 2015, 4:57:52 PM
to Eric Foss, gen...@soe.ucsc.edu

Dear Eric,

Thank you for using the UCSC Genome Browser and for your question about the cell lines listed on the ENCODE Cell Types 2007 - 2012 page: http://genome.ucsc.edu/ENCODE/cellTypes.html

Please note that the ENCODE data at UCSC currently covers only the period up to 2012. For current ENCODE project release information, please see the ENCODE portal:
https://www.encodeproject.org/help/getting-started

For the data at UCSC, the metadata about antibodies, cell types, labs, protocols, treatments, etc. was collected into a controlled vocabulary document. That "cv.ra" document can be found on the ENCODE Downloads page, http://genome.ucsc.edu/ENCODE/downloads.html, under the "Metadata" section. Here is the link: http://hgdownload.cse.ucsc.edu/goldenPath/encodeDCC/cv.ra

When you open the cv.ra text file, you will see it is a collection of stanzas of different types, one type for each kind of metadata (antibodies, cell types, labs, protocols, etc.). The first section of the file is the collection of information about the "type Cell Line", where there are terms and tags, such as GM12878 or BC_Placenta_UHN00189. The term is used to open CGI links: http://genome.ucsc.edu/cgi-bin/hgEncodeVocab?ra=encode/cv.ra&deprecated=true&term=GM12878
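Since the original question was about getting a table, here is a minimal Python sketch of turning the "type Cell Line" stanzas into tab-separated text. It assumes stanzas are separated by blank lines, that each line is a key followed by whitespace and a value, and that lines starting with "#" are comments. The function names and the SAMPLE text are hypothetical illustrations, not copied from the real cv.ra.

```python
import csv
import io

# Illustrative stanzas only; the real cv.ra has many more fields per stanza.
SAMPLE = """# hypothetical excerpt in the assumed .ra layout
term GM12878
tag GM12878
type Cell Line

term BC_Placenta_UHN00189
tag BC_Placenta_UHN00189
type Cell Line

term Example_Antibody
type Antibody
"""


def parse_ra(text):
    """Split .ra text into stanzas; each stanza becomes a dict of key -> value."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # assumed comment syntax
        parts = line.split(None, 1)
        current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


def cell_lines_to_tsv(stanzas):
    """Keep only 'type Cell Line' stanzas and render them as a TSV string."""
    rows = [s for s in stanzas if s.get("type") == "Cell Line"]
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)  # preserve first-seen column order
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a stanza become empty cells
    return out.getvalue()


if __name__ == "__main__":
    print(cell_lines_to_tsv(parse_ra(SAMPLE)))
```

To use it on the real file, download cv.ra from the link above and pass its contents to parse_ra in place of SAMPLE.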

You may find this UCSC ENCODE resource page helpful, http://genome.ucsc.edu/ENCODE/FAQ/, where you can find some FAQs and links to the "Experiment Matrix", which profiles Cell Lines against Assay Types (for both mouse and human). There is also the ChIP-seq Matrix, which visualizes Cell Lines against Antibody: http://genome.ucsc.edu/ENCODE/dataMatrix/encodeChipMatrixHuman.html

Also, every ENCODE file has a line about that file's metadata in a files.txt file located in the related downloads page. See this FAQ, and the one above it, for more information: http://genome.ucsc.edu/ENCODE/FAQ/#release7
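As a rough illustration of working with such a files.txt, the Python sketch below parses one metadata line into a file name and a dictionary. It assumes each line holds a file name, a tab, and then semicolon-separated key=value pairs; that layout is an assumption here, not something described in this thread. The function name and the sample line are hypothetical.

```python
def parse_files_txt_line(line):
    """Return (file_name, metadata_dict) for one line of a files.txt.

    Assumed layout: <file name><TAB>key=value; key=value; ...
    """
    name, _, meta = line.rstrip("\n").partition("\t")
    fields = {}
    for pair in meta.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments that are not key=value
            fields[key] = value
    return name, fields


if __name__ == "__main__":
    # Hypothetical line, made up for illustration.
    sample = "example.bam\tcell=GM12878; lab=ExampleLab; type=bam\n"
    name, fields = parse_files_txt_line(sample)
    print(name, fields.get("cell"))
```

Collecting the "cell" value from every line would give the set of cell lines used by the files in that downloads directory.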

There are also metadata tables (metaDb) for the mm9 and hg19 databases, linking every file to its metadata. If you are interested in using MySQL, please see this resource: http://genome.ucsc.edu/goldenPath/help/mysql.html

The following command goes through the metaDb table for hg19 and pulls the distinct related cell lines:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne 'select distinct val from metaDb where var like "cell";' hg19

That list, which would include terms like GM12878 and BC_Placenta_UHN00189, could be put into the CGI link for hgEncodeVocab, or searched for in cv.ra, to find more information.

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UCSC Genome Bioinformatics Group


Brian Lee

Feb 10, 2015, 5:09:19 PM
to Eric Foss, gen...@soe.ucsc.edu

Dear Eric,

In case it might be of interest, I forgot to add that you can add several cv.ra terms to the hgEncodeVocab CGI.

Below are two links, one for hg19 and one for mm9, where all the distinct cell terms from the related metadata tables have been put into a URL, separated by commas (term=X,Y,Z). Each link produces the kind of table seen on the cellTypes page. The command that generated each list of terms is shown above its link:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne 'select distinct val from metaDb where var like "cell";' hg19

http://genome.ucsc.edu/cgi-bin/hgEncodeVocab?ra=encode/cv.ra&deprecated=true&term=8988T,A549,Adult_CD4_Th0,Adult_CD4_Th1,AG04449,AG04450,AG09309,AG09319,AG10803,AoAF,AoSMC,Astrocy,BC_Adrenal_Gland_H12803N,BC_Brain_H11058N,BC_Breast_02-03015,BC_Jejunum_H12817N,BC_Kidney_01-11002,BC_Left_Ventricle_N41,BC_Leukocyte_UHN00204,BC_Liver_01-11002,BC_Lung_01-11002,BC_Pancreas_H12817N,BC_Pericardium_H12529N,BC_Placenta_UHN00189,BC_Skeletal_Muscle_01-11002,BC_Skeletal_Muscle_H12817N,BC_Skin_01-11002,BC_Small_Intestine_01-11002,BC_Stomach_01-11002,BC_Testis_N30,BC_Uterus_BN0765,BE2_C,BG02ES,BJ,bone_marrow_HS27a,bone_marrow_HS5,bone_marrow_MSC,Caco-2,CD20+,CD20+_RO01778,CD20+_RO01794,CD34+_Mobilized,CD4+_Naive_Wb11970640,CD4+_Naive_Wb78495824,Cerebellum_OC,Cerebrum_frontal_OC,Chorion,CLL,CMK,Colo829,Colon_OC,Dnd41,ECC-1,Endometrium_OC,Fibrobl,Fibrobl_GM03348,FibroP,FibroP_AG08395,FibroP_AG08396,FibroP_AG20443,Frontal_cortex_OC,GC_B_cell,Gliobla,GM04503,GM04504,GM06990,GM08714,GM10248,GM10266,GM10847,GM12801,GM12812,GM12813,GM12864,GM12865,GM12866,GM12867,GM12868,GM12869,GM12870,GM12871,GM12872,GM12873,GM12874,GM12875,GM12878,GM12878-XiMat,GM12891,GM12892,GM13976,GM13977,GM15510,GM18505,GM18507,GM18526,GM18951,GM19099,GM19193,GM19238,GM19239,GM19240,GM20000,H1-hESC,H1-neurons,H7-hESC,H9ES,HA-h,HA-sp,HAc,HAEpiC,HAoAF,HAoEC,HBMEC,HBVP,HBVSMC,HCF,HCFaa,HCH,HCM,HConF,HCPEpiC,HCT-116,Heart_OC,HEEpiC,HEK293,HEK293-T-REx,HEK293T,HeLa-S3,Hepatocytes,HepG2,HFDPC,HFF,HFF-Myc,HGF,HIPEpiC,HL-60,HMEC,HMEpC,HMF,hMNC-CB,hMNC-PB,hMSC-AT,hMSC-BM,hMSC-UC,HMVEC-dAd,HMVEC-dBl-Ad,HMVEC-dBl-Neo,HMVEC-dLy-Ad,HMVEC-dLy-Neo,HMVEC-dNeo,HMVEC-LBl,HMVEC-LLy,HNPCEpiC,HOB,HPAEC,HPAEpiC,HPAF,HPC-PL,HPDE6-E6E7,HPdLF,HPF,HPIEpC,HRCEpiC,HRE,HRGEC,HRPEpiC,HSaVEC,HSMM,HSMMtube,HSMMtube_FSHD,HSMM_emb,HSMM_FSHD,HT-1080,HTR8svn,Huh-7,Huh-7.5,HUVEC,HVMF,HWP,IMR90,iPS,iPS_CWRU1,iPS_hFib2_iPS4,iPS_hFib2_iPS5,iPS_NIHi11,iPS_NIHi7,Ishikawa,Jurkat,K562,Kidney_OC,LHCN-M2,LNCaP,Lung_OC,M059J,MCF-7,MCF10A-Er-Src,Medullo,Medullo_D341,Melano,Mel_2183,Monocytes-CD14+,Monocytes-CD14+_RO01746,MRT_A204,MRT_G401,MRT_TTC549,Myometr,Naive_B_cell,NB4,NH-A,NHBE,NHBE_RA,NHDF,NHDF-Ad,NHDF-neo,NHEK,NHEM.f_M2,NHEM_M2,NHLF,None,NT2-D1,Olf_neurosphere,Osteobl,ovcar-3,PANC-1,Pancreas_OC,PanIsletD,PanIslets,PBDE,PBDEFetal,PBMC,PFSK-1,pHTE,PrEC,ProgFib,prostate,Psoas_muscle_OC,Raji,RCC_7860,RPMI-7951,RPTEC,RWPE1,SAEC,SH-SY5Y,SK-N-MC,SK-N-SH,SK-N-SH_RA,SKMC,Small_intestine_OC,Spleen_OC,Stellate,T-47D,Th1,Th17,Th1_Wb33676984,Th1_Wb54553204,Th2,Th2_Wb33676984,Th2_Wb54553204,Treg_Wb78495824,Treg_Wb83319432,U2OS,U87,UCH-1,Urothelia,WERI-Rb-1,WI-38

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne 'select distinct val from metaDb where var like "cell";' mm9

http://genome.ucsc.edu/cgi-bin/hgEncodeVocab?ra=encode/cv.ra&deprecated=true&term=10T1/2,3134,416B,A20,Adrenal,B-cell_(CD19+),B-cell_(CD43-),BAT,Bladder,BMDM,BoneMarrow,C2C12,Cerebellum,Cerebrum,CH12,CNS,Colon,Cortex,Duodenum,EPC_(CD117+_CD71+_TER119+),EPC_(CD117+_CD71+_TER119-),EPC_(CD117+_CD71-_TER119-),EPC_(CD117-_CD71+_TER119+),EpiSC-5,EpiSC-7,Erythrobl,ES-46C,ES-Bruce4,ES-CJ7,ES-D3,ES-E14,ES-EM5Sox17huCD25,ES-TT2,ES-WW6,ES-WW6_F1KO,FatPad,Fibroblast,ForelimbBud,FrontalLobe,FVLstem,FVprogenitor,G1E,G1E-ER4,GenitalFatPad,HeadlessEmbryo,Heart,HindlimbBud,J185a,Kidney,L1210,LgIntestine,Limb,Liver,Lung,MammaryGland,MEF,Megakaryo,MEL,MEP,Mesoderm,mG/ER,NIH-3T3,OlfactBulb,Ovary,Patski,Placenta,Retina,SkMuscle,SmIntestine,Spleen,Stomach,SubcFatPad,T-Naive,Testis,THelper-Activated,Thymus,TReg,TReg-Activated,WholeBrain,ZhBTc4
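Links like the two above can be assembled from the query output with a few lines of Python. This is only a sketch: it assumes the mysql output was saved as text with one term per line, and the function names are hypothetical. Terms are joined verbatim, as in the links above; whether characters such as "+" or "/" need percent-encoding is not addressed in this thread.

```python
# Base of the hgEncodeVocab link, as used in the links above.
BASE = ("http://genome.ucsc.edu/cgi-bin/hgEncodeVocab"
        "?ra=encode/cv.ra&deprecated=true&term=")


def terms_from_query_output(text):
    """Read one term per line (as printed by mysql -N), dropping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def vocab_url(terms):
    """Join the terms with commas and append them to the hgEncodeVocab link."""
    return BASE + ",".join(terms)


if __name__ == "__main__":
    output = "GM12878\nK562\nHepG2\n"  # stand-in for saved query output
    print(vocab_url(terms_from_query_output(output)))
```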

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,
Brian Lee
