Talk on TF-DNA motif discovery at IISER Pune on Wednesday at 3 pm

Leelavati Narlikar

Oct 27, 2025, 6:48:36 AM

to biomode...@googlegroups.com

Title: Refined transcription factor DNA-binding motif discovery from pangenomic ChIP-seq, ATAC-seq and similar datasets

speaker: Denis Thieffry, ENS-Paris/PSL University

space-time coordinates:
Seminar Hall 51, 4th floor, Main Building, IISER Pune
Wednesday, 29th October 2025, 3:00 PM - 4:00 PM

abstract:

The development of high-throughput sequencing (HTS) techniques has opened up new avenues for identifying, modelling and predicting DNA motifs bound by transcription factors (TFs). On the one hand, provided that a good antibody is available, chromatin immunoprecipitation assays coupled with HTS (ChIP-seq) can capture most TF-bound sequences in a given cell type or tissue at the genomic scale. On the other hand, epigenomic assays, including whole-genome bisulphite sequencing (WGBS) and combinations of ChIP-seq assays targeting chromatin marks, can be used to identify potential promoter and enhancer regions. Using these datasets, various types of computational analyses can be performed to deduce potentially related transcription factors. The most common approach is to analyse putative cis-regulatory sequences (promoters or enhancers) using collections of probabilistic models of transcription factor binding sites, typically in the form of position weight matrices (PWMs), which can be found in public databases such as JASPAR (https://jaspar.elixir.no/). However, this approach is inherently limited by the quality of the available PWM sets. Another approach is to apply pattern discovery algorithms to regions presumed to be co-regulated, then compare the patterns obtained with public collections of PWMs. Pattern discovery algorithms (e.g., Gibbs samplers, MEME) typically perform multiple local alignments on a set of sequences, which requires pre-filtering and heuristic sampling to process large sets (thousands) of sequences, at the risk of missing subtle variations in the patterns. To overcome the shortcomings of these multiple alignment approaches, Jacques van Helden initiated the development of a set of tools based on k-mer counting and multinomial statistics to identify words that are overrepresented in large sequence datasets and to construct refined PWMs (http://rsat.eu).
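For readers unfamiliar with the k-mer counting idea mentioned above, here is a minimal Python sketch. It is not the RSAT implementation: the function names are hypothetical, the background model is assumed to be uniform (each k-mer has probability 0.25^k), and a simple binomial tail stands in for the statistics used by the actual tools. It counts every k-mer in a set of sequences and ranks them by how unlikely their count would be under that background.

```python
from collections import Counter
from math import exp, lgamma, log


def count_kmers(seqs, k):
    """Count all overlapping k-mers made only of A, C, G, T."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def binomial_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for i in range(x, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return min(total, 1.0)


def rank_overrepresented(seqs, k):
    """Return (kmer, count, p_value) tuples, most over-represented first.

    Assumption: uniform background, i.e. every k-mer has prior
    probability 0.25 ** k. Real tools estimate this from background
    sequences and correct for multiple testing.
    """
    counts = count_kmers(seqs, k)
    n_positions = sum(counts.values())
    p_background = 0.25 ** k
    ranked = [(kmer, obs, binomial_tail(n_positions, obs, p_background))
              for kmer, obs in counts.items()]
    ranked.sort(key=lambda r: r[2])
    return ranked


# Toy input: four short sequences sharing the word TGACTCA.
toy_peaks = ["ACGTTGACTCAGG", "TTTGACTCATTA", "CCTGACTCAGAA", "GGATGACTCACC"]
top_kmer, top_count, top_p = rank_overrepresented(toy_peaks, 7)[0]
print(top_kmer, top_count)
```

Because each k-mer is counted independently, no alignment or sampling step is needed, which is why this family of methods scales to thousands of sequences.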
More recently, thanks to the accumulation of ChIP-seq data for various transcription factors, combined with WGBS data, in the same well-established cell lines, it has become possible to study in greater detail the impact of DNA methylation on transcription factor binding. By combining ChIP-seq datasets targeting various dimeric transcription factor partners in the same cell lines, Touati Benoukraf and collaborators were able to define refined PWMs for each dimer, containing higher information content than the degenerate motifs encoded in public databases. These refined motifs are now available in the MethMotif database (https://methmotif.org), while a series of functions written in the R programming language, grouped in the TFregulomeR package, is shared on GitHub to ease the analysis of new ChIP-seq and WGBS datasets (https://github.com/benoukraflab/TFregulomeR).
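As a rough illustration of what "higher information content" means for a PWM, the Python sketch below computes the information content (in bits) of a motif given as a list of per-position base counts. It does not use TFregulomeR or MethMotif; the function name and the toy matrices are hypothetical, and a uniform background of 0.25 per base is assumed.

```python
from math import log2


def information_content(count_matrix, background=0.25):
    """Sum over positions of sum_b f_b * log2(f_b / background), in bits.

    count_matrix: list of dicts mapping 'A', 'C', 'G', 'T' to counts,
    one dict per motif position. Assumes a uniform background.
    """
    total_bits = 0.0
    for column in count_matrix:
        n = sum(column.values())
        for count in column.values():
            if count > 0:
                freq = count / n
                total_bits += freq * log2(freq / background)
    return total_bits


# Hypothetical two-position motifs: one degenerate, one sharply defined.
degenerate = [{"A": 5, "C": 5, "G": 5, "T": 5},
              {"A": 10, "C": 0, "G": 10, "T": 0}]
refined = [{"A": 20, "C": 0, "G": 0, "T": 0},
           {"A": 0, "C": 0, "G": 20, "T": 0}]

print(information_content(degenerate), information_content(refined))
```

A position where all four bases are equally frequent contributes 0 bits, while a fully conserved position contributes 2 bits, so a refined motif with fewer ambiguous positions scores higher.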