Hi! Thanks for your question.
This kind of "interval restriction" is described in the DeepSEA 2015 manuscript as follows:
"We focused on the set of 200-bp bins with at least one TF binding event, resulting in 521,636,200 bp of sequences (17% of whole genome), which was used for training and evaluating chromatin feature prediction performance."
In short, the intervals file switches training from sampling across the whole genome to sampling only from the regions listed in the file. In the `deepsea_TF_intervals.txt` case, we restrict sampling to regions that have at least one TF binding event. The model can still be trained to predict ANY chromatin profile (DNase, histone, TF, etc.), but it will not "see" any of the data that lies outside of those intervals.
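To make the idea of interval-restricted sampling concrete, here is a minimal sketch in plain Python. It is not Selene's actual sampler. It assumes the intervals file has whitespace-separated `chrom start end` columns with half-open coordinates and non-overlapping intervals, and the helper names (`load_intervals`, `sample_position`) are made up for illustration.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def load_intervals(lines):
    """Parse 'chrom start end' lines (assumed layout) into tuples."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def cumulative_lengths(intervals):
    """Running total of interval lengths, used to pick a base uniformly."""
    return list(accumulate(end - start for _, start, end in intervals))


def sample_position(intervals, cum_lengths, rng):
    """Draw one position uniformly over the bases covered by the intervals.

    Positions outside the intervals can never be returned, which is the
    sense in which the model never "sees" the rest of the genome.
    """
    offset = rng.randrange(cum_lengths[-1])
    idx = bisect_right(cum_lengths, offset)
    chrom, start, _ = intervals[idx]
    previous_total = cum_lengths[idx - 1] if idx > 0 else 0
    return chrom, start + (offset - previous_total)


# Toy usage with two hypothetical 200-bp intervals.
toy = load_intervals(["chr1\t100\t300", "chr2\t1000\t1200"])
totals = cumulative_lengths(toy)
print(sample_position(toy, totals, random.Random(0)))
```

With whole-genome sampling the draw would instead range over every chromosome position, so the only change is the set of coordinates that can be drawn, not the set of features being predicted.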
Depending on your application, you might find that a model that learns chromatin features from only these regions detects more biologically relevant signal than one trained on the entire genome. So it's totally up to you which intervals file you choose. If you want to use `deepsea_TF_intervals.txt`, I'd just check how much of your training set intersects with that BED file to make sure there's sufficient training data in those regions. :)
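If it helps, here is a rough sketch of that overlap check in plain Python. It assumes both your training regions and the intervals are already loaded as `(chrom, start, end)` tuples with half-open coordinates, and the function names are hypothetical. A tool such as `bedtools intersect` would give you the same answer on real files.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_by_chrom(intervals):
    """Group intervals by chromosome and merge overlapping or touching ones.

    Returns {chrom: (starts, ends)} with both lists sorted.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, ranges in by_chrom.items():
        ranges.sort()
        starts, ends = [ranges[0][0]], [ranges[0][1]]
        for start, end in ranges[1:]:
            if start <= ends[-1]:
                ends[-1] = max(ends[-1], end)
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def overlaps(merged, chrom, start, end):
    """True if [start, end) shares at least one base with any interval."""
    if chrom not in merged:
        return False
    starts, ends = merged[chrom]
    i = bisect_right(starts, start) - 1  # last interval starting at or before start
    if i >= 0 and ends[i] > start:
        return True
    return i + 1 < len(starts) and starts[i + 1] < end


def fraction_overlapping(regions, merged):
    """Fraction of training regions that touch at least one interval."""
    if not regions:
        return 0.0
    hits = sum(overlaps(merged, c, s, e) for c, s, e in regions)
    return hits / len(regions)
```

A low fraction would suggest that most of your labeled regions fall outside the intervals, in which case a different intervals file (or none) may be the better choice.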
Hope this is helpful,
Kathy