So this comes back to what the effect of a less stringent P-value threshold is in MACS2. One might think that the only difference is the number of peaks, but there is another significant difference: peak width. Peaks called with a less stringent threshold are wider, sometimes much wider. Also, one wide peak region called with lower stringency can sometimes be reported as more than one peak with higher stringency.
Why is this? Without doing a deep dive into the MACS2 source code I think of it like this. MACS2 may first identify summits, then "reach out" from those maxima to define the peak region. With a less stringent threshold, the reach extends farther before the signal drops below threshold. Hence wider peak regions.
In any case, that is why your Jaccard is not 100%. Even if MACS2 called the "same" peaks in both cases (e.g., around the same peak summit positions), the lower-stringency set will be wider, leading to a considerably larger union than intersection.
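To make the width effect concrete, here is a minimal sketch with made-up numbers. It assumes a base-pair Jaccard (bases in the intersection divided by bases in the union) on a single chromosome, and the summit positions and half-widths of 100 bp and 400 bp are purely illustrative, not MACS2 output.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Number of bases covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(a, b):
    """Base-pair Jaccard: bases in the intersection / bases in the union."""
    union = covered(a + b)
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    intersection = covered(a) + covered(b) - union
    return intersection / union if union else 0.0


# Hypothetical summits shared by both peak sets (illustrative values only).
summits = [1000, 5000, 9000]
strict = [(s - 100, s + 100) for s in summits]  # narrow, stringent calls
loose = [(s - 400, s + 400) for s in summits]   # wide, less stringent calls

print(jaccard(strict, loose))  # 0.25
```

Every summit is shared, yet the Jaccard is only 0.25, because the union is set by the wide peaks and the intersection by the narrow ones.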
If you just want to see how well the top 100 peaks in the two sets agree, you might instead try extracting a fixed-width interval around the peak summit for each set. Or use bedtools merge and compare the number of merged peaks with the number of input peaks.
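The fixed-interval idea could be sketched roughly as follows. This assumes the input is in narrowPeak format, where column 10 holds the summit as a 0-based offset from the peak start (or -1 if no summit was called). The function name `summit_windows`, the `half_width` parameter, and the example lines are hypothetical.

```python
def summit_windows(narrowpeak_lines, half_width=100):
    """Return (chrom, start, end) windows of fixed width centred on each summit."""
    windows = []
    for line in narrowpeak_lines:
        # Skip blank lines and UCSC-style header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        offset = int(fields[9])  # summit offset from start; -1 if not called
        if offset < 0:
            offset = (end - start) // 2  # assumption: fall back to the midpoint
        summit = start + offset
        windows.append((chrom, max(0, summit - half_width), summit + half_width))
    return windows


# Two made-up records for the "same" peak called at different stringencies.
strict_line = "chr1\t1000\t1400\tpeak_1\t250\t.\t8.5\t30.1\t25.0\t150"
loose_line = "chr1\t700\t1900\tpeak_1\t120\t.\t5.1\t12.0\t9.0\t450"

print(summit_windows([strict_line]))
print(summit_windows([loose_line]))
```

Both records have their summit at position 1150, so they collapse to the same 200 bp window even though the original peak widths differ threefold. The resulting windows can then be compared without peak width dominating the result.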
One additional note from our experience with MACS2 thresholding. We found that using a P-value threshold less stringent than 0.01 (which appears roughly equivalent to a Q-value threshold of 0.25) produced overly broad peaks such that the genomic coverage of called peaks did not seem reasonable. This was true for marks ranging from broad (e.g. H3K27me3) to "sharp" (e.g. H3K4me3, CTCF).