DOM parsing with Twitter

cla...@gmail.com

Oct 16, 2022, 1:06:20 PM

to Chromium Extensions

I'm working on an extension for modifying specific text by traversing the DOM. It works fine for sites that don't obfuscate page structure too much but has trouble with Twitter.

This is the basic loop I'm using...

// Iterate over DOM nodes and match text
const elements = document.body.getElementsByTagName("*"); // live HTMLCollection
for (let i = 0; i < elements.length; i++) {
  const e = elements[i];
  for (let j = 0; j < e.childNodes.length; j++) {
    const node = e.childNodes[j];
    if (node.nodeType === Node.TEXT_NODE) { // nodeType 3
      // (matching logic omitted)
    }
  }
}

Twitter buries tweet text in a span element, which is nested within several DIVs and an ARTICLE element.

If anyone has experience with this: should I be iterating through the innerHTML (or textContent?) of elements (nodeType 1), or starting the traversal from a different parent than BODY? I'd like to find a better approach to matching all text on a page. Thanks.

wOxxOm

Oct 16, 2022, 4:30:11 PM

to Chromium Extensions, cla...@gmail.com

Your loop should work on the main Twitter site, but there's a dedicated TreeWalker API for finding text nodes; see https://stackoverflow.com/q/7275650
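For reference, a minimal sketch of the TreeWalker approach might look like the following. The helper name `collectTextNodes` is made up for illustration, and skipping whitespace-only nodes is an assumption about what the extension wants, not something the API requires.

```javascript
// Collect every non-empty text node under `root` using a TreeWalker.
// `collectTextNodes` is a hypothetical helper name, not part of any API.
function collectTextNodes(root) {
  // document.ownerDocument is null, so fall back to root when root is the document.
  const doc = root.ownerDocument || root;
  const SHOW_TEXT = 4; // numeric value of NodeFilter.SHOW_TEXT
  const walker = doc.createTreeWalker(root, SHOW_TEXT);
  const nodes = [];
  let node;
  while ((node = walker.nextNode())) {
    // Assumption: whitespace-only text nodes are not interesting.
    if (node.nodeValue.trim() !== "") {
      nodes.push(node);
    }
  }
  return nodes;
}
```

In a content script this would be called as `collectTextNodes(document.body)`. Collecting the nodes first and modifying them afterwards avoids changing the tree while the walker is still moving through it.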

I guess you're trying to process embedded tweets that use Shadow DOM, in which case you need to traverse each shadowRoot individually.
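A rough sketch of descending into shadow roots, assuming the roots are open (`element.shadowRoot` is null for closed ones). The names `walkTextDeep` and `visit` are hypothetical.

```javascript
// Visit text nodes under `root`, descending into open shadow roots.
// `walkTextDeep` and `visit` are hypothetical names for this sketch.
function walkTextDeep(root, visit) {
  for (const node of root.childNodes) {
    if (node.nodeType === 3) {
      // Node.TEXT_NODE
      visit(node);
    } else if (node.nodeType === 1) {
      // Node.ELEMENT_NODE: element.shadowRoot is null for closed roots
      // and for elements that have no shadow root at all.
      if (node.shadowRoot) {
        walkTextDeep(node.shadowRoot, visit);
      }
      walkTextDeep(node, visit);
    }
  }
}
```

Usage would be something like `walkTextDeep(document.body, (textNode) => { /* match textNode.nodeValue */ })`. Each shadow root is a separate tree, which is why a plain `getElementsByTagName("*")` on BODY never reaches the text inside it.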

wOxxOm

Oct 16, 2022, 4:32:06 PM

to Chromium Extensions, wOxxOm, cla...@gmail.com

Another common problem is that modern sites like Twitter generate their page contents dynamically, after your content script has finished running. You can use the MutationObserver API to detect the newly added nodes.
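A minimal sketch of that idea, assuming the extension already has some function that processes a node and its descendants. The names `observeAddedNodes` and `processNode` are hypothetical.

```javascript
// Re-run `processNode` on content added after the initial pass.
// `observeAddedNodes` and `processNode` are hypothetical names.
function observeAddedNodes(target, processNode) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      // addedNodes can contain elements, text nodes, and comments.
      for (const added of mutation.addedNodes) {
        processNode(added);
      }
    }
  });
  // childList + subtree reports nodes inserted anywhere below `target`.
  observer.observe(target, { childList: true, subtree: true });
  return observer; // call observer.disconnect() to stop observing
}
```

In a content script this would be started with `observeAddedNodes(document.body, processNode)` after the first traversal. Because only `childList` is observed, changing a text node's `nodeValue` inside `processNode` should not trigger the observer again; inserting new nodes there would.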

cla...@gmail.com

Oct 16, 2022, 5:41:52 PM

to Chromium Extensions, wOxxOm, cla...@gmail.com

Thanks, I'll give that a try.
