DOM parsing with Twitter


cla...@gmail.com

Oct 16, 2022, 1:06:20 PM
to Chromium Extensions
I'm working on an extension for modifying specific text by traversing the DOM. It works fine for sites that don't obfuscate page structure too much but has trouble with Twitter.

This is the basic loop I'm using...

// Iterate over every element under <body> and visit its direct child text nodes.
const elements = document.body.getElementsByTagName("*"); // live HTMLCollection

for (let i = 0; i < elements.length; i++) {
  const e = elements[i];

  for (let j = 0; j < e.childNodes.length; j++) {
    const node = e.childNodes[j];

    if (node.nodeType === Node.TEXT_NODE) { // nodeType 3
      // ... (matching logic omitted in the original post)
    }
  }
}

Twitter buries tweet text in a span element like this, which is within DIVs and an ARTICLE element.

<span class="css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0">TEXT</span>

If anyone has experience with this: should I be iterating through the innerHTML (or textContent?) of elements (nodeType 1) instead? Or starting the traversal from a different parent than BODY? I'd like to find a better approach to matching all text on a page. Thanks.


wOxxOm

Oct 16, 2022, 4:30:11 PM
to Chromium Extensions, cla...@gmail.com
Your loop should work on the main Twitter site, but there's a dedicated TreeWalker API for finding text nodes; see https://stackoverflow.com/q/7275650
I guess you're trying to process embedded tweets that use Shadow DOM, in which case you need to dig into each shadowRoot individually.
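
A minimal sketch of the two suggestions above, assuming a content script with access to the page's DOM. The function names `collectTextNodes` and `replaceInTextNodes` are hypothetical and not from the thread. The walker visits elements as well as text nodes so that any element exposing a `shadowRoot` can be walked separately. Only open shadow roots are reachable this way; a closed root returns `null` from `element.shadowRoot`.

```javascript
// Sketch only: function names are hypothetical, not from the thread.

// Collect all text nodes below `root`, descending into open shadow roots.
function collectTextNodes(root, out = []) {
  const walker = document.createTreeWalker(
    root,
    NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT
  );
  for (let node = walker.nextNode(); node; node = walker.nextNode()) {
    if (node.nodeType === Node.TEXT_NODE) {
      out.push(node);
    } else if (node.shadowRoot) {
      // Each shadow tree is a separate subtree and needs its own walker.
      collectTextNodes(node.shadowRoot, out);
    }
  }
  return out;
}

// Apply a string replacement to every text node found below `root`.
function replaceInTextNodes(root, pattern, replacement) {
  for (const node of collectTextNodes(root)) {
    const updated = node.nodeValue.replace(pattern, replacement);
    if (updated !== node.nodeValue) {
      node.nodeValue = updated;
    }
  }
}
```

In a content script this might be called as `replaceInTextNodes(document.body, /foo/g, "bar")`. Note that the walk also returns text inside `script` and `style` elements, so a real extension would probably want to skip those.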

wOxxOm

Oct 16, 2022, 4:32:06 PM
to Chromium Extensions, wOxxOm, cla...@gmail.com
Another common problem is that modern sites like Twitter generate their page contents dynamically after your content script has finished running. You can use the MutationObserver API to detect those later additions.
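
A rough sketch of that idea, assuming the extension only needs to see text that is added after the initial pass. The name `watchForNewText` and the `onTextNode` callback are hypothetical. Only `childList` changes are observed, so editing a text node's `nodeValue` inside the callback does not re-trigger the observer. Replacing or inserting nodes inside the callback would, so that case needs its own guard.

```javascript
// Sketch only: `watchForNewText` and `onTextNode` are hypothetical names.

// Call `onTextNode` for every text node added below `root` after this point.
function watchForNewText(root, onTextNode) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const added of mutation.addedNodes) {
        if (added.nodeType === Node.TEXT_NODE) {
          onTextNode(added);
        } else if (added.nodeType === Node.ELEMENT_NODE) {
          // An added element may carry a whole subtree of new text.
          const walker = document.createTreeWalker(added, NodeFilter.SHOW_TEXT);
          for (let t = walker.nextNode(); t; t = walker.nextNode()) {
            onTextNode(t);
          }
        }
      }
    }
  });
  observer.observe(root, { childList: true, subtree: true });
  return observer; // call observer.disconnect() to stop watching
}
```

A content script could run its initial pass once and then call something like `watchForNewText(document.body, (node) => { /* match and edit node.nodeValue */ })` to catch tweets that are rendered later.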

cla...@gmail.com

Oct 16, 2022, 5:41:52 PM
to Chromium Extensions, wOxxOm, cla...@gmail.com
Thanks, I'll give that a try.