Element#all_text, gsub, and optimizations

Brandon Weaver

Jan 14, 2023, 5:44:23 PM
to Capybara
I was doing some performance work on some of our Capybara specs and found that a lot of time is being spent in `gsub`, which is used to strip out various forms of whitespace in a fairly hot path around `Capybara::Node::Element#all_text`.

I'll open some PRs later when I get a moment, but wanted to run a few ideas by folks first to hear what people think.

The most immediate improvement would be hoisting the regexes in that loop into constants so they aren't rebuilt on every call. Other ideas I'd need to vet first include swapping `gsub` and regexes for `lstrip`/`rstrip` and literal string tokens, and potentially caching `all_text` for nodes.
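
As a rough sketch of the first two ideas: the patterns, constant names, and method names below are hypothetical and simplified for illustration, not copied from Capybara. It only shows the shape of the change, with the regexes held in constants and the two anchored `gsub` calls replaced by `strip`.

```ruby
# Hypothetical patterns for illustration only; the real ones in Capybara
# may cover more characters (e.g. Unicode spaces).
WHITESPACE_RUN = /\s+/.freeze
LEADING_SPACE  = /\A\s+/.freeze
TRAILING_SPACE = /\s+\z/.freeze

# Regex-only version: collapse runs, then trim both ends with anchored gsubs.
def normalize_with_gsub(text)
  text.gsub(WHITESPACE_RUN, ' ')
      .gsub(LEADING_SPACE, '')
      .gsub(TRAILING_SPACE, '')
end

# Same collapse step, but the two anchored gsubs become a single strip.
def normalize_with_strip(text)
  text.gsub(WHITESPACE_RUN, ' ').strip
end

sample = "  Hello \n\t world  "
puts normalize_with_gsub(sample)
puts normalize_with_strip(sample)
```

Two things would need checking before relying on this. `strip` only treats ASCII whitespace (and NUL) as strippable, so it is not a drop-in replacement if the real patterns handle other characters or deliberately preserve some, such as non-breaking spaces. It is also worth measuring whether hoisting alone changes anything, since Ruby may already reuse regex literals that contain no interpolation.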

I'm less sure about the caching, because I can't tell at a glance whether those elements change after they're created, and so how long a cached value would stay valid. If they are effectively "static", though, that could be a great perf gain.
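
To make the caching question concrete, here is a minimal memoization sketch. `CachedTextNode`, `raw_text=`, and the `normalizations` counter are hypothetical stand-ins, not Capybara classes or methods. The point is that the cache is only safe if every change to the underlying content clears it.

```ruby
# Hypothetical stand-in for a node; not Capybara's actual class.
class CachedTextNode
  attr_reader :normalizations

  def initialize(raw_text)
    @raw_text = raw_text
    @all_text = nil
    @normalizations = 0 # counts how often the expensive path actually runs
  end

  # Normalize once and reuse the result until the content changes.
  def all_text
    @all_text ||= begin
      @normalizations += 1
      @raw_text.gsub(/\s+/, ' ').strip
    end
  end

  # Any mutation of the underlying content must invalidate the cache,
  # otherwise all_text would keep returning stale text.
  def raw_text=(value)
    @raw_text = value
    @all_text = nil
  end
end

node = CachedTextNode.new("  Hello \n  world ")
node.all_text            # normalizes the text
node.all_text            # served from the cache
node.raw_text = "Changed"
node.all_text            # recomputed after invalidation
puts node.normalizations
```

If the elements can change without going through a hook like `raw_text=`, this kind of cache would return stale text, which is exactly the validity concern above.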

Anyway, the main point is that a lot of time is spent stripping whitespace, and given how frequently that path gets hit, tightening it up could be a really nice gain.

- Brandon