In my case, I never had to deal with A/B tests that changed major flows. They were mainly different page arrangements or page UI redesigns, and after some period the variations would eventually be pruned out. So I could prioritize testing by passing in a pageset parameter: 1 being the most used variant, down to N being the least used. The suite tests the most used (best scoring) flows first. By the 3rd or 4th pageset we almost don't bother testing, since those variants will be pruned out within the next month anyway.
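As a rough sketch of that prioritization idea, a rank could be attached to each pageset and the runner could cut off at a given rank. All names here (`PAGESET_PRIORITY`, `select_pagesets`) are illustrative, not from the original setup:

```python
# Hypothetical sketch: prioritize test runs by pageset usage rank.
# Pageset rank 1 is the most used variant; higher numbers see less traffic.
PAGESET_PRIORITY = {
    "checkout_v1": 1,  # current production arrangement, most traffic
    "checkout_v2": 2,  # A/B variant with a rearranged form
    "checkout_v3": 3,  # redesign candidate, little traffic
}

def select_pagesets(max_rank):
    """Return pageset names up to the given usage rank, most used first."""
    ranked = sorted(PAGESET_PRIORITY.items(), key=lambda kv: kv[1])
    return [name for name, rank in ranked if rank <= max_rank]

# A quick smoke run might only cover the top-ranked variant,
# while a full run raises max_rank to include the long tail:
print(select_pagesets(1))
print(select_pagesets(2))
```

The cutoff makes the "don't bother testing the 3rd or 4th pageset" decision an explicit runner parameter instead of a manual judgment call each cycle.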
The abstract factory approach works well for testing mobile/desktop/tablet variations of the same page. For A/B tests of single PageObjects, you can quickly extend the original page class, override only where the variant differs, then prune it later when that pageset is removed.
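A minimal sketch of that pattern, with illustrative names (no real driver code): each platform gets a concrete factory returning its PageObjects, and an A/B variant is just a subclass that overrides the one thing that changed:

```python
# Hypothetical sketch of the abstract factory idea for PageObjects.
class LoginPage:
    def submit_selector(self):
        return "#login-submit"

class MobileLoginPage(LoginPage):
    def submit_selector(self):
        return "#m-login-submit"  # mobile markup differs

class LoginPageVariantB(LoginPage):
    # A/B variant: override only the changed locator;
    # delete this class when the experiment is pruned.
    def submit_selector(self):
        return "#login-submit-v2"

class PageFactory:
    """Abstract factory: one concrete factory per platform/variant."""
    def login_page(self):
        raise NotImplementedError

class DesktopFactory(PageFactory):
    def login_page(self):
        return LoginPage()

class MobileFactory(PageFactory):
    def login_page(self):
        return MobileLoginPage()

# Tests depend only on the abstract factory, so the same test body
# runs against desktop or mobile by swapping in a different factory:
for factory in (DesktopFactory(), MobileFactory()):
    print(factory.login_page().submit_selector())
```

Pruning a retired variant then means deleting one subclass (and its factory, if any), with no edits to the tests themselves.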
For A/B tests of entire flows, it's annoying, and if the experiment spans multiple sprints you end up having to bite the bullet and maintain a separate set of tests. In terms of organizing my test structure, I use two layers of abstraction to separate high-level tests from low-level details. The first layer I call Pages/Screens: these are Page Objects, or objects representing screens in the application. The other layer I call Flows: these encompass any reusable steps that span two or more screens/pages. At the test layer itself, the tests read almost like plain English. (Ideally I'd use a BDD framework like Cucumber at this layer, but I opted not to because it's hard to extend those frameworks with custom test runners for massive parallelization, compared to a simple unit test framework.)
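The two layers above can be sketched roughly like this, with the browser interaction stubbed out and all names (`AccountFlows`, `sign_in`, etc.) purely illustrative:

```python
# Hypothetical sketch of the layering: Page Objects hold low-level
# details, Flows compose reusable steps across pages, and the test
# itself reads almost like plain English.

# Layer 1: Pages/Screens (Page Objects)
class LoginPage:
    def enter_credentials(self, user, password):
        pass  # real code would type into fields via a driver

    def submit(self):
        return "dashboard"  # stub: real code navigates and returns next page

class DashboardPage:
    def open_settings(self):
        return "settings"

# Layer 2: Flows -- reusable steps spanning two or more screens
class AccountFlows:
    def __init__(self):
        self.login_page = LoginPage()
        self.dashboard = DashboardPage()

    def sign_in(self, user, password):
        self.login_page.enter_credentials(user, password)
        return self.login_page.submit()

# Test layer: reads almost like English, no locators or driver calls
def test_user_can_reach_settings():
    flows = AccountFlows()
    assert flows.sign_in("alice", "secret") == "dashboard"
    assert flows.dashboard.open_settings() == "settings"

test_user_can_reach_settings()
print("ok")
```

Keeping locators out of the flow and test layers is what makes an A/B pageset swap cheap: only the Page Object layer changes.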