I want to share some of the feelings I have after working on Selenium IDE for over a year, and I would really like to know your thoughts.
First off, let's lay out what Selenium IDE currently is, and what that means for us and its users.
Selenium IDE is a WebExtension compatible with the Chrome Extension APIs. In essence it is a webapp with some elevated APIs that aren't available in other browser sandboxes (unlike the old Selenium IDE, which was a XUL-based extension with more native capabilities). For us this means a few things:
Selenium IDE does not use WebDriver
In some cases we use Selenium 1 and in others we use the atoms, but either way it isn't native playback. Ironically, we are facing the same issues today that Selenium faced a decade ago, the ones that made the project move towards WebDriver.
Some of the implications include:
- Can't handle basic authentication, since basic auth involves a native popup which we can't automate.
- Can't configure the browser's proxy settings programmatically. The user has to go to the browser settings and change those accordingly.
- Playback is tied to the user's browser profile, which means that cookies, local storage, and any other state are shared between runs. This is especially hard to account for because the user may browse while the test is running, so we can't delete the data: we can't distinguish user data from test data.
- event.isTrusted will always be false, which means the browser won't perform the default behavior for an event that bubbles up to it, e.g. forms won't be submitted on Enter.
- No native drag and drop: we can't drag and drop files as part of playback.
- No file uploads as part of playback
This can be resolved in one of two ways. We could ask users to have a Selenium server up and running before using the IDE, which means that in addition to installing the IDE from the store they would also have to install a JRE, download a jar, and make sure to run it beforehand. Or we could make Selenium IDE a native application, which would let it spawn a driver as a child process.
Selenium IDE doesn't have filesystem access
As a glorified webapp, Selenium IDE is unable to perform some of the basic features that are expected of an IDE. One example is save on write: since we are a webapp, all saves are basically file downloads, so the user has to specify a target directory every time they save. Another example is the new "side" file format we introduced, which contains all the tests and suites in the same file. Users have complained that it makes their tests very difficult to re-use and maintain. This is because we don't have a working directory or read access, so a user who needs multiple files would have to open each one manually. Our workaround has been to put everything in one file, but it is very cumbersome.
For comparison, the old IDE got the file path when a file was uploaded and could use it to read more files without the user's explicit permission.
This could be resolved in one of two ways as well.
1. Have the standalone Selenium server serve and write files for the IDE over HTTP.
This would not only be non-native behavior in a place where the user expects native behavior (native file picker vs. webapp), it would also mean extra code and effort to develop something trivial that the operating system should take care of.
2. Offer the IDE as a native app that would have a working directory and proper filesystem access.
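To make the second option concrete, here is a minimal sketch of what a working directory would allow: one file per test, all loaded in a single pass without the user opening each one. This is only an illustration, not the IDE's code. The one-JSON-file-per-test layout, the command shape, and the helper names `save_tests` and `load_tests` are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def save_tests(workdir: Path, tests: dict) -> None:
    """Write each test to its own file (hypothetical layout: <name>.json)."""
    for name, commands in tests.items():
        (workdir / f"{name}.json").write_text(json.dumps(commands))


def load_tests(workdir: Path) -> dict:
    """Read every test file in the working directory in one pass."""
    return {
        path.stem: json.loads(path.read_text())
        for path in sorted(workdir.glob("*.json"))
    }


# A temporary directory stands in for the project's working directory.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    original = {
        "login": [{"command": "open", "target": "/login"}],
        "checkout": [{"command": "click", "target": "id=buy"}],
    }
    save_tests(workdir, original)
    loaded = load_tests(workdir)
```

With read access to a directory, shared tests can be re-used across projects by file rather than copied into one large project file.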
Reliant on non-standard automation APIs
For some actions we use CDP to give users a best-effort automation experience. This helps us with file uploads and with events that require isTrusted to be true. But we have seen on multiple occasions that these APIs are unreliable and break the IDE more often than we'd like. A few examples include breaking our messaging bus and breaking file uploads. Offering these workarounds through CDP only covers Chrome, which makes up half of the user base, and the Chromium team is considering limiting extensions' access to the protocol starting with extensions 3.0. As for Firefox, Mozilla has not publicly stated that it will allow extensions to access Marionette, nor is that in the WebExtension spec.
The best way around this is to stop using non-standard APIs. Using WebDriver guarantees (or comes closest to guaranteeing) that the APIs we rely on will not break unless the change is dictated in advance by the W3C, rather than by a single browser's product team.
Poor control over recording environment
This is a precursor to my next point, but basically we have poor control over the browser's environment. This causes two problems with recording, either of which can lead to a test failing when it is run right after being recorded.
- We are affected by CORS, which limits our ability to record locators for a given frame with the "select frame" command. Right now we record indexes and switch to the frame based on those (which has proven to be unreliable).
- We can't run WebDriver in the recording window, which means we can't verify that elements are interactable according to WebDriver, e.g. checking that a click at a coordinate will fire against the target element.
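As an illustration of the first point, here is a toy model (not IDE code) of one way a recorded frame index can go stale. It assumes the page gains an extra frame between recording and playback; the `Frame` class and both helper functions are hypothetical names for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for an iframe on the page."""
    name: str


def select_by_index(frames, index):
    # Position only: nothing checks that this is the frame that was recorded.
    return frames[index]


def select_by_name(frames, name):
    # What a locator-based selection could do if the recorder could see the frame.
    return next(frame for frame in frames if frame.name == name)


# At recording time the target frame sits at index 1.
recorded_page = [Frame("login"), Frame("content")]
recorded_index = recorded_page.index(Frame("content"))

# At playback an extra frame is injected first, shifting every index by one.
playback_page = [Frame("ad"), Frame("login"), Frame("content")]

by_index = select_by_index(playback_page, recorded_index)
by_name = select_by_name(playback_page, "content")
```

In this model the index silently resolves to the wrong frame, while the name-based lookup still finds the intended one. That kind of stable locator is what we cannot record across origins today.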
This is a hard nut to crack, because even if we used the Selenium standalone server to run the playback, it wouldn't solve our recording problems. And if we used an open WebDriver session both to record and to play back, then why use a WebExtension as the environment in the first place? The best solution, in my opinion, would be an environment that can dictate the frames' security during recording and that can also use WebDriver while recording.
Inconsistencies between CLI and WebExtension playback
As I've mentioned before, Selenium IDE mostly uses atoms to run the playback, which is a completely different codebase from the one we use for our CLI runner. This can cause differences between the two, most notably around #2 from the previous section, and we're unable to equip users who hit these issues with the tools they need to resolve them on their own. As a result they are moving away from the runner: they either call for code export (so they can iron out the test in a different code editor), or refrain from using the runner at all and ask for a scheduler within the IDE.
In the end we want Selenium IDE to provide consistent and reliable playback, and the only way to do that is to use the exact same code for playback in the extension and in the CLI runner. That can only be achieved using WebDriver.
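A minimal sketch of what "the exact same code for playback" could look like: a single `run_test` function that both hosts call, parameterized only by the driver it talks to. Everything here is hypothetical and simplified. `Driver`, `FakeDriver`, and `run_test` are illustrative names, and a real driver would wrap a WebDriver session instead of logging calls.

```python
from typing import Protocol


class Driver(Protocol):
    """The only surface the shared playback code depends on."""
    def open(self, url: str) -> None: ...
    def click(self, locator: str) -> None: ...


class FakeDriver:
    """Logs calls instead of driving a browser, so the sketch runs anywhere."""
    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []

    def open(self, url: str) -> None:
        self.log.append(("open", url))

    def click(self, locator: str) -> None:
        self.log.append(("click", locator))


def run_test(commands: list[dict], driver: Driver) -> None:
    """Single playback implementation shared by the IDE and the CLI runner."""
    for cmd in commands:
        if cmd["command"] == "open":
            driver.open(cmd["target"])
        elif cmd["command"] == "click":
            driver.click(cmd["target"])
        else:
            raise ValueError(f"unsupported command: {cmd['command']}")


test = [
    {"command": "open", "target": "/login"},
    {"command": "click", "target": "id=submit"},
]

# Both hosts run the identical code path; only the driver instance differs.
ide_driver, cli_driver = FakeDriver(), FakeDriver()
run_test(test, ide_driver)
run_test(test, cli_driver)
```

Because both hosts go through the same function, any difference in behavior would have to come from the driver, not from two diverging playback codebases.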
Using multiple windows is confusing af
Since we can't host the application under test (AUT) inside Selenium IDE, the user has to constantly switch windows to author and record a test. The only way to resolve this is to use an environment that can host a webview side by side with the IDE.
Native app solution
In my mind, moving to a native app, most likely based on Electron, would solve all of the issues above in the most elegant fashion. We can use WebDriver to drive the Electron instance itself, which would let us use it for both recording and playback, and host the AUT from within the IDE.
This raises one big concern, which is Firefox, since Electron is Chrome based. Chrome, for obvious reasons, would be embedded within the app, but for Firefox there is a solution (although it's not that elegant). For playback in Firefox we would start geckodriver and play in a new window. Since we are native, we can place the windows side by side to make it obvious what is happening. For recording, we can bundle a recorder extension and install it on the Firefox session so we can record on Firefox (just like we would be able to do in Chrome).
Another big concern I've heard is that, as a native app, the IDE may be harder for enterprises to allow in the organization. I would argue the opposite. Since we use the user's default browser to run the automation, we have direct access to some of the enterprises' keys, so I believe that keeping it as an extension would make approval more difficult than a native app would. Moreover, we would be deploying a type of binary that is already common within enterprises, similar to VS Code and Atom. As a result, I believe it would actually be simpler to get the IDE accepted into enterprises.
As for IE, I don't think we can ever talk browsers without mentioning that guy. Selenium IDE as an extension will never support IE, since it will never run WebExtensions, but at least in a native solution we may be able to play back on it.
We can support any other browser in a similar fashion to Firefox. Not only that, but since our extension API surface will be much smaller, we will be able to support new browsers much sooner. For example, Edge at the moment doesn't support all the APIs we need to run Selenium IDE, but it does support the ones we need to run the recorder alone.
After working on Selenium IDE for so long, I believe that due to the limitations of the technology we are using we would never be able to give our users a Selenium IDE that is great. Maybe good, but never great. If we want to give users a great IDE we need to have the same starting point as any other IDE, which is native.
In my mind Selenium IDE used to be an extension because, before WebDriver existed, the only way to reliably automate a browser was to use the extension APIs. That was a compromise on the users' UX for the sake of the tests' reliability, but we are not at that point in time anymore and there are far better solutions.
Please let me know your thoughts, as I'd like to start working on this.