An introduction to the new Grid
I'm going to assume you all know how to build Selenium and are familiar with how it all works, so this is just an introduction to building and setting up the new Grid. There's still work to be done (some of which is detailed at the bottom), but this is enough to show the idea working as planned. I'll also be writing a wiki page, but want to stabilise things before pushing this in front of the world :)
As ever, I'm happy to hear feedback, either privately or on one of the many channels we use to talk about this stuff.
As with the original selenium standalone/grid binary, everything is in one place:
./buckw build grid-tng
Or, if you’d like to build without the alias path so you know where the code is:
./buckw build //java/server/src/org/openqa/selenium/grid:selenium
Now it’s built, time to run it!
java -jar buck-out/gen/java/server/src/org/openqa/selenium/grid/selenium.jar
This should give you a list of all the commands that the binary is aware of. As the output says, you can run these by appending the command name and any extra flags you’d like.
First off, we need a “session map” server. Essentially, this acts as a hash of “session id” to “where on earth is the session running”, though there’s some additional information kept in there too.
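To make the "hash" idea concrete, here's a minimal sketch in Python. It is not the Grid's actual code: the class name, method names and the node address are all made up for illustration, and the real session map keeps more than just a location.

```python
from typing import Dict, Optional


class SessionMap:
    """Toy in-memory session map: session id -> URI of the node running it."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def add(self, session_id: str, node_uri: str) -> None:
        # Record where a newly created session lives.
        self._locations[session_id] = node_uri

    def get(self, session_id: str) -> Optional[str]:
        # Return the node URI, or None if the session is unknown.
        return self._locations.get(session_id)

    def remove(self, session_id: str) -> None:
        # Forget a session once it has ended; unknown ids are ignored.
        self._locations.pop(session_id, None)


sessions = SessionMap()
sessions.add("abc123", "http://node-1:5555")  # hypothetical id and address
```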
java -jar buck-out/gen/java/server/src/org/openqa/selenium/grid/selenium.jar sessions
The next thing we need is a server that can distribute new sessions to nodes within the grid. You start this using:
java -jar buck-out/gen/java/server/src/org/openqa/selenium/grid/selenium.jar distributor
As you’ve started each of these, they’ve output the ports that they’re listening on. This is important, because the next step is to add a node. A node is responsible for actually running a session, and it needs two bits of information: where the distributor is (so it can let it know that there’s a new node in the world) and where the session map is (so later requests can be directed efficiently). Assuming you’ve used the default ports, you can run it using:
That “detect-drivers” is needed so that the node will automatically check the system for which drivers it can use. At some point, we’ll allow you to configure this more meaningfully :)
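The two bits of information a node needs can be sketched roughly like this. Everything here is an assumption for illustration: the class names, the "fewest sessions wins" choice and the plain dict standing in for the session map are mine, not what the Grid does internally.

```python
import uuid
from typing import Dict, List


class Distributor:
    """Toy distributor: knows the registered nodes and hands out new sessions."""

    def __init__(self) -> None:
        self.nodes: List["Node"] = []

    def register(self, node: "Node") -> None:
        self.nodes.append(node)

    def new_session(self) -> str:
        if not self.nodes:
            raise RuntimeError("no nodes registered")
        # Naive placement (an assumption): pick the node with the fewest sessions.
        node = min(self.nodes, key=lambda n: len(n.sessions))
        return node.start_session()


class Node:
    """Toy node: announces itself to the distributor and records its sessions."""

    def __init__(self, uri: str, distributor: Distributor, session_map: Dict[str, str]) -> None:
        self.uri = uri
        self.session_map = session_map
        self.sessions: List[str] = []
        distributor.register(self)  # "there's a new node in the world"

    def start_session(self) -> str:
        session_id = uuid.uuid4().hex
        self.sessions.append(session_id)
        # Record the location so later requests can be routed to this node.
        self.session_map[session_id] = self.uri
        return session_id
```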
So, you’ve now got the basic system up and running. However, it is unlikely you’d want to expose your internal infrastructure in this way: we also want a gateway that local ends can connect to and communicate with. This is what the router is for:
This will be listening on port 4444, as you’d expect, and so you can now start a webdriver session by pointing at “http://localhost:4444”
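As a rough illustration of what "pointing at" the router means on the wire, here's a sketch that builds a new session request using only the Python standard library. It assumes the router accepts a W3C WebDriver "New Session" payload at /session; the helper name and the choice of "firefox" are made up for the example, and in practice you'd just hand the URL to your usual local end.

```python
import json
import urllib.request


def new_session_request(grid_url: str, browser_name: str) -> urllib.request.Request:
    """Build (but do not send) a W3C-style New Session request."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return urllib.request.Request(
        url=grid_url.rstrip("/") + "/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


request = new_session_request("http://localhost:4444", "firefox")
# urllib.request.urlopen(request) would send it; that call is left out so the
# sketch runs without a grid listening on port 4444.
```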
The router is effectively stateless, so if you’re using something like k8s you can have a whole fleet of them fronting your grid. When a selenium command reaches the router, it finds the node running the session and forwards the message directly to that node, bypassing the distributor entirely. It caches session locations too, so often the only thing it needs to do is forward a request directly to the server that will process it — this should help us to scale grid to gigantic sizes.
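The lookup-then-cache behaviour described above could be sketched like this. Again, this is an illustration under my own assumptions rather than the router's real code: the `Router` class, its `resolve` method and the `lookup` callable (standing in for a call to the session map) are all hypothetical names.

```python
from typing import Callable, Dict, Optional


class Router:
    """Toy router: works out which node a /session/<id>/... request belongs to."""

    def __init__(self, lookup: Callable[[str], Optional[str]]) -> None:
        self._lookup = lookup  # stands in for asking the session map
        self._cache: Dict[str, str] = {}

    def resolve(self, path: str) -> str:
        parts = path.strip("/").split("/")
        if len(parts) < 2 or parts[0] != "session":
            raise ValueError("not a session command: " + path)
        session_id = parts[1]

        node_uri = self._cache.get(session_id)
        if node_uri is None:
            # Cache miss: ask the session map once, then remember the answer.
            node_uri = self._lookup(session_id)
            if node_uri is None:
                raise KeyError("unknown session: " + session_id)
            self._cache[session_id] = node_uri

        # The request goes straight to the node; no distributor involved.
        return node_uri + path
```

The cache is only an optimisation here: it can be rebuilt from the session map at any time, which is why many copies of something like this can run side by side.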
Of course, starting all these things together is a massive PITA, so, just like before, you can spin up the entire grid in a single process using:
java -jar buck-out/gen/java/server/src/org/openqa/selenium/grid/selenium.jar standalone --detect-drivers
Unlike in previous versions of Selenium, this uses exactly the same components as above, simply linked together in one process rather than started as a small fleet of servers.
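One way to picture "same components, different wiring" is a caller that only depends on an interface, with an in-process and a remote implementation behind it. This is a sketch of the general idea, not of the Grid's classes: `LocalSessionMap`, `RemoteSessionMap`, `locate` and the `fetch` callable are all hypothetical.

```python
from typing import Callable, Dict, Optional, Protocol


class SessionLookup(Protocol):
    def get(self, session_id: str) -> Optional[str]: ...


class LocalSessionMap:
    """In-process implementation, as a standalone server might wire it."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def add(self, session_id: str, node_uri: str) -> None:
        self._data[session_id] = node_uri

    def get(self, session_id: str) -> Optional[str]:
        return self._data.get(session_id)


class RemoteSessionMap:
    """Stand-in for a client that would ask a separate session map server."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch  # in real life, an HTTP call

    def get(self, session_id: str) -> Optional[str]:
        return self._fetch(session_id)


def locate(sessions: SessionLookup, session_id: str) -> str:
    # The caller neither knows nor cares which implementation it was given.
    node_uri = sessions.get(session_id)
    if node_uri is None:
        raise KeyError(session_id)
    return node_uri
```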
There are obviously a huge number of things to do on the new Grid:
- Right now, the order of start up matters. Nodes make only one attempt to attach to the grid, and there are no health checks
- There are no health checks at all, for that matter
- Integrate OpenTracing and structured logging into the Grid. It should be possible to host this stuff in a cloud provider, point Honeycomb or DataDog at the logs and get traces out without any additional work by hard-pressed SREs.
- It should be possible to add Selenium 3 Grid nodes to the grid by pointing them at the distributor instead of a normal hub
- Protocol conversion needs to be pulled out into a filter
- RC support needs to be spun out that way too.
- I want to add a Redis-backed session map (probably using Jedis) so that we can deploy this thing with some resiliency.
- Finish rolling the scheduler branch into the tree so that we can actually distribute work fairly.
- Allow for configs to be created. I really want some way to define a patchset for JSON, but I'm happy to purloin bits from other successful grid implementations.
- Request retries would be nice --- if this server is down, go to this one.
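The last item, retrying against another server, might look something like the sketch below. Nothing like this exists in the tree yet as far as this note describes; `first_available` and the `send` callable are hypothetical names, and treating only `ConnectionError` as "server is down" is an assumption.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_available(servers: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each server in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return send(server)
        except ConnectionError as error:
            # This server is down: remember why, then move on to the next one.
            last_error = error
    raise ConnectionError("no server responded") from last_error
```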
I'm sure there's more, but that's enough for now :)
Happy hacking, folks!
Simon