Mac GUI Automation in the Appium project or somewhere else?


Dan Cuellar

Jul 25, 2013, 5:59:16 PM
to appium-...@googlegroups.com
I've mostly got a demo-able version of automation for Mac OS X apps working with Selenium WebDriver. It does page source, screenshot, start, stop, and find by label. I'm maybe 2-4 hours of work away from adding click and some other things.

See it here.

The question is: does this fall into its own project, or does it belong in Appium? Appium's tagline is "Automation for apps" and these are OS X apps. On the other hand, is Appium mobile only?

Just curious to hear what people think. If I can get some spare time, I can probably get a usable set of functionality done in a couple of days' worth of work.

Jonathan Lipps

Jul 25, 2013, 6:36:48 PM
to Dan Cuellar, appium-...@googlegroups.com
I would love to see Mac OS X automation in Appium.

"Appium for apps" is intentionally broad for exactly this reason.

--
http://appium.io
---
You received this message because you are subscribed to the Google Groups "Appium-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to appium-discus...@googlegroups.com.
Visit this group at http://groups.google.com/group/appium-discuss.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Dan Cuellar

Jul 25, 2013, 6:54:06 PM
to appium-...@googlegroups.com, Dan Cuellar
Great. I am working on the prototype in Python + AppleScript, but I think the final version is going to have to be a Node + Objective-C tango.

The idea is that you can control almost any OS X app through AppleEvents, and you can read the contents of the screen using the accessibility protocols Apple provides for third-party screen-reader manufacturers, or just from AppleScript. I've confirmed that this is all possible; the work remaining is making it performant and making it implement the JSON Wire Protocol, but I can 100% confirm it works.
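To make the AppleScript route concrete, here is a minimal sketch (in Java, to match the WebDriver snippet later in the thread) of how a server could turn a "click the button labelled X" command into a System Events GUI-scripting call run through macOS's `osascript` tool. `AppleScriptBridge` and its methods are hypothetical names, not part of the prototype, and actually running the script assumes macOS with Accessibility access granted to the calling process.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper: builds AppleScript GUI-scripting commands and runs them
// through macOS's `osascript` tool. Running requires macOS plus Accessibility access.
final class AppleScriptBridge {

    // Wrap text in an AppleScript string literal, escaping backslashes and quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Script that clicks the button with the given label in window 1 of the given process.
    static String clickButtonScript(String processName, String buttonLabel) {
        return "tell application \"System Events\" to tell process " + quote(processName)
                + " to click button " + quote(buttonLabel) + " of window 1";
    }

    // Command line equivalent to: osascript -e '<script>'
    static List<String> osascriptCommand(String script) {
        return List.of("osascript", "-e", script);
    }

    // Runs the script and returns osascript's exit code (0 on success).
    static int run(String script) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(osascriptCommand(script)).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, `AppleScriptBridge.run(AppleScriptBridge.clickButtonScript("TextEdit", "OK"))` would ask System Events to click the "OK" button in TextEdit's first window. Spawning one `osascript` process per command is exactly the kind of cost that the performance work, or a move to the native Objective-C/C accessibility APIs, would remove.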

Let me know if anyone wants to help, I can walk you through how it all works in person.

bootstrap online

Jul 25, 2013, 6:55:43 PM
to Dan Cuellar, appium-...@googlegroups.com
That sounds awesome. I have an OS X app that I'm supposed to test at
work. I'll see if they'll let me spend time on this.

Dan Cuellar

Jul 25, 2013, 6:59:19 PM
to appium-...@googlegroups.com, Dan Cuellar
Great, it's really, really easy to do (performance aside). The trouble is just getting the time to do it. It would be great to have some help.

Vic Wong

Jul 25, 2013, 7:09:37 PM
to appium-...@googlegroups.com, Dan Cuellar
Let me guess, it uses AppleScript? :)

Dan Cuellar

Jul 25, 2013, 7:37:47 PM
to appium-...@googlegroups.com, Dan Cuellar
I'm doing the prototype in AppleScript, but the final product will probably use the Apple ObjC / C libraries for this.

Mac GUI scripting is already a thing; the problem is that it's confined to the AppleScript community (very small). There are already products that do this. See http://pfiddlesoft.com/uibrowser/

The win here is identical to Appium's. Take a working automation model that sucks for developers, implement the JSON Wire Protocol so it doesn't have to be coded in the language Apple wants you to use (AppleScript), and BAM, the suckage is gone.

Dan Cuellar

Jul 26, 2013, 11:00:22 AM
to appium-...@googlegroups.com, Dan Cuellar

David Luu

Jul 26, 2013, 11:08:20 PM
to appium-...@googlegroups.com
Either way sounds good to me, as long as we have progress & new offerings.

But there is one thing to consider, especially for long term. Dan's Mac automation prototype comes from (what I assume to be) a core developer of Appium, so it's easier to include in the same project.

What about new projects brought up by outside developers? I guess they start a discussion here and issue a pull request from their GitHub project/fork and Appium team can decide whether to include or ignore/reject.

Appium is still a young project, but it would probably be a good idea to lay down processes for how far Appium should officially expand. I know the Selenium team doesn't just pull third-party WebDriver language bindings into its official project (not yet, anyway).

I haven't looked over the Appium (developer) documentation, if there is any, so I might be ignorant of what's already available/done. With that in mind: if Appium is "Automation for apps", then looking to the future, to make Appium a comprehensive tool for automating all types of apps, it should be designed as a modular core (server) where you start it up and load the desired module/library for the applications you want to automate. It would act more like a framework than a tool in itself: a Selenium server skeleton that speaks the JSONWireProtocol (or whatever general subset of it applies), leaving it up to the modules to implement the actual response handlers for each JSONWireProtocol command. For whatever a module doesn't support, the core throws the relevant WebDriver exception back to the client.
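A minimal sketch of this modular-core idea, assuming hypothetical `ModularCore` and `AutomationModule` types (Appium has no such classes): the core routes each command to whichever module is registered for a platform, and answers with JSON Wire Protocol status 9 (UnknownCommand) when no module, or no handler in the module, covers it. HTTP parsing and session handling are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical modular core: owns command routing, while platform modules
// (iOS, Android, Mac, ...) implement only the commands they can handle.
final class ModularCore {

    // JSON Wire Protocol status codes used by this sketch.
    static final int SUCCESS = 0;
    static final int UNKNOWN_COMMAND = 9;

    // Minimal JSONWP-style response: status code plus value.
    record Response(int status, Object value) {}

    interface AutomationModule {
        // Returns empty when the module does not implement the command.
        Optional<Object> handle(String command, Map<String, Object> params);
    }

    private final Map<String, AutomationModule> modules = new HashMap<>();

    void register(String platform, AutomationModule module) {
        modules.put(platform, module);
    }

    Response dispatch(String platform, String command, Map<String, Object> params) {
        AutomationModule module = modules.get(platform);
        if (module == null) {
            // This sketch also treats a missing module as an unknown command.
            return new Response(UNKNOWN_COMMAND, "No module loaded for platform " + platform);
        }
        return module.handle(command, params)
                .map(value -> new Response(SUCCESS, value))
                .orElseGet(() -> new Response(UNKNOWN_COMMAND,
                        command + " is not supported by the " + platform + " module"));
    }
}
```

A module only has to return values for the commands it understands; everything else falls through to the UnknownCommand reply, which client bindings typically raise as an exception on the test side.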

We can then build out an Appium repository of modules to automate applications with that other developers can submit modules to. The main initial modules then would be for iOS, Android, and I guess the Mac OS X.

Those are just my 2 cents.

By the way, I've thought about using the Appium source (though I'd prefer the older Python version over the Node.js one, since I'm not really familiar with Node) to implement Windows GUI automation support, e.g. by building an Appium interface to Sikuli and AutoIt. A Sikuli-based Appium would also be a good alternative to Dan's Mac OS X prototype, since Sikuli is cross-platform. I haven't had the time or initiative to work on it yet, but it's still a pet project I have in mind to do sometime.

bootstrap online

Jul 27, 2013, 9:54:24 AM
to David Luu, appium-...@googlegroups.com
Image recognition such as Sikuli is not good for automation. It's slow
and not cross platform.

Jonathan Lipps

Jul 27, 2013, 2:09:09 PM
to David Luu, appium-...@googlegroups.com
I think these are good thoughts.

I also think there are reasons that the main Appium process is tightly coupled with Android and iOS automation. Here's the current architecture:

iOS:     appium server <--unix socket--> instruments_client.js <--subprocess--> bootstrap.js
Android: appium server <--tcp--> AppiumBootstrap.jar

On your proposal, we would have to add another layer to this picture:

iOS:     appium server <--http--> ios-"driver"-server <--unix socket--> instruments_client.js <--subprocess--> bootstrap.js
Android: appium server <--http--> android-"driver"-server <--tcp--> AppiumBootstrap.jar

While it is more modular, it comes at the cost of proxying extra HTTP calls, which gets expensive.

Mozilla has implemented a TCP version of the JSONWP that could be nice for us if we're going to modularize things more.
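For a sense of what a TCP flavour of the JSONWP involves, here is a sketch of length-prefixed framing, where each JSON message travels over one persistent socket as `<byte length>:<json>`. Mozilla's Marionette is understood to frame messages roughly this way, but treat the exact format as an assumption; `LengthPrefixedJson` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical length-prefixed framing for JSON commands on a raw TCP stream.
// Assumed wire format: ASCII decimal byte length, a colon, then the UTF-8 JSON body.
final class LengthPrefixedJson {

    static byte[] encode(String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        byte[] prefix = (body.length + ":").getBytes(StandardCharsets.US_ASCII);
        byte[] frame = new byte[prefix.length + body.length];
        System.arraycopy(prefix, 0, frame, 0, prefix.length);
        System.arraycopy(body, 0, frame, prefix.length, body.length);
        return frame;
    }

    // Reads one frame; returns null if the stream ends cleanly before a new frame.
    static String readFrame(InputStream in) throws IOException {
        int c = in.read();
        if (c == -1) {
            return null;
        }
        int length = 0;
        while (c != ':') {
            // Also rejects end-of-stream (-1) in the middle of the prefix.
            if (c < '0' || c > '9') {
                throw new IOException("Bad length prefix");
            }
            length = length * 10 + (c - '0');
            c = in.read();
        }
        byte[] body = in.readNBytes(length);
        if (body.length != length) {
            throw new IOException("Truncated frame");
        }
        return new String(body, StandardCharsets.UTF_8);
    }
}
```

Because the receiver always knows how many bytes make up the next message, many commands can share one connection with no per-request HTTP headers, which is the overhead the extra proxy layer would otherwise add.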

David Luu

Jul 28, 2013, 1:59:12 AM
to appium-...@googlegroups.com, David Luu
I know image recognition is not the best option for UI automation, but I wouldn't rule it out either. When existing tools have a gap that keeps them from automating something (say you would need to instrument the source code and can't), it can come in handy as a complementary tool to fill in the gaps.

I wouldn't say it's not cross-platform either. Yes, the images used could be platform specific (if the UI components are OS specific, like the Windows Start button, Mac icons, Windows icons, etc.), but there are also components that are common across platforms. As a simple example (though yes, Selenium and other tools combined might be better), say you test a web application that presents a Flash component (one you don't have the source to, or can't automate even with instrumentation, like a Flash-based file upload) that renders pretty much the same across browsers, and browsers on different OSes, 99% of the time. Then a single image could represent that Flash upload button for all platforms. Where they do differ, it's similar to Selenium element locators that differ across browsers. The Sikuli tool is available for Windows, Linux, and Mac, so it does provide cross-platform usage if used well. Bear in mind that you don't have to run image recognition over the whole screen: you look for certain items on the screen and, if they exist, do something to them or return true.
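To illustrate searching only part of the screen, here is a naive template-matching sketch: it slides a small reference image over one rectangular region of a screenshot and returns the centre of the first position where every pixel's RGB channels are within a tolerance. Images are plain `int[row][col]` RGB arrays, and `RegionMatcher` is a hypothetical name. Tools like Sikuli rely on OpenCV's correlation-based template matching, which is far more robust, so this is a stand-in for the idea rather than how those tools work.

```java
// Hypothetical naive matcher: exact-within-tolerance search inside one screen region.
final class RegionMatcher {

    // Returns {x, y} of the matched template's centre (a natural click point), or null.
    static int[] find(int[][] screen, int[][] template,
                      int regionX, int regionY, int regionW, int regionH, int tolerance) {
        int th = template.length;
        int tw = template[0].length;
        int startX = Math.max(0, regionX);
        int startY = Math.max(0, regionY);
        // Last top-left positions where the template still fits inside region and screen.
        int maxX = Math.min(regionX + regionW, screen[0].length) - tw;
        int maxY = Math.min(regionY + regionH, screen.length) - th;
        for (int y = startY; y <= maxY; y++) {
            for (int x = startX; x <= maxX; x++) {
                if (matchesAt(screen, template, x, y, tolerance)) {
                    return new int[] {x + tw / 2, y + th / 2};
                }
            }
        }
        return null;
    }

    private static boolean matchesAt(int[][] screen, int[][] template, int x, int y, int tolerance) {
        for (int ty = 0; ty < template.length; ty++) {
            for (int tx = 0; tx < template[0].length; tx++) {
                if (channelDiff(screen[y + ty][x + tx], template[ty][tx]) > tolerance) {
                    return false;
                }
            }
        }
        return true;
    }

    // Largest absolute difference across the blue, green and red channels.
    private static int channelDiff(int a, int b) {
        int max = 0;
        for (int shift = 0; shift <= 16; shift += 8) {
            int d = Math.abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF));
            max = Math.max(max, d);
        }
        return max;
    }
}
```

Restricting the region keeps the search cheap and avoids false matches elsewhere on screen; the returned centre point is what a "click this image" command would then target.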

David Luu

Jul 28, 2013, 2:12:27 AM
to appium-...@googlegroups.com, David Luu
Thanks for the insight on current Appium architecture Jonathan.

I haven't looked into the codebase yet, so another question: if we don't add another layer to modularize Appium, does Appium currently have a skeleton-like architecture that someone could retrofit for another tool (e.g. to WebDriver-enable another tool)? Basically, a skeleton of Appium that simply has placeholders for the developer to interface with the actual tool. After all, at a high level, Appium is simply a "Selenium" server that listens for JSONWireProtocol commands and routes them to the actual tool/code, which turns them into some action and then relays a response back to the Selenium client, or am I mistaken? If I'm not mistaken, a skeleton version would simply not open Unix sockets or TCP connections (to iOS Instruments or AppiumBootstrap.jar) and would just have placeholder comments in the handler methods for the developer to fill in the implementation, plus another section to fill in for what to return as an exception on failure. I guess analyzing the current Appium server code would be a good way to start, to see what it does and to search-and-replace retrofit the code as needed for another tool. Still, I wonder whether a generic skeleton version might also be useful... Obviously the generic version wouldn't do anything; it's more like "this is how you would implement the Appium portion of an Appium/WebDriver-based tool".
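A rough sketch of the generic skeleton described above: a few JSON Wire Protocol routes are wired to placeholder handlers that reply with status 9 (UnknownCommand) until a tool author overrides them. `SkeletonDriver` is hypothetical, and matching real request URLs against the `:sessionId`/`:id` templates is left out; the caller passes the route template plus its arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical skeleton driver: routing is done, tool integration is left as placeholders.
abstract class SkeletonDriver {

    static final int UNKNOWN_COMMAND = 9; // JSON Wire Protocol "UnknownCommand"

    // Minimal JSONWP-style result: status code plus value.
    record Result(int status, Object value) {}

    private final Map<String, Function<String[], Result>> routes = new LinkedHashMap<>();

    protected SkeletonDriver() {
        routes.put("GET /session/:sessionId/source", args -> getPageSource());
        routes.put("GET /session/:sessionId/screenshot", args -> getScreenshot());
        routes.put("POST /session/:sessionId/element", args -> findElement(args[0], args[1]));
        routes.put("POST /session/:sessionId/element/:id/click", args -> click(args[0]));
    }

    // Looks up the handler for a route template; unknown routes get UnknownCommand.
    Result route(String methodAndTemplate, String... args) {
        Function<String[], Result> handler = routes.get(methodAndTemplate);
        return handler == null ? notImplemented(methodAndTemplate) : handler.apply(args);
    }

    // Placeholders: override these to connect the skeleton to the real tool.
    protected Result getPageSource() { return notImplemented("source"); }
    protected Result getScreenshot() { return notImplemented("screenshot"); }
    protected Result findElement(String using, String value) { return notImplemented("findElement"); }
    protected Result click(String elementId) { return notImplemented("click"); }

    protected static Result notImplemented(String what) {
        return new Result(UNKNOWN_COMMAND, what + " is not implemented by this driver");
    }
}
```

Retrofitting a new tool would then mean subclassing and overriding only the handlers it supports, while the routing and the "not supported" replies stay shared.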

Dan Cuellar

Jul 28, 2013, 11:28:43 AM
to appium-...@googlegroups.com, David Luu
Shifting the conversation back to Sikuli briefly: I think it would be a very powerful add-on for Appium to have some find-and-click-this-image sort of behavior. Some controls are not accessible (e.g. maps, videos), and it'd be nice to have some sort of OpenCV-ish capability for when the accessibility layer is totally clueless.

David Luu

Jul 28, 2013, 5:05:01 PM
to appium-...@googlegroups.com, David Luu
If one were to implement image recognition addon/support for Appium, this would be my thought on location strategy:

* a custom find by method/type if not retrofitting against the existing ones (XPath, CSS selector, ID/name).

* locator can be defined as path to archived image to find on screen

* if retrofitting an existing location strategy, find by ID/name could be the name of the image relative to a default image location (specified in some config file or hard coded), and find by XPath/CSS could just be an alias for a full or relative image path, e.g. "/myPathTo/Image.png", "C:\\myPathTo\\image.png".

* could also consider providing the find-by value as a base64-encoded string of the actual image rather than specifying an image path or name. On the Appium side, the data is then decoded back into an image (and saved to a temp directory on local disk as needed) to be used for image recognition.

* when specifying an image by path/name, the path would be local to the machine running Appium, unless we deviate from the JSONWireProtocol and extend it with Appium-specific commands, since there's no official API to send files from a WebDriver client to the server remotely besides what's used internally for sendKeys() for file uploads, and for sending over a FirefoxProfile remotely as part of the desired capabilities. Though one could consider defining a new desired capability option for, say, passing over a zip file of images to be used for image recognition, encoded as base64; it is then decoded, unzipped and deployed to a temp directory for use on the Appium server side. The tests can then reference images by name relative to the image archive sent over as part of the desired capabilities.
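A sketch of the server side of the last bullet, assuming a hypothetical desired capability whose value is a base64-encoded zip of images: decode it, extract it into a fresh temp directory (rejecting entries that would escape that directory), and hand back the directory so later find calls can resolve images by their name inside the archive. `ImageArchiveCapability` is a hypothetical name, not an Appium API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical handler for a "zip of locator images" desired capability.
final class ImageArchiveCapability {

    // Decodes the base64 zip, extracts it into a new temp directory and returns that directory.
    static Path unpack(String base64Zip) {
        try {
            Path dir = Files.createTempDirectory("appium-images").toAbsolutePath().normalize();
            byte[] zipBytes = Base64.getDecoder().decode(base64Zip);
            try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    Path target = dir.resolve(entry.getName()).normalize();
                    // Refuse entries such as "../x.png" or absolute paths that leave the temp dir.
                    if (!target.startsWith(dir)) {
                        throw new IOException("Zip entry escapes target directory: " + entry.getName());
                    }
                    if (entry.isDirectory()) {
                        Files.createDirectories(target);
                    } else {
                        Files.createDirectories(target.getParent());
                        // Copies only the current entry: ZipInputStream reports EOF at its end.
                        Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test could then refer to `buttons/ok.png` and the server would resolve it against the returned directory, keeping image paths independent of the machine the tests run on.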

Dan Cuellar

Jul 29, 2013, 10:53:42 AM
to appium-...@googlegroups.com, David Luu
I like this approach; let's open an issue as an enhancement for this on the GitHub issues page. I like the idea of making it a locator strategy and sending the image as a base64 string.

David Luu

Jul 29, 2013, 4:06:49 PM
to appium-...@googlegroups.com, David Luu
Dan, for locator strategy with base64 string of image, what were you thinking of specifically?

Create a new locator strategy (which would require updates to the Selenium client bindings), or reuse an existing locator strategy to support this? Something like this perhaps:

driver.findElement(By.xpath("//ImageData:theBase64ContentHere"));

and then Appium handles it with extra logic when parsing the XPath or CSS selector, checking for a special tag (e.g. ImageData) to handle the case of a base64 string of an image?
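A minimal sketch of that special-tag check on the server, using the `//ImageData:` marker from the example above (the marker text is only the thread's example, and `ImageLocator` is a hypothetical name): an XPath value that starts with the marker is decoded as a base64 image, and anything else is left for the normal XPath engine.

```java
import java.util.Base64;
import java.util.Optional;

// Hypothetical check for an image payload smuggled inside an XPath locator.
final class ImageLocator {

    static final String MARKER = "//ImageData:";

    // Returns the decoded image bytes, or empty if this is an ordinary XPath.
    // Base64.Decoder throws IllegalArgumentException on malformed input, which the
    // server could report back to the client as an invalid-selector style error.
    static Optional<byte[]> imageFromXPath(String xpath) {
        if (!xpath.startsWith(MARKER)) {
            return Optional.empty();
        }
        String payload = xpath.substring(MARKER.length()).strip();
        return Optional.of(Base64.getDecoder().decode(payload));
    }
}
```

The client side stays unchanged, which is the appeal of reusing an existing strategy: `driver.findElement(By.xpath("//ImageData:" + base64Png))` goes through the standard bindings.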

We'd probably have to standardize on an image format too, unless the OpenCV-ish engine being used doesn't care about the format and handles it implicitly.

Dan Cuellar

Jul 29, 2013, 7:03:21 PM
to appium-...@googlegroups.com, David Luu
We could also just override one that we'll never ever use, like By.linkText or By.cssSelector; hopefully I didn't just bait the trolls with that idea.

For the format, I'm not sure what to use. I imagine something that can be transferred without losing image data (NOT GIF, which is limited to 256 colors, or JPG, which is lossy). Maybe just use PNG? I think TIFF would be too foreign for most people, and Mac users would not want to make BMPs.
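If PNG is the agreed format, the server can reject anything else cheaply: every PNG file starts with the same 8-byte signature, and the image width and height follow as big-endian 32-bit integers at byte offsets 16 and 20 (inside the IHDR chunk), which a matcher could use to size its search. `PngCheck` is a hypothetical helper, sketched here only to show the check.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical server-side validation for PNG locator images.
final class PngCheck {

    // The fixed 8-byte signature at the start of every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    static boolean isPng(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    // Width and height from the IHDR chunk that must directly follow the signature.
    static int[] dimensions(byte[] png) {
        if (!isPng(png) || png.length < 24) {
            throw new IllegalArgumentException("not a PNG");
        }
        ByteBuffer buf = ByteBuffer.wrap(png); // ByteBuffer is big-endian by default
        return new int[] {buf.getInt(16), buf.getInt(20)};
    }
}
```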

bootstrap online

Jul 29, 2013, 7:09:31 PM
to Dan Cuellar, appium-...@googlegroups.com, David Luu
It should probably be a mobile method that works with png.