Monday, November 19, 2012

Sikuli: Some First Impressions

Today marks a transition in many things.

From Sidereel to SocialText.
From Scrum to Kanban.
From San Francisco to Palo Alto.
From Entertainment to Collaboration.
From External Facing to Intra-Facing.
From Ruby on Rails to a hybrid of technologies including an old friend, Perl!

It's that last change that has given me both cause and curiosity to look into and learn about an interesting testing framework. The framework is called Sikuli, and it comes with a "visual language" called Sikuli Script.

Sikuli started as a project at MIT. It's now an open source tool that can be used with a number of different applications. It can be used with the web, it can be used with Flash apps, it can be used with compiled applications on a number of different platforms. Yeah, OK, that's all fine and good, but aren't there plenty of tools out there that already fit that space? Why use another one?

I thought much the same thing, until I thought about a few of the applications I've wanted to poke around with in the past... some of them just aren't designed with testability in mind, or to be more charitable, some programs just bedevil the expectations of many tools currently available. Some apps don't have much in the way of an open API to access and help with the process. Selenium works great when you can access the object layer. What do you do when you can't access those attributes so easily? What if the front end is all you have, and all you are going to get? It's here that Sikuli starts to get interesting.

Sikuli can be run in a number of different ways, and a first-timer is most likely to run it through its IDE. The IDE puts front and center a number of function calls and simple tools that the user can take advantage of to make scripts. These scripts are similar in a lot of ways to Selenium/WebDriver and Capybara in what they do, with an interesting difference: the function calls can take images as their arguments. Not paths to images (at least, not as far as the IDE shows you), but actual images captured from the screen. Here's a simple example using a Win32 application called Troopmaster, which I use with my scout troop. The following is a very basic screenshot showing just a couple of commands.

This admittedly trivial example performs a very simple set of instructions:

Load an app (Troopmaster).
Wait for a value on the screen to appear.
Open the Merit Badge Counselor tool.
Click on the "Add New" button (so we can create a new Merit Badge Counselor).
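Step 2 is the one that makes or breaks a script like this. The app takes a moment to load, so the script has to keep checking the screen until the expected image appears, or give up after a timeout. In Sikuli Script, `wait()` handles that for you (alongside calls such as `App.open()` and `click()`). What follows is a minimal plain-Python sketch of the same polling idea, written for modern Python 3 rather than Sikuli's Jython, and not Sikuli's implementation. `wait_for` and the `is_visible` callback are hypothetical names; the callback stands in for the image search Sikuli would do.

```python
import time


def wait_for(is_visible, timeout=3.0, interval=0.1):
    """Poll is_visible() until it returns True or `timeout` seconds pass.

    `is_visible` is a hypothetical stand-in for an on-screen image search.
    Returns the number of seconds waited; raises TimeoutError on failure.
    """
    start = time.monotonic()
    while True:
        if is_visible():
            return time.monotonic() - start
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise TimeoutError("target not visible after %.1f seconds" % timeout)
        # Sleep until the next check, but never past the deadline.
        time.sleep(min(interval, timeout - elapsed))


# Usage: pretend the app's title shows up on the third look at the screen.
checks = {"count": 0}


def title_has_loaded():
    checks["count"] += 1
    return checks["count"] >= 3


waited = wait_for(title_has_loaded, timeout=2.0, interval=0.01)
```

Polling with a deadline, rather than a fixed sleep, keeps a script fast when the app is quick and still tolerant when it is slow.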

That's it! All very straightforward, all very basic stuff. The cool thing is that, in many ways, by doing little more than these kinds of steps, you can accomplish a lot. Sikuli uses image recognition to determine where on the screen you want to act. Based on those images, even someone who isn't a programmer or a tester can often put together tests or automate tasks to reach specific goals. (It should be noted that Sikuli is not positioned solely as a testing framework; it's also used as a sort of "macro language" for automating basic repetitive tasks.)
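To make "image recognition" a little less magical: the core idea is to slide a small captured image across a screenshot and score how closely each position matches. Sikuli's own matcher is built on OpenCV and is far more robust than this. The toy below is a plain-Python sketch of sliding-window template matching, assuming grayscale pixel grids, a sum-of-squared-differences score rescaled to 0..1, and an arbitrary default threshold. `similarity` and `locate` are hypothetical names, not Sikuli's API.

```python
# Grayscale "images" as lists of rows, each pixel 0-255.


def similarity(screen, patch, top, left):
    """Score in [0, 1]: 1.0 means the patch matches the region exactly."""
    h, w = len(patch), len(patch[0])
    ssd = 0
    for r in range(h):
        for c in range(w):
            diff = screen[top + r][left + c] - patch[r][c]
            ssd += diff * diff
    worst = 255 * 255 * h * w  # largest possible sum of squared differences
    return 1.0 - ssd / float(worst)


def locate(screen, patch, min_similarity=0.95):
    """Slide patch over every position of screen.

    Returns (top, left, score) for the best-scoring position, or None if
    even the best score falls below min_similarity.
    """
    h, w = len(patch), len(patch[0])
    best = None
    for top in range(len(screen) - h + 1):
        for left in range(len(screen[0]) - w + 1):
            score = similarity(screen, patch, top, left)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is None or best[2] < min_similarity:
        return None
    return best


screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 200, 50, 0],
    [0, 0, 50, 200, 0],
    [0, 0, 0, 0, 0],
]
button = [[200, 50], [50, 200]]

print(locate(screen, button))  # (1, 2, 1.0)
```

The threshold is the crux of this approach. Set it too high and a slightly different theme or font breaks the match; set it too low and the script may click the wrong thing.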

Now, of course, there's a lot more Sikuli can do, and there are some frustrations and challenges that need more than just "point here, click this, fill this in, click OK". While very basic tasks can be done without extensive programming knowledge, getting beyond the training-wheels stage means understanding the architecture and the scripting language. Sikuli scripts are written in Jython, an implementation of Python that runs on the Java Virtual Machine: it lets the user import and call Java classes directly, and it compiles Python source code down to Java bytecode. The user also has the full breadth of the Python language. If I ever wanted a good excuse to spend time with Python and get familiar with more than the basics, here's a great opportunity to do exactly that.
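Because Sikuli scripts run on Jython, Java classes can be imported as if they were Python modules. The sketch below shows the idea while still running under ordinary CPython: it checks `sys.platform`, which starts with "java" under Jython, and only then imports `java.lang.System`. The helper names are hypothetical, and the code sticks to Python 2-compatible syntax, since the Jython releases of this era implement Python 2.

```python
import sys
import time


def running_on_jython():
    """True under Jython, where sys.platform looks something like 'java1.6.0_35'."""
    return sys.platform.startswith("java")


def current_time_millis():
    """Milliseconds since the epoch, via Java when it is available."""
    if running_on_jython():
        # A Java class, imported with ordinary Python syntax; only works under Jython.
        from java.lang import System
        return int(System.currentTimeMillis())
    # Plain-Python fallback for CPython.
    return int(time.time() * 1000)


print(running_on_jython())
print(current_time_millis())
```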

This post is not meant to be an all-encompassing tutorial. I installed Sikuli on my PC last week to get familiar with it, and wanted to get some first impressions out there. So where do I go from here? I want to get more comfortable doing things that are non-trivial, and the best way I know to do that is to, well, publicly declare that I'm going to do it. Does this sound like a new set of entries for the Practicum page? Hey! That's what it sounds like to me, too! Thus, it's time for another "bold boast"... let's see what we can do with Sikuli, and what we can't. Let's see in real time whether it's a workable tool or another "interesting, but..." kind of framework. Also, let's give me a good excuse to poke around with and dive deeper into Python, not to mention Jython and what it adds to the mix (to be frank, I'd never even heard of Jython before I downloaded this app, so I know almost nothing about it or its inner workings).

If you'd like to play along at home with me, you can get Sikuli and learn more about it at 


Michael Herrmann said...

***shameless advertising start***

I stumbled upon your blog because I had seen, and really liked, the video of your presentation about ATDD, GUI Automation and Exploratory Testing at CAST 2012. I think you were spot on with everything you said.

Sikuli, while useful as a fallback, has several drawbacks:
* it's difficult to cater for slightly different images, e.g. different themes on different versions of Windows
* images are more cumbersome to work with than plain text, for instance when it comes to search/replace or version control
* it's not really possible to extract data using images, e.g. reading the value of a text field.

I'm working on a GUI automation tool called Automa. It lets you work with images like Sikuli but advocates the use of its plain text commands. For instance:
click("File", "Save")
write("test.txt", into="File name")

One of the cool things is that it doesn't require references to application-internal IDs like "name1234" or XPaths. This in turn makes it possible to write test/automation scripts before the implementation, in the true spirit of ATDD. Our homepage is Maybe you'll like it!
***shameless advertising end***

Michael Larsen said...

Michael, thanks for the stumble, the information, and the shameless plug :). Automa sounds interesting; I'll definitely take a look.