Monday, November 19, 2012
Sikuli: Some First Impressions
Today marks a transition in many things.
From Sidereel to SocialText.
From Scrum to Kanban.
From San Francisco to Palo Alto.
From Entertainment to Collaboration.
From External Facing to Intra-Facing.
From Ruby on Rails to a hybrid of technologies including an old friend, Perl!
It's that last change that has given me both cause and curiosity to look into an interesting testing framework. That framework is called Sikuli, and it comes with a "visual language" called Sikuli Script.
Sikuli started as a project at MIT. It's now an open source tool that can be used with a number of different applications: the web, Flash apps, and compiled applications on a number of different platforms. Yeah, OK, that's all fine and good, but aren't there plenty of tools out there that already fit that space? Why use another one?
I thought much the same thing, until I thought about a few of the applications I've wanted to poke around with in the past... some of them just aren't designed with testability in mind or, to be more charitable, some programs just defy the expectations of many of the tools currently available. Some apps don't offer much in the way of an open API to help with the process. Selenium works great when you can access the object layer. What do you do when you can't access those attributes so easily? What if the front end is all you have, and all you are going to get? It's here that Sikuli starts to get interesting.
Sikuli can be run in a number of different ways, and the most likely way a first timer will run it is through its IDE. The IDE puts front and center a number of function calls and simple tools that the user can take advantage of to make scripts. These scripts are similar in a lot of ways to Selenium/WebDriver and Capybara in what they do, with an interesting difference. The function calls can take images as their arguments. In the IDE, you don't type paths to images; you work with actual images captured from the screen. Here's a simple example using a Win32 application I use with my scout troop called Troopmaster. The following is a very basic screenshot and just a couple of commands.
This admittedly trivial example performs a very simple set of instructions:
Load an app (Troopmaster).
Wait for a value on the screen to appear.
Open the Merit Badge Counselor tool.
Click on the "Add New" button (so we can create a new Merit Badge Counselor).
That's it! All very straightforward, all very basic stuff. The cool thing is that, in many ways, by doing little more than these kinds of steps, you can accomplish a lot of tasks. Sikuli uses image recognition to determine where on the screen you want to do certain things. Based on those images, often even a non-programmer or non-tester can put together tests or automate tasks to accomplish certain goals. (It should be noted that Sikuli is not positioned solely as a testing framework. It's also used as a sort of "macro language" to help with automating basic repetitive tasks.)
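For those who'd rather see text than a screenshot, here's a rough sketch of how those four steps might look as a script. Treat it as an illustration, not the exact script I ran: the application path and all the image file names are made-up placeholders, and the RecordingScreen class is a hypothetical stand-in I've added so the flow can be dry-run in plain Python without Sikuli or a GUI.

```python
# Sketch of the four steps as a Sikuli-style script.
# The application path and every image file name are hypothetical placeholders.

def add_new_counselor(screen, open_app):
    """Run the four steps against any object that offers wait() and click()."""
    open_app("TroopMaster.exe")                 # 1. load the app (path is assumed)
    screen.wait("main_window.png", 30)          # 2. wait up to 30 s for the screen to appear
    screen.click("merit_badge_counselors.png")  # 3. open the Merit Badge Counselor tool
    screen.click("add_new.png")                 # 4. click the "Add New" button


class RecordingScreen(object):
    """Hypothetical stand-in for a real screen: it only records what was asked."""

    def __init__(self):
        self.actions = []

    def open_app(self, path):
        self.actions.append(("open", path))

    def wait(self, image, timeout):
        self.actions.append(("wait", image, timeout))

    def click(self, image):
        self.actions.append(("click", image))


def dry_run():
    """Run the steps against the recorder and return the recorded actions."""
    screen = RecordingScreen()
    add_new_counselor(screen, screen.open_app)
    return screen.actions


if __name__ == "__main__":
    for action in dry_run():
        print(action)
```

Inside Sikuli itself, I'd expect the equivalent call to be something along the lines of add_new_counselor(Screen(), App.open), with the placeholder file names replaced by images captured in the IDE.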
Now, of course, there's a lot more Sikuli can do, and there are some frustrations and challenges that need more than just "point here, click this, fill this in, click OK". While very basic and trivial tasks can be done without extensive programming knowledge, to get beyond the training-wheels basics, it helps to know what the architecture and language structure is. Sikuli scripts are written in Jython, which is a Python implementation that runs on the JVM. It allows the user to import and access many Java library functions, and it compiles Python source code down to Java bytecode. The user also has the Python language itself to work with, at least the version of it that Jython supports. If I ever wanted to have a good excuse to spend time with Python and get familiar with more than the basics, here's a great opportunity to do exactly that.
Again, this post is not meant to be an all-encompassing tutorial. I installed Sikuli on my PC last week to get familiar with it, and wanted to get some first impressions out there. So where do I go from here? I want to get more familiar with doing things that are nontrivial, and the best way I know how to do that is to, well, publicly declare that I'm going to do it. Does this sound like a new set of entries for the Practicum page? Hey! That's what it sounds like to me, too! Thus, it's time for another "bold boast"... let's see what we can do with Sikuli, and what we can't. Let's see in real time if it's a workable tool or if it's another "interesting, but..." kind of a framework. Also, let's give me a good excuse to poke around with and dive deeper into Python, not to mention Jython and how it adds to this mix (to be frank, I'd never even heard of Jython before I downloaded this app, so I know almost nothing about it or its inner workings).
If you'd like to play along at home with me, you can get Sikuli and learn more about it at http://www.sikuli.org/