EWT39 - Optical Recognition. A Weekend Testing Follow-Up

I was pleasantly surprised this past week when I saw that the 39th European Weekend Testing session had been scheduled. Much of the action for EWT has been covered by the group meeting on weeknights, which for me is right in the middle of my workday, so I'm not able to make those. I was happy to see a session that I could actually attend, even if it was somewhat early (a start time of 8:00 AM for those of us on the West Coast of the US, 4:00 PM GMT/UTC).

Today I'd like to welcome Eusebiu Blindu as European Weekend Testing's newest facilitator. I think he did an excellent job and I look forward to future interactions with him at the helm of EWT's sessions.

The mission today was as follows:

Compare the following image to text converters:

1) file size capability
2) speed of conversion
3) quality of conversion (defined by the tester)
4) find other aspects that can be a relevant metric and point out the observations

From here, we split off to test on our own and then presented our findings. I wanted to try the kinds of documents that I regularly need to convert into other formats. These included:

  • tables of data
  • checklists
  • phone lists

I have literally dozens of these in PDF format, many of them flat images that have been scanned or emailed to me. Oh, how nice it would be to see this data liberated and usable in other capacities. These apps are designed to help do that. From this premise, I used one of my favorite approaches to testing, which I call (and so do lots of other people, it's not my idea ;) ) "persona-based testing". Persona-based testing is where we take on the role of a person looking to accomplish a task, and put ourselves in their shoes. By doing this, we try our best to forget the technology (especially if we know it) and try to experience the situation from the perspective of another person. My goal was to see whether I could get the three document types I described (all three flat PDF files) into a viewable and usable form.

All three applications could deal well with straight text, say, from a letter or a short story on a page. The abilities of the applications started to diverge and break down as we added more complexity to the tasks. The Free-OCR application took my phone list and collapsed its four columns into one long column, one listed right after the other (not useful for its intended purpose). Online OCR scanned it in with no line breaks but preserved the line order, so that was a little more useful. The Google Docs conversion maintained the format of the original file, including line breaks. The toughest test was to see how each one handled a table of data with cell borders (visible lines) between the data. The fact is, none of them handled it well. All of the "scanners" broke up the text into random batches, and none of the outputs matched one another.
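For anyone who wants to put a rough number on "quality of conversion" rather than eyeballing it, here is a minimal sketch in Python. It assumes you already have the known-correct text of the document on hand. The function names and the tiny sample phone list are made up for illustration; this is not something we ran during the session, and a similarity ratio from the standard library's `difflib` is only one of many possible measures.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse all runs of whitespace (including line breaks) into single spaces."""
    return " ".join(text.split())


def similarity(reference: str, ocr_output: str, keep_layout: bool = False) -> float:
    """Return a similarity ratio between 0 and 1 for two pieces of text.

    With keep_layout=False, line breaks are ignored, so only the order of
    the words and characters matters. With keep_layout=True, lost or moved
    line breaks also lower the score.
    """
    if not keep_layout:
        reference, ocr_output = normalize(reference), normalize(ocr_output)
    matcher = difflib.SequenceMatcher(None, reference, ocr_output, autojunk=False)
    return matcher.ratio()


# Hypothetical sample data: a two-row phone list and two kinds of damaged output.
REFERENCE = "Alice 555-0100\nBob 555-0101\n"
FLATTENED = "Alice 555-0100 Bob 555-0101"          # line order kept, line breaks lost
COLUMN_SPLIT = "Alice\nBob\n555-0100\n555-0101\n"  # columns listed one after the other

if __name__ == "__main__":
    for label, output in [("flattened", FLATTENED), ("column split", COLUMN_SPLIT)]:
        print(label,
              round(similarity(REFERENCE, output), 3),
              round(similarity(REFERENCE, output, keep_layout=True), 3))
```

Scoring the same output both with and without layout separates two different failures: text that comes back in the right order but loses its line breaks, and text whose columns have been pulled apart so the order itself is wrong.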

We discussed the various methods that we could put into play to test these applications, and here's where the true value of Weekend Testing comes in... no matter how much you think you know about an application, no matter how much experience you have in testing, someone else is going to provide an idea that you hadn't considered. In a short time period, something has to give. None of us tested all aspects on all three tools (whether that was because we didn't have time, or because we figured someone else was going to test a particular item, is open to interpretation). We also had an interesting discussion comparing the claims of each product with the realities, both in what it was able to do and what it wasn't (one interesting find: although one of the apps stops you at 15 uploads an hour, if you clear your cookie cache and history and then go back to the site, you can continue to upload files).

So my thanks to the testers who participated in today's session. Each one of these sessions helps us develop our skills and approaches to testing and, more to the point, helps us think just a little differently. I'd argue that the latter is the more important of the two, but any practice adds value, so I'm again happy for the chance to keep improving with a little help from my friends.
