Wednesday, November 26, 2014

Green Thoughts: Day Two at #esconfs

It really puts into perspective how much of a time difference eight hours is. I was mostly good yesterday, but this morning came way too fast, and I am definitely feeling the time change. A bit of a foggy morning today, but a brisk walk from the Maldron Pearse and a snack on a peanut bar, and I'm ready to go :).

Convention Center Dublin, aka Where the Action is :)
The first keynote for today is being given by Isabel Evans, and it's titled "Restore to Factory Settings", or to put it simply, what happens when a change program goes wrong. We always want to think that making changes will be positive. The truth is, some strange things happen, and it's entirely possible that we might find ourselves in situations we never considered. Isabel works with Dolphin Systems, which works on accessibility (hey, something in my wheelhouse, awesome :) ).

The initial problems were related to quality issues, and the company's idea was that improved testing would solve them. Isabel figured "30 years of testing, sure I can do this, but there are not just testing issues here". Starting with issue #1, improve testing: first, there were no dedicated testers. Testing was done by whoever was available to do it. An obvious first step was to work on defect recognition, and to develop the skills to help them discover defects and actually talk about them.

Isabel suggested that she sit with the developers and work with them, and even that request was at first a difficult sell. She had to fight for that position, but ultimately they decided the arrangement made sense. By recruiting some of the support team and getting others involved, they were able to put together a test team.

With a variety of initiatives, they were able to improve defect recognition, and requirements acquisition also improved. Sounds great, right? Well, the reality is that the discovery of issues was actually adding to the time to release. They improved the testing, which identified more bugs, which added a greater workload, which added pressure to the release schedule, which meant more mistakes were made and more bugs were introduced. Now, for those of us who are familiar with testing, this is very logical, but the point Isabel is making is that a greater emphasis on testing alone will not automatically mean better quality. In fact, it has a significant chance of making the issues worse at first, because they are now being shown the light of day.

The talk title "Restore to Factory Settings" refers to the fact that, when things get tough, the natural reaction is to go back to doing what everyone always did. There are enthusiastic adopters, people against the whole idea, and then there are waverers in the middle. The waverers are the ones who hold the power. They will revert back to their standard operating procedure (SOP) when things get tough. Even the enthusiastic adopters, if they are not encouraged, will revert back to the SOP. The people against will go back to the old ways the second they get a chance. Management, meanwhile, is getting agitated that this big movement to change everything is not gaining traction. Sounds absurd, but it happens all the time, and I'm sure we've all experienced this in one form or another.

The key takeaway that Isabel was describing is that changes to testing are often icing on a much thicker and denser cake. Changing testing will not change the underlying culture of a company. It will not change the underlying architecture of a product. Management not being willing to change their approach also adds to the issues. If the rest of the cake is not being dealt with, testing alone will not improve quality. In fact, it's likely in the short term to make quality issues even worse, because now there is clarity of the issues, but no motivation to change the behaviors that brought everyone there.

This all reminds me of the four stages of team development (forming, storming, norming and performing), and the fact that the testing changes fit clearly into the storming stage. If the organization doesn't embrace the changes, then the situation never really gets out of the storming stage, and morale takes perpetual hits. Plans describe what likely won't happen in the future, but we still plan so we have a basis to manage change. Risk management is all about stuff that hasn't happened, but we still need to consider it so we are prepared if it actually does happen. In short, "Say the Same, Do the Same, Get the Same".

Change is hard, and every change in the organization tends to cause disruption. Change programs bring all of the ugly bits to the surface, and the realizations tumble like dominoes. To quote Roland Orzabal's "Goodnight Song", nothing ever changes unless there's some pain. As the pain is recognized, the motivation for change becomes clearer. Prioritization takes center stage. Change has a real fighting chance of succeeding.

Ultimately, there is a right time for implementing changes, and not any one thing is going to solve all the problems. Continuous improvement is a great pair of buzzwords, but the process itself is really tricky to implement and, more important, to sustain.

---

Next up, "Every Software Tester has a PRICE" with Michael Bolton. I've come to this session because I am often curious as to how we present the information we find, and the ways that we can gather that information. Every test should have an expected, predicted result. Makes sense for checks, but it doesn't really hold up for testing. Testing is more involved, and may lead you to completely different conclusions. Often, the phrase we hear is "I don't have enough information to test". Is that true? It may well be, but the more fundamental question is "where do we get the information we need in the first place?"

Our developers, our company web site, our user guide, our story backlog, our completed stories, our project charter, our specifications, our customers, other products in our particular space, etc. Additionally, the elusive requirements that we hope will inform us are often not written down anywhere. Tacit knowledge that resides collectively in the heads of our organization is what ultimately makes up the requirements that matter. The tricky part is gathering all of those miscellaneous pieces together so they can be made actionable.

Think about it this way. For those of us who have kids, do we know the exact addresses of our kids' schools or where they go for their extracurricular activities? I'm willing to bet most of us don't, but we know how to get there. It only becomes an issue when we have to explain it to someone else. As testers, we need to be the person who figures out how to get those addresses for all of those collective kids and where they need to go.

The fact is, there's lots of information that is communicated to us by body language and by inference. Requirements are ultimately "collective tacit knowledge". What we require of a product cannot be completely coded or ever truly known. That doesn't mean that we cannot come close, or get to a reasonable level that will help generate a good enough working model. One of the interesting aspects of the iPhone, for example, is "charisma"... what makes an iPhone an iPhone, and what makes it compelling? Is it its technical capabilities, or does it just "feel good"? How do we capture that charisma as a product element, as a feature to be tested?

One of the best sources of information, and one not talked about very often, is experimentation. In other words, we develop the requirements by actively testing and experimenting with the product, or with the people responsible for the product. Interviewing individuals associated with the product will help inform what we want to be building (think customer support, customers, manufacturing, sales, marketing, subject matter experts, senior management, etc.), and experimenting with their input will give us even more ideas to focus on. We also develop oracles to help us see potential issues (in the sense that an oracle is some mechanism that helps us determine if there is an issue before us). The product itself can inform us of what it could do. We can also do thought experiments about what a product might do.
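For what it's worth, here's the kind of thing I picture when "oracle" comes up: a tiny, hand-rolled consistency check. This is entirely my own illustration, not anything from Michael's talk, and the rounding functions are invented for the example. The idea is simply to compare the product's answer against an independent way of getting the same answer, and treat any disagreement as something worth a closer look.

```python
# A tiny consistency oracle (my own illustration): compare a product's
# price-rounding behaviour against an independent reference implementation.
from decimal import Decimal, ROUND_HALF_UP

def product_round_price(price: float) -> float:
    """Stand-in for the behaviour under test (hypothetical)."""
    return round(price, 2)

def reference_round_price(price: float) -> float:
    """Independent oracle using decimal arithmetic."""
    return float(Decimal(str(price)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))

for price in [2.675, 0.125, 10.005]:
    ours, expected = product_round_price(price), reference_round_price(price)
    flag = "" if ours == expected else "  <-- worth a closer look"
    print(f"{price}: product={ours}, oracle={expected}{flag}")
```

The oracle doesn't tell us which side is "right"; it just tells us there's a question worth asking.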


What this shows us is that there are many sources of information for test ideas and test artifacts in ways that most of us never consider. We place artificial limits on our capabilities. So many of our ideas are limited by our own imaginations and our own tenacity. If we really want to get deep on a topic, we are able to do that, and do it effectively. Often, though, we suffer not from a lack of imagination, but from a lack of will to use it. So much of what we want to do is dictated by a perceived lack of time, so we limit ourselves to the areas that will be the quickest and most accessible. This is not a bad thing, but it points out the limitations in our efforts. We trade effectiveness for efficiency, and in the process, we cut off so many usable avenues that would help us define and determine how to guide our efforts.

---
Next up, "How Diversity Challenged me to be Innovative as a Test Team Leader" with Nathalie Rooseboom de Vries - van Delft.

What does diversity really mean? What does it mean to embrace and utilize diversity? What happens when you go from being a team of one as a consultant to wanting to be a team leader and manage people? How can we get fifteen unique and different people to work together and become a single team? What's more, what happens when you have to work with a team and a culture that is ossified in older practices? This is the world Nathalie jumped into. I frankly don't envy her.

One of the biggest benefits of being a consultant is that, after a period of doing a particular job or focus, you can leave; the focus is temporary, and you don't have to live with the aftermath of the decisions that follow on. When we make a commitment to become part of a team long term, we inherit all of the dysfunction, oddities, and unique factors that the team is built from. The dynamics of each organization are unique, but they tend to be variations on a theme. The ultimate goal of an organization is to release a product that makes money. Testers are there to help make sure the product that goes out is of the highest quality possible, but make no mistake, testers do not make a company money (well, they do if you are selling testing services, but generally speaking, they don't). Getting a team on the same page is a challenge, and when you aim to get everyone working together, part of that balance is understanding how to emphasize the strengths of your teammates.

Nathalie used an example of what she called the "Parent/Adult/Child" model of transactions. The Parent role has both over-positive and over-negative aspects: it can nurture, but it can also be controlling; it can be consoling and yet blaming. The Child role is both docile and rebellious, unresponsive and insecure. In some early interactions, there may well be Parent/Child exchanges, but the goal is to move over time to a more Adult/Adult interaction. To get that equality of behavior, sometimes you have to use the Parent relationship to get the behavior from the "Child", or if you want to get the Parent to respond differently, the Child needs to use a different technique to get that behavior to manifest.

The ability to challenge members of your team requires different methods. The diversity of the team makes it impossible to use the same technique for every member; they each have unique approaches and unique interests and motivations. One of Nathalie's approaches is a jar with lollipops and a question-and-answer methodology. If you post a question, you get a lollipop. If you answer a question, you get a lollipop, too. The net result is that people realize that they can answer each other's questions. They can learn from each other, and they can improve the overall team's focus by adding to the knowledge of the entire team and getting a little recognition for doing that. She also uses a simple game called "grababall", which covers a number of tasks and things that need to be done. The idea is that when you grab a ball, you have a goal inside the ball to accomplish. If you accomplish the goal, you get a point. At the end of the year, the highest point accrual gets a prize. By working on these small goals and getting points, the team gets engaged, and it becomes a bit more fun.

Diversity is more than just the makeup of the team, of having different genders, life experiences or ethnic backgrounds. Diversity goes farther. Understanding the ways that your team members are motivated, and the different ways that they can be engaged can give huge benefits to the organization. Take the time to discover how they respond, and what aspects motivate them, then play to those aspects.

---

Next up, "Introducing Operational Intelligence into Testing" with Albert Witteveen. Albert has had a dual career, spending time in both testing and operations (specifically in the telco space). Testers are all familiar with the issues that happen after a product goes live. The delay, the discovery, the finger pointing... yet Operations discovers the problem in a short period of time.

What is the secret? Why do the operations people find things testers don't? It's not as simple as the testers missed stuff (though that is part of the answer); it's also that the operational folks actually use the product and manage and monitor the business processes. Operations people have different tools and different focuses.
Testers can be a bit myopic at times. If our tests pass, we move on to other things. Small errors may be within the margin of error for us. In Ops, the errors need to be addressed and considered. Operations doesn't have an expected result; they are driven by the errors and the issues. In the Ops world, "every error counts".

Operations managers have log entries and other issues that are reported. With that, they work backwards to help get the systems to tell them where the issues are occurring. In short, logs are a huge resource, and few testers are tapping them for their full value.
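To make that a little more concrete, here's a rough sketch of what "tapping the logs" could look like in practice. This is just my own minimal example, not anything Albert showed; the log file name, the error pattern, and the assumed log format are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical log path and pattern -- adjust to whatever your system actually writes.
LOG_FILE = "app.log"
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|Exception)\b")

def summarize_errors(path):
    """Count error-ish lines per source module, Ops-style: every error counts."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if ERROR_PATTERN.search(line):
                # Assume a format like "2014-11-26 10:15:02 ERROR billing: ..."
                parts = line.split()
                module = parts[3].rstrip(":") if len(parts) > 3 else "unknown"
                counts[module] += 1
    return counts

if __name__ == "__main__":
    for module, count in summarize_errors(LOG_FILE).most_common():
        print(f"{module}: {count} error lines")
```

Even something this crude surfaces which parts of the system are complaining, and working backwards from there is exactly what the Ops folks do every day.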

So what does this mean? Does it mean we need Operations people on the testing team? Actually, that's not a bad idea. If possible, have a liaison working with the testers. If that's not a reasonable option, then have the operations people teach/train the testers how to use logs and look for issues.

Sharing the tools that operations uses for monitoring and examining the systems would go a long way toward seeing what is happening to the servers under a real load, with real analytics of what is happening in the systems over time. If there is any one silver bullet I can see from doing Ops-level monitoring and testing, it's that we can objectively see the issues, and we can see them as they actually happen, not just when we want to see them happen.

--

I'm in Adam Knight's talk "Big Data, Small Sprint". What is big data, other than a buzzword for storing a lot of details and records? Who, three years ago, even really knew what "big data" was? When you talk about big data, you are talking about large bulk-load data. Adam's product specifically deals with database archiving.

That model dealt with tens of millions of records, dozens of partitions, and low-frequency ingestion of data (perhaps once a week). Their new challenge was to handle millions of records per hour, with tens of thousands of partitions. By working within Agile and targeting the specific use cases of this customer, they were able to deliver the basic building blocks to this customer within one sprint. Now imagine storing tens of billions of records each day (I'm trying to, really, and it's a bit of a struggle). Adam showed a picture of an elephant, then a Titanosaurus, and then the Death Star. This is not meant to represent the size of the increase in records, but the headaches that testers are now dealing with.

In a big data system, can we consider the individual record? Yes, but we cannot effectively test every individual record uniquely. Can the data be manipulated? Yes, but it needs to be done in a different way. We also can't manage an entire dataset on a single machine. We can back up a system, but the backup will be too big for testing purposes. Is big data too big to wrap one's head around? It requires a different order of magnitude to discuss (think of moving from kilometers or miles to astronomical units or light years to describe distances in space).

OK, so this stuff is big. We get that now. But how can we test something this big? We start by changing our perspective: we shift from focusing on every record to focusing on the structures and how they are populated with representative data (from records to partitions, from data to metadata, from single databases to clusters). Queries would not be made to pull a row from every conceivable table. Instead, we'd be looking at pulling representative data across multiple partitions. Testers working on big data projects need to develop special skills beyond classic testing. There is a multi-skill requirement, but the odds of finding testers who each have all of the skills needed in one person are slim. Adam discusses the idea of developing the people on the test team to be "T-shaped". A T-shaped tester has a broad range of basic-to-good test skills, as well as a few core competencies that they know very deeply. By combining complementary T-shaped testers, you can make a fully functional, square-shaped team.
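To give a flavour of that "records to partitions, data to metadata" shift, here's a little sketch of my own (not Adam's code; the partition counts, record counts and metadata fields are all invented). Instead of verifying every record, we ingest known data and then spot-check a random sample of partitions against the metadata we expect each partition to report.

```python
import random

# Invented sizes for illustration -- a real big-data system would dwarf these.
NUM_PARTITIONS = 1000
RECORDS_PER_PARTITION = 10_000

def ingest(num_partitions, records_per_partition):
    """Pretend ingest: return per-partition metadata (row count, min/max key)."""
    metadata = {}
    for p in range(num_partitions):
        first_key = p * records_per_partition
        metadata[p] = {
            "rows": records_per_partition,
            "min_key": first_key,
            "max_key": first_key + records_per_partition - 1,
        }
    return metadata

def check_representative_sample(metadata, sample_size=25):
    """Check a random sample of partitions rather than every record."""
    failures = []
    for p in random.sample(sorted(metadata), sample_size):
        expected_min = p * RECORDS_PER_PARTITION
        expected_max = expected_min + RECORDS_PER_PARTITION - 1
        meta = metadata[p]
        if (meta["rows"], meta["min_key"], meta["max_key"]) != (
            RECORDS_PER_PARTITION, expected_min, expected_max
        ):
            failures.append(p)
    return failures

if __name__ == "__main__":
    meta = ingest(NUM_PARTITIONS, RECORDS_PER_PARTITION)
    print("partitions with bad metadata:", check_representative_sample(meta))
```

The point isn't the code, it's the mindset: the checks run against partition-level metadata, so the approach keeps working when the record counts grow by orders of magnitude.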

Adam mentioned using Ganglia as a way to monitor a cluster of machines (there's that word again ;) ) so that the data, the logs and other details can be examined in semi-real time. To be frank, my systems don't come anywhere close to these levels of magnitude, but we still have a fairly extensive amount of data, and these approaches are interesting, to say the least :).

---

I promised my friend Jokin Aspiazu that I would give him a testing challenge while we were here at EuroSTAR. Jokin humored me for the better part of an hour and a half, showing me how to tackle an issue in the EuroSTAR test lab. I asked him to evaluate a time tracking application and either sell me on its implementation or convince me it was not shippable, and to find a scenario significant enough to kill the project.

He succeeded :).

Of course, this app is designed to be a Test Lab app to be puzzled through and with, but I asked him to look beyond the requirements of the Test Lab's challenge and look at the product holistically, and to give me a yes/no in a limited amount of time (all the while articulating his reasoning as he went, which I'm sure must have been mildly annoying ;) ).

With that, I determined Jokin earned advancement as a Brown Belt in the Miagi-do School of Software Testing. For those of you here, high five him and buy him a drink when you see him; he's earned it!

---
A last-minute substitution caused me to jump into another talk, in this case "The Silent Assassins" by Geoff Thompson. These are the mistakes that can kill even the best-planned projects.

Silent Assassin #1: Focus on Speed, Not on Quality. Think of a production floor where more space is taken up fixing problems coming off the line than is allocated to actually building new product well.

Silent Assassin #2: The Blindness to the True Cost of Quality. Supporting and maintaining the software costs a lot more than putting it together initially. Consider the amount of money it will take to maintain a system.

Silent Assassin #3: Go by Feel, Not by Facts. Metrics can be abused, and we can measure all sorts of worthless stuff, but data and actual deliverables are real things, and therefore, we need to make sure we have the facts on our side to say if we are going to be able to deliver a product on time. In short, we don't know what we don't know, so get our relevant facts in order.

Silent Assassin #4: Kicking Off a Project Before the Business is Ready. Do our customers actually understand what they will be getting? It's not enough for us to deliver what "we" think the appropriate solution is; the customers need to have a say, and if we don't give them a say, the adoption may be minimal (or even non-existent). Which leads to...

Silent Assassin #5: Lack of Buy-in From Users. Insufficient preparation, a lack of training, no demonstration of new features and methods, will likewise kill the adoption of a project with users.

Silent Assassin #6: Faulty Design. Software design defects compound as they go. The later a fundamental design issue is discovered, the harder it will be to fix, and in many cases, the problems become exponentially more difficult to fix.

OK, that's all great, but what can we actually do about it? To disarm the assassins, you need to approach the areas these problems fall into. The first area is processes: the way you do the work and the auto-pilot aspects of the work we do. The next is people: getting people on the teams to work with each other, getting buy-in from customers, communicating regularly, and taking the people into account in our efforts. The last area is tools, and it's listed last because we often reach for the tool first, but if we haven't figured out the other two, the tools are not going to be effective (or at least not as effective as they could be). Focus on effectiveness first, then shoot for efficiency.

Shift Left & Compress: put a clear focus to deliver the highest quality solution to customers at the lowest possible cost point. In my vernacular, this comes down to "review the work we do and get to the party early". Focus on the root causes of issues, and actually do something to stop the issues from happening. The Compress point is to do it early, prioritize up front, and spend your efforts in the most meaningful efforts. easy to say, often really difficult to do correctly. Again, the organization as a whole needs to buy in to this for it to be effective. This may also... actually, scratch that, it will need to have investments in time, money, energy, emotion and commitment to get past these assassins. These are difficult issues, and they are costly ones, but tackling it head on may give you a leg up on delivering a product that will be much less costly to maintain later. The money will be spent. The question is how and when ;).

---

Yes, this happened, and yes, it was glorious :)!!!

...and as an added bonus, SmartBear sings a Tester's Lament to the tune of Frozen's "Let It Go" :)
---

Wednesday's closing Keynote is with Julian Harty; the topic is "Software Talks - Are You Listening?"

First question... why bother? Don't we know what we need to do? Of course we do, or so we think. However, I totally relate to the fact that software tells us a lot of things. It has its own language, and without listening, we can do many things that are counter to our best intentions.

The first way that our software talks to us is through our user base and through their interactions. If we remove features they like, especially in the app world, the software will tell us through lower ratings (perhaps much lower ratings). Analytics can help us, but yet again, there's much more we can learn (and actually do something about) earlier, rather than later once it's been released.

Logs are our friend, and in many ways, the logs are the most loquacious artifact of all. So much information is available, and most testers don't avail themselves of it, that is if they look at the logs at all. Analytics can be filtered from devices, churned into a number of different breakdowns, and then we try to understand what is happening in real time. The information we gather can be helpful, but we need to develop insights from it. We want to gather design events, implementation events, field test data, evaluations, things that will tell us who is using what, when and where. A/B testing fits very well in this space. We can see how one group of users reacts compared to another group. We can gauge precision and accuracy, so long as we don't automatically conflate the two. It's entirely possible that we are incredibly precise, but missing the target completely.
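As a toy illustration of the A/B point (my own made-up numbers, nothing from Julian's slides), the mechanics really are just counting an event per group and comparing the rates, while keeping the raw counts in view so that a very precise-looking percentage doesn't get mistaken for an accurate picture.

```python
# Toy A/B comparison -- group sizes and event counts are invented for illustration.
groups = {
    "A (old design)": {"users": 5000, "tapped_new_feature": 420},
    "B (new design)": {"users": 5000, "tapped_new_feature": 610},
}

for name, data in groups.items():
    rate = data["tapped_new_feature"] / data["users"]
    print(f"group {name}: {rate:.1%} of {data['users']} users tapped the feature")
```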

There are dark sides to analytics, too. One cardinal rule is "Do No Harm", so your app should not do things that would have negative effects (such as having a flashlight app track your every move while it is in use and upload that location data). We can look at the number of downloads, the number of crashes, and the percentage of users on a particular revision. If we see that a particular OS is in significant use, and that OS has a number of crashes, we can deduce the priority of working on that issue and its effect on a large population of users.
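Here's roughly how I picture that deduction working, again with entirely made-up numbers: weight the crashes by how many users are actually on each OS version, and the prioritization call more or less falls out.

```python
# Made-up analytics rollup: crashes and active users per OS version.
analytics = [
    {"os": "Android 4.4", "active_users": 120_000, "crashes": 3_600},
    {"os": "Android 4.1", "active_users": 15_000, "crashes": 900},
    {"os": "Android 5.0", "active_users": 40_000, "crashes": 200},
]

# Rank by crashes per user, but keep the absolute user count in view: a high
# crash rate on a tiny OS share may still matter less than a moderate rate
# on the version most of your users actually run.
for row in sorted(analytics, key=lambda r: r["crashes"] / r["active_users"], reverse=True):
    rate = row["crashes"] / row["active_users"]
    print(f"{row['os']}: {rate:.2%} crash rate across {row['active_users']:,} users")
```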

The key takeaway is that we can learn a lot about our users: what they do, what they don't do, and what they really wish they could do. We leave a lot on the table when we don't take advantage of what the code tells us... so let's make sure we are listening ;).

Well, that's it for today, not counting socializing, schmoozing and dinner. Hope you had fun following along today, and I'll see you again tomorrow morning.
