Thursday, April 2, 2015

Delivering The Goods: A Live Blog from #STPCON, Spring 2015



Two days go by very fast when you are live-blogging each session. It's already Thursday, and at least for me, the conference will end at 5:00 p.m. today, followed by a return to the airport and a flight back home. Much gets packed into these couple of days, and many of the more interesting conversations have been follow-ups outside of the sessions, including a fun discussion over dinner with a number of the participants (sorry, no live blog of that, unless you count the tweet where I lament a comparison of testers to hipsters ;) ). I'll include a couple of after-hours shots just to show that it's not all work and conferring at these things:


---

Today I am going to try an experiment. I have a good idea of the sessions I want to attend, and this way I can give you an idea of what I will be covering. Again, some of these may matter to you, some may not. At least this way, at the end of each of these sessions, you will know if you want to tune in to see what I say (and this way I can give fair warning to everyone that I will do my best to keep my shotgun typing to a minimum). I should also say thank you (I think ;) ) to those who ran with the mini-meme of my comment yesterday with hashtag "TOO LOUD" (LOL!).

---

9:00 am - 10:00 am
KEYNOTE: THINKING FAST AND SLOW – FOR TESTERS’ EVERYDAY LIFE
Joseph Ours



Joseph based his talk on the Daniel Kahneman book "Thinking, Fast and Slow". The premise of the book is that we have two fundamental thinking systems. The first is the "fast" one, where we can do things rapidly and with little need for extended thought. It's instinctive. By contrast, there's another thinking approach that requires us to slow down and work through the steps. That is our non-instinctual thinking; it requires deeper thought and more time. Both of these approaches are necessary, but there's a cost to switching between the two. This helps illustrate how making that jump can lose us time, productivity and focus. I appreciate this acutely, because I do struggle with context-switching in my own reality.

One of the tools I use if I have to deal with an interruption is to ask myself if I'm willing to lose four hours to take care of it. Does that sound extreme? Maybe, but it helps me really appreciate what happens when I am willing to jump out of flow. By scheduling things in four-hour blocks, or even two-hour blocks, I can make sure that I don't lose more time than I intend to. Even good and positive interruptions can kill productivity because of this context switch (jumping out of testing to go sit in a meeting for a story kickoff). Sure, the meeting may have only been fifteen minutes, but it might take forty-five minutes or more to get back to that optimal testing focus again.

Joseph used a few examples to illustrate the times when certain things were likely to happen or be more likely to be effective (I've played with this quite a bit over the years, so I'll chime in with my agreement or disagreement).

• When is the best time to convince someone to change their mind?

This was an exercise where we saw words that represented colors, and we needed to call out the words based on a selected set of rules. When there was just one color to substitute with a different word, it was easier to follow along. When there were more words to substitute, it went much slower and it was harder to make that substitution. In this we found our natural resistance to changing our minds about what we are perceiving. The reason we did better than other groups who have tried this is that we worked through the exercise in the morning, after breakfast, rather than later in the day when we are a little fatigued. Meal breaks tend to allow us to change our opinions or minds because blood sugar gives us energy to consider other options. If we are low on blood sugar, the odds of persuading someone toward a different view are much lower.

• How do you optimize your tasks and schedule?

Is there a best time for creativity? I know a bit about this, as I've written on it before, so spoiler: there are such times, but they vary from person to person. Generally speaking, there are two waves that people ride throughout the day, and the way that we see things is dependent on these waves. I've found for myself that the thorniest problems and the writing I like to do get done early in the morning (read this as early early, like 4 or 5 am) and around 2:00 p.m. I have always taken this to mean that these are my most creative times... and actually, that's not quite accurate. What I am actually doing is using my most focused, critical-thinking time to accomplish creative tasks. That's not the same thing as when I am actually able to "be creative". What I am likely doing is putting into output the processing I've done on the creative ideas I've already considered. When did I consider those ideas? Probably at the times when my critical thinking is at a low. I've often said this is the time I do my busywork because I can't really be creative. Ironically, that "busywork time" is likely when I start to form creative ideas, but I don't have that "oh, wow, this is great, strike now" moment until those critical-thinking peaks. What's cool is that these ideas do make sense. By chunking time around tasks that are optimized for critical-thinking peaks and scheduling busywork for down periods, I'm making some room for creative thought.

• Does silence work for or against you?

Sometimes when we are silent while people speak, we may create a tension that causes people to react in a variety of different ways. I offered to Joseph that silence, as a received communication from a listener back to me, tends to make me talk more. This can be good, or it can cause me to give away more than I intend to. The challenge is that silence doesn't necessarily mean that people disagree, are mad, or are aloof. They may just genuinely be thinking, withholding comment, or perhaps showing they don't have an opinion. The key is that silence is a tool, and sometimes it can work in unique and interesting ways. As a recipient, it lets you reflect. As a speaker, it can draw people out. The trick is to be willing to use it, in both directions.

---

10:15 am - 11:15 am
RISK VS COST: UNDERSTANDING AND APPLYING A RISK BASED TEST MODEL
Jeff Porter

In an ideal world, we have plenty of time, plenty of people, plenty of system resources, and supporting tools to do everything we need to do. The problem is, there's no such thing as that ideal environment, especially today. We have pressure to release more often, sometimes daily. While Agile methodologies encourage us to slice super thin, the fact is, we still have the same pressures and realities. Instead of shipping a major release once or twice a year, we ship a feature or a fix each day. The time needs are still the same, and the fact is, there is not enough time, money, system resources or people to do everything comprehensively, at least not in a way that would be economically feasible.



Since we can't guarantee completeness in any of these categories, there are genuine risks to releasing anything. We operate at a distinct disadvantage if we do not acknowledge and understand this. As software testers, we may or may not be the ones to do a risk assessment, but we absolutely need to be part of the process, and we need to be asking questions about the risks of any given project. Once we have identified what the risks are, we can prioritize them, and from that, we can start considering how to address or mitigate them.

Scope of a project will define risk. User base will affect risk. Time to market is a specific risk. User sentiment may become a risk. Comparable products behaving in a fundamentally different manner than what we believe our product should do is also a risk. We can mess this up royally if we are not careful.

In the real world, complete and comprehensive testing is not possible for any product. That means that you will always leave things untested. It's inevitable. By definition, there's a risk you will miss something important, and leave yourself open to the Joe Strazzere Admonition ("Perhaps they should have tested that more!").

Test plans can be used effectively, not as a laundry list of what we will do, but as a definition and declaration of our risks, with prescriptive ideas as to how we will test to mitigate those risks. With the push to remove wasteful documentation, I think this would be very helpful. Lists of test cases that may or may not be run aren't very helpful, but developing charters based on identified risks? That's useful, not wasteful, documentation. In addition, have conversations with the programmers and fellow testers. Get to understand their challenges and the areas that are causing them consternation. It's a good bet that if they tell you a particular area has been giving them trouble, or has taken more time than they expected, that's a good indication you have a risk area to test.
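To make that risk-to-charter idea a little more concrete, here is a minimal sketch of what the mapping could look like. The risk areas, the likelihood and impact scores, and the simple likelihood-times-impact scoring are all my own hypothetical illustration, not something prescribed in the session:

```python
# A minimal sketch of risk-driven charter planning. The risks, scores, and
# charters here are hypothetical; the idea is just to rank by likelihood x impact
# and let the highest-scoring risks drive which charters get written first.

risks = [
    {"area": "checkout flow",  "likelihood": 4, "impact": 5},
    {"area": "report exports", "likelihood": 2, "impact": 3},
    {"area": "password reset", "likelihood": 3, "impact": 4},
]

for risk in risks:
    risk["score"] = risk["likelihood"] * risk["impact"]

# Highest-risk areas first; each becomes an exploratory charter rather than
# a pile of scripted test cases that may never be run.
for risk in sorted(risks, key=lambda r: r["score"], reverse=True):
    print(f"Charter: explore {risk['area']} (risk score {risk['score']})")
```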

It's tempting to think that we can automate much of this work, but the risk assessment, mitigation, analysis and game plan development is all necessary work that we need to do before we write line one of automation. All of those are critical, sapient tasks, and critical-thinking, sapient testers are valuable in this process; if we leverage the opportunities, we can make ourselves indispensable.

---

11:30 am - 12:30 pm
PERFORMANCE TESTING IN AGILE CONTEXTS
Eric Proegler

The other title for this talk is "Early Performance Testing", and a lot of what Eric is advocating is to look for ways to front-load performance testing rather than wait until the end and then worry about optimization and rework. This makes a lot of sense when we consider that getting performance numbers early in development means we can get real numbers and real interactions. It's a great theory, but of course the challenge is in "making it realistic". Development environments are by their very nature not as complete or robust as a production environment. In most cases, the closest we can come is an artificial simulation and a controlled experiment. It's not a real-life representation, but it can still inform us and give us ideas as to what we can and should be doing.



One of the valuable systems we use in our testing is a duplicate of our production environment. In our case, when I say production, what I really mean is a duplicate of our staging server. Staging *is* production for my engineering team, as it is the environment that we do our work on, and anything and everything that matters to us in our day-to-day efforts resides on staging. It utilizes a lot of the things that our actual production environment uses (database replication, HSA, master-slave dispatching, etc.) but it's not actually production, nor does it have the same level of capacity, customers and, most importantly, customer data.

Having this staging server as a production basis, we can replicate that machine and, with the users, data and parameters as set, we can experiment against it. Will it tell us performance characteristics for our main production server? No, but it will tell us how our performance improves or degrades around our own customer environment. In this case, we can still learn a lot. By developing performance tests against this duplicate staging server, we can get snapshots and indications of problem areas we might face in our day to day exercising of our system. What we learn there can help inform changes our production environment may need.

Production environments have much higher needs, and replicating performance, scrubbing data, setting up a matching environment and using it to run regular tests might be cost-prohibitive, so the ability to work in the small and get a representative look can act as an acceptable stand-in. If our production system is meant to run on 8 parallel servers and handle 1000 concurrent users, we may not be able to replicate that, but creating an environment with one server and determining whether we can run 125 concurrent connections and observe the associated transactions can provide a representative value. We may not learn what the top end can be, but we can certainly determine if problems occur below the single-server peak. If we discover issues here, it's a good bet production will likewise suffer at its relative percentage of interactions.
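As a rough illustration of that scaled-down approach, here is a sketch that fires 125 concurrent requests (1000 users divided across 8 servers) at a single staging clone and reports the spread of response times. The endpoint URL is a placeholder, and a real run would use representative transactions rather than a single GET:

```python
# Sketch of the scaled-down load idea: hit one staging clone with ~125
# concurrent requests and look at the response-time spread. The URL is
# hypothetical; real workloads would replay representative transactions.
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

STAGING_URL = "https://staging.example.com/health"  # hypothetical endpoint
CONCURRENCY = 125                                   # 1000 users / 8 servers

def timed_request(_):
    start = time.monotonic()
    with urlopen(STAGING_URL, timeout=30) as resp:
        resp.read()
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    durations = sorted(pool.map(timed_request, range(CONCURRENCY)))

print(f"median: {durations[len(durations) // 2]:.3f}s, worst: {durations[-1]:.3f}s")
```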

How about performance testing in CI? Can it be done? It's possible, but there are also challenges. In my own environment, were we to do performance tests in our CI arrangement, what we would really be testing is the parallel virtualized servers. It's not a terrible metric, but I'd be leery of assigning authoritative numbers, since the actual performance of the virtualized devices cannot be guaranteed. In this case, we can use trending to see whether we get wild swings, or whether we get consistent numbers with occasional jumps and bounces.
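One way to treat those CI numbers as a trend rather than an absolute is sketched below: compare the latest build against a rolling median of recent builds and only flag large deviations. The timings, history length, and 1.5x threshold are all made up for illustration and would need tuning to your own noise level:

```python
# Trend-based check for CI performance numbers: compare the latest build's
# timing against the median of recent builds and only flag big swings, since
# absolute numbers on shared virtualized hardware are unreliable.
from statistics import median

history_ms = [412, 398, 430, 405, 421, 415, 660]  # hypothetical per-build timings
baseline = median(history_ms[:-1])
latest = history_ms[-1]

if latest > baseline * 1.5:   # arbitrary threshold; tune to your environment's noise
    print(f"Possible regression: {latest} ms vs rolling median {baseline} ms")
else:
    print(f"Within normal variation: {latest} ms vs {baseline} ms")
```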

Also, we can do performance tests that don't require hard numbers at all. We can use a stopwatch, watch the screens render, and use our gut intuition as to whether or not the system is "zippy" or "sluggish". These are not quantitative values, but they have value, and we should leverage our own senses to encourage further exploration.

The key takeaway is that there is a lot we can do and a lot of options we have to make changes and calibrate our interactions and the areas we are interested in. We may not be able to be as extensive as we might be with a fully finished and prepped performance clone, but there's plenty we can do to inform our programmers as to how the system behaves under pressure.

---

1:15 pm - 1:45 pm
KEYNOTE: THE MEASURES OF QUALITY
Brad Johnson

One of the biggest challenges we all face in the world of testing is that quality is wholly subjective. There are things that some people care about passionately that are far less relevant to others. The qualitative aspects are not numerable, regardless of how hard we try to make them so. Having said that, there are some areas where counts, values, and numbers are relevant. To borrow from my talk yesterday, I can determine if an element exists or if it doesn't. I can determine the load time of a page. "Fast" or "slow" are entirely subjective, but if I can determine that it takes 54 milliseconds to load an element on a page, as an average over 50 loads, that does give me a number. The next question, of course, is "is that good enough?" It may be if it's a small page with only a few elements. If there are several elements on the page that each take that same amount of time to load serially, the cumulative time may prove to be "fast" or "slow".
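Here's a small sketch of turning "fast" or "slow" into that kind of number: time the same page fetch 50 times and report the average in milliseconds. The URL is a placeholder, and timing an individual element render (rather than a raw page fetch) would need a browser driver instead, so treat this as the simplest possible version of the idea:

```python
# Minimal sketch of putting a number on page "speed": fetch the same page 50
# times and report the average time in milliseconds. The URL is hypothetical.
import time
from urllib.request import urlopen

URL = "https://example.com/dashboard"  # hypothetical page
SAMPLES = 50

timings_ms = []
for _ in range(SAMPLES):
    start = time.monotonic()
    with urlopen(URL, timeout=30) as resp:
        resp.read()
    timings_ms.append((time.monotonic() - start) * 1000)

print(f"average over {SAMPLES} loads: {sum(timings_ms) / SAMPLES:.1f} ms")
```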



Metrics are a big deal when it comes to financials. We care about numbers when we want to know how much stuff costs, how much we are earning, and, to borrow an oft-used phrase, "at the end of the day, does the Excel line up?" If it doesn't, regardless of how good our product is, it won't be around long. Much as we want to believe that metrics aren't relevant, sadly they are, in the correct context.

Testing is a cost. Make no mistake about it. We don't make money for the company. We can hedge against losing money, but as testers, unless we are selling testing services, testing is a cost center, not a revenue center. To the financial people, any change in our activities and approaches is often looked at in terms of the costs those changes will incur. Their metric is "how much will this cost us?" Our answer needs to articulate "this cost will be offset by securing and preserving current and future income". Glamorous? Not really, but it's essential.

What metrics do we as testers actually care about, or should we care about? In my world view, I use the number of bugs found vs. the number of bugs fixed. That ratio tells me a lot. This is, yet again, a drum I hammer regularly, and it should surprise no one when I say I personally value the tester whose ratio of bugs reported to bugs fixed is closest to 1:1. Why? It means to me that testers are not just reporting issues, but that they are advocating for them to be fixed. Another metric often asked about is the number of test cases run. To me, it's a dumb metric, but there's an expectation outside of testing that it is informative. We may know better, but how do we change the perspective? In my view, the better discussion is not "how many test cases did you run?" but "what tests did you develop and execute relative to our highest risk factors?" Again, in my world view, I'd love to see the ratio of business risks to test charters completed and reported be as close to 1:1 as possible.
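Both ratios are trivial to compute once you have the counts; here's a toy sketch with entirely made-up numbers, just to show the two figures I'd actually want on a dashboard:

```python
# Toy sketch of the two ratios discussed above, with made-up numbers: how close
# a tester's reported-to-fixed ratio is to 1:1, and how many identified business
# risks have a completed charter run against them.
bugs_reported, bugs_fixed = 40, 34
risks_identified, charters_completed = 12, 9

print(f"advocacy ratio (fixed/reported): {bugs_fixed / bugs_reported:.2f}")
print(f"risk coverage (charters/risks):  {charters_completed / risks_identified:.2f}")
```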

In the world of metrics, everything tends to get boiled down to Daft Punk's "Harder Better Faster Stronger". I use that lyrical quote not just to stick an ear-worm in your head (though if I have, you're welcome or I'm sorry, take your pick), but because it's really what metrics mean to convey. Are we faster at our delivery? Are we covering more areas? Do we finish our testing faster? Does our deployment speed factor out to greater revenue? Once we answer yes or no, the next step is "how much or how little, how frequent or infrequent? What's the number?"

Ultimately, when you get to the C-level execs, qualitative factors are tied to quantitative numbers, and most of the time, the numbers have to point to positive and/or increasing revenue. That's what keeps companies alive. Not enough money, no future; it's that simple.

Brad suggests that, if we need to quantify our efforts, these are the ten areas that will be the most impactful.


It's a pretty good list. I'd add my advocacy and risk ratios, too, but the key to all of this is these numbers don't matter if we don't know them, and they don't matter if we don't share them.

---

2:00 pm - 3:00 pm
TESTING IS YOUR BRAND. SELL IT!
Kate Falanga


One of the phrases oft heard among software testers and about software testing is that we are misunderstood. Kate Falanga is in some ways a Don Draper of the testing world. She works with Huge, which is like Mad Men, just with more computers, though perhaps equal amounts of alcohol ;). Seriously, though, Kate approaches software testing as though it were a brand, because it is, and she's alarmed at the way the brand is perceived. The fact is, every one of us is a brand unto ourselves, and what we do or do not do affects how that brand is perceived.



Testers are often not very savvy about marketing themselves. I have come to understand this a great deal lately. The truth is, many people interpret my high levels of enthusiasm, my booming voice, and my aggressive passion and drive to be good marketing and salesmanship. It's not. It can be contagious, it can be effective, but that doesn't translate to good marketing. Once that shine wears off, if I can't effectively carry objectives and expectations to completion, or encourage continued confidence, then my attributes matter very little, and can actually become liabilities.

I used to be a firebrand about software testing and discussing all of the aspects about software testing that were important... to me. Is this bad? Not in and of itself, but it is a problem if I cannot likewise connect this to aspects that matter to the broader organization. Sometimes my passion and enthusiasm can set an unanticipated expectation in the minds of my customers, and when I cannot live up to that level of expectation, there's a let down, and then it's a greater challenge to instill confidence going forward. Enthusiasm is good, but the expectation has to be managed, and it needs to align with the reality that I can deliver.

Another thing that testers often do is emphasize that they find problems and that they break things. I do agree with the finding problems part, but I don't talk about breaking things very much. Testers, generally speaking, don't break things; we find where they are broken. Regardless of how that's termed, it is perceived as a negative. It's an important negative, but it's still seen as something that is not pleasant news. Let's face it, nobody wants to hear their product is broken. Instead, I prefer, and it sounds like Kate does too, emphasizing more positive portrayals of what we do. Rather than say "I find problems", I emphasize that "I provide information about the state of the project, so that decision makers can make informed choices to move forward". Same objective, but totally different flavor and perception. And yes, I can vouch for the fact that the latter approach works :).

The key takeaway is that each of us, and by extension our entire team, sells to others an experience, a lifestyle and a brand. How we are perceived is both individual and collective, and sometimes one member of the team can impact the entire brand, for good or for ill. Start with yourself, then expand. Be the agent of change you really want to be. Ball's in your court!

---

3:15 pm - 4:15 pm
BUILDING LOAD PROFILES FROM OBJECTIVE DATA
James Pulley

Wow, last talk of the day! It's fun to be in a session with James, because I've been listening to him via PerfBytes for the past two years, so much of this feels familiar, but more immediate. While I am not a performance tester directly, I have started making strides to get into this world, because I believe it to be valuable in my quest to be a "specializing generalist" or a "generalizing specialist" (service mark Alan Page and Brent Jensen ;) ).



Eric Proegler, in his talk earlier, discussed the ability to push performance testing earlier in the development and testing process. To continue with that idea, I was curious to get some ideas as to how to build a profile to actually run performance and load testing. Can we send boatloads of requests to our servers and simulate load? Sure, but will that actually be representative of anything meaningful? In this case, no. What we really want to create is a representative profile of traffic and interactions that actually approaches the real use of our site. To do that, we need to think about what will actually represent our users' interactions with our site.

That means that workflows should be captured, but how can we do that? One way is to analyze previous transactions in our logs and recreate steps and procedures. Another is to look at access or error logs to see what people want to find but can't, or to see if there are requests that don't make any sense (i.e. potential attacks on the system). The database admin, the web admin and the CDN administrator are all good people to cultivate relationships with, so we can discuss these needs and encourage them to become allies.

Ultimately, the goal of all of this is to steer clear of the "ugly baby syndrome" and the urge to cast or dodge blame, and to do that, we really need to make it possible to be as objective as possible. With a realistic load of transactions that are representative, there's less of a chance for people to say "that test is not relevant" or "that's not a real-world representation of our product".

Logs are valuable to help gauge what actually matters and what is junk, but those logs have to be filtered. There are many tools available to help make that happen, some commercial, some open source, but the goal is the same: look for payload that is relevant and real. James encourages looking at individual requests to see who generated the request, who referred the request, what request was made, and what user agent made the request (web vs. mobile, etc.). What is interesting is that we can see patterns that will show us what paths users take to get to our system and what they traverse in our site to get to that information. Looking at these traversals, we can visualize pages and page relationships, and perhaps identify where the "heat" is in our system.
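To give a flavor of that kind of log sifting, here is a small sketch that assumes an Apache/Nginx "combined" access log format and tallies requested paths, referrers, and a crude web-vs-mobile split by user agent. The log file name, the regex, and the mobile heuristic are simplifications of my own; a real profile would need a proper parser and filtering of bot and static-asset noise:

```python
# Rough sketch of mining an access log (combined format assumed) for the data
# James mentions: which paths get requested, which referrers send people there,
# and which user agents (web vs. mobile) make the requests.
import re
from collections import Counter

LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) \S+" \d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

paths, referrers, agents = Counter(), Counter(), Counter()
with open("access.log") as log:  # hypothetical log file
    for line in log:
        match = LINE.search(line)
        if not match:
            continue
        paths[match.group("path")] += 1
        referrers[match.group("referrer")] += 1
        agents["mobile" if "Mobile" in match.group("agent") else "desktop"] += 1

print("top paths:", paths.most_common(5))
print("top referrers:", referrers.most_common(5))
print("user agents:", dict(agents))
```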

---

Wow, that was an intense and very fast two full days. My thanks to everyone at STP for putting on what has been an informative and fun conference. My gratitude to all of the speakers who let me invade their sessions, type way too loudly at times (I hope I've been better today) and inject my opinions here and there.

As we discussed in Kate's session, change comes in threes. The first step is with us, and if you are here and reading this, you are looking to be the change for yourself, as I am looking to be the change in myself.

The next step is to take back what we have learned to our teams, openly if possible, by stealth if necessary, but change your organization from the ground up.

Finally, if this conference has been helpful and you have done things that have proven to be effective, had success in some area, or you are in an area that you feel is under-represented, take the third step and engage at the community level. Conferences need fresh voices, and the fact is, experience reports of real-world application and observation have immense value, and they are straightforward talks to deliver. Consider putting your hat in the ring to speak at a future session of STP-CON, or another conference near you.

Testing needs all of us, and it will only be as good as the contributors that help build it. I look forward to the next time we can get together in this way, and see what we can build together :).

