Pages

Wednesday, October 15, 2025

Breaking the Cycle: Leading Organizations That Thrive Amid Constant Change with Allan Rennebo Jepsen (a PNSQC Live Blog)

Well, that was fast! We have reached the end of the conference with our final keynote. There has been a lot to absorb and learn over the past few days, and I've enjoyed my time here. With that, here's the closing talk for PNSQC 2025.

Allan Rennebo Jepsen is a leadership expert and author based in Copenhagen, Denmark, who has spent over two decades helping organizations adapt to change rather than fight it. Allan opened with an example of an online banking initiative that went horribly wrong, where everything that could go wrong did, to the point where the CEO said at a corporate meeting that a large majority of the bank's customers would not be around a year from now, meaning their business model was cratering and they risked going out of business. Change is needed, but how in the world are they going to do it? Large banks are optimized for compliance, and because of that, they are literally designed to preserve the status quo.


PNSQChronicles: Leading Organizations That Thrive Amid Constant Change with Allan Rennebo Jepsen

So what ends up happening? “We are trapped in a loop of meetings, updates, and alignment sessions, with initiatives and plans made, but somehow, progress keeps slipping away.” 

This ongoing “cycle of coordination” eats up energy and time while creating an illusion of progress. Wouldn't it be nice to be able to break the cycle and replace it with focused, adaptive, and empowered teams with goals and efforts that can actually move the organization forward?

As Allan points out, the one true constant is change, and change is not slowing down. New technologies, shifting customer expectations, and global competition mean that businesses must evolve faster than ever before. “Success today requires a new paradigm, one that values bold leadership over small tweaks.”

Many organizations instead fall into reactive patterns and mindsets. They organize based on what the business thinks is important, rather than looking at what the actual customers need. Every new change triggers a cascade of alignment meetings, but before that alignment solidifies, another change hits. The result is paralysis through coordination.

What will it take to break out of this vicious cycle? Genuine adaptability, that's what. Organizations that show that they can learn, adjust, and act decisively in uncertain conditions are the ones that will ultimately thrive.

What do we need to do to actually make change happen?

Strategic Prioritization

We need to focus our energy where it matters most. Not every initiative deserves attention. Leaders need to be ruthless in prioritizing efforts that drive real impact and cut through the noise of constant change.

Winning Together

Too many departments chase local wins at the expense of the organization’s overall success. Instead, focus on organizational collaboration and allow for genuine long-term thinking. Optimizing one team while degrading the system as a whole is counterproductive.

Thriving Amid Change

When teams are empowered to make decisions and take action without waiting for endless approvals, they can develop true adaptability. With those experiences, the team can get better and address challenges related to change. Success likelihood grows when teams have both the authority and the accountability to act quickly. Resilience and adaptability are learned competencies, not traits reserved for a few innovative companies. If you want to be resilient and adaptive, you need to practice and work towards being resilient and adaptive.


Allan encourages creating a “low-risk, high-impact” framework for building organizations that are genuinely adaptable. Continuous learning loops, clear ownership, and a culture of shared goals help with and reinforce the opportunities to effectively adapt to change.

As Allan points out, “These organizations didn’t get there by accident. They created the conditions where adaptability could flourish.”

Artificial intelligence, decentralized decision-making, and global competition are reshaping how organizations operate. Instead of fearing disruption, leaders should view it as a constant condition to master. Focus on adaptability as the ultimate organizational competency.

“You can’t predict the next disruption, but you can prepare your organization to handle whatever comes next.”

Key Takeaways

- Prioritize what truly matters. Avoid being spread thin across too many initiatives.

- Empower your teams. Give them the clarity and authority to act.

- Collaborate across silos. Success comes from shared goals, not isolated wins.

- Embrace continuous learning. Every challenge is a chance to adapt and improve.

If there's any one message or idea to bring home today, it is that change doesn’t have to be chaotic. Leaders who build adaptable, empowered teams will be the ones who turn constant change into a competitive advantage.

Want to know more? Allan’s book “Impactful Organizations – And How to Become One!” expands on many of the ideas shared today.

RAG to the Rescue: Reimagining Enterprise Unit Test Management with AI (a PNSQC Live Blog)

There is an entire paper with a lot of great information behind this session, so rather than try to recap all of it, I actively encourage everyone to Read the paper (PDF). Additionally, this paper is a joint effort with Gaurav Rathor, Ajay Chandrakant Bhosle, and Nikhil Y Joshi. With that out of the way...

The proposal is a practical framework that blends Retrieval-Augmented Generation (RAG) with the Model Context Protocol (MCP) to help generate and manage unit tests in large enterprises. Test creation, governance, auditability, and measurable business impact are key components considered by this approach. Again, there's a lot to dissect here, so check out the paper specifically for details.

Unit testing in big organizations is challenging, no question. Legacy code, multiple frameworks, technical debt, uncertain code coverage, and long fix times are all issues that many of us are familiar with. The authors suggest using AI to assist with test generation as well as test management.

Neat. So how would that work?

The proposal consists of four major parts that work together as a microservice.

  • AI Agents interpret intent, plan steps, use memory, and invoke tools. They orchestrate the workflow rather than directly hitting data stores.
  • Model Context Protocol acts as a secure, typed gateway between agents and external systems. Think of it as a translator and auditor that controls what context reaches the model.
  • RAG Knowledge Layer retrieves code, prior tests, specs, domain docs, and compliance rules from a vector database to ground prompts in the organization’s reality.
  • LLM Generator builds unit tests from that curated context, returning deployable test classes to clients.

Given a request to create tests for a class, the agent pulls the file and metadata, queries the vector database by package and class, and retrieves related classes, utility code, mocks, example tests, and notes. That bundle becomes a structured prompt, delivered to the LLM through MCP. The output is a test class that aligns with domain logic and existing patterns.
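
The prompt-assembly step in that flow can be sketched roughly as follows. This is my own illustrative reconstruction, not code from the paper; the function name `buildTestGenPrompt` and the chunk fields (`kind`, `text`) are hypothetical:

```javascript
// Illustrative sketch (not from the paper): take the target class plus the
// chunks retrieved from the vector database and assemble them into one
// structured prompt for the LLM generator.
function buildTestGenPrompt(targetClass, retrievedChunks) {
  // Group retrieved context by kind so the model sees a structured bundle.
  const sections = { relatedCode: [], exampleTests: [], domainNotes: [] };
  for (const chunk of retrievedChunks) {
    if (sections[chunk.kind]) sections[chunk.kind].push(chunk.text);
  }
  return [
    "Generate a unit test class for the following code.",
    "## Target class\n" + targetClass.source,
    "## Related code\n" + sections.relatedCode.join("\n---\n"),
    "## Example tests (follow these patterns)\n" + sections.exampleTests.join("\n---\n"),
    "## Domain notes and compliance rules\n" + sections.domainNotes.join("\n---\n"),
  ].join("\n\n");
}
```

In the actual system this bundle would be delivered to the LLM through MCP rather than sent directly, which is where the auditing and access control come in.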

The pilot for this system was aimed at a specific, critical microservice. The team accomplished automated unit test generation, which created many quick coverage wins, and also helped track time saved and defect trends discovered.

According to Gaurav, ROI modeling estimated manual test creation at about four hours per case. With the new RAG-driven approach, they were able to cut this time down to about one hour, including review.
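
As a back-of-the-envelope version of that ROI claim (the four-hour and one-hour figures are from the talk; the little helper function is mine):

```javascript
// Hours saved by the RAG-driven approach, using the figures quoted in the
// talk: ~4 hours per test case manually vs ~1 hour with generation + review.
function hoursSaved(testCases, manualHours = 4, ragHours = 1) {
  return testCases * (manualHours - ragHours);
}
```

For 100 test cases that works out to 300 hours saved, a 75% reduction per case.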

Again, there is a lot to take in from these examples and I encourage checking out the paper (linked above) for the specifics.

The Ethics of Quality with Gwen Iarussi (a PNSQC Live Blog)

I'm back. I spent the morning moderating the Management, Leadership & People track, so I wasn't able to live blog like I normally do, but suffice it to say that Sophia McKeever (Quality With Hearts Aligned - Bolstering Your Quality with Emotional Intelligence), Heather Wilcox (Is the 'Iron Triangle' Dead?), and Kristine O'Connor (Quality Intelligence: Embedding Customer Voice to Drive Agile Excellence) all did great jobs, and if you get a chance to see their talks, do so :).

For the first session this afternoon, I get to see long-time friend (and actual hirer of my consulting services during my stint in the wilderness) Gwen Iarussi discuss The Ethics of Quality.


PNSQChronicles: The Ethics of Quality & Quality as a Living System with Gwen Iarussi

A great quote to start out this conversation is, "Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should." (Dr. Ian Malcolm, Jurassic Park). You may well remember this line, and it is just as valid a warning about unchecked innovation today. It's not enough to ensure a product "works"; we have a continued need to advocate not just that something works, but that it works appropriately. We don't have the luxury of testing blindly and ignoring the possible impacts of our work and efforts.

I remember back in my Cisco Systems work life when the NetFlow protocol tracking software was developed. I was on the team that was ultimately codifying the "call record" of any and all internet data. Part of me was a little leery about what we were doing, but I recall being told by one of the developers that, "if we don't develop this with our approach and methodology and ethics, then our competition will create it, and we will have no control over the ethics that they choose (or do not choose) to use." We can argue whether or not that was a good idea (we do, of course, have end-to-end tracking of all network data, but we also have the ability to encrypt that data, so yin/yang, I guess ;) ).

My point with that last example may seem trivial today, but part of me genuinely wondered if our efforts were being directed in the appropriate place. At that time, I was young and unsure of my voice, so I just rolled with it. Today me would be less likely to do that; I'd want to ask a number of questions. Granted, I may not have been able to modify or alter any of the outcomes, but I certainly would have been more direct and focused with my interactions and questions. I would have advocated for positions I felt we should represent.

Our tech environment today is getting even more fragmented and over the top. Speed of execution is taking precedence over the quality and ethics of what we are creating. We have all sorts of areas where the haves are taking advantage of the have-nots through these systems, and it's very much by design. Let's not forget that this age of AI and fast, ubiquitous tech is consuming land, fuel, and water resources at an alarming rate. Much of the tech advances, and the buildout to host and serve them, is not sustainable in the long term. At least not if we want to make sure that people have access to water, fuel, and food a few years from now.

Much of this comes down to accountability. How do we develop accountability and stand for what we want to see happen in our environments? How do we make our ethics more than just platitudes? By building a culture of accountability, we can foster a culture of ethics and of considerations that go beyond mere speed and efficiency.

Ultimately, this all comes down to a quote attributed to Sitting Bull: "Let us put our minds together and see what life we can make for our children." Every good or bad decision we make may have an effect on us, but it will certainly affect future generations. With this in mind, I would absolutely say it is in our best interests, including our business interests, to leave our children a world and systems that have effective safeguards against our actively destroying our environment, whatever and wherever those may be.


Monsters & Magicians: Testing the Illusions of Generative AI with Ben Simo (a PNSQC Live Blog)

Day 2 of the main program is underway (Day 3 includes the workshop presentations). It's crazy to see/feel how quickly this event goes by. As always, I've had a great time at this event and enjoyed my interactions with everyone. It's especially neat to realize how many people I know in this space, particularly when keynote speakers are literal friends, as is the case today with Ben Simo.

Machines that do things beyond our physical or mental powers have existed for thousands of years. We can go back to the Antikythera Mechanism for what may quite possibly be the world's first "artificial intelligence," depending on how you want to interpret that term. Over time, as we have come to grips with and developed an understanding of the rules, laws, and repeatability of activities, what was once magical has become commonplace in our everyday use.


PNSQChronicles: Brief Interview of "Monsters and Magicians" with Ben Simo on YouTube

We now see Large Language Models and predictive text generation as the current amazeballs part of our reality. Many people are excited about these technologies, but at the same time, many risks are surfacing, with reports of organizations suffering actual harm or damage because of using AI tools. We have heard of apps that have jacked up rates arbitrarily, legal documents published with no basis in law, fact, or reality, and models of "virtual people" that learn from interactions, where biased inputs and training turned those models incredibly racist and hateful.

These situations point to an interesting set of questions: how specifically can we as testers benefit from this wild new world of seemingly random query-and-response systems? How do we test software that produces inexplicable, fuzzy outputs? At the end of the day, software deals with patterns and algorithms. We have technologies such as machine learning, clustering, and ways that data can be grouped and sorted. If we give an LLM a closed data set and ask it to work with just that information, it does a remarkably good job of transforming or "creating" work and assets. The key is that we have given it a known, bounded set of information, so it can be guided specifically as to what to do with it. As we open it up to the outer world and give it fewer controls or restrictions, we force the model to work with vague clusters of data of potentially dubious provenance.

A great example of this that I saw in practice during one of the workshops was the idea of creating spec documents in Markdown that reside at the base of your document tree. By making the spec document the oracle of choice, and instructing the model that the spec document is the arbiter of what it is to do, we limit the chance of hallucinations and odd reactions considerably. Not completely, but we make it much easier to track what the model is doing.

An example I had fun with recently was when I heard through a podcast a story of the son of Hephaistos who went to Olympus during the waning days of the Greek pantheon's influence and took a remnant of the fire from Olympus (in an homage to the myth of Prometheus). However, something about this seemed "off", as in it was being presented as an ancient myth, but it clearly wasn't. Could we identify where the original story came from? Through various prompts, reviewing the text transcript of the story, and other clues, the LLM determined that, indeed, it was a modern story being told in the manner of an ancient Greek myth, and even noted that the delivery of the story in meter and timing mimicked closely the delivery of Hesiod. I was not able to determine who wrote the story or where to find it on the internet, but the details it did provide were interesting. Many of them felt fanciful, but all of them felt plausible. That's the danger with LLM output. Unless we are diligent, the very plausibility of the output could be accepted easily as though it were fact. Closer inspection found numerous areas that were not accurate (referencing older myths that I was aware of and had history with, but attributing individuals and characters that didn't belong there). People who may not have this knowledge or familiarity might accept what's being presented as fact because it flows so naturally and just "feels right and authoritative". 

In these cases, the Boolean PASS vs. FAIL rationale doesn't work. It's not that the tests pass or fail outright; large elements do pass, but some outputs are "off." It's not a total failure, but the result is "corrupted" in a way that means we cannot simply rely on the output. Additionally, we can run tests multiple times and get slightly different outputs with the exact same inputs. In my own world of testing AI, we use a variety of benchmarks and monitors that help us determine whether models are behaving in ways we expect. We have a variety of tests and models that let us examine things like performance drift, comprehensive analysis, bias drift and disparity, the currency of the data, and homoscedasticity (a $10 word that means looking at the variance of the errors in a model and determining whether it is constant/consistent across all of our observations).
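
As a toy illustration of what a homoscedasticity check can look like (this is my own sketch, not the actual monitors described in the talk; the threshold and function names are made up), you can compare the variance of model errors across segments of observations:

```javascript
// Sample variance of a list of error values.
function variance(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length;
}

// Toy homoscedasticity check: if the largest segment variance is much
// bigger than the smallest, the error spread is not constant across
// observations (heteroscedastic). The maxRatio threshold is illustrative.
function isHomoscedastic(errorSegments, maxRatio = 4) {
  const variances = errorSegments.map(variance);
  return Math.max(...variances) / Math.min(...variances) <= maxRatio;
}
```

Real monitors would use a proper statistical test (e.g., Breusch-Pagan) rather than a raw ratio, but the underlying question is the same: is the error spread stable across all of our observations?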

A neat tool that we have is the "AI Risk Repository," which helps us identify risks and the domains of model use where those risks can be found. By looking at the areas of potential risk, we can be better informed and consider aspects to apply to our testing efforts.

Ultimately, one of the key takeaways from this talk is that we are at peril of being beguiled by the magic surrounding us. We want to be responsible with our use of AI, and thus we need to test, consider how best to apply what we learn, and spread that knowledge among our colleagues. Magic is often sleight of hand, and it's important that we understand how that sleight of hand is performed.

Tuesday, October 14, 2025

Dance Bot Showcase at PNSQC: Where Robots Move to the Beat of Quality (a PNSQC Live Blog)


 This is an interesting and unusual activity... unless you have been coming to PNSQC for at least the past 15 years. The various LEGO League entries for talks and demonstrations are nothing new. A robot dance competition? Okay, that's different ;).


This year, PNSQC is hosting the first-ever Dance Bot Showcase—a celebration of creativity, code, and community!

We are starting out with a Q&A with the various team captains and learning about the challenges they faced putting on this event, and determining the criteria necessary to make this event happen.



PNSQC 2025 Robot Dance Off


Teams featured are Gear In Motion, The Beaver Bots, and Mechanical Mages (placing 3rd, 2nd, and 1st, respectively).

The various robotics teams described a number of challenges and issues they faced while preparing for this event or for other events they have participated in. This particular demonstration is a literal "robot dance off," meaning they need to program and choreograph robots to actually dance in sync with music. These teams demonstrate technical skill while showcasing creativity and artistic flair. It's more than just a performance; it's a hands-on demonstration of how software quality, testing, and intelligent systems intersect in the real world.

Each of the youth teams engages with key software concepts like adaptability, precision, and resilience through robotics. With a mix of live performances and a student-led fireside chat on the software and testing challenges of robot programming and choreography, the Showcase connects generations through shared curiosity and learning. Oh, but that's not all. Judges will develop end-user quality criteria to determine how to evaluate and judge the participants and their robot entries. Just as we would evaluate web application software quality from the end-user perspective, we will see a variety of aspects within the dance bot competition that highlight various quality characteristics.

In addition to the three local teams, we have three international teams (from the Dominican Republic, the Democratic Republic of Congo, and Mauritius) that have also contributed presentations of their dancing robots (also included in the video footage).

Have I piqued your interest? I hope so, because as soon as I can get the footage up, we will showcase the dancing robots on our YouTube channel as well as here.

Web for All: Enhancing Quality Intelligence with Automated Accessibility Testing using Playwright and axe-core with Rodrigo Silva Ferreira

Sometimes what we expect to happen doesn't. I was looking forward to Rodrigo Silva Ferreira speaking about accessibility testing and accessibility automation. However, due to technical difficulties, he was not able to connect to the Zoom session (he was presenting remotely), so I was asked to pinch-hit at the last minute, which I did (I shared my most recent version of "Senses Working Overtime," which I presented at CAST back in August). Still, I was fully prepared for Rodrigo's session, and I read his paper in anticipation of it, so rather than talk about the presentation I gave, I'm going to talk about the presentation Rodrigo would have given, using his paper as the basis.

One of the recurring themes at PNSQC is that quality is holistic. It’s not just about performance or reliability, but about inclusion and usability for everyone. Rodrigo Silva Ferreira’s paper “Web for All: Enhancing Quality Intelligence with Automated Accessibility Testing using Playwright and axe-core” emphasizes this by showing how teams can bake accessibility (a11y) directly into their test automation workflows.


Rodrigo starts by reminding us that accessibility isn’t optional; it’s a moral, legal, and usability imperative. Yet too many teams still treat it as an afterthought. Manual audits take time, training is scarce, and accessibility issues often get discovered long after release. By then, the fixes are costly, the users are frustrated, and sometimes lawsuits are incoming.

Manual a11y testing will always have a role, but it doesn’t scale. Rodrigo’s approach uses Playwright in combination with axe-core from Deque Systems. Together, they create an efficient and repeatable testing loop that fits naturally into CI/CD pipelines.

The result: accessibility checks happen early and continuously, not just during audits.

The Tools: Playwright + axe-core

Playwright handles the browser automation (navigating pages, interacting with UI elements, capturing screenshots, etc.) while axe-core brings the accessibility intelligence, scanning for WCAG violations, tagging them by severity, and suggesting fixes.

Using the @axe-core/playwright package, we can inject axe-core into the current browser context. After a page or component is loaded, it performs an automated scan and outputs results as machine-readable data (JSON/HTML). The flow is: go to the page in question, inject the call to axe-core, save the output as an object, and then print the results as either HTML or JSON. What you do with the object contents is entirely up to you.
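
To make that flow concrete, here is a hedged sketch. The commented Playwright lines follow the usual @axe-core/playwright usage pattern (verify against the version you install); the `summarizeViolations` helper is my own illustration, built on axe-core's documented results shape, where each violation carries an `impact` severity and a list of affected `nodes`:

```javascript
// In a Playwright test, the results object comes back roughly like this
// (commented out here so the helper below stands alone):
//
//   const AxeBuilder = require('@axe-core/playwright').default;
//   await page.goto('https://example.com');
//   const results = await new AxeBuilder({ page }).analyze();
//
// axe-core tags each violation with a severity ("impact"); this helper
// counts affected nodes per severity so a CI step could, for example,
// fail the build on any critical findings.
function summarizeViolations(results) {
  const counts = { minor: 0, moderate: 0, serious: 0, critical: 0 };
  for (const v of results.violations) {
    if (v.impact in counts) counts[v.impact] += v.nodes.length;
  }
  return counts;
}
```

A simple gate then becomes a one-liner in the test: assert that `summarizeViolations(results).critical` is zero, and let lower severities feed a report instead of blocking the pipeline.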

Setting It Up

The paper walks through the steps of integrating these tools into a test suite:

  • Install @axe-core/playwright alongside Playwright
  • Configure the environment (Node.js, CI/CD, etc.)
  • Add a simple accessibility test — e.g., scan a page and report any violations
  • Include tests in GitHub Actions or another CI system so they run automatically on each PR or nightly build

Rodrigo emphasizes keeping it low-maintenance. The goal is to empower QA and dev teams to start small and expand coverage organically.

After implementing the approach, Rodrigo’s team observed:
  • Broader coverage of A11Y across key pages and user flows
  • Earlier defect detection (shift-left in action)
  • Improved team awareness and ownership of accessibility
  • Fewer regressions and less rework later in the SDLC

Automation didn’t replace manual audits; it amplified them by catching the easy stuff early, so specialists could focus on deeper, context-specific checks.

Key Takeaways

By integrating A11Y automation into standard test pipelines, teams reinforce the idea that accessibility isn’t “someone else’s problem.” It becomes part of the fabric of software quality.

Rodrigo closes with a challenge: start small. Pick one key flow, one component, one page, and add automated accessibility checks. From there, grow. Over time, this builds a sustainable, inclusive testing strategy that benefits every user.

If you want to try out what Rodrigo suggests, here's what you need to get started:

  • Playwright – Microsoft’s end-to-end testing framework
  • axe-core – Deque Systems’ open-source accessibility engine
  • WCAG Guidelines – W3C Web Content Accessibility Guidelines
Again, I'm happy to speak to these topics, but I had wanted to hear Rodrigo's presentation directly. Hopefully, for some, this will be at least a partial substitute for that.

Being a Change Agent with April K Mills (Welcome to PNSQC Live Blog Stream)

Hello all, and welcome to PNSQC, coming at you from Portland State University Place Hotel and Conference Center. Some of you may be hearing concerning things about Portland; having been here for two days now, I can report in real time that it is absolutely quiet and normal here (though I must confess that, based on the media I have seen of protests with people in frog and chicken suits, I love how absolutely unserious Portland is). We literally started the conference with "The Beat Goes On" marching band (I will upload and link media when I get the chance).

Everyone is a Change Agent

Our first talk/keynote today is coming from April K. Mills, and her talk is "Everyone is a Change Agent". I had a chance to discuss this with April in our PNSQChronicles vidcast (will link here later), but the key point here is that change agents don't have to be high-powered, in-charge people. Every one of us has the capability, and I'd argue the responsibility, to be the change agents our organizations need. To borrow from an idea I agree with a great deal, "everyone needs to stand for something, even if doing so means we have to stand alone".

One of the ways April recommends we start being change agents right now is to stand against and fight the spread of "AI Slop." How many of us have found ourselves literally struggling with and drowning in AI-generated content in our organizations that seems to say a bunch but doesn't really mean anything? I have run into this a fair bit in social media, but it is absolutely creeping into publications and corporate marketing missives. I saw a person online coin a phrase I have now started using: "AI;DR".

Note: I use this when I see obviously AI-generated output that reads as completely soulless, where much of it is not factual or is genuinely wrong in various areas. In these cases, I do believe that announcing "AI;DR" is an excellent response.

YouTube: PNSQChronicles episode with April K. Mills


One of the things I have always appreciated (and I'm quoting Matt Heusser in the wording here) when I come to conferences like PNSQC is that I am not looking to upend the entire world with everything I learned. First of all, it's not practical, and second, systems are developed over time, and there may well be entrenched interests to overcome. It's simply not realistic to come back from a conference and overturn everything we do. Instead, we need to find a few areas that we can implement without needing permission. All of us can do that. We find a way to add a new approach, process, or method based on what we learned, and then, in a few weeks or months, when people are curious about how we have made key improvements, we can say and demonstrate what we have been doing. There may be varying barriers to exactly how much we can do in certain areas, but we can all do something in a different way.

When we can implement an under-the-radar change, it's often a good idea to try it out quietly and see if the change is worth pursuing. Often, we can determine pretty quickly if our brilliant idea may not be so brilliant (or at least we discover the key contexts that help explain why we haven't tried doing this before). Make no mistake, that in and of itself is often a valuable exercise. 

Sometimes the best question we can ask is "Why?" There is a notion called the "Power Paradox": the belief that people can only do so much because they don't have the power to make changes. We hear that in politics and senior management all the time. If only we could get the people at the top to agree with us and make the changes necessary. Truth be told, we don't always need to do that, and sometimes the best way to break down that false assumption of power is to ask why we do something (strategically; don't become the kid in the back seat asking, "Are we there yet?" every five minutes). By asking "Why?" at strategic times, we may not sway the people at the top, but we may get more of us at lower levels to also ask why, and if more of us start asking why, that often triggers those higher up to realize the status quo is not serving the people it's intended to serve. Sometimes it takes more than just asking why, but strategic action often starts from just that question.

Key Takeaway: Being a change agent doesn't have to be a major initiative, and we don't have to reinvent the wheel to be a change agent. We don't need to have the right boss, with the right budget, with the right revenue, at the right time to make changes. Often, we can make changes just by trying to do some different things, and often, we can do them without having to ask permission to do so. Start there first and work your way up :).