Tuesday, October 15, 2019

Testing AI and Bias - a #PNSQC2019 Live Blog

Wow, have we really gotten to this point already? We're down to the last formal talk, the last Keynote. These conferences seem to go faster and faster every time I come out to play.

I've had the chance to hear Jason Arbon talk a number of times at a variety of conferences over the past several years, and Jason likes to tackle all sorts of topics with ML and AI. Thus, I was definitely interested to see where Jason would go with the topic of AI and Bias. This is a wild new area of testing, and as many of you all know, I am fond of Carina Zona's talk "Consequences of an Insightful Algorithm".

OK, we understand bias is there but what can we actually do about it? Well, here's the spoiler. You can't remove bias from AI. That is by design. AI takes training data and based on the training system it learns. Literally at the start of the process bias enters in. The point is not to eliminate bias, it is to make sure that undesirable bias is not present or is minimized.

Think of a search engine. How can a machine look at a number of source articles and then, based on a query, decide what might be the most important information? We start with an initialized system. Think of it as a "fresh brain" and not in the zombie sense ;). From there, we then go to a training system, which is information already graded and scored by a human or group of humans, so that the system can train on that data and those values. Can you guess where the bias has crept in? Yep, it's there with the training set. If the system guesses wrong, it gets negative reinforcement. If it gets it right, it gets positive reinforcement (from a machine's sense of reward, I guess).
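To make that training loop concrete, here's a minimal sketch (entirely hypothetical, not Bing's or Google's actual system) of a toy relevance model trained by positive/negative reinforcement on human-graded examples. The feature names and grades are invented for illustration; the point is that the human-assigned grades *are* the bias, since the model can only learn to mirror the graders' preferences:

```python
# Toy training set: (feature vector, human grade) pairs.
# Features might be e.g. [keyword overlap, site popularity] -- invented here.
training_data = [
    ([0.9, 0.8], 1),   # graders marked this result relevant
    ([0.7, 0.9], 1),
    ([0.2, 0.9], 0),   # popular but off-topic: graders said irrelevant
    ([0.8, 0.1], 1),
    ([0.1, 0.2], 0),
]

weights = [0.0, 0.0]   # the "fresh brain": no opinions yet

def predict(features, weights):
    """Score a result; anything above the threshold counts as relevant."""
    score = sum(f * w for f, w in zip(features, weights))
    return 1 if score > 0.5 else 0

for _ in range(20):                 # training epochs
    for features, grade in training_data:
        guess = predict(features, weights)
        error = grade - guess       # wrong guess -> nonzero "punishment"
        for i, f in enumerate(features):
            # reinforce toward whatever the human graders rewarded
            weights[i] += 0.1 * error * f

print(weights)   # the learned weights now encode the graders' preferences
```

After training, the model classifies every training example the way the graders did, which is exactly the point: whatever bias the graders had is now baked into the weights.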

There are other factors at play, such as commerce (ads will get preference, or at least money will). Crawlers start on "popular" sites and then work outward to less-linked sites. Results will also be biased by language, education, and very likely the dominant gender and race. There is also the fact that Microsoft Bing is the default on Windows. Among the people who don't know how to change their search engine, you end up with a "lowest common denominator" of users that matches up with some odd demographics. According to Jason, Microsoft Bing's core user group is single women in the Midwest (he said it, not me :p ), and much of Bing's largest search volume comes from that demographic. What might that indicate about what is "feeding the neural network"?

There is also Temporal Bias or "Drift". Over time, searches can be affected by many things such as weather, politics, astrology, etc. Sample size can also affect the way that data is represented. One way to check this is to keep increasing the sample size until the scores stop changing. Not a guarantee, but at least it gives a better feeling that more people are being represented.
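The sample-size check above is easy to sketch. This is a hypothetical illustration (the function names, the tolerance, and the toy "relevance score" population are all my own invention, not anything from the talk): keep doubling the sample size until the metric stops moving by more than some tolerance:

```python
import random

def score_sample(population, n, seed=0):
    """Hypothetical metric: the mean score of a random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return sum(sample) / n

def find_stable_sample_size(population, tol=0.01, start=100):
    """Double the sample size until the score moves by less than tol."""
    n = start
    prev = score_sample(population, n)
    while n * 2 <= len(population):
        n *= 2
        cur = score_sample(population, n)
        if abs(cur - prev) < tol:
            return n, cur   # scores have (roughly) stabilized
        prev = cur
    return n, prev          # ran out of population; still not a guarantee

# Toy population of "relevance scores" in [0, 1].
rng = random.Random(42)
population = [rng.random() for _ in range(100_000)]

n, score = find_stable_sample_size(population)
print(f"stable at n={n}, mean score={score:.3f}")
```

As the text says, stability under growing samples is evidence, not proof: a stable score can still be stably biased if the underlying population is skewed.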

There is also a bias in the literal training data. In most cases, the search training data sets are graded by people paid $20/hr or less. In short, the people feeding our neural networks are not those of us who are designing them. We can debate whether that is a good or bad thing, or whether software engineers would be any better.

There's even a Cleaning Bias. Bing cleaned out misspellings, random numbers and letters, etc., and the irony was that Google didn't do that, and thus Google can even help people find what they are looking for even if they misspell it.

What happens when there is no "right" answer? Meaning, instead of a single answer for a particular word, there are multiple possible interpretations of that same word?

As Jason said a few times, "are you all scared yet?!" Truthfully, I'm not scared, but I am a lot more cynical. Frankly, I don't consider that a bad outcome. I think it helps to have a healthy skepticism of these types of systems. We as testers should remain relevant a bit longer if we keep that skepticism up ;).
