Thursday, March 13, 2014

Retro Book Review: The Manga Guide to Statistics

This is yet another in my series of “books I’ve had from NoStarch that I need to finish reviewing before I feel I can ask them to send me more titles”. I’m saying this because NoStarch has been incredibly generous to me over the years, and has provided me with so many titles I sometimes feel I’ll never get through all of them, at least not to a level that is deserving of a proper and thorough review. On the bright side, I am getting close to clearing my queue, and soon, I will be able to ask again if they can send me some more titles :).

Those of us who do software testing for a living know that we have a number of artifacts that come from our testing. One of those classes of artifacts is data. Tons and tons of data. How do we make sense of it all? What is worth looking at? Why is it worth looking at? What decisions can we make if we compile, analyze and distill the data we receive? More to the point, how do we analyze the data so that we can distill it? That’s where Statistics comes in handy.

I’ll be blunt. I took one statistics class when I was in college. I hated it. In fact, I never finished it. Please understand when I say “I have an aversion to statistics as something I have to actually do”, I am not kidding. As a software tester, that puts me in a bit of a bind. If I can’t make some sense of the data I receive, I can’t do as effective a job. At best, I need to farm that work out to someone else on my team who can do the statistical analysis, meaning I need to get their take and explanation to make decisions. That causes delays. Overall, it would be better to just suck it up and learn a bit about statistics. It’s a core piece of domain knowledge any good software tester should possess, if not immediately, then at some point in their career.

There are lots of ways to learn about Statistics, and frankly, most of them are a bit painful. College courses, textbooks, online videos, etc. can help, but they are often slow, or assume that you have some background in the ideas already. What to do when you want to get the gist of the idea before you tackle the hairier details? That’s where “The Manga Guide to Statistics” comes in handy.

A caveat: this should absolutely not be your only guide to learning statistics. If that’s what you are looking for, then this book will not deliver on that promise. It is, however, a good primer to get you started, and help you look at statistics in a way that’s fun and engaging, especially if Manga tropes appeal to you.

To set the stage, our protagonist, Rui, has a chat with the dreamy co-worker of her father (Mr. Igarashi) about understanding statistics. As Rui expresses interest to her dad about learning Statistics, he agrees to get her help. Rui creates a fantasy of being tutored by Mr. Igarashi, only to have her hopes dashed when an employee of her father, Mamoru Yamamoto (read, drawn to not be dreamy), comes to teach her about statistics. Hilarity ensues. There, that’s the Manga trope, and yes, “kawaii” abounds.

If you are familiar with Manga, you know I’ve already spoken volumes about what to expect ;). For those not familiar with Manga, the treatment of the topics is generally amusing, usually at the expense of the dignity of either our protagonist or our long-suffering tutor, but the light-hearted humor is meant to help us relate to the material better. In between the story line, a number of key statistical analysis ideas and concepts are discussed, in a way that makes them accessible and quite a bit less scary than what normally appears in textbooks. Also, as in the other "Manga Guide To" books, the material is presented in a way that covers a lot of ground. It’s made accessible, but it’s not “watered down” or made to be trivial. The examples actually require the reader to understand some underlying mathematical concepts. If you’ve gotten through at least Intermediate Algebra, most of the math will be easy to follow.

Chapter One focuses on understanding data types, and how we can more readily put terms like Categorical (Qualitative) Data and Numerical (Quantitative) Data into terms that are easier to understand (using a High School Slice-of-Life Manga Drama as the basis for the comparisons). By looking at examples like reader questionnaires, we get to the idea of what these data types are (Categorical Data cannot be directly measured, while Numerical Data can be). It also shows how categorical data can be given a point value and treated as numerical data.

Chapter Two gets more into numerical data and discusses some key statistics concepts, such as looking at Frequency Distributions and Histograms (conveniently described by looking for the best ramen shop in the city, by varying definitions of “best”, and by comparing a team’s bowling scores). By looking at data points and other criteria, and examining how those criteria can be condensed into a table of values, Rui and Mamoru show us how we can calculate the Mean (or average), the Median (the midpoint of the sorted samples) and the Standard Deviation (a measure of how far the collected values spread out from the mean).
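For a rough feel of those three calculations outside of the book, here is a minimal Python sketch using the standard library's statistics module. The bowling scores are made up for illustration, and the choice of the population standard deviation (pstdev) rather than the sample version (stdev) is an assumption; use whichever matches your data.

```python
import statistics

# Hypothetical bowling scores for one team (illustrative data only).
scores = [86, 73, 124, 111, 90, 38]

# Mean: the sum of the scores divided by how many there are.
mean_score = statistics.mean(scores)

# Median: the middle value of the sorted scores
# (the average of the two middle values when the count is even).
median_score = statistics.median(scores)

# Population standard deviation: how far the scores spread out from the mean.
spread = statistics.pstdev(scores)

print(f"mean={mean_score}, median={median_score}, std dev={spread:.2f}")
```

Note how the one very low score pulls the mean below the median; that gap is often the first hint that a data set is lopsided.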

Chapter Three goes into categorical data. By its nature, categorical data, or qualitative data, cannot be boiled down to a number as is, but there are ways that certain aspects of qualitative data can be categorized and that categorization can be made into quantitative (numeric) data and calculated. Using a cross tabulation, some numerical analysis can be performed, and therefore qualitative data can be measured, albeit imprecisely.
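As a small illustration of a cross tabulation, here is a hedged Python sketch that counts hypothetical survey answers by group. The groups, answers and data are all invented for the example; only collections.Counter from the standard library is used.

```python
from collections import Counter

# Hypothetical survey responses as (group, answer) pairs.
responses = [
    ("tester", "yes"), ("tester", "no"), ("developer", "yes"),
    ("tester", "yes"), ("developer", "no"), ("developer", "no"),
]

# Count how often each (group, answer) combination occurs.
counts = Counter(responses)

groups = sorted({group for group, _ in responses})
answers = sorted({answer for _, answer in responses})

# Build the cross tabulation: one row per group, one column per answer.
cross_tab = {g: {a: counts[(g, a)] for a in answers} for g in groups}

for g in groups:
    print(g, cross_tab[g])
```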

Chapter Four goes into the ideas of Standard Score and Deviation Score, or how to look at a specific data point and see how it relates to the rest of your data, or how to examine data points in a variety of ranges or with different units of measurement.
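To make the Standard Score idea concrete, here is a minimal Python sketch. It assumes the usual definition (subtract the mean, divide by the standard deviation) and the common convention of rescaling to a deviation score as 10 times the standard score plus 50; the test scores and function names are made up for the example.

```python
import statistics

def standard_scores(values):
    """Return each value's distance from the mean, in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [(v - mean) / sd for v in values]

def deviation_scores(values):
    """Rescale standard scores so the mean is 50 and the standard deviation is 10."""
    return [10 * z + 50 for z in standard_scores(values)]

# Hypothetical test scores.
test_scores = [40, 50, 60, 70, 80]

z_scores = standard_scores(test_scores)
d_scores = deviation_scores(test_scores)

for raw, z, d in zip(test_scores, z_scores, d_scores):
    print(f"raw={raw} standard={z:+.2f} deviation={d:.1f}")
```

Because standard scores are unit-free, the same function can be used to compare, say, response times in milliseconds against defect counts.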

Chapter Five talks about Probability, and the ways in which we can predict an outcome based on the data on hand (more correctly, make an educated guess as to the outcome, which is what Probability is meant to do). Data can be plotted on a graph, and that graph can be converted into a curve with enough data points. That curve (a probability distribution) can be moved based on the mean and standard deviation. Using a number of different models (Normal distribution, Standard normal distribution, Chi-square distribution, t distribution and F distribution) we can make the curve “move”. By taking into account the way that the curve moves, we can calculate a ratio, or probability, which in turn can allow us to make a variety of predictions.
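Here is a hedged sketch of turning a curve into a probability, using NormalDist from Python's standard statistics module (available since Python 3.8). The scenario, a response time that is normally distributed with a mean of 200 ms and a standard deviation of 25 ms, is purely an assumption for illustration.

```python
from statistics import NormalDist

# Hypothetical model: response times in milliseconds, assumed to be
# normally distributed with mean 200 and standard deviation 25.
response_times = NormalDist(mu=200, sigma=25)

# cdf(x) is the area under the curve to the left of x,
# i.e. the probability of a value at or below x.
p_fast = response_times.cdf(250)

# The remaining area is the probability of a response slower than 250 ms.
p_slow = 1 - p_fast

print(f"P(response > 250 ms) = {p_slow:.4f}")
```

Changing mu slides the curve left or right and changing sigma widens or narrows it, which is the “moving” described above.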

Chapter Six looks at comparing the relationship, or correlation, between two variables. By charting variable values on a scatter plot, we can eyeball the values and see if we have a positive or negative correlation, or if there is little to no correlation. If we sense there is a correlation, we can use a variety of indexes (spelled out here as Correlation Coefficient for numerical-numerical data, Correlation Ratio for numerical-categorical data and Cramer’s Coefficient for categorical-categorical data) to determine the overall strength or weakness of that correlation. This chapter also points out that these indexes are “fuzzy”, but they are better than nothing.
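For the numerical-numerical case, a minimal Python sketch of the (Pearson) correlation coefficient follows. The function name and the sample data are made up for the example, and the formula is the standard textbook one rather than anything specific to this book.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson correlation: ranges from -1 (negative) through 0 (none) to +1 (positive)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of testing versus bugs found.
hours = [1, 2, 3, 4, 5]
bugs_found = [2, 4, 5, 4, 6]

r = correlation_coefficient(hours, bugs_found)
print(f"correlation coefficient = {r:.2f}")
```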

Chapter Seven examines hypothesis tests, which are used to help determine whether a hypothesis made by examining sample data holds up. The chapter shows how to check whether a test statistic falls in the critical region, gives examples of how to perform the statistical analysis, and covers a variety of tests, including tests of independence and tests of homogeneity. Lather, rinse, repeat.
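As one concrete instance, here is a hedged Python sketch of a chi-square test of independence on a 2x2 table. The counts and the scenario are invented, the function name is hypothetical, and the critical value 3.841 is the standard chi-square cutoff for 1 degree of freedom at the 0.05 significance level.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows are two browsers, columns are test pass / fail.
observed = [[30, 10],
            [20, 40]]

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, 0.05 significance level

chi2 = chi_square_statistic(observed)
in_critical_region = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.2f}, reject independence: {in_critical_region}")
```

A statistic that lands in the critical region suggests the two variables are related; it does not say why, or how strongly.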

The book closes with an Appendix that describes how to use Excel and set up the examples explained in the book, and how to get to the functions and create the formulas necessary to do the measurements described in the previous chapters. This is a wonderful addition, and it gives even neophytes to statistics ways to play with the data and see how they can analyze the data, their results, and practice the hypothesis tests or determine probability of future events/actions.

Bottom Line:

Statistics can be fun, if you plot the story right. If following the antics of Rui and Mamoru sounds like a good time to you, and if gaining a fundamental understanding of some key statistics concepts is your end goal, then this is a nice format in which to learn those fundamental ideas. Note, I said “fundamental ideas”. Do not think that this would be an appropriate guide to say “OK, great, now I know all the statistics I need to know”. Granted, you may learn enough about statistics to be useful, and it may give you additional insights, but this is not an in-depth study. Having said all that, for those who want to get into the nitty-gritty stuff, the Appendix about setting up tables and examples using statistical functions and formulas is worth the purchase price alone.

On the Manga story front… does our intrepid heroine Rui master the art of Statistics? Will her unrequited love for Mr. Igarashi remain as such? Will Mamoru be able to replace the spot in Rui’s heart where she holds an affection for Mr. Igarashi? Even if he does, is such a relationship just a little bit creepy? Ahh yes, all of this, and more shall be answered. For those who read manga, well, you probably already know the answers to all of those questions... but it’s still a fun read. For those curious as to whether or not a Manga can teach you a thing or three about statistics, the answer is “yes”, but you’ll need to look elsewhere to build on what’s covered here. As to my target market (i.e. my fellow software testers), if statistics is not your strong suit, this makes for a very practical introduction, with plenty of takeaways to make you just a bit more dangerous at work, and I mean that in the best possible way.
