Thursday, March 7, 2019

Big Log - #30DaysOfTesting Testability Challenge

Anyone who has read my blog for a while knows that I like to work song titles into my blog post titles; it's just silly fun. Do you know how hard it is to find song titles related to logs? ;) Robert Plant to the rescue (LOL!).

OK, seriously: I'm catching up on the 30 Days of Testability Challenge, and here's the second checkpoint.

Perform some testing on your application, then open your application's log files. Can you find the actions you performed in the logs?

I hate to tattle on my application, but the problem isn't that I can't find what I need in the log files. It's that there are so many log files, including situational ones, that just working out which one to look at is a process in itself. This is worth spelling out because we need to be clear about what we mean by "your application". Do we mean the actual application I work on? Do we mean the extended suite of applications that plug into ours? The distinction matters because each component that makes up our application has its own log file or, in some cases, several log files to examine.

We have one large log file that is meant to cover most of our interactions, but even there, so much flies past that it can be a challenge to figure out exactly what is being represented. Beyond that, several aspects of our application keep their logs in separate files, such as:

- installation and upgrades
- authentication
- component operations
- third-party plug-ins
- mail daemons
- web server logs
- search engine logs

and so on.

To this end, I have found that using screen, tmux or byobu (take your pick) and splitting one of my windows into multiple panes lets me watch a variety of log files at the same time and see what is actually happening at any given moment. Some logs fly by so fast that I have to check individual timestamps, finding dozens of entries within a single second, while others are updated very infrequently, usually only when an error has occurred.

Because of this, I'm a little torn about which approach I prefer. Having monster log files to parse through can be a real pain. However, having to keep track of a dozen log files to make sense of the big picture is also challenging. Putting together an aggregator so that I can query all of the files at once and see what is happening can help, but only if they use a similar format (which, unfortunately, isn't always the case).
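As a rough illustration of that aggregator idea, here is a minimal Python sketch that merges several logs into one time-ordered stream using a k-way merge (`heapq.merge`). It assumes, hypothetically, that every line starts with an ISO 8601 timestamp and that each file is already in chronological order; lines that don't match are skipped, which is exactly where mismatched formats bite. The names `merge_logs`, `timestamped` and `parse_timestamp` are illustrative, not part of any existing tool.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical assumption: each line starts with an ISO 8601 timestamp and a space,
# e.g. "2019-03-07T10:15:02 ERROR auth: login failed", and each file is already
# in chronological order.
Entry = Tuple[datetime, str, str]  # (timestamp, source log, raw line)

def parse_timestamp(line: str) -> Optional[datetime]:
    """Return the leading timestamp, or None if the line doesn't start with one."""
    stamp = line.split(" ", 1)[0]
    try:
        return datetime.fromisoformat(stamp)
    except ValueError:
        return None

def timestamped(source: str, lines: Iterable[str]) -> Iterator[Entry]:
    """Tag each parseable line with its timestamp and source; skip the rest."""
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            yield ts, source, line.rstrip("\n")

def merge_logs(sources: Dict[str, Iterable[str]]) -> Iterator[Entry]:
    """Lazily merge per-file streams into one stream ordered by timestamp."""
    streams = [timestamped(name, lines) for name, lines in sources.items()]
    return heapq.merge(*streams, key=lambda entry: entry[0])

if __name__ == "__main__":
    auth_log = ["2019-03-07T10:15:02 ERROR auth: login failed\n"]
    web_log = [
        "2019-03-07T10:15:01 INFO web: GET /login\n",
        "2019-03-07T10:15:03 INFO web: GET /home\n",
    ]
    for _ts, source, line in merge_logs({"auth.log": auth_log, "web.log": web_log}):
        print(f"{source}: {line}")
```

Open file objects can be passed in place of the lists, since iterating a text file yields its lines. Because the merge is lazy, even very large logs are read one line at a time rather than loaded whole.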

Based on just this cursory look, what could I suggest to my team about log files and testability?

If we have multiple log files, it would be a plus to have them all formatted the same way, for example:

- log_name: timestamp: alert_level: module: message

with every log file following that same layout.
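If a component happens to be written in Python (an assumption; our application may well use something else), the standard library's `logging.Formatter` can produce that layout directly. In this sketch `log_name` maps to the logger name and timestamps are in UTC; the constants and the `parse_line` helper are hypothetical names for illustration. Note that Python's `%(module)s` is the source module that made the logging call, which may or may not match what the team means by "module". Using `": "` as the separator works because the timestamp contains colons but never a colon followed by a space.

```python
import logging
import time

# Hypothetical constants implementing the layout:
# log_name: timestamp: alert_level: module: message
LOG_FORMAT = "%(name)s: %(asctime)s: %(levelname)s: %(module)s: %(message)s"
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def make_formatter() -> logging.Formatter:
    """Build a formatter that every component's log handler could share."""
    formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
    formatter.converter = time.gmtime  # UTC, so entries from different logs line up
    return formatter

def parse_line(line: str) -> dict:
    """Split one formatted line back into its five fields.

    Raises ValueError if the line has fewer than five fields.
    """
    # The timestamp contains ":" but never ": ", and maxsplit=4 leaves any
    # ": " inside the message untouched.
    log_name, timestamp, level, module, message = line.rstrip("\n").split(": ", 4)
    return {"log_name": log_name, "timestamp": timestamp, "level": level,
            "module": module, "message": message}

if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(make_formatter())
    auth_log = logging.getLogger("auth")  # logger name doubles as log_name
    auth_log.addHandler(handler)
    auth_log.setLevel(logging.INFO)
    auth_log.warning("password expired: user=%s", "alice")
```

Keeping the format and the parser side by side makes it easy to notice when one drifts away from the other.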

It would also help to have an option that gathers all of the log files into a single archive on a schedule, daily (or whatever interval makes the most sense).
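A daily archive job could be as small as the following Python sketch, which bundles every `*.log` file under a directory into one dated `.tar.gz`. The directory layout, the `logs-YYYY-MM-DD.tar.gz` naming and `archive_logs` itself are assumptions for illustration; scheduling it (cron, a systemd timer, or whatever the team already uses) is left out.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path

def archive_logs(log_dir: Path, archive_dir: Path, day: date) -> Path:
    """Bundle every *.log file under log_dir into a single dated .tar.gz archive.

    Hypothetical layout: logs live under log_dir (subdirectories allowed) and
    archives are named logs-YYYY-MM-DD.tar.gz in archive_dir.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"logs-{day.isoformat()}.tar.gz"
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for log_file in sorted(log_dir.rglob("*.log")):
            # Store paths relative to log_dir so the archive mirrors the log layout.
            tar.add(str(log_file), arcname=log_file.relative_to(log_dir).as_posix())
    return archive_path

if __name__ == "__main__":
    # Demo against a throwaway directory rather than real logs.
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp) / "logs"
        (logs / "web").mkdir(parents=True)
        (logs / "auth.log").write_text("2019-03-07T10:15:02 ERROR auth: login failed\n")
        (logs / "web" / "access.log").write_text("2019-03-07T10:15:01 INFO web: GET /login\n")
        archive = archive_logs(logs, Path(tmp) / "archive", date.today())
        with tarfile.open(str(archive)) as tar:
            print(archive.name, sorted(tar.getnames()))
```

Archiving by date rather than by size keeps each bundle aligned with a calendar day, which makes it easier to pull the logs for "the day the bug showed up".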

Make it possible to bring these entries together into a single file that can be parsed to show what is happening: whether we are generating errors or warnings, or emitting info messages that help explain what is going on.
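Once everything shares one format and lives in one file, answering "are we generating errors, warnings or info messages?" becomes a short script. This Python sketch assumes the hypothetical `log_name: timestamp: alert_level: module: message` layout suggested earlier and counts entries per log and alert level, tallying anything that doesn't fit the format separately instead of dropping it silently. `summarize` is an illustrative name, not an existing tool.

```python
from collections import Counter
from typing import Iterable

def summarize(lines: Iterable[str]) -> Counter:
    """Count combined-log entries per (log_name, alert_level) pair.

    Assumes the layout "log_name: timestamp: alert_level: module: message";
    lines that don't split into five fields are tallied as ("unparsed", "UNKNOWN").
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(": ", 4)
        if len(parts) != 5:
            counts[("unparsed", "UNKNOWN")] += 1
            continue
        log_name, _timestamp, level, _module, _message = parts
        counts[(log_name, level)] += 1
    return counts

if __name__ == "__main__":
    combined = [
        "auth: 2019-03-07T10:15:02Z: ERROR: login: bad password for user=alice",
        "web: 2019-03-07T10:15:03Z: INFO: server: GET /home",
        "auth: 2019-03-07T10:15:04Z: ERROR: login: bad password for user=alice",
    ]
    # Most frequent (log, level) pairs first.
    for (log_name, level), count in summarize(combined).most_common():
        print(f"{log_name:<8} {level:<8} {count}")
```

Counting unparsed lines rather than ignoring them gives a quick signal of which components still aren't following the shared format.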

Finally, if at all possible, try to make the messages put into the log files as human-readable as we can.
