Notes From a Participant: Using Tests to Fix a Customer-Reported Problem

John Welty, a participant in this course, sent me these notes and asked me to share them as part of the course, so I offer them to you as a “bonus track”.

Experience Report

I wrote unit tests to explore a problem reported by a customer. I had had the impulse to reproduce the problem on a machine we had on the floor, but I couldn’t do that before someone shipped that machine to the customer. This situation gave me a good opportunity to build confidence through unit tests before remotely updating the customer’s machine.

To help visualize the data, I made a CAD drawing showing the various states of the segments generated from the requested cutouts. The machine is an inline plasma cutting table with limited travel, so I break the cuts into segments and move the part to index from one segment to the next.

I don’t know exactly what this means, but I infer that there already exists some external motivation to break the work into smaller pieces. —jbrains

The drawing helped me think through exactly what outputs I expected for a given input. As I then wrote and ran the tests, I quickly saw that the function I was focused on was not the source of the problem. While exploring that code with tests, I did see a branch that would likely never be hit but that would cause an infinite loop if it were. I wrote a test to prove that. I then removed that branch and verified that all the existing tests still passed. (I’d previously written a few tests around this part of my code.)

It felt good to see how the act of testing helped me notice things I had originally missed. I attributed the mistake to confusion about naming: I had thought I needed to decrement a value that I didn’t need to decrement. Worse, the value I was decrementing was the loop index, which the loop itself increments, so the loop never advanced!
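The following is a minimal sketch of how a test can prove that kind of infinite loop without hanging the test run. It is not John's code: the function names, the `needs_retry` branch, and the `MAX_ITERATIONS` cap are all hypothetical, and the sketch assumes that the legacy routine can be given an iteration cap as a temporary test seam.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap: far more iterations than any valid input would need. */
#define MAX_ITERATIONS 1000

/* Hypothetical stand-in for the legacy routine. The "retry" branch
 * decrements the same index that the for loop increments, so once the
 * branch is taken the loop stops advancing. Returns -1 when the cap trips. */
static int count_cuts_with_retry_branch(const bool *needs_cut,
                                        const bool *needs_retry, int n)
{
    int count = 0;
    int iterations = 0;
    for (int i = 0; i < n; i++) {
        if (++iterations > MAX_ITERATIONS) {
            return -1; /* test seam: report the runaway loop instead of hanging */
        }
        if (needs_retry[i]) {
            i--;       /* bug: cancels the i++ of the for loop */
            continue;
        }
        if (needs_cut[i]) {
            count++;
        }
    }
    return count;
}

/* The same routine with the rarely-hit branch removed. */
static int count_cuts(const bool *needs_cut, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (needs_cut[i]) {
            count++;
        }
    }
    return count;
}

static void test_retry_branch_never_terminates(void)
{
    const bool needs_cut[] = {true, false, true};
    const bool needs_retry[] = {false, true, false};
    assert(count_cuts_with_retry_branch(needs_cut, needs_retry, 3) == -1);
}

static void test_behavior_is_unchanged_when_branch_is_not_hit(void)
{
    const bool needs_cut[] = {true, false, true};
    const bool no_retry[] = {false, false, false};
    assert(count_cuts_with_retry_branch(needs_cut, no_retry, 3) == 2);
    assert(count_cuts(needs_cut, 3) == 2);
}
```

The second test plays the role of the "existing tests still passed" check: for inputs that never reach the branch, the routine behaves the same with or without it. Both test functions would be called from whatever runner the test framework provides.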

As I went on looking for the function causing the customer problem, I isolated one smaller function from its client, reducing its inputs and making it easier to test. This helped a lot. I found the mistake and added a new function to make it easier to fix the same mistake in multiple locations.
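Isolating a function from its client might look something like the sketch below. Everything in it is an assumption made for illustration: the `MachineData` structure, its fields, and the ceiling-division rule for counting segments are hypothetical and are not taken from John's project. The point is only the shape of the change: the extracted function receives the two values it needs as parameters, and the original client keeps its signature and delegates.

```c
#include <assert.h>

/* Hypothetical stand-in for a large global data structure. */
typedef struct {
    int table_travel_mm;
    int part_length_mm;
    int torch_height_mm;
    int feed_rate_mm_per_min;
    /* real code would have many more fields */
} MachineData;

static MachineData g_machine;

/* Extracted function: depends only on its two parameters, so a test can
 * call it without setting up any global state. */
static int segments_needed(int part_length_mm, int table_travel_mm)
{
    if (part_length_mm <= 0 || table_travel_mm <= 0) {
        return 0;
    }
    /* Ceiling division: a partial segment still needs its own index move. */
    return (part_length_mm + table_travel_mm - 1) / table_travel_mm;
}

/* Original client: same signature as before, now a thin wrapper. */
static int segments_needed_for_current_part(void)
{
    return segments_needed(g_machine.part_length_mm,
                           g_machine.table_travel_mm);
}

static void test_segments_needed(void)
{
    assert(segments_needed(1000, 500) == 2);
    assert(segments_needed(1001, 500) == 3);
    assert(segments_needed(499, 500) == 1);
    assert(segments_needed(0, 500) == 0);
    assert(segments_needed(1000, 0) == 0);
}

static void test_client_delegates_to_extracted_function(void)
{
    g_machine.part_length_mm = 1200;
    g_machine.table_travel_mm = 500;
    assert(segments_needed_for_current_part() == 3);
}
```

Once the logic lives in one small function, other call sites that repeat the same calculation can call it too, which is one way a single fix can reach multiple locations.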

This wasn’t TDD, but this is legacy code, and these tests are now something I can run on demand. They live in my machine control project. As of now they sit in two separate test configurations, so running them requires selecting one configuration at a time.

As I want to TDD all new code anyway, needing to do something special to run these legacy code tests may not be a problem. I’ll definitely want to combine my two test configurations at the very least so that I can run all the tests at once. That might help focus me on the task.

My code under test in this project is written in Structured Text and my tests are in C. The test framework is awkward, but I’m finding it easier to work with as I gain experience with it. I need to deal with the naming problems, but this code is pretty brittle, so I want more tests in place first.

I still have a tremendous amount of code in programs which can’t be tested as units. I need to go with Golden Master and Sampling (accompanying article) to get these under test for now so that I can refactor more safely. I need to find a better way to log inputs and outputs from the production code as it is in order to generate useful Golden Masters. For that I need to clarify the inputs and outputs, which is messy, but necessary. Most of these are using large global data structures even when they only care about a limited portion of the data.
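A Golden Master check can be sketched roughly as follows, assuming that the inputs and outputs of interest can be rendered as text. The routine `legacy_segment_count`, the sampled values, and the line format are hypothetical. In practice the master text would be recorded once from the production code and stored in a file; here it stays in memory to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical legacy routine whose current behavior we want to pin down. */
static int legacy_segment_count(int part_length_mm, int table_travel_mm)
{
    if (part_length_mm <= 0 || table_travel_mm <= 0) {
        return 0;
    }
    return (part_length_mm + table_travel_mm - 1) / table_travel_mm;
}

/* Runs the routine over a fixed sample of inputs and writes one
 * "inputs => output" line per sample into out.
 * Returns the number of characters written, or -1 if out is too small. */
static int render_run(char *out, size_t out_size)
{
    static const int lengths[] = {0, 250, 500, 501, 1500};
    static const int travels[] = {500, 750};
    size_t used = 0;

    for (size_t i = 0; i < sizeof lengths / sizeof lengths[0]; i++) {
        for (size_t j = 0; j < sizeof travels / sizeof travels[0]; j++) {
            int n = snprintf(out + used, out_size - used,
                             "length=%d travel=%d => segments=%d\n",
                             lengths[i], travels[j],
                             legacy_segment_count(lengths[i], travels[j]));
            if (n < 0 || (size_t)n >= out_size - used) {
                return -1;
            }
            used += (size_t)n;
        }
    }
    return (int)used;
}

/* Compares a fresh run against the recorded master text. */
static int matches_master(const char *master)
{
    char current[1024];
    if (render_run(current, sizeof current) < 0) {
        return 0;
    }
    return strcmp(master, current) == 0;
}

static void test_golden_master(void)
{
    char master[1024];

    /* Record step: capture the current behavior, right or wrong. */
    assert(render_run(master, sizeof master) > 0);

    /* Check step: an unchanged system reproduces the master exactly. */
    assert(matches_master(master));

    /* Any difference, simulated here by editing the master, is detected. */
    master[0] = 'X';
    assert(!matches_master(master));
}
```

The test does not say whether the recorded behavior is correct, only whether it has changed. That is why clarifying which inputs and outputs to log matters so much: whatever the rendered text leaves out, the Golden Master cannot protect.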

I’ve thought of wrapping the programs into function blocks to let the compiler tell me what I need, but I fear breaking things in the process. Logging and building a test framework feels safer for now.

Comments

I notice a few recurring themes from John’s experience.

Focusing on a smaller part of the system at once makes it easier to spot obvious problems. This explains why I let myself be biased towards smaller, more-isolated tests.
Often I can get enough confidence from smaller tests, even when I have the impulse to run end-to-end tests to verify that I see the same behavior that a customer sees. Yes, the smaller tests might make simplifying assumptions that are violated in the larger system, but most of the problems I fix come down to finding a silly mistake in basic logic that I can reproduce and fix with focused tests. Often it comes down to two bits of code that seem reasonable, but interfere with each other, and then become separated, like John’s incrementing and decrementing code.
Isolating behavior to make it easier to test benefits the programmer, even if the newly-isolated code isn’t an obvious candidate for reuse. We tend to understand code better when we isolate it more from its context.
Naming problems often lead to logic problems, especially when two programmers (maybe you and six-months-later-you) understand the same name differently and write code that embodies those differences in understanding.
The act of trying to write tests, even end-to-end Golden Master tests, provides a natural motivation for clarifying the boundaries of some part of the system: precisely knowing the inputs, the expected outputs, and the relevant state of both the inside world and the outside world. We need to know all these things in order to change the system safely, but we programmers somehow tolerate continually reverse-engineering this knowledge, rather than recording it in tests or documentation.
Code that depends on large amounts of its context (such as large, global data structures) feels unsafe to change, which slows us down, even when we think we understand the parts that need to change. The cost from managing the risk of accidentally breaking something unrelated dominates the cost of fixing the problem.

What do you think about any of this? Does it trigger any similar experiences for you? Does it trigger any doubts? Do you have any suggestions? We invite you to continue the discussion below in the comments.
