Deciding When To Isolate Behaviors From Each Other

Question

I have another question concerning The World’s Best Introduction to TDD: Level 1, specifically the episodes “Tension in Abstraction” and “Stepping Towards the Boundary”.

When you write the test severalBarcodesInterspersedWithEmptyLines(), you explain that the test is checking two separate behaviors.

Namely, firing the correct “barcode scanned” event for each command combined with skipping empty lines. And then I have to think about trimming whitespace from the commands as they arrive! —jbrains

In the video (from time code 8:30, approximately) you implement the zero() and one() tests. [zero: If there are no lines of input, then don’t fire any “barcode scanned” events. one: If there is one single line of valid input (implicit assumption!), then fire that one event.] So far, so good. When you implement the several() test—processing several lines of text—you notice the tension that these tests actually cover two separate behaviors: processing command input and interpreting the command. At the end of the video you explain this, and you even split the process() method into two separate methods, but you don’t reflect it in the next test severalBarcodesInterspersedWithEmptyLines() (in the next video); you only explain that, for now, it is enough.

In that video, I extract “interpret command” from “process all incoming text”, in order to emphasize and document what my intuition has already sensed. I chose not to go too far down that road, because of YAGNI, because it was a bit of a distraction, because I would probably go down that road relatively soon in another video, but all the same, I wanted to document the idea in the code base. For this reason, I kept the extracted method named “interpret command”, because that name communicates the idea well enough to me. I also wanted to show the audience one way of handling the situation where I have an idea of what to do, but I don’t do it immediately. Programmers are pretty bad at that in general. —jbrains

Would it be acceptable to write that test severalBarcodesInterspersedWithEmptyLines() using mocks? I have this sketch in mind:

def severalBarcodesInterspersedWithEmptyLines():
     ...
     interpreter = TextCommandInterpreter(barcodeScannedListener)
     interpreter.interpretTextCommand = Mock()

     interpreter.processTextInput(IOReader("::barcode 1::\n\n::barcode 2::\n\n"))

     assert interpreter.interpretTextCommand.call_args_list == [
         call("::barcode 1::"), call(""), call("::barcode 2::"), call("")]

This differs from my (little) integrated test in one significant way: I set expectations on the “barcode scanned listener” (the interface that the Controller implements to connect to the command-line UI framework), whereas this version sets the corresponding expectations on the command interpreter, isolating it from the Controller layer. Notice that this test establishes part of the contract between the command interpreter and its clients: clients are permitted to ask the interpreter to interpret empty commands, and they are justified in expecting the interpreter to handle them. —jbrains

I don’t know why, but I feel more comfortable with my test, where we don’t care what the other method (interpretCommand()) is doing. I don’t know how complicated it is to implement this in Java, but in Python it is quite straightforward, and I would certainly feel tempted to do it this way as soon as I detected the other behavior.

The next step would be to test interpretTextCommand() with different commands.
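For what it’s worth, such tests might look like the sketch below. The class name TextCommandInterpreter, the listener method onBarcodeScanned, and the trimming logic are my assumptions for illustration, not necessarily the course’s exact design.

```python
from unittest.mock import Mock

# Hypothetical sketch: the listener method name and the trimming
# behavior are assumptions, not the course's exact design.
class TextCommandInterpreter:
    def __init__(self, barcodeScannedListener):
        self.barcodeScannedListener = barcodeScannedListener

    def interpretTextCommand(self, command):
        # For now, every non-empty command means "barcode scanned";
        # empty commands are tolerated silently (part of the contract above).
        command = command.strip()
        if command:
            self.barcodeScannedListener.onBarcodeScanned(command)

def validBarcodeCommandFiresBarcodeScannedEvent():
    listener = Mock()
    TextCommandInterpreter(listener).interpretTextCommand("::barcode 1::")
    listener.onBarcodeScanned.assert_called_once_with("::barcode 1::")

def emptyCommandFiresNoEvents():
    listener = Mock()
    TextCommandInterpreter(listener).interpretTextCommand("")
    listener.onBarcodeScanned.assert_not_called()
```

Each test checks one command in isolation, with no text processing involved at all.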

What do you think about this?

Discussion

Well, this is easy.

The Point

I like the test that you wrote. I didn’t feel the need to isolate those specific behaviors right away, but I wouldn’t resist doing it. If we were pairing, I’d almost certainly just do it now, since it would take less than 2 minutes. I didn’t do it in the video mostly for teaching reasons.

The Details

When I had less experience and confidence in my refactoring skill, I tended to isolate behaviors earlier. I did this primarily to practise the steps of the refactoring, to see the effects on the tests, and generally to uncover what else I could learn from doing it. As I gained experience and confidence, I tended to isolate behaviors later, preferring instead the “Three Strikes” principle: the more comfortable I felt refactoring, the more comfortable I felt delaying specific changes until I had more evidence. In other words, I isolated behavior earlier in order to develop the skill of doing it, and then reverted to the principle of generalizing just in time, rather than too early. You might consider a similar strategy: isolate that behavior now, because you want to practise it, or prefer to isolate it later, because you want to avoid premature generalization. It depends how confident you feel that the isolated design “is” better.

Intuition Speaks: A Framework Appears!

Indeed, I have the feeling that the design will go in the direction you describe, and now we’re mostly haggling about when to let it go there. At this stage in the system, I could already see an overall data flow through the emerging command-line user interface framework.

  1. Multiline text flows into the system.
  2. Chop the text into lines.
  3. Sanitize the lines, at least by trimming needless whitespace. (What other kinds of domain-neutral sanitizing can we do? Anything?)
  4. Reject lines that don’t conform to the syntax requirements of a command. (There might not be any, in general.)
  5. Interpret each line as a command, sending output somewhere.
  6. Stop when the multiline text stream ends.
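Sketched in Python, the data flow above might look like the following. All the names here are mine, chosen for illustration, not the course’s.

```python
def process_text_input(multiline_text, interpret_command):
    """One hypothetical shape for the command-line UI framework's core loop."""
    # 1-2. Multiline text flows in; chop it into lines.
    for line in multiline_text.splitlines():
        # 3. Sanitize the line, at least by trimming needless whitespace.
        command = line.strip()
        # 4. Reject lines that don't conform to command syntax
        #    (here, the only rejection rule: empty lines).
        if not command:
            continue
        # 5. Interpret each surviving line as a command.
        interpret_command(command)
    # 6. Stop when the multiline text ends: the loop simply finishes.
```

Here, `interpret_command` stands in for “sending output somewhere”: in the course’s design, it would fire “barcode scanned” events at a listener.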

This sounds to me like a general description of a command-line text-based interface. I can already see it. Now that I can see it, I want to push the design in that direction. In the past, I would fight my own impulse to push the design in a specific direction, but as I gain experience, I allow myself to do this. Along the way, I remain aware of signs that the code is resisting my intention. If you feel unsure, then you should continue to let your fundamental rules guide your design decisions.

Don’t Delay Features

I will probably refactor in the direction you describe, but I don’t want to delay delivering features in order to refactor. In the past, I would over-invest in refactoring because I didn’t trust myself to refactor enough. Once I developed strong refactoring habits, I could safely revert to the principle of delivering the feature as soon as I could without cutting corners the way I had done in the past. The system treats every valid command as “barcode scanned”, so as long as this remains true, I don’t need to isolate “interpret command” from “process one line of text”. Since I see this and I trust my refactoring skills, I can leave the design as it is until it comes time to add a feature that requires a second command. (Level 2 of the course!)

What “Should” You Do?

So now the magic question: what should you do? Separate the behaviors now? Do it later? How do you know?

As usual, the answer is “it depends”.

I’m even certified in this now!

If you want to focus on learning the effects of separating the behaviors, then separate them, write the smaller tests, and learn something about the connexion between the size of a “unit” and the nature of the tests. You accept the risk of delaying delivery of the feature, and maybe that’s OK in your situation. It only matters to me that you make this decision consciously.

If you want to focus on detecting the signals that encourage you to refactor, then leave the behaviors tangled together, write more tests, and look for duplication and bad names. You accept the risk of spending more time refactoring later, but you deliver the feature sooner.

When someone is paying me for features, I prefer not to delay delivery in order to refactor, particularly if I’m considering a refactoring in the style of clean up before moving on. (I’m mostly done, but I’ve made a small mess, and I don’t want to leave it like this for the next feature.) Instead, I deliver first, then refactor, as long as deploying new versions costs relatively little and generally goes smoothly. I trust myself actually to refactor, so I don’t worry about delaying refactoring. If you don’t trust yourself (or your team) to refactor after delivery, then you might refactor before delivery in order to develop the habit of cleaning up before moving on. Once you have developed the habit of cleaning up before moving on, then you could start delivering sooner, secure in the knowledge that you really will clean up before moving on.

So I don’t know what you should do. Is it acceptable to do as you suggested? Maybe. “Acceptable” to whom? You don’t need my permission, but you might need the agreement of the other people in your project community. Maybe a 10-minute delay in delivering the feature hurts nothing, because pushing to production is the bottleneck, anyway. Maybe the other programmers don’t understand your clever design and need some convincing. Maybe the other programmers need to see someone take the initiative to refactor more aggressively so that they feel comfortable deciding to invest time in refactoring more. Maybe you just want to practise and see what happens! These are all good reasons to decide one way or the other.

Simply make the decision consciously.

Another Design Idea

I noticed another design opportunity here. It relates to the equivalence between firing events and returning values, and it would remove test doubles entirely from the tests. I’m not endorsing one option over the other, but merely pointing out the equivalence, and inviting you to think about it. (Yes: another exercise to the reader. Discuss it in the comments, if you like.)
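To make the equivalence concrete, here is a deliberately tiny sketch (all names are mine, not the course’s): the same interpretation behavior expressed once by firing an event at a listener and once by returning a value.

```python
# Event-firing style: the interpreter pushes its result to a listener,
# so a test needs a test double in order to observe the outcome.
class EventFiringInterpreter:
    def __init__(self, barcodeScannedListener):
        self.barcodeScannedListener = barcodeScannedListener

    def interpret(self, command):
        self.barcodeScannedListener.onBarcodeScanned(command)

# Value-returning style: the interpreter returns its result,
# so a test simply asserts on the return value; no test double needed.
class ValueReturningInterpreter:
    def interpret(self, command):
        return ("barcode scanned", command)

def interpretingABarcodeReturnsABarcodeScannedResult():
    result = ValueReturningInterpreter().interpret("::barcode 1::")
    assert result == ("barcode scanned", "::barcode 1::")
```

In the value-returning style, the client takes responsibility for turning the returned value into whatever happens next, which moves the test doubles out of the interpreter’s tests entirely.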

References

The World’s Best Introduction to TDD: Level 1. An introductory training course for test-driven development and evolutionary design.

Various, “You Aren’t Gonna Need It”. An early discussion about delaying design decisions (usually generalizations) until a feature explicitly requires them.
