The fundamentals of unit testing : Tests shouldn’t ape the production code

Aping the logic of the class under test is an bad idea when writing tests, but it’s an easy mistake to make.

Aping… Bueller?

"Aping" simply means imitating/repeating the logic used in the class under test. In its simplest form, the problem can be illustrated like so:

Production code:

public class Calculator
{
        public int Add(int a, int b)
        {
                return a + b; // actual logic of the class under test
        }
}

A test for the above class:

[Test]
public void adding_two_numbers_returns_the_sum()
{
        // Arrange
        var calculator = CreateCalculator();

        // Act
        int result = calculator.Add(1, 2);

        // Assert
        Assert.That(result, Is.EqualTo(1 + 2)); // ape alert, ooh ooh ooh! >:O
}

Although this doesn’t look like the end of the world, we can see that the test method line "Assert.That(result, Is.EqualTo(1 + 2))" is directly replicating the logic in the Add method.  This is “aping”.

Why is this a bad thing?

This is bad for a few reasons. Firstly, if the class under test is bugged and the test apes its logic to get an expected answer, we’ve just reproduced the bug in our test code and erroneously passed the test. So, instead of having working production code, we have two bits of bugged code and a false sense of security.

Secondly, when writing tests, if state can be tested, it should be done using as declarative a style as possible — we don’t want to say how we get to the answer, just what it should be.

I don’t want to see the test code literally add 1 and 2 to get 3. Instead, I want to see 3 in the test.  Here’s a fixed-up version of the test that does not ape the production code:

[Test]
public void adding_two_numbers_returns_the_sum()
{
        // Arrange
        var calculator = CreateCalculator();

        // Act
        int result = calculator.Add(1, 2);

        // Assert
        Assert.That(result, Is.EqualTo(3));   
}

You should always scan your test code looking for this anti-pattern.  The example I’ve cited is very simple, but I’ve seen a lot of real world code that has succumbed to aping the production code resulting in bugs going undiscovered.

Validating your answers

You may be thinking, “OK genius, so you skipped adding 1 and 2 in the test code, but you must’ve known to follow the addition algorithm in your head to be able to come up with the answer of 3.  How do you know your brain didn’t get it wrong and the test is bugged?” 

It’s a good question and the truth is, it’s often the case that it’s not clear-cut.  However, there are steps you can take to validate your answers (or at least be a little more sure of them).

  • Don’t just assume you’ve got it right first time.  Double check it for a few minutes more.
  • Write out some notes on paper (I sometimes do the testing version of truth tables).
  • Maybe you have a specification for this particular part.  Use it to check the answer  (stop laughing at the back, there! – it occasionally does happen; e.g. conventions for a maths library).
  • Use an alternative method to come up with the same answer (it’s often the case that the same value can be calculated via multiple algorithms / schemes).
  • Use an existing implementation to corroborate your findings.  E.g. if you’re writing a test to make sure that JSON is validated correctly, then check that JSONLint.com agrees with your test results.
  • If you’re not sure how to corroborate an exact answer, then check any invariants you’re aware of as a smoke test.  If any of the invariants don’t hold, your answer is probably wrong.

The fundamentals of unit testing : Use factory methods!

Tests are just like normal code in the sense that repeating yourself can cause problems.  Practically every single call to new in a test method is a maintenance risk.

A Simple Illustration of the Problem

Say we have 30 tests, each directly invoking constructors like the following example:

[Test]
public void when_adding_a_valid_order_it_is_appended_to_audit_trail()
{
        var orderService = new OrderService(); // hello Mr. Constructor

        … // rest of test
}

It doesn’t look so bad but, if in 3 months, OrderService is changed to use constructor injection and its interface is changed to include a new “Start” method that must be called prior to certain methods being usable, we have a problem. Now we have 30 broken tests that must be fixed individually.  I once had to clean up hundreds of tests broken by a single constructor call.  It wasn’t fun.  Or productive.

A Solution

I say “a solution” as there are other ways to tackle this problem, such as using test framework setup methods / test data builders etc.  However, this is the easiest way to get started and is the approach I tend to reach for in the majority of cases.  I’ll blog about why I tend to eschew framework setup methods later.

We can insulate our tests from breaking changes by moving object creation to simple private factory methods. It is then often a case of updating the factory method once.

[Test]
public void when_adding_a_valid_order_it_is_appended_to_audit_trail()
{
        // Arrange
        var orderService = CreateOrderService();

        … // rest of test
}

private OrderService CreateOrderService()
{
        var orderService = new OrderService(new StubLogger(), new MessageGateway());
        orderService.Start();

        return orderService;
}

Benefits

This has tangible benefits. Firstly, it only takes maybe 5 seconds of work (extract method using ReSharper) to create the factory method, but it can save literally hours of work down the line.

Secondly, the tests and the creation of objects are separated to improve maintainability.

Thirdly, it often makes the test code easier to read. If the factory method were inlined in the test code, it adds a load of noise. All the test needs is a default configuration of the order service. We only worry about the details if it’s relevant. Most test fixtures only need a handful of factory methods; for those that require more varied configurations, check out the Test Data Builder and Object Mother patterns (though be wary of over-using the Object Mother pattern as not only does it have a stupid name, but it’s inflexible and has other problems) . 

An alternative to this approach is to use a parameterised factory method and/or optional parameters.

In terms of cost/benefit trade-offs, creating one or more factory methods is a simple, easy technique.  Now when constructors change, you don’t spend half an hour fixing up constructor calls.

The fundamentals of unit testing : KISS

KISS, (Keep it Simple, Stupid) is a guiding principle in software engineering and it’s of vital importance when writing tests.  Tests should be straightforward and contain as little logic as possible. 

I know from painful experience that trying to do something overly ‘clever’ in a test is a recipe for disaster.

Control Flow / Extra Logic in Tests

The moment you start changing the flow of a test based on some condition, you are setting yourself up for a fall.  Think about it:  we test a particular unit of production code because its logic may be flawed.  If the test that validates the production code tends towards the same level of complexity, there is a distinct and real risk that the test code will also contain errors.

Here’s a real example of a bad test that contains error-prone logic (please ignore the potential floating point comparison issues… I’ll touch on that later):

[Test]
public void IdentityTest()
{
        var m = Mtx4f.Identity;

        for(int r = 0; r < 4; ++r) // potential error; looping
        {
                for(int c = 0; c < 4; ++c) // potential error; looping
                {
                        if(r != c) // potential error; logical test / branching
                            Assert.AreEqual(m[r, c], 0.0f);
                        else  // potential error; logical test / branching
                        {
                                Assert.AreEqual(m[r, c], 1.0f);
                        }
                }
        }
}

Read more »

The fundamentals of unit testing : Repeatable

The main purpose of automated testing is to highlight problems with correctness in production code.  When a test fails, it means, “something is wrong here; we have a bug in our production code!”   I’m sorry about your canary.  It died a valiant death.

Tests Should be Repeatable

… so what do I mean when I say that tests should be repeatable?  By repeatable, I mean each test can be executed an arbitrary number of times without observing a change in the test results.  If a developer can run a unit test 100 times in a row and get two or more distinct results without altering any production/test code, the test is not repeatable. 

Why Should I Care?

If a unit test cannot yield the same result with every single test run, then it is non-deterministic and untrustworthy.

An unreliable, flaky test is analogous to a car alarm that has been going off for two days. There might be something wrong, but nobody will pay any attention to it. 

Most people who’ve worked with large test suites will have experienced a particular situation from time to time – the case where a handful of the tests intermittently fail.  Developers quickly recognise the problematic test name(s) and exclaim, “these tests sometimes fail for no reason”.  The next step is to click the “rebuild” button on the continuous integration server.  Without changing a thing, the tests pass.  Hooray! If you joined in with the cheering, then please punch yourself in the face (it was a ruse).

Read more »

The fundamentals of unit testing : Atomic

Every single unit test you write should be atomic. 

A definition

Atomic tests are order-independent, relying on and causing no side effects.  They neither rely on nor alter shared state; they do not mutate their environment. If a test run involves 2, 10, 100 or 1,000,000 tests, the tests should be able to be executed in any order.  If one legitimately fails because a bug has been introduced in the production code, it shouldn’t have a domino effect.

For example, if we have 3 tests named A, B and C, we should be able to run these tests in any order.  For a tiny set of 3 tests, there are already 6 ways of arranging the order of execution: ABC, ACB, BAC, BCA, CAB, CBA.  If the order ABC is fine, but the order BAC causes test C to fail, then one of the tests is non-atomic. 

Put short, a test should be built up and torn down cleanly, leaving no trace of the fact that it has been executed. 

Why is this important?

A good automated test has a descriptive name, is implemented using straightforward code and leads us to the cause of the failure without much hassle.  When dealing with well written atomic tests, at the very worst, diagnosing the cause of a failure should be a simple debugging job.

However, if non-atomic tests create knock-on effects it introduces a significant maintenance risk, as to diagnose failures, we must understand the whole picture.  Instead of just inspecting the local state that exists prior to executing the failing test, we have to start delving into environmental differences (machine configuration, file systems) and AppDomain issues (typically global / static state).

Read more »

The fundamentals of unit testing: Spring The Trap

This is a lesson I learned from the Pragmatic Unit Testing book.  Five years later and it’s still a tool that I frequently reach for – it’s simple and incredibly useful.

Who tests the tests?

I’ve had a few long-winded and borderline philosophical discussions with colleagues in the past where the question is asked: “Who tests the tests?”  In essence, how can we trust the tests to tell us that the production code is working correctly? Who’s to say the test isn’t bugged? If a test is bugged and erroneously passes, we have the worst of both worlds.  We have a piece of (potentially) broken functionality and we have a test that is offering false assurances.

Is this likely?

At first glance, it sounds a bit fanciful – I mean, what are the chances of writing a bad test? The truth is: It’s not a matter of if, it’s a matter of when. Everyone slips up now and again. Test code is code. Programmers habitually write incorrect code.

In my experience, it’s especially easy to write an incorrect test when you’re pre-occupied by something else (such as understanding & fixing a complicated bug) and aren’t really in the mood to question yourself before you check in and head home for a deserved nap.

I’ll put it more pointedly: if the tests are written after the production code, how do you know that the test is correct? Think about it, if your workflow follows this pattern: “create method that does X, write test for method to make sure X is correct, the test passes”, how can you have confidence that the production code and its tests work? The test has never failed and, as a result, you are on the happy path.

A common cause is copy-pasting an existing test and forgetting to change it to do something else by varying an argument or value. It happens. You now have two identical tests – one is misleadingly named and both pass.

Reducing Errors

There’s two things you can do to increase the chance of a test being correct.

Read more »

The fundamentals of unit testing : Correct

Once you’ve decided what you’re going to test, chosen a good name and started to write the code, the next step is to make sure that the test is purposeful and correct.  This is a simple tip and largely common sense.

Does the test code’s functionality match its name?

It is simple to accidentally write a test that purports to test one thing, but actually tests another. E.g.

[Test]
public void model_is_loaded_correctly_from_stream()
{
        // Arrange
        var resourceLoader = CreateResourceLoader();
        var modelDataStream = CreateModelDataStream();

        // Act
        var loadedResource = resourceLoader.LoadFromStream(modelDataStream); // fine so far

        // Assert
        Assert.That(resourceLoader.LoadedResources, Is.EqualTo(1)); // buh?!
}

In this test, we can see that it claims to test that loading a model from a stream returns a valid resource. However, the test is then asserting that the loaded resources count has increased. It is not checking the returned resource for correctness.  Either the test name is out of date, or it is not testing the correct thing. 

If it is the former case, we should rename the test to “loading a model increments the loaded resources count”.  If it is the latter case, we should change the test such that we are asserting on the model’s state.  How do you know the model has been loaded correctly if you’re not inspecting its vertices, indices and UVs?

If we are misled by our test names, it is irritating because it looks like something is covered, but it’s not.  If we’re asserting on the wrong stuff, then our test won’t save us when the unit of code malfunctions.  Maybe it’s already bugged.

Is the Assert meaningful?

All tests should assert against some meaningful result. Put simply, if a test does not prove much, then it is not worth writing or maintaining.

Here is an example of a test that looks useful but, on closer inspection, is doesn’t really prove much at all.

[Test]
public void game_properties_are_correctly_serialized()
{
        // Arrange
        var currentLevel = 4;
        var playerName = "Steve";
        var game = CreateGame(playerName, currentLevel);               
        var serializer = CreateGamePersistence();
       
        using(Stream stream = new MemoryStream())
        {
                // Act
                serializer.Save(game, stream);
        }
        
        Assert.That(true, Is.True) // WHAT?!?!?
}

I’ve seen the “Assert true == true” thing numerous times.  If the assert is missing or tells us nothing useful, the chances are the test isn’t going to prove much, so why are some developers compelled to do it? My guess is that people think, “hmm, I’ve written a test with no asserts, there we go”.  If you ever get tempted to do this, don’t!  It’s a sign that something is wrong.

Note: it’s sometimes useful to have methods like this to act as surrogate Main() methods so we can invoke them via a test runner for easy debugging but, in general, it’s not a terribly good idea and you wouldn’t want to run these tests as a part of your normal development workflow.

The fundamentals of unit testing : Narrow & Focused

When it comes to writing unit tests, it pays to be specific.  

If you have a narrow & focused test, it will…

  • be easy to make a descriptive name for the test
  • perform the bare minimum of work required to achieve its goal. 
  • be easy to understand
  • be easy to maintain 
  • execute in a shorter period of time

An unfocused example

To give you an idea of what an unfocused test looks like, consider the following (and yes, I’ve seen plenty of these, both in professional [RTW] & open source code):

[Test]
public void StackTestA()
{
        var stack = new Stack<int>();
        Assert.That(stack.Count, Is.EqualTo(0));

        stack.Push(1);
        Assert.That(stack.Count, Is.EqualTo(1)); 

        stack.Push(2);
        Assert.That(stack.Count, Is.EqualTo(2)); 

        var p = stack.Peek();
        Assert.That(p, Is.EqualTo(2);
       
        stack.Push(3);
        Assert.That(stack.Count, Is.EqualTo(3)); 
       
        stack.Pop();
        Assert.That(stack.Count, Is.EqualTo(2)); 
       
        bool c0 = stack.Contains(1);
        Assert.That(c0, Is.True);
       
        bool c1 = stack.Contains(10000);
        Assert.That(c1, Is.False);
       
        stack.Push(3);
        Assert.That(stack.Count, Is.EqualTo(3)); 

        int popped = stack.Pop();
        Assert.That(popped, Is.EqualTo(3));
        Assert.That(stack.Count, Is.EqualTo(2));
}

Can you spot the problems? 

Read more »

The fundamentals of unit testing : It’s a skill

The first thing I’m writing about is probably the most important.  This is a bit of a meandering tale, but it is crucial to understanding the pitfalls of automated testing.

A Cautionary Tale

Back in the day at Realtime Worlds (the best part of 5 years ago), when it came to automated testing we were trying to pull ourselves up by our bootstraps.  Most people (including myself) didn’t have much of a clue about automated testing.  It was a pretty bold move for a games company to try it on a large scale, but it took a very long time for it to pay any benefits.  In fact, given the lackadaisical approach and lack of buy-in, it probably had an overall negative effect in the majority of areas.

In my first job fresh out of university, I had the nominal title of Junior Test Engineer.  At first, it was our job to try and write extra tests that developers missed, debug failing tests and generally take care of the build.  I soon learned to be very cynical.

As you can imagine, it wasn’t very effective (hint: the larger the distance between a person’s actions and the consequences of said actions, the less of an interest they will take).  I would characterise it as a baptism of fire.  Thanks to spending all day, every day, on automated testing, I acquired a large amount of knowledge by trial and error and learning with my colleagues.  That knowledge was eventually ploughed back into the team in a supporting role.

Read more »

The fundamentals of unit testing (series)

I’ve not written about testing for a while, but I still enjoy writing automated tests (I am boring) and trying to coach people on good testing practices (I am boring), so hopefully this series will be a useful refresher.

I plan to write roughly ~15 posts on the fundamentals of good unit testing.  By good, I mean useful, trustworthy and maintainable.  These tips are for new through to intermediate developers; they’re not designed to catch the attention of seasoned automated testers, but there may be the odd nugget that’s useful for older hands.

Here’s the list of things I’m going to cover (I’ll update the links as I add ‘em):

Overview:

General principles to strive for:

General testing tips: