Archive for May, 2009

Visual Studio 2008 ordering fail

Posted in c#, rants, software on May 30th, 2009 by Mark Simpson

Something I’ve just noticed is that Visual Studio falls far, far short when it comes to solution ordering.

Not only does it fail abjectly by ordering things arbitrarily, but it doesn’t even allow you to drag things around and order them manually.  The end result is a frustrating experience, especially when you’re trying to make your solution neat, tidy and easily browsable.

I don’t understand why this is the case, because when you move projects around they are ordered alphabetically just fine.  Save, close and then re-open the solution and your projects will lose most of their ordering.

It’s akin to raking leaves into a neat pile then sitting back to admire your handiwork, only for Steve Ballmer to crash through your garden fence, scrape the rake across your shins and then kick leaves in your face while bellowing “WOOOOOOO YEAH!”.

Well not really, but still.  It’s annoying.

Success

Here’s one I just created.  It looks grand.  Everything is ordered just as it should be.

Success!


Failure

Here’s the same solution after it has been closed and re-opened.

Fail!


The ordering is FUBAR.  There’s an issue on Microsoft’s VS product feedback page about this and it was duly ignored by the looks of things :(

How can something so simple not just, well, work?  Anyone know of any workarounds to enforce ordering?

What’s in a name?

Posted in testing, tips on May 15th, 2009 by Mark Simpson

One of the things I try to encourage is the careful selection of names.  Just as self-documenting code is easier to read, so is a self-documenting test.  As I have previously stated, unit testing is programming, too — you can apply the same good practices to tests.

On numerous occasions I’ve had to review some code and tests.  I open the classes and find well-named, self-documenting, lovingly crafted, carefully designed code.  Then I look at the tests for that code and find that the same principles have not been applied to them.  In fact, it’s almost like the tests have been written by Evil Chuck, the programmer’s alter-ego.

It’s not uncommon to see something like the hugely descriptive and easy to read:

[Test]
public void TestMethodName()
{
    // ...
}

.. the excellent

[Test]
public void TestMethodName4()
{
    // ...
}

.. and the ubiquitous

[Test]
public void TestConstructor()
{
    // ...
}

This is not a good state of affairs!

Problems with badly named tests

Numerous problems exist with a name like “TestSomeMethod4B”.  Here is a selection of good reasons not to name your tests like this:

Readability
It’s not descriptive.  In fact, it’s totally obscure.  It might as well not exist.  What can you say about “TestSomeMethod4B”? Nothing much.  Even if it is documented with a comment, it hinders readability in the IDE and the test runner.  It’s better to name something descriptively and without a comment than the other way around.  That’s not to say that comments and descriptions are redundant, but in most cases you don’t need them if the test is descriptively named.

Intent
It says absolutely nothing about the intent.  It might be checking an exception is thrown, it might be checking a value is clamped to an accepted range of values.  It might be doing nothing.  To be able to decipher its intent, you have to read the code.  Imagine if you couldn’t understand the intent of any part or method of a program.  How would you manage to break it up into understandable chunks?  Answer: with great difficulty.

Obfuscation
It defeats any kind of attempt to understand the state and thoroughness of the testing for that class as a whole.  You cannot obtain an overview.  You can’t determine whether you’ve tested a method with 10 different invalid parameters or whether you’ve tested 10 different simple cases.  Without good naming you are reduced to skimming the code while trying to remember too much.  Just as well-named subroutines aid comprehension of a larger problem, well-named tests give a good overview without forcing you to examine the contents of those tests.

Redundant Prefix
In 99% of cases, if a public method is part of a test fixture and it has a [Test] attribute, it’s a test.  You don’t need the Test prefix.  It just hinders the skimming of the tests in alphabetic order.  If you really insist on putting “Test” somewhere, put it at the end of the name.  I personally wouldn’t bother, though.

What are you doing?
Finally and arguably most importantly, if you don’t name your test well, there is a greater chance that you don’t know what the test is trying to achieve. Would you start to write a production code method without any inkling as to what it did?  Even if you did, would you then leave it in existence with the name “DoSomeStuff”?

Badly named tests are a smell.  In my experience, well-named tests are nearly always attempting to prove something regardless of the quality of the test body.  I cannot say the same for badly named tests.

Good intent and bad execution is often better than a tidy, aimless test.  The former can be refactored into something useful, the latter requires decryption just to understand why it exists in the first place.  In many cases, the test proves nothing.

If you’re about to write a test method and can’t think of a name that describes it, stop.  Think about what you’re trying to achieve, then choose a name that describes it adequately.

How do I choose a good name?

First and foremost, what are you trying to prove with the test?  Tests are meant to demonstrate something.  They are meant to assert that some meaningful state is set, or some sort of interaction has taken place.

You need at least three pieces of information to name a test.

  1. The thing that is being tested, such as a function, method or property (or a sequence of them)
  2. The arguments/data/circumstance involved
  3. The expected outcome.  What is meant to happen?  Be as explicit as you can.

That’s it.  That’s all you need.  I prefer to write mine in the format 1_2_3, but I’m sure everyone has their own personal style.  As long as it’s readable and consistent, I don’t care.

Compare and contrast

Here are a few examples of good test names for a bounding box class.

  • AddPoint_ValidPointOutsideExistingBounds_IncreasesBoundsToContainPoint()
  • AddPoint_ValidPointInsideExistingBounds_DoesNothing()
  • AddPoint_ValidPointOnEdgeOfBounds_DoesNothing()
  • AddPoint_InvalidPointContainingNaN_DoesNothing()
  • AddPoint_InvalidPointNull_ThrowsException()

Now compare those to equivalent, but badly named tests:

  • TestAddPoint()
  • TestAddPoint2()
  • TestAddPoint3()
  • TestAddPointBad()
  • TestAddPointBad2()

One set is descriptive, easily graspable and allows you to skim the members list to get a good feel for the thoroughness of the tests.  The other set of names tells us very little.
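
To make the 1_2_3 format concrete, here is a sketch of how the well-named set above might look as an NUnit fixture. It assumes NUnit is referenced, and the `Point2` and `BoundingBox` types (including the choice to ignore NaN points and throw `ArgumentNullException` for null) are hypothetical stand-ins written only so the tests have something to exercise; they are not taken from any real codebase.

```csharp
using System;
using NUnit.Framework;

// Hypothetical 2D point; a NaN coordinate makes a point invalid.
public sealed class Point2
{
    public Point2(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }
    public bool IsValid => !double.IsNaN(X) && !double.IsNaN(Y);
}

// Hypothetical axis-aligned bounding box with just enough AddPoint to test.
public sealed class BoundingBox
{
    public BoundingBox(Point2 min, Point2 max) { Min = min; Max = max; }
    public Point2 Min { get; private set; }
    public Point2 Max { get; private set; }

    public void AddPoint(Point2 point)
    {
        if (point == null) throw new ArgumentNullException(nameof(point));
        if (!point.IsValid) return; // NaN points are ignored
        Min = new Point2(Math.Min(Min.X, point.X), Math.Min(Min.Y, point.Y));
        Max = new Point2(Math.Max(Max.X, point.X), Math.Max(Max.Y, point.Y));
    }
}

[TestFixture]
public class BoundingBoxTests
{
    // Every test starts from the box (0,0)-(1,1).
    private static BoundingBox CreateUnitBox() =>
        new BoundingBox(new Point2(0, 0), new Point2(1, 1));

    private static void AssertIsUnitBox(BoundingBox box)
    {
        Assert.That(box.Min.X, Is.EqualTo(0.0));
        Assert.That(box.Min.Y, Is.EqualTo(0.0));
        Assert.That(box.Max.X, Is.EqualTo(1.0));
        Assert.That(box.Max.Y, Is.EqualTo(1.0));
    }

    [Test]
    public void AddPoint_ValidPointOutsideExistingBounds_IncreasesBoundsToContainPoint()
    {
        var box = CreateUnitBox();
        box.AddPoint(new Point2(2, 3));
        Assert.That(box.Max.X, Is.EqualTo(2.0));
        Assert.That(box.Max.Y, Is.EqualTo(3.0));
    }

    [Test]
    public void AddPoint_ValidPointInsideExistingBounds_DoesNothing()
    {
        var box = CreateUnitBox();
        box.AddPoint(new Point2(0.5, 0.5));
        AssertIsUnitBox(box);
    }

    [Test]
    public void AddPoint_ValidPointOnEdgeOfBounds_DoesNothing()
    {
        var box = CreateUnitBox();
        box.AddPoint(new Point2(1, 0.5));
        AssertIsUnitBox(box);
    }

    [Test]
    public void AddPoint_InvalidPointContainingNaN_DoesNothing()
    {
        var box = CreateUnitBox();
        box.AddPoint(new Point2(double.NaN, 5));
        AssertIsUnitBox(box);
    }

    [Test]
    public void AddPoint_InvalidPointNull_ThrowsException()
    {
        var box = CreateUnitBox();
        Assert.Throws<ArgumentNullException>(() => box.AddPoint(null));
    }
}
```

Reading only the method names in a test runner already tells you which cases are covered and which outcome each one expects.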

To those who say “I don’t like long method names”, I say, “It’s a test.  You write it once and read it hundreds of times.  You never have to call into it from other code.  Your screen is plenty wide enough to accommodate it.”  There is no reason to use short or bad test names ‘just because’.  I’ve yet to hear any meaningful criticism of giving test methods long names.

As the famous nerd quote goes (paraphrasing):

“When I wrote this code, only God and I knew what I was doing.  Now only God knows”.

Just think about the poor sod who has to maintain your code in a couple of years. If you didn’t know what the hell you were doing, what are they going to make of it?

Other Advantages

If I have some good ideas about how to test something, or if I’m writing my tests before the production code, I will often use the 1_2_3 naming system to write out scores of empty test bodies.  You may be surprised to see how effective this is.

You can plough through a group of methods in no time at all, thinking of all of the horrible things that could go wrong and writing them down.  Before long, you can have a comprehensive test suite in waiting.  The names are there, the intent is clear and you can proceed.

This is also a great tactic for division of labour when you’re testing old code.  Prototype the test bodies, check in the skeleton fixture(s) and multiple people can get cracking on different areas.  I’ve done this quite successfully in the past.
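
A skeleton fixture of this kind might look like the sketch below, assuming NUnit is referenced. `BoundingBox.Intersects` and the listed cases are hypothetical. Using `Assert.Inconclusive` in each placeholder body is an addition of ours rather than part of the tactic described above: an empty test body passes in NUnit, so marking unwritten tests as inconclusive keeps them from masquerading as green.

```csharp
using NUnit.Framework;

// Skeleton fixture for a hypothetical BoundingBox.Intersects(BoundingBox) method.
// The names carry the intent; the bodies are still to be written.
[TestFixture]
public class BoundingBoxIntersectsTests
{
    private const string NotWrittenYet = "Test body not written yet";

    // Assert.Inconclusive stops an unwritten test from being reported as a pass.
    [Test]
    public void Intersects_OverlappingBox_ReturnsTrue() => Assert.Inconclusive(NotWrittenYet);

    [Test]
    public void Intersects_BoxSharingOnlyAnEdge_ReturnsTrue() => Assert.Inconclusive(NotWrittenYet);

    [Test]
    public void Intersects_DisjointBox_ReturnsFalse() => Assert.Inconclusive(NotWrittenYet);

    [Test]
    public void Intersects_BoxContainedEntirelyInside_ReturnsTrue() => Assert.Inconclusive(NotWrittenYet);

    [Test]
    public void Intersects_NullBox_ThrowsException() => Assert.Inconclusive(NotWrittenYet);
}
```

Once a fixture like this is checked in, each person picking up a test only has to replace one `Assert.Inconclusive` line with a real body.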

Benefits of designing for testability

Posted in patterns, testing on May 11th, 2009 by Mark Simpson

When I started my job as a Software Test Engineer, I had very little knowledge about unit testing.  I had a good degree award and a load of acronyms to put on my (in retrospect, rather horrible) CV.  I thought I knew a bit about design, encapsulation, patterns, object-oriented programming and all the rest.  With a little trepidation, I felt I was ready to face the world as a programmer.

I applied for a few programming jobs at Realtime Worlds and did not succeed.  After the first rejection, I did not become disheartened.  I did what any sensible young chap would do — I went back to the drawing board.  I continually improved my skill, learned c#, created some new programs and re-wrote old programs with the knowledge I’d gleaned.  Every so often I’d check the website or speak to my friends who worked there, asking if anything suitable was potentially coming up.  Eventually, I landed at Realtime Worlds as a Software Test Engineer.

In an ideal world, I would’ve succeeded first time.  After all, it was my goal to be a software engineer.

Picture the scene

My tremendously awesome CV plopped onto the doormat, awaiting the arrival of the hiring manager.  At 9am on the dot, that fine fellow picked it up and was instantly startled.  “Oh my!”, he cried, “Who is this remarkable young gentleman programmer who wishes to join our fine establishment?”

The hiring manager sprinted up the stairs and burst into a CV triage meeting, cheeks purple and lungs wheezing, before hurling my golden-tinted paper across the room.   “Look! Look!  We’ve found him”, he honked.  The lead programmers threw their hats in the air, linked arms and then danced a merry dance.

The search was over, and the party went on long into the night (though the party did involve programming a Spectrum emulator for the 3DO).

It would’ve been great.  Well, in many ways yes.  In other ways, no.  Hindsight being 20/20, I am actually grateful that I didn’t get my first job doing normal development.  Why?

Things wot I knew

Firstly, bear in mind the fact that I said I thought I knew about design, encapsulation, this, that and the other.  I did know a little bit, but I knew precisely nothing in the grand scheme of things.  Here’s the lowdown:

My knowledge of patterns was the singleton and some others.  I did know some others, but it may as well have just been “singleton”.  Lalala, I can’t hear you.  I remember the shock when my friend linked me to the “Singleton considered stupid” article prior to getting a job.  “But they’re my best mates!”, I gasped.  “Yeah, but they’re stupid”, he replied, before jabbing me in the eye with a stick and berating me for my incompetence.

My idea of simplifying problems by breaking them into smaller systems usually involved multiple managers all interacting via singletons.

My idea of extensible software was using horribly complicated, deep inheritance hierarchies everywhere.  Yeah let’s make this base class and then…

Pretty much everything I wrote was tightly coupled.  I thought that I had abstracted things away, but in general I just moved problems around.  Nearly every class relied on multiple custom, concrete types.  I never used factories.

I relied on implementation details.  I often reached into classes several levels deep.  House->GetKitchen()->GetSink()->GetTap();  I didn’t just break the Law of Demeter, I dropped it on the ground and used its smashed remains as a (crap) bouncy castle.
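
For anyone who hasn’t met the Law of Demeter, here is a rough C# rendering of that chain next to a version that only talks to its immediate collaborator. The `House`, `Kitchen`, `Sink`, `Tap` and `Plumber` types and their methods are hypothetical, invented to mirror the example above. The testability point is that anything calling the chained version drags the whole object graph into its tests.

```csharp
// Hypothetical types mirroring House->GetKitchen()->GetSink()->GetTap().
public class Tap
{
    public bool IsRunning { get; private set; }
    public void TurnOn() => IsRunning = true;
}

public class Sink
{
    private readonly Tap _tap = new Tap();
    public Tap GetTap() => _tap;              // exposes internals, inviting the "train wreck"
    public void RunWater() => _tap.TurnOn();  // the sink deals with its own tap
    public bool IsWaterRunning => _tap.IsRunning;
}

public class Kitchen
{
    private readonly Sink _sink = new Sink();
    public Sink GetSink() => _sink;
    public void RunWater() => _sink.RunWater();
    public bool IsWaterRunning => _sink.IsWaterRunning;
}

public class House
{
    private readonly Kitchen _kitchen = new Kitchen();
    public Kitchen GetKitchen() => _kitchen;
    public void RunKitchenWater() => _kitchen.RunWater();
    public bool IsKitchenWaterRunning => _kitchen.IsWaterRunning;
}

public static class Plumber
{
    // Breaks the Law of Demeter: this caller depends on House, Kitchen, Sink and Tap,
    // so any test of it has to build (or fake) the whole chain.
    public static void RunWaterByReachingIn(House house) =>
        house.GetKitchen().GetSink().GetTap().TurnOn();

    // Follows it: the caller only talks to the object it was given.
    public static void RunWaterByAsking(House house) => house.RunKitchenWater();
}
```

Both methods have the same effect today; the difference is how much of the structure the caller is allowed to know about, and therefore how much breaks when that structure changes.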

I could go on.  In short, as a dumb graduate, I was interested in my craft and enjoyed programming, but I had some bad habits and didn’t understand why a lot of the things I was doing were flat-out wrong, unmaintainable and diametrically opposed to the principle of least surprise.  However, sometimes you don’t find out about these things until you’re forced to broach a particular topic.

The reason I’m so glad to be a software test engineer is that all, and I mean all, of these coding horrors were laid bare when I started learning how to test properly.  When your code is fundamentally untestable, there is no denying it.

Sowing the seeds

As a newbie, I was tasked with writing tests for a lot of our existing codebase.  This meant I was exposed to a lot of different structures and idioms of production code written by everyone. Some of it was very easy to test; other parts not so much.

We’re a games company and unit testing is not something that has gained widespread acceptance in games.  Back then, it was no surprise that the results were variable.  I spent months writing loads of tests and it took me a long time to feel like I was doing it in a way that I was happy with.

Anyway, rewind back to when I sucked more than I suck now.  Even though I had no idea about what testability concerns should be, I quickly learned through doing.  I read articles and kept attempting to write better tests.

After a couple of weeks, I started rocking back and forth if I saw a static class.  “Oh God”, I’d cry, “What do I need to initialise then tear down this time?”

After a month, I hated the sight of any class containing scores of methods, lengthy method bodies, multiple indentations of control flow logic etc.  “Jeez, what do I need to do to hit this side of the double nested if statement that’s part of a switch which is called in a chain of 8 private methods?  If only the logic were split up into nicer chunks…”

After another month, I started to wonder why many of the tests I was writing were so slow, given that I was only interested in testing one class at a time.  “Oh man!”, I’d exclaim.  “Why do I have to create all these slow things?  Bah, I can’t even get at the logic I want to test! If only I could instantiate only what I need and totally control the test environment…”

After another month, I reasoned that if an interface were to be provided and the dependencies could be abstracted away into ‘seams’, it made the code infinitely more testable.  “Hmm, I see why loose coupling is good…”

After another month, I wondered why some of the tests were so fragile.  When some system or other changed, the tests for an unrelated class would fail!  “Oh… that’s why singletons are frowned upon.  The dependencies are hidden!”
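
The last two lessons can be sketched in a few lines. Below, a class that reaches for a singleton clock hides its dependency, while the same class taking an interface through its constructor exposes a seam a test can control. `IClock`, `SystemClock`, `FixedClock` and the two session classes are hypothetical names chosen for illustration, and a clock is just one example of an environment-bound dependency.

```csharp
using System;

// The seam: an interface for a dependency that is environment-bound in production.
public interface IClock
{
    DateTime UtcNow { get; }
}

// A singleton implementation, the kind of thing that gets reached for globally.
public sealed class SystemClock : IClock
{
    public static readonly SystemClock Instance = new SystemClock();
    private SystemClock() { }
    public DateTime UtcNow => DateTime.UtcNow;
}

// Hidden dependency: nothing in the constructor says this class reads global state.
public class SessionWithHiddenClock
{
    private readonly DateTime _expiresUtc;
    public SessionWithHiddenClock(DateTime expiresUtc) { _expiresUtc = expiresUtc; }
    public bool IsExpired() => SystemClock.Instance.UtcNow >= _expiresUtc;
}

// Explicit dependency: the clock is handed in, so a test decides what "now" is.
public class Session
{
    private readonly IClock _clock;
    private readonly DateTime _expiresUtc;

    public Session(IClock clock, DateTime expiresUtc)
    {
        _clock = clock ?? throw new ArgumentNullException(nameof(clock));
        _expiresUtc = expiresUtc;
    }

    public bool IsExpired() => _clock.UtcNow >= _expiresUtc;
}

// A test double whose time is entirely under the test's control.
public sealed class FixedClock : IClock
{
    public FixedClock(DateTime now) { UtcNow = now; }
    public DateTime UtcNow { get; set; }
}
```

Testing expiry on `SessionWithHiddenClock` means waiting for real time to pass; with `Session`, a test simply moves `FixedClock.UtcNow` forward.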

After another month, a workmate found Misko Hevery’s guide to testability and the penny dropped.  Like Google’s testing blog logo — it was like switching on a light bulb!

Since that day, I’ve kept learning more and more techniques to use as part of my development and testing arsenal.  Making my code more testable was the goal, but it has given me so much more.  Testability is a great thing, but as with all software engineering techniques, it is not a silver bullet.

The most important thing is that, with seemingly no concerted (separate) effort on my own part, the code I write to be testable is magically a lot better than the way I used to write it.

Inadvertent Benefits

As Luke Halliwell succinctly pointed out a while back, testability concerns and good design practices tend to converge.  If your code is testable, there’s a greater chance that the problem has been broken down into units of work.  Read his summary; it definitely coincides with my own experiences thus far.

Actively seeking out solutions to make code more testable has resulted in some extremely valuable lessons: it has exposed me to new techniques (such as Dependency Injection) and ways of thinking.   These didn’t just alter the way I tested, they fundamentally altered the way I approach the writing of software.  In the ~x or so years I’ve been programming, designing for testability has been my single most valuable expedition!

Am I suddenly the world’s greatest programmer?  Far from it.  I know scores of folk at work who can program circles around me.  On the other hand, is my code easier to understand, more maintainable, more cohesive and less tightly coupled compared to what I was writing a year or so ago?  Undoubtedly.  Are there fewer surprises?  You bet. Would anyone who had to maintain my code be inclined to hunt me down and murder me?  No.  They may perform some sort of grievous wounding, but I will live.

I can’t believe how much I sucked.  I definitely suck less this year, though.

L4D – Pleasure and Pain

Posted in games on May 7th, 2009 by Mark Simpson

Left4Dead is ridiculously good.  I can’t quite get over how tremendously visceral, tense, hilarious and fun it is.  I’m one of those people who tries games, but only latches onto one in a big way every couple of years; I haven’t loved a game like this since TFC or BF2.  I’ve already played over 200 hours of it; I can’t stop playing it.  It’s multi-faceted; it has different ways of playing to suit my moods.

I love the camaraderie and the shared experience.  Many of my friends have bought it and can’t stop playing it. We jabber away about last night’s games like a bunch of dullards.  “Haha, that tank that smashed a car through the front door and incapped us all”.  “Oh, did you see that pounce I did off the crane that knocked the other two survivors off the edge?”.  “haha, when we were all waiting to go up the ladder and a boomer dropped down right in the middle, then we all played pass the boomer”.

It’s a riot.  There’s just one problem: we absolutely hate it.

So near and yet so, so far

The premise, art, mood, gameplay and everything about it smacks of quality.  However, there’s nothing quite so galling as a flawed genius, and L4D is utter genius and utterly flawed.  “What’s the problem?” you ask?  Well, the problem is in the multiplayer matchmaking, or the total lack of it. If you’re playing campaign mode (4 players versus the AI) it’s perfectly acceptable.  If you’re playing versus mode (the mainstay of the multiplayer), it is utterly inadequate.

The game operates using lobbies.  Someone starts a lobby and is designated the “lobby leader”, then other players join the lobby.  Once 8 players have been shepherded into the lobby, the game is started.

It all sounds grand so far.  Unfortunately, Valve’s lobby system basically pulls in a bunch of random players. Now I’m sure it’s more complicated than that (region, skill, ping and some other things possibly come into it), but for all intents and purposes, it might as well be random.  That matters because if you have mates (and let’s face it, why would you play a multiplayer game without your mates?), you can join a common lobby together and take on all comers.

It still sounds reasonable, though, right?

Wrong.  The problem is that teams of friends versus teams of randoms has one inevitable outcome.  One team is likely to rack up the points while the other team gets utterly ruined.  Because it’s 4 v 4, if even one of the players on the other team is markedly inferior, a skilled group of players will always win.  There is no padding as there would be when playing in games with larger player numbers.  I’ve played games of BF2 in an unstoppable squad and still lost because there were too many passengers amongst our remaining 26 players (32 v 32!).  As a consequence of its design, that doesn’t happen in L4D.

This wouldn’t be such an issue, but there is seemingly no mechanism for skill matching, nor is there a mechanism for teams of friends to play other teams of friends.

L4D Versus Terminology

This has resulted in two terms entering the vernacular of any L4D player who has even had a cursory go at Versus mode:

Rage Quitter – Someone who leaves a game due to frustration and a build-up of hot salty tears.  A rage quit commonly occurs due to:

  1. Their team getting destroyed
  2. Someone else on their team leaving (see #1) resulting in a domino rage
  3. Incompetence on their team (see #1)
  4. Deciding that they don’t like French/Italian/English/Scottish/<insert nationality here> people all of a sudden (usually coincides with #1)
  5. Some kind of alleged cheating bullshit that is going on with the other team (usually imagined; see #1)
  6. Other people on their team being dicks

I suppose calling all quitters “ragers” is harsh.  Let’s face it: Most of us play games to have fun.  If our team is ridiculously bad or there is a huge skill chasm, it’s not fun to be dismantled for the best part of an hour.

Pub Stomper – Someone who plays as part of a group of friends/acquaintances.

Again, pub stomper is an unfortunate tag because it doesn’t accurately describe why (most) people play in a group.  In my case, I don’t like playing online games alone.  I’ve got single player for lone gaming; humans are more challenging to play against and half of the fun in online games is the interactions and replayability that come out of it.

I’m not hugely interested in winning, so I don’t “stomp” pubs for that reason.  The stomping just tends to fall out of the fact that I play the game a lot.

I will happily play the game with any of my friends.  It doesn’t matter if they’re terrible or a 2 day newbie who thinks the Witch just needs a hug, it’s still fun.  However, most of the friends I play with (both real life friends and from the Internets)  are playing most nights.  If you play, say, 4 nights of the week and those same friends also play a lot, it is only natural that the heaviest gamers will stick together.

If I were to suddenly stop typing this blog post and fancy a game of L4D, it’s more than likely that 2 of my best L4D buddies would be available to play.  The fairweather L4D friends would not.  They’d be off doing something else, like running through a meadow, doing charity work or cooing at kittens.

MAD

I’m sure you can see the problem with the clash of the two groups.  The problem is exacerbated because it’s a form of mutually assured destruction.  Just as it’s no fun having your random team systematically decimated by a bunch of autonomous friend-bots who second guess your every move, neither is it fun to get 5 minutes into a game, only for the other team to all rage because it’s a total mismatch.

To give you an idea of just how bad it can be, in one campaign, my friends and I went through 50+ players on the opposite side.  The campaign lasts much less than an hour in a one sided game, and yet we still had 50+ folk who joined and left.  That’s the team filled and re-filled about 12 times over (look at the Steam friends “recent games” player list if you want to see this).

I thought L4D had reached its nadir with the No Mercy 1 ragers (NM1 is like Counter-Strike’s “de_dust” in that it’s probably the only map a lot of people ever see…), but I was mistaken.  In the last week, my friends and I have started countless games on the new maps (mainly Dead Air).  We’ve completed about two of them.  In every single other instance, the whole team quits after maybe 2 rounds.  It usually goes like this:

Dead Air 1: They play infected; we make it as survivor.  They play survivor and die at the first plank.  One or two of them leave.  Maybe the spots are filled, maybe not.

Dead Air 2: They play infected; we make it as survivor.  They all leave.

This is not an exaggeration.  Sometimes they all leave after a single round. We then go back and start a new lobby and the process repeats itself.  We play the same maps time and time again, rarely even getting to play infected twice, let alone see the third map.

So, what can Valve do?

Firstly, and this should’ve been in from the start: introduce a better matchmaking system!  This should cater to two main groups:

1. Randoms.  Some people have no L4D friends and no time to make ‘em (or they have a few friends or acquaintances, but don’t spend a lot of time playing with them).  If 4 random players can be matched up against 4 other random players and the game isn’t a total mismatch, then that’s good enough.  Whether this is done by win/loss percentage, average score per round or some combination of multiple things, I don’t care.

2. Friends.  If you have two or three mates and want to have a fair / challenging game, then this would allow you to create a lobby with four people, then look for a matching group to play against.  This is a very simple idea, but it doesn’t yet exist.  Somebody made a good post about it on the Steam forums and it received widespread support, so let’s hope it happens.

Regardless of who is playing, Valve really needs to encourage people both to stay in games and to be nice to one another.  I don’t care how good someone is; if they are abusive then it is no fun playing with them.  Case in point: Last week, a random joined my friends and me for a game.  This guy was an OK player, but he spent a good 10 minutes berating a friend of mine.  Now, the person he was spouting abuse at doesn’t play too much, but at least he doesn’t act like a moron on the Internet.  We votekicked the abuser which was a decent fix, but some other unfortunate team were no doubt landed with him shortly thereafter.

If someone is repeatedly kicked from games or constantly leaves after a single map, it’s usually a sign that they’re not worth playing with.  I’ve been friends with a few people like this in the past and their outlook is that they’re right, everyone is against them and they must gob off at every opportunity.  Instead of speaking to people in a respectful manner, they just resort to abuse.  It has become their default setting.  A lucky shot or acquired skill becomes an instant cheat accusation.  Any kind of misunderstanding turns to flames.  It’s no fun to be around and, given a choice, nobody wants to play with these people.

Valve already delists servers that offer a bad player experience so I’d be very interested to see how they’re tackling the problem.  Could they apply many of the same principles to players?  Could they delist or at least categorise players that offer a bad player experience?  I think they could.

Valve can change your access levels to games when you cheat, but what about when you’re a twat?  Will they actually do it?  It’s debatable.  I’d love to see it happen, but the system would have to be robust, else folk would just game it.

The bottom line is that whatever happens, things have got to improve because it’s getting harder and harder to have fun.  It’s taking the shine off a remarkable game.

The big block method

Posted in c#, debugging, testing on May 2nd, 2009 by Mark Simpson

Have you ever been in this situation? You have thousands of tests in scores of assemblies.  All of the tests pass.  However, when you run the test suite a second time without closing NUnit (or your test runner of choice) you find hundreds of failures occur in a specific area.  I’m not talking about in the same fixture or even the same assembly; this is NUnit-wide. Something is trashing the environment, but there are no obvious warning signs.

So, we have thousands of tests — the problem could be anywhere.  The answer is obviously not “look through all the tests” or “disable one project at a time”, there has to be an easier way…

Unrelated, but applicable

This just happened to me, but tracking down the culprit wasn’t as bad as you’d think.

Something I learned as a budding level designer (circa ~1999) was how to find a leak in my level.  A leak in level design occurs when the world is not sealed from the void.  A decent analogy is to imagine the inside of the level is a submarine and the walls are the ‘hull’ — if there is a gap anywhere in the hull the water will get in; it will leak.

A leak could be something as tiny and obscure as a 1 unit gap between a wall and a floor.  Most walls are 128 to 256 units, so a 1 unit gap is very small.  Even now, it’s not really feasible to find one in the editor unless you know exactly where it is.

Half-Life’s goldsrc engine was BSP based; the visibility computations were performed at map compile time.  A failure to build VIS data meant that your level caused the game to run at about 3 frames per second.

Unfortunately, tools were really … rudimentary back then.  Picture Borat’s wife ploughing a field and then contrast the image with that of a bloody huge combine harvester.  That’s how far we’ve come.  These days pretty much every editor has built-in pointfile loading (meaning it will take you directly to the leak!) but back then, you had to be creative.

The big block method

Back when tools weren’t so great, to find leaks in a level, I used the big block method.  It’s a very simple technique.  Say we have a rubbish, leaky level like so (top down view):

A level, yesterday.

A leaky level, yesterday.

If one of those connections between walls/floors/ceilings/whatever is not tight, it will leak.  We cannot see the site of the leak using our eyes.  We cannot be sure where the leak is by simply scrutinising each wall joint or entity.  What we can do instead, though, is place a big block over ~50% of the level.

50% of the map is now covered

The red area is a newly created solid block

If we compile and find that the leak has disappeared, we know that the leak was definitely in the area that is now covered by the block.  On the other hand, if the leak is still present, it’s in the other 50% of the level that remains uncovered.  To hone in on the problem area, all we have to do is recursively add blocks to the problem area:

Recursively adding blocks half the size of the previous block...

A smaller block has been added

We then recompile and check to see if the leak has disappeared as before.  Notice that in two steps, we’ve narrowed down the problem’s location to an area of 25% of the original size!  The next step narrows it to 12.5%.  We quickly hone in on the problem.
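
The halving can be written down directly. Taking $A_0$ as the area of the whole map, and assuming each new block covers half of the area that could still contain the leak, the suspect area after $k$ compiles is:

```latex
A_k = \frac{A_0}{2^k}, \qquad
A_2 = \tfrac{1}{4}A_0 = 25\%\,A_0, \qquad
A_3 = \tfrac{1}{8}A_0 = 12.5\%\,A_0
```

Put another way, if the map were split into $N$ equally sized regions, isolating the leaky one would take at most $\lceil \log_2 N \rceil$ compiles, so 64 regions need only 6.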

Same thing, different discipline

Applying the same principle to finding problem tests or code is simple!  Divide and conquer.

Open the NUnit test project file and remove 50% of the projects (though in my case, I kept the assembly with the tests that failed, as I needed to see them fail on the second run to know the problem had occurred).  Run the tests twice to see if the failure occurs on repeated runs.  If they fail, you know your problem is in that group of assemblies.  If they pass, you know the problem is in the other half.

It’s then a case of whittling it down in the same way — disable a further 25% of your assemblies, run the tests twice and check the result.  Rinse, repeat.

Eventually you will (most likely!) be down to two assemblies: the one that exposes the problem and the one that causes it.  If there are a large number of tests and fixtures in the assembly you’re scrutinising, disable half of the fixtures and repeat the process.  You will rapidly converge on a fixture and, finally, the test that causes the problem.  From then on, it’s just standard debugging.
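
The whittling-down loop is an ordinary bisection, sketched below in C#. The `reproduces` callback is a hypothetical placeholder for the manual step: enable only the given assemblies (plus the one whose tests show the failure), run the suite twice, and report whether the second run failed. The sketch assumes there is exactly one culprit and that the failure depends only on whether it is enabled; like the big block method, it never re-tests the other half, it just infers the culprit is there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BigBlock
{
    // Returns the single suspect whose presence makes `reproduces` return true.
    // Each call to `reproduces` stands for one "edit project, run twice" cycle.
    public static T FindCulprit<T>(IReadOnlyList<T> suspects, Func<IReadOnlyList<T>, bool> reproduces)
    {
        if (suspects.Count == 0) throw new ArgumentException("No suspects.", nameof(suspects));

        var remaining = suspects.ToList();
        while (remaining.Count > 1)
        {
            // Enable only the first half; if the failure persists, the culprit is in it,
            // otherwise it must be in the half that was disabled.
            var half = remaining.Take(remaining.Count / 2).ToList();
            remaining = reproduces(half) ? half : remaining.Skip(half.Count).ToList();
        }
        return remaining[0];
    }
}
```

The same function works one level down: once the assembly is known, pass its fixture names as the suspects, then the tests within the guilty fixture.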

In my case, the culprit ended up being a single line of code calling into a method that has been a long-standing part of our code base.  It looks totally innocuous, and there is absolutely no way I’d have found it so quickly without dividing and conquering.

From the top level, with scores of assemblies and thousands of tests, it may as well be a 1 unit gap.