Category Archives: debugging

Functional-style programming – Easy debugging!

Ever since LINQ arrived in C# land, a lot of people have been using a functional style of programming, perhaps without even being conscious of it. 

I always chuckle when I read the words “functional style”, because I picture grim marches across a barren landscape, brutalist architecture and people with beards. 

One particular facet of programming in a functional style is where, rather than mutating an existing piece of data, input data is transformed / projected into something new via a pure function.  Since this style of programming limits the number of side-effects, it makes debugging a comparatively wonderful experience. 
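As a tiny illustrative sketch (the `Order` type and the 20% uplift are invented for the example), compare mutating data in place with projecting it into something new:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public decimal Total { get; set; }
}

public static class Example
{
    public static void Main()
    {
        var orders = new List<Order> { new Order { Total = 10m }, new Order { Total = 20m } };

        // Mutating style: changes the input in place (a side-effect)
        // foreach (var order in orders) { order.Total *= 1.2m; }

        // Functional style: a pure projection into new values; `orders` is untouched
        List<decimal> withUplift = orders.Select(o => o.Total * 1.2m).ToList();

        Console.WriteLine(string.Join(", ", withUplift));
    }
}
```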

Read more »

The wonders of Debugger.Launch()

Ever worked on a project that involved spawning new .NET processes? (as in, one [arbitrary] program launches another .NET executable)  I’ve had to do this on quite a few occasions over the years and the one thing that always saves my bacon when it comes to understanding and fixing bugs is Debugger.Launch().

A common scenario is as follows:

  • Program / Script A is launched
  • Program / Script A makes a call to launch .NET Program B (e.g. via Process.Start())
  • .NET Program B throws an exception and it’s not immediately clear why.
  • The world ends. 

If you’re the author of program B, simply insert a call to Debugger.Launch() inside Main().  The program will halt execution and prompt you to attach the debugger.  You can then examine the conditions and fix the bug. 
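A minimal sketch of what that looks like (the DEBUG guard is my own precaution, not a requirement):

```csharp
using System.Diagnostics;

public static class ProgramB
{
    public static void Main(string[] args)
    {
#if DEBUG
        // Halts here and shows the "attach a debugger" prompt before anything else runs
        Debugger.Launch();
#endif
        // ... the rest of the program, now inspectable from the attached debugger
    }
}
```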

Another thing to try is a small helper that wraps the program’s invocation in a try/catch block, complete with an #if-guarded call to Debugger.Launch() in the catch.  This allows you to connect to the crashing process without needing lots of boilerplate code.

I wouldn’t recommend using this for production code in case you forget to remove it (add conditional #if guards and whatnot), but it’s nice to have in the toolbag nonetheless.

E.g.:

using System;
using System.Diagnostics;

public static class AttachableRunner
{
    public static void RunWithDebugger(Action _action)
    {
        try
        {
            _action();
        }
        catch (Exception)
        {
            // Prompt to attach a debugger, then rethrow, preserving the original stack
            Debugger.Launch();
            throw;
        }
    }
}

This does just fine for debugging most problems. 

Note: It won’t attach properly if your program causes an unhandled exception to be thrown on a thread other than the main thread. I’ll leave that as an exercise for the reader (*cough* AppDomain.UnhandledException can help, though it won’t give you much useful information beyond the standard call stack).
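For the curious, a rough sketch of that approach — register the handler early in Main; note that by the time it fires, the faulting thread is already unwinding, so you mostly get the exception object and its call stack:

```csharp
using System;
using System.Diagnostics;

public static class Program
{
    public static void Main(string[] args)
    {
        // Fires for unhandled exceptions on any thread, including non-main threads
        AppDomain.CurrentDomain.UnhandledException += (sender, eventArgs) =>
        {
            Debugger.Launch();
            Console.Error.WriteLine(eventArgs.ExceptionObject);
        };

        // ... rest of the program
    }
}
```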

Unity3d – Profiling calls to Destroy

Destroy All … Things

I’ve recently been investigating some performance problems in our Unity3d app.  In one specific instance (running full screen on a rubbish laptop), there were numerous large performance spikes caused by the nebulous-sounding “Destroy”.  A little prodding revealed that “Destroy” relates to calls to GameObject.Destroy().

Unfortunately, Unity3d’s profiler won’t give you any information related to what types of objects are being destroyed, and wrapping your GameObject.Destroy calls in profiler sections doesn’t help, as Unity3d defers the work for later in the frame (so the methods return nigh-on immediately).  As such, you get told “Destroy is taking x ms this frame”, and that’s about it.
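To see why the naive wrapping fails, consider this (hypothetical) instrumentation — `m_mesh` is just a stand-in field:

```csharp
Profiler.BeginSample("Destroy: Mesh");
GameObject.Destroy(m_mesh); // returns almost immediately; the actual teardown is deferred
Profiler.EndSample();       // so this sample records ~0ms, and the real cost still shows
                            // up later in the frame as the anonymous "Destroy" entry
```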

Finding the culprits with the profiler

I managed to work around this limitation by (temporarily) changing all GameObject.Destroy() method calls to GameObject.DestroyImmediate() then wrapping the calls in Profiler.BeginSample() / Profiler.EndSample() pairs.

Note: If you access your unity instances in the same frame after calling destroy, this probably won’t work for you.

It was then possible to see which resources were doing the damage on the laptop.  All of our resource cleanup code was in one place, so it was trivial to do.

The temporarily instrumented code ended up looking something like this, and immediately let us know the culprits.  Note this code is just a simplified mockup, but it should give you the gist of the idea:

// centralised resource cleanup makes profiling simple
private void CleanupResources<TResource>()
{
    Profiler.BeginSample("Destroy: " + typeof(TResource).Name);
    IEnumerable<TResource> resources = FindResourcesOfType<TResource>(); // hypothetical lookup helper
    foreach(var resource in resources)
    {
        resource.Dispose();
    }
    Profiler.EndSample();   
}

//… and each Resource type inherits from a common base class, implementing IDisposable.
public abstract class Resource : IDisposable
{
    protected abstract void CleanupUnityResources();
   
    public void Dispose()
    {
        CleanupUnityResources();
    }
}

public class SomeResource : Resource
{
    private Mesh m_unityMesh; // gets set when resource is locked in
   
    protected override void CleanupUnityResources()
    {
        // was: GameObject.Destroy(m_unityMesh);
        GameObject.DestroyImmediate(m_unityMesh);
    }
}

Unity3d – Threadpool Exceptions

A quickie, but something to be very wary of.  I’ve been using Unity3d of late (I recommend it – it’s a very opinionated and sometimes quirky bit of software, but it generally works well) and I was recently tasked to parallelise some CPU intensive work. 

I decided, quite reasonably, to use the built-in ThreadPool rather than doing my own explicit management of threads, mainly because the work we’re parallelising is sporadic in nature, and it’s easier to use the ThreadPool as a quick first implementation.  So far, so good.  Everything was going swimmingly, and it appeared to work as advertised.  In fact, the main implementation took less than a day.

Most .NET developers who are familiar with doing threading with the ThreadPool will know that, post .NET 1.1, if an unhandled exception occurs on a ThreadPool thread, the default behaviour of the CLR runtime is to kill the application.  This makes total sense, as you can no longer guarantee a program’s consistency once an unhandled exception occurs. 

To cut a long story short, I spent about three days debugging a very subtle bug with a 1 in 20 reproduction rate (oh threading, how I love thee).  Some methods were running and never returning a result, yet no exceptions were reported. 

Eventually I reached a point where I’d covered nearly everything in my code and was staring at a Select Is Broken situation (in that Unity3d had to be doing something weird).

Unity3d silently eats ThreadPool exceptions!  I proved this by immediately throwing an exception in my worker method and looking at the Unity3d editor for any warnings of exceptions – none was reported (at the very least, they should be sending these exceptions to the editor). 

I then wrapped my worker code in a try catch and, when I finally got the 1 in 20 case to occur, sure enough, there was an exception that was killing my application’s consistency.  So yes, I did have a threading bug in my application code, but Unity3d’s cheerful gobbling of exceptions meant the issue was hidden from me.
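A sketch of the wrapper I ended up wanting — the type and parameter names here are invented, not a Unity or .NET API — which captures any exception on the pool thread and hands it to a callback instead of letting it vanish:

```csharp
using System;
using System.Threading;

public static class SafeThreadPool
{
    // Queue work such that exceptions are captured rather than silently eaten.
    public static void QueueUserWorkItem(WaitCallback work, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            try
            {
                work(state);
            }
            catch (Exception e)
            {
                onError(e); // e.g. log it, or stash it for the main thread to rethrow
            }
        });
    }
}
```

Usage from Unity code might then be SafeThreadPool.QueueUserWorkItem(DoWork, e => Debug.LogException(e)), where Debug.LogException is Unity’s logger.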

I’ve reported the issue, so hopefully it’ll get fixed in a future release.

Note: To the people who say “you should be handling those exceptions in the first place”, I would say, “not during development”. When developing, I generally want my programs to die horribly when something goes wrong, and that is the expected behaviour for the .NET ThreadPool.  Unhandled exceptions make it clear that there’s a problem and it means the problem must be fixed.

The big block method (binary search)

Have you ever been in this situation? You have thousands of tests in scores of assemblies.  All of the tests pass.  However, when you run the test suite a second time without closing NUnit (or your test runner of choice) you find hundreds of failures occur in a specific area.  I’m not talking about in the same fixture or even the same assembly; this is NUnit wide. Something is trashing the environment, but there are no obvious warning signs.

So, we have thousands of tests — the problem could be anywhere.  The answer is obviously not “look through all the tests” or “disable one project at a time”, there has to be an easier way…

Unrelated, but applicable

This just happened to me, but tracking down the culprit wasn’t as bad as you’d think.

Something I learned as a budding level designer (circa 1999) was how to find a leak in my level.  A leak in level design occurs when the world is not sealed from the void.  A decent analogy is to imagine the inside of the level as a submarine and the walls as the ‘hull’ — if there is a gap anywhere in the hull, the water will get in; it will leak.

A leak could be something as tiny and obscure as a 1 unit gap between a wall and a floor.  Most walls are 128 to 256 units, so a 1 unit gap is very small.  Even now, it’s not really feasible to find one in the editor unless you know exactly where it is.

Half-Life’s goldsrc engine was BSP based; the visibility computations were performed at map compile time.  A failure to build VIS data meant that your level caused the game to run at about 3 frames per second.

Unfortunately, tools were really … rudimentary back then. These days pretty much every editor has built-in pointfile loading (meaning it will take you directly to the leak!), but back then, you had to be creative.

The big block method

Back when tools weren’t so great, to find leaks in a level, I used the big block method.  It’s a very simple technique.  Say we have a rubbish, leaky level like so (top down view):

A leaky level, yesterday.

If one of those connections between walls/floors/ceilings/whatever is not tight, it will leak.  We cannot see the site of the leak using our eyes.  We cannot be sure where the leak is by simply scrutinising each wall joint or entity.  What we can do instead, though, is place a big block over ~50% of the level.

The red area is a newly created solid block covering ~50% of the map.

If we compile and find that the leak has disappeared, we know the leak was definitely in the area now covered by the block.  On the other hand, if the leak is still present, it’s in the other 50% of the level that remains uncovered.  To home in on the problem area, all we have to do is recursively add blocks within it:

A smaller block, half the size of the previous one, has been added.

We then recompile and check whether the leak has disappeared, as before.  Notice that in two steps, we’ve narrowed the problem’s location down to an area 25% of the original size!  The next step will yield a further 12.5% reduction.  We quickly home in on the problem.

After I started programming, I realised that the leak-finding method I used as a level designer is a simple binary search.

Same thing, different discipline

Applying the same principle to finding problem tests or code is simple!  Divide and conquer.

Open the NUnit test project file and remove 50% of the projects (in my case, I kept the assembly containing the tests that failed, as I needed to see them fail on the second run to know the problem had occurred).  Run the tests twice to see if the failure occurs on the repeated run.  If it does, you know the culprit is among the assemblies still enabled.  If the run is clean, the problem is in the half you removed.

It’s then a case of whittling it down in the same way — disable a further 25% of your assemblies, run the tests twice and check the result.  Rinse, repeat.

Eventually you will (most likely!) be down to two assemblies — the assembly that exposes the problem and the one containing the problem itself.  If there’s a large number of tests and fixtures in the assembly you’re scrutinising, disable half of the fixtures and repeat the process.  You will rapidly converge on a fixture and, finally, the test that causes the problem.  From then on, it’s just standard debugging.
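The whole procedure can be sketched as a generic bisection over a list of suspects (all names invented; `repeatRunIsClean` stands in for “run the suite twice with only this subset enabled, plus the failing fixture, and report whether the second run passes”):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Bisector
{
    // Finds the single suspect whose presence makes the repeat run fail,
    // using O(log n) test runs instead of O(n).
    public static T FindCulprit<T>(IList<T> suspects, Func<IList<T>, bool> repeatRunIsClean)
    {
        int lo = 0, hi = suspects.Count;
        while (hi - lo > 1)
        {
            int mid = lo + (hi - lo) / 2;
            var firstHalf = suspects.Skip(lo).Take(mid - lo).ToList();
            if (repeatRunIsClean(firstHalf))
                lo = mid;   // that half is innocent; the culprit is in the other half
            else
                hi = mid;   // failure reproduced with only this half enabled
        }
        return suspects[lo];
    }
}
```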

In my case, the culprit ended up being a single line of code calling into a method that has been a long-standing part of our code base.  It looks totally innocuous, and there is absolutely no way I’d have found it so quickly without dividing and conquering.

From the top level, with scores of assemblies and thousands of tests, it may as well be a 1 unit gap.