Data oriented design links

Posted in games, patterns on July 21st, 2010 by Mark Simpson – Be the first to comment

I’ve been doing a little reading around data oriented design of late and thought it was worth sharing some interesting links. Here’s my distillation of the reading I’ve done so far (caveat: I may be talking balls).

Prelude: Battling Dogma

All too often, games programmers butt up against dogmatic catch-all declarations of “virtual functions are slow!”  For the general case, this can be proved as a nonsense as virtual function calls are blatantly not slow!  They’re very, very fast.  However, if someone said instead, “virtual functions are slow when iterating over large collections of heterogeneous types because of cache misses” then that’s another matter, entirely.  Unfortunately, we all too often hear the former declaration rather than the latter.  It’s neither compelling (as we can prove it is incorrect in the general case) nor edifying.  Most programmers like to learn things, so it’s nice to read some illuminating articles about a touchy subject.

The gist of it

Data oriented design is based on examining the access patterns and transformations performed on data.  The code is then structured to make it data-centric by using a combination of changes to the type of data stored, the way it’s laid out in memory and the ordering of the data, amongst other things.  An instructive example of this is given in the BitSquid article where animation data is ordered by time, as this best fits the common access pattern that a game would use.  Another particularly useful example is where data structures are broken up into multiple parts so that more of the data used by common operations fits into the cache.

The main benefit is reducing cache misses, but a nice side effect is the increased opportunities for parallelisation.  A lot of this stuff is well-known and as old as the hills, but in my experience it’s often been bundled with dubious practices, so it’s nice to see some practical, tangible examples of when and why you should apply such techniques.

Links

Games from Within Article — A high level article; Noel works through some disadvantages of object oriented design and then cites some examples where data oriented design can be employed to speed things up.

Pitfalls of Object Oriented Programming – Tony Albrecht of Sony has some very interesting diagrams, slides, timings and statistics that lay out the costs of cache misses and branch prediction failures with very specific examples, then optimises via various means.

Practical Examples in Data Oriented Design — Bitsquid engine programmers dish out some examples of designing with data access in mind.  Higher level that the Sony presentation, but also very useful.

GameDev discussion thread — Generally useful discussion thread.  Has some code examples, too.

Typical C++ Bullshit — Code annotated by cranky post-it notes.  Not exactly an illuminating discussion or an article as such, but worth including for completeness.

Related

Game Entity Systems – The T=Machine blog has a series of posts on designing game entity systems.  One part of the series deals with processing homogeneous data and why this makes it fast / more easily parallelisable.

Terms that mildly displease me #243 & #244

Posted in misc, rants on June 9th, 2010 by Mark Simpson – Be the first to comment

#243 “Concretion”

Concretion sounds stupid.  Please, just say concrete type.

#244 “Action”, as in “he’s Actioning it as we speak”

Action is a noun.  I will not stand for this!

Castle DynamicProxy2 quirks

Posted in c#, gotchas, patterns, software on May 15th, 2010 by Mark Simpson – Be the first to comment

I’ve been faffing around with Castle.DynamicProxy2 a bit lately and it’s a pretty interesting bit of kit.  Castle Dynamic Proxy (CDP) allows you to dynamically generate proxies at runtime to weave aspects of behaviour into existing types.  Aspect oriented programming is typically employed for implementing crosscutting concerns such as logging, performance measuring, raising INotifyPropertyChanged and various other types of repetitive and/or orthogonal concerns.  I’m a newbie to this stuff so I won’t say much more on AOP.

While I really like CDP, I’ve found that the documentation and tutorials (the best of which is Krzysztof Koźmic‘s excellent tutorial series) aren’t particularly explicit on how CDP achieves its effects, and sometimes these details are important.

There are two main ways of creating proxies that most developers will encounter.

CreateClassProxy

This is nearly always the first demonstrated method in tutorials.  ProxyGenerator.CreateClassProxy dynamically subclasses the target class, so if you have a class named Pogo and you call ProxyGenerator.CreateClassProxy, what you’ll get back is an instance of a subclass of Pogo (i.e. the new type is-a Pogo) that weaves in the interception behaviour via overriding methods.  This is why it is a stipulation that methods / properties must be virtual when they’re intercepted.

With class based interceptors, you cannot intercept non virtual methods because unlike Java, C# does not make methods virtual by default.  If you try to intercept a non-virtual method, nothing will happen, though mechanisms do exist to allow you to identify these situations and warn the developer (the most common example of this is that NHibernate will die on its arse if you try to use lazy loading with a non-virtual member).

CreateInterfaceProxyWithTarget

The second method is ProxyGenerator.CreateInterfaceProxyWithTarget, and it is the primary reason for writing this blog post!  CreateInterfaceProxyWithTarget does not dynamically subclass target types, it simply creates a dynamically generated class, implements the same target interface and then passes through to it.  I.e. it’s an implementation of the decorator pattern.  This has two effects, one of which is very important!

  1. You don’t need to mark your methods/properties as virtual
  2. Since it is a proxy working via decoration rather than subclassing, for the effects of the interceptor to be applied, all calls must be made on the proxy instance.  Think about it.

The most salient point is #2.  I’ll elaborate.

A worked example: Rocky Balboa

Say you have an interface called IBoxer like this:

… and you implement it like this:

If you then turn to aspect oriented programming and decide to gather statistics on punches thrown for the duration of a boxing round, it’s a reasonable assumption that you can simply proxy the IBoxer interface and intercept only the StraightLeft/StraightRight punch calls, tally them up and report the metrics (ignore whether this is a good idea to be doing this, it’s a contrived example).  On the face of it this isn’t a horrible idea.  However, it won’t work as expected.

The key here is that OneTwo() calls through to StraightLeft() and StraightRight().  Once the proxy has delegated to the decorated type it loses the ability to intercept the calls.  We can follow the call graph easily enough.  We have a reference to the proxy via the IBoxer interface.   We call “OneTwo()” on it and when the invocation proceeds, it delegates to the decorated Rocky instance.  The Rocky instance then calls StraightLeft(), StraightRight().  Both of these calls will immediately go straight to the ‘real’ implementation, bypassing the proxy.

Just as with the normal decorator pattern, the decorator (the IBoxer dynamic proxy in this case) loses its influence when the decorated object calls methods declared in its own public interface.  In this particular situation we could write an interceptor that knows to add two punches when OneTwo() is called on the proxy, but compare this approach to one using class based proxy.  If we were using a class proxy we could rest safe in the knowledge that all calls to StraightLeft() and StraightRight() will always be intercepted, as the extra stuff we’ve bolted on resides in a method override.

The results vary depending on the type of proxy we generate and the way the types are written. In hindsight it’s pretty obvious, but it still caught me out.

Data-driven testing tricks

Posted in Uncategorized, c#, patterns, testing, tips on May 8th, 2010 by Mark Simpson – Be the first to comment

It’s a fairly common occurrence — somebody wants to use NUnit’s data driven testing, but they want to vary either the action under test, or the expectation.  I.e. they’re not parametrising simple data, they’re parametrising the actions.

You cannot encode these things via normal data-driven testing (short of doing really nasty things like passing string names of methods to be invoked or using enums and a dictionary of methods) and even if you use a hackish workaround, it’s unlikely to be flexible or terse.

Test readability is paramount, so if you have some tests written in an unfamiliar style, it is very important to express the intent clearly, too.

NUnit’s data-driven testing

NUnit uses a few mechanisms to parametrise tests.  Firstly, for simple test cases, it offers the [TestCase] attribute which takes a params object[] array in its constructor.  Each argument passed to the TestCaseAttribute’s constructor is stored, ready for retrieval by the framework.  NUnit does the heavy lifting for us and casts/converts each argument to the test method’s parameter types.  Here’s an example where three ints are passed, then correctly mapped to a test method:

The main limitation here is that we can only store intrinsic types.  Strings, ints, shorts, bools etc.  We can’t new up classes or structs because .NET doesn’t allow it.  How the devil do we do something more complicated?

Passing more complicated types

It would appear we’re screwed, but fortunately, we can use the [TestCaseSource] attribute.  There are numerous options for yielding the data, and one of them is to define an IEnumerable<TestCaseData> as a public method of your test class (it works if it’s private, but since it’s accessed via reflection it’s a good idea to keep it public so that ReSharper or other tools do not flag it as unused).  You can then fill up and yield individual TestCaseData instances in the same fashion as before.  Once again, NUnit does the mapping and the heavy lifting for us.

If you do not require any of the fancy SetDescription, ExpectedException etc. stuff associated with the TestCaseData type, you can skip one piece of ceremony by simply yielding your own arbitrary type instead (i.e. change the IEnumerable<TestCaseData> to IEnumerable<MyType> and then simply yield return new MyType()).

Passing a delegate as a parameter (simple)

The simplest case is that you want to vary which methods are called.  For example, if you have multiple types implementing the same interface or multiple static methods, encoding which method to call is very simple.

Here’s an example from Stackoverflow that I answered recently where the author wanted to call one of three different static methods, each with the same signature and asserts.  The solution was to examine the method signature of the call and then use the appropriate Func<> type (Funcs and Actions are convenience delegates provided by the .NET framework).  It was then easy to parametrise the test by passing in a delegates targeting the appropriate methods.

More advanced applications

Beyond calling simple, stateless methods via delegates or passing non-intrinsic types, you can do a lot of creative and cool stuff.  For example, you could new up an instance of a type T in the test body and pass in an Action<T> to call.  The test body would create an instance of type T, then apply the action to it.  You can even go as far as expressing Act/Assert pairs via a combination of Actions and mocking frameworks.  E.g. you could say “when I call method X on the controller, I expect method Y on the model to be called”, and so forth.

The caveat is that as you do use more and more ‘creative’ types of data-driven testing, it gets less and less readable for other programmers.  Always keep checking what you’re doing and determine whether there is a better way to implement the type of testing you’re doing.  It’s easy to get carried away when applying new techniques, but it’s often the case that a more verbose but familiar pattern is a better choice.

Steam is testing my patience

Posted in rants on March 4th, 2010 by Mark Simpson – Be the first to comment

Steam is great; Steam is good.  What’s not to love about the venerable, all-singing, all-dancing Steam?  Well, how about the fact that they shut down their servers for ‘routine’ maintenance at peak UK gaming hours without forewarning anyone?  How about the fact that, without these facilities, you cannot play online or chat to friends?  Moreover, how about the fact that Valve doesn’t even have the ability to warn its customers to expect the downtime despite having a client installed on every steam user’s machine?

I was in the middle of a close game of L4D2 team versus and suffered from some hardware lag, so I quickly restarted L4D2 to try and get a smooth framerate, as did one of my friends.  When we attempted to rejoin the game it was not possible; the friends service was down and connecting via the console/server browser was also a non-starter.  So, it’s game over for tonight.  Why do we accept this?  I’ve lauded Valve and generally think Steam is really good, but this is totally unacceptable and it’s the second time it’s happened in a single week which is making me irritable.

Including the beta phase, Steam has been knocking around for the best part of a decade now and yet we — the paying customers — are expected to go onto the steampowered sub-forum and check for scheduled maintenance prior to settling down for an hour or two of gaming.  Does anyone really think that’s an acceptable practice?  I can forgive Valve for doing maintenance during peak hours (as it’s always peak hours for someone in the world), but considering how tied into steam we are, the least they could do is add the functionality to warn gamers when scheduled maintenance is approaching so they can take appropriate action.

Just think!  We could get a little pop-up in the corner of the screen via the Steam UI saying “Sorry, but friends will be going down for scheduled maintenance in x minutes; you will not be able to connect to games during this time.  We expect maintenance to take 30-60 minutes”.

While waiting for maintenance to conclude, we could read a book, watch TV or complain on a forum where your constructive, good natured thread will be closed / deleted without any explanation (as well as the follow up that complained about your constructive thread disappearing).  Er, nice job mods. I can’t play a nice relaxing game, nor can I (politely) vent about it on your forums.  While I’m not exactly harking back to the days of manual patching and won.net, at least we didn’t have to put up with this particular kind of bullshit back then.

Still, at least Mac users will soon get the chance to not play the games they’ve paid for!  That’s good news, right?  Anyway, resume the festivities.  The viral campaign is fantastic and we all love Valve.  Just ignore this blemish and airbrush it from history.

Clownshoes DRM coming soon

Posted in rants on February 18th, 2010 by Mark Simpson – 1 Comment

Can you believe this horse shit?  Yes, you read correctly.  DRM that requires you to be online.  At all times.  Not at all irritating.  Check the calendar, folks — it’s not the first of April.

I’ve already had enough of the likes of Securom messing up my (paid for) games.  My trials and tribulations included misfiring versions of Fallout 3 (randomly stopped working and required a reboot) and Race Driver: GRID (I had to eject the disc prior to starting the game every single time).

Unfortunately, Ubisoft’s latest ‘solution’ looks far more invasive.  Do these companies honestly believe that penalising their paying customers will lead to improved sales?  It’s a guaranteed road to ruin.  There’ll likely be a cracked (read: superior) version on torrent sites within the week of release — possibly a day zero effort.  I don’t advocate game piracy, but I could swear that certain game publishers do.

Along similar lines: I’m already fed up with the amount of games that require extra bullshit.  Every single extra step that I have to take is a barrier between me and your game.  Recently, many games are starting to demand a Windows Live account, even when I don’t care about it.  Batman was bad enough for it, but GTA IV on the PC requires me to sign up for a Windows Live account and then constantly nags me to sign up for the Rockstar Social Club.  This adds an extra two screens to click through every goddamn time I want to launch the game.  I don’t want to watch some user generated bollocks video about a clown on a moped, I just want to play the game.

I feel like William Shatner in “I can’t get behind that”.  Leave me the hell alone!

Here’s the old way of playing a game you’ve recently purchased:

  1. Install
  2. Click on shortcut to launch game

Sounds pretty good to me.

More readable data-driven tests

Posted in c#, testing, tips on February 13th, 2010 by Mark Simpson – Be the first to comment

When the logic of a test method remains constant but the data varies, data-driven testing is a great tool.  It allows you, the test author, to write compact code and to add new test cases rapidly.  Unfortunately, data-driven tests have a disadvantage: The inputs are often less readable.

A simple example

Let’s take an example; testing Rob Conery’s PagedList implementation.  A page is basically a slice of the data returned by a linq query.  If more data exists beyond the ‘slice’ represented by the PagedList<T> instance, its “HasNextPage” property should return true to indicate that it is available.  Now, suppose we want to test whether a particular page has a next available page.  Three things spring to mind that can influence the result: The page size, the current page index and the number of items in the list.

Here’s a quick data-driven test for HasNextPage:

As you can see, the method itself is readable, but the parametrised values fed in via the [TestCase] attribute are not.  It’s really hard to keep everything in your head and remember what each number maps to in the function, especially when the method parameter types are all identical.  If you have a list of 20 [TestCase] attributes, you start wondering what’s for your tea and forget that the second value is the (checks image) page size.  Mince.  Mince for tea.

Hmm, if only we could those TestCases more readable; something like object initializers would be ideal.

A simple trick: Subclass TestCase

My friend Hughel helped me come up with this one and it works quite well.  Simply subclass the TestCaseAttribute class and add your own properties to represent the test parameters.  It gets a little bit hairy when you have to access the Arguments array directly (especially since Attributes can be weird), but in practice, it works fine.   In most of the tests, we’re only interested in parametrising three things, so it’s simple to add them as properties.

The end result

Finally, we apply these attributes to our data-driven test, significantly improving the readability!

I would hasten to add that I don’t recommend using this approach willy-nilly — only when you have a large amount of tests that are parametrised by the same data types, causing the test cases to become hard to follow.  I’ve used the [TestCaseSource] attribute and the [TestCase] attribute a lot in the past and most of the time it’s not a problem.

Making sense of your codebase : NDepend

Posted in c#, software on January 18th, 2010 by Mark Simpson – Be the first to comment

I was given an NDepend license to play around with (thanks to Patrick) and said I’d blog about it.  Apologies for my tardiness!

What is NDepend?

NDepend is a piece of software that allows developers to analyse and visualise their code in many interesting ways.  Here’s an ancient relic from my hard drive, revived in NDepend:

As you can see, there’s quite a lot of functionality included.  NDepend includes features that allow you to trawl your code for quality problems, identify architectural constraints that have been breached, find untested, complicated code or pinpoint changes between builds and much more.  This post will merely scratch the surface, but will hopefully provide some decent information on what NDepend can do for you and your team.

NDepend works with Visual Studio solution files and integrates with Visual Studio, meaning it’s simple to generate an NDepend project based on a solution.  You can then press the analyse button and watch NDepend do its thing — it generates a nice HTML report with various visualisations and statistics.

CQL all the way

NDepend provide a lot of pre-defined metrics out of the box, but its best feature by far (and the one on which most of its functionality is built) is the CQL query language.  The SQL / Linq-esque query language allows you to search, filter and order data in a simple fashion.  Once you get accustomed to the query language, it is simple to start bashing out meaningful queries.

You enter queries into a command window and the results are displayed instantly.  For example, you could write something like this:

At the time of writing, 82 separate metrics are available; the metrics are grouped by categories including: assembly, namespace, method etc.  If you have an idea, then you can probably express that idea via CQL.   Furthermore, your queries can be saved and reused/applied to projects as required.

To write a query, you just tap it into the query editor:

Don’t be Paralysed by Choice

While it is a hugely powerful program, I found it overwhelming to begin with.  The default analysis of your code will result in a verbose report including every metric under the sun; some of these are unarguably very useful (Cyclomatic complexity, IL nesting depth) whereas others have a more narrow purpose (suggesting which attributes should be sealed, boxing and unboxing warnings and so on).

Due to the mammoth amount of information in the report, it’s akin to pulling the FxCop lever for the first time.  You get metrics-shotgunned in the face and potentially paralysed by choice.  As a result, I chose to view the default report as a rough guide for how to use the CQL query language and to give me ideas of how I could form queries that were tailored to my needs.  There’s an excellent placemat PDF available for download, too.

I would encourage first time users to skim the main report for interesting nuggets then immediately begin to play around with the CQL query language :).  It’s really cool to think, “I wonder if…”, type a few words and immediately see the question answered.

Manual trawling versus NDepend

Back when I was a Test Engineer, I was assigned to look at an unfamiliar part of our codebase.  I decided to do a little experiment using NDepend to determine its effectiveness (we have a few licenses at work too and are looking to integrate it into our build process at some stage soon).  The test was simply this:  I would work through the code manually checking each file/type for problems, then I would do a sweep using NDepend to see if it could pick out the same problems (and potentially some others I’d missed).

The sort of things I was looking for included:

  • Big balls of mud / God classes
  • Types with the most complicated functionality (high cyclomatic complexity / code nesting level)
  • Types that are heavily used (‘arteries’ of the codebase — failure would be more costly)
  • Types that have poor coverage and high complexity

The results were good.  The majority of the problems I had identified during a manual sweep showed up near the top of the query results. It took me a few hours to manually trawl a relatively small portion of the codebase.  It took me about an hour to get to grips with the NDepend basics, write some CQL and make sense of the results.  I’d say that’s very good going considering I hadn’t used it before.

One thing to watch out for is that some manual tweaking is often required as, if you have some ugly utility classes or use open source software in source form (such as the command line argument parser class that is ubiquitous at work), these sorts of thing monopolise the results.  To get around this problem you can ignore these types, either by applying attributes to your types in code, or by adding an explicit exclude via your query (some examples of which are included below).

Sample Queries

Here’s a few examples of how to express the aforementioned concerns as queries.  Note: I’m an NDepend newbie, so there’s probably better ways to do this.

Types that have so many lines of code that it makes your head spin

Complicated nesting

Complicated methods

High complexity with poor test coverage

‘Popular’ types with poor test coverage

None of these are ‘hard’ rules — you have to play around with them while browsing your codebase (which is easy as NDepend’s CQL editor is constantly updating as you enter the query).  You may find that one project has really simple code that doesn’t even register on a solution-wide analysis, whereas another project may hog the results pane.  If you have a specific goal in mind, you can iteratively tailor the query to get what you want :).

The role of NDepend?

NDepend is not a silver bullet and doesn’t claim to be — it’s a complementary tool that can be used in addition to buddy systems, code reviews and so forth.  Having said that, the amount of information you can mine from your codebase is pretty impressive.

In terms of practical usage, we plan to integrate it into our build system to aid us in identifying potential problems, including code changes not backed by tests, architectural rules (project x is not allowed to reference project y) and general rules of thumb that should be respected (cyclomatic complexity, nesting depth and so forth).

In short, it’s well worth checking out NDepend.

AutoMapper and Test Data Builders

Posted in c#, patterns, testing, tips on January 11th, 2010 by Mark Simpson – Be the first to comment

I’ve recently been tinkering with WCF and, as many people already know, writing data transfer objects is a pain in the balls.  Nobody likes writing repetitive, duplicate and tedious code, so I was delighted when I read about AutoMapper.  It works really nicely;  with convention over configuration, you can bang out the entity => data transfer code in no time, the conversions are less error prone, the tests stay in sync and you’re left to concentrate on more important things.

Anyway, I immediately realised that I’ve used the same pattern in testing — with property initializers & test data builders.  I’ve posted before about Test Data Builders and I’d recommend you read that post first.

For small test data builder classes, it’s really not that big a deal.  For larger classes, using AutoMapper is quite useful.  For example, for testing purposes we’ve got an exception details class that is sent over to an exception logging service.

Every time the app dies, we create an exception data transfer object, fill it out and then send it over the wire.  When unit testing the service, I use a Test Data Builder to create the exception report so that I can vary its properties easily.  Guess what?  The test data builder’s properties map 1:1 with the exception report — hmm!

So, rather than create the same boilerplate code to map the 10+ properties on the exception builder => data transfer object, I just used AutoMapper to handle the mapping for me :)

public class ExceptionReportDto
{
    public string ExceptionType { get; set; }
    public string StackTrace { get; set; }
    public string AssemblyName { get; set; }
    public string EntryPoint { get; set; }
    public string UserName { get; set; }
    public string MachineName { get; set; }
    // etc
}
public class ExceptionReportBuilder
{
   public string ExceptionType { get; set; }
   public string StackTrace { get; set; }
   public string AssemblyName { get; set; }
   public string EntryPoint { get; set; }
   public string UserName { get; set; }
   public string MachineName { get; set; }
   // etc

// create the mapping when the static ctor is invoked
static ExceptionReportBuilder()
}
   Mapper.CreateMap<ExceptionReportBuilder, ExceptionReportDto>();
}

public void ExceptionReportDto()
{
    // set up defaults
    ExceptionType = "System.ArgumentException";
    StackTrace = "Oh no I am a stack trace";
    //etc.
}

 public ExceptionReportDto Build()
 {
     // go go automagic!
     return Mapper.Map<ExceptionReportBuilder, ExceptionReportDto>(this);
 }
}

I’ve had good results with this approach.  The only bit I’m remotely concerned about is creating the mapping in the static constructor.  Any AutoMapper gurus out there who can say whether there’s any reason I shouldn’t do that?

Debug.Assert vs. Exceptions

Posted in Uncategorized on January 9th, 2010 by Mark Simpson – Be the first to comment

“When should I use Debug.Assert and when should I use exceptions?” — It’s a fairly sensible question to ask, but you’ve got to sift through a lot of articles to get anything resembling solid guidance on it (Eric Lippert’s stack overflow post is particularly enlightening).  I’ve wrestled with it quite a bit as a programmer and test engineer, so here’s my 2 pence.

Good rules of thumb I’ve arrived at:

  1. Asserts are not a replacement for robust code that functions correctly independent of configuration. They are complementary debugging aids.
  2. Asserts should never be tripped during a unit test run, even when feeding in invalid values or testing error conditions. The code should anticipate and handle these conditions without an assert occurring!
  3. If an assert trips (either in a unit test or during normal application usage), the class containing the assert is the prime suspect, as it has somehow managed to get into an invalid state (i.e. it’s bugged).

For all other errors — typically down to environment (network connection lost) or misuse (caller passed a null value) — it’s much nicer and more understandable to use hard checks & exceptions. If an exception occurs, the caller knows it’s likely their fault.  This is what makes the .NET base class libraries a joy to develop with — it usually clear when you are misusing an API, resulting in fewer “select is broken” moments.  It fails early and clearly communicates the reason for failure.

You should be able to test and use your class with erroneous input, bad state, invalid order of operations and any other conceivable error condition and an assert should never trip expectedly.  Each assert is checking something should always be true regardless of the inputs or computations performed.  If something should always be true, then the number of asserts used shouldn’t be a barrier to thorough unit testing.  If an assert occurs, the caller knows it’s likely a bug in the code where the assert is located.

If you stick to this level of separation, things are a bit easier.