Threading in Real-time Systems

In embedded systems, every cycle counts. In this age of high-performance mobile devices, it may seem there is no longer any need to count every cycle the way we did as little as a decade ago. Yet there are still many situations where it matters, and even many of today’s games and applications could be improved significantly with some tweaks to the underpinnings of their codebase.

Take, for example, the often overlooked thread system. Most of the time, making your application threadable is an instant win: suddenly you can take advantage of the multiple cores most devices now offer. However, preemptive multitasking comes at a cost. There is overhead for the scheduler to decide when to swap tasks, as well as a cost for the context switch itself. Some of this cannot be avoided, but much of it can be considerably reduced with very little additional work.

Consider cooperative multitasking. It is not at all a new concept, and it certainly has drawbacks. But it is worth considering when you are writing a time-critical codebase that targets a known number of cores at compile time. Rather than forcing the OS to guess which of your code is most important, managing scheduling yourself gives you the freedom to spend every cycle exactly as you see fit.

Let us assume your game has some logic modules that are largely independent of each other. They may read each other’s state, but they never need to write to each other. For argument’s sake, we will use the following modules: player (P), enemy A (A), enemy B (B), and rendering (R). We will mark threading overhead as (O).
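
To give the discussion a concrete shape, here is one way those modules might look in C++. This is a minimal sketch; all of the names (Module, PlayerModule, update, and so on) are hypothetical, not taken from any particular engine.

    struct Module {
        virtual ~Module() = default;
        // One frame's worth of work. A module may read other modules'
        // state, but writes only to its own.
        virtual void update() = 0;
    };

    struct PlayerModule : Module { void update() override { /* P work */ } };
    struct EnemyAModule : Module { void update() override { /* A work */ } };
    struct EnemyBModule : Module { void update() override { /* B work */ } };
    struct RenderModule : Module { void update() override { /* R work */ } };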

In a preemptive threading model on a single processor, time gets sliced between all of the modules in a fairly haphazard way.

Frame Start                                                               Frame End
OPPPPPOOOAAAAOOBBBBOOAAAAAAOBBBBBOOPPPPPOOAAAAOOPPPPOOBBBBOORRRRRRRRROOPPOORRRRRRRR

Obviously this haphazardness can be reduced by using mutexes to enforce that certain work completes before other work begins, but that synchronization itself creates more overhead.
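
For a sense of what that ordering costs, here is a rough sketch using standard C++ primitives; the thread names and the playerDone flag are made up for illustration. Every handoff pays for a lock, a wait, and a wakeup, any of which may trigger a context switch.

    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool playerDone = false;  // would be reset each frame; omitted here

    void playerThread() {
        // ... run player logic for this frame ...
        {
            std::lock_guard<std::mutex> lock(m);
            playerDone = true;
        }
        cv.notify_all();  // wake any module waiting on the player
    }

    void enemyThread() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return playerDone; });  // blocks; may context switch
        // ... run enemy logic for this frame ...
    }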

On the other hand, if you use cooperative multitasking, you can optimize away much of that overhead.

Frame Start                                                               Frame End
OPPPPPPPPPPPPPPPPOAAAAAAAAAAAAAAOOBBBBBBBBBBBBBOORRRRRRRRRRRRRRRRROO---------------

Look at all the time you’re saving, simply by reducing the overhead. All those -’s represent extra time you could use to support more objects onscreen at once.

Now consider how much simpler your code can be when you no longer have to manage mutexes to keep your data in sync across modules. Simpler code usually means fewer potential bugs.

What’s even more exciting is that, because scheduling is now under your control, you can support features that are difficult to express in current operating system threading environments. You can prioritize certain modules, or run others only every other frame, effectively extending the available frame time.
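
A per-module policy like that is only a few lines once you own the scheduler. Here is one possible sketch, reusing the hypothetical Module interface from earlier; the everyOtherFrame flag is an assumption for illustration.

    #include <vector>

    struct ScheduledModule {
        Module* module;
        bool everyOtherFrame;  // hypothetical policy flag
    };

    void runFrame(std::vector<ScheduledModule>& modules, unsigned frame) {
        for (auto& s : modules) {
            if (s.everyOtherFrame && (frame & 1))
                continue;          // skipped this frame; doubles its budget
            s.module->update();    // runs to completion, then control returns
        }
    }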

Actually implementing a cooperative system is fairly easy. Almost all platforms come with library calls to save and restore processor register state. Even without them, establishing calling conventions around your scheduler’s yield points would be simple to do.
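
On POSIX systems, for instance, the ucontext calls (removed from POSIX.1-2008 but still provided by glibc and others) can do the save and restore. Here is a minimal sketch of a single cooperative switch; production fibers often use hand-written assembly instead.

    #include <ucontext.h>

    static ucontext_t schedulerCtx;
    static ucontext_t taskCtx;
    static char taskStack[64 * 1024];

    static void taskBody() {
        // ... do part of the frame's work ...
        swapcontext(&taskCtx, &schedulerCtx);  // yield back to the scheduler
        // ... resume here the next time the scheduler switches to us ...
    }

    int main() {
        getcontext(&taskCtx);                 // capture register state
        taskCtx.uc_stack.ss_sp = taskStack;   // give the task its own stack
        taskCtx.uc_stack.ss_size = sizeof(taskStack);
        taskCtx.uc_link = &schedulerCtx;      // return here when taskBody ends
        makecontext(&taskCtx, taskBody, 0);
        swapcontext(&schedulerCtx, &taskCtx); // run the task until it yields
        swapcontext(&schedulerCtx, &taskCtx); // resume it after the yield
        return 0;
    }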

There’s nothing stopping you from supporting both styles of thread management in the same codebase. You simply design your modules with this expectation, calling a method to let the scheduler know that you are at a safe point to be interrupted. Wrap the mutex and thread calls in your own wrappers. On systems that work better with threads, they pass straight through to the OS. On systems where you run on a single thread, they translate to no-ops and cost nothing at all.
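
A wrapper layer along those lines might look like the following. The GAME_USE_OS_THREADS build flag and the gameYield name are assumptions, not real APIs.

    #include <mutex>

    #if defined(GAME_USE_OS_THREADS)  // hypothetical build flag

    struct GameMutex {
        void lock()   { m.lock(); }    // pass straight through to the OS
        void unlock() { m.unlock(); }
    private:
        std::mutex m;
    };

    inline void gameYield() { /* nothing to do; the OS preempts us */ }

    #else  // single-threaded cooperative build

    struct GameMutex {
        void lock()   {}  // no other thread can contend; no-op
        void unlock() {}
    };

    inline void gameYield() {
        // Safe interruption point: hand control back to the cooperative
        // scheduler, e.g. via swapcontext as sketched above.
    }

    #endif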

Just because you are on a machine with multiple cores does not automatically mean you are allowed to use them all. On such a machine, running in a single-threaded way may actually turn out to be more efficient than trying to be multithread-aware.

Incidentally, John Carmack took a similar approach in the Doom 3 engine. Rather than splitting everything up for every core possible, the code is split along a rendering versus game logic boundary, which separates and simplifies both systems considerably.

Many ideas from the past are just as relevant today as they ever were. It would be a mistake to dismiss cooperative multitasking just because we have all these cores available.