Task-Centric Development
Why do one thing when you can do 12 things at the same time?
Many of our games utilize only a single CPU core. This is to keep things simple, and also make it easier to port to web (most of our smaller games have a playable web version). But Asteroid Arena and our current witch-themed game are exceptions.
In Asteroid Arena, using multiple cores was an afterthough brought on by necessity. Our minimum spec machine was having issues hitting 60 FPS consistently.
Divide et impera
How do you turn a single-threaded software to multi-threaded without re-writing the entire thing?
Easy. Check which loops are slow, delegate iterations to different threads and then wait for all iterations to finish. Of course you have to look out for the usual multi-threaded concurrency related issues such as false sharing, dead-locks, race conditions, etc.
In practice this approach means: call a function, the function delegates work to tasks, wait for the tasks to complete and then return. This makes updating the previous single-threaded code to multi-threaded not that hard, as the "flow" is pretty much unmodified.
Although this approach is easy to implement, it is not a good solution. One of the main issues can be observed at the end of each task; once all but one iteration has been completed, all other cores are idle — just waiting for that one core to finish. These issues can be seen as "holes" in the profiling data and can be quite big.
Asteroid Arena was running at over 60 FPS, so I was "happy enough" using this method…
Time waits for no one
… but no way I would ever do it that way again!
The witch game deals with concurrency very differently. Except for the debug UI, any non-trivial work is packaged as a task that is later executed. The tasks don't only package work to be done, but also keep track of their dependencies. The task system that runs the tasks simply picks one that doesn't have any unfinished dependency, and executes it.
This method fixes most of the holes that the divide-and-conquer method imposed. There might still be holes, but those are usually caused by all enqueued tasks depending on a singular one (that's still being executed). This issue can only be solved by the programmer observing the profiling data and changing the tasks to avoid such issues.
The future is now!
But how to specify the task dependencies? In practice, specifying a dependency to a function is easy. It's called function parameters (dah). This is also how most of the task dependencies are tracked — you pass them as arguments. But how do you pass a value to a task before the value is created? Meet the future (not to be confused with C++ std::future
).
void DrawWorld(...)
{
Future_SortedVisibleModels* sortedModels = CreateFuture_SortedVisibleModels();
Future_VisibleModels* visibleModels = CreateFuture_VisibleModels();
Future_VisibleEntities* visibleEntities = CreateFuture_VisibleEntities();
$push_task(GatherVisibleEntities,
WriteToFuture(visibleEntities),
visibleWorldRegion,
spatialPartitioning);
$push_task(DrawEntities,
WriteToFuture(visibleModels),
ReadFromFuture(visibleEntities),
WriteToFuture(animations));
$push_task(SortBackToFront,
WriteToFuture(sortedModels),
ReadFromFuture(visibleModels),
camera);
...
}
Any task that reads a future (using ReadFromFuture
) will depend on all tasks that writes to that future (using WriteToFuture
). This makes the task dependency chain quite intuitive to set up. But sometimes, data is not enough to determine which tasks should execute before others. After all, there is no data that says the debug UI should be rendered after the game.
void RenderEverything(...)
{
Future* gameIsRendered = CreateFuture();
$push_task(RenderGame, gameWorld)
{
Signal(gameIsRendered);
};
$push_task(RenderDebugUI, debugUI)
{
After(gameIsRendered);
};
...
}
The Signal
adds the RenderGame
-task as a writer to the future. The After
function adds the writers of the future as dependencies of RenderDebugUI
. This causes RenderGame
to always execute fully before RenderDebugUI
starts.
If memory serves me right…
One issue with this approach, that the previous one didn't have is regarding memory. As Asteroid Arena tasks were all executed before the function returned, the entire context of the tasks could be allocated on the stack. This is not possible using this approach as most likely the function that enqueued the task returns before that task completes (or even starts).
One solution to handle this would be to use reference counting. When the last reader of the future is completed, it can be deleted. As the allocations and deallocations are done in arbitrary order, a heap allocator would probably be required.
I went for another approach. All task parameters and futures are allocated from an arena that gets cleared every frame. In other words, all the memory is "leaked" but it's cleaned up the next frame. How much is this leakage? Just a couple of kilobytes.
Macros and conclusions
That is how the task system works in our latest in-progress game. There are some implementation details like how to make it lock-free and how to avoid false sharing that I haven't included here, but those are implemented using mostly well-known techniques.
Oh and erhm regarding the $
prefix… I may have implemented my own macro language 😏
- Jens the Programmer, July 2024