30 8 / 2013

Let me amend that: The STL doesn’t suck. It’s an extremely high-quality library, and the APIs are mostly very well designed by some extremely clever people.

But it has its limitations. Some of those are more than acceptable under some circumstances, and completely intolerable under others. In game development and a number of other scenarios, these shortcomings become more than a nuisance.

1. The names

A “vector” in video game development always refers to linear algebra and geometry. While “vector” is definitely the technically correct term for the class, it isn’t at all helpful when the domain terminology is in direct conflict with a fundamental data structure.

A less dire example is std::map, which for some games clashes with their concept of a “level”.

Solution? Most game engines (and other non-STL libraries) use the term “Array” for a vector, even if it clashes with terminology that defines “array” as something that cannot grow. The C++11 concept of std::array (which is a compile-time fixed-size array) and the C++14 concept of std::dynarray (which is a runtime fixed-size array) can be named more intuitively “MaxArray” and/or “FixedArray”.

2. STL Allocators

They’re a hassle to work with. Game engines usually need extremely tight control over memory usage, to the point where some games disallow dynamic allocation during the game loop entirely. That’s often quite inconvenient, but with std::vector you don’t even get the choice. Different types of objects with different well-defined lifetimes benefit immensely from the ability to customize memory allocation; during the game loop, Scope Stack Allocator is usually what you want to strive for, while static metadata, game world object data, and assets benefit from altogether different allocation schemes. In short: We need custom allocators.

But STL allocators are difficult to implement, and they can’t carry state, which forces programmers into shady practices such as global variables. C++11 improves this a bit, but there’s still the issue of the allocator being part of the object’s type, causing template creep where suddenly all functions interacting with the object need to take this template parameter into account, moving code to headers instead of implementation files, which in turn is detrimental to build time, especially in big projects.

Solution? While less efficient, I tend to prefer defining an IAllocator interface and passing instances of classes implementing it to containers and other users of memory. The overhead of a virtual function call on alloc/free is negligible compared to the calculations involved in finding memory on the system heap, and for custom allocators the potential cache miss would most likely happen anyway quite soon when the allocator’s internal structures are modified to accommodate the allocation.

Unfortunately, this approach cannot be integrated with STL allocators, since the pointer to the IAllocator is state. Another drawback is that it can be tricky to design interfaces so that memory always ends up being owned by an object with shorter lifespan than its allocator, but in reality these considerations are well worth the performance boost and flexibility. YMMV.

3. Range slicing

… is basically really difficult with STL. If you don’t want to pass around needless copies, the only way to reliably work on a subrange of an array or a string is by pairs of iterators. It’s cumbersome, pollutes APIs, and error-prone (did you remember to check that your begin is <= your end? Are you 100% sure that the iterators for this particular collection won’t be invalidated while you’re working on them? How about if you change the collection type, are they still valid? Do both belong to the same collection?).

Unfortunately, the type system of C++ is not capable of distinguishing iterators originating from different collections of the same type at compile time — at least not without some seriously dark template magic. That would solve one of the more troubling problems with iterators.

More problematic is the situation with strings. std::string is required to keep its internal representation of a string in a format that’s compatible with C, i.e. zero-terminated. This means that a substring of another string must always be a copy of it, because otherwise the substring will not be zero-terminated. If you’re doing even a moderate amount of string processing (such as parsing, be it file names or semantic text), this adds up to a lot of copying. Most implementations of std::string make up for this by using the Small String Optimization, but as the name suggests it only covers small strings, and only avoids heap allocation, not copying. In short, it’d be lovely to be able to operate on a substring without having to own its memory.

Solution? For passing lists of things, running search algorithms, and everything else that doesn’t require modifications to the original collection, I go for an ArrayRef class, which is essentially an iterator pair that satisfies a number of invariants (both iterators belong to the same collection, begin is always <= end, and users don’t need to know which collection it comes from, as long as it’s in linear memory). It works for arrays, queues, as well as sets and maps implemented on top of arrays.

For string processing, I go for a StringRef class. It needs to be slightly different from ArrayRef<char>, mostly because string literals are always zero-terminated, so initialization with a string constant is strlen(constant) for StringRef and strlen(constant)+1 for ArrayRef<char>.

These issues apply to iterators in general, but are particularly worrying in game development where you might not be able to do any heap allocation at all, but you still need to be able to work with ranges of things.

4. Maps and dictionaries

STL maps are either hash tables (std::unordered_map) or red-black trees (std::map). These are great general-purpose choices, because they perform the best when the size of each collection grows. However, most maps are really small (n < 100), and the constants begin to play a significant role.

For hash tables, choosing the best hashing function is crucial. If there are many collision, performance degrades very rapidly, potentially creating stalls. Hash tables can also have poor locality, to the point where they get outperformed on small collections by binary search. Finally, hash tables usually need to overallocate memory to stay efficient. This is especially a problem if there are a lot of small hash tables, each eating up more memory than they need to.

Red-black trees, on the other hand, are almost equivalent to binary search in terms of memory pressure, and they also perform quite well in both lookups and insertion. Now, one of the many wonderful things about the STL is that it tells you precisely which performance guarantees each algorithm makes. In the case of std::map, it is required to erase elements in (amortized) constant time, as well as maintain iterator validity across during erase. On the surface that seems great, but it means something about the implementation of std::map: It must necessarily store its nodes independently of each other and carry with it an amount of metadata per node designating its position in the tree. That in itself is not a problem, but it can very likely ruin locality, especially if using the system heap as allocator.

Solution? Other than red-black trees, there’s another way to maintain an invariant of a perfectly balanced tree: by storing the map as a priority queue, where elements are sorted upon insertion, and then found again via binary search. This has very different performance characteristics than std::map and std::unordered_map, and whether or not it performs better depends on the specific circumstances. If you have lots of insertions and deletions, it’s not great, because it turns both into O(n + log n), although for small collections that fit in L1 cache, this may still perform better than a less locality-oriented implementation. In most cases, however, associative maps are much more often read than modified, and the improved locality can quite often more than make up for the lost algorithmic efficiency.

Conclusion

I use the STL all the time, and I don’t think developers should abandon it, even for video game development and other pseudo-realtime applications. I’m especially a fan of the <algorithm> part of the library, which makes using the optimal algorithm for a given situation a breeze rather than an afterthought.

But these are domain-specific issues that may contribute to the fact that the STL isn’t particularly widespread in the video game industry.

  1. saramq reblogged this from simonask
  2. simonask posted this