Modern programming languages require generics
A few weeks ago I wrote about the Hare language and its lack of generic data structures. I don’t want to talk about this topic again, instead I want to discuss something more generic (pun intended). In my view, any modern programming language that aims for high performance should have some form of generics in it. To not have that in place is a major mistake and a huge cause for additional complexity and loss of performance. One aspect of that is the fact that generic data structures get a lot more optimizations than one-off implementations. But I already talked about that in the previous post.
The other issue is that by not having generics, there is a huge barrier for optimizations in front of you. You lack the ability to build certain facilities at all. Case in point, let us take a topic that is near and dear to my heart, sorting. Working on sorted data is pretty much the one thing that makes databases work. Everything else is just details on top of that, nothing more. Let’s consider how you sort data (in memory) using a few programming languages, using their definitions
Using C:
void qsort (void *array, size_t count, size_t size, comparison_fn_t compare);
int comparison_fn_t (const void *, const void *);
Using C++:
template <class RandomAccessIterator>
void sort (RandomAccessIterator first, RandomAccessIterator last);
Using Java:
public static void sort(int [] a);
public static void sort(long[] a);
public static void sort(Object[] a);
Using C#:
public static void Sort<T> (T[] array);
Using Hare:
type cmpfunc = fn(a: const *void , b: const *void ) int ;
fn sort([]void , size, *cmpfunc) void ;
Using Rust:
impl<T> [T] {
pub fn sort(&mut self)
where
T: Ord,}
Using Zig:
pub fn sort(
comptime T: type,
items: []T,
context: anytype,
comptime lessThan: fn (context: @TypeOf(context), lhs: T, rhs: T) bool,
) void
I’m looking only at the method declaration, not the implementation. In fact, I don’t care about how this is implemented at this point. Let’s assume that I want to sort an array of integers, what would be the result in all of those languages?
Well, they generally fall into one of a few groups:
C & Hare – will require you to write something like this:
In other words, we are passing a function pointer to the sorting routine and we’ll invoke that on each comparison.
C++, C#, Rust, Zig – will specialize the routine for the call. On invocation, this will look like this:
The idea is that the compiler is able to emit code specifically for the invocation we use. Instead of having to emit a function call on each invocation, the compare call will usually be inlined and the cost of invocation is completely eliminated.
Java is the only one on this list that has a different approach. Instead of using generics at compile time, it is actually doing a dispatch of the code to optimized routines based on runtime types. That does mean that they had to write the same sort code multiple times, of course.
Note that this isn’t anything new or novel. Here is a discussion on the topic when Go got generics, in the benchmark there, there is a 20% performance improvement from moving to the generics version. That results from avoiding the call overhead as well as giving the compiler more optimization opportunities.
Going back to the premise of this post, you can see how a relatively straightforward decision (having generics in the language) can have a huge impact on the performance of what is one of the most common scenarios in computer science.
The counter to this argument is that we can always specialize the code for our needs, right? Except… that this isn’t something that happens. If you have generics, you get this behavior for free. If you don’t, well, this isn’t being done.
I write databases for a living, and the performance of our sorting code is something that we analyze at the assembly level. Pretty much every database developer will have the same behavior, I believe. The performance of sorting is pretty key to everything a database does. I run into this post, talking about performance optimizations in Postgres, and one of the interesting ones there was exactly this topic. Changing the implementation of sorting from using function pointers to direct calls. You can see the commit here. Here is what the code looks like:
Postgres is 25 years old(!) and this is a very well known weakness of C vs. C++. Postgres is also making a lot of sorting calls, and this is the sort of thing that is a low hanging fruit for performance optimization.
As for the effect, this blog post shows 4% – 6% improvement in overall performance as a result of this change. That means that for those particular routines, the effect is pretty amazing.
I can think of very few scenarios where a relatively simple change can bring about 6% performance improvement on a well-maintained and actively worked-on 25-year-old codebase.
Why am I calling it out in this manner, however?
Because when I ran into this blog post and the optimization, it very strongly resonated with the previous discussion on generics. It is a great case study for the issue. Because the language (C, in the case of Postgres) isn’t supporting generics in any meaningful way, those sorts of changes aren’t happening, and they are very costly.
A modern language that is aiming for performance should take this very important aspect of language design into account. To not do so means that your users will have to do something similar to what Postgres is doing. And as we just saw, that sort of stuff isn’t done.
Not having generics means that you are forcing your users to leave performance on the table.
Indeed, pretty much all the modern languages that care for high performance have generics. The one exception that I can think of is Java, and that is because it chose backward compatibility when it added generics.
Adding this conclusion to the previous post about generics data structure, I think that the final result is glaringly obvious. If you want high-performance system, you should choose a language that allows you to express it easily and succinctly. And generics are mandatory tooling in the box for that.
Comments
I'm wondering why you used GitHub embeds for two of the code samples. They require third-party JS which means that they don't work for users with JS disabled or RSS readers. The regular code blogs work great for all users with better performance and security.
I was quite confused in my RSS reader when some code was referenced but didn't exist when earlier code blocks did!
Kevin,
It makes it easier to handle code formatting and update just the code. Mostly, just because it is the workflow I have.
I don't mind switching from my RSS reader to a web browser with JS enabled to see the code samples inline with the discussion.
It would be nice for reference if you included Go in your examples, esp. since it recently got generics and its much more popular than Zig or Hare.
Sorry, but the particular pg example does not support you statement.
dispatching by the type of the sort key happens during runtime (not compile time!). So any language with generics would have the same piece of code that you put in the article. Only difference would be
qsort_tuple<signed>(..)
instead ofqsort_tuple_signed(...)
if you look at the code in the commit you will see that the pg delevopers successfully use generic, yes, in plain c! Okey, it is makeshit macros+include_template way of compile-time code generation - but this way exists in plain cee for the ethernity. This means the plain c has enough support for that style of generics. So i am sure that with the generic-aware language it still takes the same 25 years to enhance perfomance - because the runtime dispatching (see the item 1) does not appear automagically, it must be written explicitly.
And psyhologically-wise, relying on the "sufficiently-fast standard library with generics" can be show-stopper for reseach for a perfomance improvement. Like why should i try to use the radix sort for integer type or to use database statistics in some way or whathever? - the generic sort is surely fast-enough.
I probably missed something but it seems you use a C example to show how that specialization improved performance and at the same time claim this is impossible in C.
The problem with monomorphization which is how generics are implemented in most of these languages is that you always get specialized functions all the time. This is great in the few cases where specialization is good and it will always look great in microbenchmarks, but will create a lot of bloat overall and there is no easy way to go back. A compiler can always easily inline and specialize code, but it is much more difficult to unify generics which have been expanded too much. So I think all these languages do it wrong.
The point is likely that the work wasn't done for 25 years, and other similar work is not done for similar time periods, not that it can't be done.
voidlogic,
If you'll look here, you can see that in Go, the
sort
is not using generics, see:https://github.com/golang/go/blob/master/src/sort/sort.goYou are paying the cost of indirection, and you also have: https://eli.thegreenplace.net/2022/faster-sorting-with-go-generics/ That covers what happens with generics impl for sorting.
Garcha,
You are sorting an array - so the issue here is whatever you are specialized by the array type or not. Having separate calls per type is fine, you are paying that once per array.Having a non generic version means you are paying the cost _per item in the array_.
Support for generics in C via macros is... possible, but really really bad. You can do that, but it is awkward, it goes against the grain, the tooling around it are really bad, etc.Note that the same can be said for any language, I remember distinctly writing code generation code to get type safe collections in C# 1.1 days, for example. It was a hassle, so it wasn't usually done.That is pretty much my _point_, you have enough of a barrier that even a pretty obvious optimization got delayed by 25 years.
If you are writing a general purpose application, sure, you can use whatever you have in the system. But a database lives & dies by sorting.Just to point out, pg is not using the sort routine from the std lib. They built their own, and they didn't specialize that for a long time.
Martin,
I'm saying that you can do that (with a lot of hassle) for C, but pointing out that a good scenario for doing just that got delayed by a lot because the hassle is enough that it wasn't done. This is a case of missed optimization just because the language doesn't support it, so you need to do that manually.
As for monomorphism. Generating a separate copy of the function for each specialized type can be a problem (although certain compilers can merge similar functions), but that is the nature of the beast. You need distinct machine code implementations to support the best performance, after all.
For example, in C#, we use generics quite a lot to provide enough information for the JIT to be able to do proper inlining and code elimination.
Martin,
the problem with monomorphization is that once code is written this way it (usually) forces creating a separate copy and inlining. I think this more often a problem than not. After all, why would optimizers be so careful with inlining and specialization if this were always the right choice? The answer is that this would create a combinatorial code explosion where specific cases are faster but the overall performance suffers (which you do not see in microbenchmarks). Monomorphization largely takes this choice away from optimizers and also from the programmer, so I think this is a mistake.
That you can easily create generic and optimal code in C can be seen in the following toy example. This was not a hassle to write. https://godbolt.org/z/3Yv7oz59b
One can rightfully complain about the lack of type safety by using void pointers and the lack of control about inlining decisions. Both can be addressed using macros and non-standard extensions, and I agree that this is far from ideal in C. But I still think this is better than generics based on monomorphization. (what go does with shapes is interesting though)
(sorry, Oren, apparently I like talking to myself)
Martin,
Now try to do that across compilation units? And why did you need to do this specifically for postgres?
Oren,
you can do this across TU by moving a function to a header with "inline" (which allows but does not force inlining in C) or by using link-time optimization. In my code, I would tend to leave it in a single TU and manually create special functions that call (and inline) the generic ones. I then use the special functions to manually optimize specific cases in other TUs, but only where it matters.
I would have to look at what postgres did exactly to comment on it, but the patch also does not look too complicated. The question is whether it would have better performance if it used generics everywhere. Somehow I doubt it. The point is that you want to do this (after profiling) in the 10% of cases where inlining makes sense and to be able to avoid it in 90% where it does not. Generics force it in 100% cases, which I suspect is a bad trade-off overall. But maybe I am wrong.
Martin,
You seem to be claiming that there isn't a performance benefit to generics, because for very simple code, the compiler can inline the whole thing.But that is not how it actually works in real codebases. And I think that code bloat due to generics is something that we are mostly beyond us now. I remember that being a serious concern a while ago, but given the size / speed of modern machines, and compiler optimizations with sharing identical code (also LTO), I haven't seen that being a real concern recently.
On the other hand, having to do non trivial amount of work to get it working is a hurdle.
Interesting talk guys. To support Oren's reasoning: the possibility of sharing memory pages with executable code between processes is still there in OSes, and it was a cool thing in the past, but it's no longer that important - now the standard approach is to install every program with its own copy of all dependencies and dlls.
Orene,
Then i have another question - does RavenDB have such style of optimization for sorting (by runtime dispatching to a specialized version of a generic routine)? Or RavenDB optimizes full sorting in other way, for example by direct code generation via Expression API?
Garcha,
Sorting is something that we care about, yes. Here is a good example of how this is done:https://github.com/ravendb/ravendb/blob/450c995c4327dcead6e7cafa6283084b768ae23f/src/Sparrow/Sorter.cs#L10
We are using generics and forced inline of struct generics to generate optimal code, yes.
Externalizing the compare method might be beneficial. In fact C++, C# etc. do have templates/generics in which the compare is explicitly present. What if the object in case is quite complex and swaping objects in array sorting is very costly, involving assignment overloading, memory re-allocation etc. In this case the cost of a call is negligeable, compared to the flexibility it offers. For example, to avoid swaping within the sort, you might built an index (array of int), sort the index but compare the objects refered by the ints. In other words, it depends on the actual need you have.
Gheorghe,
It is fairly easy to do this without any overhead. You sort the indexes array based on the source array. You still benefit from the generics ability to generate specialized code here.
@Oren: One should not exaggerate the cost of a function call. In a Intel native context, the array will be in one data segment, an the compiler will optimize this, assuming DS register is already set. Pointers will be kept in two registers e.g. ESI and EDI (32 bits). With no calling optimization, this means two PUSH instructions and one ADD ESP, N after the call (if "C" calling is used) or a RET N in the procedure (which is faster, if "Pascal" calling is used), where N is 8 in case of 32 bits. If the cmp function is defined in the same allocation unit as main code, the compiler will implement the call as a NEAR one, i.e. pushing only the instruction pointer onto the stack. So the overhead will be: PUSH EDI
PUSH ESI
CALL NEAR PTR CmpFunc
ADD ESP, 8 / RETN 8 If count the machine cycles, this is pretty low. More over, since there are only two pointer parameters to be passed, the compiler will most likely further optimize this by using two registers instead of PUSH.
Gheorghe,
I didn't do the analysis in this manner. Instead, I ran the code and benchmarked it. I can tell you that there is a measurable difference between those options.
@Oren: I would benchmark C vs. C++ on a complex object that would require C++ overloading of operators <, >, == and !=. In such a case, results might be quite similar. A small typo in one of the examples: I have no idea about Hare, but in C the compare should be:
if (*(int *)a > *(int )b) return +1; else if ((int *) < *(int *)b) return -1; else return 0;
If people don't bother to create multiple versions of things like sorting algorithms in languages like C that don't have generics, that is a failure of the community, not the language. There are many such failures unrelated to generics - like the failure of the Java community to embrace micro service libraries that are actually micro, rather than using large monolithic frameworks like Spring and JPA that use a ton of memory.
I don't know why people think of features as a zero or one - either you have generics/oop/fp/... or you don't. It's never that simple, languages have degrees of these things. C has generic programming via void * - it could be described accurately as the lowest degree of generic programming, but it it still generic programming. You don't HAVE to have generics as a language feature.
And even when you do have generics as a language feature, there are lots of edge cases and complexity, that causes most programmers to not actually write their own generic apis. EG, Java generics have upper and lower bounds, you can't instantiate generic arrays with the new keyword, you have to use java.lang.Void to indicate a void generic return type, the reflection API for determining generic type definitions at runtime is very difficult to use, it is not always obvious when type inferencing cannot be used, some problems still require reflection, etc. I bet most Java devs don't even know you can determine generic type definitions at runtime, something all frameworks of any real complexity do.
C++ is dramatically worse than Java for complexity of generics, with template specializations, and type constraints that are off the charts hard to understand, never mind all that copy constructor and return optimization stuff. Try reading the implementation of something as simple as Optional - it is very long, over 1,000 lines of code, with densely packed combinations of language features only the top 1% of C++ devs can actually understand.
If generics wind up being so damn difficult to write correctly that most devs give up and just use what some other genius provided, then what's the point? If generics wind up with a whole bunch of specializations for primitive types, then what's the point?
If writing a number of functions that are essentially copies with minor changes is easier to read and maintain, then why not use that? Why not use a code generator with some templating engine - could be as simple as m4 - so that you only have to maintain one actual set of code? How is that not generic programming?
Greg,
You can have generics in C via
void*
, sure. But there are costs associated with that. There are ways around that (macros, code generation, etc). They are awkward, they dramatically increase the cost of a feature. That means that things aren't done because of the additional friction.Generics aren't _simple_, no. But you can get away with a lot without really digging into the deep end. FWIW, I consider the generics implementation for both Zig and C# to be excellent examples of solutions that work very well and have low cognitive load.
But
std::optional
is actually a great example. Leave aside the implementation, look at the usage for this feature. And that makes things a lot easier. The key here is that you get the appropriate optimizations without friction at usage time.Code copy & paste is a well known anti pattern. And code generation via templates is _possible_, sure. But consider why there isn't broad adoption to it?
In C# 1.0, there were a LOT of templates for type safe collections. All of them were chucked when we got generics, and good riddance for that.
In the same sense, you can say that you can commute to work by walking three hours in each direction. It is certainly within most people's capabilities, it is actually healthy to do so, etc. But the additional costs associated with that means it isn't done, and for good reasons.
Greg,
I think the point here is not about generic programming but about generic types. You focused on examples of problems with generics in particular languages but you didn't mention the fact that there are languages like C# where generics are actually very friendly and the static typing they enable in is priceless. In terms of performance it's especially important in C# where there is a distinction between value and reference types. Text-based templating is something I would very much avoid if possible because then you are working outside of the language so you basically give up the real-time help from a compiler and IDE.
Btw. in this rare situations when you would be forced to use code generation, in C# compiler Roslyn there is a feature called "Source generators" which integrates code generation into the compiler pipeline and plays nicely with IDE while templating is built-in in language in form of string interpolation.
Comment preview