Filtering negative numbers, fast: Unroll


In the previous post, we looked into what it would take to reduce the cost of filtering negative numbers. We got into the assembly and analyzed exactly what was going on. On that front, I don’t think that even hand-optimized assembly would take us much further. Let’s see if there are other options available to us to get better speed.
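As a reminder, the core operation looks roughly like this (a sketch only; the actual method from the previous post may differ in signature and details):

```csharp
// A sketch of the scalar baseline (FilterCmp): compact the non-negative
// values to the front of the span and return how many there are.
public int FilterCmp(Span<int> items)
{
    int outputIdx = 0;
    for (int i = 0; i < items.Length; i++)
    {
        if (items[i] >= 0)
            items[outputIdx++] = items[i];
    }
    return outputIdx;
}
```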

The first thing that pops to mind here is loop unrolling. After all, we have a very tight loop; if we can do more work per loop iteration, we might get better performance, no? Here is my first version:
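The snippet below is a sketch of what such a first attempt plausibly looks like: four items per iteration, with a scalar loop for the 0..3 leftover items.

```csharp
public int FilterCmp_Unroll(Span<int> items)
{
    int outputIdx = 0;
    int i = 0;
    // Handle four items per loop iteration.
    for (; i + 4 <= items.Length; i += 4)
    {
        if (items[i + 0] >= 0) items[outputIdx++] = items[i + 0];
        if (items[i + 1] >= 0) items[outputIdx++] = items[i + 1];
        if (items[i + 2] >= 0) items[outputIdx++] = items[i + 2];
        if (items[i + 3] >= 0) items[outputIdx++] = items[i + 3];
    }
    // Handle the remaining items one at a time.
    for (; i < items.Length; i++)
    {
        if (items[i] >= 0) items[outputIdx++] = items[i];
    }
    return outputIdx;
}
```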

And here are the benchmark results:

| Method           | N        | Mean            | Error         | StdDev        | Ratio | Code Size |
|------------------|----------|-----------------|---------------|---------------|-------|-----------|
| FilterCmp        | 23       | 274.6 ns        | 0.40 ns       | 0.35 ns       | 1.00  | 411 B     |
| FilterCmp_Unroll | 23       | 257.5 ns        | 0.94 ns       | 0.83 ns       | 0.94  | 606 B     |
| FilterCmp        | 1047     | 748.1 ns        | 2.91 ns       | 2.58 ns       | 1.00  | 411 B     |
| FilterCmp_Unroll | 1047     | 702.5 ns        | 5.23 ns       | 4.89 ns       | 0.94  | 606 B     |
| FilterCmp        | 1048599  | 501,545.2 ns    | 4,985.42 ns   | 4,419.45 ns   | 1.00  | 411 B     |
| FilterCmp_Unroll | 1048599  | 446,311.1 ns    | 3,131.42 ns   | 2,929.14 ns   | 0.89  | 606 B     |
| FilterCmp        | 33554455 | 29,637,052.2 ns | 184,796.17 ns | 163,817.00 ns | 1.00  | 411 B     |
| FilterCmp_Unroll | 33554455 | 29,275,060.6 ns | 145,756.53 ns | 121,713.31 ns | 0.99  | 606 B     |

That is quite a jump; 6% – 11% savings is no joke (the 32M case, at about 1%, barely moved, though). Let’s look at what is actually going on at the assembly level and see if we can optimize this further.

As expected, the code size is bigger: 264 bytes versus the 55 we previously got. But more importantly, we got the range checks back, and a lot of them:

The JIT isn’t able to reason about our first for loop and see that all our accesses are within bounds, which leads to a lot of range checks and likely slows us down. Even with that, we are still showing significant improvements here.

Let’s see what we can do with this:
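A sketch of that fix, reconstructed around the AddToOutput(ref itemsRef, ...) call quoted later in this post: every access goes through a ref obtained from the span, so there is no indexer and no bounds check at all. Keeping the output index as a field is my assumption, made to match the two-argument helper call.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public class FilterBenchmarks
{
    private int _outputIdx;

    public int FilterCmp_Unroll(Span<int> items)
    {
        _outputIdx = 0;
        // A ref to the first element; Unsafe.Add performs no bounds checks.
        ref int itemsRef = ref MemoryMarshal.GetReference(items);
        int i = 0;
        for (; i + 4 <= items.Length; i += 4)
        {
            AddToOutput(ref itemsRef, Unsafe.Add(ref itemsRef, i + 0));
            AddToOutput(ref itemsRef, Unsafe.Add(ref itemsRef, i + 1));
            AddToOutput(ref itemsRef, Unsafe.Add(ref itemsRef, i + 2));
            AddToOutput(ref itemsRef, Unsafe.Add(ref itemsRef, i + 3));
        }
        for (; i < items.Length; i++)
        {
            AddToOutput(ref itemsRef, Unsafe.Add(ref itemsRef, i));
        }
        return _outputIdx;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private void AddToOutput(ref int itemsRef, int value)
    {
        if (value >= 0)
            Unsafe.Add(ref itemsRef, _outputIdx++) = value;
    }
}
```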

With that, we expect to have no range checks and still be able to benefit from the unrolling.

| Method           | N        | Mean            | Error         | StdDev        | Ratio | RatioSD | Code Size |
|------------------|----------|-----------------|---------------|---------------|-------|---------|-----------|
| FilterCmp        | 23       | 275.4 ns        | 2.31 ns       | 2.05 ns       | 1.00  | 0.00    | 411 B     |
| FilterCmp_Unroll | 23       | 253.6 ns        | 2.59 ns       | 2.42 ns       | 0.92  | 0.01    | 563 B     |
| FilterCmp        | 1047     | 741.6 ns        | 5.95 ns       | 5.28 ns       | 1.00  | 0.00    | 411 B     |
| FilterCmp_Unroll | 1047     | 665.5 ns        | 2.38 ns       | 2.22 ns       | 0.90  | 0.01    | 563 B     |
| FilterCmp        | 1048599  | 497,624.9 ns    | 3,904.39 ns   | 3,652.17 ns   | 1.00  | 0.00    | 411 B     |
| FilterCmp_Unroll | 1048599  | 444,489.0 ns    | 2,524.45 ns   | 2,361.38 ns   | 0.89  | 0.01    | 563 B     |
| FilterCmp        | 33554455 | 29,781,164.3 ns | 361,625.63 ns | 320,571.70 ns | 1.00  | 0.00    | 411 B     |
| FilterCmp_Unroll | 33554455 | 29,954,093.9 ns | 588,614.32 ns | 916,401.59 ns | 1.01  | 0.04    | 563 B     |

That helped quite a lot, it seems, for most cases. The 32M items case, however, was slightly slower, which is quite a surprise.

Looking at the assembly, I can see that we still have branches, like so:

And here is why this is the case:

Now, can we do better here? It turns out that we can, by using a branchless version of the operation. Here is another way to write the same thing:
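A sketch of the branchless helper, my reconstruction of the idea, replacing the AddToOutput above:

```csharp
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void AddToOutput(ref int itemsRef, int value)
{
    // Always store the value at the current output position...
    Unsafe.Add(ref itemsRef, _outputIdx) = value;
    // ...but only advance the index when value >= 0. The inverted sign bit,
    // shifted down, is 1 for non-negative values and 0 for negative ones,
    // so there is no branch to mispredict.
    _outputIdx += (int)((uint)~value >> 31);
}
```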

What happens here is that we are unconditionally setting the value in the array, but only incrementing the output index if the value is greater than or equal to zero. That saves us the branches and will likely result in less code. In fact, let’s see what sort of assembly the JIT will output:

What about the performance? I decided to pit the two versions (normal and branchless) head to head and see what this will give us:

| Method                       | N        | Mean            | Error         | StdDev        | Ratio | Code Size |
|------------------------------|----------|-----------------|---------------|---------------|-------|-----------|
| FilterCmp_Unroll             | 23       | 276.3 ns        | 4.13 ns       | 3.86 ns       | 1.00  | 411 B     |
| FilterCmp_Unroll_Branchleses | 23       | 263.6 ns        | 0.95 ns       | 0.84 ns       | 0.96  | 547 B     |
| FilterCmp_Unroll             | 1047     | 743.7 ns        | 9.41 ns       | 8.80 ns       | 1.00  | 411 B     |
| FilterCmp_Unroll_Branchleses | 1047     | 733.3 ns        | 3.54 ns       | 3.31 ns       | 0.99  | 547 B     |
| FilterCmp_Unroll             | 1048599  | 502,631.1 ns    | 3,641.47 ns   | 3,406.23 ns   | 1.00  | 411 B     |
| FilterCmp_Unroll_Branchleses | 1048599  | 495,590.9 ns    | 335.33 ns     | 297.26 ns     | 0.99  | 547 B     |
| FilterCmp_Unroll             | 33554455 | 29,356,331.7 ns | 207,133.86 ns | 172,966.15 ns | 1.00  | 411 B     |
| FilterCmp_Unroll_Branchleses | 33554455 | 29,709,835.1 ns | 86,129.58 ns  | 71,922.10 ns  | 1.01  | 547 B     |

Surprisingly enough, it looks like the branchless version is very slightly slower, which I did not expect: I would have assumed that reducing the branches would be more efficient.

Looking at the assembly of those two, the branchless version is slightly bigger (10 bytes, not that meaningful). I think that the key here is that there is only a 0.5% chance of actually hitting the branch, which is pretty low. That means that the branch predictor can likely do a really good job, so we aren’t going to see any big benefits from the branchless version.

That said… what would happen if we tested that with 5% negatives? That difference in behavior may cause us to see a different result. I tried that, and the results were quite surprising. For the 1K and 32M items cases, we see a slight cost for the branchless version (an additional 1% – 4%), while for the 1M entries there is an 18% reduction in latency for the branchless version.
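For reference, generating test data with a given share of negatives can look like this (a sketch only; the seed and value range here are arbitrary, not the post’s actual setup):

```csharp
// Produce n random values where roughly negativeChance of them are negative.
static int[] GenerateData(int n, double negativeChance)
{
    var rnd = new Random(2337);
    var items = new int[n];
    for (int i = 0; i < n; i++)
    {
        int v = rnd.Next(1, int.MaxValue);
        items[i] = rnd.NextDouble() < negativeChance ? -v : v;
    }
    return items;
}
```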

I ran the tests again with a 15% chance of negatives, to see what would happen. In that case, we get:

| Method                       | N        | Mean             | Error           | StdDev          | Ratio | RatioSD | Code Size |
|------------------------------|----------|------------------|-----------------|-----------------|-------|---------|-----------|
| FilterCmp_Unroll             | 23       | 273.5 ns         | 3.66 ns         | 3.42 ns         | 1.00  | 0.00    | 537 B     |
| FilterCmp_Unroll_Branchleses | 23       | 280.2 ns         | 4.85 ns         | 4.30 ns         | 1.03  | 0.02    | 547 B     |
| FilterCmp_Unroll             | 1047     | 1,675.7 ns       | 29.55 ns        | 27.64 ns        | 1.00  | 0.00    | 537 B     |
| FilterCmp_Unroll_Branchleses | 1047     | 1,676.3 ns       | 16.97 ns        | 14.17 ns        | 1.00  | 0.02    | 547 B     |
| FilterCmp_Unroll             | 1048599  | 2,206,354.4 ns   | 6,141.19 ns     | 5,444.01 ns     | 1.00  | 0.00    | 537 B     |
| FilterCmp_Unroll_Branchleses | 1048599  | 1,688,677.3 ns   | 11,584.00 ns    | 10,835.68 ns    | 0.77  | 0.01    | 547 B     |
| FilterCmp_Unroll             | 33554455 | 205,320,736.1 ns | 2,757,108.01 ns | 2,152,568.58 ns | 1.00  | 0.00    | 537 B     |
| FilterCmp_Unroll_Branchleses | 33554455 | 199,520,169.4 ns | 2,097,285.87 ns | 1,637,422.86 ns | 0.97  | 0.01    | 547 B     |

As you can see, with 15% negatives we have basically the same cost for small values, a big improvement in the 1M scenario, and not much improvement in the 32M scenario.

All in all, that is very interesting information. Digging into the exact why and how of it would mean pulling out a CPU instruction profiler and looking for where we have stalls, which is a bit further than I want to invest in this scenario.

What if we try to rearrange the code a little bit? The code currently looks like this (load the value and call AddToOutput() immediately):

```csharp
AddToOutput(ref itemsRef, Unsafe.Add(ref itemsRef, i + 0));
```

What if we split it up a little bit, so the code looks like this:
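Something along these lines (a sketch of the loop body; the variable names are mine):

```csharp
// Issue all four loads up front, then consume the values, so the memory
// fetches can overlap with the filtering work.
int v0 = Unsafe.Add(ref itemsRef, i + 0);
int v1 = Unsafe.Add(ref itemsRef, i + 1);
int v2 = Unsafe.Add(ref itemsRef, i + 2);
int v3 = Unsafe.Add(ref itemsRef, i + 3);
AddToOutput(ref itemsRef, v0);
AddToOutput(ref itemsRef, v1);
AddToOutput(ref itemsRef, v2);
AddToOutput(ref itemsRef, v3);
```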

The idea here is that we are trying to get the JIT / CPU to fetch the items before they are actually needed, so there would be more time for the memory to arrive.

Remember that for the 1M scenario, we are dealing with 8MB of memory, and for the 32M scenario, we have 256MB. When we look at the loop prolog, we can see that it is indeed first fetching all the items from memory, then doing the work:

In terms of performance, that gives us a small win (in the 1% – 2% range) for the 1M and 32M entries scenarios.

The last thing that I wanted to test was unrolling the loop even further: what would happen if we did 8 items per loop iteration, instead of 4?
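Here is a sketch of that loop: the same load-first pattern at twice the width (the FilterCmp_Unroll_8 name matches the benchmarks below; the body is my reconstruction):

```csharp
for (; i + 8 <= items.Length; i += 8)
{
    // Eight loads first...
    int v0 = Unsafe.Add(ref itemsRef, i + 0);
    int v1 = Unsafe.Add(ref itemsRef, i + 1);
    int v2 = Unsafe.Add(ref itemsRef, i + 2);
    int v3 = Unsafe.Add(ref itemsRef, i + 3);
    int v4 = Unsafe.Add(ref itemsRef, i + 4);
    int v5 = Unsafe.Add(ref itemsRef, i + 5);
    int v6 = Unsafe.Add(ref itemsRef, i + 6);
    int v7 = Unsafe.Add(ref itemsRef, i + 7);
    // ...then eight filtered writes.
    AddToOutput(ref itemsRef, v0);
    AddToOutput(ref itemsRef, v1);
    AddToOutput(ref itemsRef, v2);
    AddToOutput(ref itemsRef, v3);
    AddToOutput(ref itemsRef, v4);
    AddToOutput(ref itemsRef, v5);
    AddToOutput(ref itemsRef, v6);
    AddToOutput(ref itemsRef, v7);
}
```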

There is some improvement (4% in the 1K scenario, 1% in the 32M scenario), but also a slowdown (2% in the 1M scenario).

I think that this is probably roughly the end of the line as far as we can get for scalar code.

We have already made quite a few strides in trying to parallelize the work the CPU is doing, simply by laying out the code as we would like it to be. We tried to control the manner in which the code touches memory, and in general, these are pretty advanced techniques.

To close this post, I would like to take a look at the gains we got. I’m comparing the first version of the code, the last version from the previous post, and the unrolled versions (both branchy and branchless) with 8 operations at once and memory prefetching.

| Method                        | N        | Mean            | Error         | StdDev        | Ratio | RatioSD | Code Size |
|-------------------------------|----------|-----------------|---------------|---------------|-------|---------|-----------|
| FilterCmp                     | 23       | 277.3 ns        | 0.69 ns       | 0.64 ns       | 1.00  | 0.00    | 411 B     |
| FilterCmp_NoRangeCheck        | 23       | 270.7 ns        | 0.42 ns       | 0.38 ns       | 0.98  | 0.00    | 397 B     |
| FilterCmp_Unroll_8            | 23       | 257.6 ns        | 1.45 ns       | 1.21 ns       | 0.93  | 0.00    | 672 B     |
| FilterCmp_Unroll_8_Branchless | 23       | 259.9 ns        | 1.96 ns       | 1.84 ns       | 0.94  | 0.01    | 682 B     |
| FilterCmp                     | 1047     | 754.3 ns        | 1.38 ns       | 1.22 ns       | 1.00  | 0.00    | 411 B     |
| FilterCmp_NoRangeCheck        | 1047     | 749.0 ns        | 1.81 ns       | 1.69 ns       | 0.99  | 0.00    | 397 B     |
| FilterCmp_Unroll_8            | 1047     | 647.2 ns        | 2.23 ns       | 2.09 ns       | 0.86  | 0.00    | 672 B     |
| FilterCmp_Unroll_8_Branchless | 1047     | 721.2 ns        | 1.23 ns       | 1.09 ns       | 0.96  | 0.00    | 682 B     |
| FilterCmp                     | 1048599  | 499,675.6 ns    | 2,639.97 ns   | 2,469.43 ns   | 1.00  | 0.00    | 411 B     |
| FilterCmp_NoRangeCheck        | 1048599  | 494,388.4 ns    | 600.46 ns     | 532.29 ns     | 0.99  | 0.01    | 397 B     |
| FilterCmp_Unroll_8            | 1048599  | 426,940.7 ns    | 1,858.57 ns   | 1,551.99 ns   | 0.85  | 0.01    | 672 B     |
| FilterCmp_Unroll_8_Branchless | 1048599  | 483,940.8 ns    | 517.14 ns     | 458.43 ns     | 0.97  | 0.00    | 682 B     |
| FilterCmp                     | 33554455 | 30,282,334.8 ns | 599,306.15 ns | 531,269.30 ns | 1.00  | 0.00    | 411 B     |
| FilterCmp_NoRangeCheck        | 33554455 | 29,410,612.5 ns | 29,583.56 ns  | 24,703.61 ns  | 0.97  | 0.02    | 397 B     |
| FilterCmp_Unroll_8            | 33554455 | 29,102,708.3 ns | 42,824.78 ns  | 40,058.32 ns  | 0.96  | 0.02    | 672 B     |
| FilterCmp_Unroll_8_Branchless | 33554455 | 29,761,841.1 ns | 48,108.03 ns  | 42,646.51 ns  | 0.98  | 0.02    | 682 B     |

The unrolled 8 version is the winner by far in this scenario (0.5% negatives). Since that is the scenario we have in the real code, that is what I’m focusing on.

Is there anything left to do here?

My next step is to explore whether using vector instructions will be a good option for us.

More posts in "Filtering negative numbers, fast" series:

  1. (15 Sep 2023) Beating memcpy()
  2. (13 Sep 2023) AVX
  3. (12 Sep 2023) Unroll
  4. (11 Sep 2023) Scalar