Struct memory layout and memory optimizations

time to read 35 min | 6841 words

Consider a warehouse that needs to keep track of items. For the purpose of discussion, each item has quite a few fields that we need to track, all of them optional. Here is what this looks like in code:

public struct WarehouseItem
{
    public Dimensions? ProductDimensions;
    public long? ExternalSku;
    public TimeSpan? ShelfLife;
    public float? AlcoholContent;
    public DateTime? ProductionDate;
    public int? RgbColor;
    public bool? IsHazardous;
    public float? Weight;
    public int? Quantity;
    public DateTime? ArrivalDate;
    public bool? Fragile;
    public DateTime? LastStockCheckDate;

    public struct Dimensions
    {
        public float Length;
        public float Width;
        public float Height;
    }
}

And the actual Warehouse class looks like this:

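using System.Collections.Generic;

public class Warehouse
{
    private readonly List<WarehouseItem> _items = new();
    // Assumption: an item's id is simply its index in the list.
    public int Add(WarehouseItem item)
    {
        _items.Add(item);
        return _items.Count - 1;
    }
    public WarehouseItem Get(int itemId) => _items[itemId];
}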

The idea is that this is simply a wrapper around the list of items. We use a struct so that the items are stored inline in the list’s backing array, which gives us good locality.

The question is, what is the cost of this? A single instance of the struct consumes a total of 144 bytes. With a million items in the warehouse, that comes to over 137MB of memory.

That is… a big struct, I have to admit. Using ObjectLayoutInspector, I was able to see exactly what is going on:

Type layout for 'WarehouseItem'
Size: 144 bytes. Paddings: 62 bytes (%43 of empty space)
  Offset    Field                               Size      Contents
  0-15      Nullable`1 ProductDimensions        16 bytes  hasValue (1), padding (3), Dimensions value (12: Length, Width, Height)
  16-31     Nullable`1 ExternalSku              16 bytes  hasValue (1), padding (7), Int64 value (8)
  32-47     Nullable`1 ShelfLife                16 bytes  hasValue (1), padding (7), TimeSpan value (8)
  48-55     Nullable`1 AlcoholContent           8 bytes   hasValue (1), padding (3), Single value (4)
  56-71     Nullable`1 ProductionDate           16 bytes  hasValue (1), padding (7), DateTime value (8)
  72-79     Nullable`1 RgbColor                 8 bytes   hasValue (1), padding (3), Int32 value (4)
  80-81     Nullable`1 IsHazardous              2 bytes   hasValue (1), Boolean value (1)
  82-83     padding                             2 bytes
  84-91     Nullable`1 Weight                   8 bytes   hasValue (1), padding (3), Single value (4)
  92-99     Nullable`1 Quantity                 8 bytes   hasValue (1), padding (3), Int32 value (4)
  100-103   padding                             4 bytes
  104-119   Nullable`1 ArrivalDate              16 bytes  hasValue (1), padding (7), DateTime value (8)
  120-121   Nullable`1 Fragile                  2 bytes   hasValue (1), Boolean value (1)
  122-127   padding                             6 bytes
  128-143   Nullable`1 LastStockCheckDate       16 bytes  hasValue (1), padding (7), DateTime value (8)

As you can see, there is a huge amount of wasted space here, and most of it comes from nullability. Each Nullable<T> adds a hasValue byte in front of its value, and the padding needed to keep the value aligned really explodes the size of the struct: a long? takes 16 bytes instead of 8.
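To see the per-field overhead directly, here is a small sketch (not from the original post) that asks the runtime for the size of a few value types and their nullable counterparts. It assumes a 64-bit .NET runtime, where Unsafe.SizeOf<T>() from System.Runtime.CompilerServices reports the managed size of a value type; NullableCost is a hypothetical helper name.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper: compares the size of T with the size of Nullable<T>,
// exposing the hasValue byte plus the alignment padding that follows it.
public static class NullableCost
{
    public static (int Plain, int WithNullable) Sizes<T>() where T : struct =>
        (Unsafe.SizeOf<T>(), Unsafe.SizeOf<T?>());

    public static void Print<T>() where T : struct
    {
        var (plain, withNullable) = Sizes<T>();
        Console.WriteLine($"{typeof(T).Name}: {plain} bytes, nullable: {withNullable} bytes");
    }
}
```

On a 64-bit runtime, NullableCost.Print<long>() should report 8 bytes versus 16 bytes: one byte for hasValue, seven bytes of padding, and the eight-byte value. A bool? only grows from 1 to 2 bytes, since a bool needs no extra alignment.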

Here is an alternative layout that conveys the same information much more compactly. Instead of having a full byte for each nullable field (with the padding that comes with it), we’ll have a single bitmap for all nullable fields, one bit per field. Here is what this looks like:

public struct WarehouseItem
{
    public Dimensions ProductDimensions;
    public bool HasProductDimensions => (_nullability & (1 << 0)) != 0;
    public long ExternalSku;
    public bool HasExternalSku => (_nullability & (1 << 1)) != 0;
    public TimeSpan ShelfLife;
    public bool HasShelfLife => (_nullability & (1 << 2)) != 0;
    public float AlcoholContent;
    public bool HasAlcoholContent => (_nullability & (1 << 3)) != 0;
    public DateTime ProductionDate;
    public bool HasProductionDate => (_nullability & (1 << 4)) != 0;
    public int RgbColor;
    public bool HasRgbColor => (_nullability & (1 << 5)) != 0;
    public bool IsHazardous;
    public bool HasIsHazardous => (_nullability & (1 << 6)) != 0;
    public float Weight;
    public bool HasWeight => (_nullability & (1 << 7)) != 0;
    public int Quantity;
    public bool HasQuantity => (_nullability & (1 << 8)) != 0;
    public DateTime ArrivalDate;
    public bool HasArrivalDate => (_nullability & (1 << 9)) != 0;
    public bool Fragile;
    public bool HasFragile => (_nullability & (1 << 10)) != 0;
    public DateTime LastStockCheckDate;
    public bool HasLastStockCheckDate => (_nullability & (1 << 11)) != 0;
    // One bit per nullable field: bit N set means the matching field has a value.
    private ushort _nullability;

    public struct Dimensions
    {
        public float Length;
        public float Width;
        public float Height;
    }
}
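The struct above only shows the read side of the bitmap. As a sketch of the write side (not code from the original), the hypothetical CompactSku struct below exposes an ordinary long? property over a plain long plus one bit of a ushort, so callers keep a nullable API while the storage stays compact. Bit 1 matches HasExternalSku in the struct above.

```csharp
// Hypothetical example: a nullable-looking property backed by a raw field plus a bitmap bit.
public struct CompactSku
{
    private const int ExternalSkuBit = 1; // same bit index that HasExternalSku uses

    private long _externalSku;
    private ushort _nullability;

    public bool HasExternalSku => (_nullability & (1 << ExternalSkuBit)) != 0;

    public long? ExternalSku
    {
        get => HasExternalSku ? _externalSku : (long?)null;
        set
        {
            // Store the raw value (0 when null) and record presence in the bitmap.
            _externalSku = value.GetValueOrDefault();
            _nullability = value.HasValue
                ? (ushort)(_nullability | (1 << ExternalSkuBit))
                : (ushort)(_nullability & ~(1 << ExternalSkuBit));
        }
    }
}
```

A default-initialized CompactSku reads as null, which mirrors the default behavior of a long? field: every bit in the bitmap starts out cleared.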

If we look deeper, we’ll see that this saved a lot: the struct is now 96 bytes. That’s a massive space saving, but…

Type layout for 'WarehouseItem'
Size: 96 bytes. Paddings: 24 bytes (%25 of empty space)

We still have a lot of wasted space, because we haven’t ordered the fields to eliminate padding. Let’s reorganize the struct’s fields and see what we can achieve. The only change I made was to rearrange the fields so that each one lands on its natural alignment, and we have:

public struct WarehouseItem
{
    public Dimensions ProductDimensions;
    public float AlcoholContent;
    public long ExternalSku;
    public TimeSpan ShelfLife;
    public DateTime ProductionDate;
    public DateTime ArrivalDate;
    public DateTime LastStockCheckDate;
    public float Weight;
    public int Quantity;
    public int RgbColor;
    public bool Fragile;
    public bool IsHazardous;
    private ushort _nullability;
    public bool HasProductDimensions => (_nullability & (1 << 0)) != 0;
    public bool HasExternalSku => (_nullability & (1 << 1)) != 0;
    public bool HasShelfLife => (_nullability & (1 << 2)) != 0;
    public bool HasAlcoholContent => (_nullability & (1 << 3)) != 0;
    public bool HasProductionDate => (_nullability & (1 << 4)) != 0;
    public bool HasRgbColor => (_nullability & (1 << 5)) != 0;
    public bool HasIsHazardous => (_nullability & (1 << 6)) != 0;
    public bool HasWeight => (_nullability & (1 << 7)) != 0;
    public bool HasQuantity => (_nullability & (1 << 8)) != 0;
    public bool HasArrivalDate => (_nullability & (1 << 9)) != 0;
    public bool HasFragile => (_nullability & (1 << 10)) != 0;
    public bool HasLastStockCheckDate => (_nullability & (1 << 11)) != 0;

    public struct Dimensions
    {
        public float Length;
        public float Width;
        public float Height;
    }
}

And the struct layout is now:

Type layout for 'WarehouseItem'
Size: 72 bytes. Paddings: 0 bytes (%0 of empty space)
  0-11    Dimensions ProductDimensions     12 bytes (Length 0-3, Width 4-7, Height 8-11)
  12-15   Single AlcoholContent            4 bytes
  16-23   Int64 ExternalSku                8 bytes
  24-31   TimeSpan ShelfLife               8 bytes
  32-39   DateTime ProductionDate          8 bytes
  40-47   DateTime ArrivalDate             8 bytes
  48-55   DateTime LastStockCheckDate      8 bytes
  56-59   Single Weight                    4 bytes
  60-63   Int32 Quantity                   4 bytes
  64-67   Int32 RgbColor                   4 bytes
  68      Boolean Fragile                  1 byte
  69      Boolean IsHazardous              1 byte
  70-71   UInt16 _nullability              2 bytes

We have no wasted space, and we are at 50% of the original size (72 bytes instead of 144).
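The same effect is easy to reproduce in isolation. In this sketch (hypothetical type names, assuming a 64-bit runtime and sequential struct layout), both structs hold identical fields and only the declaration order differs. Putting the 8-byte field between the two bytes costs 7 bytes of padding before it and 7 more at the end, while declaring fields from largest to smallest alignment leaves only 6 bytes of trailing padding.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical types: the same three fields, declared in a different order.
[StructLayout(LayoutKind.Sequential)]
public struct Interleaved
{
    public byte A;  // offset 0, followed by 7 bytes of padding so B is 8-byte aligned
    public long B;  // offset 8
    public byte C;  // offset 16, followed by 7 bytes of padding to round the size up to 24
}

[StructLayout(LayoutKind.Sequential)]
public struct LargestFirst
{
    public long B;  // offset 0
    public byte A;  // offset 8
    public byte C;  // offset 9, followed by 6 bytes of padding to round the size up to 16
}

public static class OrderingDemo
{
    public static void Print()
    {
        Console.WriteLine($"Interleaved:  {Unsafe.SizeOf<Interleaved>()} bytes");
        Console.WriteLine($"LargestFirst: {Unsafe.SizeOf<LargestFirst>()} bytes");
    }
}
```

The trailing padding exists because the struct’s size must be a multiple of its largest alignment (8 here), so that consecutive elements in an array stay aligned.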

We can actually do better. Note that Fragile and IsHazardous are Booleans, and we have some free bits in _nullability (only 12 of its 16 bits are in use) that we can repurpose.

For that matter, RgbColor only needs 24 bits, not 32. Do we need alcohol content to be a float, or can we use a byte? If that is the case, can we shove both of them together into the same 4 bytes?
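Here is a sketch of how the two values could share one int, assuming (as the getters in the final struct suggest) that alcohol content is kept as a whole number in the low byte and the color in the upper 24 bits. ColorAndAlcohol is a hypothetical helper. The unpacking side masks after shifting, because a color with its top bit set makes the packed int negative, and >> on an int sign-extends.

```csharp
using System;

// Hypothetical helper: RGB color in the upper 24 bits, alcohol content (whole units) in the low 8 bits.
public static class ColorAndAlcohol
{
    public static int Pack(int rgb, byte alcohol)
    {
        if ((rgb & ~0xFFFFFF) != 0)
            throw new ArgumentOutOfRangeException(nameof(rgb), "RGB must fit in 24 bits.");
        return (rgb << 8) | alcohol;
    }

    // Mask after the arithmetic shift to drop any sign-extended bits.
    public static int UnpackRgb(int packed) => (packed >> 8) & 0xFFFFFF;

    public static byte UnpackAlcohol(int packed) => (byte)(packed & 0xFF);
}
```

Packing 0xFF8000 (orange) with an alcohol content of 40 produces a negative int, yet both values round-trip intact.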

For dates, can we use DateOnly (4 bytes) instead of DateTime (8 bytes)? What about ShelfLife: can we measure it in hours and store it in a ushort (up to 65,535 hours, or about 7.5 years)?
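A sketch of those conversions, assuming .NET 6 or later (where DateOnly exists) and that shelf life only needs whole-hour precision. CompactTime and its methods are hypothetical names. Converting a DateTime to DateOnly discards the time of day, which is exactly the trade-off being proposed.

```csharp
using System;

// Hypothetical helpers for the compact date and shelf-life representations.
public static class CompactTime
{
    // Assumption: whole-hour precision is enough; sub-hour remainders are truncated.
    public static ushort ToShelfLifeHours(TimeSpan shelfLife)
    {
        double hours = Math.Floor(shelfLife.TotalHours);
        if (hours < 0 || hours > ushort.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(shelfLife));
        return (ushort)hours;
    }

    public static TimeSpan FromShelfLifeHours(ushort hours) => new TimeSpan(hours, 0, 0);

    // DateOnly keeps only the calendar date (a 4-byte day number, per the layout dump).
    public static DateOnly ToDateOnly(DateTime value) => DateOnly.FromDateTime(value);

    public static DateTime ToDateTime(DateOnly value) => value.ToDateTime(TimeOnly.MinValue);
}
```

The upper bound works out to FromShelfLifeHours(ushort.MaxValue).TotalDays = 2730.625 days, just under 7.5 years.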

After all of that, we end up with the following structure:

public struct WarehouseItem
{
    public Dimensions ProductDimensions;
    public float Weight;
    public long ExternalSku;
    public DateOnly ProductionDate;
    public DateOnly ArrivalDate;
    public DateOnly LastStockCheckDate;
    public int Quantity;
    // Low 8 bits: alcohol content (whole units); upper 24 bits: RGB color.
    private int _rgbColorAndAlcoholContentBacking;
    private ushort _nullability;
    public ushort ShelfLifeInHours;
    public float AlcoholContent => (float)(byte)_rgbColorAndAlcoholContentBacking;
    // Mask after shifting: >> on an int is arithmetic, so a color with its top
    // bit set would otherwise sign-extend into a negative value.
    public int RgbColor => (_rgbColorAndAlcoholContentBacking >> 8) & 0xFFFFFF;
    // Bits 12 and 13 are spare bits of _nullability, reused to hold the Booleans themselves.
    public bool Fragile => (_nullability & (1 << 12)) != 0;
    public bool IsHazardous => (_nullability & (1 << 13)) != 0;
    public bool HasProductDimensions => (_nullability & (1 << 0)) != 0;
    public bool HasExternalSku => (_nullability & (1 << 1)) != 0;
    public bool HasShelfLife => (_nullability & (1 << 2)) != 0;
    public bool HasAlcoholContent => (_nullability & (1 << 3)) != 0;
    public bool HasProductionDate => (_nullability & (1 << 4)) != 0;
    public bool HasRgbColor => (_nullability & (1 << 5)) != 0;
    public bool HasIsHazardous => (_nullability & (1 << 6)) != 0;
    public bool HasWeight => (_nullability & (1 << 7)) != 0;
    public bool HasQuantity => (_nullability & (1 << 8)) != 0;
    public bool HasArrivalDate => (_nullability & (1 << 9)) != 0;
    public bool HasFragile => (_nullability & (1 << 10)) != 0;
    public bool HasLastStockCheckDate => (_nullability & (1 << 11)) != 0;

    public struct Dimensions
    {
        public float Length;
        public float Width;
        public float Height;
    }
}

And with the following layout:

Type layout for 'WarehouseItem'
Size: 48 bytes. Paddings: 0 bytes (%0 of empty space)
  0-11    Dimensions ProductDimensions                  12 bytes (Length 0-3, Width 4-7, Height 8-11)
  12-15   Single Weight                                 4 bytes
  16-23   Int64 ExternalSku                             8 bytes
  24-27   DateOnly ProductionDate                       4 bytes (Int32 dayNumber)
  28-31   DateOnly ArrivalDate                          4 bytes (Int32 dayNumber)
  32-35   DateOnly LastStockCheckDate                   4 bytes (Int32 dayNumber)
  36-39   Int32 Quantity                                4 bytes
  40-43   Int32 _rgbColorAndAlcoholContentBacking       4 bytes
  44-45   UInt16 _nullability                           2 bytes
  46-47   UInt16 ShelfLifeInHours                       2 bytes

In other words, we are now packing everything into 48 bytes, one-third of the initial cost, while still representing the same data. Our previous Warehouse class? It used to take 137MB for a million items; now it would take only about 45.8MB.
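These figures are simple arithmetic: struct size times item count, divided by 1,048,576 bytes per MB. The hypothetical Footprint helper below reproduces them for each version of the struct. It counts only the struct payload; a List<T> that grows one item at a time will usually hold some unused capacity in its backing array on top of this.

```csharp
using System;

// Hypothetical helper: payload size for `count` structs of `structSize` bytes each.
public static class Footprint
{
    // Result is in MB of 2^20 bytes.
    public static double FootprintMB(int structSize, long count) =>
        structSize * (double)count / (1024 * 1024);

    public static void Print()
    {
        // 144 = nullable fields, 96 = bitmap, 72 = reordered, 48 = packed.
        foreach (int size in new[] { 144, 96, 72, 48 })
            Console.WriteLine($"{size} bytes x 1,000,000 = {FootprintMB(size, 1_000_000):F1} MB");
    }
}
```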

In RavenDB’s case, we had the following:

That is the backing store of the dictionary, and as you can see, it isn’t a nice one. Using similar techniques we are able to massively reduce the amount of storage that is required to process indexing.

Here is what this same scenario looks like now:

But we aren’t done yet; there is still more that we can do.