About U++ heap: I believe the U++ memory allocator is close to the best possible algorithm and implementation (it was inspired by Boehm's GC). In fact, if I had the time, it would be worth publishing a paper just about the techniques used there.
Just some highlights:
- The small-block fast path for allocation and deallocation (used in the majority of cases) takes about 20+20 assembler instructions (plus synchronization in MT).
- There is less than one byte of management data overhead per small block. That also means that the smallest block size can be 4 bytes without problems (on a 32-bit platform; on 64-bit it is 8 bytes). Actually, smaller blocks have lower overhead than larger ones, unlike in classic allocators.
- Small-block fragmentation for real-world cases is limited by an absolute value (I am not sure at the moment, but if I remember the last tests correctly, the average maximum fragmentation limit is about 100KB).
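To make the first two points more concrete, here is a minimal sketch of one common way to get a short fast path and sub-byte overhead per block: a fixed-size pool whose free blocks link to each other through their own storage. This is not the U++ source. `SmallPool`, `Node`, `PageHeader` and the 4096-byte page size are all assumptions made for illustration, and there is no MT synchronization here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size pool; names are illustrative, not U++ identifiers.
class SmallPool {
    struct Node { Node *next; };             // stored inside a *free* block only
    struct PageHeader { PageHeader *next; }; // the only bookkeeping per page

    static constexpr std::size_t kPageSize = 4096; // assumed page size

    std::size_t block_size_;
    Node       *free_  = nullptr;  // head of the intrusive free list
    PageHeader *pages_ = nullptr;  // all pages, kept only for cleanup

    // Slow path: get one page and thread all of its blocks onto the free list.
    void Refill() {
        char *raw = static_cast<char *>(::operator new(kPageSize));
        pages_ = new (raw) PageHeader{pages_};
        for (std::size_t off = sizeof(PageHeader);
             off + block_size_ <= kPageSize; off += block_size_)
            free_ = new (raw + off) Node{free_};
    }

public:
    explicit SmallPool(std::size_t block_size) {
        // A free block must be able to hold one pointer, so round the size
        // up to a multiple of the pointer size (4 on 32-bit, 8 on 64-bit).
        const std::size_t a = sizeof(void *);
        if (block_size < a) block_size = a;
        block_size_ = (block_size + a - 1) / a * a;
    }
    SmallPool(const SmallPool &) = delete;
    SmallPool &operator=(const SmallPool &) = delete;
    ~SmallPool() {
        while (pages_) {
            PageHeader *p = pages_;
            pages_ = p->next;
            ::operator delete(p);
        }
    }

    // Fast path: pop the head of the free list.
    void *Alloc() {
        if (!free_) Refill();
        Node *n = free_;
        free_ = n->next;
        return n;
    }

    // Fast path: push the block back; no per-block header is needed.
    void Free(void *p) {
        assert(p != nullptr);
        free_ = new (p) Node{free_};
    }

    std::size_t BlockSize() const { return block_size_; }
    std::size_t BlocksPerPage() const {
        return (kPageSize - sizeof(PageHeader)) / block_size_;
    }
};
```

In this sketch an allocated block carries no header at all, so the only management data is the page header shared by every block in the page. With 8-byte blocks on a 64-bit build that is one 8-byte header for 511 blocks, and the share per block grows as blocks get larger, which is the same direction as the overhead claim above. A real allocator also needs a way to find the right pool from a pointer on free, which this sketch leaves out.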
Now, USE_MALLOC is a development macro that turns this highly efficient U++ heap off and uses regular malloc instead...
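As a rough illustration of what such a switch looks like, here is a sketch of a compile-time toggle. Only the macro name USE_MALLOC comes from the text above. `SketchAlloc`, `SketchFree` and `SketchUsesMalloc` are hypothetical names, and the non-malloc branch just forwards to `operator new` as a stand-in for a custom heap so the example stays runnable; how U++ actually wires the macro is not shown here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

#ifdef USE_MALLOC
// Development build: bypass the custom heap and use the C runtime directly.
inline void *SketchAlloc(std::size_t size) { return std::malloc(size); }
inline void  SketchFree(void *ptr)         { std::free(ptr); }
constexpr bool SketchUsesMalloc()          { return true; }
#else
// Default build: this is where a custom heap would be called.
// Stand-in only: forwards to operator new / operator delete.
inline void *SketchAlloc(std::size_t size) { return ::operator new(size); }
inline void  SketchFree(void *ptr)         { ::operator delete(ptr); }
constexpr bool SketchUsesMalloc()          { return false; }
#endif
```

Building with `-DUSE_MALLOC` (or the equivalent option in your compiler) selects the first branch. The usual reason for such a switch is that external memory-debugging tools tend to understand plain malloc better than a custom heap.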