snmalloc is an efficient multiprocessor memory allocator optimized for the asymmetric allocation / deallocation pattern found in pipelined programs.
It uses lock-free data structures, uses radix-trees to collect objects so that they can be sent in batches of 1MB by a curious multi-hop communication scheme, distinguish large (16MB+), medium (64kB+) and small objects.
Linked lists of free objects are added to input queues of other allocators which are checked on malloc/free calls. These objects are tagged with the destination allocator; if this is different from the allocator that receives them, it queues them to be sent to another allocator. It can take up to 7 hops to get to the true destination. (In practice, there is probably one allocator per thread and the number of threads is low enough that most objects just take one hop.)