For some time I have been testing a different memory management system than that in the standard libraries of various C compilers. This started for two reasons. The first was because I wanted to find a system where the distribution of blocks on the free list over time matched the distribution of requests, rather than a distribution dependant upon the manager's internal organization. The second was more research-based where I wanted to implement a special version of realloc that would support some work I was doing with pre-emptive operating systems in dynamically expanding stack frames.
One way I could see to change the free list distribution was to make it independent of the memory structure, allowing the simple solution of placing returned blocks at the head of the list. Of course the code size was also a criterion. My early attempts showed a doubly linked free list was the basis of a solution. A search of the web eventually brought me to an article by Bruno Preiss (http://www.brpreiss.com) on memory pools where he describes a C++ class that did exactly what I wanted. This article describes a C implementation of this and an example port to the Atmel AVR range of microprocessors.
The following discussion is copied almost verbatim from Preiss' description of his DoublyLinkedPool class. I have edited where necessary to describe the changes to convert a C++ class to a C module with additional changes for use in small systems. An example is shown for the Atmel AVR processors and the avrgcc version of gcc.
Many memory managers for small microprocessors use a singly linked list to manage the free list. The list is kept sorted in order to facilitate the coalescing of adjacent free areas. The only way to determine whether a particular area is free is to see if it appears in the free list. Since a requirement is to examine adjacent areas to determine whether they can be combined, the free list is kept sorted so adjacent areas appear next to one another.
Unfortunately, since the free list must be kept sorted, every time an area is released it must be inserted into the list at the correct position. This means that the running time of the free operation is O(n) in the worst case, where n is the number of blocks in the storage pool.
We cannot tell by looking at an area whether it is reserved or free, so the solution is to record the allocation status of the area in the area itself. A secondary problem with the use of a singly linked free list is that, given a pointer to an area, we cannot extract that area from the free list without traversing the list. This is because in order to extract an element from a linked list, we need to know the predecessor of that element, and in a singly linked list we must search from the head of the list to find the predecessor. One solution is to use a doubly linked free list.
Finally, because we need to traverse the free list in order to coalesce adjacent free areas, and since we must traverse the free list in the malloc operation in order to find a free area of a suitable size, we shall do the coalescing in the malloc operation and not in the free operation. While this does increase slightly the running time for malloc, it means that free can run in constant time.
The storage pool is implemented as an array of Blocks. The structure of a Block is shown in Listing 1. A sequence of consecutive, contiguous blocks in the array constitutes an area. Only the first block in each area is used to keep track of the entire area.
//-----------------------------------------------------------------------------
// A memory block is made up of a header and either a user data part or a pair
// of pointers to the previous and next free area. A memory area is made up
// of one or more of these blocks. Only the first block will have the header.
// The header contains the number of blocks contained within the area.
//
typedef struct _Block
{
BlockSizeType blocks;
union
{
uint8 userPart[BLOCK_SIZE - sizeof(BlockSizeType)];
struct
{
struct _Block *prev;
struct _Block *next;
};
};
} Block;
Listing 1. Block Structure Layout.
We encode two pieces of information into the BlockSizeType block header variable called blocks - a single bit is used to indicate whether the area is reserved or free, and the remaining least significant bits are used to record the length of the area (in blocks). By packing this information into a single word we have limited the maximum sized block that can be allocated to 2(sizeof(BlockSizeType)-1) bytes. But for small embedded systems where BlockSizeType is typically a 16-bit integer, this should be more than adequate.
typedef uint16 BlockSizeType; // size of header
Free areas are linked in a doubly linked free list. Two pointers are required to accomplish this - prev and next. The effect of the extra pointer over a singly linked list is to increase the minimum block size. However, since the pointers occupy space in the pool that would otherwise be unused, this does not require any additional space.
The Block structure contains a union. The union overlays an instance of the link structure, which is used to hold pointers to the next and previous elements of the free list when the block is free with the userPart of the block which is the space given to the user when the block is reserved.
In this implementation the size of a block (in bytes) is required to be at least sizeof(BlockSizeType) + 2 * sizeof(Block*). On a 16-bit machine the block size is typically 8 or 16 bytes. For an 8-bit machine the minimum size is 8 bytes. A number of divisions in the code are implemented as shifts, so the block size must be a power of 2.
Since the storage pool starts out empty, the free list contains initially just a single entry that treats the entire pool as a one large free area.
The manager has the three standard public functions malloc, free and realloc and private support functions InitPool, MergeBlocks and SplitBlock. In addition, two more private functions Unlink and InsertAfter implement common linked-list manipulations.
//-----------------------------------------------------------------------------
// Compute the number of blocks that will fit in the memory area defined.
// Allocate the pool of blocks. Note this includes the sentinel area that is
// attached to the end and is always only one block. The first entry in the
// free list pool is set to include all available blocks. The sentinel is
// initialised to point back to the start of the pool.
//
static void InitPool(void)
{
Block* head;
char* heapStart = (char*)&__heap_start;
char* heapEnd = (char*)&__heap_end;
mPool = (Block*)heapStart;
mNumberOfBlocks = (uint16)(((heapEnd - (char*)mPool) >> SHIFT_VALUE) - 1);
mSentinel = mPool + mNumberOfBlocks;
mSentinel->blocks = RESERVED; // now cannot be used
mSentinel->prev = mSentinel;
mSentinel->next = mSentinel;
// Entire pool is initially a single unallocated area.
//
head = &mPool[0];
head->blocks = mNumberOfBlocks; // initially all of free memeory
InsertAfter(head); // link the sentinel
}
Listing 2. List Initialisation.
The initialisation function InitPool shown in Listing 2 must initialise the three static local variables mNumberOfBlocks, mPool and mSentinel. The pointer mPool is initialised to point to the beginning of the heap area. The beginning and end of this area are defined by the heapStart and heapEnd pointers, which are initialised from the gcc compiler constants __heap_start and __heap_end respectively. Most compilers have similar names and where they haven't, you can define them as simple constants. The global variables are declared as:
// Local global variables. // static Block* mPool; static Block* mSentinel; static uint16 mNumberOfBlocks;
The number of blocks is effectively set to N / sizeof(Block), where N is the desired size of the storage pool in bytes. The mPool variable can therefore be used as an array of Blocks with mNumberOfBlocks entries within the heap area.
The last block in the array is used as the sentinel for the doubly linked free list. The static variable mSentinel is initialised as a reference to this last block. Since the sentinel block is never to be allocated, it is marked as reserved. The next and prev pointers of the sentinel are both initialised to point at the sentinel itself. The free list is therefore initially empty.
The entire pool is initially a single unallocated area. We represent this area by the first block in the pool, mPool[0], and assign the head pointer to this location. The block is marked free (by default) and the blocks field is set to mNumberOfBlocks. The block is then inserted into the doubly linked free list by calling the private function InsertAfter. The InsertAfter function can do the insertion in constant time. The worst-case running time of InitPool is O(1).
Listing 3 gives the code for the free member function. This takes as its one argument a pointer to the user part of the memory area to be released.
//-----------------------------------------------------------------------------
// Return this memory block to the free list.
//
void free(void* pntr)
{
// Check for errors.
//
STATE_REG state = SaveState();
Block* top = mPool + mNumberOfBlocks;
Block* baseArea = TO_BLOCK_PTR(pntr); // convert to a block address
if ( (pntr == NULL) || (baseArea > mPool) || (baseArea >= top) )
{
return;
}
// Very simple - we insert the area to be freed at the start
// of the free list. This runs in constant time. Since the free
// list is not kept sorted, there is less of a tendency for small
// areas to accumulate at the head of the free list.
//
baseArea->blocks &= ~RESERVED; // mark as now free
InsertAfter(baseArea);
RestoreState(state);
}
Listing 3. Free Allocated Memory.
The implementation shown is both simple and fast. The free function makes no attempt to combine adjacent free areas. And since the free list is not sorted, it simply inserts the area to be freed at the front of the free list by inserting it into the free list immediately after the sentinel.
#define TO_BLOCK_PTR(p) (Block*)((BlockSizeType*)p - 1)
The TO_BLOCK_POINTER macro converts the supplied address to a Block pointer. After checking for the do-nothing case of a NULL pointer and a check that the pointer is at least somewhere within the heap, the block is marked free and inserted at the head of the free list by calling the local function InsertAfter. The macros are defined as:
// Working macros. // #define TO_BLOCK_PTR(p) (Block*)((BlockSizeType*)p - 1) #define BLOCKS_TO_BYTES(n) (((n & ~RESERVED) << SHIFT_VALUE) - sizeof(BlockSizeType))
Because the free list is a doubly linked list, the insertion can be done in constant time. Therefore, the worst-case running time of the free function is O(1).
The malloc function is shown in Listing 4. This takes a single integer argument that specifies the size in bytes of the area to be allocated. malloc returns a pointer to the storage if there is sufficient space left in the pool to accommodate the request. Otherwise it returns a NULL.
//-----------------------------------------------------------------------------
// Return a pointer to an area of memory that is at least the right size.
//
void* malloc(size_t size)
{
// Check for errors.
//
if (size < 0)
{
return NULL;
}
STATE_REG state = SaveState();
// Compute the number of blocks to satisfy the request.
//
uint16 reqBlocks = (size + sizeof(BlockSizeType) + sizeof(Block) - 1) >> SHIFT_VALUE;
Block* block;
// Initialise the heap for a first-time use.
//
if (mSentinel == NULL)
{
InitPool();
}
// Traverse the free list looking for the first block that will fit the
// request. This is a "first-fit" strategy.
//
for (block = mSentinel->next; block != mSentinel; block = block->next)
{
block = MergeBlocks(block);
// Is this free area (which could have just expanded somewhat)
// large enough to satisfy the request.
//
if (block->blocks >= reqBlocks)
{
break;
}
}
// If we are pointing at the sentinel then all blocks are allocated.
//
if (block == mSentinel)
{
RestoreState(state);
return NULL;
}
// If this free area is larger than required, it is split in two. The
// size of the first area is set to that required and the second area
// to the blocks remaining. The second area is then inserted into
// the free list.
//
if (block->blocks > reqBlocks)
{
SplitBlock(block, reqBlocks);
}
// Unlink the now correctly sized area from the free list and mark it
// as reserved.
//
Unlink(block);
block->blocks |= RESERVED;
RestoreState(state);
return block->userPart;
}
Listing 4. Allocating Memory.
The function must traverse the free list in order to find an area large enough to match the request. The allocation strategy chosen is first-fit, i.e. storage is allocated in the first area that is large enough.
Recall that the free function given in the preceding section does not combine adjacent free areas. Therefore it is the responsibility of the malloc function to do this. The algorithm is this: Adjacent free areas are recombined while the free list is traversed in search of a free area large enough to accommodate the allocation request.
The function begins by computing the number of blocks required using the integer formula
Blocks = (requestedbytes + sizeof(BlockSizeType) + sizeof(Block) - 1) / sizeof(Block)
i.e. we need enough space for the requested number of bytes plus a Header. Note that in the code we use a shift operation to save the necessity for division.
The malloc function then traverses the free list in search of an area that is large enough. As each area in the free list is visited, the area that immediately follows that area in memory is examined to see if it too is free and therefore if it can be combined with the given area (private function MergeBlocks shown in Listing 5).
//-----------------------------------------------------------------------------
// As each area is examined for a fit, we also examine the following area.
// If it is free then it must also be on the free list. Being a doubly-linked
// list, we can combine these two areas in constant time. If an area is
// combined, the procedure then looks again at the following area, thus
// repeatedly combining areas until a reserved area is found. In terminal
// cases this will be the sentinel block.
//
static Block* MergeBlocks(Block* block)
{
while (TRUE)
{
Block* successor = block + block->blocks; // point to next area
if (successor->blocks & RESERVED) // done if reserved
{
return block;
}
Unlink(successor);
block->blocks += successor->blocks; // add in its blocks
}
}
Listing 5. The MergeBlocks Function.
Since the size of an area is recorded in its header, we can easily find the area that follows it in memory as shown in Figure 1. And since the status of an area (free or reserved) is recorded in the area itself, we can determine whether the following area is free.

Figure 1. Double Link Structure.
If the area that follows a given area is indeed free, then it must in be the free list! And because the free list is doubly linked, we can easily extract that area from the free list in constant time. This is what the Unlink function does. When combining a given area with the area that follows it in memory, we obtain an area the size of which is equal to the sum of the sizes of the areas that were combined. After the combining it is possible that the area that follows the new larger area is also free. Therefore we need to repeat the combining process again. This is what the loop in MergeBlocks does - it keeps combining adjacent free areas until the area that follows is a reserved area.
The search for a free area to satisfy the malloc request terminates at the first area that is large enough. Notice that if the entire free list is traversed, all the adjacent areas that can be merged have been merged. If the free list is traversed and an area is not found that is large enough to satisfy the request, then the request cannot be satisfied because there is insufficient contiguous memory to do so. In this case the function returns a NULL pointer that the calling program should naturally always test for.
When the free area is larger than needed, it is split in two in the SplitBlock function shown in Listing 5.
//-----------------------------------------------------------------------------
//
static void SplitBlock(Block* block, uint16 reqBlocks)
{
Block* newBlock = block + reqBlocks; // create a remainder area
newBlock->blocks = block->blocks - reqBlocks; // set its size and mark as free
block->blocks = reqBlocks; // set us to requested size
InsertAfter(newBlock); // stitch remainder into free list
}
Listing 5. The SplitBlock Function.
The size of the first area is set to the number of blocks requested and the size of the second area equal to the number of blocks that remain. The second area is then inserted into the free list by calling the InsertAfter function.
In all cases, by the time execution reaches the Unlink call at the end of malloc, the free area is the correct size modulo the block size. The area is unlinked from the free list and marked reserved. Finally, a pointer to the userPart of the area is returned.
At first glance it might seem that the nested loops would result in a worst-case running time of O(n^2) where n is the number of blocks in the storage pool. However, the outer loop traverses the free list while every complete iteration of the inner loop removes at most one element from the free list. As a result, the worst-case running time for the nested loops is actually only O(n).
The nice thing about this approach is that the asymptotic running time of the malloc function would be O(n) even if it did not combine adjacent free areas. In particular, this asymptotic bound is the same as for a singly linked storage pool. On the other hand, the running time of the free function is O(1) for the doubly-linked pool which is certainly better than the O(n) worst-case running time of the singly-linked pool. Finally, since the free list is not kept sorted, there is not the same tendency for short areas to accumulate at the head of the free list as there is in the singly linked pool. In fact, over time the free list will take on a distribution that matches the requests.
The implementation of the standard function realloc to extend the size of an existing area without losing the current contents is shown in Listing 6.
//-----------------------------------------------------------------------------
// Re-allocate the buffer to a new area the requested size. If possible the
// existing area is simply expanded. Otherwise a new area is allocated and
// the current contents copied.
//
void* realloc(void* pntr, uint16 newSize)
{
// Check for errors.
//
if (pntr == NULL)
{
return malloc(newSize);
}
else if (newSize == 0)
{
free(pntr);
return NULL;
}
else
{
STATE_REG state = SaveState();
Block* block = TO_BLOCK_PTR(pntr); // convert user to block address
uint16 reqBlocks = (newSize + sizeof(BlockSizeType) + sizeof(Block) - 1) >> SHIFT_VALUE;
block->blocks &= ~RESERVED; // expose the size
// The fastest option is to merge this block with any free blocks
// that are contiguous with this block.
//
block = MergeBlocks(block);
if (block->blocks > reqBlocks)
{
// The merge produced a larger block than required, so split it
// into two blocks. This also takes care of the case where the
// new size is less than the old.
//
SplitBlock(block, reqBlocks);
}
else if (block->blocks < reqBlocks)
{
// Could not expand this block. Must attempt to allocate
// a new one the correct size and copy the current contents.
//
uint16 oldSize = BLOCKS_TO_BYTES(block->blocks);
if ((block = (Block*)malloc(newSize)))
{
// A new block large enough has been allocated. Copy the
// existing data and then discard the old block.
//
block = TO_BLOCK_PTR(block);
memcpy(block->userPart, (Block*)(TO_BLOCK_PTR(pntr))->userPart, oldSize);
free(pntr); // done with the old
}
else
{
// Cannot re-allocate this block. Note the old pointer
// is still valid.
//
RestoreState(state);
return NULL; // no valid options
}
}
block->blocks |= RESERVED;
RestoreState(state);
return block->userPart;
}
}
Listing 6. The Realloc function.
The standard defines that if the pointer is null the call is treated as a call to malloc, and if the size is zero it is treated as a call to free, with the pointer left as it was. After these tests it first computes the number of blocks required to satisfy the new requirement. It then checks if the immediately contiguous block(s) are free by calling MergeBlocks. What happens next depends on the result of this call. If a merge did occur and the block size now exceeds the requirement, the now too large block is split in two parts, with the unused part being put on the free list. If the block size is unchanged or is still less than the requirement a new block of the required size is requested via malloc. If this is successful the data in the old block is copied to the new area and the old area returned to the free list. Otherwise the request cannot be granted and the function returns a NULL.
A number of support functions provide standard doubly linked list operations.
//-----------------------------------------------------------------------------
//
static void InsertAfter(Block* block)
{
Block* p = mSentinel->next;
mSentinel->next = block;
block->prev = mSentinel;
block->next = p;
p->prev = block;
}
//-----------------------------------------------------------------------------
//
static void Unlink(Block* ptr)
{
ptr->prev->next = ptr->next;
ptr->next->prev = ptr->prev;
}
The current memory allocator in avr-gcc does not set aside the entire area above the bss area up to the bottom of the system stack area, but will expand up to this area if necessary. In my opinion, the heap may as well use all this area from the beginning, with the __heap_end constant defining the bottom of (and consequently the size of) the system stack area.
The __heap_start constant is configured by default by the compiler to begin immediately after the end of the bss area. The __heap_end constant is not set to anything by default, so it is necessary to define and include it in the link line. A typical makefile entry setting the top of heap for an ATMega128 at 0x107f would be:
LDFLAGS += -Wl,--defsym=__heap_end=0x80107f
This gives a system stack size of a very comfortable 128 bytes.
A look at the Map file for a typical project shows the flash usage for the original avr-gcc malloc/free/realloc is around 940 locations. The DoubleLinkPool version (using -Os) uses around 800 locations, so there is no price to pay.
The functions are, of course, not re-entrant. I have been using them with a number of preemptive operating systems and this explains the SaveState and RestoreState functions. These are inline calls to save and restore the interrupt bit on the respective processors, necessary to prevent a process switch while allocating memory. This will be adequate in many cases where interrupt response time is not critical.
However, one of the problems with memory allocators is their running time is indeterminate, and just turning off interrupts can be a problem in situations where interrupt response time is critical. In these cases a solution I have used is to protect the allocator by making it a serial resource protected by a semaphore. The semaphore ceiling priority is set to the highest possible setting, and no other process is allowed to use this priority. For example, if the highest RTOS priority is 10, then this is given to the allocator semaphore. The highest priority now available for other processes is 9. In a situation where you don't have full control of setting process priorities, assign the highest to the allocator as above, and reset the highest value available to processes to one less.
Any thread running in the allocator code can therefore be interrupted to restore rapid interrupt response times, but the allocator function itself cannot be interrupted by another process and will execute as fast as possible (I am assuming no interrupt routine uses the allocator!).
The DoubleLinkedPool heap memory manager described here does, in my experience, perform better than versions provided with a number of C compilers, particularly when under pressure. It continues to function, in a general sense, when a large percentage of the heap area is allocated and at high levels of fragmentation.
Here is a version of DoubleLinkPool for the AVR processors available for download. The public functions have the same signature as the library versions, so no specific header is required. The included Types header is my own ideosyncratic set of type names and can be easily altered. I have not included a test program as most testers will have their own set of torture tools for testing.