A Doubly Linked Memory Manager

by Ron Kreymborg
8-Nov-2005

Introduction

For some time I have been testing a different memory management system than that in the standard libraries of various C compilers. This started for two reasons. The first was because I wanted to find a system where the distribution of blocks on the free list over time matched the distribution of requests, rather than a distribution dependant upon the manager's internal organization. The second was more research-based where I wanted to implement a special version of realloc that would support some work I was doing with pre-emptive operating systems in dynamically expanding stack frames.

One way I could see to change the free list distribution was to make it independent of the memory structure, allowing the simple solution of placing returned blocks at the head of the list. Of course the code size was also a criterion. My early attempts showed a doubly linked free list was the basis of a solution. A search of the web eventually brought me to an article by Bruno Preiss (http://www.brpreiss.com) on memory pools where he describes a C++ class that did exactly what I wanted. This article describes a C implementation of this and an example port to the Atmel AVR range of microprocessors.

The Doubly Linked List

The following discussion is copied almost verbatim from Preiss' description of his DoublyLinkedPool class. I have edited where necessary to describe the changes to convert a C++ class to a C module with additional changes for use in small systems. An example is shown for the Atmel AVR processors and the avrgcc version of gcc.

Many memory managers for small microprocessors use a singly linked list to manage the free list. The list is kept sorted in order to facilitate the coalescing of adjacent free areas. The only way to determine whether a particular area is free is to see if it appears in the free list. Since a requirement is to examine adjacent areas to determine whether they can be combined, the free list is kept sorted so adjacent areas appear next to one another.

Unfortunately, since the free list must be kept sorted, every time an area is released it must be inserted into the list at the correct position. This means that the running time of the free operation is O(n) in the worst case, where n is the number of blocks in the storage pool.

We cannot tell by looking at an area whether it is reserved or free, so the solution is to record the allocation status of the area in the area itself. A secondary problem with the use of a singly linked free list is that, given a pointer to an area, we cannot extract that area from the free list without traversing the list. This is because in order to extract an element from a linked list, we need to know the predecessor of that element, and in a singly linked list we must search from the head of the list to find the predecessor. One solution is to use a doubly linked free list.

Finally, because we need to traverse the free list in order to coalesce adjacent free areas, and since we must traverse the free list in the malloc operation in order to find a free area of a suitable size, we shall do the coalescing in the malloc operation and not in the free operation. While this does increase slightly the running time for malloc, it means that free can run in constant time.

The storage pool is implemented as an array of Blocks. The structure of a Block is shown in Listing 1. A sequence of consecutive, contiguous blocks in the array constitutes an area. Only the first block in each area is used to keep track of the entire area.

1 //-----------------------------------------------------------------------------
2 // A memory block is made up of a header and either a user data part or a pair
3 // of pointers to the previous and next free area. A memory area is made up
4 // of one or more of these blocks. Only the first block will have the header.
5 // The header contains the number of blocks contained within the area.
6 //
7 typedef struct _Block
8 {
9     BlockSizeType blocks;
10     union
11     {
12         uint8 userPart[BLOCK_SIZE - sizeof(BlockSizeType)];
13         struct 
14         {
15             struct _Block *prev;
16             struct _Block *next;
17         };
18     };
19 } Block;

Listing 1. Block Structure Layout.

We encode two pieces of information into the BlockSizeType block header variable called blocks - a single bit is used to indicate whether the area is reserved or free, and the remaining least significant bits are used to record the length of the area (in blocks). By packing this information into a single word we have limited the maximum sized block that can be allocated to 2(sizeof(BlockSizeType)-1) bytes. But for small embedded systems where BlockSizeType is typically a 16-bit integer, this should be more than adequate.

typedef uint16    BlockSizeType;  // size of header

Free areas are linked in a doubly linked free list. Two pointers are required to accomplish this - prev and next. The effect of the extra pointer over a singly linked list is to increase the minimum block size. However, since the pointers occupy space in the pool that would otherwise be unused, this does not require any additional space.

The Block structure contains a union. The union overlays an instance of the link structure, which is used to hold pointers to the next and previous elements of the free list when the block is free with the userPart of the block which is the space given to the user when the block is reserved.

In this implementation the size of a block (in bytes) is required to be at least sizeof(BlockSizeType) + 2 * sizeof(Block*). On a 16-bit machine the block size is typically 8 or 16 bytes. For an 8-bit machine the minimum size is 8 bytes. A number of divisions in the code are implemented as shifts, so the block size must be a power of 2.

Since the storage pool starts out empty, the free list contains initially just a single entry that treats the entire pool as a one large free area.

The manager has the three standard public functions malloc, free and realloc and support functions InitPool, MergeBlocks and SplitBlock. In addition, two private functions Unlink and InsertAfter implement common linked-list manipulations.

20 //-----------------------------------------------------------------------------
21 // Compute the number of blocks that will fit in the memory area defined.
22 // Allocate the pool of blocks. Note this includes the sentinel area that is 
23 // attached to the end and is always only one block. The first entry in the 
24 // free list pool is set to include all available blocks. The sentinel is 
25 // initialised to point back to the start of the pool.
26 //
27 static void InitPool(void)
28 {
29     Block* head;
30     char* heapStart = (char*)&__heap_start;
31     char* heapEnd = (char*)&__heap_end;
32 
33     mPool = (Block*)heapStart;
34     mNumberOfBlocks = (uint16)(((heapEnd - (char*)mPool) >> SHIFT_VALUE) - 1);
35     mSentinel = mPool + mNumberOfBlocks;
36 
37     mSentinel->blocks = RESERVED;           // now cannot be used
38     mSentinel->prev = mSentinel;
39     mSentinel->next = mSentinel;
40 
41     // Entire pool is initially a single unallocated area.
42     //
43     head = &mPool[0];
44     head->blocks = mNumberOfBlocks;         // initially all of free memeory
45     InsertAfter(head);                      // link the sentinel
46 }

Listing 2. List Initialisation.

The initialisation function InitPool shown in Listing 2 must initialise the three static local variables mNumberOfBlocks, mPool and mSentinel. The pointer mPool is initialised to point to the beginning of the heap area (line 33). The beginning and end of this area are defined by the heapStart and heapEnd pointers, which are initialised from the gcc compiler constants __heap_start and __heap_end respectively (lines 30 and 31). Most compilers have similar names and where they haven't, you can define them as simple constants.

The number of blocks is effectively set to N / sizeof(Block), where N is the desired size of the storage pool in bytes (34). The mPool variable can therefore be used as an array of Blocks with mNumberOfBlocks entries within the heap area.

The last block in the array is used as the sentinel for the doubly linked free list. The static variable mSentinel is initialised as a reference to this last block (35). Since the sentinel block is never to be allocated, it is marked as reserved (37). The next and prev pointers of the sentinel are both initialised to point at the sentinel itself (38, 39). The free list is therefore initially empty.

The entire pool is initially a single unallocated area. We represent this area by the first block in the pool, mPool[0], and assign the head pointer to this location (43). The block is marked free (by default) and the blocks field is set to mNumberOfBlocks (44). Then, the block is inserted into the doubly linked free list by calling the private function InsertAfter (45). The InsertAfter function can do the insertion in constant time. The worst-case running time of InitPool is O(1).

Free Function

Listing 3 gives the code for the free member function. This takes as its one argument a pointer to the user part of the memory area to be released.

47 //-----------------------------------------------------------------------------
48 // Return this memory block to the free list.
49 //
50 void free(void* pntr)
51 {
52     // Check for errors.
53     //
54     STATE_REG state = SaveState();
55     Block* top = mPool + mNumberOfBlocks;
56     Block* baseArea = TO_BLOCK_PTR(pntr);   // convert to a block address
57     if ( (pntr == NULL) || (baseArea > mPool) || (baseArea >= top) ) return;
58 
59     // Very simple - we insert the area to be freed at the start
60     // of the free list. This runs in constant time. Since the free
61     // list is not kept sorted, there is less of a tendency for small
62     // areas to accumulate at the head of the free list.
63     //
64     baseArea->blocks &= ~RESERVED;          // mark as now free
65     InsertAfter(baseArea);
66     RestoreState(state);
67 }

Listing 3. Free Allocated Memory.

The implementation shown is both simple and fast. The free function makes no attempt to combine adjacent free areas. And since the free list is not sorted, it simply inserts the area to be freed at the front of the free list by inserting it into the free list immediately after the sentinel.

#define TO_BLOCK_PTR(p)     (Block*)((BlockSizeType*)p - 1)

The TO_BLOCK_POINTER macro converts the supplied address to a Block pointer (line 56). After checking for the do-nothing case of a NULL pointer and a check that the pointer is at least somewhere within the heap (57), the block is marked free (64) and inserted at the head of the free list (65) by calling the local function InsertAfter.

Because the free list is a doubly linked list, the insertion can be done in constant time. Therefore, the worst-case running time of the free function is O(1).

Malloc Function

The malloc function is shown in Listing 4. This takes a single integer argument that specifies the size in bytes of the area to be allocated. malloc returns a pointer to the storage if there is sufficient space left in the pool to accommodate the request. Otherwise it returns a NULL.

68 //-----------------------------------------------------------------------------
69 // Return a pointer to an area of memory that is at least the right size.
70 //
71 void* malloc(size_t size)
72 {
73     // Check for errors.
74     //
75     if (size < 0) return NULL;
76 
77     STATE_REG state = SaveState();
78 
79     // Compute the number of blocks to satisfy the request.
80     //
81     uint16 reqBlocks = (size + sizeof(BlockSizeType) + sizeof(Block) - 1) >> SHIFT_VALUE;
82     Block* block;
83 
84     // Initialise the heap for a first-time use.
85     //
86     if (mSentinel == NULL)
87     {
88         InitPool();
89     }
90     
91     // Traverse the free list looking for the first block that will fit the
92     // request. This is a "first-fit" strategy.
93     //
94     for (block = mSentinel->next; block != mSentinel; block = block->next)
95     {
96         block = MergeBlocks(block);
97 
98         // Is this free area (which could have just expanded somewhat)
99         // large enough to satisfy the request.
100         //
101         if (block->blocks >= reqBlocks)
102         {
103             break;
104         }
105     }
106 
107     // If we are pointing at the sentinel then all blocks are allocated.
108     //
109     if (block == mSentinel)
110     {
111         RestoreState(state);
112         return NULL;
113     }
114 
115     // If this free area is larger than required, it is split in two. The
116     // size of the first area is set to that required and the second area
117     // to the blocks remaining. The second area is then inserted into 
118     // the free list.
119     //
120     if (block->blocks > reqBlocks)
121     {
122         SplitBlock(block, reqBlocks);
123     }
124 
125     // Unlink the now correctly sized area from the free list and mark it 
126     // as reserved.
127     //
128     Unlink(block);
129     block->blocks |= RESERVED;
130     RestoreState(state);
131     return block->userPart;
132 }

Listing 4. Allocating Memory.

The function must traverse the free list in order to find an area large enough to match the request. The allocation strategy chosen is first-fit, i.e. storage is allocated in the first area that is large enough.

Recall that the free function given in the preceding section does not combine adjacent free areas. Therefore it is the responsibility of the malloc function to do this. The algorithm is this: Adjacent free areas are recombined while the free list is traversed in search of a free area large enough to accommodate the allocation request.

The function begins by computing the number of blocks required using the integer formula

Blocks = (requestedbytes + sizeof(BlockSizeType) + sizeof(Block) -  1) / sizeof(Block)

i.e. we need enough space for the requested number of bytes plus a Header (line 81). Note that in the code we use a shift operation to save the necessity for division.

The malloc function then traverses the free list in search of an area that is large enough (lines 94-105). As each area in the free list is visited, the area that immediately follows that area in memory is examined to see if it too is free and therefore if it can be combined with the given area (private function MergeBlocks shown in Listing 5).

133 //-----------------------------------------------------------------------------
134 // As each area is examined for a fit, we also examine the following area. 
135 // If it is free then it must also be on the free list. Being a doubly-linked 
136 // list, we can combine these two areas in constant time. If an area is 
137 // combined, the procedure then looks again at the following area, thus 
138 // repeatedly combining areas until a reserved area is found. In terminal 
139 // cases this will be the sentinel block.
140 //
141 static Block* MergeBlocks(Block* block)
142 {
143     while (TRUE)
144     {
145         Block* successor = block + block->blocks;   // point to next area
146         if (successor->blocks & RESERVED)           // done if reserved
147             return block;
148         Unlink(successor);
149         block->blocks += successor->blocks;         // add in its blocks
150     }
151 }

Listing 5. The MergeBlocks Function.

Since the size of an area is recorded in its header, we can easily find the area that follows it in memory as shown in Figure 1. And since the status of an area (free or reserved) is recorded in the area itself, we can determine whether the following area is free.

Figure 1. Double Link Structure.

If the area that follows a given area is indeed free, then it must in be the free list! And because the free list is doubly linked, we can easily extract that area from the free list in constant time. This is what the Unlink function does on line 148. When combining a given area with the area that follows it in memory, we obtain an area the size of which is equal to the sum of the sizes of the areas that were combined (149). After the combining it is possible that the area that follows the new larger area is also free. Therefore we need to repeat the combining process again. This is what the loop in MergeBlocks does - it keeps combining adjacent free areas until the area that follows is a reserved area.

The search for a free area to satisfy the malloc request terminates at the first area that is large enough (line 103). Notice that if the entire free list is traversed, all the adjacent areas that can be merged have been merged. If the free list is traversed and an area is not found that is large enough to satisfy the request, then the request cannot be satisfied because there is insufficient contiguous memory to do so. In this case the function returns a NULL pointer that the calling program should naturally always test for (112).

When the free area is larger than needed, it is split in two in the SplitBlock function shown in Listing 5.

152 //-----------------------------------------------------------------------------
153 //
154 static void SplitBlock(Block* block, uint16 reqBlocks)
155 {
156     Block* newBlock = block + reqBlocks;            // create a remainder area
157     newBlock->blocks = block->blocks - reqBlocks;   // set its size and mark as free
158     block->blocks = reqBlocks;                      // set us to requested size
159     InsertAfter(newBlock);                          // stitch remainder into free list
160 }

Listing 5. The SplitBlock Function.

The size of the first area is set to the number of blocks requested and the size of the second area equal to the number of blocks that remain. The second area is then inserted into the free list by calling the InsertAfter function (line 159).

In all cases, by the time execution reaches line 128, the free area is the correct size modulo the block size. The area is unlinked from the free list and marked reserved (129). Finally, a pointer to the userPart of the area is returned (line 131).

At first glance it might seem that the nested loops would result in a worst-case running time of O(n^2) where n is the number of blocks in the storage pool. However, the outer loop traverses the free list while every complete iteration of the inner loop removes at most one element from the free list. As a result, the worst-case running time for the nested loops is actually only O(n).

The nice thing about this approach is that the asymptotic running time of the malloc function would be O(n) even if it did not combine adjacent free areas. In particular, this asymptotic bound is the same as for a singly linked storage pool. On the other hand, the running time of the free function is O(1) for the doubly-linked pool which is certainly better than the O(n) worst-case running time of the singly-linked pool. Finally, since the free list is not kept sorted, there is not the same tendency for short areas to accumulate at the head of the free list as there is in the singly linked pool. In fact, over time the free list will take on a distribution that matches the requests.

Realloc Function

The implementation of the standard function realloc to extend the size of an existing area without losing the current contents is shown in Listing 6.

161 //-----------------------------------------------------------------------------
162 // Re-allocate the buffer to a new area the requested size. If possible the
163 // existing area is simply expanded. Otherwise a new area is allocated and
164 // the current contents copied.
165 //
166 void* realloc(void* pntr, uint16 newSize)
167 {
168     // Check for errors.
169     //
170     if ( (pntr == NULL ) || (newSize < 0) ) return pntr;
171 
172     STATE_REG state = SaveState();
173     Block* block = TO_BLOCK_PTR(pntr);   // convert user to block address
174     uint16 reqBlocks = (newSize + sizeof(BlockSizeType) + sizeof(Block) - 1) >> SHIFT_VALUE;
175 
176     block->blocks &= ~RESERVED;          // expose the size
177 
178     // The fastest option is to merge this block with any free blocks
179     // that are contiguous with this block.
180     //
181     block = MergeBlocks(block);
182 
183     if (block->blocks > reqBlocks)
184     {
185         // The merge produced a larger block than required, so split it
186         // into two blocks. This also takes care of the case where the
187         // new size is less than the old.
188         //
189         SplitBlock(block, reqBlocks);
190     }
191     else if (block->blocks < reqBlocks)
192     {
193         // Could not expand this block. Must attempt to allocate
194         // a new one the correct size and copy the current contents.
195         //
196         uint16 oldSize = BLOCKS_TO_BYTES(block->blocks);
197         if ((block = (Block*)malloc(newSize)))
198         {
199             // A new block large enough has been allocated. Copy the
200             // existing data and then discard the old block.
201             //
202             block = TO_BLOCK_PTR(block);
203             memcpy(block->userPart, (Block*)(TO_BLOCK_PTR(pntr))->userPart, oldSize);
204             free(pntr);          // done with the old
205         }
206         else
207         {
208             // Cannot re-allocate this block. Note the old pointer
209             // is still valid.
210             //
211             RestoreState(state);
212             return NULL;        // no valid options
213         }
214     }
215 
216     block->blocks |= RESERVED;
217     RestoreState(state);
218     return block->userPart;
219 }

Listing 6. The Realloc function.

It first computes the number of blocks required to satisfy the new requirement (line 174). It then checks if the immediately contiguous block(s) are free by calling MergeBlocks. What happens next depends on the result of this call. If a merge did occur and the block size now exceeds the requirement, the now too large block is split in two parts, with the unused part being put on the free list (189). If the block size is unchanged or is still less than the requirement a new block of the required size is requested via malloc ( line 197). If this is successful the data in the old block is copied to the new area (203) and the old area returned to the free list (204). Otherwise the request cannot be granted and the function returns a NULL (212).

Support Functions

A number of support functions provide standard doubly linked list operations.

220 //-----------------------------------------------------------------------------
221 //
222 static void InsertAfter(Block* block)
223 {
224     Block* p = mSentinel->next;
225     mSentinel->next = block;
226     block->prev = mSentinel;
227     block->next = p;
228     p->prev = block;
229 }
230 //-----------------------------------------------------------------------------
231 //
232 static void Unlink(Block* ptr)
233 {
234     ptr->prev->next = ptr->next;
235     ptr->next->prev = ptr->prev;
236 }

AVRgcc Implementation

The current memory allocator in AVRgcc does not set aside the entire area above the bss area up to the bottom of the system stack area, but will expand up to this area if necessary. In my opinion, the heap may as well use all this area from the beginning, with the __heap_end constant defining the bottom of (and consequently the size of) the system stack area.

The __heap_start constant is configured by default by the compiler to begin immediately after the end of the bss area. The __heap_end constant is not set to anything by default, so it is necessary to define and include it in the link line. A typical makefile entry setting the top of heap for an ATMega128 at 0x107f would be:

LDFLAGS += -Wl,--defsym=__heap_end=0x80107f

This gives a system stack size of a very comfortable 129 bytes.

A look at the Map file for a typical project shows the flash usage for the original AVRgcc malloc/free/realloc is 994 locations. The DoubleLinkPool version (using -Os) uses 1006 locations, so no great price to pay.

MultiThreading

The functions are, of course, not re-entrant. I have been using them with a number of preemptive operating systems and this explains the SaveState and RestoreState functions. These are inline calls to save and restore the interrupt bit on the respective processors, necessary to prevent a process switch while allocating memory. This will be adequate in many cases where interrupt response time is not critical.

However, one of the problems with memory allocators is their running time is indeterminate, and just turning off interrupts can be a problem in situations where interrupt response time is critical. In these cases a solution I have used is to protect the allocator by making it a serial resource protected by a semaphore. The semaphore ceiling priority is set to the highest possible setting, and no other process is allowed to use this priority. For example, if the highest RTOS priority is 10, then this is given to the allocator semaphore. The highest priority now available for other processes is 9. In a situation where you don't have full control of setting process priorities, assign the highest to the allocator as above, and reset the highest value available to processes to one less.

Any thread running in the allocator code can therefore be interrupted to restore rapid interrupt response times, but the allocator function itself cannot be interrupted by another process and will execute as fast as possible (I am assuming no interrupt routine uses the allocator!).

Conclusion

The DoubleLinkedPool heap memory manager described here does, in my experience, perform better than versions provided with a number of C compilers, particularly when under pressure. It continues to function, in a general sense, when a large percentage of the heap area is allocated and at high levels of fragmentation.

Here is a version of DoubleLinkPool for the AVR processors available for download. The public functions have the same signature as the library versions, so no specific header is required. The included Types header is my own ideosyncratic set of type names and can be easily altered. I have not included a test program as most testers will have their own set of torture tools for testing.