While thinking of ways to optimise memPools I started to write down how
it's done today and what it could look like. While doing so I came to one
solution, and as it was quite simple to convert my notes into a general
description, I thought I would throw it to squid-dev for review.
I plan to implement this, and before I screw it up, I hope to hear your
comments and useful suggestions on the matter. Maybe you have much better
ideas on how to do it.
Current MemPools implementation:
struct MemPool {
    char *descr;
    size_t objsize;
    struct Stack {
        int capacity;     /* current size of freelist */
        int freeCount;    /* objects on the freelist */
        void **freeList;  /* array of pointers to xcalloc'ed objects */
    } stack;
};
freeList:
Array[0..x] of: {
0: -> xmalloc(objsize)
1: -> xmalloc(objsize)
2: -> xmalloc(objsize)
..
x: -> xmalloc(objsize)
}
If the freeList gets full, it is realloc()ed to add 16 more pointers, and
Stack capacity is updated.
Upon Pool creation the freeList array is NULL. Upon the first Free to the
pool, the freeList array is filled from the start with pointers to freed
memory blocks. Upon Alloc, if the freelist is empty, an object is
malloc()ed from the system and given to the caller. Pooling cuts in only
at the time of Free.
Memory objects themselves are not clustered; they are individually
malloc()ed from the system, and memory fragmentation is entirely a matter
of the libmalloc implementation.
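In rough C the current scheme boils down to something like this (only a
sketch: the Stack fields are flattened into the pool struct for brevity,
and xcalloc/xrealloc are stand-ins for Squid's malloc-or-die wrappers):

    #include <stdlib.h>

    static void *xcalloc(size_t n, size_t sz)
    {
        void *p = calloc(n, sz);
        if (!p) abort();
        return p;
    }

    static void *xrealloc(void *q, size_t sz)
    {
        void *p = realloc(q, sz);
        if (!p) abort();
        return p;
    }

    typedef struct {
        const char *descr;
        size_t objsize;
        int capacity;      /* current size of freelist */
        int freeCount;     /* objects on the freelist */
        void **freeList;   /* array of pointers to freed objects */
    } MemPool;

    /* Alloc: pop from the freelist if possible, otherwise go straight
     * to libmalloc; pooling cuts in only at the time of Free. */
    static void *memPoolAlloc(MemPool *p)
    {
        if (p->freeCount > 0)
            return p->freeList[--p->freeCount];
        return xcalloc(1, p->objsize);
    }

    /* Free: push onto the freelist, realloc()ing it 16 pointers at a
     * time as it fills up. */
    static void memPoolFree(MemPool *p, void *obj)
    {
        if (p->freeCount == p->capacity) {
            p->capacity += 16;
            p->freeList = xrealloc(p->freeList, p->capacity * sizeof(void *));
        }
        p->freeList[p->freeCount++] = obj;
    }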
Pool initial state: empty,
Object locality: pretty random eventually
Fragmentation: uncontrollable (read: high)
Proposed new method:
struct MemPool {
    char *descr;
    size_t objsize;
    int capacity;             /* all chunks have the same capacity */
    struct Chunk *chunk;
};

struct Chunk {
    int usedCount;            /* number of objects handed out to callers */
    void **freeList;          /* stack of pointers to free objects within objCache */
    char *objCache;           /* plain array of objsize'd objects */
    struct Chunk *nextChunk;
};
objCache:
objArray[0..x] of {
objsize bytes each
} = xmalloc(x * objsize)
freeList:
Array[0..x] of * {
0: -> objArray[0]
1: -> objArray[1]
2: -> objArray[2]
..
x: -> objArray[x]
} = xmalloc(x * sizeof(void *))
Upon Pool creation, the first Chunk is initialised: objCache is
xmalloc()ed from the OS, sized so that ideally full multiples of the OS
VM_PAGE are used up; divided by objsize, this gives the count of objects
per objCache, "x". Then the freeList is initialised as if all sequential
objects in objCache had been Freed to it.
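A sketch of that initialisation in C might look as follows. It assumes
capacity has already been derived from objsize and VM_PAGE as described;
the freeCount field, which just keeps capacity minus usedCount explicit,
and the xmalloc wrapper are assumptions of the sketch, not part of the
proposal:

    #include <stdlib.h>

    static void *xmalloc(size_t sz)
    {
        void *p = malloc(sz);
        if (!p) abort();
        return p;
    }

    typedef struct Chunk {
        int usedCount;        /* objects handed out to callers */
        int freeCount;        /* pointers left on freeList (capacity - usedCount) */
        void **freeList;      /* stack of pointers into objCache */
        char *objCache;       /* plain array of objsize'd objects */
        struct Chunk *nextChunk;
    } Chunk;

    typedef struct {
        const char *descr;
        size_t objsize;
        int capacity;         /* objects per chunk, same for all chunks */
        Chunk *chunk;         /* head of the chunk list */
    } MemPool;

    /* Create a chunk, fill its freeList as if every object in objCache
     * had just been Freed to it, then link it into the pool. */
    static Chunk *chunkCreate(MemPool *p)
    {
        Chunk *c = xmalloc(sizeof(Chunk));
        int i;

        c->usedCount = 0;
        c->freeCount = p->capacity;
        c->objCache = xmalloc(p->capacity * p->objsize);
        c->freeList = xmalloc(p->capacity * sizeof(void *));
        for (i = 0; i < p->capacity; i++)
            c->freeList[i] = c->objCache + i * p->objsize;
        c->nextChunk = p->chunk;
        p->chunk = c;
        return c;
    }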
Upon PoolAlloc, the first Chunk is checked to see whether its usedCount
has reached capacity, meaning this chunk has been used up in full. If so,
a new Chunk is created, initialised and linked to the first Chunk.
Upon a typical PoolAlloc, the Chunks in the linked list are walked and a
Chunk with a nonzero count of free objects in it is found. Then the
freeList is popped by one pointer to an objCache array item, thus
returning a free object to the caller.
The sequence of objects within objCache and the freeList is not related
in any way; it is only made sure that the object count in objCache and
the freeList size are the same. This allows the freeList to be used as a
stack and objCache as a random-access array.
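Continuing the sketch above, a typical PoolAlloc then reduces to a list
walk plus a stack pop:

    /* Alloc: find the first chunk with free objects; only when every
     * chunk is used up in full is a new chunk created and linked in. */
    static void *memPoolAlloc(MemPool *p)
    {
        Chunk *c;

        for (c = p->chunk; c != NULL; c = c->nextChunk)
            if (c->freeCount > 0)
                break;
        if (c == NULL)
            c = chunkCreate(p);
        c->usedCount++;
        return c->freeList[--c->freeCount];  /* pop the freeList stack */
    }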
Upon PoolFree, the object pointer is noted, and the Chunk is searched for
in which the given freeable pointer falls within the bounds of the
objCache array. This chunk is where the freed pointer is to be returned.
The pointer is appended to the freeList and the object cleared.
When a Chunk's usedCount drops back to zero, meaning that all objects in
the given Chunk's objCache have been returned to the freeList, the Pool
library may decide to destroy the given Chunk and return its alloced
memory to the system.
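Again only a sketch, reusing the structures above; releasing a chunk
eagerly the moment its usedCount drops to zero, and always keeping the
head chunk so the pool never goes empty, are policy assumptions of the
sketch, not decisions the proposal makes:

    #include <assert.h>
    #include <string.h>

    /* Free: find the chunk whose objCache bounds contain the pointer,
     * clear the object, push it back onto that chunk's freeList, and
     * possibly hand the whole chunk back to the system. */
    static void memPoolFree(MemPool *p, void *obj)
    {
        Chunk *c, **link;

        for (link = &p->chunk; (c = *link) != NULL; link = &c->nextChunk)
            if ((char *)obj >= c->objCache &&
                (char *)obj < c->objCache + p->capacity * p->objsize)
                break;              /* obj lies within this chunk's objCache */
        assert(c != NULL);          /* else: a pointer we never handed out */
        memset(obj, 0, p->objsize); /* object cleared on Free */
        c->freeList[c->freeCount++] = obj;
        c->usedCount--;
        if (c->usedCount == 0 && c != p->chunk) {
            *link = c->nextChunk;   /* unlink; return memory to the system */
            free(c->freeList);
            free(c->objCache);
            free(c);
        }
    }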
Constants.
It seems most OS-friendly to preallocate objCache in multiples of
VM_PAGE. Ideally, even if very many pools are used, alloced and returned
to the system, this should allow the system to fill the holes. Also, an
mmap-based malloc can be used easily.
As MemPools should be able to work with objects of different sizes, the
number of objects in the freeList and objCache must vary. For very small
sizes the number of objects in objCache gets high, although memory usage
from the OS may stay low. On the other hand, for large objects, like 16K
or 64K buffers, the freeList may be very small, yet the memory used in a
single chunk can become very large. As a Chunk cannot be destroyed until
all objects are returned to it, this may contribute to fragmentation of
unused memory.
For this reason it seems reasonable to limit
1) the minimal size of the freeList, and
2) the maximum size of the objCache
at the same time, as in the sketch below.
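One possible sizing rule that honours both limits at once; the three
constants are illustrative assumptions only, and note that for very large
objects the byte cap wins, so such pools do fall below the freeList
minimum, exactly the tension described above:

    #define VM_PAGE       4096          /* assumed OS page size */
    #define MIN_FREELIST    32          /* assumed minimum objects per chunk */
    #define MAX_OBJCACHE  (256 * 1024)  /* assumed maximum objCache bytes */

    /* Objects per chunk: enough whole pages for MIN_FREELIST objects,
     * but never more than MAX_OBJCACHE bytes of objCache.
     * Assumes objsize <= MAX_OBJCACHE. */
    static int chunkCapacity(size_t objsize)
    {
        size_t bytes = MIN_FREELIST * objsize;

        bytes = ((bytes + VM_PAGE - 1) / VM_PAGE) * VM_PAGE; /* page multiple */
        if (bytes > MAX_OBJCACHE)
            bytes = MAX_OBJCACHE;       /* large objects: the byte cap wins */
        return (int)(bytes / objsize);
    }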
It is difficult to pick the best objCache size. For the best
fragmentation reduction, this size should ideally be the same for all
Pool allocation sizes.
Benefits.
In theory this pooling should take a lot of load off libmalloc, by
precisely and knowingly clustering small allocs into close proximity.
Also, as the first match in the freelist is returned, the most used
Chunks tend to be the first chunks, meaning that the deepest nested
chunks have a much higher probability of becoming free and being
returned to the system.
By clustering short-lived objects, one Chunk becomes a hotspot of
activity.
It is also possible to request a number of consecutive objects from this
pool, as in calloc(count, size), if that has any usefulness.
Initial state: 1 full Chunk,
Object locality: Chunk is 1 object.
Fragmentation: within Chunk uncontrollable, reduced systemwide
Problems.
MemPoolShrink is nearly impossible to implement: memory can be returned
only in whole Chunks, and any Chunk is locked as long as a single object
in it is in use. The current implementation expects that MemPoolShrink
is guaranteed.
Slightly higher overhead than with the previous memPools, due to the
linked lists and preallocation.
Pools for large objects may need huge chunks and thus can result in quite
large fragmentation of free space.
Maybe we should make MemPool and Chunk a single struct. Then any Chunk
is actually a full-blown MemPool.
------------------------------------
Andres Kroonmaa <andre@online.ee>
Delfi Online
Tel: 6501 731, Fax: 6501 708
Pärnu mnt. 158, Tallinn,
11317 Estonia