Heapmonitor Documentation

Introduction

The Heapmonitor is a facility delivering insight into the memory distribution of a Python program. It can introspect memory consumption of certain classes and objects. Facilities are provided to track and size individual objects or all instances of certain classes. Tracked objects are sized recursively to provide an overview of memory distribution between the different tracked objects.

Usage

Let’s start with a simple example. Suppose you have this module:

>>> class Employee:
...    pass
...
>>> class Factory:
...    pass
...
>>> def create_factory():
...    factory = Factory()
...    factory.name = "Assembly Line Unlimited"
...    factory.employees = []
...    return factory
...
>>> def populate_factory(factory):
...    for x in range(1000):
...        worker = Employee()
...        worker.assigned = factory.name
...        factory.employees.append(worker)
...
>>> factory = create_factory()
>>> populate_factory(factory)

The basic tools of the Heapmonitor are tracking objects or classes, taking snapshots, and printing or dumping statistics. The first step is to decide what to track. Then spots of interest for snapshot creation have to be identified. Finally, the gathered data can be printed or saved:

>>> factory = create_factory()
>>> from pympler import heapmonitor
>>> heapmonitor.track_object(factory)
>>> heapmonitor.track_class(Employee)
>>> heapmonitor.create_snapshot()
>>> populate_factory(factory)
>>> heapmonitor.create_snapshot()
>>> heapmonitor.print_stats(detailed=0)
---- SUMMARY ------------------------------------------------------------------
                                         active      1.22 MB      average   pct
  Factory                                     1    344     B    344     B    0%
  __main__.Employee                           0      0     B      0     B    0%
                                         active      1.42 MB      average   pct
  Factory                                     1      4.75 KB      4.75 KB    0%
  __main__.Employee                        1000    195.38 KB    200     B   13%
-------------------------------------------------------------------------------

Basic Functionality

Instance Tracking

The purpose of instance tracking is to observe the size and lifetime of an object of interest. Creation and destruction timestamps are recorded and the size of the object is sampled when taking a snapshot.

To track the size of an individual object:

from pympler import heapmonitor
obj = MyObject()
heapmonitor.track_object(obj)
pympler.heapmonitor.track_object(instance, name=None, resolution_level=0, keep=0, trace=0)
Track object ‘instance’ and sample size and lifetime information. Not all objects can be tracked; trackable objects are class instances and other objects that can be weakly referenced. When an object cannot be tracked, a TypeError is raised. The ‘resolution_level’ is the recursion depth up to which referents are sized individually. Resolution level 0 (default) treats the object as an opaque entity, 1 sizes all direct referents individually, 2 also sizes the referents of the referents and so forth. To prevent the object’s deletion a (strong) reference can be held with ‘keep’.
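
Whether an object can be tracked comes down to whether it supports weak references, which can be checked with the standard weakref module. The following is a minimal sketch of that check, not the Heapmonitor's own code; the helper name is_trackable is hypothetical.

```python
import weakref


class Employee:
    pass


def is_trackable(obj):
    """Return True if obj can be weakly referenced (hypothetical helper)."""
    try:
        weakref.ref(obj)
    except TypeError:
        # Raised for objects that do not support weak references.
        return False
    return True


worker_ok = is_trackable(Employee())   # class instances support weak references
number_ok = is_trackable(42)           # plain ints do not
list_ok = is_trackable([1, 2, 3])      # neither do built-in lists
```

Built-in containers such as lists can still be observed indirectly by tracking a class instance that refers to them.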

Class Tracking

Most of the time, it’s cumbersome to manually track individual instances. All instances of a class can automatically be tracked with track_class:

heapmonitor.track_class(MyClass)

All instances of MyClass (or a class that inherits from MyClass) created hereafter are tracked.

pympler.heapmonitor.track_class(cls, name=None, resolution_level=0, keep=0, trace=0)
Track all objects of the class cls. Objects of that type that already exist are not tracked. If track_class is called for a class already tracked, the tracking parameters are modified. Instantiation traces can be generated by setting trace to True. A constructor is injected to begin instance tracking on creation of the object. The constructor calls track_object internally.
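
The constructor injection described above can be pictured as wrapping the class's __init__ method. This is only an illustrative sketch under that assumption; the names track_class_sketch and _tracked are hypothetical, and the real implementation may differ.

```python
import weakref

_tracked = []  # weak references to tracked instances (illustrative registry)


def track_class_sketch(cls):
    """Wrap cls.__init__ so that new instances are registered on creation."""
    original_init = cls.__init__

    def tracking_init(self, *args, **kwargs):
        _tracked.append(weakref.ref(self))   # register without keeping it alive
        original_init(self, *args, **kwargs)

    cls.__init__ = tracking_init


class Employee:
    def __init__(self, name):
        self.name = name


before = Employee("created before tracking")  # already exists: not registered
track_class_sketch(Employee)
after = Employee("created after tracking")    # registered by the injected constructor

live = [ref() for ref in _tracked if ref() is not None]
```

Only the instance created after the call ends up in the registry, which mirrors the statement that objects that already exist are not tracked.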

Tracked Object Snapshot

Tracking alone will not reveal the size of an object. The idea of the Heapmonitor is to sample the sizes of all tracked objects at configurable instants in time. The create_snapshot function computes the size of all tracked objects:

heapmonitor.create_snapshot('Before juggling with tracked objects')
...
heapmonitor.create_snapshot('Juggling aftermath')

With this information, the distribution of the allocated memory can be apportioned to tracked classes and instances.

pympler.heapmonitor.create_snapshot(description='')
Collect current per instance statistics. Save total amount of memory consumption reported by asizeof and by the operating system. The overhead of the Heapmonitor structure is also computed.
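
Conceptually, a snapshot is a timestamped record of the sizes of all tracked objects that are still alive. The sketch below illustrates the idea with the standard library only; rough_size is a hypothetical, non-recursive stand-in for asizeof, and the registry layout is an assumption.

```python
import sys
import time
import weakref


class Factory:
    def __init__(self):
        self.employees = []


tracked = {}     # name -> weak reference to a tracked object
snapshots = []   # (timestamp, description, {name: size in bytes})


def rough_size(obj):
    """Shallow size of obj plus its attribute values (assumption: not recursive)."""
    return sys.getsizeof(obj) + sum(sys.getsizeof(v) for v in vars(obj).values())


def create_snapshot_sketch(description=''):
    sizes = {}
    for name, ref in tracked.items():
        obj = ref()
        if obj is not None:          # skip objects that were already destroyed
            sizes[name] = rough_size(obj)
    snapshots.append((time.time(), description, sizes))


factory = Factory()
tracked['factory'] = weakref.ref(factory)

create_snapshot_sketch('before')
factory.employees.extend(object() for _ in range(1000))
create_snapshot_sketch('after')

growth = snapshots[1][2]['factory'] - snapshots[0][2]['factory']
```

Comparing two snapshots shows how much a tracked object grew between the two instants.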

Advanced Functionality

Per-referent Sizing

It may not be enough to know the total memory consumption of an object. Detailed per-referent statistics can be gathered recursively up to a given resolution level. Resolution level 1 means that all direct referents of an object will be sized. Level 2 also includes the referents of the direct referents, and so forth. Note that the member variables of an instance are typically stored in a dictionary and are therefore second order referents.

heapmonitor.track_object(obj, resolution_level=2)
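
The meaning of the resolution level can be illustrated with gc.get_referents from the standard library. This is a rough sketch, not the sizer used by the Heapmonitor; sized_tree is a hypothetical helper that reports shallow sizes only.

```python
import gc
import sys


def sized_tree(obj, resolution_level=0):
    """Return (type name, shallow size, children) down to the given depth."""
    children = []
    if resolution_level > 0:
        for referent in gc.get_referents(obj):
            if isinstance(referent, type):
                continue  # skip class objects, they are not instance data
            children.append(sized_tree(referent, resolution_level - 1))
    return (type(obj).__name__, sys.getsizeof(obj), children)


class Factory:
    pass


factory = Factory()
factory.name = "Assembly Line Unlimited"
factory.employees = []

opaque = sized_tree(factory, resolution_level=0)    # no referents listed
detailed = sized_tree(factory, resolution_level=2)  # two levels of referents
```

The exact shape of the tree depends on the interpreter version: some CPython releases report the attribute dictionary as the only direct referent, others report the attribute values directly.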

The resolution level can be changed if the object is already tracked:

heapmonitor.track_change(obj, resolution_level=2)

The new setting will become effective for the next snapshot. This can help to raise the level of detail for a specific instance of a tracked class without logging all of the class’s instances at a high verbosity level. The resolution level can also be set for all instances of a class:

heapmonitor.track_class(MyObject, resolution_level=1)

Warning

Per-referent sizing is very memory and computationally intensive. Meta-data must be recorded for each referent of a tracked object, which can easily quadruple the memory footprint of the program. Handle with care and don’t use resolution levels that are too high, especially if set via track_class.

Instantiation traces

Sometimes it is not trivial to observe where an object was instantiated. The Heapmonitor can remember the instantiation stack trace for later evaluation.

heapmonitor.track_class(MyObject, trace=1)

This only works with tracked classes, and not with individual objects.
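
Recording an instantiation trace amounts to capturing the call stack inside the constructor. The following sketch shows the idea with the standard traceback module; the class Traced and its creation_stack attribute are hypothetical and not part of the Heapmonitor.

```python
import traceback


class Traced:
    """Records where each instance was created (illustrative only)."""

    def __init__(self):
        # Drop the last frame, which is this constructor itself.
        self.creation_stack = traceback.extract_stack()[:-1]


def build():
    return Traced()


obj = build()
origin = obj.creation_stack[-1]   # innermost caller of the constructor
origin_name = origin.name         # name of the function that created obj
```

Because the stack has to be captured in the constructor, this approach fits class tracking, where a constructor is injected, better than tracking of objects that already exist.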

Background Monitoring

The Heapmonitor can be configured to take periodic snapshots automatically. The following example will take 10 snapshots a second (approximately) until the program has exited or the periodic snapshots are stopped with stop_periodic_snapshots. Background monitoring also works if no object is tracked. In this mode, the Heapmonitor only records the total virtual memory associated with the program, which is useful for detecting memory usage that is transient or not associated with any tracked object.

heapmonitor.start_periodic_snapshots(interval=0.1)

Warning

Take care if you use automatic snapshots with tracked objects. The sizing of individual objects might be inconsistent when memory is allocated or freed while the snapshot is being taken.

pympler.heapmonitor.start_periodic_snapshots(interval=1.0)
Start a thread which takes snapshots periodically. The interval specifies the time in seconds the thread waits between taking snapshots. The thread is started as a daemon allowing the program to exit. If periodic snapshots are already active, the interval is updated.
pympler.heapmonitor.stop_periodic_snapshots()
Post a stop signal to the thread that takes the periodic snapshots. The function waits for the thread to terminate which can take some time depending on the configured interval.

Off-line Analysis

The more data is gathered by the Heapmonitor the more noise is produced on the console. The acquired Heapmonitor log data can also be saved to a file for off-line analysis:

heapmonitor.dump_stats('heap-profile.dat')

The MemStats class of the Heapmonitor provides means to evaluate the collected data. The API is inspired by the Stats class of the Python profiler. It is possible to sort the data based on user preferences, filter by class and limit the output noise to a manageable magnitude.

The following example reads the dumped data and prints the ten largest Node objects to the standard output:

from pympler.heapmonitor import MemStats

stats = MemStats()
stats.load_stats('heap-profile.dat')
stats.sort_stats('size').print_stats(limit=10, filter='Node')
class pympler.heapmonitor.MemStats(filename=None, stream=<open file '<stdout>', mode 'w' at 0xb7ec0068>, tracked=None, snapshots=None)

Presents the gathered memory statistics based on user preferences.

load_stats(fdump)
Load the data from a dump file. The argument fdump can be either a filename or an open file object that requires read access.
sort_stats(*args)

Sort the tracked objects according to the supplied criteria. The argument is a string identifying the basis of a sort (example: ‘size’ or ‘classname’). When more than one key is provided, then additional keys are used as secondary criteria when there is equality in all keys selected before them. For example, sort_stats(‘name’, ‘size’) will sort all the entries according to their class name, and resolve all ties (identical class names) by sorting by size. The criteria are fields in the tracked object instances. Results are stored in the self.sorted list which is used by MemStats.print_stats() and other methods. The fields available for sorting are:

‘classname’ : the name with which the class was registered
‘name’ : the classname
‘birth’ : creation timestamp
‘death’ : destruction timestamp
‘size’ : the maximum measured size of the object
‘tsize’ : the measured size during the largest snapshot
‘repr’ : string representation of the object

Note that sorts on size are in descending order (placing the most memory consuming items first), whereas sorts on name, repr, and creation time are in ascending order (alphabetical or chronological).

The function returns self to allow calling functions on the result:

stats.sort_stats('size').reverse_order().print_stats()
print_stats(filter=None, limit=1.0)
Write tracked objects to stdout. The output can be filtered and pruned. Only objects whose classname contains the substring supplied by the filter argument are printed. The output can be pruned by passing a limit value. If limit is a float smaller than one, only that fraction of the total tracked data is printed. If limit is bigger than one, that number of tracked objects is printed. Tracked objects are first filtered, and then pruned (if specified).
dump_stats(fdump, close=1)
Dump the logged data to a file. The argument fdump can be either a filename or an open file object that requires write access. close controls if the file is closed before leaving this method (the default behaviour).
reverse_order()
Reverse the order of the tracked instance index self.sorted.
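
The sorting and pruning rules described for sort_stats and print_stats can be illustrated on plain data. The sketch below is not the MemStats implementation; the Entry records and the helpers sort_entries and prune are hypothetical, and treating a fractional limit as a share of the entries is an assumption.

```python
from collections import namedtuple
from operator import attrgetter

Entry = namedtuple('Entry', 'name size')

entries = [
    Entry('Node', 300),
    Entry('File', 120),
    Entry('Node', 900),
    Entry('Dir', 50),
]


def sort_entries(items, *keys):
    """Sort by several keys; later keys only break ties of earlier ones."""
    result = list(items)
    # Apply keys from least to most significant; list.sort is stable.
    for key in reversed(keys):
        # Sizes sort in descending order, everything else ascending.
        result.sort(key=attrgetter(key), reverse=(key == 'size'))
    return result


def prune(items, limit=1.0):
    """A float below one keeps that fraction; a number above one keeps that many."""
    if limit < 1.0:
        return items[:int(len(items) * limit)]
    if limit > 1:
        return items[:int(limit)]
    return items


by_name_then_size = sort_entries(entries, 'name', 'size')
nodes = [e for e in by_name_then_size if 'Node' in e.name]   # filter first
top_half = prune(sort_entries(entries, 'size'), limit=0.5)   # then prune
```

Filtering before pruning matters: a limit of 10 applied after filtering yields the ten largest matching objects rather than the matches among the ten largest overall.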

HTML Statistics

The Heapmonitor data can also be emitted in HTML format together with a number of charts (needs python-matplotlib). HTML statistics can be emitted directly by specifying a file with the extension .html as the profiling output:

heapmonitor.dump_stats('heap-profile.html')

However, you can also reprocess a previously generated dump:

from pympler.heapmonitor import HtmlStats

stats = HtmlStats('heap-profile.dat')
stats.create_html('heap-profile.html')
class pympler.heapmonitor.HtmlStats(filename=None, stream=<open file '<stdout>', mode 'w' at 0xb7ec0068>, tracked=None, snapshots=None)

Output the Heapmonitor statistics as HTML pages and graphs.

create_html(fname, title='Heapmonitor Statistics')
Create HTML page fname and additional files in a directory derived from fname.

Tracking Garbage

Garbage occurs if objects refer to each other in a circular fashion. Such reference cycles cannot be freed by reference counting alone and must be collected by the garbage collector. While it is sometimes hard to avoid creating reference cycles, preventing such cycles saves garbage collection time and limits the lifetime of objects. Moreover, some objects cannot be collected by the garbage collector.

The Heapmonitor provides functions to analyze reference cycles of collectable objects. When the garbage collector is turned off, the garbage can be kept for debugging purposes:

from pympler import heapmonitor

heapmonitor.start_debug_garbage()

l = []
l.append(l) # produce cycle

heapmonitor.print_garbage_stats()
heapmonitor.end_debug_garbage()

Reference cycles can be visualized with graphviz. A graphviz input file is generated when visualize_ref_cycles is invoked:

from pympler import heapmonitor

heapmonitor.start_debug_garbage()

l = []
l.append(l) # produce cycle

heapmonitor.visualize_ref_cycles('leakgraph.txt')
heapmonitor.end_debug_garbage()

On Linux, the graph file can be turned into a PDF with the following commands:

dot -o leakgraph.dot leakgraph.txt
dot leakgraph.dot -Tps -o leakgraph.eps
epstopdf leakgraph.eps
pympler.heapmonitor.start_debug_garbage()
Turn off garbage collector to analyze collectable reference cycles.
pympler.heapmonitor.end_debug_garbage()
Turn garbage collection on and disable debug output.
pympler.heapmonitor.print_garbage_stats(fobj=<open file '<stdout>', mode 'w' at 0xb7ec0068>)
Print statistics related to garbage/leaks. This function collects the reported garbage. Therefore, subsequent invocations of print_garbage_stats will not report the same objects again.
pympler.heapmonitor.visualize_ref_cycles(fname)
Print reference cycles of collectable garbage to a file which can be processed by Graphviz. This function collects the reported garbage. Therefore, subsequent invocations of print_garbage_stats will not report the same objects again.
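
The mechanism behind keeping garbage for inspection can be reproduced with the standard gc module. The following sketch assumes the debugging functions rely on gc.DEBUG_SAVEALL or something similar; it shows the standard-library mechanism only and is not the Heapmonitor's code.

```python
import gc

gc.disable()                     # no automatic collection while debugging
gc.set_debug(gc.DEBUG_SAVEALL)   # keep collected objects in gc.garbage

l = []
l.append(l)                      # produce cycle
cycle_id = id(l)
del l                            # the list is now only reachable through itself

gc.collect()                     # find the cycle; DEBUG_SAVEALL stores it
found = any(id(obj) == cycle_id for obj in gc.garbage)

# Restore normal operation.
gc.set_debug(0)
del gc.garbage[:]
gc.enable()
```

Clearing gc.garbage at the end is what makes the saved objects disappear for good, comparable to the note that reported garbage is not reported again.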

Limitations and Corner Cases

Inheritance

Class tracking allows observing multiple classes that might have an inheritance relationship. An object is only tracked once. Thus, the tracking parameters of the most specialized tracked class control the actual tracking of an instance.
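
One way to picture the "most specialized tracked class" rule is a lookup along the method resolution order of the instance's class. This is an illustrative sketch with a hypothetical registry and helper, not the Heapmonitor's lookup code.

```python
tracked_classes = {}  # class -> tracking parameters (illustrative registry)


class Node:
    pass


class File(Node):
    pass


class Symlink(File):
    pass


def effective_parameters(obj):
    """Return the parameters of the most specialized tracked class of obj."""
    for cls in type(obj).__mro__:   # the MRO lists the most derived class first
        if cls in tracked_classes:
            return tracked_classes[cls]
    return None                     # no class in the hierarchy is tracked


tracked_classes[Node] = {'resolution_level': 0}
tracked_classes[File] = {'resolution_level': 2}

symlink_level = effective_parameters(Symlink())['resolution_level']
node_level = effective_parameters(Node())['resolution_level']
```

A Symlink instance picks up the parameters of File because File is the closest tracked ancestor, while a plain Node keeps the parameters registered for Node.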

Morphing objects

SCons uses the pattern of changing an instance’s class at runtime, for example to morph abstract Node objects into File or Directory nodes. The pattern looks like the following in the code:

obj.__class__ = OtherClass

If the instance which is morphed is already tracked, the instance will continue to be tracked by the Heapmonitor. If the target class is tracked but the instance is not, the instance will only be tracked if the constructor of the target class is called as part of the morphing process. The object will not be re-registered to the new class in the tracked object index. However, the new class is stored in the representation of the object as soon as the object is sized.
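
The reason a morphed instance is not picked up automatically is that assigning to __class__ does not run any constructor. A small self-contained demonstration, with hypothetical classes:

```python
class Node:
    def __init__(self):
        self.log = ['Node.__init__']


class File(Node):
    def __init__(self):
        Node.__init__(self)
        self.log.append('File.__init__')


obj = Node()
obj.__class__ = File              # morph: no constructor runs here
morphed_type = type(obj).__name__
calls = list(obj.log)             # still only the original constructor call
```

A constructor injected into File would therefore only see the object if the morphing code calls File.__init__ explicitly.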

Shared Data

Data shared between multiple tracked objects won’t lead to overestimations. Shared data will be assigned to the first (evaluated) tracked object it is referenced from, but is only counted once. Tracked objects are evaluated in the order they were announced to the Heapmonitor. This should make the assignment deterministic from one run to the next, but has two known problems. If the Heapmonitor is used concurrently from multiple threads, the announcement order will likely change and may lead to random assignment of shared data to different objects. Shared data might also be assigned to different objects during its lifetime, see the following example:

class A():
  pass

a = A()
heapmonitor.track_object(a)
b = A()
heapmonitor.track_object(b)
b.content = list(range(100000))
heapmonitor.create_snapshot('#1')
a.notmine = b.content
heapmonitor.create_snapshot('#2')

In snapshot #1, b’s size will include the size of the large list. Then the list is shared with a. Snapshot #2 will assign the list’s footprint to a because it was registered before b.

If a tracked object A is referenced from another tracked object B, A’s size is not added to B’s size, regardless of the order in which they are sized.
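
The assignment of shared data to the first evaluated object can be reproduced with a set of already counted object ids. This is a simplified sketch that only looks at direct attribute values; sizes_in_order is a hypothetical helper and not how asizeof works internally.

```python
import sys


class A:
    pass


def sizes_in_order(objects):
    """Size each object's attribute values, counting shared values only once."""
    seen = set()      # ids of values already attributed to an earlier object
    result = []
    for obj in objects:
        total = sys.getsizeof(obj)
        for value in vars(obj).values():
            if id(value) not in seen:
                seen.add(id(value))
                total += sys.getsizeof(value)
        result.append(total)
    return result


a, b = A(), A()
b.content = list(range(100000))
snap1 = sizes_in_order([a, b])   # the list is charged to b
a.notmine = b.content
snap2 = sizes_in_order([a, b])   # a is evaluated first, so a is charged
```

The list's footprint moves from b to a between the two measurements although the list itself never changed.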

Accuracy

Heapmonitor uses the asizeof module to gather size information. Asizeof makes assumptions about the memory footprint of the various data types. As it is implemented in pure Python, there is no way to know how the actual Python implementation allocates data and lays it out in memory. Thus, the size numbers are not really accurate and there will always be a divergence between the virtual size of the process as reported by the OS and the sizes asizeof estimates.

Most recent CPython versions contain a facility to report accurate size information of Python objects. If available, asizeof uses it to improve the accuracy.