Garbage Collection (GC)

Keywords: garbage collection, gc, memory

Garbage Collection (GC) is the automatic memory management process that identifies and reclaims memory occupied by objects no longer reachable by the program — critical in AI and deep learning contexts where Python's reference counting and CUDA memory management interact in ways that cause VRAM leaks, training crashes, and subtle performance degradation.

What Is Garbage Collection?

- Definition: An automatic runtime process that tracks object lifetimes, identifies memory that is no longer referenced by any active part of the program, and reclaims it for future use — freeing programmers from manual memory management (malloc/free in C).
- Python's Approach: Python uses reference counting as the primary GC mechanism — each object tracks how many references point to it; when the count reaches zero, the object is immediately freed. A cyclic garbage collector handles reference cycles.
- CUDA Memory Management: PyTorch maintains its own GPU memory allocator (caching allocator) on top of raw CUDA memory — torch.cuda.empty_cache() releases cached but unused memory back to CUDA, while gc.collect() handles Python object cleanup.
- The Interaction Problem: A Python object holding a reference to a CUDA tensor prevents the tensor from being freed even if nothing meaningful is using it — Python GC and CUDA memory are coupled through reference counting.
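
To make that coupling concrete, here is a minimal sketch (assuming PyTorch and a CUDA device are available; the cache dictionary is purely illustrative) showing a lingering Python reference keeping a tensor's VRAM allocated:

import torch

def vram_mb():
    # Bytes currently allocated to live tensors on the default CUDA device
    return torch.cuda.memory_allocated() / 2**20

cache = {}
cache["activations"] = torch.randn(4096, 4096, device="cuda")   # ~64 MB of float32
print(f"with reference held: {vram_mb():.0f} MB")

del cache["activations"]          # drop the last Python reference; refcount hits zero
print(f"after del:           {vram_mb():.0f} MB")   # the ~64 MB is released immediately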

Why GC Matters for AI Systems

- Training Loop Stability: Without proper tensor lifecycle management, VRAM usage grows monotonically across training steps until an OOM crash; this is a common source of the "why does my training crash at step 5,000?" question.
- Inference Memory Efficiency: Long-running inference services gradually accumulate tensor references in Python objects (loggers, monitoring callbacks, request history) — GC issues cause memory to grow until the pod is killed and restarted.
- Debugging Difficulty: Memory leaks from GC issues produce OOM errors far from the source of the leak — profiling tools are required to trace allocations back to the reference-holding object.
- Cycle Detection Overhead: Python's cyclic GC runs periodically and can pause the Python thread for milliseconds at a time; in long-running inference, these pauses show up as latency spikes during token generation.

Python's Reference Counting

Every Python object has a reference count (ob_refcnt). When you do:
x = MyTensor() → refcount = 1
y = x → refcount = 2
del x → refcount = 1
del y → refcount = 0 → object freed immediately
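
These counts can be observed directly with sys.getrefcount, which always reports one extra reference for its own temporary argument (a minimal sketch; MyTensor here is just a stand-in class):

import sys

class MyTensor:
    pass

x = MyTensor()
print(sys.getrefcount(x))   # 2: the reference x plus getrefcount's temporary argument
y = x
print(sys.getrefcount(x))   # 3: x, y, plus the temporary argument
del y
print(sys.getrefcount(x))   # back to 2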

Reference Cycles (not freed by reference counting alone):
class Node:
    def __init__(self):
        self.next = None

a = Node(); b = Node()
a.next = b; b.next = a   # cycle: each node now references the other
del a; del b             # refcount of each is still 1 (held by the cycle), so neither is freed

Python's cyclic GC detects and breaks these cycles — but runs periodically, not immediately.
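
A minimal sketch of forcing that collection by hand; gc.collect() returns the number of unreachable objects it found, so a positive value confirms the cycle was reclaimed:

import gc

class Node:
    def __init__(self):
        self.next = None

a = Node(); b = Node()
a.next = b; b.next = a
del a; del b                 # the cycle keeps both Node objects alive for now

print(gc.collect())          # forces a full collection; prints > 0 once the cycle is found and freed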

Common GC-Related Bugs in AI Code

Accumulating Computational Graphs:
losses = []
for batch in dataloader:
    loss = model(batch)
    losses.append(loss)          # BUG: stores the tensor plus its entire autograd graph, so VRAM grows every step

Fix: losses.append(loss.item())  # .item() returns a plain Python float, detached from the graph

Storing Tensors in Class Attributes:
self.last_output = model_output                       # BUG: pins the tensor (and any attached graph) in VRAM until the attribute is overwritten
Fix: self.last_output = model_output.detach().cpu()   # Detach from the graph, then move to CPU
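
A minimal sketch of that pattern inside a hypothetical monitoring callback (the class and method names are illustrative):

import torch

class OutputMonitor:
    def __init__(self):
        self.last_output = None

    def update(self, model_output: torch.Tensor) -> None:
        # .detach() drops the autograd graph; .cpu() moves the data off the GPU,
        # so no VRAM stays pinned between forward passes.
        self.last_output = model_output.detach().cpu()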

Logging with Tensor Values:
logger.info(f"Loss: {loss}")      # OK if loss is already a plain Python float
logger.info(f"Output: {output}")  # BUG if output is a CUDA tensor: formatting forces a GPU sync and can keep the tensor (and its graph) alive longer than intended

CUDA Memory Management

PyTorch's caching allocator optimizes CUDA malloc/free by keeping freed memory in a cache rather than returning it to CUDA immediately — improving performance by avoiding expensive CUDA mallocs on future allocations.

torch.cuda.empty_cache():
- Releases the caching allocator's freed memory back to CUDA.
- Does NOT free memory still referenced by Python objects.
- Useful after deleting large tensors to make VRAM available to other processes.
- Does NOT fix memory leaks by itself; clearing leaked memory requires dropping the offending references first, then running gc.collect() followed by empty_cache().

gc.collect():
- Triggers Python's cyclic garbage collector immediately.
- Breaks reference cycles that prevent tensor deallocation.
- Combine with torch.cuda.empty_cache() for full cleanup:

import gc
import torch

del large_model            # drop the last Python reference to the model's parameters
gc.collect()               # break any reference cycles still holding tensors
torch.cuda.empty_cache()   # return the allocator's cached blocks to the CUDA driver
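
A measurement sketch (assuming PyTorch with a CUDA device; the tensor size is illustrative) separating memory held by live tensors from memory merely cached by the allocator:

import gc
import torch

x = torch.randn(8192, 8192, device="cuda")                        # ~256 MB of float32
print(torch.cuda.memory_allocated() // 2**20, "MB allocated")     # held by live tensors
print(torch.cuda.memory_reserved() // 2**20, "MB reserved")       # held by the caching allocator

del x
gc.collect()
print(torch.cuda.memory_allocated() // 2**20, "MB allocated")     # drops: no live references remain
print(torch.cuda.memory_reserved() // 2**20, "MB reserved")       # still high: blocks stay in the cache

torch.cuda.empty_cache()
print(torch.cuda.memory_reserved() // 2**20, "MB reserved")       # drops: cache returned to CUDA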

GC Tuning for Long Training Runs

Disable automatic GC in tight training loops (prevents GC pauses):
import gc

gc.disable()                        # take manual control of the cyclic collector
for step, batch in enumerate(dataloader):
    # ... forward / backward / optimizer step ...
    if step % 100 == 0:
        gc.collect()                # periodic manual collection at a step boundary
gc.enable()                         # restore automatic collection after the loop
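
To see how long those pauses actually are, a sketch using the standard gc.callbacks hook (invoked at the start and end of every collection) can log pause durations:

import gc
import time

_start = {}

def _track_gc(phase, info):
    # The interpreter calls this with phase="start" before and phase="stop" after each collection
    if phase == "start":
        _start["t"] = time.perf_counter()
    else:
        pause_ms = (time.perf_counter() - _start["t"]) * 1000
        print(f"gen{info['generation']} GC pause: {pause_ms:.2f} ms")

gc.callbacks.append(_track_gc)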

For inference services, tune GC thresholds to reduce pause frequency:
gc.set_threshold(10000, 20, 20) # Increase collection thresholds
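
The three values map onto the collector's three generations; a quick sketch of the typical CPython defaults and what raising them does:

import gc

print(gc.get_threshold())        # typical CPython default: (700, 10, 10)
# 700: net new allocations (allocations minus deallocations) before a generation-0 collection
# 10:  generation-0 collections before a generation-1 collection
# 10:  generation-1 collections before a generation-2 (full) collection
gc.set_threshold(10000, 20, 20)  # collections fire far less often; pauses are rarer but each does more work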

GC in AI systems is the invisible memory management layer that quietly determines whether long training runs complete or crash. By understanding Python reference counting, the CUDA caching allocator, and how the two interact, AI engineers can eliminate the frustrating class of "why does training OOM at step N?" bugs that consume hours of debugging time.
