Home Knowledge Base Memory Systems for LLM Applications

Memory Systems for LLM Applications

Why Memory? LLMs are stateless by default. Memory systems maintain context across conversation turns and sessions, enabling coherent multi-turn interactions.

Memory Types

Short-Term (Conversation Buffer) Store recent messages in full:

class ConversationMemory:
    def __init__(self):
        self.messages = []

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def get_messages(self) -> list:
        return self.messages

Window Memory Keep only last N turns:

class WindowMemory:
    def __init__(self, window_size: int = 10):
        self.messages = []
        self.window_size = window_size

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.window_size:
            self.messages = self.messages[-self.window_size:]

Summary Memory Periodically summarize older messages:

class SummaryMemory:
    def __init__(self, llm):
        self.llm = llm
        self.summary = ""
        self.recent_messages = []

    def compress(self):
        if len(self.recent_messages) > 10:
            self.summary = self.llm.generate(
                f"Summarize: {self.recent_messages[:5]}"
            )
            self.recent_messages = self.recent_messages[5:]

Entity Memory Track entities mentioned in conversation:

entities = {
    "John": {"role": "customer", "mentioned": ["order #123"]},
    "Project Alpha": {"status": "in progress", "deadline": "Q2"}
}

Long-Term Memory

Vector Storage Store and retrieve past interactions by similarity:

# Store interaction embedding
embedding = embed(conversation_summary)
vector_store.add(embedding, metadata={"session_id": ...})

# Retrieve relevant history
relevant = vector_store.query(embed(current_query), top_k=5)

Key-Value Store Store structured information:

Memory in Practice

Memory TypeUse CaseTradeoff
Full bufferShort convosToken limit
WindowLong convosLoses early context
SummaryVery long convosCompression loss
VectorCross-sessionRetrieval latency
EntityFact trackingMaintenance overhead

Best Practices

memoryconversation historycontext

Explore 500+ Semiconductor & AI Topics

From EUV lithography to CUDA optimization — search the full knowledge base or chat with our AI assistant.