Lucid

Minimal memory runtime for AI agents. Hindsight’s memory runtime stripped from 239,000 lines to 2,000.

Python 3.11+ · License: MIT · Tests

Lucid is a memory runtime for AI agents. It extracts facts from conversations, stores them in a SQLite knowledge base, retrieves the most relevant ones on demand, and synthesises answers from memory using multi-turn reasoning.

Three operations. That’s the whole API:

await retain(text, ...)   # extract + store facts
await recall(query, ...)  # retrieve relevant facts
await reflect(query, ...) # synthesise an answer from memory

No Postgres. No Redis. No Kafka. No Docker. Just SQLite and an LLM.


Architecture

flowchart LR
    I["Conversation / text"] --> T["retain()"]
    T --> X["LLM fact extraction"]
    X --> S["SQLiteMemoryStore"]
    Q["Query"] --> R["recall()"]
    S --> R
    E["Embedder / reranker"] --> R
    R --> F["Relevant facts"]
    F --> Y["reflect()"]
    Y --> A["Answer from memory"]

Lucid keeps three operations: retain facts, recall the most relevant memory, and reflect over that memory to answer.


Size comparison

Component                                Hindsight      Lucid              Reduction
API surface (FastAPI)                    ~8,000 LOC     removed            100%
Storage backends (Postgres, Redis, S3)   ~15,000 LOC    ~430 LOC (SQLite)  97%
Cloud control plane                      ~20,000 LOC    removed            100%
Ingestion pipelines                      ~12,000 LOC    479 LOC            96%
Retrieval engine                         ~10,000 LOC    317 LOC            97%
Synthesis layer                          ~8,000 LOC     310 LOC            96%
Total                                    ~239,000 LOC   ~2,000 LOC         99%

What was cut: REST API, database migrations, multi-tenancy, cloud infrastructure, Kafka ingestion, S3 storage, Postgres/Redis backends, authentication, billing, admin dashboards.

What remains: the memory logic.


Install

# No pip package yet — copy the folder directly
cp -r lucid/ your-project/lucid/

# Dependencies
pip install httpx  # for the built-in LLM client

The module layout:

lucid/
├── types.py      # Data model (Fact, Entity, Bank, Budget, etc.)
├── interfaces.py # Protocol definitions (LLMClient, Embedder, MemoryStore, Reranker)
├── retain.py     # Fact extraction + storage pipeline
├── recall.py     # 4-strategy retrieval with RRF fusion
├── reflect.py    # Multi-turn synthesis over memory
└── store.py      # SQLite-backed MemoryStore

Quick start

import asyncio
from lucid import retain, recall, reflect, SQLiteMemoryStore, Budget

# Minimal implementations of the protocol interfaces
class MyLLM:
    async def complete(self, messages, tools=None, **kwargs):
        # Use OpenAI, Ollama, Anthropic, etc.
        ...

class MyEmbedder:
    async def embed(self, text: str) -> list[float]:
        # Use nomic-embed-text, text-embedding-3-small, etc.
        ...

async def main():
    store = SQLiteMemoryStore("memory.db")
    llm = MyLLM()
    embedder = MyEmbedder()
    bank_id = "user-alice"  # partition per user

    # --- RETAIN ---
    # Extract facts from a conversation turn and store them
    result = await retain(
        text="Alice mentioned she's allergic to shellfish and prefers morning meetings.",
        store=store,
        llm=llm,
        embedder=embedder,
        bank_id=bank_id,
    )
    print(f"Stored {len(result.facts)} facts, resolved {len(result.entities)} entities")

    # --- RECALL ---
    # Retrieve the most relevant facts for a query
    results = await recall(
        query="What are Alice's dietary restrictions?",
        store=store,
        embedder=embedder,
        bank_id=bank_id,
        budget=Budget(max_facts=5, max_tokens=800),
    )
    for fact in results.facts:
        print(f"  [{fact.score:.2f}] {fact.text}")

    # --- REFLECT ---
    # Synthesise an answer from memory (multi-turn reasoning)
    answer = await reflect(
        query="What should I know before scheduling a lunch with Alice?",
        store=store,
        llm=llm,
        embedder=embedder,
        bank_id=bank_id,
    )
    print(answer.response)

asyncio.run(main())

How it works

retain()

Extracts structured facts from free text using an LLM, resolves entities, embeds facts for vector search, and persists everything:

input text
    ↓
LLM extracts Fact objects (text, kind, type, entities, temporal markers)
    ↓
Resolve entities: match against existing entities in store
    ↓
Link facts to resolved entities (cross-retain continuity)
    ↓
Embed fact text for vector retrieval
    ↓
Save Facts, Entities, and MemoryLinks to SQLite

Fact kinds: EPISODIC, SEMANTIC, PROCEDURAL (FactKind in the data model below).

Fact types: preference, belief, event, instruction, ... (FactType in the data model below).
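
A minimal sketch of the extraction call, assuming the LLM is asked for a JSON array of fact candidates (the real prompt and parsing live in retain.py and will differ):

import json
import uuid
from datetime import datetime, timezone

EXTRACT_PROMPT = (
    "List the discrete facts stated in the text below as a JSON array of "
    'objects with keys "text", "kind" (EPISODIC|SEMANTIC|PROCEDURAL), '
    '"fact_type", and "entities".\n\nText: {input}'
)

async def extract_fact_candidates(llm, text: str, bank_id: str) -> list[dict]:
    # Assumes llm.complete() returns the assistant message as a plain string.
    reply = await llm.complete(
        [{"role": "user", "content": EXTRACT_PROMPT.format(input=text)}]
    )
    return [
        {
            "id": str(uuid.uuid4()),
            "bank_id": bank_id,
            "created_at": datetime.now(timezone.utc).isoformat(),
            **candidate,
        }
        for candidate in json.loads(reply)
    ]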

recall()

Four retrieval strategies fused with Reciprocal Rank Fusion (RRF):

Strategy                 What it does
Semantic vector search   Cosine similarity on embedded fact text
Keyword search           BM25-style text matching
Entity lookup            Facts linked to entities matching query
Recency boost            Recent facts ranked higher

Results are filtered by a Budget (max facts, max tokens) to keep context windows manageable.
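
The fusion step is standard Reciprocal Rank Fusion. A minimal sketch (k=60 is the usual default constant; lucid/recall.py may use different values):

from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of fact IDs; each occurrence adds 1 / (k + rank)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, fact_id in enumerate(ranking, start=1):
            scores[fact_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Toy ranked outputs of the four strategies:
print(rrf_fuse([["f3", "f1", "f7"],   # semantic
                ["f1", "f9"],         # keyword
                ["f7", "f1"],         # entity
                ["f9", "f3"]])[:2])   # recency -> ['f1', 'f3']

Facts that appear in several lists accumulate score, so broad agreement beats a single strategy's top hit.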

reflect()

Multi-turn agentic synthesis: the LLM reasons over recalled facts to answer a question, calling back into memory for follow-up retrievals when needed:

query
  ↓
recall() → top-k facts
  ↓
LLM synthesises answer (with optional tool call: search_memory)
  ↓
If LLM calls search_memory → recall() again with new query
  ↓
Continue until answer is complete or max_turns reached
  ↓
Return ReflectResult(response, facts_used, turns)
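
A rough sketch of that loop. SEARCH_MEMORY_TOOL's shape and tool_call_in() are hypothetical stand-ins; how tool calls appear in a reply depends on your LLM client:

from lucid import recall

SEARCH_MEMORY_TOOL = {          # tool spec shape is an assumption
    "name": "search_memory",
    "description": "Run another recall() over long-term memory",
}

def tool_call_in(reply):
    # Hypothetical helper: return the search_memory arguments if the reply
    # contains a tool call, else None. Adapt to your client's reply format.
    return getattr(reply, "tool_call", None)

async def reflect_sketch(query, store, llm, embedder, bank_id, max_turns=4):
    seed = await recall(query=query, store=store, embedder=embedder, bank_id=bank_id)
    context = "\n".join(f.text for f in seed.facts)
    messages = [{"role": "user",
                 "content": f"Facts:\n{context}\n\nQuestion: {query}"}]
    reply = None
    for _ in range(max_turns):
        reply = await llm.complete(messages, tools=[SEARCH_MEMORY_TOOL])
        call = tool_call_in(reply)
        if call is None:                     # model answered directly: done
            break
        # Model asked for a follow-up retrieval; run it and feed the facts back.
        more = await recall(query=call["query"], store=store,
                            embedder=embedder, bank_id=bank_id)
        messages.append({"role": "user",
                         "content": "More facts:\n"
                                    + "\n".join(f.text for f in more.facts)})
    return reply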

Data model

@dataclass
class Fact:
    id: str
    bank_id: str          # partition key
    text: str             # the fact as a natural language sentence
    kind: FactKind        # EPISODIC | SEMANTIC | PROCEDURAL
    fact_type: FactType   # preference | belief | event | instruction | ...
    embedding: list[float]
    created_at: str
    entity_ids: list[str]

@dataclass
class Entity:
    id: str
    bank_id: str
    name: str
    entity_type: EntityType   # person | place | organization | concept | other
    summary: str
    fact_ids: list[str]

@dataclass
class Budget:
    max_facts: int = 10
    max_tokens: int = 2000
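
For illustration, a hand-built Fact seeded directly through the store. The Fact/enum import paths and enum member names below are assumptions, and the embedding is left empty where retain() would fill it:

from lucid import Fact, SQLiteMemoryStore
from lucid.types import FactKind, FactType   # assumed import path

async def seed():
    store = SQLiteMemoryStore("memory.db")
    fact = Fact(
        id="fact-0001",
        bank_id="user-alice",
        text="Alice prefers morning meetings.",
        kind=FactKind.SEMANTIC,
        fact_type=FactType.PREFERENCE,   # assuming member names mirror the listed values
        embedding=[],                    # retain() would embed this text
        created_at="2025-01-01T00:00:00Z",
        entity_ids=[],
    )
    await store.save_fact(fact)          # save_fact is part of the MemoryStore protocol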

SQLite schema

facts        -- extracted facts with embeddings
entities     -- resolved named entities
memory_links -- fact ↔ entity associations
observations -- raw input log (provenance)

WAL mode enabled. Vector similarity is computed in Python, which is fine up to roughly 50k facts; swap in a vector-DB implementation of the MemoryStore protocol when you need more scale.
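
The brute-force scan amounts to roughly this (a sketch, not the literal store.py code):

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_emb: list[float], rows: list[tuple[str, list[float]]], k: int = 10):
    # rows are (fact_id, embedding) pairs read from the facts table
    scored = sorted(((cosine(query_emb, emb), fid) for fid, emb in rows), reverse=True)
    return scored[:k]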


Bring your own components

Every component is a protocol — swap in your preferred implementation:

from lucid.interfaces import Embedder, Reranker, MemoryStore

class ChromaStore(MemoryStore):
    """Use ChromaDB as the backend instead of SQLite."""
    async def save_fact(self, fact): ...
    async def search_by_embedding(self, embedding, bank_id, limit): ...
    # ... implement the other methods

class CohereReranker(Reranker):
    """Re-rank results with Cohere's rerank endpoint."""
    async def rerank(self, query, facts, limit): ...
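
For example, a concrete Embedder backed by OpenAI's embeddings endpoint via httpx. The model name and env var are this example's choices, not Lucid defaults:

import os
import httpx

class OpenAIEmbedder:
    """Implements the Embedder protocol against POST /v1/embeddings."""

    def __init__(self, model: str = "text-embedding-3-small"):
        self.model = model
        self.api_key = os.environ["OPENAI_API_KEY"]

    async def embed(self, text: str) -> list[float]:
        async with httpx.AsyncClient() as client:
            resp = await client.post(
                "https://api.openai.com/v1/embeddings",
                headers={"Authorization": f"Bearer {self.api_key}"},
                json={"model": self.model, "input": text},
                timeout=30.0,
            )
        resp.raise_for_status()
        return resp.json()["data"][0]["embedding"]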

Tests

# Unit tests (no LLM required)
python3 -m pytest tests/ -q --ignore=tests/test_e2e.py

# End-to-end test (requires LLM + embedder)
OPENAI_API_KEY=sk-... python3 -m pytest tests/test_e2e.py -v -s

113 tests covering fact extraction, entity resolution, recall strategies, RRF fusion, synthesis, and the SQLite store.


What was removed from Hindsight

Lucid is a targeted extraction of Hindsight’s memory kernel. Everything listed under “What was cut” above is gone.

The fact extraction logic, entity resolution, four-strategy retrieval, RRF fusion, and multi-turn reflection are preserved.


Part of a suite

Lucid pairs naturally with:


Requirements

Python 3.11+ and httpx (for the built-in LLM client). Bring your own LLM and embedder via the LLMClient and Embedder protocols.

License

MIT. See LICENSE.


Acknowledgments

The memory architecture, fact taxonomy, and retrieval design are inspired by Hindsight by Vectorize AI (MIT). Lucid is an independent extraction — not affiliated with Vectorize AI.

Extracted from Hindsight’s memory-runtime design. Not tracking upstream; this is a stable standalone extraction, not a rolling fork.