
Architecture Overview

KnowledgeFlowDB is built with a layered architecture designed for performance, scalability, and flexibility.

Layer Architecture

┌─────────────────────────────────────┐
│              kfdb CLI               │
│          (User Interface)           │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│             kfdb-query              │
│    (KQL Parser & Query Executor)    │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│             kfdb-graph              │
│     (Graph Storage & Traversal)     │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│             kfdb-vector             │
│        (HNSW Vector Search)         │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│            kfdb-storage             │
│     (Storage Engine Interface)      │
└─────────────────────────────────────┘

┌──────────────┬──────────────┬────────────┐
│    Memory    │   RocksDB    │  ScyllaDB  │
│  (Testing)   │   (Local)    │(Production)│
└──────────────┴──────────────┴────────────┘

Core Components

kfdb-core

Foundation types and traits used across all crates:

  • NodeId, EdgeId: Type-safe identifiers
  • Value: Comprehensive value types
  • Embedding: Semantic vectors (1024-1536 dimensions)
  • Timestamp: Time-aware operations
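In Rust, type-safe identifiers usually come from the newtype pattern. The sketch below shows what these foundation types might look like. It is hypothetical and is not kfdb-core's actual definitions. The inner types (`u64`, `i64`, `f32`), the `Value` variants, and the choice to treat the 1024-1536 embedding range as a hard validation rule are all assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical ID newtypes: distinct wrapper structs mean a NodeId
/// cannot be passed where an EdgeId is expected, even though both hold a u64.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct NodeId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EdgeId(pub u64);

/// Hypothetical timestamp: milliseconds since the Unix epoch (an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp(pub i64);

/// Illustrative property value type; the real variant list may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    List(Vec<Value>),
    Map(BTreeMap<String, Value>),
    Timestamp(Timestamp),
}

/// Embedding wrapper that rejects vectors outside 1024..=1536 dimensions
/// (treating the documented range as a validation rule is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Embedding(Vec<f32>);

impl Embedding {
    pub const MIN_DIM: usize = 1024;
    pub const MAX_DIM: usize = 1536;

    pub fn new(values: Vec<f32>) -> Result<Self, String> {
        let dim = values.len();
        if (Self::MIN_DIM..=Self::MAX_DIM).contains(&dim) {
            Ok(Embedding(values))
        } else {
            Err(format!(
                "embedding dimension {} outside {}..={}",
                dim,
                Self::MIN_DIM,
                Self::MAX_DIM
            ))
        }
    }

    pub fn dim(&self) -> usize {
        self.0.len()
    }
}
```

The compiler enforces the separation between `NodeId` and `EdgeId` at no runtime cost, because each wrapper has the same layout as its inner `u64`.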

kfdb-storage

Abstract storage layer with three implementations:

  • Memory: In-memory BTreeMap (fastest, testing)
  • RocksDB: Embedded LSM-tree (local, persistent)
  • ScyllaDB: Distributed NoSQL (production, scalable)
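One common way for an abstract storage layer to let callers swap backends is a trait used through a trait object. The `StorageEngine` trait and its methods below are hypothetical and are not kfdb-storage's real API. The sketch implements only the in-memory `BTreeMap` backend, because RocksDB and ScyllaDB backends would need external client crates.

```rust
use std::collections::BTreeMap;

/// Hypothetical key-value interface that each backend would implement.
pub trait StorageEngine {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Returns true if the key existed.
    fn delete(&mut self, key: &[u8]) -> bool;
    /// All (key, value) pairs whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory backend. BTreeMap keeps keys sorted, so a prefix scan
/// becomes a range scan that stops at the first non-matching key.
#[derive(Default)]
pub struct MemoryStorage {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl StorageEngine for MemoryStorage {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    fn delete(&mut self, key: &[u8]) -> bool {
        self.data.remove(key).is_some()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.data
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}
```

Because callers hold a `Box<dyn StorageEngine>`, the concrete backend can be chosen at runtime. Sorted keys also matter for the other two backends, because RocksDB and ScyllaDB clustering keys both support ordered range reads.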

kfdb-graph

Graph data structures and algorithms:

  • AdjacencyList: Efficient in-memory graph
  • Traversal: BFS, DFS, shortest path
  • Multi-hop: N-hop neighborhoods, path finding
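Both N-hop neighborhoods and unweighted shortest paths can be built on breadth-first search over an adjacency list. The sketch below is hypothetical and does not reproduce kfdb-graph's `AdjacencyList`. It assumes plain `u64` node IDs and directed edges, and it omits DFS for brevity.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical adjacency list: node id -> outgoing neighbor ids.
#[derive(Default)]
pub struct AdjacencyList {
    out: HashMap<u64, Vec<u64>>,
}

impl AdjacencyList {
    pub fn add_edge(&mut self, from: u64, to: u64) {
        self.out.entry(from).or_default().push(to);
    }

    fn neighbors(&self, node: u64) -> &[u64] {
        self.out.get(&node).map(|v| v.as_slice()).unwrap_or(&[])
    }

    /// BFS returning every node within `max_hops` of `start` (excluding
    /// `start` itself), paired with its hop distance, in discovery order.
    pub fn n_hop(&self, start: u64, max_hops: usize) -> Vec<(u64, usize)> {
        let mut seen = HashSet::from([start]);
        let mut queue = VecDeque::from([(start, 0)]);
        let mut result = Vec::new();
        while let Some((node, depth)) = queue.pop_front() {
            if depth == max_hops {
                continue; // do not expand past the hop limit
            }
            for &next in self.neighbors(node) {
                if seen.insert(next) {
                    result.push((next, depth + 1));
                    queue.push_back((next, depth + 1));
                }
            }
        }
        result
    }

    /// Unweighted shortest path: BFS with parent pointers.
    pub fn shortest_path(&self, start: u64, goal: u64) -> Option<Vec<u64>> {
        let mut parent: HashMap<u64, u64> = HashMap::new();
        let mut seen = HashSet::from([start]);
        let mut queue = VecDeque::from([start]);
        while let Some(node) = queue.pop_front() {
            if node == goal {
                // Walk parent pointers back to `start`, then reverse.
                let mut path = vec![goal];
                let mut cur = goal;
                while let Some(&p) = parent.get(&cur) {
                    path.push(p);
                    cur = p;
                }
                path.reverse();
                return Some(path);
            }
            for &next in self.neighbors(node) {
                if seen.insert(next) {
                    parent.insert(next, node);
                    queue.push_back(next);
                }
            }
        }
        None
    }
}
```

BFS visits nodes in order of hop count, so the first time it reaches `goal` is by a path with the fewest edges. This is why the same traversal serves both features.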

kfdb-vector

Vector similarity search:

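  • HNSW Index: Fast approximate nearest-neighbor (ANN) search
  • Multiple Metrics: Euclidean, Cosine, Manhattan, Dot Product
  • Configurable: Tune M (neighbor links per node), efConstruction (candidate list size during index build), and efSearch (candidate list size during queries)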
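A full HNSW index is too long for a short example, but the four metrics and the three tuning knobs can be sketched. The `Metric` enum, `HnswParams` struct, and `brute_force_knn` function below are hypothetical. Brute-force search is not HNSW, but exact linear-scan results are the usual ground truth for measuring an approximate index's recall. The sketch treats every metric as a distance where smaller means closer, so cosine becomes `1 - similarity` and dot product is negated. How kfdb-vector orients these scores is an assumption.

```rust
/// The four listed metrics, expressed as distances (smaller = closer).
#[derive(Debug, Clone, Copy)]
pub enum Metric {
    Euclidean,
    Cosine,
    Manhattan,
    DotProduct,
}

impl Metric {
    pub fn distance(self, a: &[f32], b: &[f32]) -> f32 {
        assert_eq!(a.len(), b.len(), "vectors must have equal dimension");
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        match self {
            Metric::Euclidean => a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt(),
            Metric::Manhattan => a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>(),
            // Cosine distance = 1 - cosine similarity; a zero vector yields NaN.
            Metric::Cosine => {
                let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
                let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
                1.0 - dot / (na * nb)
            }
            // Negated so a larger inner product ranks as "closer".
            Metric::DotProduct => -dot,
        }
    }
}

/// Hypothetical container for the HNSW tuning knobs named above.
#[derive(Debug, Clone, Copy)]
pub struct HnswParams {
    /// Neighbor links per node: higher improves recall but uses more memory.
    pub m: usize,
    /// Build-time candidate list size: higher builds a better graph, more slowly.
    pub ef_construction: usize,
    /// Query-time candidate list size: higher improves recall, slows queries.
    pub ef_search: usize,
}

/// Exact k-nearest-neighbor search by linear scan; returns (index, distance).
pub fn brute_force_knn(data: &[Vec<f32>], query: &[f32], k: usize, metric: Metric) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, metric.distance(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

Linear scan costs O(n) distance computations per query. HNSW avoids that cost by greedily walking a layered proximity graph, which is why `ef_search` trades recall against latency.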

kfdb-query

Query parsing and execution:

  • KQL Parser: Grammar defined with the pest parser generator
  • Query Executor: Match, filter, project, sort
  • Optimizer: Filter pushdown, limit pushdown
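Filter pushdown moves a predicate below operators that do not need to see rows the predicate would remove. Limit pushdown moves a row cap below operators that keep the row count unchanged. The sketch below is a hypothetical illustration of both rewrites on a tiny logical plan and is not kfdb-query's optimizer. The `Plan` variants, the string predicates, and the rule that a `Scan` applies its filters before its limit are all assumptions.

```rust
/// Hypothetical logical plan. A Scan applies `filters`, then `limit`.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { label: String, filters: Vec<String>, limit: Option<usize> },
    Filter { predicate: String, input: Box<Plan> },
    /// Pure column selection: keeps every row, drops columns.
    Project { columns: Vec<String>, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

pub fn optimize(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { predicate, input } => match optimize(*input) {
            // Merge into an unlimited scan so non-matching rows are never produced.
            Plan::Scan { label, mut filters, limit: None } => {
                filters.push(predicate);
                Plan::Scan { label, filters, limit: None }
            }
            // Projection keeps every row, so filtering first is equivalent.
            Plan::Project { columns, input } => Plan::Project {
                columns,
                input: Box::new(optimize(Plan::Filter { predicate, input })),
            },
            // Includes scans that already have a limit: filtering after a
            // limit is not the same as filtering before it.
            other => Plan::Filter { predicate, input: Box::new(other) },
        },
        Plan::Limit { n, input } => match optimize(*input) {
            // The scan filters before limiting, so capping it is safe.
            Plan::Scan { label, filters, limit } => Plan::Scan {
                label,
                filters,
                limit: Some(limit.map_or(n, |l| l.min(n))),
            },
            Plan::Project { columns, input } => Plan::Project {
                columns,
                input: Box::new(optimize(Plan::Limit { n, input })),
            },
            // Never push a limit below a Filter: it would drop rows too early.
            other => Plan::Limit { n, input: Box::new(other) },
        },
        Plan::Project { columns, input } => Plan::Project {
            columns,
            input: Box::new(optimize(*input)),
        },
        scan @ Plan::Scan { .. } => scan,
    }
}
```

For example, `Limit(Project(Filter(Scan)))` is rewritten to `Project(Scan)`, with the predicate and the limit both folded into the scan.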

kfdb (CLI)

User-facing command-line interface:

  • Interactive REPL with history
  • Direct query execution
  • Runtime backend selection
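Runtime backend selection usually means parsing a user-supplied name into an enum, then constructing the matching storage engine. The `Backend` enum and its accepted spellings below are hypothetical. This overview does not show the real kfdb CLI's flags or option names, and the sketch does not invent any.

```rust
use std::fmt;
use std::str::FromStr;

/// The three backends listed under kfdb-storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Memory,
    RocksDb,
    ScyllaDb,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnknownBackend(pub String);

impl fmt::Display for UnknownBackend {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown backend '{}' (expected memory, rocksdb, or scylladb)", self.0)
    }
}

impl FromStr for Backend {
    type Err = UnknownBackend;

    /// Case-insensitive, so "RocksDB" and "rocksdb" both parse.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "memory" => Ok(Backend::Memory),
            "rocksdb" => Ok(Backend::RocksDb),
            "scylladb" | "scylla" => Ok(Backend::ScyllaDb),
            _ => Err(UnknownBackend(s.to_string())),
        }
    }
}

impl Backend {
    /// Whether data survives a process restart: only Memory is volatile.
    pub fn is_persistent(self) -> bool {
        !matches!(self, Backend::Memory)
    }
}
```

Implementing `FromStr` lets any argument parser call `str::parse::<Backend>()`. The resulting `UnknownBackend` error carries a readable message listing the valid choices.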

Design Principles

  1. Modularity: Each crate has a single responsibility
  2. Performance: Microsecond latencies for core operations
  3. Scalability: Linear scaling as cluster nodes are added (ScyllaDB)
  4. Flexibility: Runtime backend selection
  5. Testing: Comprehensive test coverage (264 tests)

Data Flow

Query Execution

User Query (KQL)
↓
Parser (pest) → AST
↓
Query Executor → Match patterns
↓
Graph Traversal → Find nodes/edges
↓
Filter → Apply WHERE clause
↓
Project → SELECT columns
↓
Sort & Paginate → ORDER BY, LIMIT
↓
Result Set
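The stages after matching (filter, project, sort, and paginate) can be sketched as a pipeline over the rows produced by graph traversal. Everything here is hypothetical. The `Cell` and `Row` types, the closure standing in for a WHERE clause, and the index-based ORDER BY are placeholders for what the real executor derives from the KQL AST. Parsing and pattern matching are out of scope.

```rust
use std::collections::HashMap;

/// Simplified cell value; derived Ord compares Null < Int < Text, then contents.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Cell {
    Null,
    Int(i64),
    Text(String),
}

/// A matched row: column name -> value.
pub type Row = HashMap<String, Cell>;

/// Filter -> Project -> Sort -> Paginate, in the order shown in the flow.
pub fn execute<P: Fn(&Row) -> bool>(
    matched: Vec<Row>,
    where_clause: P,
    select: &[&str],
    order_by: usize, // index into `select`
    descending: bool,
    limit: usize,
) -> Vec<Vec<Cell>> {
    // Filter: keep rows that satisfy the WHERE clause.
    let filtered = matched.into_iter().filter(|row| where_clause(row));
    // Project: keep the SELECT columns in SELECT order (missing -> Null).
    let mut projected: Vec<Vec<Cell>> = filtered
        .map(|row| {
            select
                .iter()
                .map(|col| row.get(*col).cloned().unwrap_or(Cell::Null))
                .collect::<Vec<Cell>>()
        })
        .collect();
    // Sort: ORDER BY one projected column; sort_by is stable for ties.
    projected.sort_by(|a, b| {
        let ord = a[order_by].cmp(&b[order_by]);
        if descending { ord.reverse() } else { ord }
    });
    // Paginate: LIMIT.
    projected.truncate(limit);
    projected
}
```

Because projection runs before sorting here, the ORDER BY column must be one of the selected columns. An executor that allows sorting on unselected columns would need to carry them through projection.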

Next Steps