Database Automation with LLMs

KnowledgeFlowDB provides a powerful automation system that lets you trigger LLM operations automatically when graph events occur. This enables intelligent, self-maintaining knowledge graphs that can:

  • 🤖 Auto-summarize documents as they're added
  • 🔍 Auto-embed content for semantic search
  • 🏷️ Auto-classify and tag nodes
  • 🔗 Auto-extract entities and relationships
  • ⚙️ Run custom operations with your own prompts

How It Works

The automation system follows a simple trigger → action pattern:

  1. Trigger: An event occurs in your graph (node created, updated, etc.)
  2. Filter: Check if the event matches your rule's criteria
  3. Execute: Run an LLM operation on the matching data
  4. Output: Store the result back in the graph
```mermaid
graph LR
  A[Graph Event] --> B{Matches Rule?}
  B -->|Yes| C[Extract Input]
  C --> D[Call LLM]
  D --> E[Store Output]
  B -->|No| F[Ignore]
```
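
The same flow can be sketched in a few lines of code. This is an illustration only; the event shape, rule fields, and helper names below are assumptions, not the actual KnowledgeFlowDB engine:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real provider call (e.g. Gemini); returns canned text.
    return f"<llm output for: {prompt[:40]}...>"

def handle_event(event: dict, rules: list[dict]) -> None:
    for rule in rules:
        # 1. Trigger: the event type must match (e.g. "node_created").
        if event["type"] != rule["trigger"]:
            continue
        node = event["node"]
        # 2. Filter: required labels and properties must be present.
        if rule["labels"] and not set(rule["labels"]) & set(node["labels"]):
            continue
        if rule["has_property"] not in node["properties"]:
            continue
        # 3. Execute: run the LLM operation on the matching data.
        text = node["properties"][rule["has_property"]]
        result = call_llm(rule["prompt_template"].format(content=text))
        # 4. Output: store the result back in the graph (here: on the same node).
        node["properties"][rule["output_property"]] = result

rule = {
    "trigger": "node_created",
    "labels": ["Document"],
    "has_property": "content",
    "prompt_template": "Summarize the following text in 2-3 sentences:\n\n{content}",
    "output_property": "summary",
}
event = {
    "type": "node_created",
    "node": {"labels": ["Document"], "properties": {"content": "KnowledgeFlowDB is ..."}},
}
handle_event(event, [rule])
print(event["node"]["properties"]["summary"])
```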

Key Concepts

Triggers

Events that can activate automation rules:

  • node_created - A new node is added
  • node_updated - Node properties change
  • node_deleted - Node is removed
  • edge_created - New relationship created
  • edge_deleted - Relationship removed

Filters

Narrow down which events trigger your rule:

  • Labels: Only match nodes/edges with specific labels (e.g., ["Document", "Article"])
  • Properties: Only match when specific properties exist (e.g., has_property: "content")
  • Custom: Advanced filtering with JSON criteria

Operations

LLM tasks to perform:

  • summarize - Generate concise summaries
  • generate_embedding - Create vector embeddings
  • extract_entities - Pull out named entities
  • classify - Categorize content
  • custom - Run your own prompt template

Output Strategies

How to save LLM results:

  • update_property - Add/update a property on the same node
  • create_node - Create a new node with the result
  • create_edge - Create a relationship to another node
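
Putting these together: a single rule combines one trigger, optional filters, one LLM operation, and one output strategy. The sketch below shows a hypothetical classification rule; the field names follow the playground form labels and are assumptions, not the documented API schema (see the Automation API Reference).

```python
# Hypothetical rule anatomy -- field names are assumptions, not the documented schema.
classify_rule = {
    # Trigger: which graph event starts the rule
    "trigger_type": "node_created",
    # Filters: which nodes the event must involve
    "match_labels": ["Document"],
    "has_property": "content",
    # Operation: which LLM task to run, and how
    "operation": "classify",
    "provider": "google",
    "model": "gemini-2.5-flash-preview-09-2025",
    "prompt_template": "Classify this document as Technical, Business, or Research:\n\n{content}",
    "input_property": "content",
    # Output strategy: where the result goes
    "output_strategy": "update_property",
    "output_property": "category",
    "is_active": True,
}
```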

Supported LLM Providers

| Provider  | Models                                         | Use Case                       |
|-----------|------------------------------------------------|--------------------------------|
| Google    | gemini-2.5-flash-preview-09-2025 (recommended) | Fast, accurate, cost-effective |
| OpenAI    | TBD                                            | High-quality text generation   |
| Anthropic | TBD                                            | Agentic coding                 |

Best Practice

For production use, we recommend Gemini 2.5 Flash (gemini-2.5-flash-preview-09-2025):

  • ⚡ Fastest response times (under 500ms avg)
  • 💰 Most cost-effective ($0.075 per 1M tokens)
  • ✅ Validated in production on a 3-node ScyllaDB cluster

Interactive Playground

Try creating and managing automation rules right here in the docs! You can:

  • ✅ Use the example database to see how automation works
  • ✅ Connect to your own database (local or production)
  • ✅ Create rules from templates or build custom rules
  • ✅ Monitor executions in real-time

Connection Options

Option 1: Example Database (Default)

The playground connects to a production 3-node ScyllaDB cluster by default. This lets you:

  • See real automation rules in action
  • Experiment without setting up infrastructure
  • Learn by example with pre-configured rules

Just click "Create Rule" and start experimenting!

Option 2: Your Own Database

Click "Connect Your DB" to use your own KnowledgeFlowDB instance:

Local 3-Node Cluster:

API Endpoint: http://localhost:8080/api/v1
API Key: YOUR_API_KEY

Production Cluster:

API Endpoint: http://35.223.203.166/api/v1
API Key: YOUR_API_KEY
Info

Your API key is stored locally in your browser only. It's never sent to our documentation server.
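
Once you have an endpoint and key, a quick connectivity check from a script looks roughly like this. The X-API-Key header and the /automation/rules path are assumptions for illustration; the Automation API Reference has the exact authentication scheme and endpoints.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"   # or your production endpoint
API_KEY = "YOUR_API_KEY"

# List the automation rules configured on this instance.
resp = requests.get(
    f"{BASE_URL}/automation/rules",
    headers={"X-API-Key": API_KEY},   # assumed header name -- check the API reference
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```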

Quick Start: Your First Rule

Let's create a rule that auto-summarizes documents:

1. Set Trigger

  • Trigger Type: node_created
  • Match Labels: Document
  • Must Have Property: content

2. Configure LLM

  • Operation: summarize
  • Provider: google
  • Model: gemini-2.5-flash-preview-09-2025
  • Prompt Template: Summarize the following text in 2-3 sentences:\n\n{content}
  • Input Property: content

3. Set Output

  • Strategy: update_property
  • Output Property: summary

4. Activate

  • Is Active: ✅ Yes

That's it! Now whenever you create a node with label Document and a content property, it will automatically get a summary property added.
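
If you prefer to script this instead of using the playground form, the same rule could be created over the REST API roughly as shown below. The /automation/rules endpoint, the X-API-Key header, and the JSON field names mirror the playground form but are assumptions; check the Automation API Reference for the exact schema.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"   # or your production endpoint
API_KEY = "YOUR_API_KEY"

rule = {
    "trigger_type": "node_created",
    "match_labels": ["Document"],
    "has_property": "content",
    "operation": "summarize",
    "provider": "google",
    "model": "gemini-2.5-flash-preview-09-2025",
    "prompt_template": "Summarize the following text in 2-3 sentences:\n\n{content}",
    "input_property": "content",
    "output_strategy": "update_property",
    "output_property": "summary",
    "is_active": True,
}

resp = requests.post(f"{BASE_URL}/automation/rules", json=rule,
                     headers={"X-API-Key": API_KEY}, timeout=10)
resp.raise_for_status()
print(resp.json())   # the created rule, including its id
```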

Rule Templates

Use these pre-built templates as starting points:

Auto-summarize Documents

Automatically generate summaries for new documents using Gemini.

Best for: Blog posts, articles, documentation

Trigger: Node created with label Document or Article and property content

Output: Adds summary property with 2-3 sentence summary

Auto-embed Code Files

Generate embeddings for newly created code files for semantic search.

Best for: Code repositories, documentation

Trigger: Node created with label File and property content

Output: Adds embedding property with 1024-dim vector

Extract Entities

Extract named entities (people, places, organizations) from content.

Best for: News articles, research papers

Trigger: Node created with label Document and property content

Output: Creates new Entity nodes linked to the document

Classify Content

Automatically categorize documents into predefined categories.

Best for: Content management, organization

Trigger: Node created with label Document and property content

Output: Adds category property (Technical, Business, Research, etc.)

Best Practices

1. Start with Templates

Use the built-in templates and customize them for your needs. This ensures you start with validated configurations.

2. Test with Inactive Rules First

Create rules with is_active: false, test manually, then activate once validated.
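
Reusing the rule payload, endpoint, and header from the Quick Start sketch above (all assumed values), that workflow looks roughly like this:

```python
# Continuation of the Quick Start sketch: `rule`, BASE_URL, and API_KEY as above.
rule["is_active"] = False   # create the rule disabled so nothing fires yet

created = requests.post(f"{BASE_URL}/automation/rules", json=rule,
                        headers={"X-API-Key": API_KEY}, timeout=10).json()

# ... create a test Document node and confirm the generated output looks right ...

# Once validated, activate the rule (assuming a PATCH on the rule resource).
requests.patch(f"{BASE_URL}/automation/rules/{created['id']}",
               json={"is_active": True},
               headers={"X-API-Key": API_KEY}, timeout=10)
```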

3. Monitor Token Usage

Check the executions table to track:

  • Token consumption
  • Costs per execution
  • Success/failure rates
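
For example, a short script could total token usage and cost across executions. The /automation/executions path and the field names (tokens_used, cost_usd, status) are assumptions; check the Automation API Reference for the real schema.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}   # assumed header name

executions = requests.get(f"{BASE_URL}/automation/executions",
                          headers=HEADERS, timeout=10).json()

total_tokens = sum(e.get("tokens_used", 0) for e in executions)
total_cost = sum(e.get("cost_usd", 0.0) for e in executions)
failures = [e for e in executions if e.get("status") != "success"]

print(f"tokens: {total_tokens}, cost: ${total_cost:.4f}, "
      f"failures: {len(failures)}/{len(executions)}")
```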

4. Use Specific Filters

Narrow down triggers with labels and property filters to avoid unnecessary LLM calls:

✅ Good:

```json
{
  "labels": ["Document"],
  "has_property": "content"
}
```

❌ Too broad:

```json
{
  "labels": []   // Matches ALL nodes!
}
```

5. Set Reasonable Limits

Configure max_tokens based on your operation:

  • Summaries: 100-200 tokens
  • Embeddings: No limit needed
  • Entity extraction: 300-500 tokens
  • Classification: 10-50 tokens
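
As a rough illustration, the limits above could be applied per operation like this; the max_tokens field name and config shape are assumptions, not the documented schema.

```python
# Assumed placement of max_tokens -- for illustration only.
MAX_TOKENS_BY_OPERATION = {
    "summarize": 200,          # 2-3 sentence summaries
    "extract_entities": 500,   # lists of people, places, organizations
    "classify": 50,            # a single category label
    # generate_embedding: no limit needed
}

rule_llm_settings = {
    "provider": "google",
    "model": "gemini-2.5-flash-preview-09-2025",
    "max_tokens": MAX_TOKENS_BY_OPERATION["summarize"],
}
```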

6. Handle Failures Gracefully

Monitor execution logs for failures and adjust:

  • Prompt templates that are too vague
  • Token limits that are too low
  • Missing properties in input data

API Reference

See the Automation API Reference for complete endpoint documentation.

Next Steps

  • 📖 Read the Automation API Reference
  • 🔧 Set up a local 3-node cluster (see deployment docs)
  • 🚀 Deploy to production (see deployment docs)

Need Help?