Database Automation with LLMs

KnowledgeFlowDB provides a powerful automation system that lets you trigger LLM operations automatically when graph events occur. This enables intelligent, self-maintaining knowledge graphs that can:

  • 🤖 Auto-summarize documents as they're added
  • 🔍 Auto-embed content for semantic search
  • 🏷️ Auto-classify and tag nodes
  • 🔗 Auto-extract entities and relationships
  • ⚙️ Run custom operations with your own prompts

How It Works

The automation system follows a simple trigger → action pattern:

  1. Trigger: An event occurs in your graph (node created, updated, etc.)
  2. Filter: Check if the event matches your rule's criteria
  3. Execute: Run an LLM operation on the matching data
  4. Output: Store the result back in the graph
```mermaid
graph LR
  A[Graph Event] --> B{Matches Rule?}
  B -->|Yes| C[Extract Input]
  C --> D[Call LLM]
  D --> E[Store Output]
  B -->|No| F[Ignore]
```
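
The same flow can be sketched in a few lines of code. This is an illustration only; the event shape, rule fields, and helper names below are assumptions, not the actual KnowledgeFlowDB engine:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real provider call (e.g. Gemini); returns canned text.
    return f"<llm output for: {prompt[:40]}...>"

def handle_event(event: dict, rules: list[dict]) -> None:
    for rule in rules:
        # 1. Trigger: the event type must match (e.g. "node_created").
        if event["type"] != rule["trigger"]:
            continue
        node = event["node"]
        # 2. Filter: required labels and properties must be present.
        if rule["labels"] and not set(rule["labels"]) & set(node["labels"]):
            continue
        if rule["has_property"] not in node["properties"]:
            continue
        # 3. Execute: run the LLM operation on the matching data.
        text = node["properties"][rule["has_property"]]
        result = call_llm(rule["prompt_template"].format(content=text))
        # 4. Output: store the result back in the graph (here: on the same node).
        node["properties"][rule["output_property"]] = result

rule = {
    "trigger": "node_created",
    "labels": ["Document"],
    "has_property": "content",
    "prompt_template": "Summarize the following text in 2-3 sentences:\n\n{content}",
    "output_property": "summary",
}
event = {
    "type": "node_created",
    "node": {"labels": ["Document"], "properties": {"content": "KnowledgeFlowDB is ..."}},
}
handle_event(event, [rule])
print(event["node"]["properties"]["summary"])
```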

Key Concepts

Triggers

Events that can activate automation rules:

  • node_created - A new node is added
  • node_updated - Node properties change
  • node_deleted - Node is removed
  • edge_created - New relationship created
  • edge_deleted - Relationship removed

Filters

Narrow down which events trigger your rule:

  • Labels: Only match nodes/edges with specific labels (e.g., ["Document", "Article"])
  • Properties: Only match when specific properties exist (e.g., has_property: "content")
  • Custom: Advanced filtering with JSON criteria

Operations

LLM tasks to perform:

  • summarize - Generate concise summaries
  • generate_embedding - Create vector embeddings
  • extract_entities - Pull out named entities
  • classify - Categorize content
  • custom - Run your own prompt template

Output Strategies

How to save LLM results:

  • update_property - Add/update a property on the same node
  • create_node - Create a new node with the result
  • create_edge - Create a relationship to another node
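
Putting these together: a single rule combines one trigger, optional filters, one LLM operation, and one output strategy. The sketch below shows a hypothetical classification rule; the field names follow the playground form labels and are assumptions, not the documented API schema (see the Automation API Reference).

```python
# Hypothetical rule anatomy -- field names are assumptions, not the documented schema.
classify_rule = {
    # Trigger: which graph event starts the rule
    "trigger_type": "node_created",
    # Filters: which nodes the event must involve
    "match_labels": ["Document"],
    "has_property": "content",
    # Operation: which LLM task to run, and how
    "operation": "classify",
    "provider": "google",
    "model": "gemini-2.5-flash-preview-09-2025",
    "prompt_template": "Classify this document as Technical, Business, or Research:\n\n{content}",
    "input_property": "content",
    # Output strategy: where the result goes
    "output_strategy": "update_property",
    "output_property": "category",
    "is_active": True,
}
```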

Supported LLM Providers

| Provider  | Models                                         | Use Case                       |
|-----------|------------------------------------------------|--------------------------------|
| Google    | gemini-2.5-flash-preview-09-2025 (recommended) | Fast, accurate, cost-effective |
| OpenAI    | TBD                                            | High-quality text generation   |
| Anthropic | TBD                                            | Agentic coding                 |

Best Practice

For production use, we recommend Gemini 2.5 Flash (gemini-2.5-flash-preview-09-2025):

  • ⚡ Fastest response times (under 500ms avg)
  • 💰 Most cost-effective ($0.075 per 1M tokens)
  • ✅ Validated in production on a 3-node ScyllaDB cluster

Interactive Playground

Try creating and managing automation rules right here in the docs! You can:

  • ✅ Use the example database to see how automation works
  • ✅ Connect to your own database (local or production)
  • ✅ Create rules from templates or build custom rules
  • ✅ Monitor executions in real-time

Connection Options

Option 1: Example Database (Default)

The playground connects to a production 3-node ScyllaDB cluster by default. This lets you:

  • See real automation rules in action
  • Experiment without setting up infrastructure
  • Learn by example with pre-configured rules

Just click "Create Rule" and start experimenting!

Option 2: Your Own Database

Click "Connect Your DB" to use your own KnowledgeFlowDB instance:

Local 3-Node Cluster:

API Endpoint: http://localhost:8080/api/v1
API Key: YOUR_API_KEY

Production Cluster:

API Endpoint: http://35.223.203.166/api/v1
API Key: YOUR_API_KEY
Info

Your API key is stored locally in your browser only. It's never sent to our documentation server.
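
Once you have an endpoint and key, a quick connectivity check from a script looks roughly like this. The X-API-Key header and the /automation/rules path are assumptions for illustration; the Automation API Reference has the exact authentication scheme and endpoints.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"   # or your production endpoint
API_KEY = "YOUR_API_KEY"

# List the automation rules configured on this instance.
resp = requests.get(
    f"{BASE_URL}/automation/rules",
    headers={"X-API-Key": API_KEY},   # assumed header name -- check the API reference
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```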

Quick Start: Your First Rule

Let's create a rule that auto-summarizes documents:

1. Set Trigger

  • Trigger Type: node_created
  • Match Labels: Document
  • Must Have Property: content

2. Configure LLM

  • Operation: summarize
  • Provider: google
  • Model: gemini-2.5-flash-preview-09-2025
  • Prompt Template: Summarize the following text in 2-3 sentences:\n\n{content}
  • Input Property: content

3. Set Output

  • Strategy: update_property
  • Output Property: summary

4. Activate

  • Is Active: ✅ Yes

That's it! Now whenever you create a node with label Document and a content property, it will automatically get a summary property added.
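
If you prefer to script this instead of using the playground form, the same rule could be created over the REST API roughly as shown below. The /automation/rules endpoint, the X-API-Key header, and the JSON field names mirror the playground form but are assumptions; check the Automation API Reference for the exact schema.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"   # or your production endpoint
API_KEY = "YOUR_API_KEY"

rule = {
    "trigger_type": "node_created",
    "match_labels": ["Document"],
    "has_property": "content",
    "operation": "summarize",
    "provider": "google",
    "model": "gemini-2.5-flash-preview-09-2025",
    "prompt_template": "Summarize the following text in 2-3 sentences:\n\n{content}",
    "input_property": "content",
    "output_strategy": "update_property",
    "output_property": "summary",
    "is_active": True,
}

resp = requests.post(f"{BASE_URL}/automation/rules", json=rule,
                     headers={"X-API-Key": API_KEY}, timeout=10)
resp.raise_for_status()
print(resp.json())   # the created rule, including its id
```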

Rule Templates

Use these pre-built templates as starting points:

Auto-summarize Documents

Automatically generate summaries for new documents using Gemini.

Best for: Blog posts, articles, documentation

Trigger: Node created with label Document or Article and property content

Output: Adds summary property with 2-3 sentence summary

Auto-embed Code Files

Generate embeddings for newly created code files for semantic search.

Best for: Code repositories, documentation

Trigger: Node created with label File and property content

Output: Adds embedding property with 1024-dim vector

Extract Entities

Extract named entities (people, places, organizations) from content.

Best for: News articles, research papers

Trigger: Node created with label Document and property content

Output: Creates new Entity nodes linked to the document

Classify Content

Automatically categorize documents into predefined categories.

Best for: Content management, organization

Trigger: Node created with label Document and property content

Output: Adds category property (Technical, Business, Research, etc.)

Best Practices

1. Start with Templates

Use the built-in templates and customize them for your needs. This ensures you start with validated configurations.

2. Test with Inactive Rules First

Create rules with is_active: false, test manually, then activate once validated.
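
Reusing the rule payload, endpoint, and header from the Quick Start sketch above (all assumed values), that workflow looks roughly like this:

```python
# Continuation of the Quick Start sketch: `rule`, BASE_URL, and API_KEY as above.
rule["is_active"] = False   # create the rule disabled so nothing fires yet

created = requests.post(f"{BASE_URL}/automation/rules", json=rule,
                        headers={"X-API-Key": API_KEY}, timeout=10).json()

# ... create a test Document node and confirm the generated output looks right ...

# Once validated, activate the rule (assuming a PATCH on the rule resource).
requests.patch(f"{BASE_URL}/automation/rules/{created['id']}",
               json={"is_active": True},
               headers={"X-API-Key": API_KEY}, timeout=10)
```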

3. Monitor Token Usage

Check the executions table to track:

  • Token consumption
  • Costs per execution
  • Success/failure rates
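
For example, a short script could total token usage and cost across executions. The /automation/executions path and the field names (tokens_used, cost_usd, status) are assumptions; check the Automation API Reference for the real schema.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}   # assumed header name

executions = requests.get(f"{BASE_URL}/automation/executions",
                          headers=HEADERS, timeout=10).json()

total_tokens = sum(e.get("tokens_used", 0) for e in executions)
total_cost = sum(e.get("cost_usd", 0.0) for e in executions)
failures = [e for e in executions if e.get("status") != "success"]

print(f"tokens: {total_tokens}, cost: ${total_cost:.4f}, "
      f"failures: {len(failures)}/{len(executions)}")
```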

4. Use Specific Filters

Narrow down triggers with labels and property filters to avoid unnecessary LLM calls:

✅ Good:

```json
{
  "labels": ["Document"],
  "has_property": "content"
}
```

❌ Too broad:

```json
{
  "labels": []   // Matches ALL nodes!
}
```

5. Set Reasonable Limits

Configure max_tokens based on your operation:

  • Summaries: 100-200 tokens
  • Embeddings: No limit needed
  • Entity extraction: 300-500 tokens
  • Classification: 10-50 tokens
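
As a rough illustration, the limits above could be applied per operation like this; the max_tokens field name and config shape are assumptions, not the documented schema.

```python
# Assumed placement of max_tokens -- for illustration only.
MAX_TOKENS_BY_OPERATION = {
    "summarize": 200,          # 2-3 sentence summaries
    "extract_entities": 500,   # lists of people, places, organizations
    "classify": 50,            # a single category label
    # generate_embedding: no limit needed
}

rule_llm_settings = {
    "provider": "google",
    "model": "gemini-2.5-flash-preview-09-2025",
    "max_tokens": MAX_TOKENS_BY_OPERATION["summarize"],
}
```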

6. Handle Failures Gracefully

Monitor execution logs for failures and adjust:

  • Prompt templates that are too vague
  • Token limits that are too low
  • Missing properties in input data

API Reference

See the Automation API Reference for complete endpoint documentation.

Next Steps

  • 📖 Read the Automation API Reference
  • 🔧 Set up a local 3-node cluster (see deployment docs)
  • 🚀 Deploy to production (see deployment docs)

Need Help?