Get the FREE Ultimate OpenClaw Setup Guide →

data-ontologist

npx machina-cli add skill JasonWarrenUK/claude-code-config/data-ontologist --openclaw
Files (1)
SKILL.md
19.4 KB

Polyglot Persistence Architecture

Architectural guidance for using multiple database technologies together, with emphasis on PostgreSQL/Supabase (relational), Neo4j (graph), and MongoDB (document). Demonstrates when to use each database type and how to integrate them effectively.


When This Skill Applies

Use this skill when:

  • Designing data architecture for new projects
  • Choosing between relational, graph, and document databases
  • Integrating multiple database types
  • Schema design decisions
  • Query optimization across databases
  • Migration strategies
  • Questions about when to use which database paradigm

Core Principle

Start with the graph. Optimise from there.

Most real-world domains are fundamentally about relationships. The graph is the truest representation of how entities connect. Start by thinking in nodes and edges — then decide where to persist based on access patterns and consistency needs.

Right database for right data concern

Don't force all data into one database type. Use:

  • Relational (PostgreSQL/Supabase) - Structured data, transactions, strong consistency
  • Graph (Neo4j) - Relationships as primary concern, traversal queries
  • Document (MongoDB) - Semi-structured data, flexible schemas, nested documents

Graph-First Modelling Process

Before choosing databases, model the domain as a graph:

  1. Identify nodes — What are the entities? (Users, Courses, Organisations, Products)
  2. Identify edges — How do they connect? (ENROLLED_IN, REPORTS_TO, PURCHASED)
  3. Annotate edges — Do relationships carry data? (role, since, quantity)
  4. Spot patterns — Trees? DAGs? Social graphs? Bipartite structures?
  5. Then persist — Given the graph, which parts need relational guarantees, which need traversal, which need flexible schemas?
// Step 1-3: Model the domain as a graph first
(:User)-[:MEMBER_OF {role: 'admin', since: date}]->(:Organisation)
(:User)-[:ENROLLED_IN {status: 'active'}]->(:Course)
(:Course)-[:REQUIRES]->(:Course)
(:User)-[:COMPLETED {score: 0.85}]->(:Module)
(:Module)-[:BELONGS_TO]->(:Course)

Then decide:

  • Users and Organisations → PostgreSQL (transactional, auth, billing)
  • MEMBER_OF, ENROLLED_IN, REQUIRES → Neo4j (traversal, recommendations, paths)
  • Course content, Module materials → MongoDB (flexible nested content)

When to Use Relational (PostgreSQL/Supabase)

Strong Fit

Transactional Data:

  • User accounts and authentication
  • Order processing
  • Financial records
  • Inventory management

Structured Records:

  • Clear schema with defined fields
  • Data fits naturally into tables
  • ACID guarantees required
  • Standard CRUD operations
  • Strong data integrity constraints

Examples:

-- Users table
CREATE TABLE users (
  id UUID PRIMARY KEY,
  email TEXT UNIQUE NOT NULL,
  name TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Orders table
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  total DECIMAL(10,2),
  status TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

Weak Fit

Highly Connected Data:

  • Social networks (many-to-many relationships)
  • Recommendation engines
  • Organizational hierarchies with flexible depth

Reason: Joins become expensive, recursive queries complex.

Rapidly Evolving Schemas:

  • Frequently adding new fields
  • Different record types need different fields
  • Exploratory data modeling

Reason: Migrations expensive, rigid structure.


When to Use Graph (Neo4j)

Strong Fit

Relationships as Primary Concern:

  • Social graphs (followers, friends, connections)
  • Recommendation systems (collaborative filtering)
  • Knowledge graphs
  • Access control with inheritance
  • Dependencies and prerequisites

Variable-Depth Traversals:

  • "Find all users within 3 degrees"
  • "Shortest path between entities"
  • "All ancestors in org chart"
  • "Courses needed before advanced topic"

Examples:

// Social connections
(:User)-[:FOLLOWS]->(:User)
(:User)-[:BLOCKED]->(:User)

// Learning paths
(:Course)-[:REQUIRES]->(:Course)
(:User)-[:COMPLETED]->(:Course)

// Organizational structure
(:Person)-[:REPORTS_TO]->(:Person)
(:Person)-[:MEMBER_OF]->(:Team)

Weak Fit

Simple Lookups:

  • User by email
  • Order by ID
  • Product details

Reason: Relational databases excel at indexed lookups.

Large Aggregations:

  • Sum all orders this month
  • Count users by region
  • Analytics dashboards

Reason: SQL aggregations and window functions more powerful.


When to Use Document (MongoDB)

Strong Fit

Semi-Structured Data:

  • Content management systems (blog posts, articles)
  • Product catalogs with varying attributes
  • User-generated content
  • Configuration data
  • API responses that need storage

Nested/Embedded Data:

  • Comments within posts
  • Order items within orders
  • Addresses within user profiles
  • Metadata with varying fields

Flexible Schemas:

  • Rapid prototyping
  • Evolving data models
  • Different document types in same collection
  • Optional fields vary by record

Examples:

// Blog post with embedded comments
{
  _id: ObjectId("..."),
  title: "Getting Started with SvelteKit",
  slug: "getting-started-sveltekit",
  author: {
    id: "user-123",
    name: "Alice"
  },
  content: "...",
  tags: ["svelte", "javascript", "tutorial"],
  comments: [
    {
      id: "comment-1",
      userId: "user-456",
      text: "Great article!",
      createdAt: ISODate("2024-01-15")
    }
  ],
  metadata: {
    views: 1250,
    readingTime: "5 min"
  },
  publishedAt: ISODate("2024-01-10")
}

// Product with varying attributes
{
  _id: ObjectId("..."),
  name: "Laptop",
  category: "electronics",
  price: 999.99,
  specs: {
    cpu: "Intel i7",
    ram: "16GB",
    storage: "512GB SSD",
    screen: "15.6 inch"
  }
}

// Different product type, different fields
{
  _id: ObjectId("..."),
  name: "T-Shirt",
  category: "clothing",
  price: 29.99,
  sizes: ["S", "M", "L", "XL"],
  colors: ["black", "white", "blue"],
  material: "100% cotton"
}

Weak Fit

Complex Transactions:

  • Multi-step financial operations
  • Strong ACID guarantees across documents
  • Complex foreign key relationships

Reason: Relational databases better at multi-document transactions.

Relationship-Heavy Data:

  • Social networks
  • Graph traversals
  • "Friends of friends" queries

Reason: Graph databases handle this natively.

Highly Normalized Data:

  • No duplication tolerance
  • Frequent joins needed
  • Strong referential integrity

Reason: Relational databases enforce this better.


Decision Framework

Question 1: What's the primary data concern?

RELATIONSHIPS → Neo4j (Graph)

  • Social connections
  • Recommendations
  • Dependency trees
  • Path finding

STRUCTURED ENTITIES → PostgreSQL (Relational)

  • User accounts
  • Financial transactions
  • Inventory
  • Orders

DOCUMENTS/CONTENT → MongoDB (Document)

  • Blog posts
  • Product catalogs
  • CMS content
  • API data storage

Question 2: How stable is your schema?

VERY STABLE → PostgreSQL

  • Well-defined entities
  • Clear field types
  • Rare schema changes
  • Strong typing needed

EVOLVING → MongoDB

  • Prototyping phase
  • Frequently adding fields
  • Different record structures
  • Flexible modeling

SCHEMA-OPTIONAL → Neo4j

  • Relationships more important than structure
  • Dynamic properties
  • Graph structure evolves

Question 3: How is data accessed?

BY KEY/ID → PostgreSQL or MongoDB

  • User by email
  • Product by SKU
  • Order by ID

BY TRAVERSAL → Neo4j

  • Friends of friends
  • Shortest path
  • Recommendations

BY CONTENT/QUERY → MongoDB

  • Full-text search
  • Filtering nested documents
  • Flexible queries on varying fields

Question 4: Do you need strong consistency?

ABSOLUTE → PostgreSQL

  • Financial transactions
  • ACID guarantees
  • Multi-step operations

EVENTUAL OKAY → MongoDB or Neo4j

  • Content updates
  • Social interactions
  • Non-critical data

Question 5: Is data naturally nested?

YES → MongoDB

  • Posts with comments
  • Orders with line items
  • Documents with metadata

NO → PostgreSQL

  • Flat entities
  • Many-to-many relationships
  • Normalized structure

Real-World Examples

Example 1: Social Application (WorkWise)

Supabase (PostgreSQL):

  • User authentication and profiles
  • Organization/company records
  • Subscription billing
  • Audit logs

Neo4j:

  • Social connections (followers, following)
  • Content interactions (likes, shares)
  • Recommendation engine
  • Activity feed generation

MongoDB:

  • User-generated posts/content
  • Comments and nested discussions
  • Rich media metadata
  • Activity logs with flexible structure

Why All Three?:

  • Auth needs ACID (Supabase)
  • Social graph needs traversal (Neo4j)
  • Posts need flexible schema (MongoDB)
  • User lookups fast in PG
  • "People you may know" fast in Neo4j
  • Content queries flexible in MongoDB

Example 2: Learning Platform (Rhea)

Supabase (PostgreSQL):

  • User accounts and enrollment
  • Payment transactions
  • Progress tracking (completion %)
  • Subscriptions

Neo4j:

  • Course prerequisites and dependencies
  • Learning path recommendations
  • Skill relationships
  • "What should I learn next?" queries

MongoDB:

  • Course content (lessons, modules)
  • Rich lesson materials (videos, exercises, notes)
  • Student submissions and feedback
  • Curriculum templates

Why All Three?:

  • Enrollment is transactional (Supabase)
  • Prerequisites are graph traversal (Neo4j)
  • Course content is nested documents (MongoDB)
  • Payment requires ACID (PG)
  • Learning paths require traversal (Neo4j)
  • Lessons have varying structures (MongoDB)

Example 3: E-commerce Platform

Supabase (PostgreSQL):

  • User accounts
  • Order transactions
  • Inventory counts
  • Payment records

Neo4j:

  • Product recommendations
  • "Customers who bought X also bought Y"
  • Similar products

MongoDB:

  • Product catalog with varying attributes
  • User reviews and ratings
  • Shopping cart state
  • Product images and metadata

Why All Three?:

  • Orders need transactions (Supabase)
  • Recommendations need graph (Neo4j)
  • Products have varying specs (MongoDB)
  • Inventory atomic updates (PG)
  • "Similar items" fast in graph
  • Product attributes flexible in documents

Example 4: Content Platform (CMS)

Supabase (PostgreSQL):

  • User authentication
  • User roles and permissions
  • Subscription management

MongoDB:

  • Articles, blog posts, pages
  • Media library
  • Draft versions
  • Comments and nested discussions
  • SEO metadata

Neo4j (Optional):

  • Content relationships
  • Tag networks
  • Content recommendations

Why This Mix?:

  • Users need auth (Supabase)
  • Content varies by type (MongoDB)
  • Articles have nested comments (MongoDB)
  • Permissions are relational (PG)
  • Tags can be graphed (Neo4j optional)

Integration Patterns

Pattern 1: Shared Primary Keys

Use same IDs across all databases:

const userId = generateId();

// Supabase - Auth and profile
await supabase.from('users').insert({
  id: userId,
  email,
  name
});

// Neo4j - Social graph node
await neo4j.run(`
  CREATE (u:User {id: $userId, name: $name})
`, { userId, name });

// MongoDB - User preferences
await mongo.collection('user_preferences').insertOne({
  _id: userId,
  theme: 'dark',
  notifications: {
    email: true,
    push: false
  }
});

Pattern 2: Reference by ID

Store references, fetch as needed:

// MongoDB - Blog post
{
  _id: ObjectId("..."),
  title: "My Post",
  authorId: "user-123",  // Reference to PostgreSQL user
  content: "...",
  tags: ["javascript", "svelte"]
}

// Query pattern
const post = await mongo.collection('posts').findOne({ _id });
const author = await supabase
  .from('users')
  .select('*')
  .eq('id', post.authorId)
  .single();

Pattern 3: Embed vs Reference Decision

Embed when:

  • Data accessed together
  • One-to-few relationship
  • Child data not shared
// Good: Embed comments in post
{
  title: "My Post",
  comments: [
    { text: "Great!", userId: "user-123" }
  ]
}

Reference when:

  • Data accessed independently
  • One-to-many or many-to-many
  • Child data shared across parents
// Good: Reference author
{
  title: "My Post",
  authorId: "user-123"  // Author data in PostgreSQL
}

Pattern 4: Event-Driven Sync

Keep databases in sync via events:

// User created in Supabase
supabase.on('INSERT', 'users', async (payload) => {
  const user = payload.record;
  
  // Create in Neo4j
  await createUserNode(user);
  
  // Create preferences in MongoDB
  await mongo.collection('user_preferences').insertOne({
    _id: user.id,
    theme: 'light',
    notifications: {}
  });
});

Pattern 5: Aggregate from Multiple Sources

async function getUserDashboard(userId) {
  // PostgreSQL - Account info
  const account = await supabase
    .from('users')
    .select('*')
    .eq('id', userId)
    .single();
  
  // Neo4j - Social metrics
  const social = await neo4j.run(`
    MATCH (u:User {id: $userId})
    OPTIONAL MATCH (u)-[:FOLLOWS]->(following)
    OPTIONAL MATCH (follower)-[:FOLLOWS]->(u)
    RETURN 
      count(DISTINCT following) as followingCount,
      count(DISTINCT follower) as followerCount
  `, { userId });
  
  // MongoDB - Recent content
  const posts = await mongo
    .collection('posts')
    .find({ authorId: userId })
    .sort({ createdAt: -1 })
    .limit(5)
    .toArray();
  
  return {
    account: account.data,
    social: social.records[0],
    recentPosts: posts
  };
}

Schema Design Patterns

Relational Schema (Supabase)

Normalized Structure:

-- Users table
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  email TEXT UNIQUE NOT NULL,
  name TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Clear foreign keys
CREATE TABLE orders (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id),
  total DECIMAL(10,2)
);

Graph Schema (Neo4j)

Labels and Relationships:

// Node labels
(:User)
(:Organization)
(:Course)

// Typed relationships
(:User)-[:MEMBER_OF {role}]->(:Organization)
(:User)-[:FOLLOWS {since}]->(:User)
(:Course)-[:REQUIRES]->(:Course)

Document Schema (MongoDB)

Flexible Structure:

// Embedded approach (1-to-few)
{
  _id: ObjectId("..."),
  userId: "user-123",
  title: "My Blog Post",
  content: "...",
  comments: [  // Embedded
    {
      id: "comment-1",
      userId: "user-456",
      text: "Great post!",
      createdAt: ISODate("2024-01-15")
    }
  ],
  tags: ["tutorial", "javascript"],
  metadata: {
    views: 150,
    likes: 23
  }
}

// Reference approach (1-to-many)
{
  _id: ObjectId("..."),
  title: "E-commerce Order",
  userId: "user-123",  // Reference
  items: [
    { productId: "prod-789", quantity: 2 },  // Reference
    { productId: "prod-456", quantity: 1 }
  ],
  total: 149.99
}

Query Optimization

PostgreSQL Optimization

Indexes:

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

Neo4j Optimization

Constraints and Indexes:

CREATE CONSTRAINT user_id_unique
FOR (u:User) REQUIRE u.id IS UNIQUE;

CREATE INDEX user_email
FOR (u:User) ON (u.email);

MongoDB Optimization

Indexes:

// Single field
db.posts.createIndex({ authorId: 1 });

// Compound index
db.posts.createIndex({ authorId: 1, createdAt: -1 });

// Text search
db.posts.createIndex({ title: "text", content: "text" });

// Embedded field
db.posts.createIndex({ "metadata.views": -1 });

Query Patterns:

// Efficient: Uses index
db.posts.find({ authorId: userId }).sort({ createdAt: -1 });

// Inefficient: Full collection scan
db.posts.find({ "comments.text": /keyword/ });

// Better: Index comment text separately or use aggregation

Anti-Patterns

Don't: Use Document DB for Transactions

// ✗ Bad: Complex multi-document transaction in MongoDB
session.startTransaction();
await orders.insertOne({ userId, total });
await inventory.updateOne({ productId }, { $inc: { stock: -1 } });
await session.commitTransaction();

Why: PostgreSQL designed for this, ACID guarantees stronger.

Don't: Embed Everything

// ✗ Bad: Embedding user data in every post
{
  title: "My Post",
  author: {
    id: "user-123",
    name: "Alice",
    email: "alice@example.com",
    bio: "...",
    avatar: "..."  // Duplicated everywhere!
  }
}

Better: Store author ID, fetch user data separately.

Don't: Use Graph for Simple Lookups

// ✗ Bad: Using Neo4j for key-value lookup
MATCH (u:User {email: $email})
RETURN u;

Why: PostgreSQL or MongoDB faster for indexed lookups.

Don't: Force Relational Patterns into Documents

// ✗ Bad: Normalized MongoDB (defeats the purpose)
// users collection
{ _id: "user-123", name: "Alice" }

// posts collection
{ _id: "post-456", authorId: "user-123", title: "..." }

// comments collection
{ _id: "comment-789", postId: "post-456", text: "..." }

Why: If normalizing this much, use PostgreSQL instead.


Migration Strategies

Starting Point

Begin with PostgreSQL:

  • Authentication
  • Core transactional data
  • Well-understood entities

Add MongoDB When:

  • Content becomes varied
  • Schema evolution frequent
  • Nested data structures emerge

Add Neo4j When:

  • Relationships become complex
  • Traversal queries needed
  • Recommendations required

Data Migration Examples

PostgreSQL → MongoDB:

// Export from PG
const posts = await supabase.from('posts').select('*');

// Transform and insert to MongoDB
await mongo.collection('posts').insertMany(
  posts.map(post => ({
    _id: post.id,
    ...post,
    metadata: {
      views: post.view_count,
      likes: post.like_count
    }
  }))
);

MongoDB → Neo4j (relationships):

// Get follows from MongoDB
const follows = await mongo.collection('follows').find().toArray();

// Create relationships in Neo4j
await neo4j.run(`
  UNWIND $follows AS follow
  MATCH (follower:User {id: follow.followerId})
  MATCH (followed:User {id: follow.followedId})
  CREATE (follower)-[:FOLLOWS {since: follow.createdAt}]->(followed)
`, { follows });

Portfolio Evidence

KSBs Demonstrated:

  • K2: All Stages of Software Development Lifecycle (architecture decisions)
  • K3: Roles and Responsibilities (database selection justification)
  • S1: Analyse Requirements (choosing right tool for problem)
  • S6: Design and Implement Database Systems (polyglot approach)

How to Document:

  • Architecture Decision Records (ADRs) explaining database choices
  • Diagrams showing which data lives where
  • Performance comparisons (before/after changes)
  • Migration scripts and sync strategies
  • Trade-off analysis documentation

Success Criteria

Architecture is successful when:

  • Each database handles what it does best
  • Queries are fast (minimal cross-database operations)
  • Data consistency maintained appropriately
  • Clear reasoning for database placement
  • Can scale databases independently
  • Schema evolution manageable
  • Team understands the architecture

Source

git clone https://github.com/JasonWarrenUK/claude-code-config/blob/main/skills/data-ontologist/SKILL.mdView on GitHub

Overview

Data-ontologist guides when to use relational, graph, and document stores and how to integrate them. It emphasizes starting with a graph model to reveal relationships, then selecting persistence based on access patterns and consistency needs.

How This Skill Works

Model the domain as a graph by identifying nodes, edges, and edge data. Then map the graph pieces to the right stores: core entities and transactions in PostgreSQL, relationships and traversals in Neo4j, and flexible content in MongoDB. Use the graph-first modelling process to drive both schema design and storage decisions.

When to Use It

  • Designing data architecture for a new project
  • Choosing between relational, graph, and document databases
  • Integrating multiple database types in a single system
  • Deciding schema design and cross-store query optimization
  • Planning migration or data-model evolution across databases

Quick Start

  1. Step 1: Model the domain as a graph by identifying nodes, edges, and edge attributes
  2. Step 2: Allocate storage: core entities to PostgreSQL, relationships to Neo4j, content to MongoDB
  3. Step 3: Implement integration and test cross-store queries and migrations

Best Practices

  • Start with graph modeling: identify nodes, edges, and attributes
  • Decide persistence after graph modeling based on access patterns and consistency needs
  • Place relational data in PostgreSQL, relationships in Neo4j, content in MongoDB
  • Plan integration patterns for cross-store queries and migrations
  • Use graph-first patterns like MEMbER_OF, ENROLLED_IN, and REQUIRES to guide schema and indexing

Example Use Cases

  • Education platform: users, organizations, courses modeled as a graph; PostgreSQL handles accounts and orders; Neo4j stores enrollments and recommendations; MongoDB holds course content
  • Corporate org chart: employees and reporting lines represented as a graph for traversal queries; relational data for HR records remains in PostgreSQL
  • E-commerce: product and customer relationships in Neo4j for recommendations; orders in PostgreSQL; product specs in MongoDB
  • Social network: user connections managed in Neo4j with traversal patterns; profile and activity data in PostgreSQL and MongoDB respectively
  • Migration project: gradually migrating legacy relational data to a polyglot setup guided by graph-first modeling

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers