data-ontologist
npx machina-cli add skill JasonWarrenUK/claude-code-config/data-ontologist --openclawPolyglot Persistence Architecture
Architectural guidance for using multiple database technologies together, with emphasis on PostgreSQL/Supabase (relational), Neo4j (graph), and MongoDB (document). Demonstrates when to use each database type and how to integrate them effectively.
When This Skill Applies
Use this skill when:
- Designing data architecture for new projects
- Choosing between relational, graph, and document databases
- Integrating multiple database types
- Schema design decisions
- Query optimization across databases
- Migration strategies
- Questions about when to use which database paradigm
Core Principle
Start with the graph. Optimise from there.
Most real-world domains are fundamentally about relationships. The graph is the truest representation of how entities connect. Start by thinking in nodes and edges — then decide where to persist based on access patterns and consistency needs.
Right database for right data concern
Don't force all data into one database type. Use:
- Relational (PostgreSQL/Supabase) - Structured data, transactions, strong consistency
- Graph (Neo4j) - Relationships as primary concern, traversal queries
- Document (MongoDB) - Semi-structured data, flexible schemas, nested documents
Graph-First Modelling Process
Before choosing databases, model the domain as a graph:
- Identify nodes — What are the entities? (Users, Courses, Organisations, Products)
- Identify edges — How do they connect? (ENROLLED_IN, REPORTS_TO, PURCHASED)
- Annotate edges — Do relationships carry data? (role, since, quantity)
- Spot patterns — Trees? DAGs? Social graphs? Bipartite structures?
- Then persist — Given the graph, which parts need relational guarantees, which need traversal, which need flexible schemas?
// Step 1-3: Model the domain as a graph first
(:User)-[:MEMBER_OF {role: 'admin', since: date}]->(:Organisation)
(:User)-[:ENROLLED_IN {status: 'active'}]->(:Course)
(:Course)-[:REQUIRES]->(:Course)
(:User)-[:COMPLETED {score: 0.85}]->(:Module)
(:Module)-[:BELONGS_TO]->(:Course)
Then decide:
- Users and Organisations → PostgreSQL (transactional, auth, billing)
- MEMBER_OF, ENROLLED_IN, REQUIRES → Neo4j (traversal, recommendations, paths)
- Course content, Module materials → MongoDB (flexible nested content)
When to Use Relational (PostgreSQL/Supabase)
Strong Fit
Transactional Data:
- User accounts and authentication
- Order processing
- Financial records
- Inventory management
Structured Records:
- Clear schema with defined fields
- Data fits naturally into tables
- ACID guarantees required
- Standard CRUD operations
- Strong data integrity constraints
Examples:
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Orders table
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
total DECIMAL(10,2),
status TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
Weak Fit
Highly Connected Data:
- Social networks (many-to-many relationships)
- Recommendation engines
- Organizational hierarchies with flexible depth
Reason: Joins become expensive, recursive queries complex.
Rapidly Evolving Schemas:
- Frequently adding new fields
- Different record types need different fields
- Exploratory data modeling
Reason: Migrations expensive, rigid structure.
When to Use Graph (Neo4j)
Strong Fit
Relationships as Primary Concern:
- Social graphs (followers, friends, connections)
- Recommendation systems (collaborative filtering)
- Knowledge graphs
- Access control with inheritance
- Dependencies and prerequisites
Variable-Depth Traversals:
- "Find all users within 3 degrees"
- "Shortest path between entities"
- "All ancestors in org chart"
- "Courses needed before advanced topic"
Examples:
// Social connections
(:User)-[:FOLLOWS]->(:User)
(:User)-[:BLOCKED]->(:User)
// Learning paths
(:Course)-[:REQUIRES]->(:Course)
(:User)-[:COMPLETED]->(:Course)
// Organizational structure
(:Person)-[:REPORTS_TO]->(:Person)
(:Person)-[:MEMBER_OF]->(:Team)
Weak Fit
Simple Lookups:
- User by email
- Order by ID
- Product details
Reason: Relational databases excel at indexed lookups.
Large Aggregations:
- Sum all orders this month
- Count users by region
- Analytics dashboards
Reason: SQL aggregations and window functions more powerful.
When to Use Document (MongoDB)
Strong Fit
Semi-Structured Data:
- Content management systems (blog posts, articles)
- Product catalogs with varying attributes
- User-generated content
- Configuration data
- API responses that need storage
Nested/Embedded Data:
- Comments within posts
- Order items within orders
- Addresses within user profiles
- Metadata with varying fields
Flexible Schemas:
- Rapid prototyping
- Evolving data models
- Different document types in same collection
- Optional fields vary by record
Examples:
// Blog post with embedded comments
{
_id: ObjectId("..."),
title: "Getting Started with SvelteKit",
slug: "getting-started-sveltekit",
author: {
id: "user-123",
name: "Alice"
},
content: "...",
tags: ["svelte", "javascript", "tutorial"],
comments: [
{
id: "comment-1",
userId: "user-456",
text: "Great article!",
createdAt: ISODate("2024-01-15")
}
],
metadata: {
views: 1250,
readingTime: "5 min"
},
publishedAt: ISODate("2024-01-10")
}
// Product with varying attributes
{
_id: ObjectId("..."),
name: "Laptop",
category: "electronics",
price: 999.99,
specs: {
cpu: "Intel i7",
ram: "16GB",
storage: "512GB SSD",
screen: "15.6 inch"
}
}
// Different product type, different fields
{
_id: ObjectId("..."),
name: "T-Shirt",
category: "clothing",
price: 29.99,
sizes: ["S", "M", "L", "XL"],
colors: ["black", "white", "blue"],
material: "100% cotton"
}
Weak Fit
Complex Transactions:
- Multi-step financial operations
- Strong ACID guarantees across documents
- Complex foreign key relationships
Reason: Relational databases better at multi-document transactions.
Relationship-Heavy Data:
- Social networks
- Graph traversals
- "Friends of friends" queries
Reason: Graph databases handle this natively.
Highly Normalized Data:
- No duplication tolerance
- Frequent joins needed
- Strong referential integrity
Reason: Relational databases enforce this better.
Decision Framework
Question 1: What's the primary data concern?
RELATIONSHIPS → Neo4j (Graph)
- Social connections
- Recommendations
- Dependency trees
- Path finding
STRUCTURED ENTITIES → PostgreSQL (Relational)
- User accounts
- Financial transactions
- Inventory
- Orders
DOCUMENTS/CONTENT → MongoDB (Document)
- Blog posts
- Product catalogs
- CMS content
- API data storage
Question 2: How stable is your schema?
VERY STABLE → PostgreSQL
- Well-defined entities
- Clear field types
- Rare schema changes
- Strong typing needed
EVOLVING → MongoDB
- Prototyping phase
- Frequently adding fields
- Different record structures
- Flexible modeling
SCHEMA-OPTIONAL → Neo4j
- Relationships more important than structure
- Dynamic properties
- Graph structure evolves
Question 3: How is data accessed?
BY KEY/ID → PostgreSQL or MongoDB
- User by email
- Product by SKU
- Order by ID
BY TRAVERSAL → Neo4j
- Friends of friends
- Shortest path
- Recommendations
BY CONTENT/QUERY → MongoDB
- Full-text search
- Filtering nested documents
- Flexible queries on varying fields
Question 4: Do you need strong consistency?
ABSOLUTE → PostgreSQL
- Financial transactions
- ACID guarantees
- Multi-step operations
EVENTUAL OKAY → MongoDB or Neo4j
- Content updates
- Social interactions
- Non-critical data
Question 5: Is data naturally nested?
YES → MongoDB
- Posts with comments
- Orders with line items
- Documents with metadata
NO → PostgreSQL
- Flat entities
- Many-to-many relationships
- Normalized structure
Real-World Examples
Example 1: Social Application (WorkWise)
Supabase (PostgreSQL):
- User authentication and profiles
- Organization/company records
- Subscription billing
- Audit logs
Neo4j:
- Social connections (followers, following)
- Content interactions (likes, shares)
- Recommendation engine
- Activity feed generation
MongoDB:
- User-generated posts/content
- Comments and nested discussions
- Rich media metadata
- Activity logs with flexible structure
Why All Three?:
- Auth needs ACID (Supabase)
- Social graph needs traversal (Neo4j)
- Posts need flexible schema (MongoDB)
- User lookups fast in PG
- "People you may know" fast in Neo4j
- Content queries flexible in MongoDB
Example 2: Learning Platform (Rhea)
Supabase (PostgreSQL):
- User accounts and enrollment
- Payment transactions
- Progress tracking (completion %)
- Subscriptions
Neo4j:
- Course prerequisites and dependencies
- Learning path recommendations
- Skill relationships
- "What should I learn next?" queries
MongoDB:
- Course content (lessons, modules)
- Rich lesson materials (videos, exercises, notes)
- Student submissions and feedback
- Curriculum templates
Why All Three?:
- Enrollment is transactional (Supabase)
- Prerequisites are graph traversal (Neo4j)
- Course content is nested documents (MongoDB)
- Payment requires ACID (PG)
- Learning paths require traversal (Neo4j)
- Lessons have varying structures (MongoDB)
Example 3: E-commerce Platform
Supabase (PostgreSQL):
- User accounts
- Order transactions
- Inventory counts
- Payment records
Neo4j:
- Product recommendations
- "Customers who bought X also bought Y"
- Similar products
MongoDB:
- Product catalog with varying attributes
- User reviews and ratings
- Shopping cart state
- Product images and metadata
Why All Three?:
- Orders need transactions (Supabase)
- Recommendations need graph (Neo4j)
- Products have varying specs (MongoDB)
- Inventory atomic updates (PG)
- "Similar items" fast in graph
- Product attributes flexible in documents
Example 4: Content Platform (CMS)
Supabase (PostgreSQL):
- User authentication
- User roles and permissions
- Subscription management
MongoDB:
- Articles, blog posts, pages
- Media library
- Draft versions
- Comments and nested discussions
- SEO metadata
Neo4j (Optional):
- Content relationships
- Tag networks
- Content recommendations
Why This Mix?:
- Users need auth (Supabase)
- Content varies by type (MongoDB)
- Articles have nested comments (MongoDB)
- Permissions are relational (PG)
- Tags can be graphed (Neo4j optional)
Integration Patterns
Pattern 1: Shared Primary Keys
Use same IDs across all databases:
const userId = generateId();
// Supabase - Auth and profile
await supabase.from('users').insert({
id: userId,
email,
name
});
// Neo4j - Social graph node
await neo4j.run(`
CREATE (u:User {id: $userId, name: $name})
`, { userId, name });
// MongoDB - User preferences
await mongo.collection('user_preferences').insertOne({
_id: userId,
theme: 'dark',
notifications: {
email: true,
push: false
}
});
Pattern 2: Reference by ID
Store references, fetch as needed:
// MongoDB - Blog post
{
_id: ObjectId("..."),
title: "My Post",
authorId: "user-123", // Reference to PostgreSQL user
content: "...",
tags: ["javascript", "svelte"]
}
// Query pattern
const post = await mongo.collection('posts').findOne({ _id });
const author = await supabase
.from('users')
.select('*')
.eq('id', post.authorId)
.single();
Pattern 3: Embed vs Reference Decision
Embed when:
- Data accessed together
- One-to-few relationship
- Child data not shared
// Good: Embed comments in post
{
title: "My Post",
comments: [
{ text: "Great!", userId: "user-123" }
]
}
Reference when:
- Data accessed independently
- One-to-many or many-to-many
- Child data shared across parents
// Good: Reference author
{
title: "My Post",
authorId: "user-123" // Author data in PostgreSQL
}
Pattern 4: Event-Driven Sync
Keep databases in sync via events:
// User created in Supabase
supabase.on('INSERT', 'users', async (payload) => {
const user = payload.record;
// Create in Neo4j
await createUserNode(user);
// Create preferences in MongoDB
await mongo.collection('user_preferences').insertOne({
_id: user.id,
theme: 'light',
notifications: {}
});
});
Pattern 5: Aggregate from Multiple Sources
async function getUserDashboard(userId) {
// PostgreSQL - Account info
const account = await supabase
.from('users')
.select('*')
.eq('id', userId)
.single();
// Neo4j - Social metrics
const social = await neo4j.run(`
MATCH (u:User {id: $userId})
OPTIONAL MATCH (u)-[:FOLLOWS]->(following)
OPTIONAL MATCH (follower)-[:FOLLOWS]->(u)
RETURN
count(DISTINCT following) as followingCount,
count(DISTINCT follower) as followerCount
`, { userId });
// MongoDB - Recent content
const posts = await mongo
.collection('posts')
.find({ authorId: userId })
.sort({ createdAt: -1 })
.limit(5)
.toArray();
return {
account: account.data,
social: social.records[0],
recentPosts: posts
};
}
Schema Design Patterns
Relational Schema (Supabase)
Normalized Structure:
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
email TEXT UNIQUE NOT NULL,
name TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Clear foreign keys
CREATE TABLE orders (
id UUID PRIMARY KEY,
user_id UUID REFERENCES users(id),
total DECIMAL(10,2)
);
Graph Schema (Neo4j)
Labels and Relationships:
// Node labels
(:User)
(:Organization)
(:Course)
// Typed relationships
(:User)-[:MEMBER_OF {role}]->(:Organization)
(:User)-[:FOLLOWS {since}]->(:User)
(:Course)-[:REQUIRES]->(:Course)
Document Schema (MongoDB)
Flexible Structure:
// Embedded approach (1-to-few)
{
_id: ObjectId("..."),
userId: "user-123",
title: "My Blog Post",
content: "...",
comments: [ // Embedded
{
id: "comment-1",
userId: "user-456",
text: "Great post!",
createdAt: ISODate("2024-01-15")
}
],
tags: ["tutorial", "javascript"],
metadata: {
views: 150,
likes: 23
}
}
// Reference approach (1-to-many)
{
_id: ObjectId("..."),
title: "E-commerce Order",
userId: "user-123", // Reference
items: [
{ productId: "prod-789", quantity: 2 }, // Reference
{ productId: "prod-456", quantity: 1 }
],
total: 149.99
}
Query Optimization
PostgreSQL Optimization
Indexes:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
Neo4j Optimization
Constraints and Indexes:
CREATE CONSTRAINT user_id_unique
FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE INDEX user_email
FOR (u:User) ON (u.email);
MongoDB Optimization
Indexes:
// Single field
db.posts.createIndex({ authorId: 1 });
// Compound index
db.posts.createIndex({ authorId: 1, createdAt: -1 });
// Text search
db.posts.createIndex({ title: "text", content: "text" });
// Embedded field
db.posts.createIndex({ "metadata.views": -1 });
Query Patterns:
// Efficient: Uses index
db.posts.find({ authorId: userId }).sort({ createdAt: -1 });
// Inefficient: Full collection scan
db.posts.find({ "comments.text": /keyword/ });
// Better: Index comment text separately or use aggregation
Anti-Patterns
Don't: Use Document DB for Transactions
// ✗ Bad: Complex multi-document transaction in MongoDB
session.startTransaction();
await orders.insertOne({ userId, total });
await inventory.updateOne({ productId }, { $inc: { stock: -1 } });
await session.commitTransaction();
Why: PostgreSQL designed for this, ACID guarantees stronger.
Don't: Embed Everything
// ✗ Bad: Embedding user data in every post
{
title: "My Post",
author: {
id: "user-123",
name: "Alice",
email: "alice@example.com",
bio: "...",
avatar: "..." // Duplicated everywhere!
}
}
Better: Store author ID, fetch user data separately.
Don't: Use Graph for Simple Lookups
// ✗ Bad: Using Neo4j for key-value lookup
MATCH (u:User {email: $email})
RETURN u;
Why: PostgreSQL or MongoDB faster for indexed lookups.
Don't: Force Relational Patterns into Documents
// ✗ Bad: Normalized MongoDB (defeats the purpose)
// users collection
{ _id: "user-123", name: "Alice" }
// posts collection
{ _id: "post-456", authorId: "user-123", title: "..." }
// comments collection
{ _id: "comment-789", postId: "post-456", text: "..." }
Why: If normalizing this much, use PostgreSQL instead.
Migration Strategies
Starting Point
Begin with PostgreSQL:
- Authentication
- Core transactional data
- Well-understood entities
Add MongoDB When:
- Content becomes varied
- Schema evolution frequent
- Nested data structures emerge
Add Neo4j When:
- Relationships become complex
- Traversal queries needed
- Recommendations required
Data Migration Examples
PostgreSQL → MongoDB:
// Export from PG
const posts = await supabase.from('posts').select('*');
// Transform and insert to MongoDB
await mongo.collection('posts').insertMany(
posts.map(post => ({
_id: post.id,
...post,
metadata: {
views: post.view_count,
likes: post.like_count
}
}))
);
MongoDB → Neo4j (relationships):
// Get follows from MongoDB
const follows = await mongo.collection('follows').find().toArray();
// Create relationships in Neo4j
await neo4j.run(`
UNWIND $follows AS follow
MATCH (follower:User {id: follow.followerId})
MATCH (followed:User {id: follow.followedId})
CREATE (follower)-[:FOLLOWS {since: follow.createdAt}]->(followed)
`, { follows });
Portfolio Evidence
KSBs Demonstrated:
- K2: All Stages of Software Development Lifecycle (architecture decisions)
- K3: Roles and Responsibilities (database selection justification)
- S1: Analyse Requirements (choosing right tool for problem)
- S6: Design and Implement Database Systems (polyglot approach)
How to Document:
- Architecture Decision Records (ADRs) explaining database choices
- Diagrams showing which data lives where
- Performance comparisons (before/after changes)
- Migration scripts and sync strategies
- Trade-off analysis documentation
Success Criteria
Architecture is successful when:
- Each database handles what it does best
- Queries are fast (minimal cross-database operations)
- Data consistency maintained appropriately
- Clear reasoning for database placement
- Can scale databases independently
- Schema evolution manageable
- Team understands the architecture
Source
git clone https://github.com/JasonWarrenUK/claude-code-config/blob/main/skills/data-ontologist/SKILL.mdView on GitHub Overview
Data-ontologist guides when to use relational, graph, and document stores and how to integrate them. It emphasizes starting with a graph model to reveal relationships, then selecting persistence based on access patterns and consistency needs.
How This Skill Works
Model the domain as a graph by identifying nodes, edges, and edge data. Then map the graph pieces to the right stores: core entities and transactions in PostgreSQL, relationships and traversals in Neo4j, and flexible content in MongoDB. Use the graph-first modelling process to drive both schema design and storage decisions.
When to Use It
- Designing data architecture for a new project
- Choosing between relational, graph, and document databases
- Integrating multiple database types in a single system
- Deciding schema design and cross-store query optimization
- Planning migration or data-model evolution across databases
Quick Start
- Step 1: Model the domain as a graph by identifying nodes, edges, and edge attributes
- Step 2: Allocate storage: core entities to PostgreSQL, relationships to Neo4j, content to MongoDB
- Step 3: Implement integration and test cross-store queries and migrations
Best Practices
- Start with graph modeling: identify nodes, edges, and attributes
- Decide persistence after graph modeling based on access patterns and consistency needs
- Place relational data in PostgreSQL, relationships in Neo4j, content in MongoDB
- Plan integration patterns for cross-store queries and migrations
- Use graph-first patterns like MEMbER_OF, ENROLLED_IN, and REQUIRES to guide schema and indexing
Example Use Cases
- Education platform: users, organizations, courses modeled as a graph; PostgreSQL handles accounts and orders; Neo4j stores enrollments and recommendations; MongoDB holds course content
- Corporate org chart: employees and reporting lines represented as a graph for traversal queries; relational data for HR records remains in PostgreSQL
- E-commerce: product and customer relationships in Neo4j for recommendations; orders in PostgreSQL; product specs in MongoDB
- Social network: user connections managed in Neo4j with traversal patterns; profile and activity data in PostgreSQL and MongoDB respectively
- Migration project: gradually migrating legacy relational data to a polyglot setup guided by graph-first modeling