technical-pm
Technical PM Skill
Install: npx machina-cli add skill aroyburman-codes/pm-skills/technical-pm --openclaw
Apply a structured framework to technical PM questions targeting AI product roles.
When to Use
- User asks about RLHF, fine-tuning, evals, inference, model architecture
- User asks "Design a system that uses LLMs to X"
- User asks "How would you build a RAG system for X"
- User asks about technical trade-offs in AI/ML systems
- User asks about API design for AI products
- User says /technical-pm followed by a question
- Any question requiring ML/AI technical depth from a PM perspective
Context
- Tuned for: AI product roles at frontier AI companies
- What matters: Going deep with researchers and engineers. You don't need to implement, but you need to understand the technical landscape well enough to make informed product decisions.
- Common pitfall: Hand-waving on technical details. Be specific about architectures, trade-offs, and constraints.
Framework: AI PM Technical Method (6 Sections)
Section 1: Technical Clarifications & Constraints
Before designing anything, scope the technical problem:
- Capability Assumptions: What model capabilities are available? (reasoning, multimodal, tool use, code gen)
- Scale: How many users/queries? What latency requirements?
- Infrastructure: Cloud vs. on-prem? What compute budget?
- Data: What training/eval data exists? Privacy constraints?
- Integration: What systems does this need to plug into?
- Timeline: MVP vs. production-grade?
Section 2: Users (Developer & End-User Personas)
For technical products, think about two user layers:
- Developers/Engineers: Who builds on this? What's their skill level? What do they expect?
- End Users: Who consumes the output? What quality bar do they need?
For each persona: current workflow, technical sophistication, key frustrations.
Section 3: High-Level System Design
Draw the system architecture (describe it clearly):
- Data Pipeline: How does data flow in? (user input → preprocessing → model → postprocessing → output)
- Model Layer: Which model(s)? Foundation model + fine-tuned? Routing? Ensemble?
- Orchestration: How are multi-step workflows managed? (agents, chains, state machines)
- Storage: What needs to be persisted? (conversation history, embeddings, user preferences, model artifacts)
- Serving: How is inference served? (batch vs. real-time, edge vs. cloud)
For RAG systems specifically:
- Document ingestion pipeline (chunking strategy, embedding model, vector DB)
- Retrieval (similarity search, reranking, hybrid search)
- Generation (context window management, prompt engineering, citation)
- Evaluation (relevance, faithfulness, answer quality)
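The four RAG stages above can be sketched end to end. This is a minimal illustration, not a production design: a toy bag-of-words embedder stands in for a real embedding model, and an in-memory list stands in for a vector DB. Every name below is hypothetical.

```python
import math
from collections import Counter

def chunk(text, size=50, overlap=10):
    """Naive chunking strategy: overlapping fixed-size character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words 'embedding' -- a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """In-memory stand-in for a vector DB: stores (chunk, embedding) pairs."""
    def __init__(self):
        self.rows = []
    def ingest(self, doc):
        for c in chunk(doc):
            self.rows.append((c, embed(c)))
    def retrieve(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[1]), reverse=True)
        return [c for c, _ in ranked[:k]]

def build_prompt(store, query):
    """Generation step: stuff the top-k chunks into the prompt with citations."""
    ctx = store.retrieve(query)
    prompt = "Answer using only these sources:\n"
    prompt += "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(ctx))
    prompt += f"\nQuestion: {query}"
    return prompt  # a real system would send this to an LLM
```

The citation markers ([1], [2]) are what makes the evaluation stage tractable: faithfulness can be checked chunk by chunk.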
For Agent systems specifically:
- Tool/function calling architecture
- Planning and reasoning loop
- Memory (short-term working memory vs. long-term)
- Safety/sandboxing (what can the agent actually do?)
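The plan-act loop above can be made concrete in a few lines. In this sketch `scripted_model` stands in for the planning LLM, and the tool registry, message format, and function names are all invented for illustration; a real agent would also sandbox tool execution, as noted above.

```python
# Hypothetical tool registry; in production each tool would be sandboxed
# and permission-checked before it runs (the safety layer above).
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def scripted_model(history):
    """Stand-in for the planning model: emits a tool call, then a final answer.
    A real agent would call an LLM here with the tool schemas in context."""
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "name": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "answer": f"The sum is {history[-1]['content']}"}

def run_agent(task, model, max_steps=5):
    """Plan-act-observe loop; the message history is the short-term memory."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)
        if action["type"] == "final":
            return action["answer"]
        result = TOOLS[action["name"]](action["args"])        # act
        history.append({"role": "tool", "content": result})   # observe
    return "step budget exhausted"
```

The `max_steps` cap is the simplest safety control: it bounds cost and prevents runaway loops even before any sandboxing is added.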
Section 4: Deep Dive & Trade-offs
The interviewer will pick an area to go deep. Be prepared for:
The Latency-Cost-Quality Triangle: Every AI system has this fundamental trade-off:
- Latency <-> Quality: Faster responses = less reasoning time, fewer model calls
- Cost <-> Quality: Cheaper inference = smaller models, less compute per query
- Latency <-> Cost: Real-time serving = more provisioned capacity, higher cost
Discuss specific techniques for each trade-off:
- Latency: Streaming, caching, speculative decoding, model distillation, edge deployment
- Cost: Batching, model routing (small model for easy queries, large for hard), quantization, spot instances
- Quality: Chain-of-thought, self-consistency, retrieval augmentation, fine-tuning, human-in-the-loop
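Model routing, one of the cost levers above, can be sketched with a crude heuristic. Real routers typically use a trained classifier or a cheap model's own confidence signal; the prices, marker words, and token estimate below are made up for illustration.

```python
# Hypothetical per-1K-token prices and difficulty markers -- illustrative only.
PRICE_PER_1K = {"small": 0.15, "large": 5.00}
HARD_MARKERS = ("prove", "step by step", "analyze", "compare", "why")

def route(query):
    """Send short, simple queries to the small model; everything else to the large one."""
    hard = len(query.split()) > 30 or any(m in query.lower() for m in HARD_MARKERS)
    return "large" if hard else "small"

def estimated_cost(query, expected_output_tokens=200):
    """Rough cost estimate: ~1.3 tokens per word is a common rule of thumb."""
    model = route(query)
    tokens = len(query.split()) * 1.3 + expected_output_tokens
    return model, round(tokens / 1000 * PRICE_PER_1K[model], 4)
```

Even this crude router illustrates the product lever: if 80% of traffic is easy, routing cuts blended cost by an order of magnitude at a ~30x price gap.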
RLHF Pipeline (know this end-to-end):
- Supervised Fine-Tuning (SFT) on high-quality demonstrations
- Reward Model training from human preference comparisons
- PPO optimization against the reward model with KL penalty
- RLHF alternatives: DPO (Direct Preference Optimization), RLAIF, Constitutional AI
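Of the alternatives, DPO is worth being able to write down: it drops the reward model and PPO loop and optimizes directly on preference pairs. A scalar sketch of the per-pair loss (the beta value is an arbitrary illustration):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given the total log-probabilities of
    the chosen and rejected completions under the policy and the frozen
    reference model:  -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The loss falls as the policy ranks the chosen completion above the rejected one relative to the reference; beta plays the role the KL penalty plays in the PPO setup.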
Evals (increasingly critical for AI PMs):
- What to eval: Accuracy, safety, instruction-following, hallucination, code correctness
- How to eval: Human eval, LLM-as-judge, automated benchmarks, A/B testing in production
- Eval pitfalls: Benchmark contamination, Goodhart's law, distributional shift
- Building eval sets: Golden datasets, adversarial examples, edge cases, domain-specific
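A minimal eval harness over a golden dataset might look like the sketch below. The keyword judge is a stand-in for LLM-as-judge (a real harness would prompt a strong model with a grading rubric), and the dataset fields are invented for illustration.

```python
def keyword_judge(question, answer, expected_keywords):
    """Stub judge: scores 1.0 only if every expected fact appears in the answer.
    A real harness would replace this with an LLM-as-judge call plus a rubric."""
    return 1.0 if all(k.lower() in answer.lower() for k in expected_keywords) else 0.0

def run_eval(system, golden_set, judge=keyword_judge):
    """Run a system over a golden dataset; report mean score per category."""
    by_cat = {}
    for case in golden_set:
        score = judge(case["q"], system(case["q"]), case["expect"])
        by_cat.setdefault(case.get("category", "general"), []).append(score)
    return {cat: sum(s) / len(s) for cat, s in by_cat.items()}
```

Reporting per category rather than a single number is the point: it surfaces exactly where the system fails, which is what protects against Goodharting a headline metric.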
Context Windows & Memory:
- Trade-offs of larger context: Cost (quadratic attention), latency, lost-in-the-middle
- Strategies: Summarization, RAG, hierarchical memory, sliding window
- When to use fine-tuning vs. in-context learning vs. RAG
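The sliding-window strategy is simple enough to sketch: always keep the system prompt, then fill the remaining budget with the most recent messages. The 4-characters-per-token estimate is a rough English-text heuristic, not a real tokenizer.

```python
def approx_tokens(text):
    """Crude heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fit_context(system_prompt, messages, budget=1000):
    """Sliding window: keep the system prompt, then add the most recent
    messages (newest first) until the token budget is exhausted."""
    kept, used = [], approx_tokens(system_prompt)
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system_prompt] + list(reversed(kept))
```

A production system would typically summarize the dropped prefix or push it into RAG rather than discard it outright.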
Hallucination Detection & Mitigation:
- Detection: Confidence calibration, self-consistency checks, retrieval verification, citation validation
- Mitigation: Grounding in retrieved facts, chain-of-thought transparency, abstention (model says "I don't know")
- Measurement: Factual accuracy benchmarks, human annotation, automated fact-checking
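Self-consistency checking is one detection technique that fits in a few lines: sample the model several times at nonzero temperature and abstain when the answers disagree. `sample_fn` is a hypothetical stand-in for a model call, and the agreement threshold is arbitrary.

```python
from collections import Counter

def answer_or_abstain(sample_fn, question, n=5, threshold=0.6):
    """Self-consistency check: sample n answers; if the majority answer falls
    below the agreement threshold, abstain rather than risk a hallucination."""
    samples = [sample_fn(question) for _ in range(n)]
    answer, count = Counter(samples).most_common(1)[0]
    if count / n >= threshold:
        return answer
    return "I don't know"
```

The product trade-off is explicit here: raising the threshold lowers the hallucination rate but raises the abstention rate, and n multiplies inference cost.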
Section 5: API Design & Developer Experience
For platform/API products, design the interface:
- API surface: REST vs. streaming vs. SDK. Key endpoints.
- Developer journey: Sign up → first API call → production integration
- Documentation: What developers need to succeed
- Pricing: Per-token, per-request, tiered, seat-based
- Rate limiting & quotas: Fair usage, abuse prevention
- Versioning: How to ship improvements without breaking existing users
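Rate limiting is usually some variant of a per-key token bucket: bursts up to a cap, steady refill. A sketch (the parameter values and the 429 convention are illustrative, not any specific provider's policy):

```python
import time

class TokenBucket:
    """Per-API-key token bucket: allows bursts up to `capacity`, refills at
    `rate` requests per second."""
    def __init__(self, rate, capacity, now=None):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller would typically respond 429 with a Retry-After header
```

For token-metered AI APIs the same structure is often applied to tokens per minute rather than requests, since one request can be 10 or 10,000 tokens.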
Section 6: Metrics (Technical + Product)
Technical metrics:
- Time to First Token (TTFT)
- Tokens Per Second (TPS)
- Error rate (4xx, 5xx, timeout)
- Cost per 1K tokens (input/output)
- Model accuracy on eval suite
- Hallucination rate
- Safety violation rate
Product metrics:
- Developer activation (first API call within 7 days)
- API adoption (monthly active developers, production integrations)
- Quality satisfaction (developer NPS, support ticket volume)
- Revenue (API spend, conversion to paid tiers)
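The serving metrics above fall out of per-request timestamps. A sketch of the computation; the event fields and the example prices in the test are hypothetical, not any provider's actual rates.

```python
def request_metrics(events):
    """Compute TTFT, TPS, and cost per request from timing events.
    Each event is a dict with hypothetical fields: start_ts, first_token_ts,
    end_ts, input/output token counts, and per-1K-token prices."""
    out = []
    for e in events:
        ttft = e["first_token_ts"] - e["start_ts"]          # Time to First Token
        gen_time = e["end_ts"] - e["first_token_ts"]
        tps = e["output_tokens"] / gen_time if gen_time > 0 else float("inf")
        cost = (e["input_tokens"] * e["input_price_per_1k"]
                + e["output_tokens"] * e["output_price_per_1k"]) / 1000
        out.append({"ttft_s": round(ttft, 3), "tps": round(tps, 1),
                    "cost_usd": round(cost, 4)})
    return out
```

In dashboards these are usually reported as percentiles (p50/p95/p99) rather than means, since tail latency is what users actually feel.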
Key Technical Topics to Know
Transformers & Attention
- Self-attention mechanism, positional encoding
- Scaling laws (Chinchilla, compute-optimal training)
- Multi-head attention, KV cache
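The KV cache is worth being able to size on a whiteboard: two tensors (K and V) per layer, each of shape [batch, kv_heads, seq_len, head_dim]. A sketch with an illustrative 7B-class shape (the dimensions below are assumptions, not a specific model's config):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_value=2):
    """KV cache memory: 2 tensors (K and V) per layer, each of shape
    [batch, kv_heads, seq_len, head_dim], at fp16 = 2 bytes per value."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value

# Illustrative 7B-class shape: 32 layers, 32 KV heads, head_dim 128.
gib = kv_cache_bytes(32, 32, 128, seq_len=4096) / 1024**3
```

At a 4K context this works out to 2 GiB per sequence, which is why grouped-query attention (fewer KV heads) and KV cache quantization matter so much for serving throughput.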
Training Pipeline
- Pre-training (next token prediction on massive corpus)
- Supervised Fine-Tuning (SFT)
- RLHF / DPO / Constitutional AI
- Mixture of Experts (MoE) architectures
Inference Optimization
- Quantization (INT8, INT4, GPTQ, AWQ)
- Speculative decoding
- KV cache optimization
- Batching strategies (continuous batching)
- Model distillation (larger → smaller model)
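Symmetric INT8 quantization, the simplest entry in the list above, is essentially a two-line transform: pick a scale so the largest weight maps to 127, then round. Production schemes like GPTQ and AWQ are considerably more sophisticated; this sketch shows only the plain idea.

```python
def quantize_int8(weights):
    """Symmetric INT8: map [-max|w|, max|w|] onto integers in [-127, 127]."""
    scale = (max(abs(w) for w in weights) / 127) or 1.0  # guard all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights; per-weight error is at most scale / 2."""
    return [v * scale for v in q]
```

Against fp32 this cuts weight memory 4x; the bounded rounding error is why quality usually degrades gracefully at INT8 and only gets tricky around INT4.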
Safety & Alignment
- Constitutional AI
- Red teaming and adversarial testing
- Content filtering and classifiers
- Responsible scaling policies
Multimodal
- Vision-language models (image understanding)
- Speech/audio models
- Video understanding
- Cross-modal retrieval
Output Format
Structure as a technical walkthrough. Be technical but accessible — translate between researchers, engineers, and product. Whiteboard-style system diagrams described in text. Aim for ~2500 words.
Research-First Workflow
Before generating the answer:
- Research — Use web search to find latest technical thinking from AI leaders, engineering blogs from major AI labs, papers, benchmarks. Do 5-10 searches.
- Cite sources — Include [linked source](url) inline for technical claims and architecture decisions.
- Display the complete structured answer.
What Good Looks Like
- Starts with technical scoping questions (constraints, scale, data)
- System design is coherent and production-aware (not just academic)
- Understands the Latency-Cost-Quality triangle deeply
- Can explain RLHF, evals, RAG without hand-waving
- Shows awareness of what's hard (hallucination, eval, safety)
- Trade-off analysis is specific and quantitative
- Connects technical decisions back to user/product impact
Source
View on GitHub: https://github.com/aroyburman-codes/pm-skills/blob/main/skills/technical-pm/SKILL.md
Overview
A structured framework for technical PM questions in AI product roles. It covers RLHF, evals, RAG, LLM deployment, system design, and API design, with a focus on deep collaboration with researchers and engineers to make informed product decisions.
How This Skill Works
Follow the AI PM Technical Method's six sections: start with technical clarifications and constraints, define developer and end-user personas, outline a high-level system design, go deep on trade-offs (latency, cost, quality; RLHF; evals), design the API and developer experience, and close with technical and product metrics. It emphasizes concrete architectures, data flows, and measurable criteria over hand-waving.
When to Use It
- User asks about RLHF, fine-tuning, evals, inference, or model architecture.
- User asks to design a system that uses LLMs to achieve a specific goal.
- User asks how to build a RAG system for a given use case.
- User asks about API design for AI products and how to expose AI capabilities.
- User says /technical-pm followed by a question requiring ML/AI technical depth from a PM perspective.
Quick Start
- Step 1: Clarify constraints and success metrics (capabilities, scale, data privacy, integration, timeline).
- Step 2: Define Developers/Engineers and End Users personas; outline their workflows and pain points; sketch a high-level system design (data pipeline, model layer, serving).
- Step 3: Identify a focused deep-dive area (e.g., latency vs cost vs quality) and outline evaluation criteria and next steps for discussion with researchers/engineers.
Best Practices
- Ground decisions in explicit capabilities, scale, data/privacy constraints, and integration points.
- Define two personas (Developers/Engineers and End Users) with their workflows and frustrations.
- Draft a clear high-level system design: data pipeline, model layer, orchestration, storage, and serving strategy.
- Explicitly map latency, cost, and quality trade-offs; propose concrete techniques like caching, batching, and model routing.
- Articulate an RLHF pipeline and evaluation criteria with measurable metrics; avoid hand-waving.
Example Use Cases
- Design a RAG-powered knowledge assistant with chunking, embeddings, and a vector DB, including retrieval, generation, and citation handling.
- Plan an LLM deployment architecture with model routing and real-time vs batch inference considerations.
- Create an API design strategy for AI features including versioning, authentication, rate limits, and observability.
- Evaluate RLHF and SFT data quality for a product and map governance and safety constraints.
- Prototype an AI agent system with tool calling, memory management, and a safety sandbox for user tasks.