llm-tuning-patterns
npx machina-cli add skill parcadei/Continuous-Claude-v3/llm-tuning-patterns --openclaw
LLM Tuning Patterns
Evidence-based patterns for configuring LLM parameters, based on APOLLO and Godel-Prover research.
Pattern
Different tasks require different LLM configurations. Use these evidence-based settings.
Theorem Proving / Formal Reasoning
Based on APOLLO parity analysis:
| Parameter | Value | Rationale |
|---|---|---|
| max_tokens | 4096 | Proofs need space for chain-of-thought |
| temperature | 0.6 | Higher creativity for tactic exploration |
| top_p | 0.95 | Allow diverse proof paths |
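The preset above can be sketched as a request builder. This is a minimal, hedged example assuming an OpenAI-style chat request body; the model name is a placeholder, not part of the skill.

```python
# Theorem-proving preset from the table above (APOLLO parity settings).
THEOREM_PARAMS = {
    "max_tokens": 4096,   # room for chain-of-thought before tactics
    "temperature": 0.6,   # higher creativity for tactic exploration
    "top_p": 0.95,        # keep diverse proof paths in the nucleus
}

def proving_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request body with the proving preset.

    The model name is a placeholder; substitute your provider's model.
    """
    return {
        "model": "your-prover-model",  # placeholder, not from the skill
        "messages": [{"role": "user", "content": prompt}],
        **THEOREM_PARAMS,
    }
```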
Proof Plan Prompt
Always request a proof plan before tactics:
Given the theorem to prove:
[theorem statement]
First, write a high-level proof plan explaining your approach.
Then, suggest Lean 4 tactics to implement each step.
The proof plan (chain-of-thought) significantly improves tactic quality.
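The prompt above is a fill-in template, so it can be packaged as a small helper. A minimal sketch; the function name is illustrative.

```python
# Template mirroring the proof-plan prompt shown above.
PROOF_PLAN_TEMPLATE = """\
Given the theorem to prove:
{theorem}

First, write a high-level proof plan explaining your approach.
Then, suggest Lean 4 tactics to implement each step.
"""

def proof_plan_prompt(theorem: str) -> str:
    """Insert the theorem statement into the proof-plan template."""
    return PROOF_PLAN_TEMPLATE.format(theorem=theorem)
```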
Parallel Sampling
For hard proofs, use parallel sampling:
- Generate N=8-32 candidate proof attempts
- Use best-of-N selection
- Each sample at temperature 0.6-0.8
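The best-of-N workflow above can be sketched as a generic loop. This assumes you supply your own `generate` (model call at a given temperature) and `score` (proof checker or heuristic) callables; both are stand-ins, not part of the skill.

```python
import random
from typing import Callable

def best_of_n(generate: Callable[[float], str],
              score: Callable[[str], float],
              n: int = 8,
              temp_range: tuple[float, float] = (0.6, 0.8)) -> str:
    """Generate n candidate proofs, each sampled at a temperature drawn
    from temp_range, and return the highest-scoring candidate."""
    candidates = [generate(random.uniform(*temp_range)) for _ in range(n)]
    return max(candidates, key=score)
```

In practice `score` might be a binary pass/fail from a proof checker, in which case any passing candidate ties for best.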
Code Generation
| Parameter | Value | Rationale |
|---|---|---|
| max_tokens | 2048 | Sufficient for most functions |
| temperature | 0.2-0.4 | Prefer deterministic output |
Creative / Exploration Tasks
| Parameter | Value | Rationale |
|---|---|---|
| max_tokens | 4096 | Space for exploration |
| temperature | 0.8-1.0 | Maximum creativity |
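The three task-type tables can be collected into one preset map. A minimal sketch; ranged values are kept as (low, high) tuples, and the key names are illustrative.

```python
# Presets from the tables above; ranged values use (low, high) tuples.
PRESETS = {
    "theorem_proving": {"max_tokens": 4096, "temperature": 0.6, "top_p": 0.95},
    "code_generation": {"max_tokens": 2048, "temperature": (0.2, 0.4)},
    "creative":        {"max_tokens": 4096, "temperature": (0.8, 1.0)},
}
```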
Anti-Patterns
- Too low tokens for proofs: 512 tokens truncates chain-of-thought
- Too low temperature for proofs: 0.2 misses creative tactic paths
- No proof plan: Jumping to tactics without planning reduces success rate
Source Sessions
- This session: APOLLO parity - increased max_tokens 512->4096, temp 0.2->0.6
- This session: Added proof plan prompt for chain-of-thought before tactics
Source
https://github.com/parcadei/Continuous-Claude-v3/blob/main/.claude/skills/llm-tuning-patterns/SKILL.md
Overview
Evidence-based patterns for configuring LLM parameters, drawn from APOLLO and Godel-Prover research. These patterns map task types to concrete settings to improve proof quality, code generation, and creative tasks.
How This Skill Works
The skill provides task-specific parameter presets (max_tokens, temperature, top_p) and prompts such as a Proof Plan to guide tactics. It also describes a Parallel Sampling workflow for hard proofs, detailing how many candidate attempts to generate and how to select the best one.
When to Use It
- Theorem proving / formal reasoning tasks requiring structured proofs
- Code generation tasks needing deterministic results
- Creative or exploratory tasks seeking higher creativity
- Hard proofs or complex reasoning benefiting from parallel sampling
- Scenarios using APOLLO parity-informed token and temperature adjustments
Quick Start
- Step 1: Identify the task type (theorem proving, code generation, or creative exploration)
- Step 2: Apply the corresponding parameter presets and enable the Proof Plan Prompt for proofs (e.g., 4096/0.6/0.95 for theorems; 2048/0.2-0.4 for code; 4096/0.8-1.0 for creative tasks)
- Step 3: For challenging proofs, enable Parallel Sampling with N=8-32 and choose the best-of-N result
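The three quick-start steps can be sketched as a single dispatch function. Assumptions: the `hard` flag and `n_samples` field are illustrative conventions, and the mid-range temperatures stand in for the 0.2-0.4 and 0.8-1.0 ranges.

```python
def configure(task: str, hard: bool = False) -> dict:
    """Quick-start dispatch: pick the task preset, then enable
    parallel sampling (best-of-N) only for hard proofs."""
    presets = {
        "theorem_proving": {"max_tokens": 4096, "temperature": 0.6, "top_p": 0.95},
        "code_generation": {"max_tokens": 2048, "temperature": 0.3},  # mid of 0.2-0.4
        "creative":        {"max_tokens": 4096, "temperature": 0.9},  # mid of 0.8-1.0
    }
    cfg = dict(presets[task])
    cfg["n_samples"] = 16 if hard else 1  # N=8-32 for hard proofs; 16 chosen here
    return cfg
```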
Best Practices
- Always use the Proof Plan Prompt before tactics to improve plan quality and tactic selection
- Apply task-appropriate presets: Theorem proving uses max_tokens 4096, temperature 0.6, top_p 0.95; Code generation uses max_tokens 2048, temperature 0.2-0.4; Creative tasks use max_tokens 4096, temperature 0.8-1.0
- Use Parallel Sampling for hard proofs: generate N=8-32 candidate proofs and select the best-of-N
- Avoid low token limits for proofs (512 truncates chain-of-thought) and low temperatures (0.2 misses creative tactic paths)
- Use the APOLLO parity adjustments as a guide: max_tokens 512→4096 and temperature 0.2→0.6
Example Use Cases
- Theorem proving: apply max_tokens 4096, temp 0.6, top_p 0.95 and use a Proof Plan Prompt before tactics to derive Lean 4 proofs
- Code generation: generate function implementations with max_tokens 2048 and temp 0.2-0.4 for deterministic output
- Hard proofs: run Parallel Sampling with N between 8 and 32 and use best-of-N to select the most robust proof path
- Creative tasks: set max_tokens 4096 and temp 0.8-1.0 to maximize novelty in exploratory writing or problem-solving
- APOLLO parity session notes: adjust max_tokens and temperature according to parity changes to optimize proof quality