What is the purpose of model suffixes like :groq or :together?

Suffixes indicate the provider to route to via the single unified endpoint, enabling multi-provider routing without code changes.

How do I enable reasoning visibility in the agent outputs?

Reasoning is exposed through dedicated fields on reasoning output items (content, encrypted_content, summary). Set the reasoning effort level (low, medium, or high) in the request to control how much reasoning the model performs.

Can I mix provider routing with streaming results?

Yes. Routing is selected by the model suffix and streaming by the stream flag, so the two can be combined in one request. Streamed results arrive as structured events that cover reasoning, tool calls, and the final message.

open-responses-agent-dev

npx machina-cli add skill OthmanAdi/open-responses-agent-skill/open-responses-agent-dev --openclaw

Open Responses Agent Development

Build autonomous agents with the Open Responses API - the open-source standard for multi-provider, agentic LLM interfaces via HuggingFace Inference Providers.

When to Use This Skill

Activate this skill when:

You are building autonomous agents (not chatbots)
You need multi-step workflows in a single request
You want multi-provider routing with a single endpoint
You need reasoning visibility (see agent thinking)
You are building sub-agent loops with tools
You want to use the OpenAI SDK with open-source models

Key Concept: Single Unified Endpoint

IMPORTANT: Open Responses uses ONE unified endpoint with provider routing via model suffixes.

Endpoint: https://router.huggingface.co/v1
Model format: model-id:provider (e.g., moonshotai/Kimi-K2-Instruct-0905:groq)

Providers are specified as suffixes on the model name:

:groq - Groq inference
:together - Together AI
:nebius - Nebius AI
:auto - Automatic provider selection
(no suffix) - Default provider
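
As a small illustration of the model-id:provider format, the sketch below splits a model string into its two parts. The helper name split_model is hypothetical and not part of any SDK; the router does this parsing on its side, so the helper is only useful for logging or validation in your own code.

```python
from typing import Optional, Tuple


def split_model(model: str) -> Tuple[str, Optional[str]]:
    """Split 'org/name:provider' into (model_id, provider).

    Hypothetical helper: returns provider=None when no suffix is present.
    """
    model_id, sep, provider = model.rpartition(":")
    if not sep:
        # No colon at all, so the whole string is the model id.
        return model, None
    return model_id, provider or None


print(split_model("moonshotai/Kimi-K2-Instruct-0905:groq"))
print(split_model("moonshotai/Kimi-K2-Instruct-0905"))
```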

Core Concepts

1. The Responses Endpoint

POST https://router.huggingface.co/v1/responses
Authorization: Bearer $HF_TOKEN
Content-Type: application/json

Request Structure:

{
  "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
  "instructions": "You are a helpful assistant.",
  "input": "User request",
  "tools": [...],
  "tool_choice": "auto",
  "reasoning": { "effort": "medium" },
  "stream": false
}
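
The request above can also be assembled without any SDK. The following is a minimal sketch using only the Python standard library; build_request and the placeholder token are assumptions made for illustration, and the request is constructed but not sent.

```python
import json
import urllib.request

ROUTER_URL = "https://router.huggingface.co/v1/responses"


def build_request(token: str, model: str, user_input: str,
                  effort: str = "medium") -> urllib.request.Request:
    """Build (but do not send) a POST request for the responses endpoint."""
    payload = {
        "model": model,
        "instructions": "You are a helpful assistant.",
        "input": user_input,
        "reasoning": {"effort": effort},
        "stream": False,
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "hf_example_token" is a placeholder, not a real credential.
req = build_request("hf_example_token",
                    "moonshotai/Kimi-K2-Instruct-0905:groq",
                    "User request")
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with urllib.request.urlopen(req), once a real token is supplied.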

Response Structure:

{
  "id": "resp_abc123",
  "model": "moonshotai/Kimi-K2-Instruct-0905:groq",
  "output": [
    { "type": "reasoning", "content": "Let me think..." },
    { "type": "function_call", "name": "search", "arguments": "{...}" },
    { "type": "function_call_output", "output": "..." },
    { "type": "message", "content": "Final response" }
  ],
  "output_text": "Final response",
  "usage": { "input_tokens": 100, "output_tokens": 200 }
}
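
To show how the output array might be consumed, here is a sketch that counts items by type and picks out the last message. It works on plain dictionaries shaped like the simplified example above; summarize_output is a hypothetical helper, and real responses may carry richer content (for instance a list of content parts rather than a string).

```python
from collections import Counter


def summarize_output(response: dict) -> dict:
    """Count output items by type and return the last message's content."""
    items = response.get("output", [])
    counts = Counter(item.get("type", "unknown") for item in items)
    messages = [item for item in items if item.get("type") == "message"]
    return {
        "counts": dict(counts),
        "final_message": messages[-1].get("content") if messages else None,
    }


# Same shape as the simplified response structure shown above.
example = {
    "id": "resp_abc123",
    "output": [
        {"type": "reasoning", "content": "Let me think..."},
        {"type": "function_call", "name": "search", "arguments": "{}"},
        {"type": "function_call_output", "output": "..."},
        {"type": "message", "content": "Final response"},
    ],
}

summary = summarize_output(example)
print(summary["counts"])
print(summary["final_message"])
```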

2. Sub-Agent Loops

The API automatically handles:

Model samples a response
Emits tool calls if needed
Executes tools (for server-side tools like MCP)
Feeds results back
Repeats until completion

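No manual loop management is required for server-side tools. Function tools that you define yourself come back as function_call items for your own code to execute.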
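
The steps above can be sketched as a local simulation. This is not the server's implementation; it is a conceptual model of the sample, call, feed-back, repeat cycle, with the model replaced by a stub. The names run_loop, fake_sample, and fake_execute are all hypothetical.

```python
from typing import Callable, List


def run_loop(sample: Callable[[List[dict]], dict],
             execute: Callable[[dict], str],
             user_input: str,
             max_steps: int = 8) -> List[dict]:
    """Repeat sample -> (tool call -> tool result) until a non-tool item appears."""
    trace: List[dict] = [{"type": "input", "content": user_input}]
    for _ in range(max_steps):
        item = sample(trace)              # model samples a response
        trace.append(item)
        if item["type"] != "function_call":
            return trace                  # completion: no further tool call
        # Execute the tool and feed the result back into the context.
        trace.append({"type": "function_call_output", "output": execute(item)})
    raise RuntimeError("step limit reached without a final message")


def fake_sample(trace: List[dict]) -> dict:
    """Stub model: calls a tool once, then answers."""
    if any(i["type"] == "function_call_output" for i in trace):
        return {"type": "message", "content": "It is sunny."}
    return {"type": "function_call", "name": "search", "arguments": "{}"}


def fake_execute(call: dict) -> str:
    """Stub tool executor."""
    return "sunny"


trace = run_loop(fake_sample, fake_execute, "What is the weather?")
print([item["type"] for item in trace])
```

The max_steps guard is a design choice for the sketch: any loop driven by model output benefits from an upper bound so that a model that keeps calling tools cannot run forever.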

3. Reasoning Visibility

Three fields for reasoning:

content: Raw reasoning traces (open weight models)
encrypted_content: Protected reasoning (proprietary models)
summary: Sanitized summary

Control reasoning effort:

{ "reasoning": { "effort": "low" | "medium" | "high" } }
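
A minimal sketch of working with these fields and the effort setting follows. Both helpers are hypothetical: reasoning_param validates the three effort values listed above, and visible_reasoning assumes a reasoning item is a plain dictionary that may contain content, summary, or encrypted_content.

```python
from typing import Optional

VALID_EFFORTS = ("low", "medium", "high")


def reasoning_param(effort: str) -> dict:
    """Return the request fragment for a given effort, rejecting unknown values."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return {"reasoning": {"effort": effort}}


def visible_reasoning(item: dict) -> Optional[str]:
    """Prefer raw content, then the summary; encrypted_content is opaque to the caller."""
    for field in ("content", "summary"):
        value = item.get(field)
        if value:
            return value
    return None


print(reasoning_param("low"))
print(visible_reasoning({"type": "reasoning", "summary": "Compared two options."}))
print(visible_reasoning({"type": "reasoning", "encrypted_content": "gAAAA..."}))
```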

4. Semantic Streaming

Events are structured, not raw text:

event: response.created
event: response.output_item.added
event: response.output_text.delta
event: response.output_item.done
event: response.completed
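
To make the idea of structured events concrete, the sketch below parses a server-sent-events style stream and rebuilds the text from the delta events. The sample stream is invented for illustration, and the assumption that each response.output_text.delta payload carries a "delta" string should be checked against the specification; parse_sse and collect_text are hypothetical helpers.

```python
import json
from typing import List, Tuple


def parse_sse(raw: str) -> List[Tuple[str, dict]]:
    """Parse 'event:' / 'data:' blocks separated by blank lines."""
    events = []
    name, data_lines = None, []
    # A trailing empty string flushes the last block if the stream lacks one.
    for line in raw.splitlines() + [""]:
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (name or data_lines):
            payload = json.loads("\n".join(data_lines)) if data_lines else {}
            events.append((name, payload))
            name, data_lines = None, []
    return events


def collect_text(events: List[Tuple[str, dict]]) -> str:
    """Concatenate the deltas of all output_text.delta events."""
    return "".join(
        payload.get("delta", "")
        for name, payload in events
        if name == "response.output_text.delta"
    )


# Invented sample stream; payload fields are assumptions.
SAMPLE = (
    'event: response.created\ndata: {}\n\n'
    'event: response.output_text.delta\ndata: {"delta": "Hel"}\n\n'
    'event: response.output_text.delta\ndata: {"delta": "lo"}\n\n'
    'event: response.completed\ndata: {}\n\n'
)

events = parse_sse(SAMPLE)
print([name for name, _ in events])
print(collect_text(events))
```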

Language Selection

Choose based on your use case:

Language	Best For	Recommended SDK
TypeScript	Web apps, serverless, Node.js	`openai` npm package
Python	ML/AI, data science, rapid prototyping	`openai` pip package

TypeScript Implementation

Setup

npm init -y
npm install openai

Basic Agent (Using OpenAI SDK)

import OpenAI from "openai";

// Configure client with HuggingFace router
const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

async function createAgent(
  model: string,
  input: string,
  instructions?: string
) {
  const response = await client.responses.create({
    model,  // e.g., "moonshotai/Kimi-K2-Instruct-0905:groq"
    instructions: instructions || "You are a helpful assistant.",
    input,
  });

  // Use the convenience helper for simple text output
  console.log(response.output_text);

  // Or iterate through all output items
  for (const item of response.output) {
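    // Not every item type has a `content` field (e.g. function_call), so check first
    console.log(item.type, "content" in item ? item.content : item);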
  }

  return response;
}

// Usage
const result = await createAgent(
  "moonshotai/Kimi-K2-Instruct-0905:groq",
  "What is the capital of France?"
);

Sub-Agent Loop with Tools

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

// Tools are defined at top level (not nested in function object)
const tools = [
  {
    type: "function" as const,
    name: "get_current_weather",
    description: "Get the current weather in a given location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "City and state, e.g. San Francisco, CA" },
        unit: { type: "string", enum: ["celsius", "fahrenheit"] },
      },
      required: ["location", "unit"],
    },
  },
  {
    type: "function" as const,
    name: "search_documents",
    description: "Search company documents for information",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search query" },
      },
      required: ["query"],
    },
  },
];

async function runAgentWithTools() {
  const response = await client.responses.create({
    model: "moonshotai/Kimi-K2-Instruct-0905:groq",
    instructions: "You are a helpful assistant.",
    input: "What is the weather like in Boston today?",
    tools,
    tool_choice: "auto",
  });

  // Process all output items
  for (const item of response.output) {
    switch (item.type) {
      case "reasoning":
        console.log(`[REASONING] ${item.content}`);
        break;
      case "function_call":
        // `arguments` is already a JSON-encoded string, so print it as-is
        console.log(`[TOOL CALL] ${item.name}(${item.arguments})`);
        break;
      case "function_call_output":
        console.log(`[TOOL RESULT] ${item.output}`);
        break;
      case "message":
        // `content` may be a list of content parts, so serialize it for logging
        console.log(`[RESPONSE] ${JSON.stringify(item.content)}`);
        break;
    }
  }

  return response;
}

Streaming (TypeScript)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

async function streamAgent() {
  const stream = await client.responses.create({
    model: "moonshotai/Kimi-K2-Instruct-0905:groq",
    instructions: "You are a helpful assistant.",
    input: "Say 'double bubble bath' ten times fast.",
    stream: true,
  });

  for await (const event of stream) {
    console.log(event);
  }
}

Structured Outputs

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

async function getStructuredOutput() {
  const response = await client.responses.create({
    model: "openai/gpt-oss-120b:groq",
    instructions: "Extract the event information. Return JSON.",
    input: "Alice and Bob are going to a science fair on Friday.",
    // The Responses API takes the schema under text.format;
    // response_format is the Chat Completions shape.
    text: {
      format: {
        type: "json_schema",
        name: "CalendarEvent",
        schema: {
          type: "object",
          properties: {
            name: { type: "string" },
            date: { type: "string" },
            participants: { type: "array", items: { type: "string" } },
          },
          required: ["name", "date", "participants"],
          additionalProperties: false,
        },
        strict: true,
      },
    },
  });

  const parsed = JSON.parse(response.output_text);
  console.log(parsed);
}

Python Implementation

Setup

pip install openai

Basic Agent (Using OpenAI SDK)

import os
from openai import OpenAI

# Configure client with HuggingFace router
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

def create_agent(model: str, input_text: str, instructions: str | None = None):
    response = client.responses.create(
        model=model,  # e.g., "moonshotai/Kimi-K2-Instruct-0905:groq"
        instructions=instructions or "You are a helpful assistant.",
        input=input_text,
    )

    # Use the convenience helper for simple text output
    print(response.output_text)

    # Or iterate through all output items
    for item in response.output:
        # Not every item type has a `content` attribute (e.g. function_call)
        print(item.type, getattr(item, "content", None))

    return response

# Usage
result = create_agent(
    "moonshotai/Kimi-K2-Instruct-0905:groq",
    "What is the capital of France?"
)

Sub-Agent Loop with Tools

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

# Tools are defined at top level (not nested in function object)
tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
    {
        "type": "function",
        "name": "search_documents",
        "description": "Search company documents for information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
]

def run_agent_with_tools():
    response = client.responses.create(
        model="moonshotai/Kimi-K2-Instruct-0905:groq",
        instructions="You are a helpful assistant.",
        input="What is the weather like in Boston today?",
        tools=tools,
        tool_choice="auto",
    )

    # Process all output items
    for item in response.output:
        match item.type:
            case "reasoning":
                print(f"[REASONING] {item.content}")
            case "function_call":
                print(f"[TOOL CALL] {item.name}({item.arguments})")
            case "function_call_output":
                print(f"[TOOL RESULT] {item.output}")
            case "message":
                print(f"[RESPONSE] {item.content}")

    return response

run_agent_with_tools()
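
When a function_call item refers to a tool that lives in your own process, your code has to run it and package the result. The sketch below shows one way to do that with plain dictionaries. It assumes that arguments is a JSON-encoded string and that the result is returned as a function_call_output item carrying the call's call_id; the weather function is a stub with invented data, and HANDLERS and execute_function_call are hypothetical names.

```python
import json


def get_current_weather(location: str, unit: str) -> str:
    """Stub tool: returns fixed, invented data instead of calling a weather service."""
    return json.dumps({"location": location, "unit": unit, "temperature": 20})


# Maps tool names (as declared in `tools`) to local callables.
HANDLERS = {"get_current_weather": get_current_weather}


def execute_function_call(item: dict) -> dict:
    """Run the local handler for a function_call item and wrap the result."""
    try:
        args = json.loads(item["arguments"])        # arguments arrive as a JSON string
        result = HANDLERS[item["name"]](**args)     # KeyError if the tool is unknown
    except (KeyError, TypeError, json.JSONDecodeError) as exc:
        # Report the failure to the model instead of crashing the agent.
        result = json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    return {
        "type": "function_call_output",
        "call_id": item.get("call_id"),
        "output": result,
    }


call = {
    "type": "function_call",
    "call_id": "call_1",
    "name": "get_current_weather",
    "arguments": '{"location": "Boston, MA", "unit": "celsius"}',
}
print(execute_function_call(call))
```

Returning errors as tool output rather than raising is a deliberate choice in this sketch: it lets the model see what went wrong and try a corrected call.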

Streaming (Python)

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

def stream_agent():
    stream = client.responses.create(
        model="moonshotai/Kimi-K2-Instruct-0905:groq",
        input=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
        stream=True,
    )

    for event in stream:
        print(event)

stream_agent()

Structured Outputs (Python)

import os
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

def get_structured_output():
    response = client.responses.parse(
        model="openai/gpt-oss-120b:groq",
        input=[
            {"role": "system", "content": "Extract the event information."},
            {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
        ],
        text_format=CalendarEvent,
    )

    print(response.output_parsed)

get_structured_output()

Reasoning Control (Python)

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

def agent_with_reasoning():
    response = client.responses.create(
        model="openai/gpt-oss-120b:groq",
        instructions="You are a helpful assistant.",
        input="Say hello to the world.",
        reasoning={"effort": "low"},  # "low" | "medium" | "high"
    )

    for i, item in enumerate(response.output):
        # Not every item type has a `content` attribute, so read it defensively
        print(f"Output #{i}: {item.type}", getattr(item, "content", None))

agent_with_reasoning()

Provider Routing

Switch providers by changing the model suffix:

# Same endpoint, different providers via model suffix
client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.getenv("HF_TOKEN"),
)

# Use Groq
response = client.responses.create(model="moonshotai/Kimi-K2-Instruct-0905:groq", ...)

# Use Together AI
response = client.responses.create(model="meta-llama/Llama-3.1-70B-Instruct:together", ...)

# Use Nebius
response = client.responses.create(model="meta-llama/Llama-3.1-70B-Instruct:nebius", ...)

# Auto-select provider
response = client.responses.create(model="meta-llama/Llama-3.1-70B-Instruct:auto", ...)
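
Because only the suffix changes between providers, a fallback across providers can be written as a loop over suffixes. The following is a sketch under stated assumptions: first_successful is a hypothetical helper, and call stands in for whatever function actually sends the request (for example a wrapper around client.responses.create). A stub is used here so the example runs without network access.

```python
from typing import Callable, Sequence, Tuple


def first_successful(call: Callable[[str], str],
                     model_id: str,
                     providers: Sequence[str]) -> Tuple[str, str]:
    """Try model_id with each provider suffix in order; return the first success."""
    errors = []
    for provider in providers:
        model = f"{model_id}:{provider}"
        try:
            return model, call(model)
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_call(model: str) -> str:
    """Stub request function: pretends the Groq route is down."""
    if model.endswith(":groq"):
        raise TimeoutError("simulated outage")
    return f"answer from {model}"


model, answer = first_successful(
    flaky_call, "meta-llama/Llama-3.1-70B-Instruct", ["groq", "together"]
)
print(model)
print(answer)
```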

Available Providers (via Model Suffix)

Suffix	Provider	Notes
`:groq`	Groq	Fast inference
`:together`	Together AI	Open weight models
`:nebius`	Nebius AI	European infrastructure
`:auto`	Automatic	System chooses
(none)	Default	Provider default

Browse available models: huggingface.co/inference/models

Migration from Chat Completions

Before (Chat Completions - Manual Loop)

// OLD: Manual agentic loop with OpenAI client
const openai = new OpenAI();
let messages = [{ role: "user", content: "Search and summarize" }];

while (true) {
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages,
    tools,
  });

  if (response.choices[0].finish_reason === "tool_calls") {
    // Manually execute tools
    // Manually manage state
    // Loop again...
  } else {
    break;
  }
}

After (Open Responses - Single Request)

// NEW: Single request, automatic loop
const client = new OpenAI({
  baseURL: "https://router.huggingface.co/v1",
  apiKey: process.env.HF_TOKEN,
});

const response = await client.responses.create({
  model: "moonshotai/Kimi-K2-Instruct-0905:groq",
  instructions: "You are a helpful assistant.",
  input: "Search and summarize",
  tools,
  tool_choice: "auto",
});

// Complete execution trace in response.output
for (const item of response.output) {
  console.log(item);
}
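
Part of the migration is mechanical: system messages move into instructions, the remaining messages become input, and tool definitions are flattened so that name and parameters sit at the top level. The sketch below illustrates that mapping on plain dictionaries. The helper names to_responses_request and flatten_tool are hypothetical, and joining multiple system messages with blank lines is an assumption of this sketch.

```python
from typing import Iterable, List


def flatten_tool(tool: dict) -> dict:
    """Turn {'type': 'function', 'function': {...}} into the flat top-level form."""
    fn = tool.get("function")
    if fn is None:
        return dict(tool)                 # already flat
    return {"type": "function", **fn}


def to_responses_request(model: str, messages: List[dict],
                         tools: Iterable[dict] = ()) -> dict:
    """Map a Chat Completions style call onto a Responses style request body."""
    system = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    request = {"model": model, "input": rest}
    if system:
        request["instructions"] = "\n\n".join(system)
    flat = [flatten_tool(t) for t in tools]
    if flat:
        request["tools"] = flat
    return request


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Search and summarize"},
]
nested_tool = {
    "type": "function",
    "function": {
        "name": "search_documents",
        "description": "Search company documents for information",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

request = to_responses_request(
    "moonshotai/Kimi-K2-Instruct-0905:groq", messages, [nested_tool]
)
print(request["instructions"])
print(request["tools"][0]["name"])
```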

Best Practices

1. Use the OpenAI SDK

# Recommended: Use official SDK with custom base_url
import os
from openai import OpenAI
client = OpenAI(base_url="https://router.huggingface.co/v1", api_key=os.getenv("HF_TOKEN"))

2. Include Provider Suffix

# Be explicit about provider for consistency
model = "moonshotai/Kimi-K2-Instruct-0905:groq"

3. Use instructions Field

response = client.responses.create(
    model="...",
    instructions="You are a helpful assistant.",  # System prompt
    input="User message",
)

4. Handle All Output Types

for item in response.output:
    match item.type:
        case "reasoning": ...
        case "function_call": ...
        case "function_call_output": ...
        case "message": ...

5. Use output_text for Simple Cases

# Quick access to final text response
print(response.output_text)

6. Control Reasoning Effort

response = client.responses.create(
    model="...",
    reasoning={"effort": "medium"},  # low, medium, high
    ...
)

Resources

HuggingFace Docs: huggingface.co/docs/inference-providers/en/guides/responses-api
Open Responses Spec: openresponses.org/specification
GitHub: github.com/openresponses/openresponses
HuggingFace Blog: huggingface.co/blog/open-responses
Available Models: huggingface.co/inference/models

Examples

See the /examples directory for complete implementations:

examples/typescript/ - TypeScript examples
examples/python/ - Python examples

Templates

See the /templates directory for starter code:

templates/typescript/agent-template.ts
templates/python/agent_template.py

Source

https://github.com/OthmanAdi/open-responses-agent-skill/blob/master/skills/open-responses-agent-dev/SKILL.md

Overview

Open-responses-agent-dev enables building autonomous agents that use the Open Responses API through HuggingFace Inference Providers. It centers on a single unified endpoint that routes to multiple providers via model suffixes, and it works with the OpenAI SDK through a custom base_url. This setup supports multi-step workflows, sub-agent tool use, and visible reasoning.

How This Skill Works

Requests are sent to the HuggingFace router endpoint with a model in the format model-id:provider (e.g., moonshotai/Kimi-K2-Instruct-0905:groq). The router uses the model suffix to route to the chosen provider, returning a structured response that can include reasoning traces, tool calls, and the final message. Developers implement agents using the OpenAI SDK with baseURL set to the router and rely on the API’s automatic sub-agent loop to manage tool execution and result handling.

When to Use It

Building autonomous agents (not just chatbots) that perform multi-step workflows within a single request
Need multi-provider routing through one endpoint to compare or combine capabilities
Require reasoning visibility to inspect or audit the agent's thought process
Building sub-agent loops that invoke tools or external services
Using OpenAI SDK with open-source models via HuggingFace router for rapid development

Quick Start

Step 1: Install the OpenAI package and set up a client with baseURL = https://router.huggingface.co/v1
Step 2: Create a model call using a format like model = 'moonshotai/Kimi-K2-Instruct-0905:groq' and supply input/instructions
Step 3: Call responses.create (or the SDK equivalent) and handle response.output_text, along with any reasoning or function_call items in response.output

Best Practices

Define clear tool interfaces and expected inputs/outputs for each sub-agent to minimize ambiguity in tool calls
Use the model suffixes (e.g., :groq, :together) to route to the appropriate provider without changing code
Leverage the reasoning fields to surface traces; adjust reasoning effort (low/medium/high) to balance transparency and latency
Test end-to-end flows with realistic multi-step tasks and tool responses to validate the loop behavior
Configure the OpenAI client with baseURL set to https://router.huggingface.co/v1 and manage HF_TOKEN securely
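
One way to follow the last practice is to read HF_TOKEN from the environment and fail early when it is missing, rather than hard-coding it. The sketch below is one possible approach; require_hf_token is a hypothetical helper, and the env parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return HF_TOKEN from the environment, or raise with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it in your shell or secret manager "
            "instead of writing it into source code"
        )
    return token


# Demonstration with an explicit mapping; the value is a placeholder, not a real token.
token = require_hf_token({"HF_TOKEN": "hf_example_token"})
print(len(token) > 0)
```

The returned value can then be passed as api_key when constructing the OpenAI client.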

Example Use Cases

Autonomous procurement assistant that routes provider calls to compare supplier data and generate a purchase plan
Research agent that uses tools to fetch web results or data, then compiles a summarized report with sources
IT incident responder agent that queries dashboards and runs checks via integrated tools to triage issues
Knowledge-base builder that aggregates information from multiple providers to create a comprehensive answer set
Sub-agent orchestrator that sequences subtasks (data gathering, transformation, validation) with visible reasoning

Frequently Asked Questions
