
web-to-markdown

npx machina-cli add skill softaworks/agent-toolkit/web-to-markdown --openclaw

Convert web pages to clean Markdown by driving a locally installed browser (via web2md).

Hard trigger gate (must enforce)

This skill MUST NOT be used unless the user's request explicitly contains a phrase such as:

  • use the skill web-to-markdown ...
  • use a skill web-to-markdown ...

If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including the phrase: use the skill web-to-markdown.
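A minimal sketch of the gate, assuming the agent can pass the user's request to a POSIX shell function as a single string. The function name has_explicit_trigger is hypothetical, and the case-insensitive match is an assumption; the skill itself only names the two phrases.

```shell
# Hypothetical helper: succeeds (exit 0) only if the request contains
# "use the skill web-to-markdown" or "use a skill web-to-markdown".
# Case-insensitive matching is an assumption, not part of the skill spec.
has_explicit_trigger() {
  # Lower-case the request so "Use the skill ..." also matches.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *"use the skill web-to-markdown"*|*"use a skill web-to-markdown"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

If the check fails, the agent stops and asks the user to re-issue the request with the phrase instead of converting anything.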

What this skill does

  • Handles JS-rendered pages (Puppeteer drives the user's locally installed Chrome).
  • Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via puppeteer-core.
  • Extracts main content (Readability).
  • Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter.

Non-goals

  • Do not use Playwright or other browser automation stacks; the mechanism is web2md.

Inputs you should collect (ask only if missing)

  • url (or a list of URLs)
  • Output preference:
    • Print to stdout (--print), OR
    • Save to a file (--out ./file.md), OR
    • Save to a directory (--out ./some-dir/ to auto-name by page title)
  • Optional rendering controls for tricky pages:
    • --chrome-path <path> (if Chrome auto-detection fails)
    • --interactive (show Chrome and pause so the user can complete human checks/login, then press Enter)
    • --wait-until load|domcontentloaded|networkidle0|networkidle2
    • --wait-for '<css selector>'
    • --wait-ms <milliseconds>
    • --headful (show the browser window; useful for debugging)
    • --no-sandbox (sometimes required in containers/CI)
    • --user-data-dir <dir> (login/session; use a dedicated profile directory)
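One way to turn the collected inputs into a single command line is sketched below as a POSIX shell function. The function name run_web2md and the variable names (OUT, CHROME_PATH, USER_DATA_DIR, INTERACTIVE, HEADFUL, NO_SANDBOX, WAIT_UNTIL, WAIT_FOR, WAIT_MS) are hypothetical; only the flags themselves come from the list above.

```shell
# Hypothetical wrapper: turns the collected inputs into one web2md call.
# Variable names are illustrative assumptions; only the flags are from web2md.
run_web2md() {
  # Inside a function, "$@" belongs to that function, so it can be rebuilt
  # as the argument list, starting from the URL.
  set -- "$1"

  # Output mode: --out for a file or directory, otherwise print to stdout.
  if [ -n "${OUT:-}" ]; then
    set -- "$@" --out "$OUT"
  else
    set -- "$@" --print
  fi

  # Optional rendering controls, appended only when the matching input is set.
  if [ -n "${CHROME_PATH:-}" ]; then set -- "$@" --chrome-path "$CHROME_PATH"; fi
  if [ -n "${USER_DATA_DIR:-}" ]; then set -- "$@" --user-data-dir "$USER_DATA_DIR"; fi
  if [ "${INTERACTIVE:-0}" = 1 ]; then set -- "$@" --interactive; fi
  if [ "${HEADFUL:-0}" = 1 ]; then set -- "$@" --headful; fi
  if [ "${NO_SANDBOX:-0}" = 1 ]; then set -- "$@" --no-sandbox; fi
  if [ -n "${WAIT_UNTIL:-}" ]; then set -- "$@" --wait-until "$WAIT_UNTIL"; fi
  if [ -n "${WAIT_FOR:-}" ]; then set -- "$@" --wait-for "$WAIT_FOR"; fi
  if [ -n "${WAIT_MS:-}" ]; then set -- "$@" --wait-ms "$WAIT_MS"; fi

  web2md "$@"
}
```

For example, OUT=./out/ WAIT_UNTIL=networkidle2 run_web2md 'https://example.com' runs web2md 'https://example.com' --out ./out/ --wait-until networkidle2 (create ./out first with mkdir -p ./out). Rebuilding "$@" keeps each argument quoted correctly, so selectors and paths containing spaces survive intact.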

Workflow

  1. Confirm the user explicitly invoked the skill (use the skill web-to-markdown).
  2. Validate URL(s) start with http:// or https://.
  3. Ensure web2md is installed:
    • Run: command -v web2md
    • If missing, instruct the user to install it (assume the project exists at ~/workspace/softaworks/projects/web2md):
      • cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link
      • Or: cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm install -g .
  4. Convert:
    • Single URL → file:
      • web2md '<url>' --out ./page.md
    • Single URL → auto-named file in directory:
      • mkdir -p ./out && web2md '<url>' --out ./out/
    • Human verification / login walls (interactive):
      • mkdir -p ./out && web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/
      • Then: complete the check in the browser window and press Enter in the terminal to continue.
    • Print to stdout:
      • web2md '<url>' --print
    • Multiple URLs (batch):
      • Create an output directory (e.g. ./out/), then run one web2md command per URL using --out ./out/
  5. Validate output:
    • If writing files, verify they exist and are non-empty (e.g. ls -la <path> and wc -c <path>).
  6. Return:
    • The saved file path(s), or the Markdown (stdout mode).
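Steps 2 to 5 for a batch of URLs can be sketched as a few POSIX shell functions. The names is_http_url, require_web2md, check_output, and convert_all are hypothetical; only web2md and its --out flag come from the workflow above. Because directory output names files by page title, the sketch flags any empty file in the directory rather than checking specific names.

```shell
# Sketch of workflow steps 2-5 for a batch of URLs. Function names are
# hypothetical; only web2md and its --out flag come from the skill description.

# Step 2: accept only http:// or https:// URLs with something after the scheme.
is_http_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Step 3: stop with install instructions if web2md is not on PATH.
require_web2md() {
  if ! command -v web2md >/dev/null 2>&1; then
    echo "web2md not found: build and link it from ~/workspace/softaworks/projects/web2md" >&2
    return 1
  fi
}

# Step 5 (single file): the file must exist and be non-empty.
check_output() {
  [ -s "$1" ]
}

# Step 4: one web2md command per URL, all writing auto-named files into one directory.
convert_all() {
  out_dir=${1%/}
  shift
  require_web2md || return 1
  mkdir -p "$out_dir" || return 1
  for url in "$@"; do
    if ! is_http_url "$url"; then
      echo "skipping invalid URL: $url" >&2
      continue
    fi
    web2md "$url" --out "$out_dir/" || echo "conversion failed: $url" >&2
  done
  # Step 5 (directory): names come from page titles, so flag any empty file.
  empty=$(find "$out_dir" -type f -size 0)
  if [ -n "$empty" ]; then
    echo "empty output files: $empty" >&2
    return 1
  fi
  ls -la "$out_dir"
}
```

Usage: convert_all ./out 'https://example.com/a' 'https://example.com/b'. Invalid URLs are reported and skipped rather than aborting the whole batch.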

Defaults (recommended)

  • For most pages: --wait-until networkidle2
  • For heavy apps: start with --wait-until domcontentloaded --wait-ms 2000, then add --wait-for 'main' (or another stable selector) if needed.
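The escalation described in these defaults can be sketched as a retry helper. The function name convert_with_fallback and the policy of retrying only when the output file is missing or empty are assumptions, not documented web2md behaviour; "main" is used as the default selector only because the text suggests it.

```shell
# Hypothetical helper: escalate from the general default to the heavy-app
# settings only while the output file is missing or empty.
# The retry policy is an assumption, not documented web2md behaviour.
convert_with_fallback() {
  url=$1
  out=$2
  selector=${3:-main}  # stable selector to wait for; "main" is only a starting guess

  # Attempt 1: recommended default for most pages.
  if web2md "$url" --wait-until networkidle2 --out "$out" && [ -s "$out" ]; then
    return 0
  fi

  # Attempt 2: heavy apps, DOM ready plus a fixed 2-second delay.
  echo "retrying with domcontentloaded: $url" >&2
  if web2md "$url" --wait-until domcontentloaded --wait-ms 2000 --out "$out" && [ -s "$out" ]; then
    return 0
  fi

  # Attempt 3: additionally wait for a stable selector to appear.
  echo "retrying with --wait-for $selector: $url" >&2
  web2md "$url" --wait-until domcontentloaded --wait-ms 2000 --wait-for "$selector" --out "$out" && [ -s "$out" ]
}
```

Usage: convert_with_fallback 'https://example.com/app' ./out/app.md '#root'. The helper writes to an explicit file path rather than a directory so that the non-empty check knows which file to inspect.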

Source

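git clone https://github.com/softaworks/agent-toolkit
(the skill lives at skills/web-to-markdown/SKILL.md on the main branch: https://github.com/softaworks/agent-toolkit/blob/main/skills/web-to-markdown/SKILL.md)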

Overview

web-to-markdown converts web pages into clean Markdown by driving a local browser with web2md (Puppeteer, Readability, and Turndown). It extracts the main content and outputs Markdown with cleaned links and optional YAML frontmatter, which suits documentation and knowledge-base workflows.

How This Skill Works

web2md launches a local browser via Puppeteer, renders JS-heavy pages, uses Readability to extract the main article, and then uses Turndown to generate Markdown. It can print to stdout or save to a file or directory, with optional rendering controls and login support. Note: this skill activates only when the user explicitly issues the trigger phrase, e.g. 'use the skill web-to-markdown'.

When to Use It

  • You need a Markdown export of a JS-rendered page for docs or knowledge bases.
  • You must process pages where static HTML scraping would miss content due to client-side rendering.
  • You want to save output to a specific file, a directory (auto-named by page title), or print to stdout.
  • You anticipate login walls or interactive checks and need --interactive together with --user-data-dir.
  • You are batch-processing multiple URLs and organizing results in a dedicated output directory.

Quick Start

  1. Trigger the skill explicitly: use the skill web-to-markdown
  2. Provide a URL (http(s)://...) and choose an output mode (--print, --out ./file.md, or --out ./dir/).
  3. Run a command such as: mkdir -p ./out && web2md '<url>' --out ./out/ or web2md '<url>' --print

Best Practices

  • Always ensure the user explicitly invoked the skill with 'use the skill web-to-markdown' before proceeding.
  • Validate each URL starts with http:// or https:// before converting.
  • Prefer using Chromium-family browsers via puppeteer-core for best results.
  • Start with --wait-until networkidle2 by default; adjust for heavy apps if needed.
  • After conversion, verify the output file exists and is non-empty.

Example Use Cases

  • Convert a JS-heavy product page to Markdown and save as ./out/product.md.
  • Batch convert several blog posts and store each as an auto-named file in ./out/.
  • Preview a page in Markdown by running web2md '<url>' --print and copying the result.
  • Access a page behind a login wall using --interactive and --user-data-dir, then save to ./out/.
  • Integrate into a docs workflow by exporting multiple pages to a repository's docs/ directory.
