web-to-markdown
npx machina-cli add skill softaworks/agent-toolkit/web-to-markdown --openclaw
Convert web pages to clean Markdown by driving a locally installed browser (via web2md).
Hard trigger gate (must enforce)
This skill MUST NOT be used unless the user explicitly wrote a phrase like:
- use the skill web-to-markdown ...
- use a skill web-to-markdown ...
If the user did not explicitly request this skill by name, stop and ask them to re-issue the request including: use the skill web-to-markdown.
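The gate above can be sketched as a small shell check. This is only an illustration: the function names `has_trigger_phrase` and `gate_or_ask` are hypothetical, and the case-sensitive substring match is an assumption about how strictly the phrase should be matched.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the message contains a trigger phrase.
# The match is a case-sensitive substring test (an assumption, not part of web2md).
has_trigger_phrase() {
  case "$1" in
    *"use the skill web-to-markdown"* | *"use a skill web-to-markdown"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: either proceed or ask the user to re-issue the request.
gate_or_ask() {
  if has_trigger_phrase "$1"; then
    echo "proceed"
  else
    echo "Please re-issue the request including: use the skill web-to-markdown"
  fi
}
```

Calling `gate_or_ask "$user_message"` before any other step keeps the skill from running on requests that never named it.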
What this skill does
- Handles JS-rendered pages (Puppeteer → user Chrome).
- Works best with Chromium-family browsers (Chrome/Chromium/Brave/Edge) via puppeteer-core.
- Extracts main content (Readability).
- Converts to Markdown (Turndown) with cleaned links and optional YAML frontmatter.
Non-goals
- Do not use Playwright or other browser automation stacks; the mechanism is web2md.
Inputs you should collect (ask only if missing)
- url (or a list of URLs)
- Output preference:
  - Print to stdout (--print), OR
  - Save to a file (--out ./file.md), OR
  - Save to a directory (--out ./some-dir/ to auto-name by page title)
- Optional rendering controls for tricky pages:
  - --chrome-path <path> (if Chrome auto-detection fails)
  - --interactive (show Chrome and pause so the user can complete human checks/login, then press Enter)
  - --wait-until load|domcontentloaded|networkidle0|networkidle2
  - --wait-for '<css selector>'
  - --wait-ms <milliseconds>
  - --headful (debug)
  - --no-sandbox (sometimes required in containers/CI)
  - --user-data-dir <dir> (login/session; use a dedicated profile directory)
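As a rough sketch, the three output preferences above could be mapped to flags like this. The helper `output_flags` and its mode names (`stdout`, `file`, `dir`) are hypothetical, and it assumes that the trailing slash is what marks `--out` as a directory target, as the examples in this document suggest.

```shell
#!/bin/sh
# Hypothetical helper: print the web2md output flags, one argument per line.
#   output_flags stdout
#   output_flags file ./page.md
#   output_flags dir  ./out
output_flags() {
  mode=$1
  target=${2:-}
  case "$mode" in
    stdout)
      printf '%s\n' --print
      ;;
    file | dir)
      if [ -z "$target" ]; then
        echo "output mode '$mode' needs a path" >&2
        return 1
      fi
      if [ "$mode" = dir ]; then
        # Assumption: a trailing slash selects directory mode; add it if missing.
        target="${target%/}/"
      fi
      printf '%s\n' --out "$target"
      ;;
    *)
      echo "unknown output mode: $mode" >&2
      return 1
      ;;
  esac
}
```

Printing one argument per line keeps paths with spaces intact when the caller reads the flags back.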
Workflow
- Confirm the user explicitly invoked the skill (use the skill web-to-markdown).
- Validate URL(s) start with http:// or https://.
- Ensure web2md is installed:
  - Run: command -v web2md
  - If missing, instruct the user to install it (assume the project exists at ~/workspace/softaworks/projects/web2md):
    - cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm link
    - Or: cd ~/workspace/softaworks/projects/web2md && npm install && npm run build && npm install -g .
- Convert:
  - Single URL → file: web2md '<url>' --out ./page.md
  - Single URL → auto-named file in directory: mkdir -p ./out && web2md '<url>' --out ./out/
  - Human verification / login walls (interactive): mkdir -p ./out && web2md '<url>' --interactive --user-data-dir ./tmp/web2md-profile --out ./out/
    - Then: complete the check in the browser window and press Enter in the terminal to continue.
  - Print to stdout: web2md '<url>' --print
  - Multiple URLs (batch): create an output dir (e.g. ./out/), then run one web2md command per URL using --out ./out/
- Validate output:
  - If writing files, verify they exist and are non-empty (e.g. ls -la <path> and wc -c <path>).
- Return:
  - The saved file path(s), or the Markdown (stdout mode).
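The batch path through this workflow might look like the sketch below. The function names (`is_http_url`, `is_nonempty_file`, `convert_batch`) are hypothetical, and the final check assumes auto-named outputs end in `.md`, which the source does not state explicitly.

```shell
#!/bin/sh
# Succeed only for URLs that start with http:// or https://.
is_http_url() {
  case "$1" in
    http://* | https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed only if the file exists and is non-empty.
is_nonempty_file() {
  [ -s "$1" ]
}

# Hypothetical wrapper: convert each URL into an auto-named file under $1.
# Usage: convert_batch ./out 'https://example.com/a' 'https://example.com/b'
convert_batch() {
  out_dir=${1%/}
  shift
  if ! command -v web2md >/dev/null 2>&1; then
    echo "web2md not found; install it first" >&2
    return 127
  fi
  mkdir -p "$out_dir" || return 1
  status=0
  for url in "$@"; do
    if ! is_http_url "$url"; then
      echo "skipping (not http/https): $url" >&2
      status=1
      continue
    fi
    # One web2md invocation per URL, as the workflow describes.
    web2md "$url" --out "$out_dir/" || status=1
  done
  # Assumption: outputs end in .md. Print each file's size and flag empty ones.
  for f in "$out_dir"/*.md; do
    [ -e "$f" ] || continue
    if is_nonempty_file "$f"; then
      wc -c "$f"
    else
      echo "empty output: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

The wrapper keeps going after a failed URL and reports a non-zero status at the end, so one bad page does not hide the results for the rest of the batch.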
Defaults (recommended)
- For most pages: --wait-until networkidle2
- For heavy apps: start with --wait-until domcontentloaded --wait-ms 2000, then add --wait-for 'main' (or another stable selector) if needed.
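One way to encode these defaults is a helper that prints the wait flags for a given page profile. This is a minimal sketch: the function `wait_flags` and the profile names `default` and `heavy` are made up for illustration; only the flags themselves come from the recommendations above.

```shell
#!/bin/sh
# Hypothetical helper: print recommended wait flags, one argument per line.
#   wait_flags default        most pages
#   wait_flags heavy          heavy apps, first attempt
#   wait_flags heavy 'main'   heavy apps, also wait for a stable selector
wait_flags() {
  profile=${1:-default}
  selector=${2:-}
  case "$profile" in
    default)
      printf '%s\n' --wait-until networkidle2
      ;;
    heavy)
      printf '%s\n' --wait-until domcontentloaded --wait-ms 2000
      if [ -n "$selector" ]; then
        printf '%s\n' --wait-for "$selector"
      fi
      ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1
      ;;
  esac
}
```

Starting with `wait_flags heavy` and adding a selector only when the first attempt misses content mirrors the escalation described above.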
Source
https://github.com/softaworks/agent-toolkit/blob/main/skills/web-to-markdown/SKILL.md
Overview
web-to-markdown converts web pages into clean Markdown by driving a local browser with web2md (Puppeteer + Readability). It extracts the main content and outputs Markdown with cleaned links and optional YAML frontmatter, making it ideal for docs and knowledge bases.
How This Skill Works
Technically, it launches a local browser via Puppeteer, renders JS-heavy pages, uses Readability to pull the main article, then Turndown to generate Markdown. It can output to stdout or save to a file or directory, with optional rendering controls and login support. Note: this skill activates only when the user explicitly issues the trigger phrase, e.g., 'use the skill web-to-markdown'.
When to Use It
- You need a Markdown export of a JS-rendered page for docs or knowledge bases.
- You must process pages where static HTML scraping would miss content due to client-side rendering.
- You want to save output to a specific file, a directory (auto-named by page title), or print to stdout.
- You anticipate login walls or interactive checks and need to use --interactive and a user-data-dir.
- You are batch-processing multiple URLs and organizing results in a dedicated output directory.
Quick Start
- Step 1: Ensure you trigger the skill explicitly: use the skill web-to-markdown
- Step 2: Provide a URL (http(s)://...) and choose an output mode (--print, --out ./file.md, or --out ./dir/)
- Step 3: Run a command such as: web2md '<url>' --out ./out/ or web2md '<url>' --print
Best Practices
- Always ensure the user explicitly invoked the skill with 'use the skill web-to-markdown' before proceeding.
- Validate each URL starts with http:// or https:// before converting.
- Prefer using Chromium-family browsers via puppeteer-core for best results.
- Start with --wait-until networkidle2 by default; adjust for heavy apps if needed.
- After conversion, verify the output file exists and is non-empty.
Example Use Cases
- Convert a JS-heavy product page to Markdown and save as ./out/product.md.
- Batch convert several blog posts and store each as an auto-named file in ./out/.
- Preview a page in Markdown by running web2md '<url>' --print and copying the result.
- Access a page behind a login wall using --interactive and --user-data-dir, then save to ./out/.
- Integrate into a docs workflow by exporting multiple pages to a repository's docs/ directory.