Tech News Digest
Verified @dinstein
npx machina-cli add skill @dinstein/tech-news-digest --openclaw
Automated tech news digest system with unified data source model, quality scoring pipeline, and template-based output generation.
Quick Start
- Configuration Setup: Default configs are in config/defaults/. Copy to workspace for customization:
  mkdir -p workspace/config
  cp config/defaults/sources.json workspace/config/
  cp config/defaults/topics.json workspace/config/
- Environment Variables:
  - X_BEARER_TOKEN - Twitter API bearer token (optional)
  - BRAVE_API_KEY - Brave Search API key (optional)
  - GITHUB_TOKEN - GitHub personal access token (optional, improves rate limits)
- Generate Digest:
  # Unified pipeline (recommended) — runs all 5 sources in parallel + merge
  python3 scripts/run-pipeline.py \
    --defaults config/defaults \
    --config workspace/config \
    --hours 48 --freshness pd \
    --archive-dir workspace/archive/tech-news-digest/ \
    --output /tmp/td-merged.json --verbose --force
- Use Templates: Apply the Discord, email, or markdown templates to the merged output
Configuration Files
sources.json - Unified Data Sources
{
"sources": [
{
"id": "openai-rss",
"type": "rss",
"name": "OpenAI Blog",
"url": "https://openai.com/blog/rss.xml",
"enabled": true,
"priority": true,
"topics": ["llm", "ai-agent"],
"note": "Official OpenAI updates"
},
{
"id": "sama-twitter",
"type": "twitter",
"name": "Sam Altman",
"handle": "sama",
"enabled": true,
"priority": true,
"topics": ["llm", "frontier-tech"],
"note": "OpenAI CEO"
}
]
}
topics.json - Enhanced Topic Definitions
{
"topics": [
{
"id": "llm",
"emoji": "🧠",
"label": "LLM / Large Models",
"description": "Large Language Models, foundation models, breakthroughs",
"search": {
"queries": ["LLM latest news", "large language model breakthroughs"],
"must_include": ["LLM", "large language model", "foundation model"],
"exclude": ["tutorial", "beginner guide"]
},
"display": {
"max_items": 8,
"style": "detailed"
}
}
]
}
Scripts Pipeline
run-pipeline.py - Unified Pipeline (Recommended)
python3 scripts/run-pipeline.py \
--defaults config/defaults [--config CONFIG_DIR] \
--hours 48 --freshness pd \
--archive-dir workspace/archive/tech-news-digest/ \
--output /tmp/td-merged.json --verbose --force
- Features: Runs all 5 fetch steps in parallel, then merges + deduplicates + scores
- Output: Final merged JSON ready for report generation (~30s total)
- Metadata: Saves per-step timing and counts to *.meta.json
- GitHub Auth: Auto-generates a GitHub App token if $GITHUB_TOKEN is not set
- Fallback: If the pipeline fails, run the individual scripts below
Individual Scripts (Fallback)
fetch-rss.py - RSS Feed Fetcher
python3 scripts/fetch-rss.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE] [--verbose]
- Parallel fetching (10 workers), retry with backoff, feedparser + regex fallback
- Timeout: 30s per feed, ETag/Last-Modified caching
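The script itself is more involved, but here is a minimal sketch of the parallel-fetch pattern, assuming an illustrative feed list and a simplified item shape (no ETag caching, retry, or feedparser path shown):

# Sketch: fetch RSS feeds in parallel with a per-feed timeout; one bad feed never aborts the run.
import concurrent.futures, urllib.request, xml.etree.ElementTree as ET

FEEDS = ["https://openai.com/blog/rss.xml"]   # normally loaded from sources.json
TIMEOUT = 30        # seconds per feed
MAX_WORKERS = 10    # parallel workers

def fetch_feed(url):
    with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
        root = ET.fromstring(resp.read())
    # Keep only titles and links; real parsing handles Atom, namespaces, dates, etc.
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    futures = {pool.submit(fetch_feed, url): url for url in FEEDS}
    for fut in concurrent.futures.as_completed(futures):
        url = futures[fut]
        try:
            print(url, len(fut.result()), "items")
        except Exception as exc:
            print(url, "failed:", exc)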
fetch-twitter.py - Twitter/X KOL Monitor
python3 scripts/fetch-twitter.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]
- Requires X_BEARER_TOKEN; includes rate limit handling and engagement metrics
fetch-web.py - Web Search Engine
python3 scripts/fetch-web.py [--defaults DIR] [--config DIR] [--freshness pd] [--output FILE]
- Auto-detects Brave API rate limit: paid plans → parallel queries, free → sequential
- Without API: generates search interface for agents
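For reference, a hedged sketch of a single web query with a freshness filter; the endpoint and header follow Brave's public Web Search API, but the script's actual query handling, pagination, and rate-limit detection may differ:

# Sketch: one Brave Search query with freshness="pd" (past day); response fields accessed defensively.
import json, os, urllib.parse, urllib.request

query = "large language model breakthroughs"   # normally taken from topics.json
params = urllib.parse.urlencode({"q": query, "freshness": "pd", "count": 10})
req = urllib.request.Request(
    "https://api.search.brave.com/res/v1/web/search?" + params,
    headers={"X-Subscription-Token": os.environ["BRAVE_API_KEY"],
             "Accept": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
    results = json.load(resp).get("web", {}).get("results", [])
for r in results:
    print(r.get("title"), "-", r.get("url"))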
fetch-github.py - GitHub Releases Monitor
python3 scripts/fetch-github.py [--defaults DIR] [--config DIR] [--hours 168] [--output FILE]
- Parallel fetching (10 workers), 30s timeout
- Auth priority: $GITHUB_TOKEN → GitHub App auto-generate → gh CLI → unauthenticated (60 req/hr)
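A simplified sketch of that cascade; the GitHub App step is only stubbed here (the real script signs a JWT with openssl and exchanges it for an installation token):

# Sketch: resolve a GitHub token in priority order; None means unauthenticated (60 req/hr).
import os, shutil, subprocess

def resolve_github_token():
    token = os.environ.get("GITHUB_TOKEN")        # 1. explicit env var always wins
    if token:
        return token
    # 2. GitHub App token generation would go here (requires GH_APP_* env vars)
    if shutil.which("gh"):                        # 3. gh CLI's own credential store
        out = subprocess.run(["gh", "auth", "token"], capture_output=True, text=True)
        if out.returncode == 0 and out.stdout.strip():
            return out.stdout.strip()
    return None                                   # 4. unauthenticated fallback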
fetch-reddit.py - Reddit Posts Fetcher
python3 scripts/fetch-reddit.py [--defaults DIR] [--config DIR] [--hours 48] [--output FILE]
- Parallel fetching (4 workers), public JSON API (no auth required)
- 13 subreddits with score filtering
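A minimal sketch of pulling recent posts from one subreddit via the public JSON endpoint and filtering by score (the subreddit, threshold, and User-Agent string are placeholders):

# Sketch: fetch recent posts from a subreddit's public JSON feed and keep high-score ones.
import json, urllib.request

SUBREDDIT = "MachineLearning"   # normally one of the 13 configured subreddits
MIN_SCORE = 50                  # illustrative threshold

req = urllib.request.Request(
    "https://www.reddit.com/r/%s/new.json?limit=100" % SUBREDDIT,
    headers={"User-Agent": "tech-news-digest/1.0"},  # Reddit rejects default UAs
)
with urllib.request.urlopen(req, timeout=30) as resp:
    posts = [child["data"] for child in json.load(resp)["data"]["children"]]

for post in posts:
    if post.get("score", 0) >= MIN_SCORE:
        print(post["score"], post["title"], post["permalink"])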
merge-sources.py - Quality Scoring & Deduplication
python3 scripts/merge-sources.py --rss FILE --twitter FILE --web FILE --github FILE --reddit FILE
- Quality scoring, title similarity dedup (85%), previous digest penalty
- Output: topic-grouped articles sorted by score
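A sketch of the title-similarity idea behind the 85% deduplication threshold, using the standard library's difflib; the real script also weighs quality scores and penalizes titles seen in previous digests:

# Sketch: drop articles whose titles are >= 85% similar to one already kept.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def dedupe(articles):   # articles: list of dicts with at least a "title" key
    kept = []
    for article in articles:
        if not any(similar(article["title"], k["title"]) for k in kept):
            kept.append(article)
    return kept

print(dedupe([{"title": "OpenAI releases a new model"},
              {"title": "OpenAI Releases A New Model!"},
              {"title": "vLLM ships a new version"}]))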
validate-config.py - Configuration Validator
python3 scripts/validate-config.py [--defaults DIR] [--config DIR] [--verbose]
- JSON schema validation, topic reference checks, duplicate ID detection
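Beyond JSON Schema, a sketch of the two structural checks, duplicate IDs and dangling topic references, assuming the config shapes shown earlier:

# Sketch: flag duplicate source IDs and topic tags that no topic defines.
import json

sources = json.load(open("config/defaults/sources.json"))["sources"]
topics = json.load(open("config/defaults/topics.json"))["topics"]

topic_ids = {t["id"] for t in topics}
seen = set()
for src in sources:
    if src["id"] in seen:
        print("duplicate source id:", src["id"])
    seen.add(src["id"])
    for tag in src.get("topics", []):
        if tag not in topic_ids:
            print("source %s references unknown topic: %s" % (src["id"], tag))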
User Customization
Workspace Configuration Override
Place custom configs in workspace/config/ to override defaults:
- Sources: Append new sources, disable defaults with "enabled": false
- Topics: Override topic definitions, search queries, display settings
- Merge Logic (see the sketch below):
  - Sources with the same id → user version takes precedence
  - Sources with a new id → appended to defaults
  - Topics with the same id → user version completely replaces the default
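A sketch of that precedence for sources, assuming both files use the {"sources": [...]} shape shown above; whether a user source fully replaces the default or is field-merged is an implementation detail, and this sketch does a full replace by id:

# Sketch: overlay workspace sources on defaults; same id replaces, new id appends.
import json

def merge_sources(defaults_path, workspace_path):
    defaults = json.load(open(defaults_path))["sources"]
    overrides = json.load(open(workspace_path))["sources"]
    merged = {src["id"]: src for src in defaults}   # dict preserves default order
    for src in overrides:
        merged[src["id"]] = src                     # replace existing or append new
    return list(merged.values())

# Topics follow the same id-based rule, except an override replaces the topic wholesale.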
Example Workspace Override
// workspace/config/sources.json
{
"sources": [
{
"id": "simonwillison-rss",
"enabled": false,
"note": "Disabled: too noisy for my use case"
},
{
"id": "my-custom-blog",
"type": "rss",
"name": "My Custom Tech Blog",
"url": "https://myblog.com/rss",
"enabled": true,
"priority": true,
"topics": ["frontier-tech"]
}
]
}
Templates & Output
Discord Template (references/templates/discord.md)
- Bullet list format with link suppression (<link>)
- Mobile-optimized, emoji headers
- 2000 character limit awareness (see the chunking sketch below)
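A sketch of staying under that cap by splitting the rendered digest on line boundaries; the actual limit handling lives in the prompt and template, so treat this as illustrative:

# Sketch: split rendered text into <= 2000-character Discord messages on line boundaries.
# Assumes no single line exceeds the limit.
def chunk_for_discord(text, limit=2000):
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

parts = chunk_for_discord("🧠 LLM / Large Models\n- item <https://example.com>\n" * 200)
assert all(len(p) <= 2000 for p in parts)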
Email Template (references/templates/email.md)
- Rich metadata, technical stats, archive links
- Executive summary, top articles section
- HTML-compatible formatting
Markdown Template (references/templates/markdown.md)
- GitHub-compatible tables and formatting
- Technical details section
- Expandable sections support
Default Sources (133 total)
- RSS Feeds (49): AI labs, tech blogs, crypto news, Chinese tech media
- Twitter/X KOLs (49): AI researchers, crypto leaders, tech executives
- GitHub Repos (22): Major open-source projects (LangChain, vLLM, DeepSeek, Llama, etc.)
- Reddit (13): r/MachineLearning, r/LocalLLaMA, r/CryptoCurrency, r/ChatGPT, r/OpenAI, etc.
- Web Search (4 topics): LLM, AI Agent, Crypto, Frontier Tech
All sources pre-configured with appropriate topic tags and priority levels.
Dependencies
pip install -r requirements.txt
Optional but Recommended:
- feedparser>=6.0.0 - Better RSS parsing (fallback to regex if unavailable)
- jsonschema>=4.0.0 - Configuration validation
All scripts work with Python 3.8+ standard library only.
Monitoring & Operations
Health Checks
# Validate configuration
python3 scripts/validate-config.py --verbose
# Test RSS feeds
python3 scripts/fetch-rss.py --hours 1 --verbose
# Check Twitter API
python3 scripts/fetch-twitter.py --hours 1 --verbose
Archive Management
- Digests automatically archived to <workspace>/archive/tech-news-digest/
- Previous digest titles used for duplicate detection
- Old archives cleaned automatically (90+ days; see the cleanup sketch below)
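A sketch of the 90-day cleanup policy, assuming archives are plain files directly under the directory above:

# Sketch: delete archived digests older than 90 days (layout is illustrative).
import pathlib, time

ARCHIVE_DIR = pathlib.Path("workspace/archive/tech-news-digest")
cutoff = time.time() - 90 * 24 * 3600

for path in ARCHIVE_DIR.glob("*"):
    if path.is_file() and path.stat().st_mtime < cutoff:
        path.unlink()
        print("removed", path)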
Error Handling
- Network Failures: Retry with exponential backoff
- Rate Limits: Automatic retry with appropriate delays
- Invalid Content: Graceful degradation, detailed logging
- Configuration Errors: Schema validation with helpful messages
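A generic sketch of the retry-with-backoff pattern the fetchers use; attempt counts and delays here are illustrative, not the scripts' actual constants:

# Sketch: retry a flaky fetch with exponential backoff; re-raise after the last attempt.
import time, urllib.request

def fetch_with_retry(url, attempts=3, base_delay=2.0, timeout=30):
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))   # 2s, 4s, 8s, ...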
API Keys & Environment
Set in ~/.zshenv or similar:
export X_BEARER_TOKEN="your_twitter_bearer_token"
export BRAVE_API_KEY="your_brave_search_api_key" # Optional
- Twitter: Read-only bearer token, pay-per-use pricing
- Brave Search: Optional, fallback to agent web_search if unavailable
Cron / Scheduled Task Integration
OpenClaw Cron (Recommended)
The cron prompt should NOT hardcode the pipeline steps. Instead, reference references/digest-prompt.md and only pass configuration parameters. This ensures the pipeline logic stays in the skill repo and is consistent across all installations.
Daily Digest Cron Prompt
Read <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a daily digest.
Replace placeholders with:
- MODE = daily
- TIME_WINDOW = past 1-2 days
- FRESHNESS = pd
- RSS_HOURS = 48
- ITEMS_PER_SECTION = 3-5
- BLOG_PICKS_COUNT = 2-3
- EXTRA_SECTIONS = (none)
- SUBJECT = Daily Tech Digest - YYYY-MM-DD
- WORKSPACE = <your workspace path>
- SKILL_DIR = <your skill install path>
- DISCORD_CHANNEL_ID = <your channel id>
- EMAIL = (optional)
- LANGUAGE = English
- TEMPLATE = discord
Follow every step in the prompt template strictly. Do not skip any steps.
Weekly Digest Cron Prompt
Read <SKILL_DIR>/references/digest-prompt.md and follow the complete workflow to generate a weekly digest.
Replace placeholders with:
- MODE = weekly
- TIME_WINDOW = past 7 days
- FRESHNESS = pw
- RSS_HOURS = 168
- ITEMS_PER_SECTION = 5-8
- BLOG_PICKS_COUNT = 3-5
- EXTRA_SECTIONS = 📊 Weekly Trend Summary (2-3 sentences summarizing macro trends)
- SUBJECT = Weekly Tech Digest - YYYY-MM-DD
- WORKSPACE = <your workspace path>
- SKILL_DIR = <your skill install path>
- DISCORD_CHANNEL_ID = <your channel id>
- EMAIL = (optional)
- LANGUAGE = English
- TEMPLATE = discord
Follow every step in the prompt template strictly. Do not skip any steps.
Why This Pattern?
- Single source of truth: Pipeline logic lives in digest-prompt.md, not scattered across cron configs
- Portable: Same skill on different OpenClaw instances, just change paths and channel IDs
- Maintainable: Update the skill → all cron jobs pick up changes automatically
- Anti-pattern: Do NOT copy pipeline steps into the cron prompt — it will drift out of sync
Multi-Channel Delivery Limitation
OpenClaw enforces cross-provider isolation: a single session can only send messages to one provider (e.g., Discord OR Telegram, not both). If you need to deliver digests to multiple platforms, create separate cron jobs for each provider:
# Job 1: Discord + Email
- DISCORD_CHANNEL_ID = <your-discord-channel-id>
- EMAIL = user@example.com
- TEMPLATE = discord
# Job 2: Telegram DM
- DISCORD_CHANNEL_ID = (none)
- EMAIL = (none)
- TEMPLATE = telegram
Replace DISCORD_CHANNEL_ID delivery with Telegram delivery in the second job's prompt (use message tool with channel=telegram).
This is a security feature, not a bug — it prevents accidental cross-context data leakage.
Security Notes
Execution Model
This skill uses a prompt template pattern: the agent reads digest-prompt.md and follows its instructions. This is the standard OpenClaw skill execution model — the agent interprets structured instructions from skill-provided files. All instructions are shipped with the skill bundle and can be audited before installation.
Network Access
The Python scripts make outbound requests to:
- RSS feed URLs (configured in sources.json)
- Twitter/X API (api.x.com)
- Brave Search API (api.search.brave.com)
- GitHub API (api.github.com)
- Reddit public JSON API (reddit.com)
No data is sent to any other endpoints. All API keys are read from environment variables declared in the skill metadata.
Shell Safety
Email delivery uses the gog CLI with hardcoded subject formats (Daily Tech Digest - YYYY-MM-DD). The prompt template explicitly prohibits interpolating untrusted content into shell arguments.
File Access
Scripts read from config/ and write to workspace/archive/. No files outside the workspace are accessed.
Support & Troubleshooting
Common Issues
- RSS feeds failing: Check network connectivity, use --verbose for details
- Twitter rate limits: Reduce sources or increase the interval
- Configuration errors: Run validate-config.py for specific issues
- No articles found: Check the time window (--hours) and source enablement
Debug Mode
All scripts support --verbose flag for detailed logging and troubleshooting.
Performance Tuning
- Parallel Workers: Adjust MAX_WORKERS in scripts for your system
- Timeout Settings: Increase TIMEOUT for slow networks
- Article Limits: Adjust MAX_ARTICLES_PER_FEED based on needs
Security Considerations
Shell Execution
The digest prompt instructs agents to run Python scripts via shell commands. All script paths and arguments are skill-defined constants — no user input is interpolated into commands. Two scripts use subprocess:
- run-pipeline.py orchestrates child fetch scripts (all within the scripts/ directory)
- fetch-github.py has two subprocess calls:
  - openssl dgst -sha256 -sign for JWT signing (only if GH_APP_* env vars are set — signs a self-constructed JWT payload, no user content involved)
  - gh auth token CLI fallback (only if gh is installed — reads from gh's own credential store)
No user-supplied or fetched content is ever interpolated into subprocess arguments. Email delivery writes HTML to a temp file before passing to gog CLI, avoiding shell interpolation. Email subjects are static format strings only.
Credential & File Access
Scripts do not directly read ~/.config/, ~/.ssh/, or any credential files. All API tokens are read from environment variables declared in the skill metadata. The GitHub auth cascade is:
1. $GITHUB_TOKEN env var (you control what to provide)
2. GitHub App token generation (only if you set GH_APP_ID, GH_APP_INSTALL_ID, and GH_APP_KEY_FILE — uses inline JWT signing via the openssl CLI, no external scripts involved)
3. gh auth token CLI (delegates to gh's own secure credential store)
4. Unauthenticated (60 req/hr, safe fallback)
If you prefer no automatic credential discovery, simply set $GITHUB_TOKEN and the script will use it directly without attempting steps 2-3.
Dependency Installation
This skill does not install any packages. requirements.txt lists optional dependencies (feedparser, jsonschema) for reference only. All scripts work with Python 3.8+ standard library. Users should install optional deps in a virtualenv if desired — the skill never runs pip install.
Input Sanitization
- URL resolution rejects non-HTTP(S) schemes (javascript:, data:, etc.)
- RSS fallback parsing uses simple, non-backtracking regex patterns (no ReDoS risk)
- All fetched content is treated as untrusted data for display only
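A minimal sketch of the scheme check described in the first bullet:

# Sketch: accept only http/https URLs; reject javascript:, data:, file:, etc.
from urllib.parse import urlparse

def is_safe_url(url):
    return urlparse(url).scheme.lower() in ("http", "https")

assert is_safe_url("https://openai.com/blog/rss.xml")
assert not is_safe_url("javascript:alert(1)")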
Network Access
Scripts make outbound HTTP requests to configured RSS feeds, Twitter API, GitHub API, Reddit JSON API, and Brave Search API. No inbound connections or listeners are created.
Overview
Tech News Digest automates the creation of a unified tech news brief by aggregating five data layers: RSS feeds, Twitter/X KOLs, GitHub releases, Reddit, and web search. It applies a quality scoring pipeline, deduplication, and template-driven output to deliver consistent, publish-ready digests for Discord, email, or Markdown.
How This Skill Works
The system runs a pipeline that ingests content from five sources, scores quality, deduplicates, and merges items into a single digest. It then formats the merged results into template outputs for Discord, email, or Markdown, ready for distribution.
When to Use It
- Daily or weekly digest for engineering and product teams to track AI, tooling, and platform updates
- Executive briefs highlighting top trends, releases, or strategic moves from multiple sources
- Developer relations or community teams generating newsletters for audiences on Discord or email
- Open source and GitHub focused monitoring for releases and project activity
- Reddit discussions and influencer threads surfaced in a concise digest for quick skim
Quick Start
- Step 1: Copy default configs to workspace/config and customize sources.json and topics.json
- Step 2: Set environment variables for API access (e.g. X_BEARER_TOKEN, BRAVE_API_KEY, GITHUB_TOKEN) as needed
- Step 3: Run the unified pipeline to generate merged digests and apply templates
Best Practices
- Curate a stable sources.json to reduce noise; prefer high-signal feeds
- Enable deduplication and set a freshness window to prevent old stories
- Tune quality scoring thresholds to match audience needs and review top stories occasionally
- Use relevant topics and tags to improve search and filtering
- Archive past digests to maintain context and improve dedup over time
Example Use Cases
- Daily AI and ML digest for an engineering team with model releases and tooling notes
- Weekly executive digest highlighting top stories and strategic implications
- Discord newsletter digest for developer relations with notable threads and discussions
- GitHub releases roundup for platform teams monitoring new versions and features
- Reddit roundup summarizing community sentiment and hot threads