email-receipt-scanning

npx machina-cli add skill peerjakobsen/smartspender/email-receipt-scanning --openclaw

Email Receipt Scanning

Purpose

Provides Gmail search queries, Danish sender patterns, email content type detection, and deduplication rules for scanning a user's inbox for receipts and invoices. Used by the /smartspender:receipt email command.

Prerequisites

This skill requires the Gmail MCP server to be configured in the user's environment. The following MCP tools are needed:

  • gmail_search or equivalent — search emails by query
  • gmail_get_message or equivalent — read email content and metadata
  • gmail_get_attachment or equivalent — download PDF attachments

If Gmail MCP is not available, commands using this skill should fail gracefully with a clear message.
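
A graceful failure could start with a preflight check like the sketch below. It assumes the calling command can obtain a list of available tool names; the function names and the message wording are hypothetical, and because the tools may be exposed under equivalent names, the required list would need adjusting per environment.

```python
# Tool names as listed in this skill; an environment may expose equivalents.
REQUIRED_TOOLS = ("gmail_search", "gmail_get_message", "gmail_get_attachment")


def missing_gmail_tools(available_tools):
    """Return the required tool names that are absent from available_tools."""
    available = set(available_tools)
    return [name for name in REQUIRED_TOOLS if name not in available]


def gmail_preflight_message(available_tools):
    """Return None when every tool is present, else a user-facing message."""
    missing = missing_gmail_tools(available_tools)
    if not missing:
        return None
    return "Gmail MCP is not available; missing tools: " + ", ".join(missing)
```

A command would call `gmail_preflight_message` before any search and stop with the returned text if it is not `None`.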

Gmail Search Queries

Primary Search Query

Search for receipt and invoice emails using this combined query:

subject:(faktura OR kvittering OR receipt OR invoice OR ordre OR order OR betaling) has:attachment after:{YYYY/MM/DD}

Where {YYYY/MM/DD} is either:

  • The last_email_scan timestamp from settings.csv (incremental scan)
  • A calculated date based on the days argument (e.g., 30 days back)
  • Default: 90 days back if no previous scan and no argument
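
As an illustration, the primary query can be assembled from a start date as sketched here. The function and constant names are hypothetical; only the query text itself comes from this skill.

```python
from datetime import date

# Subject keywords from the primary search query (Danish and English).
SUBJECT_KEYWORDS = (
    "faktura", "kvittering", "receipt", "invoice", "ordre", "order", "betaling",
)


def build_primary_query(after):
    """Build the primary Gmail query for emails received after the given date."""
    subject = " OR ".join(SUBJECT_KEYWORDS)
    # Gmail's after: operator expects the date as YYYY/MM/DD.
    return f"subject:({subject}) has:attachment after:{after:%Y/%m/%d}"
```

For example, `build_primary_query(date(2026, 1, 15))` produces a query ending in `after:2026/01/15`.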

Supplementary Queries

If the primary query returns few results, also try:

from:(*faktura* OR *invoice* OR *noreply* OR *no-reply*) has:attachment after:{YYYY/MM/DD}
subject:(ordrebekraeftelse OR orderbekraeftelse OR betalingsbekraeftelse) after:{YYYY/MM/DD}
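
One way to wire the fallback is sketched below. The skill does not define what counts as "few results", so the `min_results` threshold of 3 is an assumption, and `search` is a hypothetical callable that takes a query string and returns message IDs (for instance a thin wrapper around gmail_search).

```python
from datetime import date

# Supplementary queries from this skill, with the date left as a placeholder.
SUPPLEMENTARY_TEMPLATES = (
    "from:(*faktura* OR *invoice* OR *noreply* OR *no-reply*) has:attachment after:{after}",
    "subject:(ordrebekraeftelse OR orderbekraeftelse OR betalingsbekraeftelse) after:{after}",
)


def search_with_fallback(search, primary_query, after, min_results=3):
    """Run the primary query; add supplementary results if it returns few hits."""
    # dict.fromkeys removes repeated IDs while keeping their order.
    found = list(dict.fromkeys(search(primary_query)))
    if len(found) >= min_results:
        return found
    stamp = after.strftime("%Y/%m/%d")
    for template in SUPPLEMENTARY_TEMPLATES:
        for message_id in search(template.format(after=stamp)):
            if message_id not in found:
                found.append(message_id)
    return found
```

Merging by message ID keeps an email that matches several queries from being processed twice.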

Query Notes

  • has:attachment ensures only emails with files are returned (most invoices are PDF attachments)
  • Danish keywords (faktura, kvittering, betaling) catch Danish vendor emails
  • English keywords (receipt, invoice, order) catch international vendors
  • Date filter prevents re-scanning old emails

Danish Sender Patterns

Map known email sender domains to vendor IDs for faster vendor detection:

| Sender Pattern | Vendor ID | Vendor Name | Type |
| --- | --- | --- | --- |
| *@tdc.dk | tdc | TDC | Telecom |
| *@tdcnet.dk | tdc | TDC | Telecom |
| *@telenor.dk | telenor | Telenor | Telecom |
| *@telia.dk | telia | Telia | Telecom |
| *@orsted.dk | orsted | Oersted | Electricity |
| *@hofor.dk | hofor | HOFOR | Water |
| *@norlys.dk | norlys | Norlys | Electricity |
| *@ewii.dk | ewii | EWII | Utility |
| *@dinenergi.dk | dinenergi | Din Energi | Electricity |
| *@netflix.com | netflix | Netflix | Streaming |
| *@spotify.com | spotify | Spotify | Streaming |
| *@amazon.com | amazon | Amazon | Online order |
| *@amazon.de | amazon | Amazon | Online order |
| *@zalando.dk | zalando | Zalando | Online order |
| *@ikea.com | ikea | IKEA | Online order |
| *@wolt.com | wolt | Wolt | Delivery |
| *@nemlig.com | nemlig | Nemlig | Delivery |

For senders not in this table: extract the domain name as a starting point for vendor detection, then fall through to the invoice-parsing skill's vendor detection workflow.
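
The lookup and its fallback might be sketched as follows. The domain-to-ID pairs come from the sender table in this section; the function name and the `(vendor_id, domain)` return shape are assumptions. Each `*@domain` pattern is treated as an exact match on the sender's domain, which is one possible reading of the patterns.

```python
from email.utils import parseaddr

# Sender domain -> vendor ID, taken from the sender pattern table.
SENDER_DOMAINS = {
    "tdc.dk": "tdc", "tdcnet.dk": "tdc", "telenor.dk": "telenor",
    "telia.dk": "telia", "orsted.dk": "orsted", "hofor.dk": "hofor",
    "norlys.dk": "norlys", "ewii.dk": "ewii", "dinenergi.dk": "dinenergi",
    "netflix.com": "netflix", "spotify.com": "spotify",
    "amazon.com": "amazon", "amazon.de": "amazon", "zalando.dk": "zalando",
    "ikea.com": "ikea", "wolt.com": "wolt", "nemlig.com": "nemlig",
}


def vendor_from_sender(sender):
    """Return (vendor_id, domain); vendor_id is None for unknown senders."""
    # parseaddr handles both "a@b.dk" and "Name <a@b.dk>" header forms.
    address = parseaddr(sender)[1]
    domain = address.rsplit("@", 1)[-1].lower()
    return SENDER_DOMAINS.get(domain), domain
```

When `vendor_id` is `None`, the returned domain serves as the starting point handed to the vendor detection workflow.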

Email Content Types

Receipt emails come in three forms. Detect and handle each:

| Content Type | Detection | Extraction Method |
| --- | --- | --- |
| PDF attachment | Email has .pdf attachment | Download attachment → process as PDF invoice |
| Inline HTML | Email body contains structured receipt data, no PDF | Extract from email HTML body |
| Both | PDF attachment + summary in body | Prefer PDF attachment (more complete) |

PDF Attachment Priority

When an email has both a PDF attachment and inline content, always process the PDF. The inline content is typically a summary or notification, while the PDF is the full invoice.
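
The classification and the PDF-first rule can be expressed as a small sketch. The labels (`"pdf"`, `"inline_html"`, `"both"`, `"none"`) and function names are hypothetical, and deciding whether a body contains structured receipt data is left to the caller.

```python
def classify_content(attachment_names, body_has_receipt_data):
    """Classify an email as 'pdf', 'inline_html', 'both', or 'none'."""
    has_pdf = any(name.lower().endswith(".pdf") for name in attachment_names)
    if has_pdf and body_has_receipt_data:
        return "both"
    if has_pdf:
        return "pdf"
    if body_has_receipt_data:
        return "inline_html"
    return "none"


def extraction_source(content_type):
    """Pick what to extract from; a PDF always wins over inline content."""
    if content_type in ("pdf", "both"):
        return "pdf_attachment"
    if content_type == "inline_html":
        return "html_body"
    return None
```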

Inline HTML Extraction

For emails without PDF attachments (e.g., Wolt order confirmations, Nemlig receipts):

  1. Parse the email HTML body
  2. Look for structured tables with item names, quantities, prices
  3. Extract total from summary section
  4. Set file_reference to email:{message_id} (no file to archive)
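
A minimal sketch of steps 1, 2, and 4 using only the Python standard library is shown below. It assumes the receipt items sit in plain `<tr>`/`<td>` tables, which real vendor emails may not follow; the class and function names are hypothetical, and locating the total (step 3) is left out because it depends on each vendor's layout.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in an HTML email body."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def file_reference(message_id):
    """Inline receipts have no file to archive, so reference the email itself."""
    return f"email:{message_id}"
```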

Deduplication

Timestamp-Based Scan Window

  • Read last_email_scan from settings.csv
  • Only search emails received after this timestamp
  • After successful scan, update last_email_scan to current datetime

Cross-Check with receipts.csv

Before processing each email:

  1. Detect vendor and date from email metadata
  2. Extract total (from subject line or quick body scan)
  3. Check receipts.csv for existing receipt with same date + merchant + total_amount
  4. If match found: skip and note in scan summary as "allerede registreret" (Danish for "already registered")

Deduplication Fields

| Field | Source | Match Rule |
| --- | --- | --- |
| date | Email date or invoice date | Same day |
| merchant | Sender domain mapping or vendor detection | Same normalized merchant |
| total_amount | PDF extraction or email body | Exact match |
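
The cross-check could be implemented along these lines. The column names `date`, `merchant`, and `total_amount` are taken from this section; everything else is an assumption: dates are assumed to be ISO strings (so the first 10 characters identify the day), amounts are assumed to use a dot as decimal separator, and merchant normalization is reduced to lowercasing and whitespace collapsing.

```python
import csv
from decimal import Decimal


def normalize_merchant(name):
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def receipt_key(date_str, merchant, total):
    """Build the (day, merchant, amount) key used for duplicate detection."""
    # Decimal compares by value, so "149.00" and "149" match exactly.
    return (date_str[:10], normalize_merchant(merchant), Decimal(str(total)))


def load_receipt_keys(csv_file):
    """Read an open receipts.csv file object into a set of keys."""
    return {
        receipt_key(row["date"], row["merchant"], row["total_amount"])
        for row in csv.DictReader(csv_file)
    }


def is_duplicate(keys, date_str, merchant, total):
    """True when a receipt with the same day, merchant, and total exists."""
    return receipt_key(date_str, merchant, total) in keys
```

Loading the keys once per scan keeps each per-email check to a single set lookup.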

Date Range Calculation

| Scenario | Date Range |
| --- | --- |
| last_email_scan exists in settings.csv | From last_email_scan to now |
| User provides days argument | From (today - days) to now |
| Neither (first scan) | From (today - 90 days) to now |

If the user provides a days argument, it overrides last_email_scan. This allows rescanning a specific period.
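
The precedence described above can be sketched as a single function. The name `resolve_scan_start` and its parameters are hypothetical; reading `last_email_scan` from settings.csv is assumed to happen elsewhere.

```python
from datetime import date, timedelta

DEFAULT_LOOKBACK_DAYS = 90


def resolve_scan_start(today, last_email_scan=None, days=None):
    """Return the first date of the scan window."""
    if days is not None:
        # An explicit days argument overrides any stored scan date.
        return today - timedelta(days=days)
    if last_email_scan is not None:
        return last_email_scan
    # First scan with no argument: fall back to the default window.
    return today - timedelta(days=DEFAULT_LOOKBACK_DAYS)
```

With `today = date(2026, 2, 1)` and `days=30`, the window starts on 2026-01-02.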

Email Filtering Heuristics

Not every email matching the search query is a receipt. Apply these filters:

Include

  • Emails with PDF attachments from known vendor domains
  • Emails with "faktura" or "kvittering" in subject
  • Order confirmation emails with itemized totals

Exclude

  • Marketing emails (subject contains "tilbud" (offer), "kampagne" (campaign), or "nyhedsbrev" (newsletter) without "faktura"/"kvittering")
  • Password reset or account notification emails
  • Shipping notifications without invoice content
  • Emails already processed (deduplication check)
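
The marketing rule lends itself to a short sketch. It covers only the subject-keyword exclusion; the other filters depend on message content. The function name is hypothetical and matching is a plain case-insensitive substring test, which is an assumption.

```python
MARKETING_WORDS = ("tilbud", "kampagne", "nyhedsbrev")
RECEIPT_WORDS = ("faktura", "kvittering")


def is_marketing(subject):
    """True when the subject has a marketing word and no receipt word."""
    lowered = subject.lower()
    has_marketing = any(word in lowered for word in MARKETING_WORDS)
    has_receipt = any(word in lowered for word in RECEIPT_WORDS)
    return has_marketing and not has_receipt
```

A subject such as "Faktura og tilbud" is kept because the receipt keyword takes precedence.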

Examples

Example 1: Incremental Scan

Context: last_email_scan = 2026-01-15 in settings.csv

Search query: subject:(faktura OR kvittering OR receipt OR invoice OR ordre OR order OR betaling) has:attachment after:2026/01/15

Results: 4 emails found

  1. TDC faktura (2026-01-20) — PDF attachment — known vendor
  2. Oersted aarsopgoerelse (2026-01-25) — PDF attachment — known vendor
  3. Wolt ordrebekraeftelse (2026-01-28) — inline HTML — no parser
  4. Spam email about "tilbud" — filtered out

After processing: Update last_email_scan to 2026-02-01

Example 2: First-Time Scan with Days Argument

Context: No last_email_scan in settings.csv. User runs /smartspender:receipt email 30

Search query: subject:(faktura OR kvittering OR receipt OR invoice OR ordre OR order OR betaling) has:attachment after:2026/01/02

Results: Emails from the last 30 days

Related Skills

  • See skills/document-parsing/SKILL.md for vendor detection, parser lookup workflow, and general extraction rules
  • See skills/data-schemas/SKILL.md for the CSV file structure (email receipts use source: email)

Source

https://github.com/peerjakobsen/smartspender/blob/main/skills/email-receipt-scanning/SKILL.md

Overview

Email Receipt Scanning identifies receipts and invoices in a user's Gmail by applying dedicated search queries, Danish sender patterns, and content-type detection. It includes deduplication rules and is used by the /smartspender:receipt email command to streamline expense tracking.

How This Skill Works

It relies on Gmail MCP tools (gmail_search, gmail_get_message, gmail_get_attachment) to locate emails, read content, and download PDFs. It uses a primary search query for receipts and invoices with attachments after a date, plus supplementary queries if needed. When both a PDF and inline content exist, the PDF is given priority because it typically contains the full invoice.

When to Use It

  • When scanning a Gmail inbox for new receipts and invoices since the last run
  • When dealing with Danish vendors that use faktura or kvittering keywords
  • When emails include PDF attachments that are likely invoices
  • When receipts arrive as inline HTML without attachments
  • When an email comes from an unknown vendor and the sender domain is needed as a starting point for vendor detection

Quick Start

  1. Ensure the Gmail MCP server is configured with gmail_search, gmail_get_message, and gmail_get_attachment
  2. Run /smartspender:receipt email to start scanning
  3. Review the results and the deduplication notes, and confirm that last_email_scan was updated

Best Practices

  • Keep last_email_scan in settings.csv to drive incremental scans
  • Start with the primary search query; fall back to supplementary queries if results are sparse
  • Process PDFs first when both PDF and inline content exist
  • Configure Gmail MCP tools (gmail_search, gmail_get_message, gmail_get_attachment) and handle errors gracefully if unavailable
  • Map known Danish sender patterns to vendor IDs to speed up vendor detection; otherwise fall back to domain-based parsing

Example Use Cases

  • A Danish telecom invoice from TDC with a PDF attachment is found via the primary query and parsed
  • Nemlig receipts are inline HTML without PDFs and are parsed from the email body
  • Netflix receipts arrive from netflix.com and are detected via sender pattern
  • Wolt order confirmations arrive as inline HTML without PDFs and are parsed from the email body
  • Unknown vendor emails are handled by domain-based detection to route to the vendor-detection workflow
