Execute BDD scenarios
npx machina-cli add skill Intai/story-flow/execute-bdd-scenario --openclaw
Instructions
- Use mcp__plugin_story-flow_playwright__* tools to manipulate the browser and EXECUTE the BDD scenarios directly, without generating any Playwright test file.
- Test Execution Protocol:
- Before executing: Read the feature file and confirm the exact line numbers, scenario title, background sections and all "Then" and "And" assertions.
- Execution order is CRITICAL:
- Scenario tags FIRST (e.g., @purge-data)
- Background steps SECOND (setup prerequisites)
- Scenario steps THIRD (Given/When/Then)
- Execute all Background steps: Background sections (both global and rule-specific) in feature files define prerequisite setup steps. EVERY step MUST be completed before executing the scenario steps, including both UI interactions and API calls by curl.
- During execution: Compare actual behavior against expected behavior at each assertion step.
- On failure: STOP immediately and provide a summary including:
- Expected behavior (with line numbers from feature file)
- Actual behavior (with evidence from page snapshot)
- Root cause analysis
- CRITICAL: Do NOT continue to subsequent test steps after a failure
- Executing Multiple Scenarios:
When executing multiple scenarios (e.g., "Execute all scenarios in @path/to/file.feature", "Execute ARMR-01,ARMR-02 scenarios in @path/to/file.feature"):
- Read the feature file to identify all scenario IDs and titles
- For each scenario, use the Task tool with subagent_type="general-purpose" to run it in an isolated context, sequentially:
Load BOTH skills in this order using the Skill tool:
1. First: `story-flow:execute-bdd-scenario` (plugin - general BDD framework)
2. Then: `execute-bdd-scenario` (project-level - overrides/extends the plugin)
3. Confirm both skills are loaded before continuing with the execution.
Execute BDD scenario all in @path/to/file.feature [--record if recording mode is active].
Recording Mode: When executing multiple scenarios with --record, pass the flag to each subagent prompt. Recording and Playwright .spec.js generation happen within each scenario's isolated subagent context, not at the parent orchestration level.
- Wait for subagent completion, record PASS/FAIL result
- If the scenario FAILED: STOP immediately and print the summary table with remaining scenarios marked as "⊘ SKIPPED". Do NOT proceed to the next scenario.
- Print summary table:
| Scenario | Title | Result |
|----------|-------|--------|
| SMG-01 | Display available languages | ✓ PASSED |
| SMG-02 | Search and filter strings | ✗ FAILED |
| SMG-03 | Inline edit a string value | ⊘ SKIPPED |
- Save .feature files in the same folder as the stories.
- If a BDD scenario has the @screenshots tag, take screenshots throughout the execution.
- If a BDD scenario doesn't have the @screenshots tag, do not take any screenshots.
- If a BDD scenario has the @purge-data tag, restore the seed data first (before the Background steps) by executing the make reseed command.
- If a BDD scenario does not have the @purge-data tag, do not restore the seed data before running the scenario.
- Use mcp__playwright__browser_run_code to set the browser offline.
- Reference the @Makefile for local development workflows.
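The stop-on-first-failure rule for multiple scenarios can be sketched as a small orchestration loop. This is only an illustration: `runSequentially`, `formatSummary`, and the synchronous `run` callback are hypothetical names standing in for the Task-tool subagent call, which in practice would be awaited.

```typescript
type Result = "✓ PASSED" | "✗ FAILED" | "⊘ SKIPPED";

interface Scenario {
  id: string;
  title: string;
}

interface Row extends Scenario {
  result: Result;
}

// Hypothetical stand-in for running one scenario in a subagent.
// Returns true when the scenario passed.
type RunScenario = (scenario: Scenario) => boolean;

// Run scenarios in order; after the first failure, mark the rest as skipped
// without invoking the runner again.
function runSequentially(scenarios: Scenario[], run: RunScenario): Row[] {
  let failed = false;
  return scenarios.map((scenario): Row => {
    if (failed) {
      return { ...scenario, result: "⊘ SKIPPED" };
    }
    const passed = run(scenario);
    failed = !passed;
    return { ...scenario, result: passed ? "✓ PASSED" : "✗ FAILED" };
  });
}

// Render the rows in the same table layout as the summary shown above.
function formatSummary(rows: Row[]): string {
  const header = ["| Scenario | Title | Result |", "|----------|-------|--------|"];
  const body = rows.map((r) => `| ${r.id} | ${r.title} | ${r.result} |`);
  return header.concat(body).join("\n");
}
```

Because the runner is never called after a failure, later scenarios cannot mutate state that the failure summary depends on.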
Mobile App Instructions
Environment Variables
| Variable | Description | Default |
|---|---|---|
| APPIUM_DEVICE_NAME | Local device name or emulator | emulator-5554 |
| APPIUM_APP_PACKAGE | Local app package unique identifier | - |
| BROWSERSTACK_USERNAME | BrowserStack username | - |
| BROWSERSTACK_ACCESS_KEY | BrowserStack access key | - |
| BROWSERSTACK_APP_ID | Uploaded app ID (bs://...) | - |
Mode detection: When all 3 BrowserStack variables are set, use BrowserStack. Otherwise, use local Appium.
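The mode detection rule can be expressed as a small pure function. A minimal sketch, assuming that an empty string counts as "not set"; the function name `detectMode` is hypothetical.

```typescript
type Mode = "browserstack" | "local";

// All three BrowserStack variables must be non-empty to select BrowserStack.
const BROWSERSTACK_VARS = [
  "BROWSERSTACK_USERNAME",
  "BROWSERSTACK_ACCESS_KEY",
  "BROWSERSTACK_APP_ID",
];

function detectMode(env: Record<string, string | undefined>): Mode {
  const allSet = BROWSERSTACK_VARS.every((name) => Boolean(env[name]));
  return allSet ? "browserstack" : "local";
}
```

In a Node.js process this would typically be called as `detectMode(process.env)`.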
Local Appium Configuration (Default)
Start Appium server: npx appium server --port 4723
WebdriverIO options:
{
hostname: "localhost",
port: 4723,
waitforTimeout: 30000,
waitforInterval: 500,
connectionRetryTimeout: 30000,
connectionRetryCount: 3,
capabilities: {
platformName: "Android",
"appium:deviceName": process.env.APPIUM_DEVICE_NAME || "emulator-5554",
"appium:automationName": "UiAutomator2",
"appium:appPackage": process.env.APPIUM_APP_PACKAGE,
"appium:appActivity": ".MainActivity",
"appium:autoGrantPermissions": true,
"appium:noReset": true,
"appium:fullReset": false,
},
}
BrowserStack Configuration
When all 3 BrowserStack env vars are set, connect to BrowserStack cloud instead of local Appium. Start BrowserStackLocal binary before running tests. Stop BrowserStackLocal after tests complete. Use the WebDriver REST API directly by curl instead of Appium MCP or WebDriverIO library.
WebdriverIO options:
{
hostname: "hub-cloud.browserstack.com",
port: 443,
protocol: "https" as const,
path: "/wd/hub",
waitforTimeout: 30000,
waitforInterval: 500,
connectionRetryTimeout: 60000,
connectionRetryCount: 3,
capabilities: {
"bstack:options": {
buildName: featureName, // placeholder: name of the feature under test
sessionName: scenarioName, // placeholder: name of the scenario under test
deviceName: "Google Pixel 9",
osVersion: "16",
userName: process.env.BROWSERSTACK_USERNAME,
accessKey: process.env.BROWSERSTACK_ACCESS_KEY,
local: true, // Enable tunnel for localhost access
},
platformName: "Android",
"appium:app": process.env.BROWSERSTACK_APP_ID,
"appium:automationName": "UiAutomator2",
},
}
Execution Guidelines
- Use mcp__plugin_story-flow_appium__* tools or the WebDriver REST API to manipulate the mobile app and EXECUTE the BDD scenarios directly, without generating any Playwright test file.
- Save Appium screenshots to the .appium-mcp folder using absolute paths (e.g., /path/to/project/.appium-mcp/screenshot.png). The Appium MCP tool does not support relative paths.
- Page source vs screenshots:
- Use page source XML (via appium_get_page_source or REST API) for:
- Verifying element presence, text content, and attributes
- Waiting for elements to appear or disappear. Leverage WebDriverIO waitForDisplayed, waitForEnabled, waitForExist, waitForClickable and waitUntil.
- Default timeout: Call these methods WITHOUT explicit timeout parameters - they use waitforTimeout from appiumOptions (default: 30000ms)
- Custom timeout: Only add explicit { timeout: X } when an operation is known to take longer than the default (e.g., network fetches, large file operations)
- Example:
// CORRECT - uses default waitforTimeout (30s)
await element.waitForDisplayed();
// CORRECT - custom timeout for slow operation
await slowNetworkElement.waitForDisplayed({ timeout: 60000 });
// WRONG - redundant timeout same as default
await element.waitForDisplayed({ timeout: 30000 });
- Finding element bounds for tap coordinates
- All non-visual verification steps
- Only use screenshots for:
- Visual verification (e.g., verifying colors with ImageMagick)
- When the scenario has the @screenshots tag
- Page source is faster and provides structured data; screenshots are only needed when pixel-level visual verification is required.
- Multi-finger gestures (3-finger tap, pinch, etc.):
- The Appium MCP tools don't have direct multi-finger support, so use the W3C Actions API via HTTP.
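As an illustration of the W3C Actions approach, the sketch below builds the JSON payload for a simultaneous multi-finger tap: one touch pointer source per finger. The function name `buildMultiFingerTap`, the coordinates, and the 100 ms hold are assumptions; the resulting object would be sent as the body of a POST to the session's `/actions` endpoint defined by the WebDriver specification.

```typescript
interface PointerAction {
  type: "pointerMove" | "pointerDown" | "pointerUp" | "pause";
  duration?: number;
  x?: number;
  y?: number;
  button?: number;
}

interface PointerSource {
  type: "pointer";
  id: string;
  parameters: { pointerType: "touch" };
  actions: PointerAction[];
}

interface Point {
  x: number;
  y: number;
}

// One pointer input source per finger. Sources in the same payload are
// dispatched tick by tick, so all fingers go down and up together.
function buildMultiFingerTap(points: Point[], holdMs = 100): { actions: PointerSource[] } {
  return {
    actions: points.map((p, i): PointerSource => ({
      type: "pointer",
      id: `finger${i + 1}`,
      parameters: { pointerType: "touch" },
      actions: [
        { type: "pointerMove", duration: 0, x: p.x, y: p.y },
        { type: "pointerDown", button: 0 },
        { type: "pause", duration: holdMs },
        { type: "pointerUp", button: 0 },
      ],
    })),
  };
}
```

A 3-finger tap would use three points, for example `buildMultiFingerTap([{ x: 200, y: 800 }, { x: 400, y: 800 }, { x: 600, y: 800 }])`, serialized with `JSON.stringify` before being passed to curl.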
Recording Mode (--record flag)
When the --record flag is present in the input, generate a Playwright .spec.js file after executing every scenario by recording actions and expectations during execution.
Recording Scenario Tags
When recording, capture scenario tags and output them BEFORE recording any other steps (including Background):
[RECORD_TAG]
scenario: "DAS-01: Finish onboarding"
tags: ["@purge-data"]
[/RECORD_TAG]
Annotation fields:
- scenario: The scenario ID and title
- tags: Array of tags applied to this scenario
Example: Scenario with @purge-data tag and Background
Feature file:
Background:
Given I extract appId from S3
And I set localStorage
@purge-data
Scenario: DAS-01: Finish onboarding
Given I am on the onboarding page
Recording output:
[RECORD_TAG]
scenario: "DAS-01: Finish onboarding"
tags: ["@purge-data"]
[/RECORD_TAG]
Generated Playwright code (note: tag action comes BEFORE Background helper call):
test('DAS-01: Finish onboarding', async ({ page, context }) => {
// @purge-data - Restore the seed data to initial state (runs FIRST)
execSync('make reseed', { stdio: 'inherit' });
// Background (runs SECOND)
await setupBackground(context);
// Scenario steps (runs THIRD)
// Given I am on the onboarding page
await page.goto('http://localhost:8080/onboarding');
});
Recording Background Steps as Helper Functions
Background steps are converted to reusable helper functions at the top of the generated spec file. This eliminates code duplication and makes tests self-contained.
CRITICAL: When recording, you MUST:
- Record scenario tags FIRST (if any)
- Record ALL Background steps using the [RECORD_HELPER] annotation
- Each Background step becomes either a standalone helper or part of a composite helper
- Follow the same order as they appear in the feature file
[RECORD_HELPER] Annotation Format
When recording Background steps, output a helper function definition:
[RECORD_HELPER]
name: extractAppIdFromS3
params: []
returns: "string"
body: |
const output = execSync(
"AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 ls s3://apps/",
{ encoding: "utf-8" }
);
return output.match(/PRE ([a-f0-9-]{36})\//)?.[1] ?? "";
[/RECORD_HELPER]
Annotation fields:
- name: Function name (camelCase, descriptive)
- params: Array of typed parameters (e.g., ["context: BrowserContext", "appId: string"])
- returns: Return type (e.g., "string", "Promise<void>", "Promise<string>")
- body: The function implementation (multiline supported)
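To make the mapping from annotation to generated code concrete, here is one possible way to turn a parsed [RECORD_HELPER] annotation into function source text. This is a sketch, not the skill's actual generator: `HelperAnnotation` and `renderHelper` are hypothetical names, and the rule "emit async when the return type is a Promise" is an assumption inferred from the examples.

```typescript
interface HelperAnnotation {
  name: string;
  params: string[];
  returns: string;
  body: string;
}

// Render the annotation as a top-level function declaration.
// Helpers that return a Promise are emitted as async functions.
function renderHelper(helper: HelperAnnotation): string {
  const asyncPrefix = /^Promise</.test(helper.returns) ? "async " : "";
  const indentedBody = helper.body
    .split("\n")
    .map((line) => (line.length > 0 ? `  ${line}` : line))
    .join("\n");
  const signature = `${asyncPrefix}function ${helper.name}(${helper.params.join(", ")}): ${helper.returns}`;
  return `${signature} {\n${indentedBody}\n}`;
}
```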
Composite vs Granular Helpers
Create composite helpers when multiple Background steps work together:
[RECORD_HELPER]
name: setupBackground
params: ["context: BrowserContext"]
returns: "Promise<string>"
body: |
const appId = extractAppIdFromS3();
await setLocalStorageAppId(context, appId);
await setCookies(context);
return appId;
[/RECORD_HELPER]
Guidelines:
- Granular helpers: Individual reusable operations (e.g., extractAppIdFromS3, setCookies)
- Composite helpers: Combine granular helpers for common setup patterns (e.g., setupBackground)
- If a scenario needs a value from Background (e.g., appId), the composite helper should return it
Recording Dynamic Value Extraction
When a Background step extracts a value (e.g., appId from S3), create a helper that returns the value:
[RECORD_HELPER]
name: extractAppIdFromS3
params: []
returns: "string"
body: |
const output = execSync(
"AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 ls s3://apps/",
{ encoding: "utf-8" }
);
return output.match(/PRE ([a-f0-9-]{36})\//)?.[1] ?? "";
[/RECORD_HELPER]
The value is then available in the test via the composite helper's return value:
test('DAS-02: Delete image', async ({ page, context }) => {
// Background (returns appId for use in scenario)
const appId = await setupBackground(context);
// Scenario steps can now use appId...
});
Recording Actions (Given/When steps)
After each action step, output a structured annotation:
[RECORD_ACTION]
step: "When I click the \"Cancel\" button in the dialog"
method: click
locator: page.getByRole('dialog').getByRole('button', { name: 'Cancel' })
[/RECORD_ACTION]
Annotation fields:
- step: The exact Gherkin step text (becomes a comment in generated code)
- method: The Playwright action method
- locator: The Playwright locator used to find the element
- value: For fill/type/evaluate actions (optional)
- extract: Variable name to store extracted value (for execSync)
- pattern: Regex pattern to extract value from output (for execSync)
- args: Array of arguments to pass to evaluate function (for evaluate with args)
Handling Ordinal Qualifiers:
When BDD steps contain ordinal words (first, second, third, last), include the appropriate Playwright method directly in the locator field:
| Ordinal Word | Append to Locator |
|---|---|
| first | .first() |
| second | .nth(1) |
| third | .nth(2) |
| last | .last() |
Example - BDD step with "first":
And I click the close icon button on the first image
Recording output:
[RECORD_ACTION]
step: "And I click the close icon button on the first image"
method: click
locator: page.getByTestId('Images').getByTestId('DeleteButton').first()
[/RECORD_ACTION]
When a BDD step contains an ordinal qualifier, you MUST append the corresponding method to the locator to avoid strict mode violations.
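The ordinal table can be applied mechanically. A minimal sketch, assuming the locator is handled as source text; `applyOrdinal` is a hypothetical helper, and it does not try to tell an ordinal "last" apart from other uses of the word, so the result still needs a human or agent check.

```typescript
// Mapping taken from the ordinal table above.
const ORDINAL_SUFFIX: Record<string, string> = {
  first: ".first()",
  second: ".nth(1)",
  third: ".nth(2)",
  last: ".last()",
};

// Append the matching Playwright method when the step contains an ordinal word.
function applyOrdinal(step: string, locator: string): string {
  const match = step.toLowerCase().match(/\b(first|second|third|last)\b/);
  return match ? locator + ORDINAL_SUFFIX[match[1]] : locator;
}
```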
Supported action methods:
| method | Generated Code |
|---|---|
| goto | await page.goto(value) |
| click | await locator.click() |
| fill | await locator.fill(value) |
| type | await locator.pressSequentially(value) |
| selectOption | await locator.selectOption(value) |
| check | await locator.check() |
| uncheck | await locator.uncheck() |
| evaluate | await page.evaluate(value) or await page.evaluate(value, ...args) |
| execSync | const output = execSync(value, { encoding: 'utf-8' }); extract = output.match(pattern)?.[1] |
| setInputFiles | await locator.setInputFiles(value) |
| keyboardPress | await page.keyboard.press(value) |
| keyboardDown | await page.keyboard.down(value) |
| keyboardUp | await page.keyboard.up(value) |
| waitFor | await locator.waitFor() |
| waitForURL | await page.waitForURL(value) |
| waitForLoadState | await page.waitForLoadState(value) |
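The table above amounts to a lookup from `method` to a code template. The sketch below covers a subset of the methods and is an assumption about how a generator could work, not the skill's implementation; `renderAction` is a hypothetical name, and values are always emitted as string literals, so regex values such as those used with waitForURL would need separate handling.

```typescript
interface ActionAnnotation {
  step: string;
  method: string;
  locator?: string;
  value?: string;
}

const PAGE_METHODS = ["goto", "waitForURL", "waitForLoadState"];
const LOCATOR_METHODS = ["click", "fill", "selectOption", "check", "uncheck", "setInputFiles", "waitFor"];

// Emit the Gherkin step as a comment followed by the Playwright call.
function renderAction(action: ActionAnnotation): string {
  const arg = action.value === undefined ? "" : JSON.stringify(action.value);
  let call: string;
  if (PAGE_METHODS.indexOf(action.method) !== -1) {
    call = `await page.${action.method}(${arg});`;
  } else if (action.method === "keyboardPress") {
    call = `await page.keyboard.press(${arg});`;
  } else if (action.method === "type" || LOCATOR_METHODS.indexOf(action.method) !== -1) {
    if (!action.locator) {
      throw new Error(`Method ${action.method} requires a locator`);
    }
    // "type" is generated as pressSequentially, per the table.
    const name = action.method === "type" ? "pressSequentially" : action.method;
    call = `await ${action.locator}.${name}(${arg});`;
  } else {
    throw new Error(`Unsupported method: ${action.method}`);
  }
  return `// ${action.step}\n${call}`;
}
```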
Recording Expectations (Then/And assertion steps)
After verifying each assertion step, output a structured annotation:
[RECORD_EXPECT]
step: "Then I should see 3 images in the product section"
locator: page.getByTestId('Product').locator('img')
assertion: toHaveCount
value: 3
[/RECORD_EXPECT]
Annotation fields:
- step: The exact Gherkin step text (becomes a comment)
- locator: The Playwright locator used to find the element
- assertion: The Playwright assertion method
- value: The expected value (optional, depends on assertion type)
Supported assertions:
| assertion | Generated Code |
|---|---|
| toBeVisible | await expect(locator).toBeVisible() |
| toBeHidden | await expect(locator).toBeHidden() |
| toHaveCount | await expect(locator).toHaveCount(value) |
| toHaveText | await expect(locator).toHaveText(value) |
| toContainText | await expect(locator).toContainText(value) |
| toHaveValue | await expect(locator).toHaveValue(value) |
| toBeEnabled | await expect(locator).toBeEnabled() |
| toBeDisabled | await expect(locator).toBeDisabled() |
| toHaveURL | await expect(page).toHaveURL(value) |
| toHaveAttribute | await expect(locator).toHaveAttribute(name, value) |
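Expectations follow the same pattern, with two differences visible in the table: toHaveURL asserts on `page` rather than a locator, and toHaveAttribute takes an attribute name before the value. A rough sketch; `renderExpect` is a hypothetical name, and the `name` field is an assumption, since the annotation fields listed above do not say where the attribute name is recorded.

```typescript
interface ExpectAnnotation {
  step: string;
  assertion: string;
  locator?: string;
  value?: string | number;
  name?: string; // assumed field: attribute name for toHaveAttribute
}

// Emit the Gherkin step as a comment followed by the expect() call.
function renderExpect(expectation: ExpectAnnotation): string {
  const target = expectation.assertion === "toHaveURL" ? "page" : expectation.locator;
  if (!target) {
    throw new Error(`Assertion ${expectation.assertion} requires a locator`);
  }
  const args: string[] = [];
  if (expectation.name !== undefined) args.push(JSON.stringify(expectation.name));
  if (expectation.value !== undefined) args.push(JSON.stringify(expectation.value));
  return `// ${expectation.step}\nawait expect(${target}).${expectation.assertion}(${args.join(", ")});`;
}
```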
Locator Selection for expectations:
Playwright's accessibility snapshot does NOT show data-testid attributes. Use JavaScript evaluation to discover testids for the specific element you're interacting with.
- Find testid for a specific element using browser_evaluate with the element ref:
(element) => {
  // Walk up DOM tree to find testid chain
  let el = element;
  const chain = [];
  while (el && el !== document.body) {
    const testid = el.getAttribute('data-testid');
    if (testid) {
      chain.push({ testid, tag: el.tagName.toLowerCase() });
    }
    el = el.parentElement;
  }
  return chain;
}
- Use the testid chain to build scoped locators:
# Example: clicking delete button in product section
# Snapshot shows: [S42] button "delete"
# JavaScript returns: [
#   { testid: "DeleteButton", tag: "button" },
#   { testid: "Product", tag: "div" }
# ]
# Generated locator:
locator: page.getByTestId("Product").getByTestId("DeleteButton")
- Use testids over DOM traversal:
// WRONG - fragile, breaks when DOM structure changes
locator: page.getByRole("heading", { name: "Product" }).locator("..").locator("..")
// CORRECT - stable, uses discovered testid chain
locator: page.getByTestId("Product").getByTestId("DeleteButton")
- If no testid exists, use this fallback priority:
1. page.locator('[data-slot]') - for other data attributes
2. getByRole() with accessible name
3. getByLabel() for form fields
4. getByText() for unique text content
Recording Command Line Executions
For steps that require command line verification (e.g., S3 state checks), output:
[RECORD_COMMAND]
step: "And the image should be deleted from S3 bucket"
command: AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 ls s3://apps/${appId}/assets/image.png
assertion: shouldFail
[/RECORD_COMMAND]
Annotation fields:
- step: The exact Gherkin step text
- command: The shell command to execute
- assertion: Either shouldSucceed (command exits 0) or shouldFail (command exits non-zero)
- pattern: Optional regex pattern to match in output
Supported command assertions:
| assertion | Generated Code |
|---|---|
| shouldSucceed | expect(() => execSync(command)).not.toThrow() |
| shouldFail | expect(() => execSync(command)).toThrow() |
| outputContains | expect(execSync(command, { encoding: "utf-8" })).toContain(pattern) |
| outputMatches | expect(execSync(command, { encoding: "utf-8" })).toMatch(pattern) |
Shell Path Quoting:
When generating shell commands with file paths, always quote local file paths that may contain special shell characters (parentheses, spaces, $, etc.):
// WRONG - parentheses interpreted as subshell syntax
execSync(`aws s3 cp ${fixturePath}.png s3://bucket/...`);
// CORRECT - quoted path prevents shell interpretation
execSync(`aws s3 cp "${fixturePath}.png" s3://bucket/...`);
Common paths requiring quotes:
- Next.js route groups: src/app/(app)/...
- Paths with spaces: src/My Documents/...
- Paths with special chars: src/feature[1]/...
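Double quotes, as used in the example above, stop the shell from interpreting parentheses and spaces but still expand `$` and backticks. Where a path might contain those, single-quote escaping is a stricter option for POSIX shells. A minimal sketch; `shellQuote` is a hypothetical helper, not part of the skill.

```typescript
// Wrap a value in single quotes for a POSIX shell.
// An embedded single quote is written as '\'' (close, escaped quote, reopen).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build a copy command with the local path quoted.
// The bucket URL here is illustrative only.
function buildCopyCommand(localPath: string, s3Url: string): string {
  return `aws s3 cp ${shellQuote(localPath)} ${shellQuote(s3Url)}`;
}
```

The resulting string can then be passed to `execSync` in the generated spec.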
Recording Mobile Screenshot Color Verification
For mobile app scenarios that verify UI colors via screenshots, use [RECORD_COMMAND] with ImageMagick. The generated spec file uses WebdriverIO with Appium.
Recording annotation:
[RECORD_COMMAND]
step: "Then the \"Login\" button should have background in the primary color \"#9933FF\""
command: magick "${imagePath}" -crop 1x1+150+2110 txt:- | tail -1
assertion: outputContains
pattern: "#9933FF"
[/RECORD_COMMAND]
Generated imports and setup:
import { test, expect } from '@playwright/test';
import { remote } from 'webdriverio';
import { execSync } from 'child_process';
import path from 'path';
function getAppiumOptions() {
const { BROWSERSTACK_USERNAME, BROWSERSTACK_ACCESS_KEY, BROWSERSTACK_APP_ID } = process.env;
const useBrowserStack = BROWSERSTACK_USERNAME && BROWSERSTACK_ACCESS_KEY && BROWSERSTACK_APP_ID;
if (useBrowserStack) {
return {
// See "BrowserStack Configuration" section
};
}
return {
// See "Local Appium Configuration" section
};
}
Generated helper functions:
async function takeAppScreenshot(driver: WebdriverIO.Browser, filename: string): Promise<string> {
const imagePath = path.join(process.cwd(), '.appium-mcp', filename);
await driver.saveScreenshot(imagePath);
return imagePath;
}
function verifyColorAtPixel(imagePath: string, x: number, y: number, expectedColor: string): void {
const output = execSync(
`magick '${imagePath}' -crop 1x1+${x}+${y} txt:- | tail -1`,
{ encoding: 'utf-8' }
);
if (!output.toUpperCase().includes(expectedColor.toUpperCase())) {
throw new Error(`Expected color ${expectedColor} at (${x},${y}), got: ${output.trim()}`);
}
}
Generated test code:
const driver = await remote(getAppiumOptions());
// Take screenshot for color verification
const imagePath = await takeAppScreenshot(driver, "color-verification.png");
// Then the "Login" button should have background in the primary color "#9933FF"
verifyColorAtPixel(imagePath, 150, 2110, "#9933FF");
await driver.deleteSession();
ImageMagick crop syntax: -crop 1x1+X+Y extracts a single pixel at coordinates (X, Y).
Finding pixel coordinates during execution:
- Take a screenshot using Appium MCP: appium_take_screenshot
- Get image dimensions: magick identify image.png
- Estimate element position based on screen layout
- Sample pixels to find the target color: magick image.png -crop 1x1+X+Y txt:- | tail -1
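When sampling several candidate pixels, it can help to pull just the hex color out of the command output instead of reading it by eye. The sketch below assumes 8-bit output in which the pixel line contains a token such as `#9933FF`; the exact line format varies between ImageMagick versions and color depths, so the sample line in the comment is an assumption, and `extractHexColor` is a hypothetical helper.

```typescript
// Extract the first 6-digit hex color from the last line of `txt:-` output.
// Assumed sample pixel line: "0,0: (153,51,255)  #9933FF  srgb(153,51,255)"
function extractHexColor(txtOutput: string): string | null {
  const lines = txtOutput.trim().split("\n");
  const pixelLine = lines[lines.length - 1];
  const match = pixelLine.match(/#([0-9A-Fa-f]{6})/);
  return match ? `#${match[1].toUpperCase()}` : null;
}
```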
Recording Wait Strategies
When the page needs to wait for async operations (e.g., after navigation, after platform change), output:
[RECORD_WAIT]
step: "Wait for page content to load"
method: waitForLoadState
value: "networkidle"
[/RECORD_WAIT]
Annotation fields:
- step: Description of what is being waited for
- method: The Playwright wait method
- value: The wait condition or selector
Supported wait methods:
| method | Generated Code |
|---|---|
| waitForLoadState | await page.waitForLoadState(value) |
| waitForSelector | await page.waitForSelector(value) |
| waitForFunction | await page.waitForFunction(value) |
When to record waits:
- After page.goto() when waiting for dynamic content
- After actions that trigger server requests (platform change, form submit)
- Before assertions on dynamically loaded content
DO NOT use waitForTimeout - if you find yourself needing a hardcoded wait, use waitForLoadState('networkidle') or wait for a specific element/assertion instead.
Recording Polling Assertions
For steps that require waiting for eventual consistency (S3 state, API responses), use Playwright's expect.poll():
When to use polling:
- S3 bucket state verification after async operations
- API state changes with eventual consistency
- Backend operations that don't complete immediately
Recording annotation:
[RECORD_POLL]
step: "And I wait until \"name\" is \"John\" in S3"
command: AWS_ACCESS_KEY_ID=minioadmin ... s3 cp s3://apps/${appId}/user.json -
key: name
assertion: toBe
value: "John"
[/RECORD_POLL]
Annotation fields:
- step: The exact Gherkin step text
- command: The shell command to execute (should return JSON)
- key: JSON property to check (supports dot notation, e.g., user.name)
- assertion: The Playwright assertion method
- value: The expected value
Generated Playwright code:
// Helper function (add to helpers section if not present)
function getS3User(appId: string): Record<string, unknown> {
try {
const output = execSync(
`AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 cp s3://apps/${appId}/user.json - 2>/dev/null`,
{ encoding: "utf-8" }
);
return JSON.parse(output);
} catch {
return {};
}
}
// In test - check specific key
await expect.poll(() => getS3User(appId).name).toBe("John");
Supported polling assertions:
| assertion | Generated Code |
|---|---|
| toBe | await expect.poll(() => getS3User(appId).key).toBe(value) |
| toEqual | await expect.poll(() => getS3User(appId).key).toEqual(value) |
| toContain | await expect.poll(() => getS3User(appId).key).toContain(value) |
| toMatch | await expect.poll(() => getS3User(appId).key).toMatch(pattern) |
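The `key` field supports dot notation, while the generated examples above read a single top-level property. One way to resolve a dotted key against the parsed JSON is sketched here; `getByPath` is a hypothetical helper, and returning `undefined` for a missing segment is an assumption chosen so that polling simply retries until the value appears.

```typescript
// Resolve a dot-notation key such as "user.name" against parsed JSON.
// Returns undefined as soon as a segment is missing or not an object.
function getByPath(data: unknown, key: string): unknown {
  let current: unknown = data;
  for (const part of key.split(".")) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    current = (current as Record<string, unknown>)[part];
  }
  return current;
}
```

Inside a generated test this could be combined with the helper shown earlier, for example `await expect.poll(() => getByPath(getS3User(appId), "user.name")).toBe("John")`.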
MUI Component Handling
MUI Select/Combobox:
- MUI Select components use role="combobox" but are NOT actual <input> elements
- toHaveValue() will fail on combobox elements with "Not an input element" error
- Use toContainText() to verify the selected value:
// WRONG - will fail (combobox is not an <input>)
await expect(page.getByRole('combobox', { name: 'Country' })).toHaveValue('US');
// CORRECT - check displayed text
await expect(page.getByRole('combobox', { name: 'Country' })).toContainText('United States');
MUI DataGrid Column Index:
- Column headers have aria-colindex attribute (1-based index)
- Grid cells have data-colindex attribute (0-based index) and data-field attribute (field name)
- To verify column position or adjacency:
// Get column index by field name
const stateIndex = await page.locator('[data-field="state"]').first().getAttribute('data-colindex');
// Verify adjacent columns
const countryIdx = await page.locator('[data-field="country"]').first().getAttribute('data-colindex');
expect(parseInt(stateIndex)).toBe(parseInt(countryIdx) + 1);
Navigation Handling
Waiting for URL redirections:
- When asserting URL changes after actions that trigger navigation (e.g., form submissions, button clicks that redirect), use waitForURL before the assertion:
// Wait for navigation to complete before asserting
await page.waitForURL(/\/landing/);
await expect(page).toHaveURL(/\/landing/);
- For recording mode, output:
[RECORD_ACTION]
step: "Verify redirect to landing page"
method: waitForURL
value: /\/landing/
[/RECORD_ACTION]
Waiting Strategies
CRITICAL: Never use waitForTimeout() in generated code. Hardcoded waits are anti-patterns that make tests slow and flaky.
Instead, use these Playwright waiting strategies:
| Scenario | Solution | Example |
|---|---|---|
| Wait for page load | waitForLoadState | await page.waitForLoadState('networkidle') |
| Wait for element | Assertion with auto-retry | await expect(locator).toBeVisible() |
| Wait for dynamic content | waitForSelector | await page.waitForSelector('[alt^="image"]') |
| Wait after navigation | waitForURL + waitForLoadState | See below |
Pattern for page navigation with dynamic content:
// WRONG - hardcoded wait
await page.goto("http://localhost:8080/settings");
await page.waitForTimeout(2000);
// CORRECT - wait for network to settle
await page.goto("http://localhost:8080/settings");
await page.waitForLoadState("networkidle");
// BEST - wait for specific content you need
await page.goto("http://localhost:8080/settings");
await expect(page.locator('[alt^="image"]')).toHaveCount(3);
Pattern for actions that trigger async operations:
// WRONG
await page.getByRole("combobox", { name: "Country" }).click();
await page.getByRole("option", { name: "United States" }).click();
await page.waitForTimeout(2000);
// CORRECT - wait for the result of the action
await page.getByRole("combobox", { name: "Country" }).click();
await page.getByRole("option", { name: "United States" }).click();
await page.waitForLoadState("networkidle");
Locator Strategies
NEVER fabricate or assume selectors. Always discover actual selectors from the page snapshot or via JavaScript evaluation before recording.
Web (Playwright)
Use Playwright's preferred locator strategies in order of preference:
1. page.getByTestId('OnboardingPrimaryColor') - ALWAYS check for data-testid first
2. page.locator('[data-slot="sidebar-overlay"]') - for other data attributes
3. page.getByRole('button', { name: 'Submit' }) - for semantic elements
4. page.getByLabel('Name') - for form fields
5. page.getByPlaceholder('Enter name') - for placeholder text
6. page.getByText('Error message') - for text content
7. page.locator('.class-name') - fallback for CSS selectors
Anti-pattern (DO NOT):
Feature file says: "click the delete button on image1"
Recording assumes: data-testid="delete-image1" ← WRONG if not verified
Correct pattern:
Feature file says: "click the delete button on image1"
→ Take snapshot
→ Find actual element: [S42] button "delete" (data-testid="DeleteButton")
→ Record: page.locator('[data-testid="DeleteButton"]').first()
Mobile (Appium/WebDriverIO)
The ~accessibilityId selector (e.g., driver.$('~Sign in')) is cross-platform but only works if the app sets accessibility attributes:
| Platform | Required Attribute |
|---|---|
| Android | content-desc (contentDescription) |
| iOS | accessibilityIdentifier or label |
Common issue: An element may have text="Sign in" but no content-desc, causing ~Sign in to fail.
Cross-platform pattern:
const isAndroid = driver.isAndroid;
const element = isAndroid
? await driver.$('android=new UiSelector().text("Sign in")')
: await driver.$('-ios predicate string:label == "Sign in"');
Locator strategy priority:
1. ~accessibilityId - if app has proper accessibility attributes (preferred, cross-platform)
2. id=resource-id - Android resource IDs (e.g., id=com.app:id/buttonSubmit)
3. android=UiSelector() / -ios predicate string: - platform-specific text/attribute matching
4. XPath - last resort, fragile
Generating the Spec File
After executing all scenario steps, use the Write tool to create a .spec.js file alongside the .feature file with:
Handling Existing Spec Files:
- If a .spec.js file already exists, read it first to preserve the file structure
- Replace test cases that match the recorded scenario ID (e.g., test('DAS-01: ...'))
test('DAS-01: ...')) - Keep other existing test cases, helper functions, imports, and describe blocks intact
- Maintain the original file's formatting and organization
Generated File Structure (unified for all scenarios):
import { expect, test, BrowserContext, Page } from '@playwright/test';
import { execSync } from 'child_process';
// ============================================================
// Helper Functions
// ============================================================
function extractAppIdFromS3(): string {
const output = execSync(
'AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 ls s3://apps/',
{ encoding: 'utf-8' }
);
return output.match(/PRE ([a-f0-9-]{36})\//)?.[1] ?? '';
}
async function setLocalStorageAppId(context: BrowserContext, appId: string): Promise<void> {
await context.addInitScript((id) => {
if (id) localStorage.setItem('app_id', JSON.stringify(id));
}, appId);
}
async function setCookies(context: BrowserContext): Promise<void> {
await context.addCookies([
{ name: 'x-tenant-id', value: '0595cc48-354b-4b13-8f17-fe65d7f79146', domain: 'localhost', path: '/' },
{ name: 'x-app-id', value: 'd37a8fae-ed2d-4a78-9d71-d5a3dfe0d0ca', domain: 'localhost', path: '/' },
]);
}
async function setupBackground(context: BrowserContext): Promise<string> {
const appId = extractAppIdFromS3();
await setLocalStorageAppId(context, appId);
await setCookies(context);
return appId;
}
// ============================================================
// Test Suite
// ============================================================
test.describe('Feature: App Settings', () => {
test('DAS-01: Finish onboarding', async ({ page, context }) => {
// @purge-data - Restore the seed data to initial state (tag action runs FIRST)
execSync('make reseed', { stdio: 'inherit' });
// Background (helper call runs SECOND)
await setupBackground(context);
// Scenario steps (runs THIRD)
await page.goto('http://localhost:8080/onboarding');
// ... remaining scenario steps
});
test('DAS-02: Delete image', async ({ page, context }) => {
// Background (returns appId for use in scenario)
const appId = await setupBackground(context);
// Scenario steps using appId
await page.goto('http://localhost:8080/settings');
// ... remaining scenario steps using appId
});
});
Key points:
- Helper functions first: All Background-derived helpers appear at the top of the file
- No beforeEach: Each test explicitly calls setupBackground() for clarity
- Tags only affect tag actions: @purge-data adds execSync('make reseed') BEFORE the helper call
- Self-contained tests: Each test shows its full setup, making debugging easier
- Return values: If a scenario needs a Background value (e.g., appId), capture it from the helper
Example Recording Session
Feature step:
When I enter "John" in the "Name" field
And I click the "CREATE" button
Then a confirmation dialog should appear
Claude's output during execution:
Filling "Name" field with "John"...
[RECORD_ACTION]
step: "When I enter \"John\" in the \"Name\" field"
method: fill
locator: page.getByLabel('Name')
value: "John"
[/RECORD_ACTION]
Clicking CREATE button...
[RECORD_ACTION]
step: "And I click the \"CREATE\" button"
method: click
locator: page.getByRole('button', { name: 'CREATE' })
[/RECORD_ACTION]
Taking snapshot to verify dialog...
Dialog found at ref [S1].
[RECORD_EXPECT]
step: "Then a confirmation dialog should appear"
locator: page.getByRole('dialog')
assertion: toBeVisible
[/RECORD_EXPECT]
Generated Playwright code:
// When I enter "John" in the "Name" field
await page.getByLabel('Name').fill('John');
// And I click the "CREATE" button
await page.getByRole('button', { name: 'CREATE' }).click();
// Then a confirmation dialog should appear
await expect(page.getByRole('dialog')).toBeVisible();
Example: S3 Verification Recording
Feature step:
And the image should be deleted from S3 bucket "apps" at "{appId}/assets/image1.png"
Claude's output during execution:
Verifying S3 deletion via AWS CLI...
[RECORD_COMMAND]
step: "And the image should be deleted from S3 bucket"
command: AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 ls s3://apps/0bec779c-db96-4c0e-b78f-800888d4fe20/assets/image1.png
assertion: shouldFail
[/RECORD_COMMAND]
Generated Playwright code:
// And the image should be deleted from S3 bucket
expect(() => execSync(
`AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin aws --endpoint-url=http://localhost:9000 s3 ls s3://apps/${appId}/assets/image1.png`,
{ stdio: "pipe" }
)).toThrow();
Example Inputs
- Execute BDD scenario ARMR-01 in @path/to/file.feature
- Execute scenario ARMR-06 for Driver card in @path/to/file.feature
- Execute all scenarios in @path/to/file.feature
- Run scenario ARMR-03 from @path/to/file.feature
- Run the ARMR-05 test in file.feature
- Test ARMR-02 scenario
- Execute BDD scenario DAS-02 in @path/to/file.feature --record
- Execute all scenarios in @path/to/file.feature --record
Source
https://github.com/Intai/story-flow/blob/main/plugins/story-flow/skills/execute-bdd-scenario/SKILL.md
Overview
Executes BDD scenarios defined in .feature files through browser automation. It preserves the required execution order (tags first, background second, then scenario steps) and uses Playwright-based tooling to validate UI behavior and capture evidence when needed.
How This Skill Works
The system reads the feature file to confirm line numbers, scenario titles, backgrounds, and all Then/And assertions. It executes scenarios in isolation using the story-flow and project-level execute-bdd-scenario skills, honoring background steps and relevant tags (e.g., @purge-data, @screenshots). It uses mcp__playwright__browser_run_code to control browser state and stops on first failure, providing a detailed summary with evidence.
When to Use It
- When you need to run BDD scenarios from a .feature file using a browser
- When execution order must be strictly: Tags, Background steps, then Scenario steps
- When seed data must be reset before running scenarios via @purge-data
- When you want screenshots managed by the @screenshots tag
- When executing multiple scenarios in isolation and capturing PASS/FAIL results
Quick Start
- Step 1: Inspect the feature file to confirm line numbers, backgrounds, and assertions
- Step 2: Run scenarios in order using the Story Flow BDD execution tooling (tags -> background -> steps)
- Step 3: Review PASS/FAIL summary and collect artifacts; stop on first failure if any
Best Practices
- Always read the feature file to confirm line numbers, scenario titles, and all assertions before execution
- Ensure Background steps complete fully before any Scenario steps
- Inside each subagent, use the Skill tool to load story-flow:execute-bdd-scenario first, then execute-bdd-scenario
- Respect @purge-data to reseed data prior to background steps when required
- Capture evidence (screenshots if @screenshots is present) and stop immediately on first failure
Example Use Cases
- Smoke-testing a new UI feature described in a feature file with full scenario execution
- Verifying search and filter flows across scenarios with deterministic background setup
- Inline editing workflow validated end-to-end across multiple scenarios
- Regression sweep of a feature with seed-data reset to ensure consistent state
- Cross-feature validation where background prerequisites are shared across scenarios