Get the FREE Ultimate OpenClaw Setup Guide →

data-deprecation-analysis

Scanned
npx machina-cli add skill karim-bhalwani/agent-skills-collection/data-deprecation-analysis --openclaw
Files (1)
SKILL.md
8.8 KB

Data Deprecation Analysis

Expert in analyzing ODBC/database access logs to identify technical debt, unused data structures, and security risks for safe deprecation.

When to Use This Skill

Use when:

  • Auditing data warehouses for unused tables, views, or stored procedures
  • Planning data platform modernization or migration projects
  • Identifying legacy application dependencies on data assets
  • Detecting stale data structures that can be safely deprecated
  • Analyzing access patterns to prioritize modernization efforts
  • Building technical debt reports for leadership
  • Identifying security risks from legacy or orphaned data objects

Core Capabilities

  1. Dead Data Detection - Analyze access logs to tier data objects by recency (Ghost/Cold/Cool/Active)
  2. Legacy Pattern Recognition - Identify deprecated technologies (old ODBC drivers, legacy Excel, Access databases)
  3. Dependency Mapping - Trace which applications, users, and service accounts access which objects
  4. Risk Assessment - Flag security concerns (orphaned objects, stale service accounts, unoptimized queries)
  5. Deprecation Planning - Generate actionable reports for safe data asset retirement
  6. Query Pattern Analysis - Detect anti-patterns (SELECT *, unfiltered scans, legacy joins)

Workflow / Process

Phase 1: Data Collection

  1. Identify access log sources (SQL Server Audit, Splunk, application logs, ODBC traces)
  2. Define key variables: Timestamp, Application_Name, Database_Object, User_Account, Query_Text
  3. Establish analysis time window (typically 6-24 months)
  4. Extract and normalize log data into analysis-ready format

Phase 2: Dead Data Detection

  1. Group access records by Database_Object
  2. Calculate Max_Access_Date (last access) for each object
  3. Assign tier based on access recency:
    • Ghost Tier: No access in >18 months
    • Cold Tier: No access in >12 months
    • Cool Tier: No access in >6 months
    • Active: Accessed within last 6 months
  4. Prioritize false negatives: Any access in 6 months = Active

Phase 3: Legacy Pattern Recognition

  1. Filter for legacy Application_Name indicators:
    • Keywords: "Access", "Excel 2013", "ODBC 3.0", "Python Script", ".Net App"
  2. Identify high-frequency queries from legacy applications
  3. Flag anti-patterns: SELECT *, unfiltered table scans, deprecated syntax
  4. Map legacy dependencies to data objects

Phase 4: Risk Assessment & Reporting

  1. Cross-reference Ghost/Cold objects with User_Account for contact information
  2. Identify orphaned objects (no recent access, no known owner)
  3. Flag security risks (stale service accounts, excessive permissions)
  4. Generate tiered deprecation report with recommendations

Outputs & Deliverables

  • Primary Output: Tiered deprecation report (Ghost/Cold/Cool/Active counts with object details)
  • Secondary Output: Legacy dependency map, contact list for last-known users, SQL/Python analysis scripts
  • Success Criteria: All objects categorized, Ghost tier objects have owner contact info, actionable deprecation timeline
  • Quality Gate: Zero false negatives (any recent access = Active), verified with stakeholders before deprecation

Analysis Patterns

Tiering Thresholds

TierLast AccessRisk LevelAction
Ghost>18 monthsLow (safe to deprecate)Archive and remove
Cold>12 monthsMediumFlag for review, contact owners
Cool>6 monthsMedium-HighMonitor, investigate usage
Active<6 monthsN/AKeep, optimize if needed

Legacy Application Indicators

IndicatorConcernRecommended Action
"Microsoft Access"End-of-life technologyMigrate to modern BI tool
"Excel 2013" or olderLegacy Office versionUpgrade or migrate workflow
"ODBC 3.0"Outdated driverUpdate connection strings
Generic "Python Script"Unknown/undocumented automationDocument and modernize
".Net App" (unnamed)Untracked applicationIdentify owner, document dependency

Query Anti-Patterns

PatternDetectionRisk
SELECT *Regex: SELECT\s+\*Unoptimized, fragile to schema changes
No WHERE clauseFull table scan detectionPerformance, cost concerns
Deprecated syntax+= joins, non-ANSI joinsCompatibility risk
Hardcoded datesRegex: date literals in queriesMaintenance debt

Standards & Best Practices

Data Collection

  • Collect minimum 12 months of access logs for accurate tiering
  • Normalize timestamps to UTC for consistent analysis
  • Include both successful and failed access attempts
  • Capture full query text when possible for pattern analysis

Analysis Integrity

  • Prioritize False Negatives: If accessed once in 6 months, mark as Active
  • Validate Object Existence: Cross-reference with current catalog (some logged objects may be already dropped)
  • Account for Seasonality: Some objects accessed only quarterly (fiscal reports)
  • Consider Indirect Access: Views/procedures may access tables not directly queried

Deprecation Safety

  • Never deprecate without contacting last-known User_Account
  • Archive to cold storage before permanent deletion
  • Maintain deprecation log for audit trail
  • Set grace period (30-90 days) between deprecation notice and removal
  • Test downstream dependencies before removal

Security Considerations

  • Flag orphaned service accounts with data access
  • Identify objects accessed by terminated employees
  • Review excessive permissions on Cold/Ghost objects
  • Document any PII/sensitive data in deprecated objects

Integration Points

PhaseInput FromOutput ToContext
Collectionops-manager, data-pipeline-engineerLog extraction and normalization
Analysissenior-data-engineerTechnical debt assessment
Reportingprincipal-data-engineerStrategic modernization planning
Remediationsenior-data-engineerguardianSecurity risk validation
Executionops-managerArchive and removal automation

Constraints

Technical Constraints:

  • Requires access to database audit logs or ODBC traces
  • Analysis accuracy depends on log completeness and retention
  • Cannot detect objects accessed via dynamic SQL not captured in logs

Scope Constraints:

  • In Scope: Tables, views, stored procedures, functions with access logs
  • Out of Scope: Application code refactoring, infrastructure changes, data migration execution

Governance Constraints:

  • All deprecation decisions require stakeholder approval
  • Maintain audit trail of deprecated objects and approval chain
  • Comply with data retention policies before permanent deletion

Common Pitfalls

  • Seasonal Access Missed: Quarterly/annual reports appear inactive. Fix: Use 18+ month window, verify with business calendar.
  • Dynamic SQL Not Logged: Objects accessed via EXEC/sp_executesql may not appear in logs. Fix: Enable extended events or query store.
  • Service Account Mapping: Generic service accounts hide true ownership. Fix: Cross-reference with application inventory, contact team leads.
  • View Dependencies Ignored: Table appears unused but accessed via views. Fix: Trace view dependencies before deprecating underlying tables.
  • False Confidence in Ghost Tier: Object not accessed but still referenced in code. Fix: Grep codebase for object references before deprecation.
  • No Backup Before Removal: Data permanently lost after deprecation. Fix: Always archive to cold storage with 90-day retention.

Reference Files

Detailed implementation guides available in references/ folder:


Dependencies

  • Depends On: senior-data-engineer, sql-optimization-patterns, guardian
  • Related Skills: principal-data-engineer (strategic planning), data-pipeline-engineer (log extraction), ops-manager (removal automation)

Source

git clone https://github.com/karim-bhalwani/agent-skills-collection/blob/main/skills/data-deprecation-analysis/SKILL.mdView on GitHub

Overview

Data Deprecation Analysis specializes in analyzing ODBC/database access logs to identify technical debt, unused data structures, and security risks for safe deprecation. It is ideal for auditing data warehouses for deprecation candidates, detecting stale tables and views, identifying legacy application dependencies, and planning safe retirement of data assets.

How This Skill Works

It collects and normalizes access log data from sources like SQL Server Audit, Splunk, and ODBC traces. It then groups records by Database_Object, assigns a recency tier (Ghost/Cold/Cool/Active), and flags legacy patterns (old ODBC drivers, Excel-based workbooks, or Python/.Net scripts). Finally, it maps dependencies, performs risk assessment, and generates a tiered deprecation report with actionable recommendations.

When to Use It

  • Auditing data warehouses for unused tables, views, or stored procedures.
  • Planning data platform modernization or migration projects.
  • Identifying legacy application dependencies on data assets.
  • Detecting stale data structures that can be safely deprecated.
  • Analyzing access patterns to prioritize modernization and deprecation efforts.

Quick Start

  1. Step 1: Identify access log sources and define fields (Timestamp, Application_Name, Database_Object, User_Account, Query_Text).
  2. Step 2: Normalize data and compute last access per object, assigning Ghost/Cold/Cool/Active.
  3. Step 3: Produce a tiered deprecation report with dependencies, owner contacts, and recommended retirement actions.

Best Practices

  • Standardize log sources (SQL Server Audit, Splunk, ODBC traces) and maintain a single schema.
  • Use a defined analysis window (6-24 months) and normalize data before processing.
  • Tier objects by recency (Ghost/Cold/Cool/Active) and require owner for Ghost/Cold.
  • Map dependencies to applications and service accounts to avoid orphaned assets.
  • Validate deprecation plans with stakeholders and document retirement timelines.

Example Use Cases

  • Archived a Ghost-tier table after confirming no owner and no active workloads for 18+ months.
  • Created a legacy dependency map showing which apps rely on an old Excel-based data source.
  • Flagged anti-patterns such as SELECT * and unfiltered scans to drive query refactors.
  • Coordinated with security to review orphaned objects and stale service accounts.
  • Delivered a deprecation plan with an actionable retirement timeline for a deprecated view.

Frequently Asked Questions

Add this skill to your agents
Sponsor this space

Reach thousands of developers