data-management-plan-creator
npx machina-cli add skill aipoch/medical-research-skills/data-management-plan-creator --openclaw
Data Management Plan (DMP) Creator
Automatically generate draft Data Management and Sharing Plans (DMSP) compliant with NIH 2023 policy requirements and FAIR principles.
Overview
This Skill generates comprehensive Data Management and Sharing Plans (DMSP) that meet NIH's 2023 Final Policy for Data Management and Sharing. The output follows FAIR principles (Findable, Accessible, Interoperable, Reusable) to ensure research data is properly managed and shared.
Requirements
- Python 3.8+
- No external dependencies required (uses standard library only)
Usage
Command Line
python scripts/main.py \
  --project-title "Your Research Project Title" \
  --pi-name "Principal Investigator Name" \
  --institution "Your Institution" \
  --data-types "genomic,imaging,clinical" \
  --repository "GEO,Figshare" \
  --output dmsp_draft.md
Interactive Mode
python scripts/main.py --interactive
As a Module
from scripts.main import DMSPCreator

creator = DMSPCreator(
    project_title="Cancer Genomics Study",
    pi_name="Dr. Jane Smith",
    institution="National Cancer Institute",
    data_types=["genomic sequencing", "clinical metadata"],
    estimated_size_gb=500,
    repositories=["dbGaP", "GEO"],
    sharing_timeline="6 months after study completion"
)

dmsp = creator.generate_plan()
creator.save_to_file("dmsp_output.md")
Parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| --project-title | string | - | Yes | Title of the research project |
| --pi-name | string | - | Yes | Name of the Principal Investigator |
| --institution | string | - | Yes | Research institution or organization |
| --data-types | string | - | Yes | Comma-separated list of data types (e.g., "genomic,imaging,clinical") |
| --estimated-size | float | - | No | Estimated data size in GB |
| --repository | string | - | Yes | Comma-separated list of target repositories |
| --sharing-timeline | string | No later than the end of the award period | No | When data will be shared |
| --access-restrictions | string | - | No | Any access restrictions (e.g., "controlled-access for sensitive data") |
| --format-standards | string | - | No | Data format standards to be used |
| --output | string | dmsp_[timestamp].md | No | Output file path |
| --interactive | flag | - | No | Run in interactive mode |
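
The parameter table can be read as an argument-parser specification. The following is a minimal sketch of how such a parser might look using only the standard library; it is not the shipped `scripts/main.py`. The helper names `split_csv`, `default_output_name`, and `build_parser` are hypothetical, the timestamp format is an assumption, and `--interactive` is left out because it would need the required flags to be relaxed.

```python
import argparse
from datetime import datetime
from typing import List


def split_csv(value: str) -> List[str]:
    """Turn "genomic, imaging,clinical" into ["genomic", "imaging", "clinical"]."""
    return [item.strip() for item in value.split(",") if item.strip()]


def default_output_name(now: datetime) -> str:
    """Build a dmsp_[timestamp].md name; the exact timestamp format is an assumption."""
    return f"dmsp_{now:%Y%m%d_%H%M%S}.md"


def build_parser() -> argparse.ArgumentParser:
    """Mirror the parameter table above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Draft an NIH DMSP")
    # Required parameters from the table.
    parser.add_argument("--project-title", required=True)
    parser.add_argument("--pi-name", required=True)
    parser.add_argument("--institution", required=True)
    parser.add_argument("--data-types", required=True, type=split_csv)
    parser.add_argument("--repository", required=True, type=split_csv)
    # Optional parameters, with the documented default where one is given.
    parser.add_argument("--estimated-size", type=float, default=None)
    parser.add_argument(
        "--sharing-timeline",
        default="No later than the end of the award period",
    )
    parser.add_argument("--access-restrictions", default=None)
    parser.add_argument("--format-standards", default=None)
    parser.add_argument("--output", default=None)
    return parser
```

Parsing the comma-separated values at the argument boundary keeps the rest of the program working with lists, which matches the list arguments shown in the module example.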
NIH DMSP Required Elements
The generated plan addresses all six required elements per NIH policy:
- Data Type - Types and estimated amount of scientific data
- Related Tools, Software and/or Code - Tools needed to access/manipulate data
- Standards - Standards for data/metadata to be applied
- Data Preservation, Access, and Associated Timelines - Repository selection and sharing timeline
- Access, Distribution, or Reuse Considerations - Factors affecting subsequent access
- Oversight of Data Management and Sharing - Plans for compliance monitoring
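
As an illustration of how the six elements listed above could drive the plan layout, here is a minimal sketch that renders one markdown section per element. The constant `NIH_ELEMENTS`, the function `render_skeleton`, and the heading wording are assumptions made for this example; the tool's actual section headers may differ.

```python
from typing import List, Tuple

# The six required elements, copied from the list above as (title, summary).
NIH_ELEMENTS: List[Tuple[str, str]] = [
    ("Data Type", "Types and estimated amount of scientific data"),
    ("Related Tools, Software and/or Code", "Tools needed to access/manipulate data"),
    ("Standards", "Standards for data/metadata to be applied"),
    (
        "Data Preservation, Access, and Associated Timelines",
        "Repository selection and sharing timeline",
    ),
    (
        "Access, Distribution, or Reuse Considerations",
        "Factors affecting subsequent access",
    ),
    ("Oversight of Data Management and Sharing", "Plans for compliance monitoring"),
]


def render_skeleton(project_title: str) -> str:
    """Return a markdown outline with one numbered section per required element."""
    lines = [f"# Data Management and Sharing Plan: {project_title}", ""]
    for number, (title, summary) in enumerate(NIH_ELEMENTS, start=1):
        lines.append(f"## Element {number}: {title}")
        lines.append(f"_{summary}_")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

Keeping the elements in a single ordered list means a missing section shows up as a simple count mismatch rather than a silent omission.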
FAIR Principles Implementation
Findable
- Persistent identifiers (DOIs)
- Rich metadata with standard vocabularies
- Registration in searchable repositories
Accessible
- Standardized communication protocols
- Metadata available even if data is no longer available
- Access procedures clearly documented
Interoperable
- Standard data formats
- Standard terminologies and vocabularies
- Qualified references to other data
Reusable
- Detailed provenance information
- Clear usage licenses
- Domain-relevant community standards
Example Output
The generated DMSP includes:
- Executive summary
- NIH-compliant section headers
- Specific language for data type descriptions
- FAIR-aligned metadata standards
- Repository recommendations
- Timeline for data sharing
- Access control procedures
- Roles and responsibilities
License
MIT License - See project root for details.
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access (../)
- Output does not expose sensitive information
- Prompt injection protections in place
- Input file paths validated (no ../ traversal)
- Output directory restricted to workspace
- Script execution in sandboxed environment
- Error messages sanitized (no stack traces exposed)
- Dependencies audited
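
Two of the checklist items (no `../` traversal, output restricted to the workspace) can be checked with one path test. The sketch below is one possible approach under stated assumptions, not the skill's actual validation code; the function name `resolve_in_workspace` is hypothetical. It uses `Path.parents` rather than `Path.is_relative_to` so that it still runs on Python 3.8.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve a requested output path and reject anything outside the workspace.

    Raises ValueError for paths that escape via "..", for absolute paths that
    point elsewhere, and for the workspace directory itself.
    """
    root = workspace.resolve()
    # Joining with an absolute path discards root, so absolute inputs are
    # caught by the same containment check below.
    candidate = (root / requested).resolve()
    if root not in candidate.parents:
        raise ValueError(f"Output path escapes workspace: {requested!r}")
    return candidate
```

Resolving before comparing is the important step: a plain string check for `../` would miss symlinks and redundant path segments.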
Prerequisites
# Python dependencies (optional: the Requirements section lists none beyond the standard library)
pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- Successfully executes main functionality
- Output meets quality standards
- Handles edge cases gracefully
- Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support
Source
git clone https://github.com/aipoch/medical-research-skills/blob/main/scientific-skills/Academic writing/data-management-plan-creator/SKILL.md
Overview
Automatically generate draft Data Management and Sharing Plans (DMSP) that comply with NIH 2023 Final Policy and apply FAIR principles. The tool outputs comprehensive plans covering NIH's six required elements, data types, repositories, standards, timelines, and oversight to support compliant and reusable data sharing.
How This Skill Works
The skill uses Python 3.8+ and the standard library to collect inputs such as project title, PI, institution, data types, estimated size, repositories, and sharing timeline, either from CLI arguments or in interactive mode. It then assembles the six NIH-required elements and FAIR-aligned sections into a markdown DMSP draft. The same generation is also available programmatically through a module interface.
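
To make the interactive collection step concrete, here is a minimal sketch of a prompt loop that re-asks for required fields and falls back to a default where one exists. It is an illustration only: the `PROMPTS` table, the prompt wording, and the `collect_inputs` function are assumptions, not the skill's actual interactive mode. The `ask` parameter defaults to the built-in `input` and can be swapped out for testing.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (field name, prompt label, default). A default of None marks a required field.
PROMPTS: List[Tuple[str, str, Optional[str]]] = [
    ("project_title", "Project title", None),
    ("pi_name", "Principal Investigator", None),
    ("institution", "Institution", None),
    ("data_types", "Data types (comma-separated)", None),
    ("repositories", "Repositories (comma-separated)", None),
    (
        "sharing_timeline",
        "Sharing timeline",
        "No later than the end of the award period",
    ),
]


def collect_inputs(ask: Callable[[str], str] = input) -> Dict[str, str]:
    """Prompt for each field; repeat required prompts until answered."""
    answers: Dict[str, str] = {}
    for field, label, default in PROMPTS:
        suffix = f" [{default}]" if default else ""
        while True:
            reply = ask(f"{label}{suffix}: ").strip()
            if reply:
                answers[field] = reply
                break
            if default is not None:
                # Empty reply on an optional field: use the documented default.
                answers[field] = default
                break
            # Empty reply on a required field: ask again.
    return answers
```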
When to Use It
- Preparing an NIH grant proposal (R01/R21) that requires a compliant Data Management and Sharing Plan
- Plans involving multiple data types and repositories needing a clear sharing timeline
- Projects with restricted data requiring controlled-access metadata and licenses
- Early-stage data planning to define formats, standards, and provenance
- Compliance reviews or institutional approvals before final NIH submission
Quick Start
- Step 1: Run the CLI with required arguments, quoting any value that contains spaces, e.g., python scripts/main.py --project-title "Your Research Project Title" --pi-name "Principal Investigator Name" --institution "Your Institution" --data-types genomic,imaging,clinical --repository GEO,dbGaP --sharing-timeline "6 months after study completion" --output dmsp_draft.md
- Step 2: If you prefer guided prompts, run python scripts/main.py --interactive
- Step 3: If using the module directly, construct a DMSPCreator as shown under "As a Module", then generate and save the plan: dmsp = creator.generate_plan(); creator.save_to_file("dmsp_output.md")
Best Practices
- Clearly define data types and estimated data volumes up front
- Map each data type to the NIH six required DMSP elements
- Select repositories early and document access, licensing, and sharing timeline
- Specify data formats, metadata standards, and vocabularies for interoperability
- Incorporate governance and oversight steps for ongoing compliance
Example Use Cases
- Draft DMSP for an NIH R01 in cancer genomics using dbGaP and GEO with a six‑month post-study sharing timeline
- Clinical study data with controlled-access metadata and a clear access procedure
- Imaging data plan following DICOM standards with FAIR metadata and repository registration
- Multi‑site microbiome project coordinating data formats, licenses, and cross‑institution access
- Pilot study releasing non-sensitive data openly with a documented usage license