
scvi-tools

npx machina-cli add skill anthropics/knowledge-work-plugins/scvi-tools --openclaw

scvi-tools Deep Learning Skill

This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, a widely used framework for probabilistic models in single-cell genomics.

How to Use This Skill

  1. Identify the appropriate workflow from the model/workflow tables below
  2. Read the corresponding reference file for detailed steps and code
  3. Use scripts in scripts/ to avoid rewriting common code
  4. For installation or GPU issues, consult references/environment_setup.md
  5. For debugging, consult references/troubleshooting.md

When to Use This Skill

  • When scvi-tools, scVI, scANVI, or related models are mentioned
  • When deep learning-based batch correction or integration is needed
  • When working with multi-modal data (CITE-seq, multiome)
  • When reference mapping or label transfer is required
  • When analyzing ATAC-seq or spatial transcriptomics data
  • When learning latent representations of single-cell data

Model Selection Guide

| Data Type | Model | Primary Use Case |
|---|---|---|
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |

Workflow Reference Files

| Workflow | Reference File | Description |
|---|---|---|
| Environment Setup | references/environment_setup.md | Installation, GPU, version info |
| Data Preparation | references/data_preparation.md | Formatting data for any model |
| scRNA Integration | references/scrna_integration.md | scVI/scANVI batch correction |
| ATAC-seq Analysis | references/atac_peakvi.md | PeakVI for accessibility |
| CITE-seq Analysis | references/citeseq_totalvi.md | totalVI for protein+RNA |
| Multiome Analysis | references/multiome_multivi.md | MultiVI for RNA+ATAC |
| Spatial Deconvolution | references/spatial_deconvolution.md | DestVI spatial analysis |
| Label Transfer | references/label_transfer.md | scANVI reference mapping |
| scArches Mapping | references/scarches_mapping.md | Query-to-reference mapping |
| Batch Correction | references/batch_correction_sysvi.md | Advanced batch methods |
| RNA Velocity | references/rna_velocity_velovi.md | veloVI dynamics |
| Troubleshooting | references/troubleshooting.md | Common issues and solutions |

CLI Scripts

Modular scripts for common workflows. Chain them together or modify them as needed.

Pipeline Scripts

| Script | Purpose | Usage |
|---|---|---|
| prepare_data.py | QC, filter, HVG selection | python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch |
| train_model.py | Train any scvi-tools model | python scripts/train_model.py prepared.h5ad results/ --model scvi |
| cluster_embed.py | Neighbors, UMAP, Leiden | python scripts/cluster_embed.py adata.h5ad results/ |
| differential_expression.py | DE analysis | python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden |
| transfer_labels.py | Label transfer with scANVI | python scripts/transfer_labels.py ref_model/ query.h5ad results/ |
| integrate_datasets.py | Multi-dataset integration | python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad |
| validate_adata.py | Check data compatibility | python scripts/validate_adata.py data.h5ad --batch-key batch |

Example Workflow

# 1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest

# 2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000

# 3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch

# 4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8

# 5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
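
The five steps above can also be driven from Python. The following is a minimal sketch, not part of the skill itself: `build_pipeline` and `run_pipeline` are hypothetical helper names, and the script paths, flags, and intermediate file names are copied from the example workflow as written. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys


def build_pipeline(raw="raw.h5ad", prepared="prepared.h5ad",
                   results="results", batch_key="batch"):
    """Return the example workflow as a list of argument lists (hypothetical helper)."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        # 1. Validate input data
        [py, "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        # 2. Prepare data (QC, HVG selection)
        [py, "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", "2000"],
        # 3. Train model
        [py, "scripts/train_model.py", prepared, f"{results}/",
         "--model", "scvi", "--batch-key", batch_key],
        # 4. Cluster and visualize
        [py, "scripts/cluster_embed.py", f"{results}/adata_trained.h5ad",
         f"{results}/", "--resolution", "0.8"],
        # 5. Differential expression
        [py, "scripts/differential_expression.py", f"{results}/model",
         f"{results}/adata_clustered.h5ad", f"{results}/de.csv",
         "--groupby", "leiden"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline())  # dry run: prints the five commands
```

Passing `dry_run=False` executes the steps in order and raises `subprocess.CalledProcessError` as soon as one of them fails, so later steps never run on incomplete outputs.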

Python Utilities

scripts/model_utils.py provides importable functions for custom workflows:

| Function | Purpose |
|---|---|
| prepare_adata() | Data preparation (QC, HVG, layer setup) |
| train_scvi() | Train scVI or scANVI |
| evaluate_integration() | Compute integration metrics |
| get_marker_genes() | Extract DE markers |
| save_results() | Save model, data, plots |
| auto_select_model() | Suggest best model |
| quick_clustering() | Neighbors + UMAP + Leiden |

Critical Requirements

  1. Raw counts required: scvi-tools models require integer count data

    adata.layers["counts"] = adata.X.copy()  # Before normalization
    scvi.model.SCVI.setup_anndata(adata, layer="counts")
    
  2. HVG selection: Use 2000-4000 highly variable genes

    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
    adata = adata[:, adata.var['highly_variable']].copy()
    
  3. Batch information: Specify batch_key for integration

    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
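
The three requirements can be combined into one short script. This is a sketch under stated assumptions rather than the skill's own code: `looks_like_raw_counts` and `setup_and_train` are hypothetical names, `adata.X` is assumed to still hold raw counts, and the batch column is assumed to be called `batch`. scanpy and scvi-tools are imported lazily, so the count check can be used on its own.

```python
def looks_like_raw_counts(values):
    """Return True if every value is a finite, non-negative whole number.

    Pure-Python check for illustration; on large matrices a vectorized
    NumPy check would be much faster.
    """
    return all(v >= 0 and float(v).is_integer() for v in values)


def setup_and_train(adata, batch_key="batch", n_top_genes=2000):
    """Apply the three requirements, then train scVI with default settings."""
    # Imported here so the helper above works without these packages installed.
    import numpy as np
    import scanpy as sc
    import scvi
    from scipy import sparse

    # 1. Raw counts: verify before anything is normalized
    X = adata.X
    stored = X.data if sparse.issparse(X) else np.asarray(X).ravel()
    if not looks_like_raw_counts(stored):
        raise ValueError("adata.X does not look like raw integer counts")
    adata.layers["counts"] = adata.X.copy()

    # 2. HVG selection on the counts layer, per batch
    sc.pp.highly_variable_genes(
        adata, n_top_genes=n_top_genes, batch_key=batch_key,
        layer="counts", flavor="seurat_v3",
    )
    adata = adata[:, adata.var["highly_variable"]].copy()

    # 3. Register the counts layer and batch column, then train
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()

    # Store the learned embedding for downstream neighbors/UMAP/clustering
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return adata, model
```

Running the count check first catches the common mistake of passing log-normalized data, which would otherwise only surface as poor training behaviour.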
    

Quick Decision Tree

Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)

Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)

Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)

Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)

Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)

Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
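
The same tree can be written as a small lookup. This is an illustrative sketch only: `suggest_workflow` and its task keywords are hypothetical and are not the `auto_select_model()` function from scripts/model_utils.py. The models and reference paths are taken from the tree above.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    model: str
    reference: str


# One entry per non-scRNA branch of the decision tree (keywords are hypothetical)
_BRANCHES = {
    "citeseq": Suggestion("totalVI", "references/citeseq_totalvi.md"),
    "multiome": Suggestion("MultiVI", "references/multiome_multivi.md"),
    "atac": Suggestion("PeakVI", "references/atac_peakvi.md"),
    "spatial": Suggestion("DestVI", "references/spatial_deconvolution.md"),
    "reference_mapping": Suggestion("scArches", "references/scarches_mapping.md"),
    "velocity": Suggestion("veloVI", "references/rna_velocity_velovi.md"),
    "cross_technology": Suggestion("sysVI", "references/batch_correction_sysvi.md"),
}


def suggest_workflow(task, has_labels=False):
    """Map a task keyword to the model and reference file named in the tree."""
    if task == "scrna":
        # scRNA-seq integration splits on whether cell type labels exist
        if has_labels:
            return Suggestion("scANVI", "references/label_transfer.md")
        return Suggestion("scVI", "references/scrna_integration.md")
    try:
        return _BRANCHES[task]
    except KeyError:
        known = ", ".join(["scrna", *sorted(_BRANCHES)])
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    choice = suggest_workflow("scrna", has_labels=True)
    print(choice.model, choice.reference)
```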

Key Resources

Source

git clone https://github.com/anthropics/knowledge-work-plugins

SKILL.md: https://github.com/anthropics/knowledge-work-plugins/blob/main/bio-research/skills/scvi-tools/SKILL.md

Overview

scvi-tools provides probabilistic deep-learning models for single-cell genomics, enabling data integration, batch correction, and multi-modal analysis. This skill covers scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, and sysVI.

How This Skill Works

The framework uses variational autoencoders to learn a compact latent representation of cells, correcting for batch effects and enabling joint analysis across modalities. It provides modular models, workflow references, and CLI scripts to run end-to-end pipelines without rewriting core code.
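
As a rough guide to what "variational autoencoder" means here, the generic training objective of a conditional VAE is the evidence lower bound below. The notation is an assumption made for illustration: $x_n$ is the count vector of cell $n$, $s_n$ its batch label, $z_n$ its latent representation, and $\theta$, $\phi$ the decoder and encoder parameters. Individual scvi-tools models add further terms (for example for labels or extra modalities), so this is the general shape of the objective, not any one model's exact loss.

```latex
\log p_\theta(x_n \mid s_n) \;\ge\;
\mathbb{E}_{q_\phi(z_n \mid x_n, s_n)}\!\left[ \log p_\theta(x_n \mid z_n, s_n) \right]
\;-\; \mathrm{KL}\!\left( q_\phi(z_n \mid x_n, s_n) \,\|\, p(z_n) \right)
```

Because the decoder receives the batch label $s_n$ directly, the latent $z_n$ is not needed to explain batch-specific variation, which is why it can serve as a batch-corrected embedding.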

When to Use It

  • When scvi-tools, scVI, scANVI, or related models are mentioned in the project
  • When deep learning-based batch correction or integration is required
  • When working with multi-modal data (CITE-seq, multiome)
  • When reference mapping or label transfer is needed
  • When analyzing ATAC-seq or spatial transcriptomics data

Quick Start

  1. Identify the workflow from the model/workflow tables in the skill
  2. Open the corresponding reference file in references/ for detailed steps
  3. Run the modular scripts, e.g., python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch && python scripts/train_model.py prepared.h5ad results/ --model scvi

Best Practices

  • Select the correct model for your data type (scVI for scRNA-seq, PeakVI for ATAC, totalVI for RNA+protein, etc.)
  • Consult references/environment_setup.md for installation and GPU compatibility to avoid runtime issues
  • Use the provided workflow reference files instead of rewriting pipelines
  • Prepare data with the Data Preparation workflow before training
  • Validate results with appropriate QC plots and downstream analyses (deconvolution, velocity, etc.)

Example Use Cases

  • Batch-corrected integration of scRNA-seq datasets across studies using scVI/scANVI
  • Label transfer from a reference atlas to a query dataset with scANVI or scArches
  • CITE-seq multi-modal analysis (RNA+protein) using totalVI
  • Joint RNA+ATAC analysis in multiome data with MultiVI
  • Spatial transcriptomics deconvolution with DestVI to infer cell-type composition per spot
