lightgbm
npx machina-cli add skill G1Joshi/Agent-Skills/lightgbm --openclaw
LightGBM is Microsoft's gradient boosting library. It is often faster and more memory-efficient than XGBoost thanks to histogram-based binning and leaf-wise tree growth.
When to Use
- Huge Datasets: Optimized for efficiency.
- Ranking: LGBMRanker is excellent for search/recommendation systems.
Core Concepts
Leaf-wise Growth
Grows the tree by splitting the leaf with the largest loss reduction, producing deeper, unbalanced trees, unlike level-wise growth, which expands every node at a depth and keeps trees balanced.
Histogram-based
Buckets continuous values into discrete bins for speed.
Best Practices (2025)
Do:
- Tune num_leaves: the most important parameter for controlling model complexity.
- Use Categorical Features: pass the indexes of categorical columns directly instead of one-hot encoding.
Don't:
- Don't overfit: leaf-wise growth overfits easily on small data. Limit max_depth.
References
Source
https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/lightgbm/SKILL.md
Overview
LightGBM is Microsoft's gradient boosting library designed for speed and memory efficiency. It uses leaf-wise tree growth and histogram-based binning to train on large datasets quickly, and it supports ranking via LGBMRanker.
How This Skill Works
LightGBM grows trees using leaf-wise growth, selecting the leaf with the maximum loss delta to split, which can yield deeper, more accurate trees. It also bins continuous features into discrete histogram bins to speed up computations and reduce memory usage.
When to Use It
- Working with huge datasets that require training efficiency
- Building ranking models (LGBMRanker) for search or recommendation
- Needing faster training and lower memory usage than other boosting frameworks
- Wanting histogram-based binning for faster split finding
- Using categorical features natively by passing their column indexes directly
Quick Start
- Step 1: Install LightGBM, load your dataset, identify the label and feature columns, and note categorical feature indices
- Step 2: Initialize a model (e.g., LGBMClassifier or LGBMRegressor) with key params such as num_leaves and max_bin, and specify categorical feature indices
- Step 3: Train the model on training data, apply validation/early stopping if available, and evaluate with appropriate metrics
Best Practices
- Tune num_leaves to control model complexity
- Pass indexes of categorical features directly to improve handling
- Don't overfit: leaf-wise growth can overfit on small data; limit max_depth
- Leverage histogram-based training to speed up computations
- For ranking tasks, consider using LGBMRanker to optimize order-based metrics
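When tuning num_leaves alongside a max_depth cap, a useful heuristic from LightGBM's parameter-tuning guidance is that a depth-capped tree can hold at most 2**max_depth leaves, so any larger num_leaves is wasted. A small sketch with illustrative values:

```python
# Illustrative values, not tuned for any real dataset.
max_depth = 6
num_leaves = min(63, 2 ** max_depth - 1)  # stay under the depth-implied cap

params = {
    "objective": "binary",
    "max_depth": max_depth,
    "num_leaves": num_leaves,
}
assert num_leaves <= 2 ** max_depth
```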
Example Use Cases
- Training large-scale ranking models for search results or recommendations using LGBMRanker
- Efficiently training gradient boosting on web-scale datasets with lower memory usage
- Replacing heavier boosting frameworks in existing pipelines for faster training
- Deploying compact LightGBM models in production to keep prediction latency low
- Passing categorical feature indexes directly to avoid manual encoding steps