lightgbm
npx machina-cli add skill G1Joshi/Agent-Skills/lightgbm --openclaw
LightGBM is Microsoft's gradient boosting library. It is often faster and more memory-efficient than XGBoost thanks to histogram-based binning and leaf-wise tree growth.
When to Use
- Huge Datasets: Optimized for efficiency.
- Ranking: LGBMRanker is excellent for search/recommendation systems.
Core Concepts
Leaf-wise Growth
Grows the tree by splitting the leaf with the largest loss reduction, producing deeper, unbalanced trees, unlike level-wise growth, which expands every node at a depth and keeps trees balanced.
Histogram-based
Buckets continuous values into discrete bins for speed.
Best Practices (2025)
Do:
- Tune num_leaves: the most important parameter for controlling model complexity.
- Use Categorical Features: pass the indexes of categorical columns directly instead of one-hot encoding.
Don't:
- Don't overfit: leaf-wise growth overfits easily on small data. Limit max_depth.
References
Source
https://github.com/G1Joshi/Agent-Skills/blob/main/skills/ai-ml/lightgbm/SKILL.md
Overview
LightGBM is Microsoft's gradient boosting library designed for speed and memory efficiency. It uses leaf-wise tree growth and histogram-based binning to train on large datasets quickly, and it supports ranking via LGBMRanker.
How This Skill Works
LightGBM grows trees using leaf-wise growth, selecting the leaf with the maximum loss delta to split, which can yield deeper, more accurate trees. It also bins continuous features into discrete histogram bins to speed up computations and reduce memory usage.
When to Use It
- Working with huge datasets that require training efficiency
- Building ranking models (LGBMRanker) for search or recommendation
- Needing faster training and lower memory usage than other boosting frameworks
- Wanting histogram-based binning for faster split finding
- Using categorical features natively by passing their column indexes directly
Quick Start
- Step 1: Install LightGBM, load your dataset, identify the label and feature columns, and note categorical feature indices
- Step 2: Initialize a model (e.g., LGBMClassifier or LGBMRegressor) with key params such as num_leaves and max_bin, and specify categorical feature indices
- Step 3: Train the model on training data, apply validation/early stopping if available, and evaluate with appropriate metrics
Best Practices
- Tune num_leaves to control model complexity
- Pass indexes of categorical features directly to improve handling
- Don't overfit: leaf-wise growth can overfit on small data; limit max_depth
- Leverage histogram-based training to speed up computations
- For ranking tasks, consider using LGBMRanker to optimize order-based metrics
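When tuning num_leaves alongside a max_depth cap, a useful heuristic from LightGBM's parameter-tuning guidance is that a depth-capped tree can hold at most 2**max_depth leaves, so any larger num_leaves is wasted. A small sketch with illustrative values:

```python
# Illustrative values, not tuned for any real dataset.
max_depth = 6
num_leaves = min(63, 2 ** max_depth - 1)  # stay under the depth-implied cap

params = {
    "objective": "binary",
    "max_depth": max_depth,
    "num_leaves": num_leaves,
}
assert num_leaves <= 2 ** max_depth
```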
Example Use Cases
- Training large-scale ranking models for search results or recommendations using LGBMRanker
- Efficiently training gradient boosting on web-scale datasets with lower memory usage
- Replacing heavier boosting frameworks in existing pipelines for faster training
- Deploying compact LightGBM models in production to keep prediction latency low
- Passing categorical feature indexes directly to avoid manual encoding steps