
entropy

npx machina-cli add skill parcadei/Continuous-Claude-v3/entropy --openclaw

Entropy

When to Use

Use this skill when working on entropy problems in information theory.

Decision Tree

  1. Shannon Entropy

    • H(X) = -sum p(x) log2 p(x)
    • Maximum for uniform distribution: H_max = log2(n)
    • Minimum = 0 for deterministic (one outcome certain)
    • scipy.stats.entropy(p, base=2) for discrete
  2. Entropy Properties

    • Non-negative: H(X) >= 0
    • Concave in p
    • Chain rule: H(X,Y) = H(X) + H(Y|X)
    • z3_solve.py prove "entropy_nonnegative"
  3. Joint and Conditional Entropy

    • H(X,Y) = -sum sum p(x,y) log2 p(x,y)
    • H(Y|X) = H(X,Y) - H(X)
    • H(Y|X) <= H(Y) with equality iff independent
  4. Differential Entropy (Continuous)

    • h(X) = -integral f(x) log f(x) dx
    • Can be negative!
    • Gaussian: h(X) = 0.5 * log2(2*pi*e*sigma^2)
    • sympy_compute.py integrate "-f(x)*log(f(x))" --var x
  5. Maximum Entropy Principle

    • Given constraints, max entropy distribution is least biased
    • Uniform for no constraints
    • Exponential for E[X] = mu constraint
    • Gaussian for E[X], Var[X] constraints
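
The chain rule and the conditioning inequality above can be spot-checked numerically. A minimal sketch using scipy.stats.entropy; the 2x2 joint distribution here is an arbitrary illustrative example, not from the source:

```python
import numpy as np
from scipy.stats import entropy

# Example joint distribution p(x, y) over a 2x2 alphabet (rows = x, cols = y).
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])

H_XY = entropy(p_xy.ravel(), base=2)      # joint entropy H(X,Y)
H_X = entropy(p_xy.sum(axis=1), base=2)   # marginal H(X)
H_Y = entropy(p_xy.sum(axis=0), base=2)   # marginal H(Y)
H_Y_given_X = H_XY - H_X                  # chain rule: H(Y|X) = H(X,Y) - H(X)

print(f"H(X,Y)  = {H_XY:.4f} bits")
print(f"H(Y|X)  = {H_Y_given_X:.4f} bits")
assert H_Y_given_X <= H_Y + 1e-12         # conditioning never increases entropy
```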

Tool Commands

Scipy_Entropy

uv run python -c "from scipy.stats import entropy; p = [0.25, 0.25, 0.25, 0.25]; H = entropy(p, base=2); print('Entropy:', H, 'bits')"

Scipy_Kl_Div

uv run python -c "from scipy.stats import entropy; p = [0.5, 0.5]; q = [0.9, 0.1]; kl = entropy(p, q, base=2); print('KL divergence:', kl, 'bits')"

Sympy_Entropy

uv run python -m runtime.harness scripts/sympy_compute.py simplify "-p*log(p, 2) - (1-p)*log(1-p, 2)"
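
As a follow-up, the same binary entropy expression can be checked symbolically. A sketch assuming SymPy is installed and run directly, outside the runtime harness:

```python
import sympy as sp

p = sp.symbols('p', positive=True)
# Binary entropy function H(p) = -p log2 p - (1-p) log2 (1-p)
H = -p * sp.log(p, 2) - (1 - p) * sp.log(1 - p, 2)

# dH/dp = 0 at the maximum; solving recovers p = 1/2.
crit = sp.solve(sp.diff(H, p), p)
print(crit)
# The maximum value is 1 bit, consistent with H_max = log2(2).
print(sp.simplify(H.subs(p, sp.Rational(1, 2))))
```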

Key Techniques

From indexed textbooks:

  • [Elements of Information Theory] Cover, T. M. & Thomas, J. A., Elements of Information Theory, 2nd ed., Wiley-Interscience, 2012. ISBN 9780470303153. "What is the channel capacity of this channel? This is the multiple-access channel solved by Liao and Ahlswede."

Cognitive Tools Reference

See .claude/skills/math-mode/SKILL.md for full tool documentation.

Source

git clone https://github.com/parcadei/Continuous-Claude-v3

Skill file: .claude/skills/math/information-theory/entropy/SKILL.md

Overview

This skill teaches how to quantify uncertainty using Shannon entropy, joint and conditional entropy, and differential entropy. It covers the maximum entropy principle and practical formulas and tool commands to verify results in discrete and continuous settings.

How This Skill Works

Identify whether the variable is discrete or continuous and apply the appropriate entropy formula. For discrete: H(X) = -sum p(x) log2 p(x); for continuous: h(X) = -∫ f(x) log f(x) dx. Use the chain rule H(X,Y) = H(X) + H(Y|X) and H(Y|X) = H(X,Y) - H(X); note H(Y|X) ≤ H(Y). Differential entropy can be negative, unlike discrete entropy. Apply the maximum entropy principle to select the least biased distribution under given constraints, e.g., uniform with no constraints, exponential with a mean constraint, Gaussian with mean and variance constraints. Tool guidelines include using scipy.stats.entropy for discrete cases and symbolic or numerical tools for continuous cases.
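
The Gaussian differential-entropy formula quoted above can be verified symbolically. A sketch assuming SymPy is available, run directly rather than through sympy_compute.py:

```python
import math
import sympy as sp

x, sigma = sp.symbols('x sigma', positive=True)
# Density of N(0, sigma^2)
f = sp.exp(-x**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))

# Expand log(f) into a polynomial plus constants so the Gaussian moments
# integrate cleanly; dividing by log(2) converts from nats to bits.
log_f = sp.expand_log(sp.log(f), force=True)
h = sp.simplify(-sp.integrate(f * log_f, (x, -sp.oo, sp.oo)) / sp.log(2))

# Compare against the closed form 0.5 * log2(2*pi*e*sigma^2) at sigma = 1.
closed_form = 0.5 * math.log2(2 * math.pi * math.e)
print(float(h.subs(sigma, 1)), closed_form)
```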

When to Use It

  • Evaluating H(X) for a discrete outcome distribution.
  • Computing H(X,Y) and H(Y|X) for a joint distribution.
  • Checking entropy properties like non-negativity and concavity.
  • Working with differential entropy for continuous variables.
  • Applying the Maximum Entropy Principle under given constraints.

Quick Start

  1. Step 1: Identify whether the variable is discrete or continuous and select the appropriate entropy formula.
  2. Step 2: Compute H(X) (or H(X,Y), H(Y|X)) or h(X) using the given distribution or density.
  3. Step 3: Verify results with recommended tools (SciPy, SymPy) and check properties like non-negativity and chain rule.

Best Practices

  • Use base-2 (bits) when computing Shannon entropy for discrete distributions.
  • Compute H(X) with -sum p(x) log2 p(x) and H(X,Y) with -sum_x sum_y p(x,y) log2 p(x,y).
  • Apply chain rule correctly: H(X,Y) = H(X) + H(Y|X). Remember H(Y|X) ≤ H(Y).
  • Be mindful: differential entropy h(X) can be negative; interpret with context.
  • Leverage the Maximum Entropy Principle: uniform without constraints; exponential with a mean; Gaussian with mean and variance constraints.
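
The unconstrained case of the Maximum Entropy Principle can be spot-checked numerically. A sketch assuming NumPy and SciPy are available; the Dirichlet sampling is illustrative, not part of the skill:

```python
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
n = 4
H_uniform = entropy(np.full(n, 1 / n), base=2)  # log2(4) = 2 bits

# Random distributions on n outcomes never exceed the uniform entropy.
for _ in range(1000):
    p = rng.dirichlet(np.ones(n))
    assert entropy(p, base=2) <= H_uniform + 1e-9

print("uniform maximizes entropy:", H_uniform, "bits")
```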

Example Use Cases

  • Entropy of a fair four-sided die: H = log2(4) = 2 bits.
  • Two independent fair coins: H(X,Y) = H(X) + H(Y) = 2 bits.
  • Conditional entropy example: H(Y|X) ≤ H(Y), so observing X can only reduce, never increase, the remaining uncertainty about Y.
  • Differential entropy for a Gaussian X with variance σ^2: h(X) = 0.5 log2(2πeσ^2).
  • Maximum entropy under constraints: uniform for no constraints; exponential for E[X] = μ; Gaussian for E[X], Var[X].
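
The first two use cases can be verified directly with scipy.stats.entropy:

```python
from scipy.stats import entropy

# Fair four-sided die: H = log2(4) = 2 bits.
die = [0.25] * 4
print("die:", entropy(die, base=2), "bits")

# Two independent fair coins: the joint distribution is uniform over the
# four outcomes HH, HT, TH, TT, so H(X,Y) = H(X) + H(Y) = 1 + 1 = 2 bits.
coins = [0.25] * 4
print("coins:", entropy(coins, base=2), "bits")
```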
