source-coding
npx machina-cli add skill parcadei/Continuous-Claude-v3/source-coding --openclaw
Source Coding
When to Use
Use this skill when working on source-coding problems in information theory.
Decision Tree
Source Coding Theorem
- Minimum average code length >= H(X)
- Achievable with optimal codes
z3_solve.py prove "shannon_bound"
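As a quick sanity check of the bound, the sketch below computes H(X) for an assumed distribution and verifies L >= H(X) against candidate codeword lengths; both the probabilities and the lengths are illustrative assumptions, not taken from a real problem.
import math

# Illustrative assumptions: a source distribution and candidate codeword lengths.
probs = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]

entropy = -sum(p * math.log2(p) for p in probs)       # H(X) in bits
avg_len = sum(p * l for p, l in zip(probs, lengths))  # expected code length L

print(f"H(X) = {entropy:.3f} bits, L = {avg_len:.3f} bits")
assert avg_len >= entropy  # source coding theorem: L >= H(X)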
Huffman Coding
- Optimal prefix-free code for known distribution
- Build tree: combine two least probable symbols
- Average length: H(X) <= L < H(X) + 1
sympy_compute.py simplify "expected_code_length"
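A minimal sketch of the tree construction using only the standard library; the symbol table is an assumed example, and ties may resolve differently, so equally optimal codes can differ bit-for-bit.
import heapq
from itertools import count

def huffman(probs):
    """Return a {symbol: codeword} dict for a dict of symbol probabilities."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}   # left branch gets 0
        merged.update({s: "1" + w for s, w in c2.items()})  # right branch gets 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)  # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'} up to bit flips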
Kraft Inequality
- For prefix-free code: sum 2^{-l_i} <= 1
- Necessary and sufficient for the existence of a prefix-free code with the given lengths l_i
z3_solve.py prove "kraft_inequality"
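A one-line check of the inequality for assumed candidate lengths; exact arithmetic via Fraction avoids any floating-point doubt for large l_i.
from fractions import Fraction

lengths = [1, 2, 3, 3]  # assumed candidate codeword lengths
kraft_sum = sum(Fraction(1, 2 ** l) for l in lengths)
print(kraft_sum, kraft_sum <= 1)  # 1 True: a prefix-free code with these lengths exists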
Arithmetic Coding
- Approaches entropy for any distribution
- Encodes entire message as interval [0,1)
- Practical for adaptive/unknown distributions
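The sketch below shows only the interval-narrowing step for a fixed model; the cumulative ranges are assumptions, and a production coder would renormalize with integer arithmetic to avoid floating-point underflow on long messages.
# Each symbol owns a sub-interval of [0,1) proportional to its probability.
model = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

def encode_interval(message, model):
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        sym_low, sym_high = model[symbol]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high  # any number in [low, high) identifies the message

print(encode_interval("aab", model))  # (0.125, 0.1875)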
Rate-Distortion Theory
- Lossy compression: trade rate for distortion
- R(D) = min_{p(x_hat|x): E[d(X,X_hat)]<=D} I(X;X_hat)
- Minimum rate to achieve distortion D
sympy_compute.py minimize "I(X;X_hat)" --constraint "E[d] <= D"
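For a concrete instance, a Bernoulli(p) source under Hamming distortion has the standard closed form R(D) = H_b(p) - H_b(D) for 0 <= D <= min(p, 1-p), and 0 beyond that; the sketch below evaluates it for assumed values of p and D.
import math

def h_b(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def rate_distortion_bernoulli(p, D):
    # R(D) = H_b(p) - H_b(D) on 0 <= D <= min(p, 1-p); zero rate suffices beyond.
    if D >= min(p, 1 - p):
        return 0.0
    return h_b(p) - h_b(D)

print(rate_distortion_bernoulli(0.5, 0.1))  # ~0.531 bits per symbol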
Tool Commands
Scipy_Huffman
uv run python -c "print('Huffman codes for a=0.5, b=0.25, c=0.125, d=0.125: a=0, b=10, c=110, d=111')"
Sympy_Kraft
uv run python -m runtime.harness scripts/sympy_compute.py simplify "2**(-l1) + 2**(-l2) + 2**(-l3) + 2**(-l4)"
Z3_Shannon_Bound
uv run python -m runtime.harness scripts/z3_solve.py prove "expected_length >= entropy"
Key Techniques
From indexed textbooks:
- [Elements of Information Theory] Cover, T. M. & Thomas, J. A., Elements of Information Theory, 2nd ed., New York: Wiley-Interscience, 2012. The Shannon–Fano–Elias coding procedure can also be applied to sequences of random variables. The key idea is to use the cumulative distribution function of the sequence, expressed to the appropriate accuracy, as a code for the sequence.
- [Information theory, inference, and learning algorithms] A binary data sequence of length 10 000 transmitted over a binary symmetric channel with noise level f = 0.1. The physical solution is to improve the physical characteristics of the communication channel to reduce its error probability.
- [Information theory, inference, and learning algorithms] Whereas physical solutions give incremental channel improvements only at an ever-increasing cost, system solutions can turn noisy channels into reliable communication channels with the only cost being a computational requirement at the encoder and decoder. Coding theory is concerned with the creation of practical encoding and decoding systems. We now consider examples of encoding and decoding systems. What is the simplest way to add useful redundancy to a transmission?
Cognitive Tools Reference
See .claude/skills/math-mode/SKILL.md for full tool documentation.
Source
View on GitHub: https://github.com/parcadei/Continuous-Claude-v3/blob/main/.claude/skills/math/information-theory/source-coding/SKILL.md
Overview
This skill consolidates core source-coding approaches in information theory, including the Source Coding Theorem, Huffman coding, Kraft inequality, arithmetic coding, and rate-distortion theory. It guides when to apply each method and how to verify results with practical tool commands.
How This Skill Works
You learn the theoretical bounds and optimal coding strategies, then apply practical steps to verify and implement them. The framework covers choosing between prefix-free codes, interval coding for unknown distributions, and distortion-aware schemes, with guidance on validation using specific tool commands.
When to Use It
- When you need to apply the Source Coding Theorem to bound the minimum average code length.
- When you want to build an optimal prefix-free code for a known distribution using Huffman coding.
- When you must verify a proposed code satisfies prefix-free constraints via Kraft inequality.
- When using arithmetic coding for adaptive or unknown distributions.
- When exploring rate-distortion theory to balance rate and distortion in lossy compression.
Quick Start
- Step 1: Choose the coding approach based on the problem: known distribution suggests Huffman; adaptive/unknown distributions suggest arithmetic coding; for lossy tasks consider rate-distortion.
- Step 2: Run the relevant verifications and calculations with the provided tools to verify bounds or compute lengths (examples include entropy comparisons, Kraft checks, and Huffman length calculations).
- Step 3: Compare results to the theoretical limits (entropy, Kraft bound, distortion targets) and select the final code construction.
Best Practices
- Compute H(X) and compare it to the code's average length L to ensure L ≥ H(X).
- Use Huffman coding for known distributions to achieve L with H(X) ≤ L < H(X) + 1.
- Always verify Kraft inequality sum 2^{-l_i} ≤ 1 for prefix-free codes.
- Consider arithmetic coding when the source distribution is adaptive or unknown.
- Apply rate-distortion theory to minimize I(X; X_hat) under a distortion constraint D and compute R(D).
Example Use Cases
- Design Huffman codes for a known distribution such as probabilities 0.5, 0.25, 0.125, 0.125 and verify the expected code length (a worked check follows this list).
- Verify a proposed code satisfies prefix-freeness by checking Kraft inequality for lengths l1, l2, l3, l4.
- Apply arithmetic coding to an adaptive source to approach entropy without a fixed distribution.
- Perform a rate-distortion optimization to determine the minimum rate required for a target distortion D.
- Use a solver to prove the Shannon bound that the expected length cannot be smaller than the source entropy.
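Worked check for the dyadic distribution in the first use case: because each probability is a power of two, the Huffman lengths are exactly l_i = -log2 p_i and the average length meets the entropy with equality. The code below verifies this.
import math

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {s: int(-math.log2(p)) for s, p in probs.items()}  # 1, 2, 3, 3

H = -sum(p * math.log2(p) for p in probs.values())  # entropy H(X)
L = sum(probs[s] * lengths[s] for s in probs)       # expected code length

print(f"H(X) = {H} bits, L = {L} bits")                      # both equal 1.75
print("Kraft sum:", sum(2 ** -l for l in lengths.values()))  # exactly 1.0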