Market Analysis using Kalman Pairs

From Noisy Songs to Stock Profits: How I Taught My Computer to “Hear” the Market

Turning signal processing math into a market-neutral trading bot using Cointegration, Kalman Filters, and Mean Reversion

Introduction

Ever tried listening to your favorite song while someone is vacuuming right next to you? You can still catch the melody. Your brain filters out the noise and focuses on what matters. That idea became the foundation of this project. In my previous post on MFCCs, I explored how computers extract human voice signals from noisy audio. But while looking at a stock price chart one night, something clicked.

The "Wait, What?" Moment

Price charts look a lot like sound waves.

Random spikes. Hidden patterns. Underlying rhythm buried in chaos.

So, what if we treated financial data like audio signals?

Filter the noise, extract the structure and then trade the pattern.

This post walks through the full system I built, from theory to Python implementation.

Signal Processing: Audio vs Finance

At their core, both domains deal with signals over time.

AUDIO SIGNAL PROCESSING        FINANCIAL SIGNAL PROCESSING
-----------------------        ---------------------------
Raw sound waves                Raw price data
Noise filtering                Market noise filtering
Feature extraction             Spread extraction
Voice detection                Trading signals

Same mathematics, different application.

Step 1 — Cointegration: The Drunk Guy and His Dog

Imagine:

A drunk guy stumbling home
His dog running around
Both connected by a leash

Individually → random movement
Together → constrained distance

That leash is cointegration.

In markets, some assets move together long-term because of economic linkage.

Examples:

Gold ↔ Silver
Coke ↔ Pepsi
Bitcoin ↔ Ethereum

We don't bet on where Gold is going.
We trade the distance between the two assets.

When the leash stretches too far → we bet on the snap back.
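
To make the leash concrete, here is a minimal sketch on synthetic data. The series, the seed, and the helper name `leash_check` are illustrative assumptions, not part of the project's code. It fits a static hedge ratio by least squares and then checks how persistent the leftover spread is. A formal test such as Engle-Granger would compare the result against critical values; this only shows the intuition.

```python
import numpy as np

def leash_check(x, y):
    """Fit y ~ beta * x + alpha and measure how persistent the residual is.

    Returns (beta, alpha, phi), where phi is the lag-1 autoregression
    coefficient of the residual. phi near 1 looks like a random walk (no leash);
    phi well below 1 means the residual keeps getting pulled back.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, alpha = np.polyfit(x, y, 1)               # static hedge ratio and intercept
    resid = y - (beta * x + alpha)                  # the "leash length" over time
    phi = np.polyfit(resid[:-1], resid[1:], 1)[0]   # slope of resid[t] on resid[t-1]
    return beta, alpha, phi

rng = np.random.default_rng(0)
n = 2000
drunk = np.cumsum(rng.normal(size=n))               # random walk
dog = 0.8 * drunk + rng.normal(scale=0.5, size=n)   # tied to the drunk by a leash
stranger = np.cumsum(rng.normal(size=n))            # independent random walk

beta, alpha, phi_leash = leash_check(drunk, dog)
_, _, phi_free = leash_check(drunk, stranger)
print(f"beta={beta:.2f}  phi with leash={phi_leash:.2f}  phi without leash={phi_free:.2f}")
```

The leashed pair should show a residual that forgets its past quickly, while the unrelated pair leaves a residual that wanders like a random walk of its own.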

Step 2 — Mean Reversion: The Rubber Band Model

Normal stock prices behave like balloons — they drift.

But spreads between cointegrated assets behave like rubber bands.

They revert to a long-term average.

This behavior is modeled by the Ornstein-Uhlenbeck Process.

Key Parameters

DIAL        NAME                   WHAT IT MEANS                         REAL-WORLD ANALOGY
---------   --------------------   -----------------------------------   ---------------------------------------------------
θ (theta)   Mean Reversion Speed   How fast the rubber band snaps back   Tight rubber band = fast snap. Loose = slow
μ (mu)      Long-Term Mean         Where the ball "wants" to be          The resting point. Home base
σ (sigma)   Volatility             How wildly it bounces around          Calm day = small bounces. Crazy day = wild shaking

High θ? The spread snaps back quickly → more trading opportunities → more potential profit.
High σ? Lots of bouncing → noisier signal → harder to trade.

The system estimates these dials in real time and asks one question: is this rubber band stretched enough to bet on?
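
One common way to read the three dials off a spread is an AR(1) regression: regress each value on the previous one and convert the slope, intercept, and residual size back into θ, μ, and σ. The sketch below does that on a simulated spread. The function name `estimate_ou`, the true parameter values, and the seed are assumptions for illustration, and it assumes evenly spaced observations.

```python
import numpy as np

def estimate_ou(spread, dt=1.0):
    """Estimate (theta, mu, sigma) of an OU process via an AR(1) regression."""
    s = np.asarray(spread, dtype=float)
    b, a = np.polyfit(s[:-1], s[1:], 1)      # s[t+1] ≈ a + b * s[t]
    theta = -np.log(b) / dt                  # needs 0 < b < 1 (mean-reverting)
    mu = a / (1.0 - b)
    resid = s[1:] - (a + b * s[:-1])
    sigma = resid.std(ddof=2) * np.sqrt(2.0 * theta / (1.0 - b**2))
    return theta, mu, sigma

# Simulate a spread with known dials using the exact OU discretization.
rng = np.random.default_rng(1)
theta_true, mu_true, sigma_true, dt, n = 0.1, 2.0, 0.3, 1.0, 20000
b_true = np.exp(-theta_true * dt)
step_sd = sigma_true * np.sqrt((1.0 - b_true**2) / (2.0 * theta_true))
s = np.empty(n)
s[0] = mu_true
for t in range(1, n):
    s[t] = mu_true + b_true * (s[t - 1] - mu_true) + step_sd * rng.normal()

theta_hat, mu_hat, sigma_hat = estimate_ou(s, dt)
print(f"theta={theta_hat:.3f}  mu={mu_hat:.3f}  sigma={sigma_hat:.3f}")
```

On real spreads the estimates are much noisier than on a long simulated series, so treat them as rough dial readings rather than exact values.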

Step 3 — Z-Score: The Stretch Meter

We need a way to measure how “abnormal” the spread is.

That’s where Z-Score comes in.
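
The Z-Score is the spread's distance from its recent average, measured in standard deviations: z = (spread − mean) / std. Here is a minimal sketch using a trailing window. The function name `rolling_zscore`, the window length, and the toy numbers are assumptions for illustration.

```python
import numpy as np

def rolling_zscore(spread, window=30):
    """z[t] = (spread[t] - mean) / std over the trailing `window` points, including t."""
    s = np.asarray(spread, dtype=float)
    z = np.full(s.shape, np.nan)             # undefined until a full window exists
    for t in range(window - 1, len(s)):
        w = s[t - window + 1 : t + 1]
        sd = w.std(ddof=1)
        if sd > 0:                           # skip flat windows to avoid dividing by zero
            z[t] = (s[t] - w.mean()) / sd
    return z

# A quiet spread that suddenly jumps on the last bar.
spread = np.array([0.0, 0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 0.9])
z = rolling_zscore(spread, window=8)
print(z[-1])
```

The window length is a trade-off: a short window reacts quickly but is jumpy, and a long window is stable but slow to notice that "normal" has moved.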

Interpretation:

   THE Z-SCORE THERMOMETER

    +3.0  ┃  🔴 EXTREMELY EXPENSIVE (rare!)
    +2.0  ┃  🔴 Very expensive
    +1.5  ┃  🟡 ──── SELL SIGNAL ──── ←── Robot screams "SELL!"
    +1.0  ┃  
     0.0  ┃  🟢 Normal. Chill. Do nothing.
    -1.0  ┃  
    -1.5  ┃  🟡 ──── BUY SIGNAL ──── ←── Robot screams "BUY!"
    -2.0  ┃  🔵 Very cheap
    -3.0  ┃  🔵 EXTREMELY CHEAP (rare!)

Z-Score near 0: The leash is relaxed. Nothing interesting. Go get coffee.
Z-Score hits +1.5: Gold is WAY too expensive relative to Silver. SELL GOLD, BUY SILVER.
Z-Score hits -1.5: Gold is WAY too cheap relative to Silver. BUY GOLD, SELL SILVER.
Z-Score comes back to 0: Close the trade. Pocket the difference.
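
The rules above amount to a small state machine: stay flat until the Z-Score crosses ±1.5, hold until it returns to 0, then go flat again. A minimal sketch follows; the function name `generate_positions` and its parameters are assumptions, and the thresholds are the ones from the thermometer.

```python
import numpy as np

def generate_positions(z, entry=1.5, exit_level=0.0):
    """Return the spread position per bar: -1 short, +1 long, 0 flat."""
    positions = np.zeros(len(z), dtype=int)
    current = 0
    for t, zt in enumerate(z):
        if np.isnan(zt):
            pass                              # no reading yet: keep the current position
        elif current == 0:
            if zt >= entry:
                current = -1                  # spread too high: short it
            elif zt <= -entry:
                current = 1                   # spread too low: go long
        elif current == -1 and zt <= exit_level:
            current = 0                       # short has reverted: close
        elif current == 1 and zt >= exit_level:
            current = 0                       # long has reverted: close
        positions[t] = current
    return positions

z = np.array([0.2, 1.6, 1.0, 0.3, -0.1, -1.7, -0.8, 0.1])
print(generate_positions(z))
```

Holding the position between entry and exit matters: without the memory of `current`, a Z-Score of +1.0 on the way back down would look like "do nothing" and the trade would be closed too early.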

Step 4: The Robot's Brain That Learns (Kalman Filter)

Here's where most trading bots mess up.

Old-school bots calculate the relationship between Gold and Silver ONCE and assume it never changes. Like measuring your shoe size at age 5 and buying the same shoes forever.

Spoiler: your feet grow. Markets change.

Sometimes Gold and Silver move in lockstep. Sometimes a financial crisis hits and they stop caring about each other entirely. The "leash length" (called beta, or β) is constantly shifting.

The Kalman Filter tracks that shift. At every time step, it:

Predicts the current relationship
Compares the prediction with new data
Measures the error
Updates its belief
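
Those four moves fit in a few lines for the simplest case, a single β with no intercept. This is a stripped-down sketch, not the full filter used in this project: the function name `kalman_step`, the noise settings, and the toy data are assumptions.

```python
def kalman_step(beta_prev, var_prev, x_t, y_t, process_var=1e-4, obs_var=1e-3):
    """One predict/update cycle for y = beta * x + noise with a random-walk beta."""
    # 1. Predict: beta is assumed to stay put, but we become a little less sure.
    beta_pred = beta_prev
    var_pred = var_prev + process_var
    # 2. Compare: what y would be if the prediction were right.
    y_pred = beta_pred * x_t
    # 3. Measure the error.
    error = y_t - y_pred
    # 4. Update: move beta toward the data, weighted by the gain.
    innovation_var = x_t * var_pred * x_t + obs_var
    gain = var_pred * x_t / innovation_var
    beta_new = beta_pred + gain * error
    var_new = (1.0 - gain * x_t) * var_pred
    return beta_new, var_new

beta, var = 0.0, 1000.0                       # start with a vague belief
for x_t, y_t in [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]:
    beta, var = kalman_step(beta, var, x_t, y_t)
    print(f"beta={beta:.3f}  variance={var:.6f}")
```

With a vague starting belief the first observation moves β almost all the way to the data, and later observations only nudge it.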

Step 5 — Full Trading Pipeline

Download prices
      ↓
Log transform
      ↓
Kalman Filter → dynamic β
      ↓
Compute spread
      ↓
Compute Z-Score
      ↓
Generate signals
      ↓
Execute trades

AUTO-TUNE                        KALMAN FILTER
---------                        -------------
Listens to the singer's voice    Listens to market prices
Detects if the pitch is off      Detects if β has changed
Adjusts pitch in real-time       Adjusts β in real-time
Singer sounds perfect            Robot trades perfectly

Kalman Filter

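import numpy as np

def run_kalman_filter(x, y):
    """Track a time-varying hedge ratio (beta) and intercept (alpha).

    x, y: log-price series (pandas Series or arrays) of equal length.
    Returns theta with shape (2, n): row 0 is beta, row 1 is alpha.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    delta = 0.0001
    Vw = delta / (1 - delta) * np.eye(2)   # process noise: how fast beta and alpha may drift
    Ve = 0.001                             # measurement noise variance

    theta = np.zeros((2, n))               # state estimates [beta, alpha] over time
    P = np.zeros((2, 2, n))                # state covariance over time
    theta[:, 0] = [0, 0]
    P[:, :, 0] = np.eye(2) * 1000          # large initial uncertainty

    for t in range(1, n):
        # Predict: the state follows a random walk, so uncertainty grows by Vw
        theta_pred = theta[:, t-1]
        R = P[:, :, t-1] + Vw

        # Compare the predicted y with the observed y
        H = np.array([x[t], 1.0])
        y_pred = H @ theta_pred
        error = y[t] - y_pred

        # Update: weight the error by the Kalman gain
        Q = H @ R @ H + Ve                 # variance of the prediction error
        K = R @ H / Q                      # Kalman gain
        theta[:, t] = theta_pred + K * error
        P[:, :, t] = R - np.outer(K * Q, K)

    return theta

# x and y are the two assets' log-price series from the earlier pipeline steps
state = run_kalman_filter(x, y)
dynamic_beta = state[0, :]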
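
The next pipeline step, "Compute spread", turns the filter output into the series we actually trade. A minimal sketch, assuming `beta` and `alpha` are rows 0 and 1 of the array returned by `run_kalman_filter`. The function name `compute_spread`, the `lag` parameter, and the toy numbers are assumptions. Using the previous bar's estimates (lag=1) is one way to avoid scoring each bar with parameters that have already seen that bar's price.

```python
import numpy as np

def compute_spread(x, y, beta, alpha, lag=1):
    """Spread = observed y minus the y implied by the (lagged) beta and alpha."""
    x, y, beta, alpha = (np.asarray(a, dtype=float) for a in (x, y, beta, alpha))
    n = len(y)
    spread = np.full(n, np.nan)               # first `lag` bars have no estimate yet
    spread[lag:] = y[lag:] - (beta[: n - lag] * x[lag:] + alpha[: n - lag])
    return spread

# Hand-made numbers standing in for log prices and filter output.
x = [1.0, 2.0, 3.0]
y = [2.1, 4.0, 6.3]
beta = [2.0, 2.0, 2.0]
alpha = [0.0, 0.0, 0.0]
spread = compute_spread(x, y, beta, alpha)
print(spread)
```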

Typical observations:

Assets move together → consistent with cointegration (a statistical test should confirm it)
β changes over time → Kalman adapts
Z-Score oscillates → mean reversion visible
Trades cluster at extremes
Equity curve trends upward in stable regimes

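This is a market-neutral strategy: each trade is long one asset and short the other, so the result depends on the spread rather than on the market's overall direction.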
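
For a rough look at the equity curve, one simple approach is to multiply the position held over each bar by the change in the spread and add it up. The sketch below makes strong simplifying assumptions: no transaction costs, one unit of spread per trade, and a spread whose changes can be traded directly (with a moving β that is only approximately true). The function name `equity_curve` and the toy numbers are illustrative.

```python
import numpy as np

def equity_curve(positions, spread):
    """Cumulative P&L of holding `positions` units of the spread, ignoring costs."""
    positions = np.asarray(positions, dtype=float)
    spread = np.asarray(spread, dtype=float)
    change = np.diff(spread, prepend=spread[0])       # change[t] = spread[t] - spread[t-1]
    held = np.concatenate(([0.0], positions[:-1]))    # position chosen at t-1 earns change at t
    return np.cumsum(held * change)

spread = np.array([0.0, 2.0, 1.0, 0.0, -2.0, -1.0, 0.0])
positions = np.array([0, -1, -1, 0, 1, 1, 0])         # short the stretch up, long the stretch down
curve = equity_curve(positions, spread)
print(curve)
```

Shifting the position by one bar is the important detail: a signal computed from today's price can only earn tomorrow's move.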

Mathematical Foundation

Ornstein-Uhlenbeck SDE: a continuous-time stochastic process that models mean-reverting behavior, where a variable tends to drift toward a long-term mean over time.

dS = θ(μ − S)dt + σdW

Where:

S → Spread
θ → Reversion speed
μ → Mean
σ → Volatility
dW → Brownian noise

Rubber band force vs random shocks.
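
Two standard consequences of this equation are handy in practice. They use the same symbols as above, plus τ for a time horizon and Δt for the sampling interval, which are notation introduced here. The first line says the expected gap to the mean shrinks exponentially, the second gives the half-life of that gap, and the third is the exact discrete-time form used when working with sampled data.

```latex
\begin{aligned}
\mathbb{E}[S_{t+\tau} \mid S_t] &= \mu + (S_t - \mu)\, e^{-\theta \tau} \\
t_{1/2} &= \frac{\ln 2}{\theta} \\
S_{t+\Delta t} &= \mu + (S_t - \mu)\, e^{-\theta \Delta t} + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}\!\left(0,\ \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta \Delta t}\right)\right)
\end{aligned}
```

For example, θ = 0.1 per day gives a half-life of about 6.9 days: a stretched spread is expected to close half its gap in roughly a week.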

State-Space Model: a mathematical framework for systems with unobservable (latent) factors. It pairs a state equation (how the hidden factors evolve, e.g., market sentiment or volatility) with an observation equation (how those states link to actual data such as prices). Here, the hidden state is the pair (β, α).

OBSERVATION EQUATION:
    y(t) = β(t) × x(t) + α(t) + ε(t)

    Translation: "Silver price = β × Gold price + intercept + noise"

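STATE TRANSITION:
    β(t) = β(t-1) + η(t)

    Translation: "Today's β is yesterday's β plus some small change"
    (In the Kalman Filter code earlier in this post, the intercept α follows the same kind of random walk.)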

KALMAN GAIN (K):
    K = (Predicted Uncertainty) / (Predicted Uncertainty + Measurement Noise)

    Translation: 
    - If we're very uncertain → K is large → Trust the new data more
    - If we're very certain  → K is small → Stick with our prediction

Kalman Gain balances prediction vs measurement trust.
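
For reference, this is how the one-line gain formula maps onto the variables in the `run_kalman_filter` code earlier in this post. The subscripts and hats are notation added here. Note that θ̂ in this block is the filter's state vector [β, α], not the OU reversion speed θ.

```latex
\begin{aligned}
R_t &= P_{t-1} + V_w \\
e_t &= y_t - H_t\,\hat{\theta}_{t-1}, \qquad H_t = \begin{bmatrix} x_t & 1 \end{bmatrix} \\
Q_t &= H_t R_t H_t^{\top} + V_e \\
K_t &= \frac{R_t H_t^{\top}}{Q_t} \\
\hat{\theta}_t &= \hat{\theta}_{t-1} + K_t\, e_t \\
P_t &= R_t - K_t Q_t K_t^{\top}
\end{aligned}
```

In the scalar case with H = 1 this collapses to K = R / (R + V_e), which is exactly "predicted uncertainty over predicted uncertainty plus measurement noise".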

✨ The Takeaway

"The market isn't chaotic. It's a song with too much static. My job wasn't to predict the next note; it was to hear the rhythm underneath."

Next time it rains, listen to the drops hitting your window. There's a rhythm in there: fast, slow, fast again. That pattern? That's a mean-reverting process. You've been hearing them your whole life.

You're already a quant. You just didn't know it yet.
