MarketSync Docs

This documentation captures the exact formulas powering the arbitrage surfaces you see in MarketSync. Every calculation is deterministic and derived from venue data; as features evolve the derivations below will expand as well.

Quick Reference

MarketSync expresses every venue feed in the same vocabulary before any trading logic runs. The symbols below carry through to the calculations and the UI so that probabilities, odds, and averages always refer to the same constructs.

  • p — decimal probability (0-1) derived directly from the stored percentage quote. This is the canonical form for math utilities and simulations.
  • π — percentage representation (p × 100) that we persist in the database and surface in dashboards for readability.
  • decimal_odds — European-style payout multiple (1 ÷ p) used when comparing against sportsbooks and for translating expected value into currency terms.
  • avg — volume weighted average execution price after consuming order-book levels. The same number powers depth readouts and post-trade reconciliation.
  • threshold — break-even price for the Polymarket leg given the opposing venue's probability. Any average fill above this number destroys the arbitrage margin.
  • edge — surplus probability mass (= 100 − (π_away + π_home)) that converts directly into the quoted margin and expected value.

By constraining every calculation to this vocabulary we can detect regressions when new data sources deviate from the agreed semantics.
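
The conversions between these forms can be sketched in a few lines. This is an illustrative sketch only; the helper names `to_probability` and `to_decimal_odds` are ours, not MarketSync's actual utilities.

```python
def to_probability(pi: float) -> float:
    """Convert a persisted percentage quote (π) into decimal probability p."""
    return pi / 100

def to_decimal_odds(p: float) -> float:
    """European-style payout multiple (1 ÷ p) for sportsbook comparison."""
    return 1 / p

pi = 40.0                   # percentage form, as stored in the database
p = to_probability(pi)      # 0.4 — canonical form for math utilities
odds = to_decimal_odds(p)   # 2.5 — payout multiple per unit staked
```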

1. Arbitrage Opportunity Calculation

Detecting arbitrage requires three precise steps: normalising venue percentages, summing the complementary probabilities, and translating the leftover probability mass into a user-facing margin. Each step is mirrored across the backend utilities and the front-end readout to guard against drift.

We start by treating the stored percentages as probabilities. Dividing by 100 removes presentation bias and makes the numbers compatible with deterministic math.

p_away = away_odds / 100
p_home = home_odds / 100

With both legs expressed as probabilities, MarketSync adds the values to measure how close the pair gets to a fully hedged position. When the sum exceeds one, the opportunity disappears; when it falls short, an edge emerges.

combined = p_away + p_home

Finally, we expose the remaining probability mass as a percentage margin. This is the number displayed in the alerts feed and dashboard badges.

profit_margin = (1 - combined) * 100

The backend implementation in back-end/markets/model_utils.py repeats these exact operations after mapping venue-specific naming quirks. The duplication lets us diff any discrepancies between calculation domains and flag stale data before traders rely on it.
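
Taken together, the three steps collapse into a few lines. The sketch below is illustrative only; the function name `profit_margin` and its bare float signature are ours, not the backend's.

```python
def profit_margin(away_odds: float, home_odds: float) -> float:
    """Percentage margin left after hedging both legs.

    Inputs are percentage quotes; a negative result means the combined
    probabilities exceed one and no arbitrage exists.
    """
    p_away = away_odds / 100
    p_home = home_odds / 100
    combined = p_away + p_home
    return (1 - combined) * 100

# 48% away on one venue plus 49% home on the other leaves ~3% of
# probability mass unclaimed, which is the displayed margin.
margin = profit_margin(48, 49)
```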

2. Order Book Depth & Maximum Safe Buy Size

Once an arbitrage exists we need to know how much liquidity we can safely consume. MarketSync simulates walking the Polymarket ask book, adding size level by level until the resulting average price reaches the break-even threshold implied by the opposing venue.

The pseudocode below mirrors the routine that powers the depth visualisation on the market detail screen. It accumulates size, cost, and recalculates the running average after every fill.

threshold = 1 - other_leg_probability
cumulative_size = 0
cumulative_cost = 0

for each level in order_book_asks:
    price = level.price
    size  = level.size

    new_size = cumulative_size + size
    new_cost = cumulative_cost + price * size
    new_avg  = new_cost / new_size

    if new_avg <= threshold:
        cumulative_size = new_size
        cumulative_cost = new_cost
        continue

    fill_size = (threshold * cumulative_size - cumulative_cost) / (price - threshold)
    fill_size = clamp(fill_size, 0, size)
    cumulative_size += fill_size
    cumulative_cost += price * fill_size
    break

The simulation stops as soon as the average fill price would cross the threshold. At that point any additional liquidity would erase the margin generated by the opposing venue. We surface the resulting safe size, notional cost, and residual edge so traders can size their orders accordingly.
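
A runnable version of the same walk might look like the sketch below, assuming `order_book_asks` is a best-ask-first list of `(price, size)` tuples; the function name `max_safe_size` is ours.

```python
def max_safe_size(order_book_asks, other_leg_probability):
    """Walk the ask book until the running average fill price hits break-even.

    Returns (size, cost): the largest fill whose volume-weighted average
    price stays at or below the threshold implied by the opposing venue.
    """
    threshold = 1 - other_leg_probability
    cumulative_size = 0.0
    cumulative_cost = 0.0

    for price, size in order_book_asks:
        new_size = cumulative_size + size
        new_cost = cumulative_cost + price * size

        # Consume the whole level while the running average stays safe.
        if new_cost / new_size <= threshold:
            cumulative_size, cumulative_cost = new_size, new_cost
            continue

        # Partial fill: solve (cost + price·f) / (size + f) = threshold for f.
        fill = (threshold * cumulative_size - cumulative_cost) / (price - threshold)
        fill = max(0.0, min(fill, size))
        cumulative_size += fill
        cumulative_cost += price * fill
        break

    return cumulative_size, cumulative_cost
```

With a 0.5 threshold and asks at 0.25 and 1.0 (100 each), the walk takes the whole first level and exactly half of the second, leaving the average fill at break-even.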

Safe Notional Snapshot

Max spend: 770.80 units (derived from the same simulation)

Max size: 1640.0 shares/contracts

Post-fill margin: 0.00%

3. Payout Scaling & Payout Bands

After the depth walk completes we translate the executed fills into payout bands. Internally we refer to each executed price as p_in: the first fill is p_in_best, the last fill is p_in_worst, and the size-weighted mean is p_in_avg. These values drive the min/avg/max payout display in the arbitrage calculator.

levels        = normalise(order_book.asks)
remaining     = bet_amount
total_cost    = 0
total_shares  = 0
p_in_best     = None
p_in_worst    = None

for price, size in levels:
    if remaining <= 0:
        break

    level_cost  = price * size
    fill_cost   = min(remaining, level_cost)
    fill_shares = fill_cost / price

    p_in_best  = p_in_best or price
    p_in_worst = price

    total_cost   += fill_cost
    total_shares += fill_shares
    remaining    -= fill_cost

p_in_avg   = total_cost / total_shares   # size-weighted mean entry price
max_payout = total_cost / p_in_best      # whole stake filled at the best price
avg_payout = total_shares                # each share pays 1 unit on a win
min_payout = total_cost / p_in_worst     # whole stake filled at the worst price
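
As a runnable sketch, the loop above becomes the function below. The name `payout_bands` is ours; `levels` is assumed to be a best-ask-first list of `(price, size)` tuples after normalisation.

```python
def payout_bands(levels, bet_amount):
    """Spend bet_amount against ask levels; return (min, avg, max) payouts.

    A winning share pays out exactly 1 unit, so the average payout is
    simply the number of shares bought.
    """
    remaining = bet_amount
    total_cost = 0.0
    total_shares = 0.0
    p_in_best = None
    p_in_worst = None

    for price, size in levels:
        if remaining <= 0:
            break

        fill_cost = min(remaining, price * size)
        fill_shares = fill_cost / price

        if p_in_best is None:
            p_in_best = price   # first executed price
        p_in_worst = price      # last executed price

        total_cost += fill_cost
        total_shares += fill_shares
        remaining -= fill_cost

    if total_shares == 0:
        return 0.0, 0.0, 0.0

    return (total_cost / p_in_worst,   # min: whole stake at the worst price
            total_shares,              # avg: shares bought
            total_cost / p_in_best)    # max: whole stake at the best price
```

Spending 50 units into asks of 0.25 and 0.5 (100 shares each) consumes the first level fully and half the second, giving bands of 100 / 150 / 200.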

4. Practical Considerations

Real trades introduce wrinkles beyond the clean math. The points below capture the operational guardrails we maintain while interpreting simulation output.

  • Venue-specific fees. Exchange fees, funding spreads, and borrow costs are stored per venue and deducted when we reconcile real fills. The theoretical margin shown above intentionally omits fees so we can layer them in contextually.
  • Data freshness. Order book snapshots update with the same cadence as market data. If the Polymarket book is empty or stale we render an explicit warning instead of a misleading depth estimate.
  • Missing depth on peer venues. Kalshi depth is not yet streamed. Once available, we will mirror the same averaging routine to size the opposing leg and compare liquidity symmetrically.

Executing in production always pairs these calculations with venue risk checks, portfolio limits, and live settlement monitoring.

5. Model Guardrails & Utilities

Odds ingestion flows through several utilities before arbitrage math fires. These guardrails keep inputs sane and ensure that cross-venue comparisons reference the same underlying event.

  • Cross-source validation. validate_odds_with_other_source compares each venue's quote against its counterpart. If the sums land near 100% but appear inverted, we automatically flip the legs and stamp the event as inverted, preventing false positives.
  • Identity reconciliation. Helpers like match_outcome_name and ticker extractors normalise team and contract names. This guarantees that YES/NO legs map to the correct outcome objects even when data providers rename events mid-stream.
  • Dual-view parity. We recompute both probability and decimal-odds views of every market. Any drift between the representations raises an alert before payloads reach downstream services.

These utilities run on ingestion and again when alerts trigger, giving us two chances to catch inconsistent data before surfacing opportunities.
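
One way the inversion check could work is sketched below. This is a hypothetical heuristic, not the actual validate_odds_with_other_source implementation; the function name and inputs (percentage quotes for both legs on each venue) are ours.

```python
def legs_inverted(away_a, home_a, away_b, home_b):
    """Heuristic: detect when venue B's legs look swapped relative to venue A.

    Complementary legs across venues should sum near 100%.  If pairing
    same-named legs lands closer to 100% than pairing opposite legs, the
    feeds are probably labelled in opposite directions and should be
    flipped before arbitrage math runs.
    """
    opposite = abs((away_a + home_b) - 100) + abs((home_a + away_b) - 100)
    same = abs((away_a + away_b) - 100) + abs((home_a + home_b) - 100)
    return same < opposite
```

For consistent feeds (A: 48/52, B: 47/53) the opposite-leg pairing sums nearer 100% and nothing flips; if B's labels arrive swapped (53/47), the same-named pairing wins and the event is stamped as inverted.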