Whitepaper

Technical notes on AITrader strategy-model methodology, portfolio construction, and research validation.

Overview

AITrader is a live, forward-only experiment: can an AI system read public market context, rank a large-cap stock universe, and produce portfolios that beat transparent benchmarks after realistic trading costs?

This whitepaper explains the mechanics behind that test. It covers how rated stocks are scored on each strategy's scale, how portfolios are built from those scores, how performance and risk are measured, and how we validate whether strategy models are producing useful cross-sectional signal rather than a lucky headline return.

The goal is not to present a polished backtest. Results are tracked live from inception, benchmarked publicly, and left in place whether they improve or deteriorate.

Rating Methodology

How ratings are produced, how portfolios are built from them, and how we rank models and portfolios.

How it works

Universe selection

Each strategy evaluates every member of its declared index universe on its rebalance cadence. The universe is chosen for liquidity and breadth so the AI has enough cross-sectional comparisons to surface real signal.

AI scoring

Each strategy defines its own inputs, horizon, rating format, and ordering key. For example, AIT-1 Daneel uses web-backed analysis of recent market context, an integer score band, a peer-relative stance, and a 0-1 latent rank; future strategies may use different scales, sources, or calibration rules.

Portfolio selection

Stocks are ranked by each strategy's published sort key (models may expose a continuous latent rank or other ranking signal, where higher = more attractive). Portfolio settings then determine how many top-ranked stocks to add to the portfolio and how to weight them (equal or cap weight).
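The selection step can be sketched in a few lines of Python. This is a minimal illustration, not AITrader's production code: the function name `build_portfolio`, the tickers, the dictionary inputs, and the tie-break by ticker are all assumptions made for the example.

```python
from typing import Dict


def build_portfolio(ranks: Dict[str, float], market_caps: Dict[str, float],
                    top_n: int, weighting: str = "equal") -> Dict[str, float]:
    """Pick the top_n names by sort key (higher = more attractive) and weight them."""
    # Sort descending by rank; ties broken by ticker (an assumption) for determinism.
    ordered = sorted(ranks, key=lambda t: (-ranks[t], t))
    picks = ordered[:top_n]
    if not picks:
        raise ValueError("no stocks selected")
    if weighting == "equal":
        return {t: 1.0 / len(picks) for t in picks}
    if weighting == "cap":
        total_cap = sum(market_caps[t] for t in picks)
        return {t: market_caps[t] / total_cap for t in picks}
    raise ValueError(f"unknown weighting: {weighting}")


# Hypothetical tickers, ranks, and market caps.
ranks = {"AAA": 0.91, "BBB": 0.40, "CCC": 0.75, "DDD": 0.62}
caps = {"AAA": 300.0, "BBB": 900.0, "CCC": 100.0, "DDD": 500.0}
print(build_portfolio(ranks, caps, top_n=2, weighting="equal"))
print(build_portfolio(ranks, caps, top_n=2, weighting="cap"))
```

Both calls select the same two names; only the weights differ, which is why the weighting choice is a separate portfolio setting from the top-N cut.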

Cost deduction

Every rebalance, we compute portfolio turnover. Each strategy declares a transaction cost in basis points; we deduct it per unit of turnover from the gross return at every rebalance. This keeps results grounded in what you would actually earn after trading. Returns shown are pre-tax.

Model ranking

On the performance page, we rank strategy models with a composite score. It favors models whose portfolios work broadly (not just one lucky pick) and whose risk-adjusted results hold up across the middle and top of the portfolio set.

Each ingredient is scaled across strategy models (min–max normalization) and combined with the weights below. Higher is better after normalization. The score blends three components:

Breadth (50%): Share of eligible portfolios with positive total return since inception
Median Sharpe (30%): Median risk-adjusted weekly return across eligible portfolios
Best Sharpe (20%): Highest Sharpe among eligible portfolios

Breadth keeps a model from ranking first on a single outlier portfolio, median Sharpe captures typical risk-adjusted quality, and best Sharpe rewards a strong top end without letting one portfolio dominate the headline.

Only portfolios with a ready composite rank feed these inputs (same eligibility as the per-model portfolio list). Models with no eligible portfolios still appear in the list using fallback metrics.
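As a rough sketch of how the model composite could be computed, the Python below min-max scales each ingredient across models and applies the 50/30/20 weights. The function names, the model names, and the choice to give tied models a neutral 0.5 are assumptions for illustration; the fallback metrics for models with no eligible portfolios are not modeled here.

```python
from typing import Dict

MODEL_WEIGHTS = {"breadth": 0.50, "median_sharpe": 0.30, "best_sharpe": 0.20}


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to [0, 1] across models; higher stays better."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: when every model ties on an ingredient, use a neutral 0.5.
        return {k: 0.5 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def model_composite(metrics: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Blend the normalized ingredients with the published weights."""
    scaled = {
        name: min_max({model: row[name] for model, row in metrics.items()})
        for name in MODEL_WEIGHTS
    }
    return {
        model: sum(MODEL_WEIGHTS[name] * scaled[name][model] for name in MODEL_WEIGHTS)
        for model in metrics
    }


# Hypothetical inputs for two models.
metrics = {
    "model_a": {"breadth": 0.80, "median_sharpe": 1.2, "best_sharpe": 2.0},
    "model_b": {"breadth": 0.60, "median_sharpe": 0.4, "best_sharpe": 2.5},
}
print(model_composite(metrics))
```

In this example model_b has the single best portfolio but model_a still ranks higher, because breadth and median Sharpe together carry 80% of the weight.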

Portfolio ranking


The AI model produces scores and ranks for every stock in its universe. How you turn that into a portfolio is configurable: six risk levels (different top-N cuts), four rebalance cadences (weekly, monthly, quarterly, yearly), and equal vs. cap weighting.

We rank portfolios with a composite score so order reflects both how money grew (total return and benchmark-relative return) and how you got there (risk-adjusted return, week-to-week steadiness vs the benchmark, and drawdown depth).

Each metric is scaled across this model's portfolios (min–max normalization) and combined with the weights below. Rank is not “highest ending dollar wins” — it rewards strong outcomes alongside discipline. The score blends five components:

Sharpe ratio (30%): Weekly MTM risk-adjusted return (see Sharpe section)
Total return (35%): Cumulative net return from inception capital
Consistency (15%): % of weeks beating the strategy's benchmark that week
Max drawdown (10%): Shallower losses score higher
vs benchmark (10%): Portfolio total return minus the strategy's benchmark over the same dates

Total return and benchmark-relative return capture realized growth; Sharpe, consistency, and drawdown down-rank portfolios that only looked good from one lucky stretch or extreme risk-taking. CAGR is shown on portfolios but is not part of the composite — over short windows annualization can be noisy, and total return plus benchmark-relative return already capture growth.

Portfolios need at least 2 weeks of data to be ranked; those with fewer observations show a "building track record" status. Composite rank appears only when all five inputs are finite: Sharpe, total return, consistency, max drawdown, and excess return vs benchmark.
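The readiness rule and the five-component blend can be sketched together in Python. This is an illustrative reading of the description above, with several assumptions: the function and key names are hypothetical, normalization runs only over portfolios that pass the gate, and tied values receive a neutral 0.5. Max drawdown is passed as a negative decimal, so a plain min-max already scores shallower losses higher.

```python
import math
from typing import Dict, Optional

PORTFOLIO_WEIGHTS = {
    "sharpe": 0.30, "total_return": 0.35, "consistency": 0.15,
    "max_drawdown": 0.10, "excess_vs_benchmark": 0.10,
}


def is_rankable(row: Dict[str, float], weeks_of_data: int) -> bool:
    """Gate: at least 2 weeks of data and all five inputs finite."""
    return weeks_of_data >= 2 and all(math.isfinite(row[k]) for k in PORTFOLIO_WEIGHTS)


def portfolio_composites(rows: Dict[str, Dict[str, float]],
                         weeks: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Return a composite per portfolio, or None while it is still building a track record."""
    ready = [p for p in rows if is_rankable(rows[p], weeks[p])]
    scores: Dict[str, Optional[float]] = {p: None for p in rows}
    totals = {p: 0.0 for p in ready}
    for name, weight in PORTFOLIO_WEIGHTS.items():
        if not ready:
            break
        vals = [rows[p][name] for p in ready]
        lo, hi = min(vals), max(vals)
        for p in ready:
            # Assumption: ties on a metric score a neutral 0.5.
            scaled = 0.5 if hi == lo else (rows[p][name] - lo) / (hi - lo)
            totals[p] += weight * scaled
    scores.update(totals)
    return scores


# Hypothetical portfolios; p3 has no Sharpe yet and only one week of data.
rows = {
    "p1": {"sharpe": 1.5, "total_return": 0.20, "consistency": 0.60,
           "max_drawdown": -0.10, "excess_vs_benchmark": 0.05},
    "p2": {"sharpe": 0.5, "total_return": 0.30, "consistency": 0.50,
           "max_drawdown": -0.25, "excess_vs_benchmark": 0.15},
    "p3": {"sharpe": float("nan"), "total_return": 0.01, "consistency": 1.0,
           "max_drawdown": 0.0, "excess_vs_benchmark": 0.0},
}
print(portfolio_composites(rows, {"p1": 20, "p2": 20, "p3": 1}))
```

Here p2 ends with more money but p1 scores higher, since p1 leads on Sharpe, consistency, and drawdown.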

Measuring Performance

Formulas and definitions for headline performance statistics, risk, costs, and research validation plots.

Performance metrics

Total return is calculated from inception capital:

total_return = (ending_equity / starting_capital) − 1

CAGR annualizes growth over elapsed calendar time:

CAGR = (ending_equity / starting_capital)^(1 / years_elapsed) − 1

Max drawdown measures the worst peak-to-trough decline in the net equity curve:

max_drawdown = min_t ((equity_t / running_peak_t) − 1)

It is reported as a negative decimal; values closer to 0 are better.

Consistency measures weekly steadiness versus the strategy's benchmark:

consistency = #weeks(portfolio_wow ≥ benchmark_wow) / #weeks_compared

Weekly returns come from the mark-to-market path.

vs benchmark (excess) is the benchmark-relative outcome over the same date range:

excess_vs_benchmark = portfolio_total_return − benchmark_total_return

Each strategy uses a fixed starting capital for both the strategy series and its benchmark series. This keeps the model page and performance page consistent; the actual figure appears on the model page.
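The definitions above translate directly into code. The following Python sketch assumes each equity series is a list of weekly closes whose first element is the starting capital; the function names and the sample numbers are hypothetical, and readiness gates are left out for brevity.

```python
from typing import List, Sequence


def total_return(equity: Sequence[float]) -> float:
    return equity[-1] / equity[0] - 1


def cagr(equity: Sequence[float], years_elapsed: float) -> float:
    return (equity[-1] / equity[0]) ** (1 / years_elapsed) - 1


def max_drawdown(equity: Sequence[float]) -> float:
    """Worst peak-to-trough decline, as a negative decimal (0.0 if none)."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1)
    return worst


def weekly_returns(closes: Sequence[float]) -> List[float]:
    return [b / a - 1 for a, b in zip(closes, closes[1:])]


def consistency(portfolio_wow: Sequence[float], benchmark_wow: Sequence[float]) -> float:
    """Share of compared weeks where the portfolio matched or beat the benchmark."""
    pairs = list(zip(portfolio_wow, benchmark_wow))
    return sum(1 for p, b in pairs if p >= b) / len(pairs)


# Hypothetical weekly closes, both starting from the same capital.
portfolio = [10_000.0, 10_500.0, 9_975.0, 10_800.0, 11_000.0]
benchmark = [10_000.0, 10_200.0, 10_100.0, 10_400.0, 10_700.0]

excess_vs_benchmark = total_return(portfolio) - total_return(benchmark)
print(total_return(portfolio), max_drawdown(portfolio), excess_vs_benchmark)
print(consistency(weekly_returns(portfolio), weekly_returns(benchmark)))
```

Using the same starting capital for both series is what makes the excess a simple difference of two total returns.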

Readiness gates: Sharpe needs at least 8 observations, CAGR is hidden until about 12 weeks, and composite rank requires all five ranking inputs to be finite.

Sharpe ratio: decision-cadence vs weekly MTM

Holding-period Sharpe (weekly MTM)

This is the headline Sharpe on performance pages and the one that feeds composite ranking.

Inputs: ISO-week closes from the daily mark-to-market equity series
Returns: week-over-week simple returns
Annualization: mean / std × √52 (no risk-free rate)
Weekly MTM is sampled at ISO-week closes regardless of rebalance cadence
Use when: comparing portfolios across different rebalance cadences

Decision-cadence Sharpe

Treats each completed rebalance period as one independent bet.

Inputs: net return at each rebalance observation
Cadence: weekly / monthly / quarterly / yearly by portfolio setting
Annualization: mean / std × √periodsPerYear (52 / 12 / 4 / 1, respectively)
Use when: evaluating decision quality on the portfolio's own schedule

Both versions require at least 8 observations before showing a value; between 8 and about 12 observations they are shown as early estimates. We use naive Sharpe (no risk-free-rate subtraction), and the UI treats values at or above 1 as good. Weekly MTM Sharpe makes portfolios comparable across cadences, while decision-cadence Sharpe is the textbook i.i.d.-returns view for the portfolio's true decision horizon. Showing both avoids hiding cadence-specific tradeoffs.
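Both Sharpe variants share one formula and differ only in the return series and the annualization factor. A minimal Python sketch follows; the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether sample or population deviation is used.

```python
import math
import statistics
from typing import Optional, Sequence

PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12, "quarterly": 4, "yearly": 1}
MIN_OBSERVATIONS = 8


def naive_sharpe(returns: Sequence[float], periods_per_year: int) -> Optional[float]:
    """mean / std x sqrt(periods_per_year), with no risk-free rate subtracted."""
    if len(returns) < MIN_OBSERVATIONS:
        return None  # not enough observations to show a value
    std = statistics.stdev(returns)  # assumption: sample standard deviation
    if std == 0:
        return None  # undefined when returns never vary
    return statistics.mean(returns) / std * math.sqrt(periods_per_year)


# Hypothetical return series.
weekly_mtm = [0.02, 0.00, 0.02, 0.00, 0.02, 0.00, 0.02, 0.00]
print(naive_sharpe(weekly_mtm, PERIODS_PER_YEAR["weekly"]))      # holding-period view
print(naive_sharpe(weekly_mtm[:3], PERIODS_PER_YEAR["monthly"]))  # None: under 8 observations
```

For the decision-cadence view, the same function would be called with net returns per rebalance period and the factor matching the portfolio's cadence.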

Turnover & costs

Turnover measures how much the portfolio changes at each rebalance. Rebalances run on the configured cadence (weekly, monthly, quarterly, or yearly), not necessarily every week. Formally:

turnover = ½ × Σ|new_weight − old_weight|

Net return uses multiplicative cost deduction at rebalance:

gross_return = Σ weight_i × (price_i_now / price_i_prev − 1)
transaction_cost = turnover × (transaction_cost_bps / 10_000)
net_factor = (1 + gross_return) × (1 − transaction_cost)
net_return = net_factor − 1

Entry run is treated as a full buy-in: turnover = 1, gross_return = 0, so net_return = −transaction_cost. On non-rebalance dates (for monthly/quarterly/yearly portfolios), turnover stays 0 and only mark-to-market gross return contributes.

A full replacement of all stocks gives turnover = 1.0. Lower turnover means more of the prior portfolio carried forward; higher turnover means rankings or portfolio settings changed enough to require more trading.
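The turnover and cost formulas above can be checked with a short Python sketch. The function names, tickers, and the 10 bps cost figure are assumptions for illustration; weights are expressed as fractions that sum to 1.

```python
from typing import Dict


def turnover(old_weights: Dict[str, float], new_weights: Dict[str, float]) -> float:
    """Half the sum of absolute weight changes across all names held before or after."""
    names = set(old_weights) | set(new_weights)
    return 0.5 * sum(abs(new_weights.get(n, 0.0) - old_weights.get(n, 0.0)) for n in names)


def net_return(gross_return: float, turnover_value: float,
               transaction_cost_bps: float) -> float:
    """Multiplicative cost deduction applied at a rebalance."""
    transaction_cost = turnover_value * (transaction_cost_bps / 10_000)
    return (1 + gross_return) * (1 - transaction_cost) - 1


# Hypothetical rebalance: one of two equal-weight names is swapped out.
old = {"AAA": 0.5, "BBB": 0.5}
new = {"AAA": 0.5, "CCC": 0.5}
t = turnover(old, new)
print(t, net_return(0.02, t, transaction_cost_bps=10))

# Entry run: treated as a full buy-in, so turnover is set to 1 and gross return to 0.
print(net_return(0.0, 1.0, transaction_cost_bps=10))
```

Note that the entry run sets turnover to 1 by convention rather than through the weight-difference formula, which would give 0.5 for a move from cash to fully invested.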

Each strategy's transaction_cost_bps value, declared on its model page, is a conservative assumption covering bid-ask spread and market impact for the asset class it trades.

Quintile analysis

We validate ranking quality in two complementary ways: a continuous regression and a discrete quintile sort. Regression asks whether the signal exists; quintiles ask whether it is usable in portfolio construction.

Every stock in the strategy's universe is sorted by latent rank and split into 5 equal quintile groups (Q1 = lowest rated, Q5 = highest rated). We then compute the average forward return over the next rebalance window for each quintile.

On the performance page, Weekly is the primary view and Monthly-smoothed is just a calendar-month average of those same period-level forward-return snapshots (not a separate horizon test).

avg_forward_return[q] = mean_over_stocks_in_q(price_next_period / price_this_period − 1)

A monotonically increasing pattern (Q1 < Q2 < Q3 < Q4 < Q5) is evidence that the model's cross-sectional signal holds across the whole ranking, not just luck in the top picks.

We also track 4-week non-overlapping quintile returns, computed on a formation-to-realization basis every 4 weeks.

Stocks without a latent rank for a given week are dropped from that week's bucketing entirely. Only when the model errored for a name do we impute a neutral rank of 0.5, which tends to place those names in the middle bucket (Q3).

The Q5 win rate is the fraction of weeks where Q5 outperformed Q1. Above 50% means the AI's top picks outperformed its bottom picks more often than not.
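A simplified Python sketch of the quintile sort is shown below. It assumes names with no latent rank arrive as `None` and are dropped, and that any 0.5 imputation for model errors has already been applied upstream. The function names, the rank-position bucketing rule, and the tie-break by ticker are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def quintile_buckets(ranks: Dict[str, Optional[float]]) -> Dict[int, List[str]]:
    """Sort rated names ascending and split them into Q1 (lowest) to Q5 (highest)."""
    rated = sorted((t for t, r in ranks.items() if r is not None),
                   key=lambda t: (ranks[t], t))
    buckets: Dict[int, List[str]] = {q: [] for q in range(1, 6)}
    for position, ticker in enumerate(rated):
        buckets[position * 5 // len(rated) + 1].append(ticker)
    return buckets


def avg_forward_return(buckets: Dict[int, List[str]], price_now: Dict[str, float],
                       price_next: Dict[str, float]) -> Dict[int, float]:
    """Mean of price_next / price_now - 1 within each non-empty quintile."""
    return {q: mean([price_next[t] / price_now[t] - 1 for t in names])
            for q, names in buckets.items() if names}


def q5_win_rate(q1_q5_by_week: Sequence[Tuple[float, float]]) -> float:
    """Fraction of weeks where Q5's average return exceeded Q1's."""
    return sum(1 for q1, q5 in q1_q5_by_week if q5 > q1) / len(q1_q5_by_week)


# Synthetic data: ten rated names plus one without a rank.
ranks: Dict[str, Optional[float]] = {f"S{i}": i / 10 + 0.05 for i in range(10)}
ranks["S10"] = None
price_now = {f"S{i}": 100.0 for i in range(10)}
price_next = {f"S{i}": 100.0 + (i - 4.5) for i in range(10)}

buckets = quintile_buckets(ranks)
print(avg_forward_return(buckets, price_now, price_next))
```

The synthetic prices are built so that returns rise with rank, which produces the monotonic Q1 to Q5 pattern described above; live data will be far noisier.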

Regression

Each evaluation period, we pair every stock's score with its forward return and fit a straight line:

forward_return = α + β × score

This is a cross-sectional regression — not tracking one stock over time, but comparing many stocks against each other at the same point in time. AI score on the x-axis, forward return on the y-axis, best-fit line through one point per stock in the universe. If the line slopes up (β > 0), higher-rated stocks tend to outperform.
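For readers who want to reproduce the fit, here is a minimal ordinary least squares sketch in plain Python for a single evaluation period. The function name and the synthetic data are assumptions; the closed-form slope, intercept, and R² are the standard single-regressor formulas.

```python
from typing import Sequence, Tuple


def cross_sectional_ols(scores: Sequence[float],
                        forward_returns: Sequence[float]) -> Tuple[float, float, float]:
    """Fit forward_return = alpha + beta * score; return (alpha, beta, r_squared)."""
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(forward_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in forward_returns)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, forward_returns))
    if sxx == 0:
        raise ValueError("all scores are identical; slope is undefined")
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return alpha, beta, r_squared


# Synthetic, noise-free data: one point per stock, built on an exact line.
scores = [-5, -3, -1, 1, 3, 5]
forward_returns = [0.001 + 0.002 * s for s in scores]
alpha, beta, r_squared = cross_sectional_ols(scores, forward_returns)
print(alpha, beta, r_squared)
```

Because the sample points lie exactly on a line, the fit recovers the slope and intercept used to build them and R² is 1; real cross-sections produce much smaller R² values.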

β (Beta) — does the signal work?

How much return increases per 1-point increase in score. This is the core signal metric — if beta isn't positive, nothing else matters.

β > 0 → higher scores → higher returns (working)
β ≈ 0 → no relationship
β < 0 → signal is inverted

Example: β = 0.002 → a 5-point score gap implies ~+1% return spread.

Good: any positive value

Strong: > 0.002

Cross-sectional equity literature often treats ~0.002 per score point as economically meaningful (rough Fama-MacBeth-style guide, not a universal cutoff).

Figure: two illustrative scatter panels (synthetic data, not live results). Both panels plot score (−5 to +5) against next-week return; slopes are exaggerated for clarity.

Positive β: higher AI scores tend to go with higher next-week returns, which is the relationship you want.

Negative β: higher scores pair with lower returns; the signal is inverted or noise-dominated that week.

R² — how much does it explain?

The share of the cross-sectional variation in stock returns that the AI score alone explains. Even small values matter: stock returns are dominated by noise (company-specific events, random fluctuations), and no single signal explains most of the variation.

Baseline: 0.00 (no signal)

Meaningful: 0.01 – 0.05

Exceptional: > 0.05

Single-factor stock-return regressions are noisy; 1–5% is commonly considered meaningful, while >5% is unusual.

α (Alpha) — market context

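The intercept of the fitted line: the expected return for a stock with a score of 0. When scores are roughly centered on zero, it tracks the average return across all stocks that week. Positive means the market was broadly up; negative means down. This is background context, not a measure of model quality.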

How to read results together

β positive + some R² → signal is working

β ≈ 0 → no edge

β negative → inverted signal

This test isolates the pure ranking ability of the model. It ignores portfolio construction, position sizing, and trading strategy, and answers only: “if I rank stocks by score, do the higher-ranked ones actually outperform?”

Quintile vs. regression

Regression (β, R²)

Uses every data point exactly as-is. Higher scores are treated as stronger than lower scores. Fits one line across all stocks.

Measures true signal strength
Detects subtle, continuous relationships
More statistically efficient
Can be skewed by outliers

Think of it as: “Is there a real relationship?”

Quintiles (Q1–Q5)

Throws away precision and groups stocks into 5 buckets. Stronger-rated stocks land in the top bucket; weaker-rated stocks land in the bottom bucket. Then compares: did the top outperform the bottom?

Measures practical portfolio outcome
Very intuitive — “did the best outperform the worst?”
Robust to noise and outliers
Ignores granularity within buckets

Think of it as: “Can I make money from ranking?”

When they disagree

β positive, quintiles weak: Signal exists but is too noisy to cleanly separate buckets.

Quintiles strong, β weak: Signal may be nonlinear — only the extremes matter. Regression underestimates it.

Bottom line

Regression = signal detection (continuous)

Quintiles = strategy outcome (discrete)

You want β > 0 consistently and Q5 > Q1 consistently. If both align, the signal is strong and reliable. If only one works, investigate further.

Scientific grounding

We take this approach because AITrader is built on findings from two peer-reviewed papers in Finance Research Letters: Pelster & Val (2024) showed that an AI rating stocks on a relative attractiveness scale can produce signal that survives even in negative-return regimes; Ko & Lee (2024) extended this from individual ratings to full portfolios across asset classes.

Our first deployed strategy, AIT-1 Daneel, was built directly from these methodologies: the same forward-only experiment philosophy, the same relative-scoring idea, and the same OLS + quintile validation framework. See the AIT-1 model page for the full paper cards, AIT-1's specific universe and score scale, and the alignment notes between paper and implementation.

Future strategies inherit the same validation framework but may run on different universes, lookback windows, and score scales. Each strategy's model page documents its specific design choices and any additional research it builds on.
