AIT-1 Daneel

active
Top performing

Inception: Feb 17, 2026 · 7 weekly runs

Weekly Top-20 Nasdaq-100 portfolio: stocks ranked by AI, equal weight, rebalanced every week, with trading costs included.

Beta (β)

Signal — score vs next-week return (latest week)

-0.0061

Positive means higher-rated names tended to outperform next week.

Model overview

Provider: openai
Model: gpt-5.2
Universe: NASDAQ100 (all ~100 members)
Stocks rated per run: 100
Rating scale: −5 to +5 (integer) + latent rank 0–1
Data per stock: Live web search, last 30 days
Run frequency: weekly

Prompt design

Every stock is evaluated using the same structured prompt. Key instructions:

  • Scores each stock from −5 (very unattractive) to +5 (very attractive) relative to the next ~30 days of expected performance.
  • Uses a single live web search per stock to gather the latest 30 days of news, earnings, guidance, analyst revisions, and market reactions.
  • Grades each stock on a curve against all other Nasdaq-100 members (not in isolation). A +3 means the stock looks meaningfully better than most of the index right now, regardless of whether the overall market is up or down.
  • Assigns a continuous latent rank (0 to 1) as a fine-grained ordinal signal. This is what drives how the portfolio is built from ratings (not the integer score directly).
  • Maps scores to buckets for transparency: buy (≥ +2), hold (−1 to +1), sell (≤ −2). Buckets are a readability layer; the actual sort is by latent rank.
  • Requires 2 to 6 explicit risks per rating. At least one must address information uncertainty, model error, or conflicting signals.
  • Tracks change from the prior week's rating. If the bucket changes, the model must explain why.
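
The output constraints in the list above can be sketched as a small validation layer. This is an illustration only: the `Rating` class, its field names, and the helper functions are hypothetical and are not taken from the actual prompt or pipeline.

```python
from dataclasses import dataclass, field
from typing import List


def bucket_for(score: int) -> str:
    """Map an integer score to its readability bucket: buy (>= +2), sell (<= -2), else hold."""
    if score >= 2:
        return "buy"
    if score <= -2:
        return "sell"
    return "hold"


@dataclass
class Rating:
    # Hypothetical record layout; the real output schema is not documented here.
    ticker: str
    score: int           # integer from -5 to +5
    latent_rank: float   # continuous value from 0 to 1
    risks: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the rating breaks one of the stated constraints."""
        if not isinstance(self.score, int) or not -5 <= self.score <= 5:
            raise ValueError("score must be an integer from -5 to +5")
        if not 0.0 <= self.latent_rank <= 1.0:
            raise ValueError("latent_rank must lie between 0 and 1")
        if not 2 <= len(self.risks) <= 6:
            raise ValueError("each rating needs 2 to 6 explicit risks")


def needs_explanation(prior_score: int, new_score: int) -> bool:
    """A bucket change from the prior week requires an explanation."""
    return bucket_for(prior_score) != bucket_for(new_score)
```

For example, a move from +1 to +2 crosses from hold to buy and would need an explanation, while a move from +3 to +5 stays inside the buy bucket.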

How it works

1. Universe selection

We evaluate all ~100 current members of the Nasdaq-100 every week. The Nasdaq-100 tracks the largest non-financial companies listed on the Nasdaq exchange, which means high liquidity, broad sector coverage, and globally recognized names. This gives the AI enough diversity to surface real cross-sectional signal.

2. AI scoring

Each stock receives a live web search for the latest 30 days of news, earnings, guidance, and analyst revisions. The AI scores it from −5 to +5 relative to the other 99 stocks — not in isolation. This cross-sectional comparison is what makes the signal useful: the AI doesn't need to predict the market, just which stocks look stronger than the rest. It also outputs a continuous latent rank (0–1) for fine-grained ordering.

3. Portfolio selection

Stocks are sorted by latent rank (highest = most attractive). Your portfolio settings determine how many top-ranked stocks to hold (Top 5 through Top 30) and how to weight them (equal or cap weight). No discretionary overrides — same inputs produce the same portfolio every rebalance.
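
The selection step can be sketched as a sort followed by a cut. This is a minimal sketch, not the production code: the function name, the `market_caps` parameter, and the alphabetical tie-break on equal latent ranks are assumptions made for illustration.

```python
from typing import Dict, Optional


def select_portfolio(
    latent_ranks: Dict[str, float],
    top_n: int,
    market_caps: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return target weights for the top_n tickers by latent rank.

    Equal weight by default; cap weight if market_caps is supplied.
    Ties are broken alphabetically by ticker (an assumption) so the
    same inputs always produce the same portfolio.
    """
    if top_n <= 0 or not latent_ranks:
        raise ValueError("need at least one stock and a positive top_n")
    ordered = sorted(latent_ranks, key=lambda t: (-latent_ranks[t], t))
    picks = ordered[:top_n]
    if market_caps is None:
        return {t: 1.0 / len(picks) for t in picks}
    total_cap = sum(market_caps[t] for t in picks)
    return {t: market_caps[t] / total_cap for t in picks}


# Made-up tickers and ranks, for illustration only.
ranks = {"AAA": 0.90, "BBB": 0.40, "CCC": 0.70}
equal_weighted = select_portfolio(ranks, top_n=2)
cap_weighted = select_portfolio(ranks, top_n=2, market_caps={"AAA": 300.0, "BBB": 50.0, "CCC": 100.0})
```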

4. Cost deduction

Every rebalance, we compute portfolio turnover (how much changed). We then deduct 15 basis points per unit of turnover from the gross return. This keeps results grounded in what you would actually earn after trading. Returns shown are pre-tax.

How we rank models

We order strategy models with a composite score so the headline reflects both how broadly the model's portfolio configs are working (not just one lucky configuration) and how strong risk-adjusted results look in the middle and at the top of the config set.

Each ingredient is scaled relative to other strategy models (min–max normalization), then combined with the weights below. Higher is better for all three after normalization.

The score blends three dimensions:

  • Breadth (50%): Share of eligible configs with positive total return since inception
  • Median Sharpe (30%): Median risk-adjusted weekly return across eligible configs
  • Best Sharpe (20%): Highest Sharpe among eligible configs

Why this mix: breadth keeps a model from ranking first on a single outlier portfolio; median Sharpe captures typical risk-adjusted quality; best Sharpe still rewards a strong top end without letting it dominate the headline.
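
As an illustration of the three raw ingredients before they are normalized across models, the sketch below computes them from a list of eligible configs. The dictionary keys `total_return` and `sharpe` and the function name are hypothetical, and the numbers are made up.

```python
from statistics import median
from typing import Dict, List


def model_ingredients(configs: List[Dict[str, float]]) -> Dict[str, float]:
    """Compute breadth, median Sharpe, and best Sharpe over eligible configs."""
    if not configs:
        raise ValueError("no eligible configs")
    positive = sum(1 for c in configs if c["total_return"] > 0)
    sharpes = [c["sharpe"] for c in configs]
    return {
        "breadth": positive / len(configs),   # share of configs with positive total return
        "median_sharpe": median(sharpes),     # typical risk-adjusted quality
        "best_sharpe": max(sharpes),          # strength of the top end
    }


# Made-up config results, for illustration only.
example_configs = [
    {"total_return": 0.05, "sharpe": 1.2},
    {"total_return": -0.01, "sharpe": -0.3},
    {"total_return": 0.02, "sharpe": 0.8},
    {"total_return": 0.03, "sharpe": 0.5},
]
ingredients = model_ingredients(example_configs)
```

Here three of the four configs have a positive total return, so breadth is 0.75 even though one config has a much higher Sharpe than the rest.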

Only portfolio configs with a ready composite rank feed these inputs (same eligibility as the per-model portfolio list). Models with no eligible configs still appear in the list using fallback metrics so the page does not break.

Methodology

Detailed technical notes on how each component is designed and measured.

Portfolios

The AI model only produces scores and ranks for every Nasdaq-100 stock. How you turn that into a portfolio is configurable: six risk levels (different top-N cuts), four rebalance cadences (weekly, monthly, quarterly, yearly), and equal vs. cap weighting.

How we rank portfolios

We rank portfolios with a composite score so the order reflects both how money grew (total return and return relative to the Nasdaq-100 cap-weight benchmark) and how you got there (risk-adjusted return, week-to-week steadiness vs that benchmark, and drawdown depth).

Each metric is scaled relative to other portfolios for this model (min–max normalization), then combined with the weights below. That means rank is not “highest ending dollar wins,” but it does reward strong outcomes alongside discipline.

The score blends six dimensions:

  • Sharpe ratio (30%): Risk-adjusted weekly return
  • CAGR (25%): Annualized return from inception
  • Consistency (15%): % of weeks beating Nasdaq-100 (cap) that week
  • Max drawdown (10%): Shallower losses score higher
  • Total return (10%): Cumulative return vs $10k start
  • vs Nasdaq-100 (cap) (10%): Portfolio total return minus benchmark over the same dates

Why both growth and risk: total return and benchmark-relative return keep the list aligned with what you see on portfolio cards, while Sharpe, consistency, and drawdown still down-rank configs that only looked good from one lucky stretch or extreme risk-taking.
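
A minimal sketch of min–max normalization followed by the weighted blend, using the six weights listed above. The metric key names are hypothetical. Two further assumptions: max drawdown is stored as a negative number (so a higher value means a shallower loss), and a metric that is identical across all portfolios is normalized to 0.5.

```python
from typing import Dict, List

WEIGHTS = {
    "sharpe": 0.30,
    "cagr": 0.25,
    "consistency": 0.15,
    "max_drawdown": 0.10,   # assumed negative, e.g. -0.08 for an 8% drawdown
    "total_return": 0.10,
    "excess_return": 0.10,  # total return minus the Nasdaq-100 (cap) benchmark
}


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 0.5 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(portfolios: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Blend normalized metrics into one score per portfolio (higher is better)."""
    names = list(portfolios)
    scores = {name: 0.0 for name in names}
    for metric, weight in WEIGHTS.items():
        normalized = min_max([portfolios[name][metric] for name in names])
        for name, value in zip(names, normalized):
            scores[name] += weight * value
    return scores


# Made-up metrics for two portfolios, for illustration only.
demo = {
    "top20_equal": {"sharpe": 1.1, "cagr": 0.30, "consistency": 0.57,
                    "max_drawdown": -0.05, "total_return": 0.04, "excess_return": 0.02},
    "top5_cap": {"sharpe": 0.4, "cagr": 0.10, "consistency": 0.43,
                 "max_drawdown": -0.12, "total_return": 0.01, "excess_return": -0.01},
}
demo_scores = composite_scores(demo)
```

Because each metric is scaled against the other portfolios of the same model, the score is relative: adding or removing a portfolio can change every other portfolio's score.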

Portfolios require at least 2 weeks of data to be ranked. Those with fewer observations are shown with a "building track record" status.

Scoring

Each stock is scored on a discrete integer scale from −5 to +5. The score reflects relative attractiveness over the next ~30 days, calibrated across the full Nasdaq-100. The AI is explicitly instructed to avoid defaulting to 0 unless information is genuinely mixed.

In addition to the integer score, the AI produces a latent rank — a continuous value between 0 and 1. The portfolio layer sorts by latent rank (highest first). This separation allows the portfolio to capture ordering signal even when two stocks share the same integer score.

Scores are calibrated relative to other Nasdaq-100 members, not in absolute isolation. A +3 means the stock looks meaningfully more attractive than most of the other 99 stocks in the index right now.

Why relative, not absolute? Think of it like grading on a curve. Predicting whether any single stock will go up or down requires guessing the overall market direction (something nobody can do reliably). But picking out which stocks look stronger compared to their peers is a more tractable problem. In a falling market, every stock might drop, but the highest-ranked ones tend to drop less. In a rising market, they tend to rise more. Pelster & Val (2024) confirmed this in a live experiment: even during a stretch when every portfolio lost money in absolute terms, the top-rated stocks still outperformed the bottom-rated ones by a statistically significant margin. The relative signal held when absolute scores would have been meaningless.

Performance metrics

Total return is calculated from inception capital:

total_return = (ending_equity / starting_capital) − 1

CAGR annualizes growth over elapsed calendar time:

CAGR = (ending_equity / starting_capital)^(1 / years_elapsed) − 1

We use a fixed $10,000 starting capital for strategy and benchmark series. This keeps the model page and performance page consistent.
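
The two formulas translate directly into code. This is a small sketch with hypothetical function names; the $10,000 default matches the starting capital stated above.

```python
STARTING_CAPITAL = 10_000.0


def total_return(ending_equity: float, starting_capital: float = STARTING_CAPITAL) -> float:
    """Cumulative return since inception."""
    return ending_equity / starting_capital - 1.0


def cagr(ending_equity: float, years_elapsed: float,
         starting_capital: float = STARTING_CAPITAL) -> float:
    """Annualized growth over elapsed calendar time (years_elapsed must be > 0)."""
    if years_elapsed <= 0:
        raise ValueError("years_elapsed must be positive")
    return (ending_equity / starting_capital) ** (1.0 / years_elapsed) - 1.0


# Made-up example: $10,000 grows to $12,100 over exactly two years.
two_year_total = total_return(12_100.0)
two_year_cagr = cagr(12_100.0, years_elapsed=2.0)
```

In this example the total return is 21% and the CAGR is 10% per year. Because the exponent is 1 / years_elapsed, a track record shorter than one year is scaled up rather than down.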

Turnover & costs

Turnover measures how much the portfolio changes each week. Formally:

turnover = ½ × Σ|new_weight − old_weight|

A full replacement of all stocks gives turnover = 1.0. Typical weekly turnover for a Top-20 equal-weight portfolio is 0.15 to 0.35 depending on how much the ranking changes week to week.

Net return = gross return − (turnover × 15 bps). On the first run (no prior portfolio), turnover defaults to 1 (full buy-in).
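
A minimal sketch of the turnover and net-return formulas above, assuming weights are held as ticker-to-weight dictionaries (the function and constant names are hypothetical):

```python
from typing import Dict, Optional

COST_PER_UNIT_TURNOVER = 0.0015  # 15 bps


def turnover(old_weights: Dict[str, float], new_weights: Dict[str, float]) -> float:
    """One-way turnover: half the sum of absolute weight changes."""
    tickers = set(old_weights) | set(new_weights)
    return 0.5 * sum(
        abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers
    )


def net_return(gross_return: float,
               old_weights: Optional[Dict[str, float]],
               new_weights: Dict[str, float]) -> float:
    """Gross return minus trading costs; the first run uses turnover = 1."""
    t = 1.0 if old_weights is None else turnover(old_weights, new_weights)
    return gross_return - t * COST_PER_UNIT_TURNOVER


# Made-up example: half the portfolio is swapped from BBB into CCC.
old = {"AAA": 0.5, "BBB": 0.5}
new = {"AAA": 0.5, "CCC": 0.5}
half_swap = turnover(old, new)
net = net_return(0.01, old, new)
```

Swapping half the portfolio gives a turnover of 0.5, so a 1.00% gross week becomes 0.925% net (0.5 × 15 bps = 7.5 bps of cost).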

15 bps per traded dollar is a conservative assumption covering both bid-ask spread and market impact for liquid large-cap stocks.

Quintile analysis

Every week, all ~100 Nasdaq-100 stocks are sorted by latent rank and split into 5 equal quintile groups (Q1 = lowest rated, Q5 = highest rated). We then compute the average 1-week forward return for each quintile.

A monotonically increasing pattern (Q1 < Q2 < Q3 < Q4 < Q5) indicates the model has genuine cross-sectional predictive signal — not just luck in the top 20 picks. This is the same methodology used in Pelster & Val (2024).

We also track 4-week non-overlapping quintile returns, computed on a formation-to-realization basis every 4 weeks.

The Q5 win rate is the fraction of weeks where Q5 outperformed Q1. Above 50% means the AI's top picks outperformed its bottom picks more often than not.
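
The quintile split and the Q5 win rate can be sketched as follows. The function names and the input layout, a list of (latent_rank, forward_return) pairs per week, are assumptions made for illustration.

```python
from typing import List, Tuple


def quintile_returns(rows: List[Tuple[float, float]]) -> List[float]:
    """Mean forward return for Q1 (lowest rank) through Q5 (highest rank).

    rows holds (latent_rank, forward_return) pairs for one week.
    """
    if len(rows) < 5:
        raise ValueError("need at least 5 stocks to form quintiles")
    ordered = sorted(rows, key=lambda r: r[0])
    n = len(ordered)
    means = []
    for q in range(5):
        group = ordered[q * n // 5:(q + 1) * n // 5]
        means.append(sum(ret for _, ret in group) / len(group))
    return means


def is_monotonic(means: List[float]) -> bool:
    """True if Q1 < Q2 < Q3 < Q4 < Q5."""
    return all(a < b for a, b in zip(means, means[1:]))


def q5_win_rate(weekly_means: List[List[float]]) -> float:
    """Fraction of weeks in which Q5 beat Q1."""
    wins = sum(1 for means in weekly_means if means[4] > means[0])
    return wins / len(weekly_means)


# Made-up week with 10 stocks whose returns rise with rank.
week = [(i / 10, i * 0.001) for i in range(10)]
week_means = quintile_returns(week)
```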

Regression

Each week, we test a single question: do higher AI scores lead to higher next-week returns? We take ~100 stocks, pair each stock's score with its next-week return, and fit a straight line:

forward_return = α + β × score

This is a cross-sectional regression — not tracking one stock over time, but comparing many stocks against each other at the same point in time. AI score on the x-axis, next-week return on the y-axis, best-fit line through ~100 points. If the line slopes up (β > 0), higher-rated stocks tend to outperform.
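
For illustration, a one-variable least-squares fit can be written in a few lines of plain Python. This sketch uses the standard closed-form OLS estimates; the function name and the synthetic data are assumptions, not the production code or live results.

```python
from typing import List, Tuple


def fit_line(scores: List[float], returns: List[float]) -> Tuple[float, float, float]:
    """Fit forward_return = alpha + beta * score; return (alpha, beta, r_squared)."""
    n = len(scores)
    if n < 2 or n != len(returns):
        raise ValueError("need at least two matched observations")
    mean_x = sum(scores) / n
    mean_y = sum(returns) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("all scores are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, returns))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(scores, returns))
    ss_tot = sum((y - mean_y) ** 2 for y in returns)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return alpha, beta, r_squared


# Synthetic, noise-free data: return = 0.001 + 0.002 * score.
demo_scores_x = [-5.0, -3.0, 0.0, 3.0, 5.0]
demo_returns_y = [0.001 + 0.002 * s for s in demo_scores_x]
alpha_hat, beta_hat, r2_hat = fit_line(demo_scores_x, demo_returns_y)
```

On this noise-free input the fit recovers the slope and intercept that generated the data and R² is 1; real weekly data is far noisier, so R² will be much lower.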

β (Beta) — does the signal work?

How much return increases per 1-point increase in score. This is the core signal metric — if beta isn't positive, nothing else matters.

  • β > 0 → higher scores → higher returns (working)
  • β ≈ 0 → no relationship
  • β < 0 → signal is inverted

Example: β = 0.002 → a score of +5 vs 0 implies ~+1% return spread.

Good: any positive value

Strong: > 0.002

Illustrative examples — synthetic data, not live results

Positive β

Higher AI scores tend to go with higher next-week returns — the relationship you want.

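[Figure: scatter plot of AI score (x-axis, −5 to +5) against next-week return (y-axis, about −6.4% to +6.9%) with an upward-sloping fitted line, β = +0.0025]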

Negative β

Higher scores pair with lower returns — the signal is inverted or noise-dominated that week.

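[Figure: scatter plot of AI score (x-axis, −5 to +5) against next-week return (y-axis, about −7.1% to +6.4%) with a downward-sloping fitted line, β = −0.0025]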

Same axes in each panel: score (−5 to +5) vs next-week return. Slopes are exaggerated for clarity.

R² — how much does it explain?

The share of the variation in returns across the ~100 stocks that is explained by the AI score alone. Even small values matter: stock returns are dominated by noise (company-specific events, random fluctuations), and no single signal explains most of the variation.

Baseline: 0.00 (no signal)

Meaningful: 0.01 – 0.05

Exceptional: > 0.05

Literature-derived benchmarks (not custom-tuned)

The β bands above come from cross-sectional equity research: any positive slope is the minimum bar; a weekly slope around 0.002 per score point is often treated as economically meaningful in academic settings (e.g. Fama–MacBeth–style regressions) — a rough guide, not a universal cutoff.

The R² bands reflect how noisy individual stock returns are: a single predictor rarely explains much of the cross-section. Values in the 1–5% range are commonly cited as meaningful for one factor; above 5% is unusually strong.

α (Alpha) — market context

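The intercept of the fitted line: the expected return for a stock with a score of 0. When scores are roughly centered on 0, this is close to the average return across all stocks that week. Positive means the market was broadly up; negative means down. This is background context, not a measure of model quality.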

How to read results together

β positive + some R² → signal is working

β ≈ 0 → no edge

β negative → inverted signal

This test isolates the pure ranking ability of the model and ignores portfolio construction, position sizing, and trading strategy. It answers only: “if I rank stocks by score, do the higher-ranked ones actually outperform?”

Quintile vs. regression

Both tests ask the same underlying question — does score predict return? — but in fundamentally different ways.

Regression (β, R²)

Uses every data point exactly as-is. A score of +5 is treated as stronger than +3; a score of −4 is treated as worse than −1. Fits one line across all stocks.

  • Measures true signal strength
  • Detects subtle, continuous relationships
  • More statistically efficient
  • Can be skewed by outliers

Think of it as: “Is there a real relationship?”

Quintiles (Q1–Q5)

Throws away precision and groups stocks into 5 buckets. Both +5 and +3 land in “top bucket”; both −4 and −1 land in “bottom bucket.” Then compares: did the top outperform the bottom?

  • Measures practical portfolio outcome
  • Very intuitive — “did the best outperform the worst?”
  • Robust to noise and outliers
  • Ignores granularity within buckets

Think of it as: “Can I make money from ranking?”

When they disagree

β positive, quintiles weak: Signal exists but is too noisy to cleanly separate buckets.

Quintiles strong, β weak: Signal may be nonlinear — only the extremes matter. Regression underestimates it.

Bottom line

Regression = signal detection (continuous)

Quintiles = strategy outcome (discrete)

You want β > 0 consistently and Q5 > Q1 consistently. If both align, the signal is strong and reliable. If only one works, investigate further.

Scientific grounding

This strategy is inspired by two peer-reviewed papers published in Finance Research Letters. We treat their findings as a testable hypothesis and verify them live, on real market data, with no lookahead bias.

Pelster & Val (2024) — “Can ChatGPT assist in picking stocks?”

Finance Research Letters · Primary reference

Core idea: Live experiment testing whether ChatGPT-4 with web access can rate S&P 500 stocks on a −5 to +5 relative attractiveness scale and produce ratings that predict future returns.

Why no backtest: Historical testing is invalid because ChatGPT may have been trained on future data. They run a live forward-only experiment — the same approach we use.

Setup: S&P 500 universe, ~2 months during the Q2 2023 earnings season. Each stock rated from −5 to +5 on both earnings surprise and relative attractiveness. Web search results (last ~30 days) summarized and fed into the prompt — very similar to our pipeline.

Why relative scoring matters: Ratings were explicitly framed as cross-sectional — “how attractive is this stock compared to all other S&P 500 stocks?” This is what makes the signal robust. Even during a period when every quintile portfolio had negative absolute returns, the highest-rated stocks still lost less than the lowest-rated ones (spread of +0.07%/day, t‑stat 4.35). The AI couldn't predict market direction, but it could reliably rank which stocks were relatively stronger.

Key findings:

  • AI attractiveness ratings positively correlate with future stock returns
  • Relative ranking holds even in negative-return markets
  • AI adjusts ratings in response to earnings and news in near real-time
  • Earnings forecasts add signal beyond analyst consensus

Limitations:

  • Short time period (~2 months)
  • Not a production portfolio — quintile analysis only
  • Not tested over long horizons or different market regimes

Our alignment:

  • Same live experiment approach, no backtesting
  • Same relative −5 to +5 attractiveness rating scale
  • Same live web search for recent news, earnings, and analyst data
  • Same cross-sectional quintile and OLS regression framework
  • Extended to Nasdaq-100 and automated for continuous weekly execution

Ko & Lee (2024) — “Can ChatGPT improve investment decisions?”

Finance Research Letters · Portfolio extension

Core idea: Extended the research from individual stock ratings to building full portfolios. Asked whether ChatGPT can select assets and build diversified portfolios that outperform random selection — across stocks, bonds, commodities, and more.

Key findings:

  • AI-selected portfolios show statistically better diversification than random selection
  • Portfolios built from AI picks outperform random portfolios
  • AI identifies abstract relationships between assets across different classes
  • Demonstrates AI potential as a co-pilot for portfolio management decisions

Our alignment:

  • Portfolio from AI-ranked picks (Top 5 to Top 30, configurable)
  • Benchmarked against both cap-weight and equal-weight Nasdaq-100
  • Tracked live and unedited over multiple market conditions

What we add beyond the papers: A fully automated, live production system with real-time web search, versioned model portfolios, forward-only performance tracking, transparent cost modeling, and public auditability. No backtests used as marketing. No retroactive edits.