Weekly Top-20 Nasdaq-100 portfolio: stocks ranked by AI, equal weight, rebalanced every week, with trading costs included.
Dashboard highlights: outperformance vs the Nasdaq-100 (cap-weight), outperformance vs the S&P 500 (cap-weight), portfolio beta (β), and the latest week's score vs next-week-return signal.
Model overview
Prompt design
Every stock is evaluated using the same structured prompt. Key instructions:
- Score each stock from −5 (very unattractive) to +5 (very attractive) on expected performance over the next ~30 days.
- Use a single live web search per stock to gather the latest 30 days of news, earnings, guidance, analyst revisions, and market reactions.
- Grade on a curve against all other Nasdaq-100 members, not in isolation. A +3 means the stock looks meaningfully better than most of the index right now, regardless of whether the overall market is up or down.
- Assign a continuous latent rank (0 to 1) as a fine-grained ordinal signal. This is what drives how the portfolio is built from ratings, not the integer score directly.
- Map scores to buckets for transparency: buy (≥ +2), hold (−1 to +1), sell (≤ −2). Buckets are a readability layer; the actual sort is by latent rank.
- List 2 to 6 explicit risks per rating. At least one must address information uncertainty, model error, or conflicting signals.
- Track the change from the prior week's rating. If the bucket changes, explain why.
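The score-to-bucket mapping above is simple enough to sketch directly; a minimal illustration (the function name is ours, not part of the pipeline):

```python
def bucket(score: int) -> str:
    """Map an integer score (-5 to +5) to its display bucket.

    Thresholds follow the rules above: buy >= +2, sell <= -2,
    hold otherwise. Buckets are a readability layer only; portfolio
    construction sorts by the continuous latent rank, not by bucket.
    """
    if score >= 2:
        return "buy"
    if score <= -2:
        return "sell"
    return "hold"
```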
How it works
Universe selection
We evaluate all ~100 current members of the Nasdaq-100 every week. The Nasdaq-100 is a curated index of the largest non-financial US companies — high liquidity, broad sector coverage, and globally recognized names. This gives the AI enough diversity to surface real cross-sectional signal.
AI scoring
Each stock receives a live web search for the latest 30 days of news, earnings, guidance, and analyst revisions. The AI scores it from −5 to +5 relative to the other 99 stocks — not in isolation. This cross-sectional comparison is what makes the signal useful: the AI doesn't need to predict the market, just which stocks look stronger than the rest. It also outputs a continuous latent rank (0–1) for fine-grained ordering.
Portfolio selection
Stocks are sorted by latent rank (highest = most attractive). Your portfolio settings determine how many top-ranked stocks to hold (Top 5 through Top 30) and how to weight them (equal or cap weight). No discretionary overrides — same inputs produce the same portfolio every rebalance.
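As a sketch of the selection step, assuming ranks arrive as a ticker-to-latent-rank mapping (function and argument names are illustrative, not the production code):

```python
def build_portfolio(latent_ranks, top_n=20, cap_weights=None):
    """Pick the top-N stocks by latent rank and weight them.

    latent_ranks: dict of ticker -> latent rank in [0, 1] (higher = better).
    cap_weights: optional market-cap weights; None means equal weight.
    Deterministic: the same inputs always yield the same portfolio.
    """
    picks = sorted(latent_ranks, key=latent_ranks.get, reverse=True)[:top_n]
    if cap_weights is None:
        return {t: 1.0 / len(picks) for t in picks}
    # Cap-weighting: renormalize the selected names' cap weights to sum to 1.
    total = sum(cap_weights[t] for t in picks)
    return {t: cap_weights[t] / total for t in picks}
```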
Cost deduction
Every rebalance, we compute portfolio turnover (how much changed). We then deduct 15 basis points per unit of turnover from the gross return. This keeps results grounded in what you would actually earn after trading. Returns shown are pre-tax.
How we rank models
We order strategy models with a composite score so the headline reflects both how broadly the model's portfolio configs are working (not just one lucky configuration) and how strong risk-adjusted results look in the middle and at the top of the config set.
Each ingredient is scaled relative to other strategy models (min–max normalization), then combined with the weights below. Higher is better for all three after normalization.
The score blends three dimensions:
- Breadth (50%) — share of eligible configs with positive total return since inception
- Median Sharpe (30%) — median risk-adjusted weekly return across eligible configs
- Best Sharpe (20%) — highest Sharpe among eligible configs
Why this mix: breadth keeps a model from ranking first on a single outlier portfolio; median Sharpe captures typical risk-adjusted quality; best Sharpe still rewards a strong top end without letting it dominate the headline.
Only portfolio configs with a ready composite rank feed these inputs (same eligibility as the per-model portfolio list). Models with no eligible configs still appear in the list using fallback metrics so the page does not break.
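The three ingredients and the 50/30/20 blend can be sketched as follows; the data shapes and function names are assumptions for illustration, not the production code:

```python
from statistics import median

# 50/30/20 blend from the table above.
MODEL_WEIGHTS = {"breadth": 0.50, "median_sharpe": 0.30, "best_sharpe": 0.20}

def model_inputs(configs):
    """configs: list of dicts, one per eligible portfolio config,
    with 'total_return' (since inception) and 'sharpe' keys."""
    return {
        "breadth": sum(c["total_return"] > 0 for c in configs) / len(configs),
        "median_sharpe": median(c["sharpe"] for c in configs),
        "best_sharpe": max(c["sharpe"] for c in configs),
    }

def composite_scores(models):
    """models: dict of model name -> list of eligible configs.

    Min-max normalize each ingredient across models, then blend
    with the weights above. Higher composite score = better rank.
    """
    raw = {name: model_inputs(cfgs) for name, cfgs in models.items()}
    scores = {name: 0.0 for name in raw}
    for key, weight in MODEL_WEIGHTS.items():
        vals = [r[key] for r in raw.values()]
        lo, hi = min(vals), max(vals)
        span = hi - lo
        for name, r in raw.items():
            # If every model ties on an ingredient, it contributes 0.
            scaled = (r[key] - lo) / span if span else 0.0
            scores[name] += weight * scaled
    return scores
```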
Methodology
Detailed technical notes on how each component is designed and measured.
Portfolios
The AI model produces only scores and ranks for every Nasdaq-100 stock. How you turn that into a portfolio is configurable: six risk levels (different top-N cuts), four rebalance cadences (weekly, monthly, quarterly, yearly), and equal vs. cap weighting.
How we rank portfolios
We rank portfolios with a composite score so order reflects both how money grew (total return and vs the Nasdaq-100 cap-weight benchmark) and how you got there (risk-adjusted return, week-to-week steadiness vs that benchmark, and drawdown depth).
Each metric is scaled relative to other portfolios for this model (min–max normalization), then combined with the weights below. That means rank is not “highest ending dollar wins,” but it does reward strong outcomes alongside discipline.
The score blends six dimensions:
- Sharpe ratio (30%) — risk-adjusted weekly return
- CAGR (25%) — annualized return from inception
- Consistency (15%) — % of weeks beating the Nasdaq-100 (cap) that week
- Max drawdown (10%) — shallower losses score higher
- Total return (10%) — cumulative return vs a $10k start
- vs Nasdaq-100 (cap) (10%) — portfolio total return minus benchmark over the same dates
Why both growth and risk: total return and benchmark-relative return keep the list aligned with what you see on portfolio cards, while Sharpe, consistency, and drawdown still down-rank configs that only looked good from one lucky stretch or extreme risk-taking.
Portfolios require at least 2 weeks of data to be ranked. Those with fewer observations are shown with a "building track record" status.
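A hedged sketch of the six-metric blend, showing how a lower-is-better metric like max drawdown is inverted during min-max scaling (names and data shapes are ours, not the production code):

```python
# Weights from the table above; True marks lower-is-better metrics.
PORTFOLIO_WEIGHTS = [
    ("sharpe", 0.30, False),
    ("cagr", 0.25, False),
    ("consistency", 0.15, False),
    ("max_drawdown", 0.10, True),   # shallower drawdown scores higher
    ("total_return", 0.10, False),
    ("excess_vs_benchmark", 0.10, False),
]

def minmax(vals, invert=False):
    """Scale raw metric values into [0, 1]; flip when lower is better."""
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return [0.0] * len(vals)    # a tie contributes nothing
    scaled = [(v - lo) / (hi - lo) for v in vals]
    return [1.0 - s for s in scaled] if invert else scaled

def portfolio_scores(portfolios):
    """portfolios: list of dicts keyed by the metric names above.
    Returns one composite score per portfolio, in the same order."""
    scores = [0.0] * len(portfolios)
    for key, weight, invert in PORTFOLIO_WEIGHTS:
        for i, s in enumerate(minmax([p[key] for p in portfolios], invert)):
            scores[i] += weight * s
    return scores
```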
Scoring
Each stock is scored on a discrete integer scale from −5 to +5. The score reflects relative attractiveness over the next ~30 days, calibrated across the full Nasdaq-100. The AI is explicitly instructed to avoid defaulting to 0 unless information is genuinely mixed.
In addition to the integer score, the AI produces a latent rank — a continuous value between 0 and 1. The portfolio layer sorts by latent rank (highest first). This separation allows the portfolio to capture ordering signal even when two stocks share the same integer score.
Scores are calibrated relative to other Nasdaq-100 members, not in absolute isolation. A +3 means the stock looks meaningfully more attractive than most of the other 99 stocks in the index right now.
Why relative, not absolute? Think of it like grading on a curve. Predicting whether any single stock will go up or down requires guessing the overall market direction (something nobody can do reliably). But picking out which stocks look stronger compared to their peers is a more tractable problem. In a falling market, every stock might drop, but the highest-ranked ones tend to drop less. In a rising market, they tend to rise more. Pelster & Val (2024) confirmed this in a live experiment: even during a stretch when every portfolio lost money in absolute terms, the top-rated stocks still outperformed the bottom-rated ones by a statistically significant margin. The relative signal held when absolute scores would have been meaningless.
Performance metrics
Total return is calculated from inception capital:
total_return = (ending_equity / starting_capital) − 1
CAGR annualizes growth over elapsed calendar time:
CAGR = (ending_equity / starting_capital)^(1 / years_elapsed) − 1
We use a fixed $10,000 starting capital for strategy and benchmark series. This keeps the model page and performance page consistent.
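Both formulas in code form, a direct transcription with the $10,000 default made explicit:

```python
STARTING_CAPITAL = 10_000  # fixed for strategy and benchmark series

def total_return(ending_equity, starting_capital=STARTING_CAPITAL):
    """Cumulative growth since inception: (end / start) - 1."""
    return ending_equity / starting_capital - 1

def cagr(ending_equity, years_elapsed, starting_capital=STARTING_CAPITAL):
    """Annualized growth over elapsed calendar time."""
    return (ending_equity / starting_capital) ** (1 / years_elapsed) - 1
```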
Turnover & costs
Turnover measures how much the portfolio changes each week. Formally, turnover = ½ × Σᵢ |wᵢ,new − wᵢ,old|, summed over all stocks i, where the w's are portfolio weights before and after the rebalance.
A full replacement of all stocks gives turnover = 1.0. Typical weekly turnover for a Top-20 equal-weight portfolio is 0.15 to 0.35 depending on how much the ranking changes week to week.
Net return = gross return − (turnover × 15 bps). On the first run (no prior portfolio), turnover defaults to 1 (full buy-in).
15 bps per traded dollar is a conservative assumption covering both bid-ask spread and market impact for liquid large-cap stocks.
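A minimal sketch of the cost deduction, assuming the ½ Σ |Δw| turnover convention under which full replacement equals 1.0 (function names are illustrative):

```python
def turnover(prev_weights, new_weights):
    """Half the sum of absolute weight changes across all tickers.

    Replacing every holding gives 1.0; an unchanged book gives 0.0.
    An empty prior portfolio (first run) defaults to full buy-in.
    """
    if not prev_weights:
        return 1.0
    tickers = set(prev_weights) | set(new_weights)
    return 0.5 * sum(
        abs(new_weights.get(t, 0.0) - prev_weights.get(t, 0.0)) for t in tickers
    )

def net_return(gross_return, turn, cost_bps=15):
    """Deduct the trading-cost drag for one rebalance."""
    return gross_return - turn * cost_bps / 10_000
```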
Quintile analysis
Every week, all ~100 Nasdaq-100 stocks are sorted by latent rank and split into 5 equal quintile groups (Q1 = lowest rated, Q5 = highest rated). We then compute the average 1-week forward return for each quintile.
A monotonically increasing pattern (Q1 < Q2 < Q3 < Q4 < Q5) indicates the model has genuine cross-sectional predictive signal — not just luck in the top 20 picks. This is the same methodology used in Pelster & Val (2024).
We also track 4-week non-overlapping quintile returns, computed on a formation-to-realization basis every 4 weeks.
The Q5 win rate is the fraction of weeks where Q5 outperformed Q1. Above 50% means the AI's top picks outperformed its bottom picks more often than not.
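The quintile computation can be sketched as follows, assuming parallel dicts of latent ranks and forward returns (leftover stocks after an even 5-way split are folded into Q5 here; the production handling may differ):

```python
from statistics import mean

def quintile_returns(latent_ranks, fwd_returns):
    """Average 1-week forward return per quintile, Q1 (lowest rated)
    through Q5 (highest rated).

    latent_ranks, fwd_returns: parallel dicts keyed by ticker.
    """
    ordered = sorted(latent_ranks, key=latent_ranks.get)  # ascending rank
    size = len(ordered) // 5
    groups = [ordered[i * size:(i + 1) * size] for i in range(5)]
    groups[4].extend(ordered[5 * size:])  # fold leftovers into Q5
    return [mean(fwd_returns[t] for t in g) for g in groups]
```

A monotone Q1 < … < Q5 pattern in the output is the signal check described above.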
Regression
Each week, we test a single question: do higher AI scores lead to higher next-week returns? We take ~100 stocks, pair each stock's score with its next-week return, and fit a straight line: returnᵢ = α + β × scoreᵢ + εᵢ.
This is a cross-sectional regression — not tracking one stock over time, but comparing many stocks against each other at the same point in time. AI score on the x-axis, next-week return on the y-axis, best-fit line through ~100 points. If the line slopes up (β > 0), higher-rated stocks tend to outperform.
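The weekly fit can be reproduced with plain least squares; a self-contained sketch returning the slope (β) and R² (implementation details are ours):

```python
from statistics import mean

def weekly_signal(scores, returns):
    """Cross-sectional OLS of next-week return on AI score.

    scores, returns: parallel lists, one pair per stock for one week.
    Returns (beta, r_squared) for that week's fitted line.
    """
    mx, my = mean(scores), mean(returns)
    sxx = sum((x - mx) ** 2 for x in scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, returns))
    beta = sxy / sxx
    alpha = my - beta * mx  # intercept of the fitted line
    ss_res = sum((y - (alpha + beta * x)) ** 2
                 for x, y in zip(scores, returns))
    ss_tot = sum((y - my) ** 2 for y in returns)
    return beta, 1.0 - ss_res / ss_tot
```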
β (Beta) — does the signal work?
How much return increases per 1-point increase in score. This is the core signal metric — if beta isn't positive, nothing else matters.
- β > 0 → higher scores → higher returns (working)
- β ≈ 0 → no relationship
- β < 0 → signal is inverted
Example: β = 0.002 → a score of +5 vs 0 implies ~+1% return spread.
Good: any positive value
Strong: > 0.002
Illustrative examples — synthetic data, not live results
Positive β
Higher AI scores tend to go with higher next-week returns — the relationship you want.
Negative β
Higher scores pair with lower returns — the signal is inverted or noise-dominated that week.
Same axes in each panel: score (−5 to +5) vs next-week return. Slopes are exaggerated for clarity.
R² — how much does it explain?
The share of cross-sectional variation in stock returns explained by the AI score alone. Even small values matter — stock returns are dominated by noise (company-specific events, random fluctuations), and no single signal explains most of the variation.
Baseline: 0.00 (no signal)
Meaningful: 0.01 – 0.05
Exceptional: > 0.05
Literature-derived benchmarks (not custom-tuned)
The β bands above come from cross-sectional equity research: any positive slope is the minimum bar; a weekly slope around 0.002 per score point is often treated as economically meaningful in academic settings (e.g. Fama–MacBeth–style regressions) — a rough guide, not a universal cutoff.
The R² bands reflect how noisy individual stock returns are: a single predictor rarely explains much of the cross-section. Values in the 1–5% range are commonly cited as meaningful for one factor; above 5% is unusually strong.
α (Alpha) — market context
The average return across all stocks that week. Positive means the market was broadly up; negative means down. This is background context, not a measure of model quality.
How to read results together
β positive + some R² → signal is working
β ≈ 0 → no edge
β negative → inverted signal
This test isolates the pure ranking ability of the model — it ignores portfolio construction, position sizing, and trading strategy. It answers only: “if I rank stocks by score, do the higher-ranked ones actually outperform?”
Quintile vs. regression
Both tests ask the same underlying question — does score predict return? — but in fundamentally different ways.
Regression (β, R²)
Uses every data point exactly as-is. A score of +5 is treated as stronger than +3; a score of −4 is treated as worse than −1. Fits one line across all stocks.
- Measures true signal strength
- Detects subtle, continuous relationships
- More statistically efficient
- Can be skewed by outliers
Think of it as: “Is there a real relationship?”
Quintiles (Q1–Q5)
Throws away precision and groups stocks into 5 buckets. Both +5 and +3 land in “top bucket”; both −4 and −1 land in “bottom bucket.” Then compares: did the top outperform the bottom?
- Measures practical portfolio outcome
- Very intuitive — “did the best outperform the worst?”
- Robust to noise and outliers
- Ignores granularity within buckets
Think of it as: “Can I make money from ranking?”
When they disagree
β positive, quintiles weak: Signal exists but is too noisy to cleanly separate buckets.
Quintiles strong, β weak: Signal may be nonlinear — only the extremes matter. Regression underestimates it.
Bottom line
Regression = signal detection (continuous)
Quintiles = strategy outcome (discrete)
You want β > 0 consistently and Q5 > Q1 consistently. If both align, the signal is strong and reliable. If only one works, investigate further.
Scientific grounding
This strategy is inspired by two peer-reviewed papers published in Finance Research Letters. We treat their findings as a testable hypothesis and verify them live, on real market data, with no lookahead bias.
Pelster & Val (2024) — “Can ChatGPT assist in picking stocks?”
Finance Research Letters · Primary reference
Core idea: Live experiment testing whether ChatGPT-4 with web access can rate S&P 500 stocks on a −5 to +5 relative attractiveness scale and produce ratings that predict future returns.
Why no backtest: Historical testing is invalid because ChatGPT may have been trained on future data. They run a live forward-only experiment — the same approach we use.
Setup: S&P 500 universe, ~2 months during the Q2 2023 earnings season. Each stock rated from −5 to +5 on both earnings surprise and relative attractiveness. Web search results (last ~30 days) summarized and fed into the prompt — very similar to our pipeline.
Why relative scoring matters: Ratings were explicitly framed as cross-sectional — “how attractive is this stock compared to all other S&P 500 stocks?” This is what makes the signal robust. Even during a period when every quintile portfolio had negative absolute returns, the highest-rated stocks still lost less than the lowest-rated ones (spread of +0.07%/day, t‑stat 4.35). The AI couldn't predict market direction, but it could reliably rank which stocks were relatively stronger.
Key findings:
- AI attractiveness ratings positively correlate with future stock returns
- Relative ranking holds even in negative-return markets
- AI adjusts ratings in response to earnings and news in near real-time
- Earnings forecasts add signal beyond analyst consensus
Limitations:
- Short time period (~2 months)
- Not a production portfolio — quintile analysis only
- Not tested over long horizons or different market regimes
Our alignment:
- Same live experiment approach, no backtesting
- Same relative −5 to +5 attractiveness rating scale
- Same live web search for recent news, earnings, and analyst data
- Same cross-sectional quintile and OLS regression framework
- Extended to Nasdaq-100 and automated for continuous weekly execution
Ko & Lee (2024) — “Can ChatGPT improve investment decisions?”
Finance Research Letters · Portfolio extension
Core idea: Extended the research from individual stock ratings to building full portfolios. Asked whether ChatGPT can select assets and build diversified portfolios that outperform random selection — across stocks, bonds, commodities, and more.
Key findings:
- AI-selected portfolios show statistically better diversification than random selection
- Portfolios built from AI picks outperform random portfolios
- AI identifies abstract relationships between assets across different classes
- Demonstrates AI potential as a co-pilot for portfolio management decisions
Our alignment:
- Portfolio from AI-ranked picks (Top 5 to Top 30, configurable)
- Benchmarked against both cap-weight and equal-weight Nasdaq-100
- Tracked live and unedited over multiple market conditions