Historical Judgment Context (4 Weeks)
Injects the analyst's own prior signals, EPS/PE deviations, actual outcomes, and reflection notes for the past 4 weeks. Allows the AI to learn from recent history within a single prompt.
{judgment_history_4w}

The system runs automatically every Monday 05:00 UTC via cron. It covers 15 semiconductor tickers across 6 steps. Each ticker's analysis is self-contained and uses no future data.
Coverage universe (15 tickers): NVDA, AMD, TSM, QCOM, AVGO, TXN, INTC, MU, AMAT, LRCX, KLAC, ADI, MCHP, ON, WOLF
Coverage spans AI/data center accelerators, memory, EDA, analog, power, and foundry subsectors.
Signal from Week N → Trade in Week N+1. The pipeline runs after market close on Friday. Positions are entered at Monday open of the following week and held until Friday close.
This eliminates any use of future data in signal generation.
Cron runs every Monday 05:00 UTC, executing data collection → pipeline → backfill → snapshot refresh automatically. No manual intervention required for normal weeks.
| Event | Timing | Detail |
|---|---|---|
| Signal Generation | Monday 05:00 UTC (Week N+1) | Pipeline runs using Week N's price close. Output = buy/hold/sell signal for each ticker. |
| Trade Entry | Monday Open (Week N+1) | entry_price = Monday open of trade week. This is the actual simulated buy price. |
| Trade Exit | Friday Close (Week N+1) | week_close_price = Friday close. weekly_return = (Friday close / prev Friday close − 1) × 100. |
| Actual Return (UI) | Week N+1's weekly_return | The "Actual Return" shown in UI for Week N signal = the return of the following week's trade. |
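The timing rules in the table above can be sketched numerically. The prices below are illustrative placeholders, not real market data:

```python
prev_friday_close = 100.0  # Week N Friday close (the basis of the UI return)
monday_open = 101.0        # Week N+1 Monday open  -> entry_price
friday_close = 104.0       # Week N+1 Friday close -> week_close_price

# "Actual Return" shown in the UI: close-to-close
weekly_return = (friday_close / prev_friday_close - 1) * 100

# Simulated trade return: open-to-close within the trade week
trade_return = (friday_close / monday_open - 1) * 100

print(round(weekly_return, 2), round(trade_return, 2))  # 4.0 2.97
```

Note the two returns differ: the UI benchmark is close-to-close, while the simulated trade enters at Monday open.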
Every week, the prompt is rebuilt from scratch by injecting fresh data into each section in order. The diagram below shows every section in injection sequence, with its template variable and data source. The LLM only sees the final assembled text — it never accesses the database directly.
Analyst identity: Buy-side fundamental equity analyst, semiconductor sector. Strong prior that consensus is usually right.
Valuation formula: AI_EPS = Consensus_EPS × (1 + eps_deviation_pct/100) | AI_PE = Forward_PE × (1 + pe_adjustment_pct/100) | PT = AI_EPS × AI_PE
Hard caps: EPS ±15%, PE ±20%, ER ≤ 30%. Signal/conviction computed by code — NOT output by LLM.
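A minimal sketch of the code-side valuation arithmetic, assuming a simple clamp implements the hard caps. Function and parameter names here are illustrative, not the production API:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def target_price(consensus_eps, forward_pe, eps_dev_pct, pe_adj_pct):
    eps_dev_pct = clamp(eps_dev_pct, -15.0, 15.0)  # hard cap: EPS ±15%
    pe_adj_pct = clamp(pe_adj_pct, -20.0, 20.0)    # hard cap: PE ±20%
    ai_eps = consensus_eps * (1 + eps_dev_pct / 100)
    ai_pe = forward_pe * (1 + pe_adj_pct / 100)
    return ai_eps * ai_pe

# AI_EPS = 4.0 * 1.05 = 4.2, AI_PE = 30 * 1.10 = 33 -> PT = 138.6
print(round(target_price(4.0, 30.0, 5, 10), 2))  # 138.6
```

A deviation beyond the cap (e.g. +50% EPS) is clamped to the cap rather than rejected, so the output stays bounded.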
11 decision rules (default=no change, sanity-check data, price-run = already priced in, >40% PT divergence = reassess) + Sector knowledge (cycle dynamics, cross-company signals, sub-sector PE ranges, EPS principles)
Per-week row: EPS forecast, PE, PT, signal, conviction, actual close, actual weekly_return, key_insight. Includes direction-correct count and bullish-bias pattern summary. Lets the AI audit its own recent track record before making a new call.
Prompt instruction: "Review your pattern: Are you consistently biased in one direction?"
Up to 3 event-matched correction examples. Format: "W{n} ({error_type}): Your prediction={signal}, ER={pct}% → Actual={pct}% → Root cause → Lesson." Filtered by event type (earnings lessons only on earnings weeks, non-earnings lessons otherwise).
Consensus (Street View):
PE Context:
Usage rules injected: ≥80th pct → street already optimistic, require NEW evidence for bullish; ≤20th pct → street conservative, modest bullish more defensible; 20–80% → normal. Primarily affects MAGNITUDE not DIRECTION. Used for pe_adjustment_pct, NOT eps_deviation_pct.
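One plausible implementation of the percentile-to-posture rule; the function name and return labels are illustrative:

```python
def consensus_posture(er_percentile_pct):
    if er_percentile_pct >= 80:
        return "street_optimistic"    # require NEW evidence for bullish calls
    if er_percentile_pct <= 20:
        return "street_conservative"  # modest bullish more defensible
    return "normal"

print(consensus_posture(85), consensus_posture(50))  # street_optimistic normal
```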
Persistent company-specific analyst memory. Each card: narrative_id, stability (STRUCTURAL/CYCLICAL/TACTICAL), claim, confirm_signal, falsifier, state (ACTIVE/STABLE/WEAKENED). These are the narratives the AI evaluates in Q1–Q4. Updated via memory_update after each run.
Historical outcomes grouped by narrative_id. Format: "[W{n} {net_direction} | reinforced/challenged] reason text." Shows the AI how each specific narrative has played out historically — if "AI demand narrative" was reinforced but the stock fell 3 times, the AI must factor that in.
"This Week" block: New reports published this week — parsed by GPT-4.1 into structured JSON (bank, EPS estimates, target price, rating, key thesis, catalysts, risks). This is the ONLY source for Step 0 analyst_claims extraction.
"Recent Context" block: Prior 3 weeks of reports — background only. NOT used for analyst_claims. Used in Q2/Q3 for trend assessment.
Running audit trail: past AI predictions vs settled actual outcomes. Format: prediction made → result observed → error type → root cause summary. Creates accountability for past calls and forces re-examination of persistent thesis errors.
Raw news filtered by GPT-4.1 for relevance to this ticker's investment thesis. Shows {n_news} filtered items from {total_news} total collected. Only material news passes. Lower weight than research reports — can flip a marginal call but cannot override clear report signals.
Earnings week triggers discipline rules: higher uncertainty → default toward 0/0 deviations. Price trend used by Rules 8–10 (rally = already priced in, decline = market sees risk).
Extract every EPS/PT/rating claim from Section 4 "This Week" reports. One entry per bank × narrative pair — if one bank addresses two narratives, output two entries. Map each claim to the closest narrative_id from Section 3c.
Self-check: Count banks in "This Week". If count > 0, analyst_claims must be non-empty. analyst_claims = [] only if zero new reports this week.
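The self-check can be sketched as a simple validation step; the names below are hypothetical stand-ins for the real pipeline structures:

```python
def check_claims(this_week_banks, analyst_claims):
    """Raise if new reports exist but no claims were extracted."""
    if len(this_week_banks) > 0 and len(analyst_claims) == 0:
        raise ValueError("analyst_claims empty despite new reports this week")
    return True

print(check_claims([], []))                    # True: no reports, [] is valid
print(check_claims(["MS"], [{"bank": "MS"}]))  # True: report matched by a claim
```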
For each narrative in Section 3c: does this week's data match its confirm_signal? Cite specific report/news. No vague claims.
Does data trigger any narrative's falsifier? CYCLICAL/TACTICAL narrative un-reinforced for multiple weeks = weakening. Export controls present for weeks = priced in, only flag NEW escalation.
STRUCTURAL narratives rarely change. Require multi-source evidence of genuine shift — never from a single data point. Single-week news alone is insufficient.
Aggregate: STRUCTURAL reinforced = +1.5 weight; TACTICAL challenged = +0.5 weight toward bearish. Rules: reinforced > 2× challenged → bullish; challenged > 2× reinforced → bearish; within 1 → neutral. Result: net_direction field.
CYCLICAL narrative reinforced → justify above-consensus EPS? lean_bullish → modest +1% to +3% bias, not forced to zero. STRUCTURAL intact + below historical PE → PE expansion defensible? Section 3b percentile modulates SIZE.
The system prompt defines the analyst's identity, valuation methodology, and behavioral rules. It does NOT change week to week.
Buy-side fundamental equity analyst covering the semiconductor sector. The AI must behave like an institutional analyst: disciplined, skeptical of hype, anchored to consensus data, with a strong prior that consensus is usually right.
The AI does NOT produce a target price directly. It predicts deviations from market anchors:
- eps_deviation_pct: how much the AI thinks EPS will differ from Street consensus (±15% cap)
- pe_adjustment_pct: how much the AI thinks the market will re-rate PE vs forward PE (±20% cap)

Code then computes:

AI_EPS = Consensus_EPS × (1 + eps_deviation_pct/100)
AI_PE = Forward_PE × (1 + pe_adjustment_pct/100)
Target Price = AI_EPS × AI_PE
Expected Return = (Target Price / Current Price − 1) × 100%
Identity Property: When both deviations = 0%, Target Price = Current Price → Expected Return = 0% → Signal = hold. This is the correct default.
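The identity property follows directly from the definition of forward PE (current price = consensus EPS × forward PE). This sketch, with illustrative values, demonstrates it:

```python
consensus_eps, forward_pe = 4.0, 30.0
current_price = consensus_eps * forward_pe  # 120.0 by construction of forward PE

ai_eps = consensus_eps * (1 + 0 / 100)  # eps_deviation_pct = 0
ai_pe = forward_pe * (1 + 0 / 100)      # pe_adjustment_pct = 0
target = ai_eps * ai_pe
expected_return = (target / current_price - 1) * 100

print(expected_return)  # 0.0 -> signal defaults to "hold"
```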
The system prompt also provides structural industry knowledge that the AI uses to contextualise weekly data.
The weekly delta prompt is re-assembled each week with fresh data injected into each section. Template variables are shown in {curly_braces}.
Injects the analyst's own prior signals, EPS/PE deviations, actual outcomes, and reflection notes for the past 4 weeks. Allows the AI to learn from recent history within a single prompt.
{judgment_history_4w}

Curated examples of past analytical mistakes and corrections, formatted as "Situation → Error → Correction" rules. These are company-specific rule-based nudges derived from prediction tracking.
{few_shot_lessons}

Provides the market anchors that the AI will deviate from. These are live-computed values:
PE Context (critical distinction): the forward PE is the anchor that pe_adjustment_pct adjusts from.
{trailing_pe}

Shows where the current implied upside (consensus PT vs price) sits within this stock's own historical distribution — using p25, p50, p75 percentiles. Injected variables:
{consensus_anchor_pt} {consensus_anchor_dynamic_er_pct} {consensus_anchor_dynamic_er_percentile_pct} {consensus_anchor_posture} {consensus_anchor_takeaway}

Usage Rules (injected into prompt):
The percentile affects pe_adjustment_pct, NOT eps_deviation_pct — do NOT move EPS solely because the percentile is high or low.

Persistent, company-specific analyst memory. Contains crystallised knowledge about this company's business model, typical patterns, and recurring themes. Updated by the AI after each prediction cycle.
{narrative_kb_section}

Past prediction mistakes grouped by narrative type (e.g. "AI demand narrative", "memory cycle recovery narrative"). Each entry shows: what narrative was used, what the prediction was, what actually happened, and what to adjust next time.
{narrative_experiences_section}

Structured extraction of analyst research reports, split into two blocks:
Running log of past AI predictions vs actual outcomes, at the point when results were known. Includes: what was predicted, what happened, how large the error was, and whether the AI's reasoning held up. This creates a self-audit trail.
{experience_summary}

Filtered company news for the current week. Raw news is first filtered by GPT-4.1 for relevance to this ticker's investment thesis. Only material news is passed to the main prompt. The total count and filtered count are both shown to the AI.
{news_summary}

Current week's price data: open, close, weekly change, 4-week trend, 52-week high/low position. Also includes the upcoming earnings date if within 2 weeks, which triggers earnings-week discipline rules.
After the data sections, the prompt instructs the AI to follow a structured reasoning sequence before outputting any numbers.
Before any reasoning, the AI must extract raw bank-by-bank evidence from Section 4's "This Week" reports into analyst_claims. This is separate from the narrative synthesis in Q1–Q5.
- Map each claim to the closest narrative_id from Section 3c's active narratives.
- analyst_claims may be [] ONLY if "This Week" shows zero new reports.
- Claims must not live only in narrative_reasoning — the full claim must appear in analyst_claims.
- If "This Week" contains any new report, analyst_claims must be non-empty.

Review each active narrative from Section 3c. For each STABLE or WEAKENED narrative: does this week's data (reports, news) provide confirming evidence matching that narrative's confirm_signal? If yes, mark it as reinforced. Must cite which report/news said what — no vague claims.
For each narrative: does this week's data trigger the falsifier? Is a CYCLICAL or TACTICAL narrative fading due to lack of confirming evidence? A narrative that goes un-reinforced for multiple weeks should be flagged as weakening. Specific evidence required.
STRUCTURAL narratives (competitive moat, technology position) rarely change. Only flag if there is multi-source evidence of a genuine structural shift — not a single week's data point.
Aggregate reinforced vs challenged narratives using weighted scoring:
| Condition | net_direction |
|---|---|
| STRUCTURAL reinforced AND ≥50% of remaining reinforced | bullish |
| Reinforced count > 2× challenged | bullish |
| Reinforced count > challenged | lean_bullish |
| Reinforced ≈ challenged (within 1) | neutral |
| Challenged > reinforced | lean_bearish |
| Challenged > 2× reinforced | bearish |
Weighting rules: STRUCTURAL reinforced = +1.5 weight. TACTICAL challenged = only +0.5 weight toward bearish. TACTICAL challenges should NOT drag direction bearish if STRUCTURAL + CYCLICAL are intact. Export controls/regulatory risks present for multiple weeks are considered "priced in" — only flag if NEW escalation.
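One way the weighted scoring and thresholds above could be implemented. The exact production rule ordering may differ, and the STRUCTURAL-majority special case from the table is omitted for brevity:

```python
def net_direction(narratives):
    """narratives: list of (stability, status), e.g. ("STRUCTURAL", "reinforced")."""
    reinforced = challenged = 0.0
    for stability, status in narratives:
        if status == "reinforced":
            reinforced += 1.5 if stability == "STRUCTURAL" else 1.0
        elif status == "challenged":
            # TACTICAL challenges carry only half weight toward bearish
            challenged += 0.5 if stability == "TACTICAL" else 1.0
    if abs(reinforced - challenged) <= 1:
        return "neutral"
    if reinforced > 2 * challenged:
        return "bullish"
    if challenged > 2 * reinforced:
        return "bearish"
    return "lean_bullish" if reinforced > challenged else "lean_bearish"

print(net_direction([("STRUCTURAL", "reinforced"),
                     ("CYCLICAL", "reinforced"),
                     ("TACTICAL", "challenged")]))  # bullish
```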
- eps_deviation_pct: NET adjustment vs consensus (±15% max)
- pe_adjustment_pct: adjustment FROM the forward PE anchor (±20% max)

The AI outputs a structured JSON object. Signal and conviction are NOT in this output — they are computed by code from the numeric deviations.
Signal (strong_buy / buy / hold / sell / strong_sell) is computed by code based on eps_deviation_pct and pe_adjustment_pct thresholds. Conviction (high/medium/low) is derived from the magnitude of the combined deviation. This ensures consistent, rule-based signal generation that cannot be "talked into" by narrative.
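An illustrative sketch of the rule-based mapping. The actual thresholds are not documented in this section, so the cutoffs below are hypothetical placeholders:

```python
def to_signal(expected_return_pct):
    # Hypothetical cutoffs; production thresholds may differ
    if expected_return_pct >= 15:
        return "strong_buy"
    if expected_return_pct >= 5:
        return "buy"
    if expected_return_pct <= -15:
        return "strong_sell"
    if expected_return_pct <= -5:
        return "sell"
    return "hold"

def to_conviction(eps_dev_pct, pe_adj_pct):
    magnitude = abs(eps_dev_pct) + abs(pe_adj_pct)  # combined deviation size
    if magnitude >= 20:
        return "high"
    if magnitude >= 8:
        return "medium"
    return "low"

print(to_signal(0.0), to_conviction(0, 0))  # hold low
```

Because the mapping is pure arithmetic on the numeric deviations, narrative text in the LLM output cannot move the signal.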
The system maintains three distinct memory layers that persist across weeks, allowing the AI to improve its analytical accuracy over time without retraining the model.
Company-specific, persistent knowledge that accumulates over time. Contains crystallised insights about a company's business model, typical PE ranges by cycle phase, management communication patterns, and recurring analytical pitfalls.
Updated by: LLM's memory_update output field after each prediction
Injected via: Section 3c of weekly delta prompt
Stored in: analyst_narrative_kb

When a prediction outcome is known (week N+1 actual return available), the system runs a settlement step: compare prediction vs outcome, classify error type, and write a corrective rule. These rules are formatted as few-shot examples for future prompts.
Updated by: Settlement job (runs after results available)
Injected via: Section 2 (few-shot lessons) + Section 5 (experience summary)
Stored in: analyst_experience_rules

Same as Layer 2 but grouped by narrative type. If the AI used "AI demand narrative" to justify a bullish EPS call 5 times and was wrong 4 times, Section 3d will explicitly show this pattern, forcing the AI to discount that narrative class.
Updated by: Same settlement job, with narrative_type tagging
Injected via: Section 3d of weekly delta prompt
Stored in: analyst_experience_rules (narrative_type)

When Week N+1's week_close_price and weekly_return are filled (by the backfill job), the system knows the trade week result for Week N's signals.
Compare prediction direction (bullish/bearish/neutral) vs actual return direction (up/down). Classify: correct, directionally wrong, or magnitude error. Record error_magnitude.
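A hedged sketch of this settlement-time classification; the magnitude-error threshold is illustrative:

```python
def classify(pred_direction, actual_return_pct, predicted_er_pct):
    actual_dir = ("up" if actual_return_pct > 0
                  else "down" if actual_return_pct < 0 else "flat")
    direction_ok = (
        (pred_direction == "bullish" and actual_dir == "up")
        or (pred_direction == "bearish" and actual_dir == "down")
        or (pred_direction == "neutral" and actual_dir == "flat")
    )
    error_magnitude = abs(predicted_er_pct - actual_return_pct)
    if not direction_ok:
        return "directionally_wrong", error_magnitude
    if error_magnitude > 5:  # illustrative threshold for "magnitude error"
        return "magnitude_error", error_magnitude
    return "correct", error_magnitude

print(classify("bullish", 3.0, 4.0))  # ('correct', 1.0)
```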
For significant errors, generate a corrective rule in natural language. Example: "When NVDA quarterly guidance is in-line but forward PE is above 45x, avoid PE expansion assumptions — the market typically doesn't re-rate further."
The LLM's own memory_update field from Week N's output is applied to the narrative KB. This allows the AI to proactively update its knowledge from new information, not just from errors.
Over 57+ weeks, the system accumulates patterns like: "Bullish AI demand narratives for NVDA tend to over-predict EPS by +8% on average. Discount AI-demand-driven EPS calls by at least half."
The KB learns that NVDA's PE compresses aggressively when guidance disappoints, while TXN's PE is historically stable ±5% even in weak quarters. Each company gets custom calibration.
The system tracks whether cross-company signals (TSMC beat → NVDA bullish) actually predicted well historically. Unreliable cross-company signals are downweighted in Section 5 experience rules.
The system learns to reduce deviation confidence in the week before earnings (high uncertainty → default to 0/0). This prevents over-confident pre-earnings calls that have historically been wrong.
All strategies are evaluated simultaneously on the same 15-ticker universe, same weekly signals, SHIFT=1 methodology. No strategy has access to future data. Performance is based on 55+ weeks of live signals starting March 2025.
The primary strategy the investment process is built around. adjusted_er = expected_return × conviction_multiplier × risk_multiplier. This ensures higher-confidence calls get marginally more exposure but avoids extreme concentration.
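The adjusted_er formula can be sketched as follows; the multiplier values are illustrative, not the production mapping:

```python
CONVICTION_MULT = {"high": 1.2, "medium": 1.0, "low": 0.8}  # illustrative values

def adjusted_er(expected_return, conviction, risk_multiplier=1.0):
    # adjusted_er = expected_return * conviction_multiplier * risk_multiplier
    return expected_return * CONVICTION_MULT[conviction] * risk_multiplier

print(adjusted_er(10.0, "high"))  # higher conviction -> marginally more exposure
```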
Proportional to signal strength. Avoids over-weighting marginal buy signals. Goes to cash if no buy signals exist.
Broader diversification than Top 5 — all positive-ER tickers equally weighted. Lower concentration risk, smoother returns.
The consistently best-performing strategy. Higher-upside signals get proportionally larger allocations, creating natural concentration toward the strongest calls without arbitrary cutoffs.
Adds a concentration guard. Prevents a single dominant call (e.g. NVDA at 80%) from making the portfolio too dependent on one name.
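A concentration guard of this kind is commonly implemented as iterative cap-and-redistribute. This sketch is one standard approach, not necessarily the production algorithm, and it assumes the cap is feasible (cap × number of positive-ER names ≥ 1):

```python
def capped_weights(ers, cap=0.20):
    """ER-proportional weights with a per-name cap, iteratively redistributed."""
    pos = {t: er for t, er in ers.items() if er > 0}
    total = sum(pos.values())
    weights = {t: er / total for t, er in pos.items()}
    for _ in range(len(weights)):  # converges in at most n passes
        excess = sum(max(0.0, w - cap) for w in weights.values())
        if excess < 1e-12:
            break
        share = sum(w for w in weights.values() if w < cap)
        for t, w in weights.items():
            # Capped names are pinned; excess flows to uncapped names pro-rata
            weights[t] = cap if w >= cap else w + excess * w / share
    return weights

w = capped_weights({"NVDA": 50.0, "AMD": 10.0, "TSM": 10.0,
                    "QCOM": 10.0, "TXN": 10.0, "MU": 10.0})
print(round(w["NVDA"], 2), round(w["AMD"], 2))  # 0.2 0.16
```

Without the cap, NVDA would take 50% of the book; with it, the excess 30% is spread across the remaining names.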
Quality filter on top of ER Prop. Removes low-conviction marginal buys. Typical effect: fewer positions but higher average conviction.
Regime-aware strategy. Adds a defensive half-position when the semiconductor sector has been collectively underperforming the NASDAQ for 4 consecutive weeks, signaling potential sector headwinds.
Most sophisticated AI-based weighting. Incorporates conviction quality, sector regime (bull/neutral/bear), and data quality (how many reports available). Positions with thin data or low conviction are systematically downweighted.
Uses the internal bank report data. Tickers with more bullish than bearish bank views get higher weight. Pure consensus signal with no AI adjustment.
Mimics AI Top 5 strategy but using Bloomberg sell-side consensus target prices. Used as a baseline to compare AI performance vs professional street consensus.
Bloomberg's equivalent of ER Proportional. Direct apples-to-apples comparison of AI's proportional allocation vs Bloomberg consensus allocation.
Bloomberg's equivalent of ER Equal Weight. Holds any stock with positive Bloomberg upside equally.
Pure passive benchmark within the semiconductor universe. Eliminates all stock selection — any outperformance vs this baseline reflects the value of the AI signals.
Simulates holding the initial 15-stock basket without rebalancing. Winners naturally grow larger. Good for measuring how the universe performs in a "set and forget" approach.
Simulates an index-like approach within the semiconductor universe. Naturally overweights NVDA, TSM, QCOM. Good for checking whether AI adds value vs passive market-cap exposure.
Quantitative portfolio construction using Modern Portfolio Theory. Uses AI signals as return inputs but optimizes the portfolio for risk-adjusted returns using historical covariance. Compare to see if quant optimization adds value vs simpler ER-proportional allocation.
The backtest is designed to be as realistic as possible for a long-only weekly rebalancing strategy. All methodology decisions are conservative — we prefer to undercount performance rather than overcount.
Signal from Week N is applied to trade Week N+1. The pipeline runs after Friday close (Monday 05:00 UTC), after Week N prices are final. No future price information is ever used in signal generation.
This is the most important design decision.
entry_price = Monday open of the trade week. This is the actual price at which a trader would have executed. We do not use week N's Friday close as the entry — that would be slightly optimistic.
weekly_return = (this week's Friday close / previous Friday close − 1) × 100
This is close-to-close, not open-to-close. The "actual return" in the UI uses this for the benchmark comparison. Trade return (open-to-close) is separately tracked via entry_price vs week_close_price.
The backtest does not subtract transaction costs, slippage, or market impact. For a 15-ticker weekly-rebalancing strategy with institutional-size trades, these are non-trivial. Actual returns would be lower by approximately 10–30bps per week depending on implementation.
When the pipeline runs, it first checks for any past weeks where week_close_price or weekly_return is NULL (e.g. from a previous pipeline run that couldn't fetch prices). These gaps are automatically filled from stock_prices before computing the current week's signals.
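The backfill step can be sketched with in-memory stand-ins for the database rows; all names and values here are illustrative:

```python
journal = [
    {"week": "2025-06-02", "ticker": "NVDA",
     "week_close_price": None, "weekly_return": None},  # gap from a failed fetch
    {"week": "2025-06-09", "ticker": "NVDA",
     "week_close_price": 122.0, "weekly_return": 1.7},
]
friday_closes = {("NVDA", "2025-06-02"): 120.0, ("NVDA", "2025-05-26"): 118.0}
prev_week = {"2025-06-02": "2025-05-26"}

for row in journal:
    if row["week_close_price"] is None:  # only NULL rows are backfilled
        close = friday_closes.get((row["ticker"], row["week"]))
        prev = friday_closes.get((row["ticker"], prev_week[row["week"]]))
        if close is not None and prev is not None:
            row["week_close_price"] = close
            row["weekly_return"] = round((close / prev - 1) * 100, 2)

print(journal[0]["weekly_return"])  # 1.69
```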
After each pipeline run, backfill_accuracy computes prediction accuracy metrics: did the signal direction match the actual return direction? Win rate, error magnitude, and EPS accuracy vs actual reported EPS are all tracked.
The most recent signal week is shown in the UI even before its trade week closes. It appears as a "Preview" row with 0 return placeholders, so portfolio managers can see current holdings and signals. The preview row is automatically replaced once real returns are available.
Marked with is_preview: true in the snapshot.

The backtest snapshot (weekly_portfolio_snapshot) is refreshed every Monday after the pipeline runs. Any missing price data that was later filled by the data provider is automatically incorporated in the next snapshot refresh.
Performance based on approximately 55 completed trade weeks starting March 2025. All strategies use SHIFT=1 methodology on the same 15-ticker universe.
| Strategy | Signal Source | Allocation Logic | Key Characteristic |
|---|---|---|---|
| ER Proportional | AI Expected Return | Weight ∝ ER (positive only) | Historically strongest total return |
| QA-ER | AI (conviction-adjusted) | Weight ∝ adj_er, 20% cap | Best risk-adjusted return |
| ER Prop + Hurdle | AI Expected Return | Weight ∝ ER, ER≥7% only | Fewer but higher-quality positions |
| Top 5 AI | AI adjusted_er | Equal 20% each, top 5 | Primary display strategy |
| ER Equal Weight | AI Expected Return | Equal weight, ER>0 universe | Broader diversification |
| Signal Weighted | AI signal score | Weight ∝ signal strength | Buy/strong_buy only |
| BBG Proportional | Bloomberg consensus | Weight ∝ BBG upside | Street consensus benchmark |
| BBG Top 5 | Bloomberg consensus | Equal 20%, top 5 BBG upside | AI vs BBG comparison |
| Bank Consensus | PDF-extracted bank ratings | Weight ∝ bullish ratio | Internal consensus baseline |
| Equal Weight | None (passive) | 6.67% each, 15 tickers | Passive benchmark |
| Market Cap Weighted | None (passive) | Proportional to market cap | Index-like benchmark |
| QQQ | None | Single ETF | Market benchmark |
Past performance over ~55 weeks is a short track record. The strategy has operated through one semiconductor sector cycle (2025 up-cycle). Performance in a sustained downturn or high-volatility macro environment has not been fully stress-tested. Transaction cost assumptions are zero — real implementation returns will be lower.
Based on MSCI/Bloomberg methodology + ASC 606/IFRS 15 revenue disaggregation + AI-driven alignment scoring. Data is theme-agnostic (extract once, score for any theme).
Output per segment: layer (CORE/ENABLER/ADJACENT) + alignment_pct (0–100)
V2 (old): Fixed factors per layer — CORE=1.0, ENABLER=0.5, ADJACENT=0.25. Same factor for all segments in a layer.
V3 (new): AI scores each segment individually with alignment_pct (0-100%). A CORE segment might get 95% or 70% depending on actual relevance. Financial notes (PP&E, SBC, Intangibles) provide additional context for AI judgment.
Financial notes data provides supporting evidence for thematic alignment. Extracted from annual report and displayed as insight cards on the company detail page.
All data used in the weekly signal pipeline is pulled from the DEV database or collected by automated ETL scripts. Bloomberg data is used only for the backtest comparison UI, not in signal generation.
| Table / Source | Content | Used In | Update Frequency |
|---|---|---|---|
| report_extractions | Structured extraction of research report PDFs: bank name, EPS estimates, target prices, ratings, key thesis, catalysts, risks | Street consensus anchors, report_summary section, analyst claim ledger | Triggered when new PDF added |
| weekly_news_raw | Raw news articles collected by news_collector ETL. Company + sector news. Fields: headline, body, source, published_at, ticker | News filter (GPT-4.1) → news_summary injected in Section 6 | Weekly (Monday cron, Step C) |
| stock_prices | Daily OHLCV prices for all 15 tickers + QQQ. Fields: ticker, date, open, high, low, close, volume | Price context, entry_price, week_close_price, weekly_return, 4-week trend, 52-week range | Daily (data provider feed) |
| earnings_calendar | Upcoming earnings dates, analyst EPS estimates, actual EPS results, EPS surprise %. Fields: ticker, report_date, eps_estimate, eps_actual | Earnings week detection, EPS surprise backfill, earnings-week discipline | Updated as earnings reported |
| weekly_analyst_journal | All pipeline outputs per ticker per week: signal, conviction, eps/pe deviations, expected_return, summary, week_close_price, weekly_return, prediction accuracy | Historical context (Section 1), experience rules, backtest engine | Weekly (pipeline output) |
| analyst_narrative_kb | Persistent company-specific knowledge: business model insights, PE ranges, management patterns, recurring analytical errors | Section 3c (narrative KB injection) | Updated after each pipeline run via memory_update |
| analyst_experience_rules | Corrective rules generated from past prediction errors. Fields: ticker, narrative_type, situation, error, correction, created_at | Section 2 (few-shot lessons), Section 3d (narrative experiences), Section 5 (experience summary) | Weekly (settlement job post-result) |
| sector_intel (ETL) | Weekly sector-level macro summary: SOX trend, demand signals, supply chain updates, cross-company signals | Available as context for Q5 (cross-company signals) | Weekly (Step B) |
| bloomberg_consensus | Bloomberg consensus target prices and ratings for all 15 tickers. Used ONLY for comparison — never in signal generation | BBG strategy benchmarks in backtest UI only | Weekly (separate Bloomberg feed) |
Bank research PDFs are uploaded to the system. A GPT-4.1 extraction job parses each PDF into structured JSON: analyst name, bank, date, EPS estimates (FY1/FY2/FY3), target price, rating, key thesis, catalysts, and risks. Each report is stored in report_extractions. The pipeline uses the past 90 days of reports per ticker, deduplicated to latest per bank.
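The selection rule (past 90 days of reports, deduplicated to the latest per bank) can be sketched as follows; field names mirror the extraction schema described above but are illustrative:

```python
from datetime import date, timedelta

def select_reports(reports, today):
    cutoff = today - timedelta(days=90)
    latest = {}
    for r in sorted(reports, key=lambda r: r["date"]):
        if r["date"] >= cutoff:
            latest[r["bank"]] = r  # later reports overwrite earlier ones
    return list(latest.values())

reports = [
    {"bank": "MS", "date": date(2025, 5, 1), "target_price": 120},
    {"bank": "MS", "date": date(2025, 6, 1), "target_price": 130},  # supersedes
    {"bank": "GS", "date": date(2025, 1, 1), "target_price": 110},  # stale
]
picked = select_reports(reports, today=date(2025, 6, 15))
print([r["target_price"] for r in picked])  # [130]
```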
Company news is collected weekly for all 15 tickers. A two-stage process: (1) bulk collection of all relevant news from the past week, (2) GPT-4.1 relevance filter that scores each article for relevance to the ticker's investment thesis. Only materially relevant articles pass through to the weekly prompt.
Daily stock prices from a market data provider. Used for: week open/close (entry/exit prices), 4-week price trend, 52-week position (range context), QQQ benchmark, and historical covariance estimation for the MVO strategy. Backfill jobs automatically detect and fill gaps when data is delayed.
Earnings dates are pre-loaded and updated when actuals are reported. The pipeline checks earnings_calendar to determine if the upcoming week is an earnings week. If so, the AI's discipline rules apply: higher uncertainty → default toward 0/0 deviations unless there is very strong conviction from the claim ledger.
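The earnings-week check can be sketched as a simple date-window test; the names are illustrative:

```python
from datetime import date, timedelta

def is_earnings_week(report_date, trade_monday):
    """True if the ticker reports during the Monday-Friday trade week."""
    trade_friday = trade_monday + timedelta(days=4)
    return trade_monday <= report_date <= trade_friday

print(is_earnings_week(date(2025, 8, 27), date(2025, 8, 25)))  # True
print(is_earnings_week(date(2025, 9, 3), date(2025, 8, 25)))   # False
```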
Bloomberg consensus target prices and ratings are never injected into the weekly prediction prompt. They are available in the bloomberg_consensus table and are LEFT JOIN'd into the backtest API purely for the purpose of showing AI performance vs street consensus in the UI.
Street consensus in the prompt (Section 3) = average of internal PDF-extracted bank reports, not Bloomberg. This means the AI's valuation anchors are derived from the same research reports that investors read, not from a black-box Bloomberg aggregate.