Historical Judgment Context (4 Weeks)
Injects the analyst's own prior signals, EPS/PE deviations, actual outcomes, and reflection notes for the past 4 weeks. Allows the AI to learn from recent history within a single prompt.
{judgment_history_4w}

The system runs automatically every Monday 05:00 UTC via cron. It covers 15 semiconductor tickers across 6 steps. Each ticker's analysis is self-contained and uses no future data.
Coverage universe (15 tickers): NVDA, AMD, TSM, QCOM, AVGO, TXN, INTC, MU, AMAT, LRCX, KLAC, ADI, MCHP, ON, WOLF
Coverage spans AI/data center accelerators, memory, EDA, analog, power, and foundry subsectors.
Signal from Week N → Trade in Week N+1. The pipeline runs after market close on Friday. Positions are entered at Monday open of the following week and held until Friday close.
This eliminates any use of future data in signal generation.
Cron runs every Monday 05:00 UTC, executing data collection → pipeline → backfill → snapshot refresh automatically. No manual intervention required for normal weeks.
| Event | Timing | Detail |
|---|---|---|
| Signal Generation | Monday 05:00 UTC (Week N+1) | Pipeline runs using Week N's price close. Output = buy/hold/sell signal for each ticker. |
| Trade Entry | Monday Open (Week N+1) | entry_price = Monday open of trade week. This is the actual simulated buy price. |
| Trade Exit | Friday Close (Week N+1) | week_close_price = Friday close. weekly_return = (Friday close / prev Friday close − 1) × 100. |
| Actual Return (UI) | Week N+1's weekly_return | The "Actual Return" shown in UI for Week N signal = the return of the following week's trade. |
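The timing rules in the table above can be sketched numerically. The prices below are illustrative placeholders, not real market data:

```python
prev_friday_close = 100.0  # Week N Friday close (the basis of the UI return)
monday_open = 101.0        # Week N+1 Monday open  -> entry_price
friday_close = 104.0       # Week N+1 Friday close -> week_close_price

# "Actual Return" shown in the UI: close-to-close
weekly_return = (friday_close / prev_friday_close - 1) * 100

# Simulated trade return: open-to-close within the trade week
trade_return = (friday_close / monday_open - 1) * 100

print(round(weekly_return, 2), round(trade_return, 2))  # 4.0 2.97
```

Note the two returns differ: the UI benchmark is close-to-close, while the simulated trade enters at Monday open.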
Every week, the prompt is rebuilt from scratch by injecting fresh data into each section in order. The diagram below shows every section in injection sequence, with its template variable and data source. The LLM only sees the final assembled text — it never accesses the database directly.
Analyst identity: Buy-side fundamental equity analyst, semiconductor sector. Strong prior that consensus is usually right.
Valuation formula: AI_EPS = Consensus_EPS × (1 + eps_deviation_pct/100) | AI_PE = Forward_PE × (1 + pe_adjustment_pct/100) | PT = AI_EPS × AI_PE
Hard caps: EPS ±15%, PE ±20%, ER ≤ 30%. Signal/conviction computed by code — NOT output by LLM.
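A minimal sketch of the code-side valuation arithmetic, assuming a simple clamp implements the hard caps. Function and parameter names here are illustrative, not the production API:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def target_price(consensus_eps, forward_pe, eps_dev_pct, pe_adj_pct):
    eps_dev_pct = clamp(eps_dev_pct, -15.0, 15.0)  # hard cap: EPS ±15%
    pe_adj_pct = clamp(pe_adj_pct, -20.0, 20.0)    # hard cap: PE ±20%
    ai_eps = consensus_eps * (1 + eps_dev_pct / 100)
    ai_pe = forward_pe * (1 + pe_adj_pct / 100)
    return ai_eps * ai_pe

# AI_EPS = 4.0 * 1.05 = 4.2, AI_PE = 30 * 1.10 = 33 -> PT = 138.6
print(round(target_price(4.0, 30.0, 5, 10), 2))  # 138.6
```

A deviation beyond the cap (e.g. +50% EPS) is clamped to the cap rather than rejected, so the output stays bounded.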
11 decision rules (default=no change, sanity-check data, price-run = already priced in, >40% PT divergence = reassess) + Sector knowledge (cycle dynamics, cross-company signals, sub-sector PE ranges, EPS principles)
Per-week row: EPS forecast, PE, PT, signal, conviction, actual close, actual weekly_return, key_insight. Includes direction-correct count and bullish-bias pattern summary. Lets the AI audit its own recent track record before making a new call.
Prompt instruction: "Review your pattern: Are you consistently biased in one direction?"
Up to 3 event-matched correction examples. Format: "W{n} ({error_type}): Your prediction={signal}, ER={pct}% → Actual={pct}% → Root cause → Lesson." Filtered by event type (earnings lessons only on earnings weeks, non-earnings lessons otherwise).
Consensus (Street View):
PE Context:
Usage rules injected: ≥80th pct → street already optimistic, require NEW evidence for bullish; ≤20th pct → street conservative, modest bullish more defensible; 20–80% → normal. Primarily affects MAGNITUDE not DIRECTION. Used for pe_adjustment_pct, NOT eps_deviation_pct.
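One plausible implementation of the percentile-to-posture rule; the function name and return labels are illustrative:

```python
def consensus_posture(er_percentile_pct):
    if er_percentile_pct >= 80:
        return "street_optimistic"    # require NEW evidence for bullish calls
    if er_percentile_pct <= 20:
        return "street_conservative"  # modest bullish more defensible
    return "normal"

print(consensus_posture(85), consensus_posture(50))  # street_optimistic normal
```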
Persistent company-specific analyst memory. Each card: narrative_id, stability (STRUCTURAL/CYCLICAL/TACTICAL), claim, confirm_signal, falsifier, state (ACTIVE/STABLE/WEAKENED). These are the narratives the AI evaluates in Q1–Q4. Updated via memory_update after each run.
Historical outcomes grouped by narrative_id. Format: "[W{n} {net_direction} | reinforced/challenged] reason text." Shows the AI how each specific narrative has played out historically — if "AI demand narrative" was reinforced but the stock fell 3 times, the AI must factor that in.
"This Week" block: New reports published this week — parsed by GPT-4.1 into structured JSON (bank, EPS estimates, target price, rating, key thesis, catalysts, risks). This is the ONLY source for Step 0 analyst_claims extraction.
"Recent Context" block: Prior 3 weeks of reports — background only. NOT used for analyst_claims. Used in Q2/Q3 for trend assessment.
Running audit trail: past AI predictions vs settled actual outcomes. Format: prediction made → result observed → error type → root cause summary. Creates accountability for past calls and forces re-examination of persistent thesis errors.
Raw news filtered by GPT-4.1 for relevance to this ticker's investment thesis. Shows {n_news} filtered items from {total_news} total collected. Only material news passes. Lower weight than research reports — can flip a marginal call but cannot override clear report signals.
Earnings week triggers discipline rules: higher uncertainty → default toward 0/0 deviations. Price trend used by Rules 8–10 (rally = already priced in, decline = market sees risk).
Extract every EPS/PT/rating claim from Section 4 "This Week" reports. One entry per bank × narrative pair — if one bank addresses two narratives, output two entries. Map each claim to the closest narrative_id from Section 3c.
Self-check: Count banks in "This Week". If count > 0, analyst_claims must be non-empty. analyst_claims = [] only if zero new reports this week.
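The self-check can be sketched as a simple validation step; the names below are hypothetical stand-ins for the real pipeline structures:

```python
def check_claims(this_week_banks, analyst_claims):
    """Raise if new reports exist but no claims were extracted."""
    if len(this_week_banks) > 0 and len(analyst_claims) == 0:
        raise ValueError("analyst_claims empty despite new reports this week")
    return True

print(check_claims([], []))                    # True: no reports, [] is valid
print(check_claims(["MS"], [{"bank": "MS"}]))  # True: report matched by a claim
```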
For each narrative in Section 3c: does this week's data match its confirm_signal? Cite specific report/news. No vague claims.
Does data trigger any narrative's falsifier? CYCLICAL/TACTICAL narrative un-reinforced for multiple weeks = weakening. Export controls present for weeks = priced in, only flag NEW escalation.
STRUCTURAL narratives rarely change. Require multi-source evidence of genuine shift — never from a single data point. Single-week news alone is insufficient.
Aggregate: STRUCTURAL reinforced = +1.5 weight; TACTICAL challenged = +0.5 weight toward bearish. Rules: reinforced > 2× challenged → bullish; challenged > 2× reinforced → bearish; within 1 → neutral. Result: net_direction field.
CYCLICAL narrative reinforced → justify above-consensus EPS? lean_bullish → modest +1% to +3% bias, not forced to zero. STRUCTURAL intact + below historical PE → PE expansion defensible? Section 3b percentile modulates SIZE.
The system prompt defines the analyst's identity, valuation methodology, and behavioral rules. It does NOT change week to week.
Buy-side fundamental equity analyst covering the semiconductor sector. The AI must behave like an institutional analyst: disciplined, skeptical of hype, anchored to consensus data, with a strong prior that consensus is usually right.
The AI does NOT produce a target price directly. It predicts deviations from market anchors:
- eps_deviation_pct: how much the AI thinks EPS will differ from Street consensus (±15% cap)
- pe_adjustment_pct: how much the AI thinks the market will re-rate PE vs forward PE (±20% cap)

Code then computes:

AI_EPS = Consensus_EPS × (1 + eps_deviation_pct/100)
AI_PE = Forward_PE × (1 + pe_adjustment_pct/100)
Target Price = AI_EPS × AI_PE
Expected Return = (Target Price / Current Price − 1) × 100%
Identity Property: When both deviations = 0%, Target Price = Current Price → Expected Return = 0% → Signal = hold. This is the correct default.
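The identity property follows directly from the definition of forward PE (current price = consensus EPS × forward PE). This sketch, with illustrative values, demonstrates it:

```python
consensus_eps, forward_pe = 4.0, 30.0
current_price = consensus_eps * forward_pe  # 120.0 by construction of forward PE

ai_eps = consensus_eps * (1 + 0 / 100)  # eps_deviation_pct = 0
ai_pe = forward_pe * (1 + 0 / 100)      # pe_adjustment_pct = 0
target = ai_eps * ai_pe
expected_return = (target / current_price - 1) * 100

print(expected_return)  # 0.0 -> signal defaults to "hold"
```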
The system prompt also provides structural industry knowledge that the AI uses to contextualise weekly data.
The weekly delta prompt is re-assembled each week with fresh data injected into each section. Template variables are shown in {curly_braces}.
Injects the analyst's own prior signals, EPS/PE deviations, actual outcomes, and reflection notes for the past 4 weeks. Allows the AI to learn from recent history within a single prompt.
{judgment_history_4w}

Curated examples of past analytical mistakes and corrections, formatted as "Situation → Error → Correction" rules. These are company-specific rule-based nudges derived from prediction tracking.
{few_shot_lessons}

Provides the market anchors that the AI will deviate from. These are live-computed values:
PE Context (critical distinction): the forward PE is the anchor that pe_adjustment_pct adjusts from.
{trailing_pe}

Shows where the current implied upside (consensus PT vs price) sits within this stock's own historical distribution — using p25, p50, p75 percentiles. Injected variables:
{consensus_anchor_pt} {consensus_anchor_dynamic_er_pct} {consensus_anchor_dynamic_er_percentile_pct} {consensus_anchor_posture} {consensus_anchor_takeaway}

Usage Rules (injected into prompt):
The percentile affects pe_adjustment_pct, NOT eps_deviation_pct — do NOT move EPS solely because the percentile is high or low.

Persistent, company-specific analyst memory. Contains crystallised knowledge about this company's business model, typical patterns, and recurring themes. Updated by the AI after each prediction cycle.
{narrative_kb_section}

Past prediction mistakes grouped by narrative type (e.g. "AI demand narrative", "memory cycle recovery narrative"). Each entry shows: what narrative was used, what the prediction was, what actually happened, and what to adjust next time.
{narrative_experiences_section}

Structured extraction of analyst research reports, split into two blocks:
Running log of past AI predictions vs actual outcomes, at the point when results were known. Includes: what was predicted, what happened, how large the error was, and whether the AI's reasoning held up. This creates a self-audit trail.
{experience_summary}

Filtered company news for the current week. Raw news is first filtered by GPT-4.1 for relevance to this ticker's investment thesis. Only material news is passed to the main prompt. The total count and filtered count are both shown to the AI.
{news_summary}

Current week's price data: open, close, weekly change, 4-week trend, 52-week high/low position. Also includes the upcoming earnings date if within 2 weeks, which triggers earnings-week discipline rules.
After the data sections, the prompt instructs the AI to follow a structured reasoning sequence before outputting any numbers.
Before any reasoning, the AI must extract raw bank-by-bank evidence from Section 4's "This Week" reports into analyst_claims. This is separate from the narrative synthesis in Q1–Q5.
- Map each claim to the closest narrative_id from Section 3c's active narratives.
- analyst_claims may be [] ONLY if "This Week" shows zero new reports.
- Claims must not live only in narrative_reasoning — the full claim must appear in analyst_claims.
- If "This Week" contains any new report, analyst_claims must be non-empty.

Review each active narrative from Section 3c. For each STABLE or WEAKENED narrative: does this week's data (reports, news) provide confirming evidence matching that narrative's confirm_signal? If yes, mark it as reinforced. Must cite which report/news said what — no vague claims.
For each narrative: does this week's data trigger the falsifier? Is a CYCLICAL or TACTICAL narrative fading due to lack of confirming evidence? A narrative that goes un-reinforced for multiple weeks should be flagged as weakening. Specific evidence required.
STRUCTURAL narratives (competitive moat, technology position) rarely change. Only flag if there is multi-source evidence of a genuine structural shift — not a single week's data point.
Aggregate reinforced vs challenged narratives using weighted scoring:
| Condition | net_direction |
|---|---|
| STRUCTURAL reinforced AND ≥50% of remaining reinforced | bullish |
| Reinforced count > 2× challenged | bullish |
| Reinforced count > challenged | lean_bullish |
| Reinforced ≈ challenged (within 1) | neutral |
| Challenged > reinforced | lean_bearish |
| Challenged > 2× reinforced | bearish |
Weighting rules: STRUCTURAL reinforced = +1.5 weight. TACTICAL challenged = only +0.5 weight toward bearish. TACTICAL challenges should NOT drag direction bearish if STRUCTURAL + CYCLICAL are intact. Export controls/regulatory risks present for multiple weeks are considered "priced in" — only flag if NEW escalation.
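One way the weighted scoring and thresholds above could be implemented. The exact production rule ordering may differ, and the STRUCTURAL-majority special case from the table is omitted for brevity:

```python
def net_direction(narratives):
    """narratives: list of (stability, status), e.g. ("STRUCTURAL", "reinforced")."""
    reinforced = challenged = 0.0
    for stability, status in narratives:
        if status == "reinforced":
            reinforced += 1.5 if stability == "STRUCTURAL" else 1.0
        elif status == "challenged":
            # TACTICAL challenges carry only half weight toward bearish
            challenged += 0.5 if stability == "TACTICAL" else 1.0
    if abs(reinforced - challenged) <= 1:
        return "neutral"
    if reinforced > 2 * challenged:
        return "bullish"
    if challenged > 2 * reinforced:
        return "bearish"
    return "lean_bullish" if reinforced > challenged else "lean_bearish"

print(net_direction([("STRUCTURAL", "reinforced"),
                     ("CYCLICAL", "reinforced"),
                     ("TACTICAL", "challenged")]))  # bullish
```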
- eps_deviation_pct: NET adjustment vs consensus (±15% max)
- pe_adjustment_pct: adjustment FROM the forward PE anchor (±20% max)

The AI outputs a structured JSON object. Signal and conviction are NOT in this output — they are computed by code from the numeric deviations.
Signal (strong_buy / buy / hold / sell / strong_sell) is computed by code based on eps_deviation_pct and pe_adjustment_pct thresholds. Conviction (high/medium/low) is derived from the magnitude of the combined deviation. This ensures consistent, rule-based signal generation that cannot be "talked into" by narrative.
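An illustrative sketch of the rule-based mapping. The actual thresholds are not documented in this section, so the cutoffs below are hypothetical placeholders:

```python
def to_signal(expected_return_pct):
    # Hypothetical cutoffs; production thresholds may differ
    if expected_return_pct >= 15:
        return "strong_buy"
    if expected_return_pct >= 5:
        return "buy"
    if expected_return_pct <= -15:
        return "strong_sell"
    if expected_return_pct <= -5:
        return "sell"
    return "hold"

def to_conviction(eps_dev_pct, pe_adj_pct):
    magnitude = abs(eps_dev_pct) + abs(pe_adj_pct)  # combined deviation size
    if magnitude >= 20:
        return "high"
    if magnitude >= 8:
        return "medium"
    return "low"

print(to_signal(0.0), to_conviction(0, 0))  # hold low
```

Because the mapping is pure arithmetic on the numeric deviations, narrative text in the LLM output cannot move the signal.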
The system maintains three distinct memory layers that persist across weeks, allowing the AI to improve its analytical accuracy over time without retraining the model.
Company-specific, persistent knowledge that accumulates over time. Contains crystallised insights about a company's business model, typical PE ranges by cycle phase, management communication patterns, and recurring analytical pitfalls.
Updated by: LLM's memory_update output field after each prediction
Injected via: Section 3c of weekly delta prompt
Stored in: analyst_narrative_kb

When a prediction outcome is known (week N+1 actual return available), the system runs a settlement step: compare prediction vs outcome, classify error type, and write a corrective rule. These rules are formatted as few-shot examples for future prompts.
Updated by: Settlement job (runs after results available)
Injected via: Section 2 (few-shot lessons) + Section 5 (experience summary)
Stored in: analyst_experience_rules

Same as Layer 2 but grouped by narrative type. If the AI used "AI demand narrative" to justify a bullish EPS call 5 times and was wrong 4 times, Section 3d will explicitly show this pattern, forcing the AI to discount that narrative class.
Updated by: Same settlement job, with narrative_type tagging
Injected via: Section 3d of weekly delta prompt
Stored in: analyst_experience_rules (narrative_type)

When Week N+1's week_close_price and weekly_return are filled (by the backfill job), the system knows the trade week result for Week N's signals.
Compare prediction direction (bullish/bearish/neutral) vs actual return direction (up/down). Classify: correct, directionally wrong, or magnitude error. Record error_magnitude.
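A hedged sketch of this settlement-time classification; the magnitude-error threshold is illustrative:

```python
def classify(pred_direction, actual_return_pct, predicted_er_pct):
    actual_dir = ("up" if actual_return_pct > 0
                  else "down" if actual_return_pct < 0 else "flat")
    direction_ok = (
        (pred_direction == "bullish" and actual_dir == "up")
        or (pred_direction == "bearish" and actual_dir == "down")
        or (pred_direction == "neutral" and actual_dir == "flat")
    )
    error_magnitude = abs(predicted_er_pct - actual_return_pct)
    if not direction_ok:
        return "directionally_wrong", error_magnitude
    if error_magnitude > 5:  # illustrative threshold for "magnitude error"
        return "magnitude_error", error_magnitude
    return "correct", error_magnitude

print(classify("bullish", 3.0, 4.0))  # ('correct', 1.0)
```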
For significant errors, generate a corrective rule in natural language. Example: "When NVDA quarterly guidance is in-line but forward PE is above 45x, avoid PE expansion assumptions — the market typically doesn't re-rate further."
The LLM's own memory_update field from Week N's output is applied to the narrative KB. This allows the AI to proactively update its knowledge from new information, not just from errors.
Over 57+ weeks, the system accumulates patterns like: "Bullish AI demand narratives for NVDA tend to over-predict EPS by +8% on average. Discount AI-demand-driven EPS calls by at least half."
The KB learns that NVDA's PE compresses aggressively when guidance disappoints, while TXN's PE is historically stable ±5% even in weak quarters. Each company gets custom calibration.
The system tracks whether cross-company signals (TSMC beat → NVDA bullish) actually predicted well historically. Unreliable cross-company signals are downweighted in Section 5 experience rules.
The system learns to reduce deviation confidence in the week before earnings (high uncertainty → default to 0/0). This prevents over-confident pre-earnings calls that have historically been wrong.
All strategies are evaluated simultaneously on the same 15-ticker universe, same weekly signals, SHIFT=1 methodology. No strategy has access to future data. Performance is based on 55+ weeks of live signals starting March 2025.
The primary strategy the investment process is built around. adjusted_er = expected_return × conviction_multiplier × risk_multiplier. This ensures higher-confidence calls get marginally more exposure but avoids extreme concentration.
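The adjusted_er formula can be sketched as follows; the multiplier values are illustrative, not the production mapping:

```python
CONVICTION_MULT = {"high": 1.2, "medium": 1.0, "low": 0.8}  # illustrative values

def adjusted_er(expected_return, conviction, risk_multiplier=1.0):
    # adjusted_er = expected_return * conviction_multiplier * risk_multiplier
    return expected_return * CONVICTION_MULT[conviction] * risk_multiplier

print(adjusted_er(10.0, "high"))  # higher conviction -> marginally more exposure
```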
Proportional to signal strength. Avoids over-weighting marginal buy signals. Goes to cash if no buy signals exist.
Broader diversification than Top 5 — all positive-ER tickers equally weighted. Lower concentration risk, smoother returns.
The consistently best-performing strategy. Higher-upside signals get proportionally larger allocations, creating natural concentration toward the strongest calls without arbitrary cutoffs.
Adds a concentration guard. Prevents a single dominant call (e.g. NVDA at 80%) from making the portfolio too dependent on one name.
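A concentration guard of this kind is commonly implemented as iterative cap-and-redistribute. This sketch is one standard approach, not necessarily the production algorithm, and it assumes the cap is feasible (cap × number of positive-ER names ≥ 1):

```python
def capped_weights(ers, cap=0.20):
    """ER-proportional weights with a per-name cap, iteratively redistributed."""
    pos = {t: er for t, er in ers.items() if er > 0}
    total = sum(pos.values())
    weights = {t: er / total for t, er in pos.items()}
    for _ in range(len(weights)):  # converges in at most n passes
        excess = sum(max(0.0, w - cap) for w in weights.values())
        if excess < 1e-12:
            break
        share = sum(w for w in weights.values() if w < cap)
        for t, w in weights.items():
            # Capped names are pinned; excess flows to uncapped names pro-rata
            weights[t] = cap if w >= cap else w + excess * w / share
    return weights

w = capped_weights({"NVDA": 50.0, "AMD": 10.0, "TSM": 10.0,
                    "QCOM": 10.0, "TXN": 10.0, "MU": 10.0})
print(round(w["NVDA"], 2), round(w["AMD"], 2))  # 0.2 0.16
```

Without the cap, NVDA would take 50% of the book; with it, the excess 30% is spread across the remaining names.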
Quality filter on top of ER Prop. Removes low-conviction marginal buys. Typical effect: fewer positions but higher average conviction.
Regime-aware strategy. Adds a defensive half-position when the semiconductor sector has been collectively underperforming the NASDAQ for 4 consecutive weeks, signaling potential sector headwinds.
Most sophisticated AI-based weighting. Incorporates conviction quality, sector regime (bull/neutral/bear), and data quality (how many reports available). Positions with thin data or low conviction are systematically downweighted.
Uses the internal bank report data. Tickers with more bullish than bearish bank views get higher weight. Pure consensus signal with no AI adjustment.
Mimics AI Top 5 strategy but using Bloomberg sell-side consensus target prices. Used as a baseline to compare AI performance vs professional street consensus.
Bloomberg's equivalent of ER Proportional. Direct apples-to-apples comparison of AI's proportional allocation vs Bloomberg consensus allocation.
Bloomberg's equivalent of ER Equal Weight. Holds any stock with positive Bloomberg upside equally.
Pure passive benchmark within the semiconductor universe. Eliminates all stock selection — any outperformance vs this baseline reflects the value of the AI signals.
Simulates holding the initial 15-stock basket without rebalancing. Winners naturally grow larger. Good for measuring how the universe performs in a "set and forget" approach.
Simulates an index-like approach within the semiconductor universe. Naturally overweights NVDA, TSM, QCOM. Good for checking whether AI adds value vs passive market-cap exposure.
Quantitative portfolio construction using Modern Portfolio Theory. Uses AI signals as return inputs but optimizes the portfolio for risk-adjusted returns using historical covariance. Compare to see if quant optimization adds value vs simpler ER-proportional allocation.
The backtest is designed to be as realistic as possible for a long-only weekly rebalancing strategy. All methodology decisions are conservative — we prefer to undercount performance rather than overcount.
Signal from Week N is applied to trade Week N+1. The pipeline runs after Friday close (Monday 05:00 UTC), after Week N prices are final. No future price information is ever used in signal generation.
This is the most important design decision.
entry_price = Monday open of the trade week. This is the actual price at which a trader would have executed. We do not use week N's Friday close as the entry — that would be slightly optimistic.
weekly_return = (this week's Friday close / previous Friday close − 1) × 100
This is close-to-close, not open-to-close. The "actual return" in the UI uses this for the benchmark comparison. Trade return (open-to-close) is separately tracked via entry_price vs week_close_price.
The backtest does not subtract transaction costs, slippage, or market impact. For a 15-ticker weekly-rebalancing strategy with institutional-size trades, these are non-trivial. Actual returns would be lower by approximately 10–30bps per week depending on implementation.
When the pipeline runs, it first checks for any past weeks where week_close_price or weekly_return is NULL (e.g. from a previous pipeline run that couldn't fetch prices). These gaps are automatically filled from stock_prices before computing the current week's signals.
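The backfill step can be sketched with in-memory stand-ins for the database rows; all names and values here are illustrative:

```python
journal = [
    {"week": "2025-06-02", "ticker": "NVDA",
     "week_close_price": None, "weekly_return": None},  # gap from a failed fetch
    {"week": "2025-06-09", "ticker": "NVDA",
     "week_close_price": 122.0, "weekly_return": 1.7},
]
friday_closes = {("NVDA", "2025-06-02"): 120.0, ("NVDA", "2025-05-26"): 118.0}
prev_week = {"2025-06-02": "2025-05-26"}

for row in journal:
    if row["week_close_price"] is None:  # only NULL rows are backfilled
        close = friday_closes.get((row["ticker"], row["week"]))
        prev = friday_closes.get((row["ticker"], prev_week[row["week"]]))
        if close is not None and prev is not None:
            row["week_close_price"] = close
            row["weekly_return"] = round((close / prev - 1) * 100, 2)

print(journal[0]["weekly_return"])  # 1.69
```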
After each pipeline run, backfill_accuracy computes prediction accuracy metrics: did the signal direction match the actual return direction? Win rate, error magnitude, and EPS accuracy vs actual reported EPS are all tracked.
The most recent signal week is shown in the UI even before its trade week closes. It appears as a "Preview" row with 0 return placeholders, so portfolio managers can see current holdings and signals. The preview row is automatically replaced once real returns are available.
Marked with is_preview: true in the snapshot.

The backtest snapshot (weekly_portfolio_snapshot) is refreshed every Monday after the pipeline runs. Any missing price data that was later filled by the data provider is automatically incorporated in the next snapshot refresh.
Performance based on approximately 55 completed trade weeks starting March 2025. All strategies use SHIFT=1 methodology on the same 15-ticker universe.
| Strategy | Signal Source | Allocation Logic | Key Characteristic |
|---|---|---|---|
| ER Proportional | AI Expected Return | Weight ∝ ER (positive only) | Historically strongest total return |
| QA-ER | AI (conviction-adjusted) | Weight ∝ adj_er, 20% cap | Best risk-adjusted return |
| ER Prop + Hurdle | AI Expected Return | Weight ∝ ER, ER≥7% only | Fewer but higher-quality positions |
| Top 5 AI | AI adjusted_er | Equal 20% each, top 5 | Primary display strategy |
| ER Equal Weight | AI Expected Return | Equal weight, ER>0 universe | Broader diversification |
| Signal Weighted | AI signal score | Weight ∝ signal strength | Buy/strong_buy only |
| BBG Proportional | Bloomberg consensus | Weight ∝ BBG upside | Street consensus benchmark |
| BBG Top 5 | Bloomberg consensus | Equal 20%, top 5 BBG upside | AI vs BBG comparison |
| Bank Consensus | PDF-extracted bank ratings | Weight ∝ bullish ratio | Internal consensus baseline |
| Equal Weight | None (passive) | 6.67% each, 15 tickers | Passive benchmark |
| Market Cap Weighted | None (passive) | Proportional to market cap | Index-like benchmark |
| QQQ | None | Single ETF | Market benchmark |
Past performance over ~55 weeks is a short track record. The strategy has operated through one semiconductor sector cycle (2025 up-cycle). Performance in a sustained downturn or high-volatility macro environment has not been fully stress-tested. Transaction cost assumptions are zero — real implementation returns will be lower.
Based on MSCI/Bloomberg methodology + ASC 606/IFRS 15 revenue disaggregation + AI-driven alignment scoring. Data is theme-agnostic (extract once, score for any theme).
Output per segment: layer (CORE/ENABLER/ADJACENT) + alignment_pct (0–100)
V2 (old): Fixed factors per layer — CORE=1.0, ENABLER=0.5, ADJACENT=0.25. Same factor for all segments in a layer.
V3 (new): AI scores each segment individually with alignment_pct (0-100%). A CORE segment might get 95% or 70% depending on actual relevance. Financial notes (PP&E, SBC, Intangibles) provide additional context for AI judgment.
Financial notes data provides supporting evidence for thematic alignment. Extracted from annual report and displayed as insight cards on the company detail page.
All data used in the weekly signal pipeline is pulled from the DEV database or collected by automated ETL scripts. Bloomberg data is used only for the backtest comparison UI, not in signal generation.
| Table / Source | Content | Used In | Update Frequency |
|---|---|---|---|
| report_extractions | Structured extraction of research report PDFs: bank name, EPS estimates, target prices, ratings, key thesis, catalysts, risks | Street consensus anchors, report_summary section, analyst claim ledger | Triggered when new PDF added |
| weekly_news_raw | Raw news articles collected by news_collector ETL. Company + sector news. Fields: headline, body, source, published_at, ticker | News filter (GPT-4.1) → news_summary injected in Section 6 | Weekly (Monday cron, Step C) |
| stock_prices | Daily OHLCV prices for all 15 tickers + QQQ. Fields: ticker, date, open, high, low, close, volume | Price context, entry_price, week_close_price, weekly_return, 4-week trend, 52-week range | Daily (data provider feed) |
| earnings_calendar | Upcoming earnings dates, analyst EPS estimates, actual EPS results, EPS surprise %. Fields: ticker, report_date, eps_estimate, eps_actual | Earnings week detection, EPS surprise backfill, earnings-week discipline | Updated as earnings reported |
| weekly_analyst_journal | All pipeline outputs per ticker per week: signal, conviction, eps/pe deviations, expected_return, summary, week_close_price, weekly_return, prediction accuracy | Historical context (Section 1), experience rules, backtest engine | Weekly (pipeline output) |
| analyst_narrative_kb | Persistent company-specific knowledge: business model insights, PE ranges, management patterns, recurring analytical errors | Section 3c (narrative KB injection) | Updated after each pipeline run via memory_update |
| analyst_experience_rules | Corrective rules generated from past prediction errors. Fields: ticker, narrative_type, situation, error, correction, created_at | Section 2 (few-shot lessons), Section 3d (narrative experiences), Section 5 (experience summary) | Weekly (settlement job post-result) |
| sector_intel (ETL) | Weekly sector-level macro summary: SOX trend, demand signals, supply chain updates, cross-company signals | Available as context for Q5 (cross-company signals) | Weekly (Step B) |
| bloomberg_consensus | Bloomberg consensus target prices and ratings for all 15 tickers. Used ONLY for comparison — never in signal generation | BBG strategy benchmarks in backtest UI only | Weekly (separate Bloomberg feed) |
Bank research PDFs are uploaded to the system. A GPT-4.1 extraction job parses each PDF into structured JSON: analyst name, bank, date, EPS estimates (FY1/FY2/FY3), target price, rating, key thesis, catalysts, and risks. Each report is stored in report_extractions. The pipeline uses the past 90 days of reports per ticker, deduplicated to latest per bank.
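The selection rule (past 90 days of reports, deduplicated to the latest per bank) can be sketched as follows; field names mirror the extraction schema described above but are illustrative:

```python
from datetime import date, timedelta

def select_reports(reports, today):
    cutoff = today - timedelta(days=90)
    latest = {}
    for r in sorted(reports, key=lambda r: r["date"]):
        if r["date"] >= cutoff:
            latest[r["bank"]] = r  # later reports overwrite earlier ones
    return list(latest.values())

reports = [
    {"bank": "MS", "date": date(2025, 5, 1), "target_price": 120},
    {"bank": "MS", "date": date(2025, 6, 1), "target_price": 130},  # supersedes
    {"bank": "GS", "date": date(2025, 1, 1), "target_price": 110},  # stale
]
picked = select_reports(reports, today=date(2025, 6, 15))
print([r["target_price"] for r in picked])  # [130]
```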
Company news is collected weekly for all 15 tickers. A two-stage process: (1) bulk collection of all relevant news from the past week, (2) GPT-4.1 relevance filter that scores each article for relevance to the ticker's investment thesis. Only materially relevant articles pass through to the weekly prompt.
Daily stock prices from a market data provider. Used for: week open/close (entry/exit prices), 4-week price trend, 52-week position (range context), QQQ benchmark, and historical covariance estimation for the MVO strategy. Backfill jobs automatically detect and fill gaps when data is delayed.
Earnings dates are pre-loaded and updated when actuals are reported. The pipeline checks earnings_calendar to determine if the upcoming week is an earnings week. If so, the AI's discipline rules apply: higher uncertainty → default toward 0/0 deviations unless there is very strong conviction from the claim ledger.
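The earnings-week check can be sketched as a simple date-window test; the names are illustrative:

```python
from datetime import date, timedelta

def is_earnings_week(report_date, trade_monday):
    """True if the ticker reports during the Monday-Friday trade week."""
    trade_friday = trade_monday + timedelta(days=4)
    return trade_monday <= report_date <= trade_friday

print(is_earnings_week(date(2025, 8, 27), date(2025, 8, 25)))  # True
print(is_earnings_week(date(2025, 9, 3), date(2025, 8, 25)))   # False
```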
Bloomberg consensus target prices and ratings are never injected into the weekly prediction prompt. They are available in the bloomberg_consensus table and are LEFT JOIN'd into the backtest API purely for the purpose of showing AI performance vs street consensus in the UI.
Street consensus in the prompt (Section 3) = average of internal PDF-extracted bank reports, not Bloomberg. This means the AI's valuation anchors are derived from the same research reports that investors read, not from a black-box Bloomberg aggregate.