AUTHOR: Tony Mudau

Market Agent — Concept (Sources & Scoring Logic)

This note explains where data comes from, how articles are scored and filtered, and how those scores combine into a single view per currency pair. It does not reference source files or implementation details.

Purpose

The market agent answers: for this FX (or XAU) symbol, what does the current headline set imply in direction and confidence? It combines web news APIs, rule-based scoring, and optional language models into one structured response. It does not execute trades.

External sources

Source	What it provides	Role in the pipeline
NewsAPI (`everything` search)	English articles matching a built query: the normalized pair, labels for the base and quote economies, plus shared macro terms (CPI, inflation, central banks, rates, oil, gold, etc.).	Breadth: macro and regional stories that name economies or themes; typically fewer FX-ticker style headlines.
Investing.com RSS (global top-news style feed)	Titles, links, short blurbs, publication times; dollar, risk, and commodity stories appear often.	Steady stream of market-facing lines; some items are enriched with more body text when enabled.
OpenAI (optional)	Two uses: (1) short labels on a batch of Investing headlines relative to the traded pair; (2) a single macro JSON (direction for the pair, confidence, summary, drivers, risks).	Only when a key is configured; otherwise the stack falls back to rules.

Manual operator notes (if your deployment records them) are merged into the narrative; they are not from NewsAPI or RSS.
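
As an illustration of the NewsAPI row in the table above, the built query could be assembled roughly as follows. This is a minimal sketch, not the agent's actual code: the label table, the macro term list, and both function names are hypothetical, and only the `q`, `language`, `sortBy`, and `apiKey` parameters of the `everything` endpoint are taken as given.

```python
from typing import Dict, List

# Hypothetical label table and macro terms; the note does not list the real ones.
ECONOMY_LABELS: Dict[str, List[str]] = {
    "EUR": ["euro", "eurozone", "ECB"],
    "USD": ["dollar", "United States", "Federal Reserve"],
    "GBP": ["pound", "United Kingdom", "Bank of England"],
    "JPY": ["yen", "Japan", "Bank of Japan"],
    "XAU": ["gold"],
}
MACRO_TERMS = ["CPI", "inflation", "central bank", "interest rates", "oil", "gold"]


def build_newsapi_query(base: str, quote: str) -> str:
    """Join pair strings, economy labels, and macro terms into one OR query."""
    terms = [f"{base}{quote}", f"{base}/{quote}"]
    terms += ECONOMY_LABELS.get(base, []) + ECONOMY_LABELS.get(quote, [])
    terms += MACRO_TERMS
    unique: List[str] = []
    seen = set()
    for term in terms:
        if term.lower() not in seen:  # drop repeats such as "gold" for XAU pairs
            seen.add(term.lower())
            unique.append(term)
    # Quote anything that is not a single alphanumeric word so it matches as a phrase.
    return " OR ".join(t if t.isalnum() else f'"{t}"' for t in unique)


def build_newsapi_params(base: str, quote: str, api_key: str) -> Dict[str, str]:
    """Query parameters for the `everything` endpoint (no request is sent here)."""
    return {
        "q": build_newsapi_query(base, quote),
        "language": "en",
        "sortBy": "publishedAt",
        "apiKey": api_key,
    }
```

Keeping the query builder separate from the HTTP call makes it easy to inspect what a given pair will actually search for.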

Processing order (logic only)

Normalize the symbol so base and quote currencies are unambiguous (broker suffixes stripped).
Cache: if a complete analysis for this pair was already stored within the last hour, return that snapshot and stop (avoids re-hitting APIs and models on every run).
Ingest NewsAPI and RSS into one pool; normalize times to ISO timestamps plus a human-readable pubDate for the models.
Dedupe by near-duplicate titles so the same story is not double-counted.
Micro-sentiment (LLM, optional): for the top Investing rows, assign bullish / bearish / neutral for this pair (long base vs quote).
Relevance filter: each article gets a relevance score; only articles at or above a minimum enter the set used for heuristics and for the big macro prompt.
Heuristic aggregate: from that relevant set, compute a numeric score (the weighted directional sum), a discrete bias (bullish / bearish / neutral), and a confidence in that bias.
Macro analyst (LLM, optional): one JSON with pair-level buy / sell / neutral, text, drivers, and risks; feeds narrative and a parallel “market” confidence.
Themes and decay: classify themes (rates, inflation, etc.), apply age decay so old headlines matter less in macro fields.
Persist a bounded snapshot and refresh the hourly cache for the pair.
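
The first and fourth steps above (symbol normalization and near-duplicate removal) can be sketched as follows. This is an illustrative sketch under stated assumptions: the set of known codes, the 0.9 similarity threshold, the use of `difflib.SequenceMatcher`, and the function names are all hypothetical choices, not details confirmed by this note.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Assumed currency universe; extend as needed.
KNOWN_CODES = {"USD", "EUR", "GBP", "JPY", "CHF", "CAD", "AUD", "NZD", "ZAR", "XAU"}


def normalize_symbol(raw: str) -> Tuple[str, str]:
    """Strip separators and broker suffixes, then split into (base, quote)."""
    letters = re.sub(r"[^A-Za-z]", "", raw).upper()
    base, quote = letters[:3], letters[3:6]
    if base not in KNOWN_CODES or quote not in KNOWN_CODES:
        raise ValueError(f"cannot normalize symbol: {raw!r}")
    return base, quote


def _title_key(title: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def dedupe_by_title(articles: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Keep the first article of each group of near-identical titles."""
    kept: List[Dict] = []
    kept_keys: List[str] = []
    for article in articles:
        key = _title_key(article["title"])
        if any(SequenceMatcher(None, key, k).ratio() >= threshold for k in kept_keys):
            continue  # near-duplicate of a title already kept
        kept.append(article)
        kept_keys.append(key)
    return kept
```

The pairwise comparison is quadratic in the number of titles, which is acceptable for a feed of a few dozen headlines but would need a cheaper key (for example hashing the normalized title) at larger volumes.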

If the feed is empty, a last-known-good snapshot may be returned so the rest of the system is not left with a void.

How “relevance” is scored (per article, before the main news score)

Relevance answers: is this article worth treating as about this market? It is not the same as bullish/bearish; it is “does this apply to the symbol’s story?”

Currency language: a dictionary maps each standard currency (USD, GBP, EUR, JPY, etc., plus XAU, ZAR, …) to aliases in text—e.g. dollar, greenback, pound, sterling—and adds a score when those appear. That catches headlines that do not use wire codes.
Cross-asset hooks: short lists link broad themes to legs—e.g. Treasury / yield language bumps relevance when the pair includes USD; risk-off / safe-haven language bumps JPY and CHF legs; oil / OPEC style language bumps CAD as the commodity-linked leg, and so on.
Symbol and geography: the exact pair string, full country/region names used in the query, and a shared set of macro keyword terms add further weight; obvious non-macro noise (e.g. entertainment) is penalized.
Threshold: only articles with a total relevance at or above a cut-off are kept for scoring and for the main LLM context. The rest are still in the download counts but are not used for directional stacking.

So: relevance = “about this world”; the next section is “lean which way for this pair.”
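
To make the relevance rules above concrete, here is a minimal sketch. All point values, the term lists, and the cut-off of 2.0 are assumptions chosen for illustration; this note does not state the real weights or threshold.

```python
import re
from typing import Dict, List

# Hypothetical alias dictionary, cross-asset hooks, and term lists.
CURRENCY_ALIASES: Dict[str, List[str]] = {
    "USD": ["usd", "dollar", "greenback"],
    "GBP": ["gbp", "pound", "sterling"],
    "EUR": ["eur", "euro"],
    "JPY": ["jpy", "yen"],
    "CHF": ["chf", "franc"],
    "CAD": ["cad", "loonie"],
    "XAU": ["xau", "gold", "bullion"],
}
CROSS_ASSET_HOOKS: Dict[str, List[str]] = {
    "USD": ["treasury", "treasuries", "yield", "yields"],
    "JPY": ["risk-off", "safe-haven", "safe haven"],
    "CHF": ["risk-off", "safe-haven", "safe haven"],
    "CAD": ["oil", "opec", "crude"],
}
MACRO_TERMS = ["cpi", "inflation", "central bank", "interest rates", "rate hike", "rate cut"]
NOISE_TERMS = ["celebrity", "box office", "movie", "album"]
MIN_RELEVANCE = 2.0  # assumed cut-off


def _has(text: str, term: str) -> bool:
    """Whole-word (or whole-phrase) match on already-lowercased text."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None


def relevance_score(text: str, base: str, quote: str) -> float:
    low = text.lower()
    score = 0.0
    if f"{base}{quote}".lower() in low or f"{base}/{quote}".lower() in low:
        score += 3.0  # exact pair string
    for leg in (base, quote):
        if any(_has(low, a) for a in CURRENCY_ALIASES.get(leg, [])):
            score += 2.0  # currency language for this leg
        if any(_has(low, h) for h in CROSS_ASSET_HOOKS.get(leg, [])):
            score += 1.0  # cross-asset theme linked to this leg
    score += 0.5 * sum(_has(low, m) for m in MACRO_TERMS)
    score -= 3.0 * sum(_has(low, n) for n in NOISE_TERMS)
    return score


def is_relevant(text: str, base: str, quote: str) -> bool:
    return relevance_score(text, base, quote) >= MIN_RELEVANCE
```

With these assumed weights, "Sterling slides as Treasury yields jump after hot CPI" scores 3.5 for GBP/USD (alias 2.0, USD hook 1.0, macro term 0.5) and passes, while a headline that mentions "pound" next to "celebrity" is pushed below the cut-off by the noise penalty.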

Impact tiers (importance, not direction)

Each article is given an impact label of high, medium, or low, usually judged from the headline plus blurb. The label is driven by rules (central bank, inflation, key data, broad macro vs. softer stories) and is used in two ways:

For the heuristic news score, impact maps to fixed weights (e.g. high = strongest weight, low = weakest) when summing directional contributions.
For macro theme / decay fields, the same idea is combined with time since publish so stale items fade.

Direction does not come from the impact label alone; it comes from sentiment (below).
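
A possible shape for the impact tiers and the age decay is sketched below. The keyword lists, the 3 / 2 / 1 weights, and the exponential decay with a 12-hour half-life are assumptions made for the example; the note only says that weights are fixed per tier and that stale items fade.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List

IMPACT_WEIGHTS = {"high": 3.0, "medium": 2.0, "low": 1.0}  # assumed values
HIGH_TERMS = ["central bank", "rate decision", "inflation", "cpi", "payrolls"]
MEDIUM_TERMS = ["gdp", "pmi", "retail sales", "trade balance"]
HALF_LIFE_HOURS = 12.0  # assumed: weight halves every 12 hours


def _mentions(text: str, terms: List[str]) -> bool:
    """True if any term appears as a whole word or phrase in lowercased text."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def classify_impact(text: str) -> str:
    low = text.lower()
    if _mentions(low, HIGH_TERMS):
        return "high"
    if _mentions(low, MEDIUM_TERMS):
        return "medium"
    return "low"


def decayed_weight(impact: str, published: datetime, now: datetime) -> float:
    """Impact weight scaled down exponentially with article age."""
    age_hours = max(0.0, (now - published).total_seconds() / 3600.0)
    return IMPACT_WEIGHTS[impact] * 0.5 ** (age_hours / HALF_LIFE_HOURS)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = decayed_weight("high", now, now)                              # 3.0
half_day_old = decayed_weight("high", now - timedelta(hours=12), now)  # 1.5
```

In this sketch the undecayed weight would feed the heuristic news score and the decayed weight would feed the macro theme fields, matching the two uses listed above.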

How direction is assigned per article (before aggregation)

The pipeline needs one effective label per article: bullish, bearish, or neutral for the traded pair (long the pair = long base, short quote).

USD-centric headlines: if the text is clearly about the US dollar moving up or down (typical “dollar firms / dollar slips” language), the system first classifies a dollar up / dollar down story, then maps that to the pair:
- If USD is the quote (e.g. EUR/USD, GBP/USD, XAU/USD), a stronger dollar is generally worse for a long in that pair, and vice versa.
- If USD is the base (e.g. USD/JPY), the mapping flips to match how the pair is quoted.
If that USD shortcut does not apply, the pipeline uses, in order: the per-headline LLM label when present, otherwise a small lexical check (positive vs. negative word lists) to decide between neutral and a directional tilt.

Those per-article bullish / bearish / neutral labels are the inputs to the weighted sum, not the raw RSS title alone.
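
The fallback order described above can be sketched like this. The regular expressions and word lists are hypothetical stand-ins, and the function names are invented for the example; only the order of checks and the base/quote flip come from this note.

```python
import re
from typing import Optional

_USD = r"\b(?:dollar|greenback|usd)\b"
USD_UP = re.compile(_USD + r".*\b(?:firms|rises|gains|strengthens|climbs|rallies)\b")
USD_DOWN = re.compile(_USD + r".*\b(?:slips|falls|weakens|drops|slides|retreats)\b")
POSITIVE = {"surges", "gains", "rallies", "strong", "upbeat", "beats"}
NEGATIVE = {"slumps", "falls", "plunges", "weak", "downbeat", "misses"}


def usd_move(text: str) -> Optional[str]:
    """Return 'up' or 'down' for a clear dollar-move headline, else None."""
    low = text.lower()
    up, down = bool(USD_UP.search(low)), bool(USD_DOWN.search(low))
    if up == down:  # no match, or contradictory matches
        return None
    return "up" if up else "down"


def direction_for_pair(text: str, base: str, quote: str,
                       llm_label: Optional[str] = None) -> str:
    # 1) USD shortcut: a stronger dollar is bullish only when USD is the base.
    move = usd_move(text)
    if move is not None and "USD" in (base, quote):
        bullish = (move == "up") == (base == "USD")
        return "bullish" if bullish else "bearish"
    # 2) Per-headline LLM label, when one was produced.
    if llm_label in ("bullish", "bearish", "neutral"):
        return llm_label
    # 3) Lexical fallback: compare positive and negative word hits.
    words = set(re.findall(r"[a-z]+", text.lower()))
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "bullish"
    if neg > pos:
        return "bearish"
    return "neutral"
```

For example, "Dollar firms as yields climb" maps to bearish for EUR/USD and bullish for USD/JPY, because the same dollar move sits on opposite sides of the two quotes.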

Heuristic news score, news bias, and news confidence (aggregate)

From the relevant articles only:

For each article, take its impact weight and multiply by +1 (bullish for the pair), -1 (bearish), or 0 (neutral).
Sum those contributions into a single heuristic number (can be positive, negative, or near zero).
News bias:
- If the sum is close to zero, the bias is neutral and news confidence (for that directional read) is treated as zero so downstream logic can ignore direction.
- If the sum is clearly positive or negative beyond a small band, the bias is bullish or bearish for the pair, and confidence rises with the strength of the sum and with how many lines contributed, capped to the range [0, 1].

This triplet (number + bias + confidence) is what orchestration is designed to use as a macro filter (boost vs conflict vs ignore), separate from a single “headline said buy” rule.
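
The aggregation steps above reduce to a short weighted sum. In the sketch below, the impact weights, the neutral band of 1.0, and the 70/30 blend of strength and breadth inside the confidence are all assumed values; the note fixes only the sign convention, the neutral case with zero confidence, and the cap to [0, 1].

```python
from typing import Dict, List, Tuple

IMPACT_WEIGHTS = {"high": 3.0, "medium": 2.0, "low": 1.0}  # assumed values
SIGN = {"bullish": 1, "bearish": -1, "neutral": 0}
NEUTRAL_BAND = 1.0    # assumed: |sum| at or below this counts as neutral
FULL_STRENGTH = 10.0  # assumed: |sum| at which the strength term saturates
FULL_COUNT = 5        # assumed: contributing articles at which breadth saturates


def aggregate(articles: List[Dict[str, str]]) -> Tuple[float, str, float]:
    """Return (score, bias, confidence) from per-article impact and direction."""
    score = sum(IMPACT_WEIGHTS[a["impact"]] * SIGN[a["direction"]] for a in articles)
    if abs(score) <= NEUTRAL_BAND:
        return score, "neutral", 0.0  # direction is ignorable downstream
    bias = "bullish" if score > 0 else "bearish"
    contributors = sum(1 for a in articles if SIGN[a["direction"]] != 0)
    strength = min(abs(score) / FULL_STRENGTH, 1.0)
    breadth = min(contributors / FULL_COUNT, 1.0)
    confidence = min(max(0.7 * strength + 0.3 * breadth, 0.0), 1.0)
    return score, bias, confidence


sample = [
    {"impact": "high", "direction": "bearish"},
    {"impact": "medium", "direction": "bearish"},
    {"impact": "low", "direction": "bullish"},
    {"impact": "medium", "direction": "neutral"},
]
score, bias, confidence = aggregate(sample)  # score -4.0, bias "bearish"
```

In the sample, the contributions are -3, -2, +1, and 0, so the sum is -4; three articles carry a direction, which gives a confidence of about 0.46 under the assumed blend.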

Macro LLM output (narrative layer)

A second model call (when enabled) takes a compact list of the same relevant items (titles, times, source, and the micro-sentiment labels) and returns strict JSON: whether to lean buy / sell / hold the pair in words, a confidence, a short analysis, and lists of drivers and risks. The model is instructed to respect time fields for recency. If the model omits list fields, the implementation may backfill driver lines from headlines so the response is not empty.
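
A defensive parser for that JSON might look like the sketch below. The field names (`direction`, `confidence`, `summary`, `drivers`, `risks`) and the limit of three backfilled drivers are assumptions, since this note does not give the exact schema. Because the note uses both "neutral" and "hold" for the non-directional case, the sketch treats them as the same value.

```python
import json
from typing import Any, Dict, List

ALLOWED_DIRECTIONS = {"buy", "sell", "neutral"}


def parse_macro_response(raw: str, headlines: List[str],
                         max_drivers: int = 3) -> Dict[str, Any]:
    """Validate the model's JSON and backfill drivers from headlines if missing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    direction = str(data.get("direction", "neutral")).strip().lower()
    if direction == "hold":
        direction = "neutral"
    if direction not in ALLOWED_DIRECTIONS:
        direction = "neutral"

    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    confidence = min(max(confidence, 0.0), 1.0)  # clamp out-of-range values

    drivers = data.get("drivers")
    if not isinstance(drivers, list) or not drivers:
        drivers = list(headlines[:max_drivers])  # backfill so the list is not empty
    risks = data.get("risks")
    if not isinstance(risks, list):
        risks = []

    return {
        "direction": direction,
        "confidence": confidence,
        "summary": str(data.get("summary", "")),
        "drivers": drivers,
        "risks": risks,
    }
```

Falling back to neutral with zero confidence on malformed output keeps a bad model response from overriding the rule-based read.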

Macro bias / macro confidence and active themes are derived in parallel from the same article pool and decay rules; they give a second, theme-oriented read alongside the heuristic triplet.

Caching behavior (conceptual)

Hourly, per pair: a full response for a symbol is stored and reused for roughly one hour so run cycles do not re-query APIs and models every time.
A shorter in-process memory can avoid re-reading the store within the same process window.
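
The hourly reuse can be sketched with a small time-to-live cache. Here a plain dictionary stands in for whatever store the deployment uses, and the class and function names are hypothetical; the injectable clock exists only so the expiry can be exercised without waiting an hour.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class HourlyCache:
    """Per-pair snapshot cache with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, pair: str) -> Optional[Any]:
        entry = self._store.get(pair)
        if entry is None:
            return None
        stored_at, snapshot = entry
        if self._clock() - stored_at >= self._ttl:
            return None  # expired: caller should recompute
        return snapshot

    def put(self, pair: str, snapshot: Any) -> None:
        self._store[pair] = (self._clock(), snapshot)


def analyze_with_cache(pair: str, cache: HourlyCache,
                       run_pipeline: Callable[[str], Any]) -> Any:
    """Return a fresh-enough snapshot, or run the pipeline and store the result."""
    cached = cache.get(pair)
    if cached is not None:
        return cached
    snapshot = run_pipeline(pair)
    cache.put(pair, snapshot)
    return snapshot
```

Within the hour, repeated calls for the same pair return the stored snapshot and never touch the news APIs or models.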

What the response represents (summary)

Counts show how many articles were pulled vs how many passed relevance.
Heuristic fields express a data-driven, pair-aware lean from many headlines, not a single story.
LLM fields add a human-readable story and a parallel pair direction; they supplement the heuristic, especially when rules are thin.
Operator-only note lists stay empty until someone injects them through your product’s manual path—they are not populated by the news APIs above.

How this is meant to be used downstream

The heuristic bias and confidence are the primary levers to align or disagree with a technical trade idea: nudge confidence when they agree, pull it down (or block in strong setups) when they fight. The agent’s job is context and filtering, not signal generation on its own.
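
One way a downstream consumer could apply the bias and confidence as a filter is sketched below. The boost, penalty, and block threshold are invented numbers, and reading "strong setups" as a strong news conflict is an assumption of this sketch rather than something the note specifies.

```python
from typing import Tuple


def apply_macro_filter(trade_side: str, trade_conf: float,
                       news_bias: str, news_conf: float,
                       boost: float = 0.1, penalty: float = 0.2,
                       block_at: float = 0.8) -> Tuple[float, bool]:
    """Return (adjusted_confidence, blocked) for a technical trade idea."""
    if trade_side not in ("buy", "sell"):
        raise ValueError(f"unknown trade side: {trade_side!r}")
    if news_bias == "neutral" or news_conf <= 0.0:
        return trade_conf, False  # no directional read: leave the idea alone
    agrees = (trade_side == "buy") == (news_bias == "bullish")
    if agrees:
        return min(1.0, trade_conf + boost * news_conf), False
    if news_conf >= block_at:
        return trade_conf, True  # assumed rule: strong conflict blocks the trade
    return max(0.0, trade_conf - penalty * news_conf), False
```

The function never creates a trade by itself; it only adjusts or vetoes an idea that already exists, which matches the agent's role as context rather than signal.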