[{"content":"One of the core tasks in quantitative investing is mining alpha factors — finding signals that predict asset returns. The traditional approach relies on researchers manually constructing factor expressions, or using automated search methods like Genetic Programming (GP) to brute-force combinations in the operator space. The former depends on human experience and intuition — low efficiency but high interpretability. The latter is efficient but produces deeply nested operator expressions that are nearly impossible for researchers to interpret.\nAlphaGPT (paper) brings large language models into the factor mining pipeline, using an LLM as the factor \u0026ldquo;generator.\u0026rdquo; The follow-up work, AlphaGPT 2.0 (paper), further introduces a human-in-the-loop closed cycle.\nThe Problem with Traditional Factor Mining A quantitative factor is essentially a mathematical expression. It takes market data as input — prices, volumes, financial metrics — and outputs a numerical score for ranking stocks or constructing portfolios. Classic examples: momentum factor is just returns over the past N days, value factor might be the inverse of P/E ratio.\nSimple factors have been thoroughly mined. Finding new effective factors typically requires more complex expressions — combining multiple base operators (rank, std, corr, delay, etc.). The search space grows exponentially, making manual exploration impractical.\nGenetic Programming can search this space automatically, but faces two problems. First, search efficiency is low — massive computational resources are spent evaluating meaningless expressions. Second, the output factors lack interpretability. An expression like rank(corr(delay(close, 5), volume, 10)) might backtest well, but researchers can\u0026rsquo;t articulate the economic logic behind it. 
Factors without economic justification carry high overfitting risk.\nAlphaGPT\u0026rsquo;s Core Design\nAlphaGPT replaces or augments the factor generation step in the GP pipeline with an LLM. The process starts by describing the factor mining task to the LLM in natural language — available operators, data fields, and syntax rules for factor expressions. This effectively gives the LLM a \u0026ldquo;factor DSL\u0026rdquo; specification.\nThe LLM then generates candidate factor expressions. It has seen vast amounts of financial literature and quantitative code during pretraining, giving it some intuition about what factors might work. Its output tends to be more structured than random search, and more interpretable.\nGenerated factors are backtested to compute metrics like IC (Information Coefficient — correlation between factor values and future returns) and IR (Information Ratio). The backtest results are fed back to the LLM, allowing it to adjust direction in the next generation round — creating a generate-evaluate-feedback iterative loop.\nThe key insight: LLM-generated factors carry semantic meaning. They\u0026rsquo;re not blind operator combinations, but expressions constructed from an understanding of financial concepts. 
For example, the LLM might generate a \u0026ldquo;volume-weighted price deviation\u0026rdquo; factor — something with an inherently interpretable economic rationale.\nAlphaGPT 2.0: Human-AI Collaboration\nAlphaGPT 2.0 introduces human researchers into the loop, creating a Human-in-the-Loop closed cycle:\n+-------------------+\n|  Research Ideas   |  \u0026lt;-- Human researcher provides direction\n+-------------------+\n          |\n          v\n+-------------------+\n|  LLM Generates    |  \u0026lt;-- LLM generates candidate factors\n|  Alpha Factors    |\n+-------------------+\n          |\n          v\n+-------------------+\n|  Backtest \u0026amp;        |  \u0026lt;-- Automated backtest evaluation\n|  Evaluation       |\n+-------------------+\n          |\n          v\n+-------------------+\n|  Human Review \u0026amp;    |  \u0026lt;-- Researcher reviews, filters,\n|  Feedback         |      provides new direction\n+-------------------+\n          |\n          +-----------\u0026gt; Next iteration\nResearchers can intervene at several points: providing initial research directions (e.g., \u0026ldquo;explore the relationship between volume anomalies and short-term reversal\u0026rdquo;), selecting factors with plausible economic logic from candidates, and offering qualitative judgments on backtest results (e.g., \u0026ldquo;this factor\u0026rsquo;s outperformance in small caps might be a liquidity illusion\u0026rdquo;).\nThe value of this design lies in combining strengths from both sides: the LLM\u0026rsquo;s search efficiency and breadth, plus human researchers\u0026rsquo; domain knowledge and judgment. Pure LLM approaches tend to produce many \u0026ldquo;statistically significant but economically meaningless\u0026rdquo; factors. Pure manual research is too slow. 
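The closed cycle above reduces to a short control loop. A minimal sketch with the LLM call, backtester, and review step all stubbed out — every function and name below is a placeholder for illustration, not the paper's actual interface:

```python
import random

random.seed(7)  # reproducible toy run

def llm_generate(prompt: str, n: int) -> list[str]:
    """Placeholder for the LLM call: return n candidate factor expressions."""
    pool = ["rank(corr(delay(close, 5), volume, 10))",
            "rank(close / delay(close, 20))",
            "-rank(std(returns, 10))"]
    return random.sample(pool, k=min(n, len(pool)))

def backtest_ic(expr: str) -> float:
    """Placeholder backtest: return the factor's information coefficient."""
    return random.uniform(-0.05, 0.10)

def human_review(expr: str, ic: float) -> bool:
    """Placeholder researcher veto: keep factors above an IC floor
    (a real reviewer also judges the economic story)."""
    return ic > 0.02

accepted = []
feedback = "Explore volume anomalies and short-term reversal."
for round_ in range(3):
    candidates = llm_generate(feedback, n=3)        # generate
    results = [(e, backtest_ic(e)) for e in candidates]  # evaluate
    accepted += [(e, ic) for e, ic in results if human_review(e, ic)]
    # Backtest results would be folded into the next prompt here (feedback)
    feedback = f"Round {round_}: best IC so far {max(ic for _, ic in results):.3f}"

print(f"{len(accepted)} factors survived review")
```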
Human-AI collaboration strikes a balance.\nResults and Limitations\nBased on the paper\u0026rsquo;s experimental results, AlphaGPT produces factors with higher average IC and IR than traditional GP methods on the Chinese A-share market, with better interpretability.\nBut this direction has clear limitations.\nThe LLM\u0026rsquo;s financial knowledge comes from pretraining data, which is inherently lagging. Markets evolve dynamically — factor logic that worked last year may have decayed this year. LLMs can\u0026rsquo;t perceive changes in market microstructure the way human researchers can.\nThe search space definition — available operators, data fields — still requires human design. The LLM only searches within a given space; it won\u0026rsquo;t invent new operators or discover new data sources. True alpha innovation often comes from finding data nobody else has looked at, not from finding more complex combinations of the same data.\nAdditionally, LLM-generated factors may have high mutual correlation. Without proper deduplication and orthogonalization, combining these factors won\u0026rsquo;t provide additional information gain.\nImplications for Quantitative Research\nThe direction AlphaGPT represents is essentially using LLMs as a quantitative researcher\u0026rsquo;s copilot. It won\u0026rsquo;t replace researchers, but can significantly accelerate hypothesis generation and preliminary validation. Researchers can focus on higher-value work: assessing economic logic, designing portfolio construction schemes, monitoring factor decay.\nFrom a broader perspective, LLM applications in quantitative finance extend beyond factor mining. Sentiment analysis, event-driven signal extraction, automated research report summarization, code generation for backtesting — all of these directions are seeing active exploration. 
AlphaGPT\u0026rsquo;s contribution is providing a relatively complete framework for LLM-assisted factor mining, establishing a reference baseline for future work.\nThat said, alpha is a zero-sum game. When everyone is using LLMs to mine factors, how long these factors remain effective is a question worth watching.\n","permalink":"https://coriva.eu.org/en/alphagpt-paper-review/","summary":"\u003cp\u003eOne of the core tasks in quantitative investing is mining alpha factors — finding signals that predict asset returns. The traditional approach relies on researchers manually constructing factor expressions, or using automated search methods like Genetic Programming (GP) to brute-force combinations in the operator space. The former depends on human experience and intuition — low efficiency but high interpretability. The latter is efficient but produces deeply nested operator expressions that are nearly impossible for researchers to interpret.\u003c/p\u003e\n\u003cp\u003eAlphaGPT (\u003ca href=\"https://arxiv.org/abs/2308.00016\"\u003epaper\u003c/a\u003e) brings large language models into the factor mining pipeline, using an LLM as the factor \u0026ldquo;generator.\u0026rdquo; The follow-up work, AlphaGPT 2.0 (\u003ca href=\"https://arxiv.org/abs/2402.09746\"\u003epaper\u003c/a\u003e), further introduces a human-in-the-loop closed cycle.\u003c/p\u003e","title":"AlphaGPT: Mining Quantitative Factors with LLMs"},{"content":"The worst part about quantitative trading isn\u0026rsquo;t having a bad strategy. It\u0026rsquo;s not knowing whether your strategy is good or bad. A strategy with 30% annualized returns sounds great, until you realize the max drawdown was 60% — you\u0026rsquo;d never have held through it. A Sharpe ratio of 2.0 looks impressive, but if it\u0026rsquo;s propped up by a few windfall trades in extreme market conditions, the Sortino ratio will tell a very different story.\nMetrics aren\u0026rsquo;t decorations for backtesting reports. 
They\u0026rsquo;re the tools that help you decide whether a strategy is worth putting real money behind.\nReturn Metrics\nReturn metrics answer the fundamental question: how much money does this strategy make?\nAnnualized return is the most basic metric. It converts returns from any time period into a yearly figure for easy comparison. The calculation uses compound returns:\n$$R_{annual} = (1 + R_{total})^{252/n} - 1$$\nHere $n$ is the number of trading days, and 252 is the standard number of trading days per year. Using compound returns rather than simple division matters — simple division overstates returns for longer periods.\nAlpha measures the portion of returns that beats the benchmark. More precisely, alpha is the excess return after stripping out market risk exposure (Beta):\n$$\\alpha = R_p - [R_f + \\beta \\times (R_m - R_f)]$$\n$R_p$ is the strategy return, $R_f$ is the risk-free rate, $R_m$ is the market return. Positive alpha means the strategy generates genuine edge, not just riding the market.\nBeta describes how sensitive the strategy is to market movements. Beta = 1 means the strategy moves in lockstep with the market. Beta = 0.5 means it captures half the market\u0026rsquo;s movement. Market-neutral strategies aim for beta near zero, while long-only strategies typically run 0.8-1.2. Beta isn\u0026rsquo;t inherently good or bad — you just need to know what risk you\u0026rsquo;re taking.\nRisk Metrics\nReturn metrics tell you how much you earned. Risk metrics tell you what you endured to earn it.\nVolatility is the most common risk measure, typically the annualized standard deviation of daily returns:\n$$\\sigma_{annual} = \\sigma_{daily} \\times \\sqrt{252}$$\nHigh volatility isn\u0026rsquo;t necessarily bad — it depends on your strategy type. But for most investors, annualized volatility above 25% means the account will regularly show drawdowns that are hard to stomach.\nMaximum drawdown is the largest peak-to-trough decline. 
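These definitions translate almost line-for-line into pandas. A minimal sketch, assuming `returns` holds daily simple returns (function names and defaults are illustrative, not from any particular library):

```python
import numpy as np
import pandas as pd

def annualized_return(returns: pd.Series, periods: int = 252) -> float:
    """Compound the daily returns, then annualize: (1 + R_total)^(252/n) - 1."""
    total = (1 + returns).prod() - 1
    return (1 + total) ** (periods / len(returns)) - 1

def alpha_beta(strategy: pd.Series, market: pd.Series, rf_daily: float = 0.0):
    """CAPM regression of excess strategy returns on excess market returns."""
    excess_s = strategy - rf_daily
    excess_m = market - rf_daily
    beta = excess_s.cov(excess_m) / excess_m.var()
    alpha = excess_s.mean() - beta * excess_m.mean()  # daily alpha
    return alpha, beta

def annualized_volatility(returns: pd.Series, periods: int = 252) -> float:
    """Daily standard deviation scaled by sqrt(252)."""
    return returns.std() * np.sqrt(periods)

def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the equity curve, as a positive fraction."""
    equity = (1 + returns).cumprod()
    peak = equity.cummax()
    return ((peak - equity) / peak).max()
```

A quick sanity check: a series that gains 10%, loses 50%, then gains 20% peaks at 1.10 and troughs at 0.55, so `max_drawdown` returns 0.5.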
It\u0026rsquo;s the most intuitive \u0026ldquo;pain metric,\u0026rdquo; answering: what\u0026rsquo;s the worst it gets?\n$$MDD = \\max_{t} \\left(\\frac{Peak_t - Trough_t}{Peak_t}\\right)$$\nIn practice, max drawdown under 20% is manageable, 20-40% is high risk, and above 40%, almost nobody runs the strategy live. Not because the strategy is broken, but because human psychology can\u0026rsquo;t handle it — deep drawdowns lead to panic exits at the worst possible time.\nVaR (Value at Risk) answers: at a given confidence level, what\u0026rsquo;s the most I can lose? For example, \u0026ldquo;95% VaR = -2%\u0026rdquo; means there\u0026rsquo;s a 95% probability that the daily loss won\u0026rsquo;t exceed 2%. The catch is VaR doesn\u0026rsquo;t tell you what happens in that other 5%, which is why it\u0026rsquo;s often paired with CVaR (Conditional VaR, also called Expected Shortfall) — the average loss in scenarios that exceed the VaR threshold.\nRisk-Adjusted Return Metrics\nLooking at returns or risk alone isn\u0026rsquo;t enough. Risk-adjusted metrics combine both, answering: how much return per unit of risk?\nSharpe ratio is the classic:\n$$Sharpe = \\frac{R_p - R_f}{\\sigma_p}$$\nNumerator is excess return, denominator is volatility. Sharpe above 1.0 is decent, above 2.0 is excellent, above 3.0 is either genius or overfitting. The problem with Sharpe is that it penalizes upside and downside volatility equally — but for investors, upward volatility isn\u0026rsquo;t risk.\nSortino ratio fixes this by using only downside volatility:\n$$Sortino = \\frac{R_p - R_f}{\\sigma_{downside}}$$\nIt only penalizes volatility in the loss direction. If a strategy has right-skewed returns (occasional big wins), Sortino will be significantly higher than Sharpe. 
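The tail and ratio metrics above admit an equally short sketch: historical-simulation VaR/CVaR and a toy downside deviation for Sortino (names and the annualization convention are illustrative; `rf` is a per-period risk-free rate):

```python
import numpy as np
import pandas as pd

def var_cvar(returns: pd.Series, level: float = 0.95):
    """Historical VaR and CVaR (expected shortfall) at the given confidence."""
    var = returns.quantile(1 - level)      # e.g. the 5th-percentile daily return
    cvar = returns[returns <= var].mean()  # average loss beyond the VaR threshold
    return var, cvar

def sharpe(returns: pd.Series, rf: float = 0.0, periods: int = 252) -> float:
    """Annualized excess return per unit of total volatility."""
    excess = returns - rf
    return excess.mean() / excess.std() * np.sqrt(periods)

def sortino(returns: pd.Series, rf: float = 0.0, periods: int = 252) -> float:
    """Like Sharpe, but the denominator only counts downside deviation."""
    excess = returns - rf
    downside = np.sqrt((np.minimum(excess, 0.0) ** 2).mean())
    return excess.mean() / downside * np.sqrt(periods)
```

The two ratios differ only in the denominator: on a right-skewed series (many small losses, occasional large wins) the big upside moves inflate total volatility but not downside deviation, so `sortino` comes out well above `sharpe`.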
That\u0026rsquo;s a good sign — it means the volatility is coming from the profitable side.\nCalmar ratio uses maximum drawdown as the denominator:\n$$Calmar = \\frac{R_{annual}}{|MDD|}$$\nCalmar directly relates to the worst pain you\u0026rsquo;ll experience. A Calmar above 1 means annual returns exceed the max drawdown, which makes the strategy psychologically easier to hold. This metric is particularly valuable for medium to long-term strategy evaluation.\nInformation ratio is similar to Sharpe, but benchmarked against an index rather than the risk-free rate:\n$$IR = \\frac{R_p - R_{benchmark}}{\\sigma_{tracking}}$$\nThe denominator is tracking error — the volatility of the difference between strategy and benchmark returns. A high information ratio means the strategy is consistently beating the benchmark, not just getting lucky on a few days. Fund managers are typically evaluated on this metric.\nTrading Metrics\nThe metrics above focus on outcomes. Trading metrics focus on the process.\nWin rate is the proportion of profitable trades. Sounds simple, but high win rate doesn\u0026rsquo;t guarantee profits. A strategy with 90% win rate but where each loss is 10x the average win will lose money overall. Win rate must always be examined alongside the profit/loss ratio.\nProfit/loss ratio is the average win divided by the average loss:\n$$ProfitLossRatio = \\frac{AvgWin}{|AvgLoss|}$$\nTrend-following strategies typically have low win rates (30-40%) but high profit/loss ratios (3:1 or higher), relying on a few big wins to cover many small losses. Mean-reversion strategies work the opposite way — high win rate, low profit/loss ratio. Both approaches can be profitable. 
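Computing these process metrics from a trade list takes a few lines. A sketch using a made-up trend-following trade distribution (one 3R win for every two 1R losses; the per-trade pnl values are illustrative):

```python
import pandas as pd

def trade_stats(pnl: pd.Series) -> dict:
    """Win rate, profit/loss ratio, and the breakeven win rate they imply."""
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    win_rate = len(wins) / len(pnl)
    pl_ratio = wins.mean() / abs(losses.mean())
    return {
        "win_rate": win_rate,
        "pl_ratio": pl_ratio,
        # Zero expected value when win_rate * avg_win = (1 - win_rate) * avg_loss,
        # which rearranges to win_rate = 1 / (1 + pl_ratio)
        "breakeven_win_rate": 1 / (1 + pl_ratio),
    }

# A trend-follower: one-third winners, but each win is 3x the average loss
pnl = pd.Series([-1.0, -1.0, 3.0] * 20)
stats = trade_stats(pnl)
print(stats)
```

Here the 33% win rate sits above the 25% breakeven implied by the 3:1 profit/loss ratio, so the toy strategy is net profitable despite losing two trades out of three.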
The trap is trying to optimize for both high win rate and high profit/loss ratio at the same time — that\u0026rsquo;s almost always overfitting.\nThe relationship between win rate and profit/loss ratio has a simple breakeven formula:\n$$WinRate_{breakeven} = \\frac{1}{1 + ProfitLossRatio}$$\nA strategy with 2:1 profit/loss ratio only needs a win rate above 33.3% to be profitable. At 1:1, you need above 50%. This formula gives you a quick sanity check on whether a set of trading statistics makes sense.\nTurnover measures trading frequency, typically defined as total traded value divided by average portfolio value over a period. High turnover means high transaction costs and greater slippage impact. Many strategies that look great in backtests see their returns collapse once realistic commissions and slippage are factored in. Always run cost sensitivity analysis before going live.\nPutting It All Together\nNo single metric tells the full story. In practice, strategy evaluation uses them in combination.\nSharpe and Sortino reveal risk-adjusted performance. If Sharpe is decent but Sortino is significantly higher, the volatility is mostly on the upside — a good sign.\nMaximum drawdown and Calmar expose tail risk. A strategy with Sharpe 2.0 but 50% max drawdown? You probably can\u0026rsquo;t hold it.\nWin rate and profit/loss ratio together reveal the profit model. This determines your psychological state during losing streaks. Ten consecutive losses is normal for a 35% win-rate trend-following strategy. For an 80% win-rate mean-reversion strategy, it might signal the strategy is broken.\nAlpha and Beta clarify where returns come from. If Alpha is near zero and Beta near 1, your strategy is essentially just going long on the market. You\u0026rsquo;d be better off buying an index fund and saving yourself the effort and transaction costs.\nNo single metric can definitively judge a strategy. 
But when all the metrics point to the same conclusion, that conclusion is probably right.\n","permalink":"https://coriva.eu.org/en/quant-metrics-guide/","summary":"\u003cp\u003eThe worst part about quantitative trading isn\u0026rsquo;t having a bad strategy. It\u0026rsquo;s not knowing whether your strategy is good or bad. A strategy with 30% annualized returns sounds great, until you realize the max drawdown was 60% — you\u0026rsquo;d never have held through it. A Sharpe ratio of 2.0 looks impressive, but if it\u0026rsquo;s propped up by a few windfall trades in extreme market conditions, the Sortino ratio will tell a very different story.\u003c/p\u003e\n\u003cp\u003eMetrics aren\u0026rsquo;t decorations for backtesting reports. They\u0026rsquo;re the tools that help you decide whether a strategy is worth putting real money behind.\u003c/p\u003e","title":"A Complete Guide to Quantitative Trading Metrics"}]