Mispricing Detection in the S&P 500 ETF Options Market Based on an XGBoost Model
DOI:
https://doi.org/10.54097/bm3g3b60
Keywords:
Option pricing, mispricing identification, XGBoost, machine learning
Abstract
Using high-frequency data on S&P 500 index options (SPX) from the WRDS database, this paper builds an empirical framework for identifying option mispricing. First, taking the Black–Scholes model as the theoretical pricing benchmark and comparing it with actual market quotes, we define a relative mispricing measure and apply a quantile standardization strategy to construct mispricing labels in three categories: overpriced, underpriced, and fairly priced. We then build a mispricing identification model with an XGBoost multiclass classifier, trained on multidimensional features that include contract attributes, trading indicators, and volatility. In data preprocessing, we clean both the option data and the underlying data and align them strictly on a composite key of trading date and contract identifier, which keeps label construction accurate and consistent. Empirically, the model performs well in the out-of-sample test period (2023), with an overall accuracy of 77.6%. Precision and recall are particularly strong for the underpriced category, indicating that the model can effectively identify pricing deviations with potential arbitrage value. The results also point to some structural inefficiency in the options market. Although the Black–Scholes framework relies on idealized assumptions, it remains a useful reference for mispricing identification. This study verifies the feasibility of using machine learning to identify option mispricing and provides a methodological foundation for subsequent quantitative trading strategy design.
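The labeling step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not give the exact formula or cutoffs, so the following are assumptions made for illustration: relative mispricing is taken as (market price minus Black–Scholes price) divided by the Black–Scholes price, the class boundaries are the 25th and 75th percentiles, and the helper names `bs_price`, `relative_mispricing`, and `quantile_labels` are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(S: float, K: float, T: float, r: float, sigma: float,
             kind: str = "call") -> float:
    """Black-Scholes price of a European option on a non-dividend-paying asset."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if kind == "call":
        return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)


def relative_mispricing(market: float, model: float) -> float:
    """ASSUMED definition: signed deviation of the quote from the model price."""
    return (market - model) / model


def quantile_labels(values, lower: float = 0.25, upper: float = 0.75):
    """Label each value by its position relative to two sample quantiles.

    The 25%/75% cutoffs are illustrative assumptions, not the paper's values.
    """
    ordered = sorted(values)
    n = len(ordered)

    def quantile(p: float) -> float:
        # Linear interpolation between the two nearest order statistics.
        pos = p * (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    lo_cut, hi_cut = quantile(lower), quantile(upper)
    labels = []
    for v in values:
        if v < lo_cut:
            labels.append("underpriced")   # quote well below the benchmark
        elif v > hi_cut:
            labels.append("overpriced")    # quote well above the benchmark
        else:
            labels.append("fair")
    return labels
```

Labels built this way, together with features such as contract attributes, trading indicators, and volatility, could then be passed to a multiclass classifier such as `xgboost.XGBClassifier`. To match the out-of-sample design in the abstract, the quantile cutoffs would need to be estimated on the training period only.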
License
Copyright (c) 2026 International Journal of Finance and Investment

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







