Machine Learning-Based Estimation of Oil Recovery Factor Using XGBoost: Insights from Classification and Data-Driven Analyses
DOI:
https://doi.org/10.69631/ipj.v2i3nr53
Keywords:
Classification, Machine learning, Oil recovery factor, Extreme gradient boost
Abstract
In petroleum engineering, it is essential to determine the ultimate recovery factor (RF), particularly before exploitation and exploration. However, accurately estimating RF requires data that may not necessarily be available or measured at the early stages of reservoir development. To address this, we applied machine learning (ML) to estimate oil RF from readily available features. To construct the ML models, we applied the XGBoost classification algorithm. Classification was chosen over regression because the recovery factor is bounded between 0 and 1, much like a probability. Three databases with various reservoir properties and recovery factors were used, leaving us with four different combinations to first train and test the ML models and then further evaluate them using an independent database of unseen data. Ten-fold cross-validation was applied to the training datasets to assess the effectiveness of the models. To evaluate the accuracy and reliability of the models, the accuracy, within-1 accuracy, precision, recall, macro-averaged F1 score, and R² were determined. Overall, the results showed that the XGBoost classification algorithm could estimate the RF class with accuracies as high as 0.77 on the training datasets, 0.36 on the testing datasets, and 0.24 on the independent databases. We found that the reliability of the XGBoost classification model depended on the data in the training dataset, indicating that the ML models were database dependent. The feature importance analysis and the Shapley Additive exPlanations (SHAP) approach showed that the most important features were reserves, reservoir area, and thickness.
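The step from a bounded recovery factor to ordinal classes, and the difference between accuracy and within-1 accuracy, can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: ten equal-width RF classes, and within-1 accuracy taken as the fraction of predictions that fall in the true class or an adjacent one. The function names and the sample values are hypothetical and are not taken from the study's databases.

```python
def rf_to_class(rf, n_classes=10):
    """Map a recovery factor in [0, 1] to a class index 0..n_classes-1.

    Assumption: equal-width bins; the abstract does not give the binning.
    """
    if not 0.0 <= rf <= 1.0:
        raise ValueError("recovery factor must lie in [0, 1]")
    # rf == 1.0 would fall into bin n_classes, so clamp it to the top class.
    return min(int(rf * n_classes), n_classes - 1)


def accuracy(y_true, y_pred):
    """Fraction of predictions that hit the exact class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def within_one_accuracy(y_true, y_pred):
    """Fraction of predictions at most one class away from the true class."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical recovery factors, for illustration only.
rf_true = [0.05, 0.32, 0.47, 0.61, 1.00]
rf_pred = [0.08, 0.41, 0.25, 0.66, 0.85]

y_true = [rf_to_class(v) for v in rf_true]
y_pred = [rf_to_class(v) for v in rf_pred]

print("true classes:     ", y_true)
print("predicted classes:", y_pred)
print("accuracy:         ", accuracy(y_true, y_pred))
print("within-1 accuracy:", within_one_accuracy(y_true, y_pred))
```

Because the classes are ordered, within-1 accuracy gives credit for near misses that plain accuracy counts as errors, which is why the two metrics can differ substantially on the same predictions.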
License
Copyright (c) 2025 Alireza Roustazadeh, Frank Male, Behzad Ghanbarian, Mohammad B. Shadmand, Vahid Taslimitehrani, Larry W. Lake

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Unless otherwise stated above, this is an open access article published by InterPore under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Article metadata are available under the CC0 license.