Machine Learning-Based Estimation of Oil Recovery Factor Using XGBoost: Insights from Classification and Data-Driven Analyses

DOI:

https://doi.org/10.69631/ipj.v2i3nr53

Keywords:

Classification, Machine learning, Oil recovery factor, Extreme gradient boost

Abstract

In petroleum engineering, it is essential to determine the ultimate recovery factor (RF), particularly before exploitation and exploration. However, accurately estimating RF requires data that may not necessarily be available or measured at the early stages of reservoir development. To address this, we applied machine learning (ML) to estimate oil RF from readily available features. To construct the ML models, we applied the XGBoost classification algorithm. Classification was chosen over regression because the recovery factor is bounded between 0 and 1, much like a probability. Three databases with various reservoir properties and recovery factors were used, yielding four different combinations with which to first train and test the ML models and then further evaluate them on an independent database of unseen data. Ten-fold cross-validation was applied to the training datasets to assess the effectiveness of the models. To evaluate the accuracy and reliability of the models, the accuracy, within-1 accuracy, precision, recall, macro-averaged F1 score, and R² were determined. Overall, the results showed that the XGBoost classification algorithm could estimate the RF class with accuracies as high as 0.77 on the training datasets, 0.36 on the testing datasets, and 0.24 on the independent databases. We found that the reliability of the XGBoost classification model depended on the data in the training dataset, indicating that the ML models were database dependent. The feature importance analysis and the SHapley Additive exPlanations (SHAP) approach showed that the most important features were reserves, reservoir area, and thickness.
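
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how a recovery factor could be converted to an ordinal class and how accuracy, within-1 accuracy, and the macro-averaged F1 score could be computed. The abstract does not state the number of classes or the bin edges, so the ten equal-width bins used here are an assumption, as are the function names (`rf_to_class`, `within_one_accuracy`, `macro_f1`) and the small example values, which are illustrative and not taken from the study.

```python
# Hypothetical sketch: the bin count, function names, and sample values
# below are assumptions for illustration, not the authors' implementation.

def rf_to_class(rf, n_classes=10):
    """Map a recovery factor in [0, 1] to an ordinal class 0..n_classes-1."""
    if not 0.0 <= rf <= 1.0:
        raise ValueError("recovery factor must lie in [0, 1]")
    # rf == 1.0 would land in bin n_classes, so clamp it into the last bin.
    return min(int(rf * n_classes), n_classes - 1)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def within_one_accuracy(y_true, y_pred):
    """Fraction of samples predicted at most one class away from the truth."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true positives contributes an F1 of zero.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative values only: "measured" RFs and made-up class predictions.
rf_values = [0.05, 0.12, 0.27, 0.33, 0.48]
y_true = [rf_to_class(rf) for rf in rf_values]
y_pred = [0, 2, 2, 5, 4]

print("accuracy:", accuracy(y_true, y_pred))
print("within-1 accuracy:", within_one_accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Within-1 accuracy is more forgiving than plain accuracy because it treats a prediction in an adjacent class as correct, which only makes sense when the classes are ordered, as binned recovery factors are.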

Published

2025-08-25

Section

Original Research Papers

How to Cite

Roustazadeh, A., Male, F., Ghanbarian, B., Shadmand, M. B., Taslimitehrani, V., & Lake, L. W. (2025). Machine Learning-Based Estimation of Oil Recovery Factor Using XGBoost: Insights from Classification and Data-Driven Analyses. InterPore Journal, 2(3), IPJ250825-4. https://doi.org/10.69631/ipj.v2i3nr53