ICD Coding Automation Model of Retinal Detachment Case Using Support Vector Machine and Random Forest

Keywords: Clinical Coding Automation, ICD, Machine Learning (ML), Natural Language Processing (NLP), Retinal Detachment

May 7, 2026

Health Information Management (HIM) professionals are responsible for maintaining the consistency of ICD-based clinical codes used for health reimbursement and health analytics, which they do by reviewing medical documentation. The complexity of coding rules and clinical pathways increases the risk of miscoding, but the implementation of Electronic Medical Records (EMR) opens opportunities to automate ICD coding. This study aims to build an ICD code automation model for retinal detachment cases from an eye referral hospital, using clinical text classification with Natural Language Processing (NLP) and Machine Learning (ML) algorithms. The dataset includes disease summaries, physical examinations, diagnoses, medical procedures, surgical records, and therapies from 300 inpatients. Text preprocessing uses the NLTK library for sentence splitting, abbreviation expansion, case folding, stop word removal, and tokenization. Data preparation involves splitting the data (80:20 ratio), feature extraction with a TF-IDF vectorizer, and 5-fold cross-validation. Classification modeling uses Support Vector Machine (SVM) and Random Forest (RF). Evaluation of the SVM model showed an accuracy of 0.82 (precision 0.84; recall 0.82; F1-score 0.82), while the RF model achieved an accuracy of 0.87 (precision 0.88; recall 0.87; F1-score 0.87). Based on the confusion matrices, the correct predictions for classes H33.0, H33.2, and H33.4 are 79, 87, and 80 with SVM, while RF reaches 83, 88, and 91. Developing this automation requires HIM professionals to ensure the quality of EMR data and the accuracy of ICD codes, as well as intensive model training to handle the complexity of clinical data.
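
The preprocessing steps named in the abstract can be illustrated with a minimal sketch. The study uses NLTK, but this version relies only on the Python standard library so that it runs without extra downloads. The abbreviation map, the stop-word list, the sample note, and the function names `split_sentences` and `preprocess` are all hypothetical, chosen for illustration; the study's actual resources are not given in the abstract. The sketch also applies case folding before the abbreviation lookup so that the map can use lowercase keys, which may differ from the order used in the study.

```python
import re

# Hypothetical abbreviation map and stop-word list, for illustration only;
# the lists used in the study are not given in the abstract.
ABBREVIATIONS = {"rd": "retinal detachment", "od": "right eye", "os": "left eye"}
STOP_WORDS = {"the", "of", "in", "with", "and", "a", "was", "is"}


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (a crude stand-in for NLTK)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(text):
    """Return a flat list of cleaned tokens for one clinical note."""
    tokens = []
    for sentence in split_sentences(text):
        # Tokenization: keep runs of letters and digits, drop punctuation.
        for raw in re.findall(r"[A-Za-z0-9]+", sentence):
            word = raw.lower()                        # case folding
            expanded = ABBREVIATIONS.get(word, word)  # abbreviation expansion
            for token in expanded.split():
                if token not in STOP_WORDS:           # stop word removal
                    tokens.append(token)
    return tokens


# Hypothetical sample note, not taken from the study's dataset.
sample = "Patient presented with RD in OD. Vitrectomy was performed."
tokens = preprocess(sample)
print(tokens)
```

The resulting token lists would then be joined back into strings and passed to a TF-IDF vectorizer before training the SVM and RF classifiers.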