Submitted:
02 December 2025
Posted:
12 December 2025
Abstract
Background: Anatomopathological reports remain predominantly unstructured within Electronic Medical Records, limiting automated data extraction, interoperability between healthcare institutions, and large-scale clinical research applications. Manual entity extraction and standardization processes are inconsistent, costly, and insufficiently scalable for modern healthcare systems. Aim: Our study aimed to (i) develop a domain-specific Named Entity Recognition model using BioBERT for extracting sample type, test performed, and finding entities from anatomopathological reports; (ii) implement a hybrid standardization framework combining BioClinicalBERT classification with Retrieval-Augmented Generation to map entities to SNOMED CT, LOINC, and ICD-11 terminologies; and (iii) evaluate the performance of this pipeline on real-world clinical reports. Methods: We manually annotated 560 anatomopathological reports from the Military Hospital of Tunis, establishing a gold-standard corpus. The pipeline integrated BioBERT v1.1 for entity extraction, trained for three epochs with the AdamW optimizer at a learning rate of 2×10⁻⁵, a batch size of 8, and a weight decay of 0.01. Standardization employed BioClinicalBERT for multi-label classification, augmented by dense vector retrieval from official SNOMED CT, LOINC, and ICD-11 databases. Performance was evaluated with precision, recall, and F1-score on an 80-20 train-test split. Results: BioBERT achieved F1-scores of 0.97 for sample type, 0.98 for test performed, and 0.93 for finding entities, with overall precision of 0.969 and recall of 0.958. Bootstrap-estimated 95% confidence intervals confirmed robust performance stability. Absolute error analysis revealed 45 misclassified tokens for the test performed entity (relative error 6.9%) and six misclassified tokens for the finding entity (relative error 1%). One-sample t-tests yielded t-values of 15.71 for recall and 30.24 for F1-score, with all p-values below 0.0001.
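As an illustration of the evaluation described above, the following minimal sketch computes token-level precision, recall, and F1 for one entity label and a percentile bootstrap 95% confidence interval for F1. It is not the authors' code: the function names, the label strings, the number of resamples, and the choice of tokens as the resampling unit are all assumptions made for the example (the study may have resampled reports or entities instead).

```python
import random

def prf1(gold, pred, label):
    """Token-level precision, recall, and F1 for a single label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def bootstrap_f1_ci(gold, pred, label, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumption: tokens are resampled)."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        # Draw n token positions with replacement and rescore that resample.
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        p = [pred[i] for i in idx]
        scores.append(prf1(g, p, label)[2])
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper

if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    gold = ["SAMPLE", "TEST", "FINDING", "O", "FINDING", "TEST", "O", "SAMPLE"]
    pred = ["SAMPLE", "TEST", "FINDING", "O", "O", "TEST", "O", "SAMPLE"]
    print(prf1(gold, pred, "FINDING"))
    print(bootstrap_f1_ci(gold, pred, "FINDING"))
```

With only a handful of tokens the interval is very wide; on a held-out set of realistic size the same procedure gives a usable estimate of how stable each per-entity F1-score is.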
The hybrid standardization framework demonstrated F1-macro scores of 0.6159 for SNOMED CT, 0.9294 for LOINC, and 0.7201 for ICD-11 mapping. Cohen’s Kappa values ranged from 0.6871 to 0.9773 across ontologies. Statistical comparison between BioClinicalBERT and Fusion/Reranker models showed McNemar test p-values exceeding 0.370 and permutation test p-values ranging from 0.375 to 0.625. Conclusion: This study demonstrates that transformer-based Named Entity Recognition combined with retrieval-augmented standardization achieves clinically validated performance for automated extraction and multi-ontology coding of anatomopathological entities. Multi-institutional validation studies are necessary to assess generalizability before clinical deployment.
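The agreement and model-comparison statistics reported for the standardization step can be sketched as follows. This is an illustrative, stdlib-only example under stated assumptions, not the study's implementation: Cohen's Kappa is computed here between predicted and reference codes, McNemar's test uses the exact two-sided binomial form on discordant predictions, and the function names and toy code lists are hypothetical.

```python
from collections import Counter
from math import comb

def cohen_kappa(a, b):
    """Cohen's Kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)

def mcnemar_exact(gold, pred_a, pred_b):
    """Exact two-sided McNemar p-value comparing two classifiers."""
    # b: model A correct and model B wrong; c: the reverse.
    b = sum(1 for g, x, y in zip(gold, pred_a, pred_b) if x == g and y != g)
    c = sum(1 for g, x, y in zip(gold, pred_a, pred_b) if x != g and y == g)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

if __name__ == "__main__":
    # Hypothetical placeholder codes, not real terminology identifiers.
    gold = ["C1", "C2", "C1", "C3", "C2", "C1"]
    model_a = ["C1", "C2", "C1", "C3", "C1", "C1"]
    model_b = ["C1", "C2", "C2", "C3", "C2", "C1"]
    print(cohen_kappa(gold, model_a))
    print(mcnemar_exact(gold, model_a, model_b))
```

A large McNemar p-value, such as those reported above, means the two models' disagreements with the reference are roughly balanced, so the data give no evidence that one model is more accurate than the other.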
