NP-Hardness Collapsed: Deterministic Resolution of Spin-Glass Ground States via Information-Geometric Manifolds (Scaling from N=8 to N=100)
Stefan Trauth
Posted: 12 December 2025
AI-Based Prediction of Numerical Earthquakes Using (Pseudo) Acoustic Emission
Piotr Klejment
Posted: 12 December 2025
Information Theory Laws: A Recollection
Tolga Topal
Posted: 12 December 2025
What Is the Radius of Convergence in the Sequence Space Seq(R)?
Mohsen Soltanifar
Posted: 12 December 2025
Efficient Statistical Significance Approximation for Local Similarity Analysis of High-Throughput Time Series Data Using the Circular Moving Block Bootstrap
Yu Yang, Zhen Yang, Wei Shen, Zhiying Cheng, Xing Liu, Shaowen Liu
Posted: 12 December 2025
A Note on Fermat’s Last Theorem
Frank Vega
Posted: 12 December 2025
Information is All It Needs: A First-Principles Foundation for Physics, Cognition, and Reality
Stefan Trauth
Posted: 12 December 2025
Smart E-Waste Recycling Using AI and Blockchain: Enabling Sustainable Resource Recovery for Sustainable Power Solutions
Al Imran, Md. Koushik Ahmed, Mahin Mahmud, Junaid Rahman Mokit, Redwan Utsab, Md. Motaharul Islam
Posted: 12 December 2025
Comprehensive Quantitative Evaluation of Transfemoral Prosthetic Socket Fit Using Machine Learning MRI Segmentation and Finite Element Modeling
Ryota Sayama, Yukio Agarie, Hironori Suda, Hiroshi Otsuka, Kengo Ohnishi, Shinichiro Kon, Akihiko Hanahusa, Motoki Takagi, Shinichiro Yamamoto
Posted: 12 December 2025
Learning the Grid: Transformer Architectures for Electricity Price Forecasting in the Australian National Market
Mark Sinclair, Andrew Shepley, Farshid Hajati
Posted: 12 December 2025
Tool and Agent Selection for Large Language Model Agents in Production: A Survey
Elias Lumer, Anmol Gulati, Faheem Nizar, Dzmitry Hedroits, Atharva Mehta, Henry Hwangbo, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju, James A. Burke
Posted: 12 December 2025
Cable Temperature Prediction Algorithm Based on the MSST-Net
Xin Zhou, Yanhao Li, Shiqin Zhao, Xijun Wang, Lifan Chen, Minyang Cheng, Lvwen wen Huang
Posted: 12 December 2025
Accurate Clinical Entity Recognition and Code Mapping of Anatomopathological Reports Using BioClinicalBERT Enhanced by Retrieval-Augmented Generation: A Hybrid Deep Learning Approach
Hamida Abdaoui, Chamseddine Barki, Ismail Dergaa, Karima Tlili, Halil İbrahim Ceylan, Nicola Luigi Bragazzi, Andrea de Giorgio, Ridha Ben Salah, Hanene Boussi Rahmouni
Background: Anatomopathological reports remain predominantly unstructured within Electronic Medical Records, limiting automated data extraction, interoperability between healthcare institutions, and large-scale clinical research applications. Manual entity extraction and standardization processes are inconsistent, costly, and insufficiently scalable for modern healthcare systems. Aim: Our study aimed to (i) develop a domain-specific Named Entity Recognition model using BioBERT to extract sample type, test performed, and finding entities from anatomopathological reports; (ii) implement a hybrid standardization framework combining BioClinicalBERT classification with Retrieval-Augmented Generation to map entities to SNOMED CT, LOINC, and ICD-11 terminologies; and (iii) evaluate the performance of this pipeline on real-world clinical reports. Methods: We manually annotated 560 anatomopathological reports from the Military Hospital of Tunis to establish a gold-standard corpus. The pipeline used BioBERT v1.1 for entity extraction, trained for three epochs with the AdamW optimizer at a learning rate of 2×10⁻⁵, a batch size of 8, and a weight decay of 0.01. Standardization employed BioClinicalBERT for multi-label classification, augmented by dense vector retrieval from the official SNOMED CT, LOINC, and ICD-11 databases. Performance was evaluated with precision, recall, and F1-score on an 80/20 train-test split. Results: BioBERT achieved F1-scores of 0.97 for sample type, 0.98 for test performed, and 0.93 for finding entities, with an overall precision of 0.969 and recall of 0.958. Bootstrap-estimated 95% confidence intervals confirmed robust performance stability. Absolute error analysis revealed 45 misclassified tokens in the test (relative error 6.9%) and six tokens in the finding (relative error 1%). One-sample t-tests yielded t-values of 15.71 for recall and 30.24 for F1-score, with all p-values below 0.0001.
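The overall figures above can be cross-checked with a short sketch. The Python below is a hypothetical helper set, not the authors' code: it computes F1 as the harmonic mean of precision and recall, and a percentile bootstrap 95% confidence interval by resampling test items with replacement. The binary token labels, resample count, and seed are assumptions for illustration. The abstract does not say whether its overall scores are micro- or macro-averaged, so the harmonic mean of the reported overall precision and recall need not match the authors' own F1 exactly.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_f1(pairs: Sequence[Tuple[int, int]]) -> float:
    """F1 over (gold, predicted) pairs, where 1 marks an entity token (hypothetical labeling)."""
    tp = sum(1 for gold, pred in pairs if gold == 1 and pred == 1)
    fp = sum(1 for gold, pred in pairs if gold == 0 and pred == 1)
    fn = sum(1 for gold, pred in pairs if gold == 1 and pred == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f1_from_pr(precision, recall)


def bootstrap_ci(
    items: Sequence[T],
    metric: Callable[[Sequence[T]], float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap: resample items with replacement, recompute the metric, take quantiles."""
    rng = random.Random(seed)
    n = len(items)
    stats: List[float] = sorted(
        metric([items[rng.randrange(n)] for _ in range(n)]) for _ in range(n_resamples)
    )
    # Nearest-rank positions of the alpha/2 and 1 - alpha/2 quantiles.
    lo = stats[round((alpha / 2) * (n_resamples - 1))]
    hi = stats[round((1 - alpha / 2) * (n_resamples - 1))]
    return lo, hi


if __name__ == "__main__":
    # Harmonic mean of the reported overall precision and recall: about 0.9635.
    print(round(f1_from_pr(0.969, 0.958), 4))
    # Toy data (not from the study): 90 TP, 5 FN, 5 FP, 100 TN.
    toy = [(1, 1)] * 90 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 100
    print(binary_f1(toy), bootstrap_ci(toy, binary_f1))
```

In a real NER evaluation the resampling unit would more likely be whole reports or entity spans rather than single tokens, so that correlated errors within a report stay together.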
The hybrid standardization framework achieved F1-macro scores of 0.6159 for SNOMED CT, 0.9294 for LOINC, and 0.7201 for ICD-11 mapping. Cohen's kappa values ranged from 0.6871 to 0.9773 across ontologies. Statistical comparison between BioClinicalBERT and the Fusion/Reranker models yielded McNemar test p-values above 0.370 and permutation test p-values between 0.375 and 0.625, indicating no statistically significant difference between the models. Conclusion: This study demonstrates that transformer-based Named Entity Recognition combined with retrieval-augmented standardization achieves clinically validated performance for automated extraction and multi-ontology coding of anatomopathological entities. Multi-institutional validation studies are necessary to assess generalizability before clinical deployment.
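Cohen's kappa and the McNemar test mentioned above are both short computations. The sketch below assumes two equal-length label sequences for kappa and per-item correctness flags for the two compared models (for example, BioClinicalBERT versus a Fusion/Reranker variant). It implements the standard formula κ = (p_o − p_e) / (1 − p_e) and the exact (binomial) form of McNemar's test on the discordant pairs. The abstract does not say which McNemar variant was used (exact, or chi-square with continuity correction), so this is one plausible choice, and the function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Agreement between two label sequences, corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)  # agreement expected by chance
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters always use the same single label
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value from the discordant pairs b and c."""
    if len(correct_a) != len(correct_b):
        raise ValueError("correctness sequences must have equal length")
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # only A correct
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # only B correct
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 each discordant pair is a fair coin flip: b ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

As a toy usage (not the study's data), `mcnemar_exact([True, False, False, False, True], [False, True, True, True, True])` has b = 1 and c = 3, giving p = 2 × 5/16 = 0.625. Only discordant items enter the test, which is why two models with very different error sets but similar accuracy can still yield a large p-value.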
Posted: 12 December 2025
Minimal Surfaces and Analytic Number Theory: The Enneper-Riemann Spectral Bridge
Felipe Oliveira Souto
Posted: 12 December 2025
The Empirical Bayes Estimators of the Variance Parameter of the Normal Distribution with a Normal-Inverse-Gamma Prior under Stein's Loss Function
Ying-Ying Zhang
Posted: 12 December 2025
LLM-Based Multi-Agent Systems for Mathematical Problem Solving: A Comprehensive Literature Review
Bektur Toktobekov, Burul Shambetova
Posted: 12 December 2025
Explainable Representation Learning in Large Language Models for Fine-Grained Sentiment and Opinion Classification
Yue Xing, Ming Wang, Yingnan Deng, Heyao Liu, Yun Zi
Posted: 11 December 2025
Visual Analytics of Singapore’s Waste System: Behavioural, Industrial, and Policy Dimensions
Noor Ul Amin, Addy Arif Bin Mahathir, Sivamuganathan Mohana Dass, Sai Rama Mahalingam, Priyanshu Das
Posted: 11 December 2025
Improving the Time Efficiency of a Script Identification Algorithm Using a Unicode-Based Regular Expression Matching Strategy
Mamtimin Qasim, Wushour Silamu
Script identification is the first step in most multilingual text processing systems. To improve the time efficiency of language identification algorithms, the proposed method first determines whether the text contains content written in a given script and, if so, extracts that content. It then checks whether the total length of the text assigned to the identified scripts equals the length of the original text; if so, script identification ends. Finally, taking into account how often each script appears on the Internet, more frequent scripts are checked first. Based on these three strategies, an improved script identification algorithm was designed. A comparison experiment was conducted on sentence-level text corpora in 261 languages written in 24 scripts. The training and testing times of the proposed method were reduced 8.61-fold and 8.56-fold, respectively, while its F1 score for script identification was slightly higher than that reported in our earlier studies. The proposed method thus effectively improves the time efficiency of script identification algorithms.
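The three strategies described above (detect-then-extract, an early stop once every character is accounted for, and frequency-ordered scripts) can be sketched with Python's standard `re` module. The script list, its frequency order, and the code-point ranges below are illustrative assumptions covering only five scripts, not the 24 used in the study. The standard `re` module has no `\p{Script=...}` classes, so explicit Unicode block ranges stand in for them, and treating digits, punctuation, and whitespace as script-neutral is also an assumption.

```python
import re
from typing import Dict, List, Tuple

# Illustrative subset of scripts, in an ASSUMED order of Internet frequency.
# The ranges are disjoint approximations of Unicode blocks (not exact script sets).
SCRIPT_PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("Latin", re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]+")),
    ("Han", re.compile(r"[\u4E00-\u9FFF]+")),
    ("Cyrillic", re.compile(r"[\u0400-\u04FF]+")),
    ("Arabic", re.compile(r"[\u0600-\u06FF]+")),
    ("Greek", re.compile(r"[\u0370-\u03FF]+")),
]

# Assumption: digits, punctuation, underscores and whitespace belong to no script.
NEUTRAL = re.compile(r"[\W\d_]+")


def letter_count(s: str) -> int:
    """Number of script-bearing characters once neutral characters are removed."""
    return len(NEUTRAL.sub("", s))


def identify_scripts(text: str) -> Dict[str, List[str]]:
    """Return the text runs found for each script, stopping once all letters are covered."""
    target = letter_count(text)
    found: Dict[str, List[str]] = {}
    covered = 0
    for name, pattern in SCRIPT_PATTERNS:  # frequent scripts are tried first
        if covered == target:
            break  # early stop: every script-bearing character is accounted for
        if pattern.search(text) is None:
            continue  # cheap presence test before any extraction
        runs = pattern.findall(text)  # extract the content written in this script
        found[name] = runs
        # Count only non-neutral characters so the comparison with target is consistent.
        covered += letter_count("".join(runs))
    return found
```

For example, `identify_scripts("Hello мир!")` returns `{"Latin": ["Hello"], "Cyrillic": ["мир"]}` and never scans for Arabic or Greek, because all eight letters are covered after the Cyrillic pass. The early stop only saves work when frequent scripts are tried first, which is why the ordering and the length check complement each other.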
Posted: 11 December 2025
Lightweight Pipeline Defect Detection Algorithm Based on FALW-YOLOv8
Huazhong Wang, Xuetao Wang, Lihua Sun, Qingchao Jiang
Pipelines play a critical role in industrial production and daily life as essential conduits for transportation. However, defects frequently arise because of environmental and manufacturing factors, posing potential safety hazards. To address the limitations of traditional object detection methods, such as inefficient feature extraction and loss of critical information, this paper proposes an improved algorithm named FALW-YOLOv8, based on YOLOv8. The FasterBlock is integrated into the C2f module to replace standard convolutional layers, thereby reducing redundant computations and significantly enhancing the efficiency of feature extraction. Additionally, the ADown module is employed to improve multi-scale feature retention, while the LSKA attention mechanism is incorporated to optimize detection accuracy, particularly for small defects. The Wise-IoU v2 loss function is adopted to refine bounding box precision for complex samples. Experimental results demonstrate that the proposed FALW-YOLOv8 achieves a 5.8% improvement in mAP50, alongside a 34.8% reduction in parameters and a 30.86% decrease in computational cost. This approach effectively balances accuracy and efficiency, making it suitable for real-time industrial inspection applications.
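mAP50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, and IoU-based regression losses such as Wise-IoU start from the same overlap term. The sketch below, assuming axis-aligned boxes given as (x1, y1, x2, y2), shows only the plain IoU, the basic 1 − IoU loss, and the 0.5 matching test. The dynamic focusing coefficient that distinguishes Wise-IoU v2 is deliberately not reproduced, and the function names are hypothetical.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1 (assumed convention).
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Box area, clamped at zero for degenerate boxes."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when the union is empty."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Plain IoU loss, the base term that variants such as Wise-IoU reweight."""
    return 1.0 - iou(pred, target)


def is_match_at_50(pred: Box, target: Box) -> bool:
    """IoU test used for mAP50 (class agreement and one-to-one matching not shown)."""
    return iou(pred, target) >= 0.5
```

For two 2×2 boxes offset by one unit on each axis, the overlap is 1 and the union is 7, so `iou` returns 1/7 and the prediction would not count toward mAP50. Small defects are hard partly because a shift of a few pixels changes IoU much more for a small box than for a large one.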
Posted: 11 December 2025