Computer Science and Mathematics

Sort by

Article

Signal Processing

Vision-Only Localization of Drones with Optimal Window Velocity Fusion

Seokwon Yeom

Abstract: Drone localization is essential for various purposes such as navigation, autonomous flight, and object tracking. However, this task is challenging when satellite signals are unavailable. This paper addresses vision-only localization of flying drones through op-timal window velocity fusion. Multiple optimal windows are derived from a piecewise linear regression (segment) model of the image-to-real world conversion function. Each window serves as a template to estimate the drone's instantaneous velocity. The multiple velocities obtained from multiple optimal windows are integrated by two fusion rules: one is a weighted average for lateral velocity, and the other is a winner-take-all decision for longitudinal velocity. In the experiments, a drone performed a total of six short-range (about 800 m to 2 km) and high maneuvering flights in rural and urban areas. Four flights in rural areas consist of a forward-backward straight flight, a forward-backward zigzag flight (a snake path), a square path with three banked turns, and a free flight that includes both banked turns and zigzags. Two flights in urban areas are a straight outbound flight and a forward-backward straight flight. The performance was evaluated through the root mean squared error (RMSE) and drift error of the ground-truth trajectory and the rig-id-body rotated vision-only trajectory. The proposed image-based method has been shown to achieve flight errors of a few meters to tens of meters, which corresponds to around 3% of the flight length.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1988.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

Faithfulness-Aware Multi-Objective Context Ranking for Retrieval-Augmented Generation

Tian Guan

Sebastian Sun

Bolin Chen

Abstract: Retrieval-Augmented Generation (RAG) has emerged as a dominant paradigm for enhancing Large Language Models (LLMs) with external knowledge. However, existing ranking methods in RAG pipelines primarily optimize for document-query relevance, neglecting crucial factors for generation quality such as factual consistency and information coverage. We propose a novel multiobjective ranking framework that explicitly models three critical dimensions: relevance, coverage, and faithfulness support. Unlike traditional IR-centric approaches, our method introduces a utilitybased scoring mechanism that evaluates each document’s contribution to reducing hallucinations, improving answer completeness, and maintaining relevance. We formulate the ranking problem as a multi-objective optimization task and employ listwise learning with carefully constructed utility labels derived from existing QA datasets. Extensive experiments on Natural Questions, TriviaQA, and HotpotQA demonstrate that our approach achieves substantial improvements over state-of-the-art baselines, with an average 4.8- point increase (8.6% relative improvement) in Exact Match scores, 10.1% improvement in faithfulness metrics, and 35.7% reduction in hallucination rates compared to RankRAG, all while maintaining computational efficiency comparable to traditional reranking methods.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1983.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

Addressing Challenges in Multimodal Large Language Model Development

Feidlimid Shyama

Lucas Pereira

João Souza

Ana Costa

Abstract: Multimodal Large Language Models (MLLMs) have emerged as a powerful paradigm in artificial intelligence, enabling systems to process and reason over data from multiple modalities, such as text, images, video, and audio. By combining the strengths of different data types, MLLMs offer the potential to tackle more complex and nuanced tasks than traditional unimodal models. This paper provides a comprehensive survey of the current state of MLLMs, examining their architecture, training strategies, applications, and the challenges that remain in scaling and deploying these models. We begin by reviewing the core components of MLLMs, including the integration of modality-specific encoders and the development of joint multimodal representations. The training strategies that support the learning of multimodal interactions, such as contrastive learning, early and late fusion, and self-supervised pretraining, are discussed in detail. Furthermore, we explore a wide range of applications where MLLMs have demonstrated success, including visual-language understanding tasks like image captioning and visual question answering, multimodal sentiment analysis, and human-robot interaction. Despite their impressive capabilities, MLLMs face a number of significant challenges, such as issues with cross-modal alignment, missing modalities, computational inefficiency, and the presence of bias in multimodal datasets. The ethical concerns associated with fairness, interpretability, and accountability are also highlighted. We conclude by exploring future research directions that could help address these challenges and advance the field, including improvements in cross-modal fusion, multimodal pretraining paradigms, model efficiency, and bias mitigation strategies. As MLLMs continue to evolve, they are poised to play a transformative role in various industries, from healthcare and education to robotics and entertainment, by enabling machines to understand and interact with the world in a more human-like and contextually aware manner. This survey aims to provide a comprehensive overview of the current landscape of MLLMs, offering insights into both their potential and the hurdles that remain for their widespread adoption.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1986.v1

Article

Computer Science and Mathematics

Computer Science

Truncating and Shifting Weights for Max-Plus Automata

Jelena Matejić

Miroslav Ćirić

Jelena Ignjatović

Ivana Micić

Abstract: In this paper, for any real number $\lambdaup$, we transform the complete max-plus semiring $\mathbb{R}_\infty$ into a commutative, complete, additively idempotent semiring $\mathbb{R}_\infty^\lambdaup$, called the lower $\lambdaup$-truncation~of~$\mathbb{R}_\infty$. It is obtained by removing from $\mathbb{R}_\infty$ all real numbers smaller than $\lambdaup$, inheriting the addition operation, shifting the original products by $-\lambdaup$, and appropriately modifying the residuum operation. The purpose of lower truncations is to transfer the iterative procedures for computing the greatest presimulations and prebisimulations between max-plus automata, in cases where they cannot be completed in a finite number of iterations over $\mathbb{R}_\infty$, to $\mathbb{R}_\infty^\lambdaup$, where they could terminate in a finite number of iterations. For instance, we prove that this necessarily happens when working with max-plus automata with integer weights. We also show how presimulations and prebisimulations computed over $\mathbb{R}_\infty^\lambdaup$ can be transformed into presimulations and prebisimulations between the original automata over $\mathbb{R}_\infty$. Although they do not play a significant role from the standpoint of computing presimulations and prebisimulations, for theoretical reasons we also introduce two types of upper truncations of the complete max-plus semiring $\mathbb{R}_\infty$.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1981.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

Refinement and Validation of an Artificial Intelligence Pipeline for Robust Greater Caribbean Manatee Detection and Acoustic Individual Counting

Fabricio Quirós-Corella

Athena Rycyk

Beth Brady

Priscilla Cubero-Pardo

Abstract: The Greater Caribbean manatee is classified as vulnerable, yet the lack of data related to population status in the Costa Rican Caribbean severely hinders conservation policy due to limited ecological knowledge. This study aims to address this challenge by refining a pipeline for the automated manatee count method to enhance classification robustness and efficiency for accurate spatial and temporal density estimation. The bioacoustics analysis consists of a deep learning manatee call detector and an unsupervised individual manatee counting. Methodologically, we implemented an offline feature extraction strategy to avoid a substantial initial computational bottleneck, measured at almost 13h, required to convert 43,031 audio samples into labeled images. To mitigate the high risk of overfitting associated with class imbalance, common in bioacoustic databases, a bootstrapping method was applied post-data splitting, generating a labeled dataset of 100,000 spectrograms. Transfer learning with the VGG-16 architecture yielded superior results, achieving a robust mean 10-fold cross-validation accuracy of 98.94% (±0.10%) and normalized F1-scores of 0.99. Furthermore, this optimized fine-tuning was rapidly executed in just 22min and 36s. Subsequently, the unsupervised individual manatee counting utilized k-means clustering on the top three music information retrieval descriptors along with dimensionality reduction, successfully segregating detected calls into three acoustically distinct clusters, likely representing three individuals. This performance was validated by a silhouette coefficient of 79.03%. These validated results confirm the refined automatic manatee count method as a robust and scalable framework ready for deployment on Costa Rican passive acoustic monitoring data to generate crucial scientific evidence for species conservation.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1822.v1

Article

Computer Science and Mathematics

Computer Vision and Graphics

ContextualCLIP: A Context-Aware and Multi-Grained Fusion Framework for Few-Shot Ultrasound Anomaly Analysis

Yao-Tian Chian

Yuxin Zhai

Abstract: Ultrasound (US) imaging is crucial for breast anomaly detection, but its interpretation is subjective and suffers from data scarcity and domain generalization issues. Existing deep learning models struggle to achieve both precise pixel-level localization and fine-grained image-level classification simultaneously, especially in few-shot and cross-domain settings. To address these challenges, we propose ContextualCLIP, a novel few-shot adaptation framework built upon CLIP. ContextualCLIP introduces three core enhancements: (1) a Contextualized Adaptive Prompting (CAP) generator that dynamically creates clinically relevant text prompts by integrating high-order semantic contextual information; (2) a Multi-Grained Feature Fusion Adapter (MGFA) that extracts and adaptively fuses features from different CLIP visual encoder layers using gated attention for multi-scale lesion analysis; and (3) a Domain-Enhanced Memory Bank (DEMB) that improves cross-domain generalization by learning domain-invariant embeddings through a lightweight domain-aware module and contrastive learning. Jointly optimized for localization and classification, ContextualCLIP is evaluated on BUS-UCLM for adaptation and BUSI/BUSZS for zero-extra-adaptation. Results demonstrate that ContextualCLIP consistently achieves superior performance over state-of-the-art baselines across various few-shot settings, yielding substantially higher classification and localization metrics. Ablation studies validate the efficacy of each module, and human evaluation suggests significant augmentation of radiologists' diagnostic accuracy and confidence. ContextualCLIP provides a robust and efficient solution for comprehensive ultrasound anomaly analysis in data-scarce and diverse clinical environments.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1847.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

Causal Representation Learning for Robust and Interpretable Audit Risk Identification in Financial Systems

Jingjing Li

Qingmiao Gan

Ruibo Wu

Chen Chen

Ruoyi Fang

Jianlin Lai

Abstract: his study investigates the application of causal representation learning in financial auditing risk identification, aiming to address problems in traditional methods such as spurious correlations, limited interpretability, and unstable recognition. The proposed framework is built around causal-driven latent representations, where nonlinear mapping is used to obtain deep feature representations of financial data, and structural equation models are employed to establish causal dependencies, thereby removing the interference of non-causal features in risk modeling. On this basis, causal regularization constraints are introduced, and the joint optimization of the objective function enhances the consistency and robustness of representations, improving the reliability and interpretability of the model in complex scenarios. Furthermore, in the risk scoring stage, causal representation is combined with intervention effect calculation, which enables risk identification to provide not only outcome judgments but also insights into the underlying driving mechanisms, thereby improving traceability of risk sources. To verify effectiveness, a dataset closely related to financial auditing tasks was constructed, and comparative experiments under an alignment robustness benchmark were conducted. The results show that the proposed method outperforms existing models in ACC, Precision, Recall, and F1-Score, with notable advantages in robustness and interpretability. In addition, hyperparameter sensitivity experiments analyzed the impact of the causal regularization coefficient on model performance, and the results indicate that appropriate causal constraints can significantly improve stability while maintaining predictive accuracy. Overall, the proposed causal representation learning framework enables more precise and reliable risk identification in financial auditing and provides strong support for building intelligent and data-driven auditing systems.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1950.v1

Article

Computer Science and Mathematics

Discrete Mathematics and Combinatorics

On Lexicographic and Colexicographic Orders and the Mirror (Left-Recursive) Reflected Gray Code for m-ary Vectors

Valentin Penev Bakoev

Abstract: In this paper, we investigate the lexicographic and colexicographic orderings of m-ary vectors of length n, as well as the mirror (left-recursive) reflected Gray code, complementing the classical m-ary reflected Gray code. We present efficient algorithms for generating vectors in each of these orders, each achieving constant amortized time per vector. Additionally, we propose algorithms implementing the four fundamental functions in generating combinatorial objects—successor, predecessor, rank, and unrank—each with time complexity Θ(n). The properties and the relationships between these orderings and the set of integers {0,1,…,mn−1} are examined in detail. We define explicit transformations between the different orders and illustrate them as a digraph very close to the complete symmetric digraph. In this way, we provide a unified framework for understanding ranking, unranking, and order conversion. Our approach, based on emulating the execution of nested loops, proves to be powerful and flexible, leading to elegant and efficient algorithms that can be extended to other combinatorial generation problems. The mirror m-ary Gray code introduced here has potential applications in coding theory and related areas. By providing an alternative perspective on m-ary Gray codes, we aim to inspire further research and applications in combinatorial generation and coding.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1941.v1

Article

Computer Science and Mathematics

Algebra and Number Theory

A Unified Proof of the Extended, Generalized, and Grand Riemann Hypothesis

Weicun Zhang

Abstract: The Extended, Generalized, and Grand Riemann Hypotheses are proved under a unified framework, which is based on the divisibility of entire functions contained in the symmetric functional equation, where the uniqueness of zero multiplicities (although unknown) of a given entire function plays a critical role. Consequently, the existence of Landau-Siegel zeros is excluded, thereby confirming the Landau-Siegel zeros conjecture.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202506.0481.v12

Concept Paper

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

AIaaE: Artificial Intelligence as an Experience

Md Twashin Ilahi

Abstract: Contemporary artificial intelligence research and deployment have primarily emphasized task optimization, efficiency, and information processing. While these objectives remain im- portant, they fail to capture an increasingly dominant mode of human-AI interaction: the shaping of human experience. This paper introduces AI-as-an-Experience (AIaaE) as a high- level conceptual paradigm in which artificial intelligence systems are explicitly designed to generate, guide, and sustain human experiences, particularly emotional, psychological, and narrative experiences. The paper formalizes the core concept of AIaaE, situates it within established theories from psychology and human behavior, explains why such a paradigm is becoming inevitable, and examines its long-term societal, ethical, and technological impli- cations. The central claim is that artificial intelligence can be used not merely to perform tasks for humans, but to enable humans to experience structured, meaningful, and evolving states of being.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1920.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

Integrating Large Language Models with Cloud-Native Observability for Automated Root Cause Analysis and Remediation

Chen Wang

Tingzhou Yuan

Cancan Hua

Lu Chang

Xiao Yang

Zhimin Qiu

Abstract: Cloud-native systems based on microservices, containers, and server- less architectures present unprecedented challenges for observabil- ity and incident management. Traditional rule-based monitoring and manual root cause analysis are increasingly inadequate for han- dling the complexity and scale of modern distributed systems. This paper presents a novel framework that leverages large language models (LLMs) to enhance cloud-native observability, enabling automated root cause analysis and self-healing capabilities. Our system integrates OpenTelemetry-based telemetry collection with a domain-adapted LLM capable of performing multimodal analysis over metrics, logs, and traces. Through fine-tuning on operational data and chain-of-thought reasoning, the LLM generates explain- able root cause hypotheses and actionable remediation plans. Exper- imental evaluation on public microservice datasets demonstrates that our approach reduces mean time to resolution (MTTR) by 84.2% compared to rule-based methods, achieving 95% F1-score in anomaly detection while maintaining low computational overhead. The system successfully automated 91% of common incidents with- out human intervention, significantly improving service reliability and reducing operational burden.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1926.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

A Multi-Agent Coding Assistant for Cloud-Native Development: From Requirements to Deployable Microservices

Tian Guan

Abstract: The rapid adoption of cloud-native architectures has created an urgent demand for automated development tools that can translate natural language requirements into deployable cloud-native microservices. While recent advances in large language models (LLMs) have enabled AI-assisted code generation, existing approaches predominantly focus on isolated code completion tasks rather than end-to-end software delivery. This paper presents CloudMAS, a multi-agent coding assistant framework that orchestrates specialized agents to transform user requirements into deployable cloud-native applications. Our system comprises six specialized agents: an Architect Agent for service decomposition and API design, three parallel Coder Agents specialized in backend, frontend, and infrastructure-as- code (IaC) generation respectively, a Tester Agent for automated test synthesis and execution, and an Ops Agent for container configuration and Kubernetes manifest generation. These agents are coordinated by a dedicated Orchestrator Agent that manages workflow execution and conflict resolution. We introduce a novel conflict resolution mechanism that enables agents to iteratively refine outputs through structured feedback loops. To address the lack of systematic benchmarks for end-to-end cloud-native development, we construct CloudDevBench, a publicly available evaluation dataset containing 50 real-world development tasks with associated test suites and deployment validation criteria. Experimental results demonstrate that CloudMAS achieves 92% compilation success, 81% test pass rate, and 84% deployment success rate, substantially outperforming single-LLM and single- agent baselines across all metrics.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1922.v1

Article

Computer Science and Mathematics

Mathematics

Oscillation Detection in Difference Equations with Several Non-Monotone Advanced Arguments via a New Approach

Md Taufiq Nasseef

George Chatzarakis

Emad Attia

Abstract: We investigate the oscillatory behavior of a first-order difference equation with several advanced arguments. New sufficient conditions for oscillation are established, and we show, through carefully constructed counterexamples, that many well-known criteria for equations with a single advanced argument fail to generalize to the several-argument setting, even when each advanced argument is increasing. Several illustrative examples are also provided to demonstrate the sharpness and practical effectiveness of the obtained con- ditions and to highlight their clear improvements over all existing results in the literature

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1829.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

AI-Driven Multimodal Ensemble Framework for Accurate Hardware Failure Detection in Optical Embedded Systems: Eliminating Unnecessary RMAs

Praveen Kumar Pal

Bhavesh Kataria

Jagdish Jangid

Abstract:

Accurately distinguishing true hardware failures from false alarms is a critical requirement in large-scale optical networks, where unnecessary Return Material Authorizations (RMAs) result in significant operational and financial overhead. This paper presents a novel AI-driven predictive framework that integrates multi-domain telemetry fusion, Transformer-based temporal modeling, and a domain-aware hybrid ensemble to deliver carrier-grade hardware failure detection in optical embedded systems. Unlike prior works that rely on single-sensor or threshold-based diagnostics, the proposed approach jointly analyzes optical power fluctuations, laser bias-current drift, TEC thermal instability, voltage dynamics, and DSP-layer soft metrics, enabling the model to capture degradation signatures that emerge only through cross-sensor interactions. A customized ensemble combining Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN)-LSTM, and TimeSeriesBERT is introduced to fuse complementary pattern-recognition capabilities--including long-term drift modeling, high-frequency anomaly detection, and global multi-sensor attention--resulting in superior robustness and generalization. Evaluation of real-time telemetry from optical devices demonstrates the effectiveness of the proposed system, achieving high accuracy with a high F1-score and significantly reducing unnecessary RMAs. These results highlight the novelty and practical value of the presented framework, establishing it as the first comprehensive AI solution tailored for reliable hardware-failure prediction in optical embedded systems.

Abstract:

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1937.v1

Article

Computer Science and Mathematics

Algebra and Number Theory

A Note on Fermat's Last Theorem

Frank Vega

Abstract: Around 1637, Pierre de Fermat famously wrote in the margin of a book that he had a proof for the equation aⁿ + bⁿ = cⁿ having no positive integer solutions for exponents n > 2. While Andrew Wiles provided a complete proof in 1994 using advanced 20th-century machinery, the question of whether a simpler proof exists remains a subject of intense mathematical interest. In this work, we focus on a significant restricted case of the theorem: the situation in which the exponent n possesses a prime divisor p that does not divide the quantity abc. Under this natural arithmetic condition, we develop an elementary argument—based on Barlow’s Relations and p-adic valuations—that leads to a contradiction. These methods lie closer to the classical number-theoretic framework that Fermat himself might have envisioned, and they illuminate structural features of the Fermat equation that persist across related Diophantine problems.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202109.0480.v14

Article

Computer Science and Mathematics

Computer Vision and Graphics

Improving Prostate Cancer Segmentation on T2-Weighted MRI Using Prostate Detection and Cascaded Networks

Nikolay Nefediev

Nikolay Staroverov

Roman Davydov

Abstract: Prostate cancer is one of the most lethal cancers in the male population, and accurate localization of intraprostatic lesions on MRI remains challenging. In this work, we investigate methods for improving prostate cancer segmentation on T2weighted pelvic MRI using cascaded neural networks. We use anonymized dataset of 400 multiparametric MRI studies from two centers, in which experienced radiologists delineated the prostate and clinically significant cancer on the T2 series. Our baseline approach applies 2D and 3D segmentation networks (UNETR, UNET++, SwinUNETR, SegResNetDS, SegResNetVAE) directly to full MRI volumes. We then introduce additional stages that filter slices using DenseNet201 classifiers (cancer/nocancer and prostate/noprostate) and localize the prostate with a YOLObased detector to crop a 3D region of interest before segmentation. Using SwinUNETR as the backbone, the prostate segmentation Dice score increased from 71.37% for direct 3D segmentation to 76.09% when using prostate detection and cropped 3D inputs. For cancer segmentation, the final cascaded pipeline – prostate detection, 3D prostate segmentation, and 3D cancer segmentation within the prostate – improved Dice score from 55.03% for direct 3D segmentation to 67.11%, with a ROC AUC of 0.89 on the test set. These results suggest that cascaded detection and segmentationbased preprocessing of the prostate region can substantially improve automatic prostate cancer segmentation on MRI while remaining compatible with standard segmentation architectures.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1888.v1

Article

Computer Science and Mathematics

Artificial Intelligence and Machine Learning

FlashServe: Cost-Efficient Serverless Inference Scheduling for Large Language Models via Tiered Memory Management and Predictive Autoscaling

Bolin Chen

Abstract: Deploying Large Language Models (LLMs) in cloud environments presents significant challenges due to their substantial memory footprint and computational requirements. While serverless architectures offer attractive pay-per-use economics, they suffer from prohibitively long cold start times when loading multi-gigabyte model weights into GPU memory. This paper presents FlashServe, a serverless LLMinference system that achieves fast cold starts through three key innovations: (1) a tiered memory snapshotting mechanism that pre-stages model checkpoints in host DRAM and leverages high-speed DMAtransfers via PCIe for rapid GPU memory loading, (2) a hybrid Prophet-LSTM prediction model for proactive pod pre-warming based on request arrival patterns, and (3) efficient LoRA adapter multiplexing that enables serving multiple fine-tuned models on shared GPU resources. Extensive experiments on the Azure Functions trace dataset demonstrate that FlashServe reduces cold start latency by up to 49× compared to baseline S3-based loading approaches and by 3.3× compared to state-of-the-art systems like ServerlessLLM. Under realistic bursty workloads, FlashServe achieves 32% reduction in GPU idle costs while maintaining sub-second time-to-first-token (TTFT) latency for 95%of requests. These results demonstrate that FlashServe represents meaningful progress toward practical serverless LLM deployment.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1908.v1

Article

Computer Science and Mathematics

Information Systems

Adaptive Latent Interaction Reasoning for Multimodal Misinformation Analysis

Tyler Anderson

Madeline Brooks

Ava Martinez

Jordan Williams

Abstract: The rapid growth of online social platforms has fundamentally transformed the way information is produced, disseminated, and consumed, while simultaneously amplifying the societal impact of misleading and fabricated content. In response to this challenge, multimodal fake news detection has emerged as a critical research problem, aiming to jointly leverage textual and visual signals embedded in social media posts. Existing methods predominantly rely on direct fusion of unimodal representations or shallow cross-modal interactions, which often fail to explicitly model the semantic alignment and latent inconsistencies across modalities. In particular, the potential of contrastive learning paradigms for learning robust and semantically grounded multimodal representations in fake news scenarios remains underexplored. In this work, we introduce ALIGNER, an Adaptive Latent Interaction Guided coNtrastivE Reasoning framework designed for multimodal fake news detection. ALIGNER adopts a dual-encoder architecture to learn modality-specific semantic representations and employs cross-modal ontrastive learning to explicitly align visual and textual semantics. To address the inherent noise and ambiguity of image–text associations in real-world fake news data, wefurther propose a latent consistency objective that relaxes the rigid one-hot supervision imposed by conventional contrastive losses. This auxiliary learning signal enables the model to capture fine grained semantic relatedness among unpaired or weakly related multimodal samples. Building upon the aligned unimodal features, ALIGNER incorporates a dedicated cross-modal interaction module to capture higher-order correlations between visual and linguistic representations. Moreover, wedesign an attention-based aggregation mechanism equipped with an explicit guidance signal to adaptively weigh the contributions of different modalities during decision making, thereby enhancing both effectiveness and interpretability. Extensive experiments conducted on two widely adopted benchmarks, Twitter and Weibo, demonstrate that ALIGNER consistently surpasses existing state of-the-art approaches by a substantial margin, highlighting the advantages of adaptive contrastive reasoning for multimodal fake news detection.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1914.v1

Article

Computer Science and Mathematics

Other

Efficient Querying of Federated Large-Scale Clinical RDF Knowledge Graphs in the Swiss Personalized Health Network

Andrea Brites Marto

Philip Krauss

Katie Kalt

Vasundra Touré

Deepak Unni

Sabine Österle

Abstract: The Swiss Personalized Health Network developed a national federated framework for semantically described medical data, in particular hospital clinical routine data. Instead of centralizing patient-level information, hospitals perform semantic coding and standardization locally and store SPHN-compliant data in a triple store. These decentralized RDF datasets, following the FAIR (Findable, Accessible, Interoperable, Reusable) principles, together exceed 12 billion triples across more than 800,000 patients, all signed a broad consent. In this work, we address the computational challenge of efficiently querying and integrating these distributed RDF resources through SPARQL. Our use cases focus on feasibility queries and value distribution, which allow researchers to assess the potential availability of patient cohorts across hospitals without disclosing sensitive patient-level information. We present methods for optimizing SPARQL querying, tailored to the characteristics of large-scale federated and complex clinical data. We evaluate these approaches by iteratively testing optimized queries on the SPHN Federated Clinical Routine Dataset, which spans 125 SPHN concepts including demographics, diagnoses, procedures, medications, laboratory results, vital signs, clinical scores, allergies, microbiology, intensive care data, oncology, and biological samples. With this approach, we’ve built a set of rules to consider for gradually optimizing SPARQL queries. Our results demonstrate that optimized SPARQL query planning and execution can significantly reduce response times without compromising semantic interoperability.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1909.v1

Article

Computer Science and Mathematics

Computer Vision and Graphics

The Mumford-Shah Functional for Image Segmentation Applied to Landscape Planning: A Comparison of Numerical Approximation Methods

Amedeo Ganciu

Giovannangela Ricci

Margherita Solci

Abstract: This study investigates the application of the Mumford-Shah functional, a foundational variational model in image segmentation, to the challenge of Land Use/Land Cover (LuLc) mapping within an Object-Based Image Analysis (OBIA) framework. Recognizing that no single segmentation algorithm is universally optimal, the research focuses on comparing two distinct numerical approximations to assess their suitability for processing satellite imagery. The methodological approach involved the systematic evaluation of two algo-rithms: the Ambrosio-Tortorelli method and the Active Contour Snake model. To ensure controlled conditions, a series of synthetic test images were created, featuring basic geo-metric shapes representing common landscape features. These images were designed across multiple scenarios, ranging from those with clean, high-contrast edges to more challenging cases with low contrast and introduced intra-class noise, simulating re-al-world complexities. The performance of each algorithm was rigorously measured using established statistical metrics, namely Cohen's Kappa and the Jaccard Index, to quantify segmentation accuracy against a known ground truth. The findings reveal a clear distinc-tion in algorithmic behavior. While both methods achieved high accuracy in ideal, high-contrast conditions, their performance diverged significantly under stress. The Am-brosio-Tortorelli algorithm proved notably more robust, effectively maintaining closed and coherent object boundaries even in the presence of noise and low spectral contrast. Conversely, the Snake model was highly sensitive to these conditions, often resulting in fragmented contours or complete failure to delineate objects. In conclusion, this compara-tive analysis demonstrates that the choice between these two approaches is not arbitrary but critically dependent on the nature of the input data. The study provides practical guidance, suggesting that the global, variational approach of Ambrosio-Tortorelli is better suited for the noisy and spectrally complex scenes often encountered in territorial analy-sis. Meanwhile, the Snake model may be reserved for more controlled scenarios with sharp, well-defined edges. This work thus contributes a reasoned framework for algorithm selection, aiming to enhance the precision and reliability of segmentation in sustainable landscape monitoring workflows.

Posted: 22 December 2025

https://doi.org/10.20944/preprints202512.1878.v1

of 625