Large Language Models (LLMs) have recently attracted growing interest for deployment on edge devices, owing to their potential to enable privacy-preserving, low-latency, and offline inference. Their considerable computational and memory requirements, however, pose fundamental challenges in both real-time and offline scenarios. This systematic review synthesizes evidence from 49 studies, 40 of which were analyzed in depth, to investigate the techniques, challenges, and applications of LLM deployment on edge devices. The studies were identified through a structured search and screening process, and data were extracted on model types, hardware platforms, optimization strategies, and performance outcomes. The findings indicate that hardware acceleration, model compression, and hybrid edge–cloud strategies can yield latency reductions of up to 972×, memory savings of up to 130×, and energy-efficiency improvements exceeding 1600×, while largely preserving accuracy. Real-time deployments appear predominantly in robotics, healthcare monitoring, and autonomous driving, whereas offline deployments target privacy-sensitive or batch-oriented contexts. The review also identifies persistent research gaps, including the absence of standardized benchmarks and the limited generalizability of reported results to real-world environments. It concludes by outlining future research directions, with particular emphasis on hardware–software co-design, federated learning, and secure task offloading.