Intelligent Orchestration of Distributed Large Foundation Model Inference at the Edge
DOI: https://doi.org/10.37256/cnc.3220256807

Keywords: large language models, split inference, edge artificial intelligence, edge computing

Abstract
Large Foundation Models (LFMs), including multi-modal and generative models, promise to unlock new capabilities for next-generation Edge AI applications. However, performing inference with LFMs in resource-constrained and heterogeneous edge environments, such as Multi-access Edge Computing (MEC), presents significant challenges for workload orchestration due to time-varying network, compute, and storage conditions. In particular, current split inference strategies, which partition LFM layers across nodes, are not designed to adapt to fluctuating workloads, dynamic bandwidth conditions, or evolving privacy constraints in high-utilization MEC environments. In this work, we propose a novel adaptive split inference orchestration framework that elevates both the placement and partitioning of LFM layers to runtime-tunable variables. Specifically, our framework enables real-time, quality-of-service (QoS)-aware management of inference workloads by extending conventional orchestrators with three key services: (1) Capacity-aware workload distribution, which continuously profiles node resources and selects an optimal subset of MEC nodes; (2) Dynamic partition migration, which transparently relocates pre-cut LFM segments in response to changes in utilization or network conditions; (3) Real-time reconfiguration, which dynamically re-splits LFM layers to balance latency, throughput, and privacy. We formalize the joint placement-partitioning problem, outline a reference architecture and algorithmic workflow, and discuss applicability in representative smart city, V2X, and industrial edge scenarios.
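To make the joint placement-partitioning problem concrete, the following is a minimal illustrative formulation, not the paper's actual model (which appears in the full text). All symbols are assumptions for exposition: a binary variable x_{l,n} assigns LFM layer l to MEC node n, t_{l,n} is the compute time of layer l on node n, d_l is the activation size at the cut after layer l, B_{n,m} is the link bandwidth between nodes n and m, c_l is the resource cost of layer l, and C_n is the capacity of node n.

% Illustrative sketch of a joint placement-partitioning program (amsmath).
% All symbols are assumed notation, not taken from the paper.
\begin{align*}
\min_{x} \quad & \sum_{l=1}^{L} \sum_{n \in \mathcal{N}} x_{l,n}\, t_{l,n}
  \;+\; \sum_{l=1}^{L-1} \sum_{n,m \in \mathcal{N}} x_{l,n}\, x_{l+1,m}\, \frac{d_l}{B_{n,m}} \\
\text{s.t.} \quad & \sum_{n \in \mathcal{N}} x_{l,n} = 1 && \forall l \quad \text{(each layer placed on exactly one node)} \\
& \sum_{l=1}^{L} x_{l,n}\, c_l \le C_n && \forall n \quad \text{(node capacity)} \\
& x_{l,n} \in \{0,1\} && \forall l, n
\end{align*}
% Take B_{n,n} = \infty, so co-located consecutive layers incur no transfer cost.

Under this reading, the framework's dynamic partition migration and real-time reconfiguration services correspond to re-solving (or incrementally repairing) this program online as the measured B_{n,m} and C_n drift.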
License
Copyright (c) 2025 Fernando Koch, et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.
