Abstract: The extremely high computational and storage demands of large language models have excluded most edge devices, which were widely used for efficient machine learning, from being viable ...