Abstract: Large language models (LLMs) have demonstrated remarkable success across various domains. However, the substantial computational and memory demands pose significant challenges for deploying ...