Generative AI: From Hype to Deployment with Arm Neoverse
The use of ML and generative AI is rapidly shifting from hype into adoption, creating the need for more efficient inferencing at scale. Large language models are getting smaller and more specialized, offering comparable or improved performance at a fraction of the cost and energy. Advances in inferencing techniques such as quantization and sparse coding, along with the rise of specialized, lightweight frameworks like llama.cpp, enable LLMs to run on CPUs with good performance across a wide variety of use cases. Arm has been advancing the capabilities of Neoverse cores to address both the compute and memory needs of LLMs while maintaining its focus on efficiency. Popular ML frameworks like llama.cpp and PyTorch, together with ML compilers, allow easy migration of ML models to Arm-based cloud instances. These hardware and software improvements have led to an increase in on-CPU ML performance for use cases like LLMs and recommenders, giving AI application developers the flexibility to choose CPUs or GPUs depending on use case and sustainability targets. Arm is also making it possible for ML system designers to create bespoke ML solutions that combine accelerators with CPU chiplets, offering the best of both worlds.
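As an illustration of the kind of CPU-only deployment the session describes, the minimal sketch below runs a quantized GGUF model through llama.cpp's Python bindings (llama-cpp-python). The model file name and thread count are illustrative placeholders, not details taken from the session itself.

```python
# Minimal sketch (illustrative, not from the session): running a 4-bit
# quantized LLM on an Arm Neoverse CPU via llama.cpp's Python bindings.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.Q4_0.gguf",  # hypothetical quantized GGUF model file
    n_ctx=2048,                         # context window size
    n_threads=16,                       # set to the number of Neoverse cores available
)

result = llm("Summarize the benefits of CPU-based LLM inference:", max_tokens=128)
print(result["choices"][0]["text"])
```

The same model file can also be served with the stock llama.cpp command-line tools; the point is simply that a quantized model plus a lightweight runtime is enough for practical LLM inference on Arm-based cloud instances.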