Shanghai

Session

Vision AI SoC platform based on an Arm IPs Subsystem

Speaker: Kevin Ku

This Soc is designed for efficient Vision AI perfomance. Unlikely exsiting SoC, it's divided and designed with a dedicated Sub system for specific vision processing and AI processing / operating .
This design approach provides benefits for performance, flexibility and development period.
Main Subsystem & Function
1. CPU subsystem : Main CPU(Cortex-A76 Quad 2clusters)와 L1&L2 Cache, GIC(Generic Interrupt Controller) & Debugging subsystem(Coresight)
2. ISP(Image Signal Processor) : Through Camera sensor, raw image datademosaicing, noise reduction, color correction, white balance adjustment are processing
3. NPU :Through AI algorithm accelerator for Vision, object detection, classification & segmentation using CNN or Transfromer
4. GPU : Addtional Acceleratotion for specific vision or AI tasks in conjuction with NPU
5. Memory subsystem : High bandwidth memory access for high-speed data transfer & sharing between each subsystems.

1. CPU subsystem : Main CPU(Cortex-A76 Quad 2clusters)와 L1&L2 Cache, GIC(Generic Interrupt Controller) & Debugging subsystem(Coresight)

2. ISP(Image Signal Processor) : Through Camera sensor, raw image datademosaicing, noise reduction, color correction, white balance adjustment are processing

3. NPU :Through AI algorithm accelerator for Vision, object detection, classification & segmentation using CNN or Transformer

4. GPU : Addtional Acceleration for specific vision or AI tasks in conjuction with NPU

5. Memory subsystem : High bandwidth memory access for high-speed data transfer & sharing between each subsystem.

Open PDF

Session

Traveling in an AI-Enabled Car

The presence of AI has been in vehicles for at least a decade, though recent advances have supported its pervasiveness. AI is foundational for improving in-vehicle user experiences and automated features, yet it brings a set of unique challenges not faced by other industries. Here we discuss how the future can become reality and where there are opportunities to solve some of the greatest challenges .

Open PDF

Session

AI HPC Server’s latest Lego-like architecture─Arm CSS V3, Chiplet and UCIe

Speaker: Jeffrey Ju, Egis Group

Chip design has once again seen a paradigm shift. From the north and south bridges of the PC era, with the development of Moore's Law, all IPs were integrated into an SoC, and the chiplet architecture was born due to the development limit of Moore's Law. Egis/Alcor Micro will stand on the shoulders of two giants, Arm CSS V3 and TSMC, and combine the two key chiplet technologies - UCIe and CoWoS to launch the scalable, high-performance and most power-saving AI HPC Server solution. We will combine CPU chips, AI Accelerator chips and IO chips in a Lego-style, highly flexible form to create the AI HPC Server product that best meets customer needs.

Open PDF

Session

AI in Mobile and Consumer Devices - Today and Tomorrow

Consumer Devices - Today and Tomorrow
The rapid evolution of AI is reshaping mobile and consumer devices, presenting new opportunities, such as creating innovative AI driven screen experiences and products that adapt seamlessly to user needs, changing levels of productivity and integrating into our daily lives.
Generative AI is one of the buzz word of the year and is becoming one of the pillars of the next generation of AI. Technology advances and the pace of innovation are increasing rapidly.
So how will the next generation of AI change compute requirments and redefine technology parameters in mobile and consumer devices?

Open PDF

Session

Accelerate the Execution of Generative AI Workloads on Arm-based Hardware

KleidiAI is a set of micro-kernels that integrates into machine learning frameworks, accelerating AI inference on Arm-based platforms. KleidiAI’s micro-kernels are hand-optimized in Arm assembly code to leverage modern architecture instructions that greatly speed up AI inference on Arm CPUs. This is an introductory topic for developers who want to learn how to use KleidiAI to accelerate the execution of Generative AI workloads on hardware.

Open PDF

Session

Accelerating Automotive System Development with Virtual Prototyping

Virtual prototyping helps automotive semiconductor, tier 1, and OEM companies speed up development, boost productivity, cut costs and enhance hardware and software quality for future vehicles. This session explores the virtual prototyping offerings delivered by Arm partners using the latest Arm Automotive Enhanced (AE) technology, which can accelerate development cycles by up to two years.

Open PDF

Session

Benchmarking Machine Learning Workloads on Arm Servers

Benchmarking ML inference is crucial for software developers as it ensures optimal performance and efficiency of machine learning workloads. Attendees will learn how to install and run TensorFlow on their Arm-based cloud servers and utilize the MLPerf Inference benchmark suite from MLCommons to evaluate ML performance.

Open PDF

Session

LLM Inference on Android with KleidiAI, MediaPipe, and XNNPACK

Arm has worked with the Google AI Edge team to integrate KleidiAI into the MediaPipe framework through XNNPACK. These improvements increase the throughput of quantized LLMs running on Arm chips that contain the i8mm feature. This presentation will share new techniques for Android developers who want to efficiently run LLMs on-device.

Open PDF

Session

Moving on up to Armv9: The Next-level of Performance and Features

The latest Armv9 architecutre delivers industry-leading architecture enhancements to help increase compute capabilities with more AI performance for each generation, from Matmul and Neon, to SVE2. Join this session for the inside track on enabling more efficienct AI compute for your next-gen solution.

Open PDF

Session

AI on the Edge with ONELab

Speaker: Bill Fletcher

AI at the edge is built on the foundation of robust framework of secure, connected devices to perform critical tasks in real-time. In this session, Linaro wants to stress the accent on the importance of a solution like ONELab, a solution that empowers businesses to build AI-driven edge devices that are secure, compliant, ready for deployment and faster to be launched on the market. With ONELab the full potential of AI at the edge for innovative applications has never been closer and easier.

Open PDF

Session

Driving the Transition to High-Performance IoT Platforms

The IoT ecosystem has already shipped billions of chips based on Arm, so its fair to say the IoT already runs on Arm. The IoT industry however, never stands still. We are seeing a rapid market acceleration that demands even higher performance solutions to deliver new and exciting use cases. We all therefore have to continue to innovate.

In this session, we present our hardware, software, and standards solutions that enable our customers and the entire Arm ecosystem to participate and win in this rapidly changing environment. Join us to learn more about the latest and greatest technology Arm is creating for the leaders in IoT to succeed.

Open PDF

Session

Generative AI: From Hype to Deployment with Arm Neoverse

The use of ML and generative AI is rapidly shifting from hype into adoption, creating the need for more efficient inferencing at scale. Large language models are getting smaller and more specialized, offering comparable or improved performance at a fraction of the cost and energy. Advances in inferencing techniques, like quantization, sparse-coding, and the rise of specialized, lightweight frameworks like llama.cpp, enable LLMs to run on CPUs and provide good performance for a wide variety of use-cases. Arm has been advancing the capabilities of Neoverse cores to address both the compute and memory needs of LLMs, while maintaining its focus on efficiency. Popular ML frameworks like llama.cpp, PyTorch, and ML compilers allow easy migration of ML models to Arm-based cloud instances. These hardware and software improvements have led to an increase in on-CPU ML performance for use cases like LLMs and recommenders. This gives AI application developers flexibly in choosing CPUs or GPUs, depending on use case and sustainability targets. Arm is also making it possible for ML system designers to create their own bespoke ML solutions including accelerators combined with CPU chiplets to offer the best of both worlds.

Open PDF

Session

Unlocking AI and Generative AI Use Cases on Edge IoT Devices

New AI use cases driven by Generative AI hold exciting potential for Edge AI applications, particularly in creating tangible business value with impactful applications across industries and business functions. A trend emerged with the introduction of smaller, more efficient and small models, especially tiny LLMs. These advancements are facilitating the deployment at the edge, ensuring data stays on the device, thus safeguarding individual privacy, and enhancing user experience by reducing latency and improving responsiveness. Join us as we unveil real-world examples of Arm-powered AI and Generative AI solutions spanning diverse industries. Explore how Arm enables you to harness the complete potential of Generative AI at the edge, even on the smallest devices, revolutionizing your business and sculpting a smarter, more interconnected future.

Open PDF

Session

Changes to Arm Compiler for Embedded

Arm Compiler for Embedded (also known as AC6) is very widely used for software development for Arm-based products. Developed and supported by true Arm experts, combining early support for new Arm architectures and cores with highly competitive scores on key embedded benchmarks, and a safety-qualified variant for development of safety-critical systems, Arm Compiler for Embedded is THE professional embedded toolchain for Arm. We’re investing in significant changes for Arm Compiler for Embedded, to bring even more value to developers of Arm-based embedded products. These changes include POSIX support to enable use of Rich embedded Operating Systems, more security features for developers with interests in cyber-security and memory safety, and better compatibility with GCC. We’re also creating a free to use, 100% open source toolchain LLVM technology, with identical functionality and performance to our professional/commercial toolchain. Whichever compilation toolchain you currently use for Arm-based development, the changes discussed in this session are going to be of great interest to you.

Open PDF

Session

Arm Neoverse – Future of Infrastructure is Built on Arm

The rapid expansion of AI has led to a major shift in infrastructure technology. Arm Neoverse emerges as platform of choice, offering the best combination of performance, efficiency, and design flexibility. This roadmap session explores how Arm Neoverse forms the foundation for partner innovation across cloud, wireless, networking, HPC, and edge, enabling the deployment of performant and vastly more efficient AI infrastructure on Arm. From Neoverse IP products to the Arm Neoverse Compute Subsystems (CSS) and Arm Total Design, we show why Arm Neoverse is the platform of choice for industry leaders and how it is pivotal in accelerating and shaping the future of AI infrastructure.

Open PDF

Session

Accelerating Innovation with ADTechnology & Arm: HPC/AI, Chiplets, and Beyond

Speaker: Eun-Joo Lee, ADTechnology

ADTechnology will cover partnership strategies and the latest developments in next-generation semiconductor solutions utilizing Arm's platform architecture. Advanced Design Platform (ADP™) and Advanced Design Flow (Capella™) will be introduced maximizing performance in HPC/AI hyperscaler environments by leveraging a customizable SoC platform for HPC/AI applications and share key technological developments. Through this collaboration and innovation, ADTechnology, together with its partners and industry leaders, aims to contribute to increasing the scalability and competitiveness of modern computing architectures. With these strategies and visions, attendees will gain insight into the specific direction of the partnership between ADTechnology and Arm expected to establish specific directions for positioning itself as a key driver of future technological advancement.

Regarding chip design and implementation, the presentation will cover the evolution to Arm Neoverse V3 and the development progress from 5nm to 2nm nodes, highlighting ADTechnology’s strategy for maximizing performance and energy efficiency. Additionally, it will address business collaboration and co-development opportunities leveraging Arm’s ecosystem to accelerate time-to-market and lower market entry barriers.

Open PDF

Session

Automotive Compute Systems and the Journey to Chiplets

Chiplets are offering automotive OEMs and suppliers more flexibility to customize their silicon as part of the shift to software-defined vehicles. As the industry embraces chiplet technology, it requires standards to ensure compatibility between chiplets from different providers and to create an easy-to-build platform. Here, we explore the excitement around chiplets in the automotive sector and Arm's role in supporting the development of standards and foundational compute platforms for its expansive ecosystem.

Open PDF

Session

Latest Practices for Optimizing LLM Inference on Arm-based Mobile Devices with MNN

Speaker: Wang Zhaode (王召德)

LLMs are extensively utilized in natural language processing, intelligent assistants, and content generation, driving a growing demand for on-device inference capabilities on mobile devices. However, Arm-based mobile devices face significant challenges in efficiently deploying LLMs due to constraints in computational resources, power consumption, and memory capacity. This presentation delves into optimization techniques based on the MNN framework, aiming to enhance the performance and efficiency of LLM inference on Arm architecture mobile devices.

Open PDF

sessions: Track A

Vision AI SoC platform based on an Arm IPs Subsystem

Speaker:

Kevin Ku

前端设计团队CTO CoAsia,

1. CPU subsystem : Main CPU(Cortex-A76 Quad 2clusters)와 L1&L2 Cache, GIC(Generic Interrupt Controller) & Debugging subsystem(Coresight)

2. ISP(Image Signal Processor) : Through Camera sensor, raw image datademosaicing, noise reduction, color correction, white balance adjustment are processing

3. NPU :Through AI algorithm accelerator for Vision, object detection, classification & segmentation using CNN or Transformer

4. GPU : Addtional Acceleration for specific vision or AI tasks in conjuction with NPU

5. Memory subsystem : High bandwidth memory access for high-speed data transfer & sharing between each subsystem.

Open PDF

sessions: Track B

Traveling in an AI-Enabled Car

Speaker:

Open PDF

sessions: Track A

AI HPC Server’s latest Lego-like architecture─Arm CSS V3, Chiplet and UCIe

Speaker:

Jeffrey Ju

Senior Executive Assistant of CEO, Egis Group, Egis Group

Open PDF

sessions: Track B

AI in Mobile and Consumer Devices - Today and Tomorrow

Speaker:

Open PDF

sessions: Track D

Accelerate the Execution of Generative AI Workloads on Arm-based Hardware

Speaker:

Open PDF

sessions: Track C

Accelerating Automotive System Development with Virtual Prototyping

Speaker:

Open PDF

sessions: Track D

Benchmarking Machine Learning Workloads on Arm Servers

Speaker:

Open PDF

sessions: Track D

LLM Inference on Android with KleidiAI, MediaPipe, and XNNPACK

Speaker:

Open PDF

sessions: Track B

Moving on up to Armv9: The Next-level of Performance and Features

Speaker:

Open PDF

sessions: Track D

AI on the Edge with ONELab

Speaker:

Bill Fletcher

Linaro 销售总监,

Open PDF

sessions: Track C

Driving the Transition to High-Performance IoT Platforms

Speaker:

Open PDF

sessions: Track B

Generative AI: From Hype to Deployment with Arm Neoverse

Speaker:

Open PDF

sessions: Track C

Unlocking AI and Generative AI Use Cases on Edge IoT Devices

Speaker:

Open PDF

sessions: Track C

Changes to Arm Compiler for Embedded

Speaker:

Open PDF

sessions: Track A

Arm Neoverse – Future of Infrastructure is Built on Arm

Speaker:

Open PDF

sessions: Track B

Accelerating Innovation with ADTechnology & Arm: HPC/AI, Chiplets, and Beyond

Speaker:

Eun-Joo Lee

SoC 设计部门 / 高级副总裁兼部门主任 ADTechnology, ADTechnology

Open PDF

sessions: Track C

Automotive Compute Systems and the Journey to Chiplets

Speaker:

Open PDF

sessions: Track A

Latest Practices for Optimizing LLM Inference on Arm-based Mobile Devices with MNN

Speaker:

Wang Zhaode (王召德)

阿里巴巴淘天集团技术专家,

Open PDF

Shanghai, China

Presentations