[Webinar] Maximizing LLM Inference Throughput
Learn how Fixstars engineers boost LLM inference: cut TTFT, raise throughput, and break memory bottlenecks, with a live demo and Q&A.
Please register for the webinar via this link: https://us06web.zoom.us/webinar/register/WN_it__-lAdQoqpM5p0c1EwBQ
Join us for our upcoming webinar, followed by a live Q&A with Fixstars.
This session focuses on the critical shift from AI training to high-efficiency inference, exploring how to maximize performance and cost efficiency as LLMs move into production.
You’ll dive into inference optimization from a systems and runtime perspective—learning how to identify and eliminate bottlenecks across GPU compute, memory bandwidth, and decoding pipelines that directly impact latency and throughput.
You’ll gain practical insights into:
- Profiling LLM inference workloads to uncover latency spikes and utilization gaps
- Addressing the “memory wall” through efficient memory handling and custom GPU kernels
- Applying advanced decoding and execution optimizations to reduce TTFT and improve throughput
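As a rough illustration of the kind of measurement these topics involve, the sketch below times TTFT and decode throughput for a single streamed completion against an OpenAI-compatible serving endpoint. It is not taken from the webinar: the endpoint URL, model name, and prompt are placeholders, and counting streamed chunks only approximates token counts.

```python
"""Minimal TTFT / decode-throughput probe for an OpenAI-compatible endpoint
(e.g. a locally hosted vLLM or similar server). All identifiers below are
illustrative placeholders, not details from the webinar."""
import time
from openai import OpenAI  # pip install openai

# Assumption: a local OpenAI-compatible server is listening on port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompt = "Summarize the benefits of continuous batching for LLM serving."
start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="placeholder-model",  # replace with the model your server exposes
    messages=[{"role": "user", "content": prompt}],
    max_tokens=256,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some servers emit trailing chunks without choices
    delta = chunk.choices[0].delta.content
    if not delta:
        continue  # skip role-only or empty deltas
    if first_token_at is None:
        first_token_at = time.perf_counter()  # first visible token -> TTFT
    chunks += 1

end = time.perf_counter()
ttft = first_token_at - start
# Chunk count approximates token count; use the server's usage stats for exact numbers.
decode_tps = (chunks - 1) / (end - first_token_at) if chunks > 1 else 0.0
print(f"TTFT: {ttft * 1000:.1f} ms   decode throughput: ~{decode_tps:.1f} tok/s")
```

Repeating a probe like this across concurrency levels and prompt lengths is one simple way to surface the latency spikes and utilization gaps discussed in the session.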
Through real-world inference scenarios and a guided demonstration of Fixstars AIBooster, discover how engineers are achieving:
- Significant reductions in Time To First Token (TTFT)
- Higher end-to-end inference throughput on existing hardware
- Lower operational costs through targeted, automated inference tuning
Who Should Watch
- ML engineers deploying LLMs in production
- Platform and infrastructure engineers optimizing inference stacks
- MLOps and DevOps teams managing GPU-backed inference services
- Tech leads and architects focused on latency, scalability, and cost control
Whether you’re serving LLMs at scale, optimizing real-time AI applications, or pushing more performance out of your current GPU infrastructure, this session delivers concrete, production-ready strategies to make inference faster, more predictable, and more cost-efficient.
Good to know
- Duration: 45 minutes
- Format: Online