[Webinar] Maximizing LLM Inference Throughput

Online event
Thursday, Feb 26 from 12 pm to 12:45 pm PST
Overview

Learn how Fixstars engineers boost LLM inference: cutting TTFT, raising throughput, and breaking through memory bottlenecks, with a live demo and Q&A.

Please register for the webinar via this link: https://us06web.zoom.us/webinar/register/WN_it__-lAdQoqpM5p0c1EwBQ

Join us for our upcoming webinar, followed by a live Q&A with Fixstars.

This session focuses on the critical shift from AI training to high-efficiency inference, exploring how to maximize performance and cost efficiency as LLMs move into production.


You’ll dive into inference optimization from a systems and runtime perspective—learning how to identify and eliminate bottlenecks across GPU compute, memory bandwidth, and decoding pipelines that directly impact latency and throughput.


You’ll gain practical insights into:

  • Profiling LLM inference workloads to uncover latency spikes and utilization gaps (see the measurement sketch after this list)
  • Addressing the “memory wall” through efficient memory handling and custom GPU kernels
  • Applying advanced decoding and execution optimizations to reduce TTFT and improve throughput
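
As a rough illustration of the profiling point above, the sketch below times a single streaming request against an OpenAI-compatible endpoint and reports TTFT plus the decode-phase chunk rate. The base URL, API key, and model name are placeholders for whatever serving stack you run (for example a local vLLM server); it is a minimal measurement harness under those assumptions, not the tooling demonstrated in the webinar.

  # Minimal TTFT / decode-throughput probe for an OpenAI-compatible streaming endpoint.
  # base_url and model are placeholders; point them at your own server.
  import time
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

  start = time.perf_counter()
  first_token_at = None
  chunks = 0

  stream = client.chat.completions.create(
      model="my-llm",  # placeholder model id
      messages=[{"role": "user", "content": "Explain the memory wall in one paragraph."}],
      max_tokens=256,
      stream=True,
  )
  for chunk in stream:
      if chunk.choices and chunk.choices[0].delta.content:
          if first_token_at is None:
              first_token_at = time.perf_counter()  # first generated token arrives
          chunks += 1
  end = time.perf_counter()

  assert first_token_at is not None, "no tokens were streamed back"
  print(f"TTFT: {(first_token_at - start) * 1000:.1f} ms")
  print(f"Decode rate: {chunks / (end - first_token_at):.1f} chunks/s "
        "(chunks roughly correspond to tokens on most servers)")

Running a probe like this before and after a tuning change gives a quick, repeatable check on whether TTFT and throughput actually moved.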


Through real-world inference scenarios and a guided demonstration of Fixstars AIBooster, discover how engineers are achieving:

  • Significant reductions in Time To First Token (TTFT)
  • Higher end-to-end inference throughput on existing hardware
  • Lower operational costs through targeted, automated inference tuning


Who Should Watch

  • ML engineers deploying LLMs in production
  • Platform and infrastructure engineers optimizing inference stacks
  • MLOps and DevOps teams managing GPU-backed inference services
  • Tech leads and architects focused on latency, scalability, and cost control


Whether you’re serving LLMs at scale, optimizing real-time AI applications, or pushing more performance out of your current GPU infrastructure, this session delivers concrete, production-ready strategies to make inference faster, more predictable, and more cost-efficient.

Good to know

Highlights

  • 45 minutes
  • Online

Location

Online event

Organized by
Fixstars