[Webinar] Maximizing LLM Inference Throughput
Learn how Fixstars engineers boost LLM inference: cut TTFT, raise throughput, and break memory bottlenecks, with a live demo and Q&A.
Please register for the webinar via this link: https://us06web.zoom.us/webinar/register/WN_it__-lAdQoqpM5p0c1EwBQ
Join us for our upcoming webinar, followed by a live Q&A with Fixstars.
This session focuses on the critical shift from AI training to high-efficiency inference, exploring how to maximize performance and cost efficiency as LLMs move into production.
You’ll dive into inference optimization from a systems and runtime perspective—learning how to identify and eliminate bottlenecks across GPU compute, memory bandwidth, and decoding pipelines that directly impact latency and throughput.
You’ll gain practical insights into:
- Profiling LLM inference workloads to uncover latency spikes and utilization gaps
- Addressing the “memory wall” through efficient memory handling and custom GPU kernels
- Applying advanced decoding and execution optimizations to reduce TTFT and improve throughput
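As a rough illustration of the kind of measurement these topics involve, the sketch below times TTFT and decode throughput for a single streamed completion against an OpenAI-compatible serving endpoint. It is not taken from the webinar: the endpoint URL, model name, and prompt are placeholders, and counting streamed chunks only approximates token counts.

```python
"""Minimal TTFT / decode-throughput probe for an OpenAI-compatible endpoint
(e.g. a locally hosted vLLM or similar server). All identifiers below are
illustrative placeholders, not details from the webinar."""
import time
from openai import OpenAI  # pip install openai

# Assumption: a local OpenAI-compatible server is listening on port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompt = "Summarize the benefits of continuous batching for LLM serving."
start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="placeholder-model",  # replace with the model your server exposes
    messages=[{"role": "user", "content": prompt}],
    max_tokens=256,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some servers emit trailing chunks without choices
    delta = chunk.choices[0].delta.content
    if not delta:
        continue  # skip role-only or empty deltas
    if first_token_at is None:
        first_token_at = time.perf_counter()  # first visible token -> TTFT
    chunks += 1

end = time.perf_counter()
ttft = first_token_at - start
# Chunk count approximates token count; use the server's usage stats for exact numbers.
decode_tps = (chunks - 1) / (end - first_token_at) if chunks > 1 else 0.0
print(f"TTFT: {ttft * 1000:.1f} ms   decode throughput: ~{decode_tps:.1f} tok/s")
```

Repeating a probe like this across concurrency levels and prompt lengths is one simple way to surface the latency spikes and utilization gaps discussed in the session.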
Through real-world inference scenarios and a guided demonstration of Fixstars AIBooster, discover how engineers are achieving:
- Significant reductions in Time To First Token (TTFT)
- Higher end-to-end inference throughput on existing hardware
- Lower operational costs through targeted, automated inference tuning
Who Should Watch
- ML engineers deploying LLMs in production
- Platform and infrastructure engineers optimizing inference stacks
- MLOps and DevOps teams managing GPU-backed inference services
- Tech leads and architects focused on latency, scalability, and cost control
Whether you’re serving LLMs at scale, optimizing real-time AI applications, or pushing more performance out of your current GPU infrastructure, this session delivers concrete, production-ready strategies to make inference faster, more predictable, and more cost-efficient.
Good to know
- Duration: 45 minutes
- Format: Online