Accelerating Streaming AI Inference using Sparsity and Locality (Part II)
Location: Online event
Femtosense enables streaming AI inference at up to 100x lower energy and 10x smaller area than existing inference accelerators.
About this event
AI algorithms demand an unprecedented level of compute. However, it is not the compute operations themselves, but the underlying data-movement operations, that dominate energy costs. In this talk, we discuss how Femtosense employs sparsity and locality to mitigate these costs. Sparsity is difficult to exploit on existing CPU- and GPU-like architectures because of its irregular data-access patterns: the overhead that sparse processing incurs on these architectures can easily erase any potential gains. Femtosense corrects this algorithm-architecture mismatch with its Sparse Processing Unit, enabling large-scale AI in settings previously considered impossible (e.g., the battery-powered edge).
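To make the irregular-access problem concrete, here is a minimal sketch of a sparse matrix-vector multiply in CSR (compressed sparse row) form. This is a generic illustration, not Femtosense's SPU implementation; the function and variable names are our own. The data-dependent gather into `x` is the kind of access pattern that caches and SIMD pipelines on conventional architectures handle poorly.

```python
# Sparse matrix-vector multiply in CSR form (illustrative only --
# not Femtosense's implementation). The gather x[col_idx[k]] is an
# irregular, data-dependent load: its address is unknown until the
# index array is read, which defeats prefetching and vectorization
# on conventional CPU/GPU-style memory hierarchies.

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x, where A is stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # irregular gather
        y[i] = acc
    return y

# A pruned 3x3 weight matrix stores only its nonzeros:
# A = [[2, 0, 0],
#      [0, 0, 3],
#      [0, 1, 0]]
values  = [2.0, 3.0, 1.0]   # nonzero entries
col_idx = [0, 2, 1]         # column of each nonzero
row_ptr = [0, 1, 2, 3]      # row i spans row_ptr[i]:row_ptr[i+1]
x = [1.0, 1.0, 1.0]
print(csr_matvec(values, col_idx, row_ptr, x))  # [2.0, 3.0, 1.0]
```

Note that the compute (three multiply-accumulates here) is trivial; the cost lies in fetching `values`, `col_idx`, and the scattered elements of `x` from memory, which is the data-motion problem the talk addresses.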