This event has ended.

JLR Challenge #1 Technical Workshop (1st Offering), by Soroush Ziaeinejad

By the School of Computer Science

University of Windsor Advanced Computing Hub, Windsor, ON

October 24, 2025

Overview

An Introduction to AI Agent Benchmarks (1st Offering)

School of Computer Science – JLR Challenge # 1 Technical Workshop

Presenter: Soroush Ziaeinejad

Date: Friday, October 24th, 2025

Time: 1:00 PM

Location: Workshop Space, 4th Floor, 300 Ouellette Ave., School of Computer Science Advanced Computing Hub

Abstract:

As AI systems move beyond answering questions to acting as agents that plan, use tools, and interact with real environments, we need new ways to measure their abilities. Traditional benchmarks score only the final answer, whereas agent benchmarks examine how well an AI solves a problem step by step, recovers from mistakes, and adapts to different tasks. Each benchmark targets a different skill: HumanEval tests whether a model can write Python functions that pass unit tests, MINT checks how an agent uses tools and feedback over multiple turns to solve problems, GAIA evaluates reasoning across text, images, and real-world data, and SWE-bench Lite measures how well an agent can understand and resolve real GitHub issues in open-source Python projects. This presentation will explain these benchmarks, show how they differ, and discuss what they reveal about the strengths and weaknesses of current AI agents.

Workshop Outline:

• Why Agent Benchmarks Are Needed

• Core Concepts in Agent Evaluation

• Overview of HumanEval, MINT, GAIA, and SWE-bench Lite

• Comparison of benchmark goals and methodologies

• Hands-on demonstration: running and interpreting results from one benchmark

• Discussion and Q&A

Prerequisites:

• Basic understanding of AI or machine learning concepts

• Familiarity with large language models (LLMs) and their applications

Biography:

Soroush is a Ph.D. candidate and research assistant in Computer Science at the University of Windsor. He received his bachelor’s degree in Software Engineering and his master’s degree in AI, specializing in computer vision and video processing. His current research focuses on privacy and security in AI, with a particular emphasis on distributed and collaborative learning systems.