JLR Challenge #4 Technical Workshop by: Mehedi Hasan Shanto (1st Offering)

By School of Computer Science

Mining Git Repositories with PyDriller – Part I: Understanding Git and Repository Mining Basics (1st Offering)

Location

University of Windsor Advanced Computing Hub

300 Ouellette Avenue Windsor, ON N9A 6X5 Canada

Good to know

Highlights

  • In person

About this event

School of Computer Science – JLR Challenge #4 Technical Workshop



Mehedi Hasan Shanto

Date: Tuesday, October 28th, 2025

Time: 3:00 PM

Location: Workshop Space, 4th Floor - 300 Ouellette Ave., School of Computer Science Advanced Computing Hub


Abstract:

This workshop provides a hands-on introduction to mining and analyzing software repositories using PyDriller, a Python framework that simplifies access to Git data. Participants will first explore how Git records the evolution of software projects through commits, authors, timestamps, and code changes. The session will then demonstrate how PyDriller converts raw commit logs into structured, analyzable Python objects, making it easier to extract insights such as developer activity, commit frequency, and project evolution patterns.
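
As a rough illustration of that conversion, the sketch below flattens commit objects into plain dictionaries. It assumes PyDriller 2.x, where `Repository(...).traverse_commits()` yields commits exposing `hash`, `author.name`, `author_date`, and `msg`. The helper names `commit_to_record` and `mine_repository` are hypothetical and are not taken from the workshop material.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def commit_to_record(commit):
    """Flatten one commit object into a plain dict of basic metadata."""
    return {
        "hash": commit.hash,
        "author": commit.author.name,
        "date": commit.author_date.isoformat(),
        # Keep only the first line of the message as a short subject.
        "subject": commit.msg.splitlines()[0] if commit.msg else "",
    }


def mine_repository(path_or_url):
    """Yield one record per commit (requires `pip install pydriller`)."""
    # Imported lazily so the rest of the sketch runs without PyDriller.
    from pydriller import Repository  # assumption: PyDriller 2.x

    for commit in Repository(path_or_url).traverse_commits():
        yield commit_to_record(commit)


# Hypothetical stand-in commit, so the flattening step can be tried
# without cloning a repository.
sample = SimpleNamespace(
    hash="abc123",
    author=SimpleNamespace(name="Ada"),
    author_date=datetime(2025, 10, 28, 15, 0, tzinfo=timezone.utc),
    msg="Add parser\n\nLonger description.",
)
record = commit_to_record(sample)
print(record)
```

With PyDriller installed, `list(mine_repository("path/to/repo"))` would produce the same kind of records for a real history; the path here is a placeholder.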

Through interactive examples, attendees will learn how to connect to a GitHub repository, traverse commit histories, and extract essential metadata for software analytics. The workshop will emphasize the importance of version control data in empirical software engineering, showcasing how it can be used to support research, automate reporting, and drive evidence-based development practices. This first session sets the stage for deeper repository analysis in Part II, where participants will move from basic mining to advanced metrics and developer behavior analytics.
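
One way the traversal and metadata extraction described above might look is sketched below: it counts commits per author and per calendar month. It assumes PyDriller 2.x, whose `Repository` accepts a local path or a remote URL plus optional `since` and `to` datetimes. The function names and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime
from types import SimpleNamespace


def activity_summary(commits):
    """Count commits per author and per calendar month (YYYY-MM)."""
    per_author = Counter()
    per_month = Counter()
    for commit in commits:
        per_author[commit.author.name] += 1
        per_month[commit.author_date.strftime("%Y-%m")] += 1
    return per_author, per_month


def summarize_repository(path_or_url, since=None, to=None):
    """Run the summary over a real repository (requires PyDriller)."""
    # Imported lazily so the counting logic can be tried without PyDriller.
    from pydriller import Repository  # assumption: PyDriller 2.x

    repo = Repository(path_or_url, since=since, to=to)
    return activity_summary(repo.traverse_commits())


def fake_commit(name, when):
    """Build a hypothetical stand-in with the two attributes used above."""
    return SimpleNamespace(author=SimpleNamespace(name=name), author_date=when)


history = [
    fake_commit("Ada", datetime(2025, 9, 30)),
    fake_commit("Ada", datetime(2025, 10, 1)),
    fake_commit("Lin", datetime(2025, 10, 2)),
]
per_author, per_month = activity_summary(history)
print(per_author.most_common())
print(sorted(per_month.items()))
```

Keeping the counting separate from the PyDriller call is a design choice for this sketch: the same function works on any iterable of commit-like objects, which makes it easy to try in a notebook before pointing it at a real repository.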

Workshop Outline:

1. Introduction to Repository Mining
- Why studying version control data matters
- Applications in research and industry

2. Git Fundamentals
- How Git tracks code history and collaboration
- Key commands and underlying concepts

3. Introduction to PyDriller
- Overview, installation, and core capabilities
- Understanding commits, authors, and metadata

4. Hands-on Demonstration
- Connecting to a repository
- Extracting commit messages, authors, and timestamps

5. Interactive Exercise & Discussion
- Running PyDriller on real repositories
- Exploring developer activity patterns

6. Wrap-Up & Next Steps
- Key takeaways and preview of advanced analysis in Part II
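
One underlying concept behind item 2 of the outline is that Git names every stored object by a hash of its content. The sketch below is an illustration only, not workshop material: it computes the ID Git assigns to a file's content (a "blob"), assuming the default SHA-1 object format. The function name `git_blob_id` is hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git gives to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte)
    # followed by the raw file content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
hello_id = git_blob_id(b"hello world\n")
print(empty_id)
print(hello_id)
```

The printed values should match what `git hash-object` reports for files with the same content. Commit IDs, such as the `hash` attribute that PyDriller exposes on each commit, are derived in the same way from a commit object rather than a blob.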


Prerequisites:

Basic understanding of Python programming and familiarity with Git concepts.
Participants should have access to Jupyter Notebook or Google Colab for the live demo.


Biography:

Mehedi Hasan Shanto is a Ph.D. student in the School of Computer Science at the University of Windsor, specializing in software engineering, large language models (LLMs), and repository mining. His research focuses on understanding how AI and empirical methods can evaluate, predict, and automate software development activities. He has experience working with GitHub data, software analytics, and LLM-based evaluation frameworks. Shanto’s passion lies in bridging the gap between software repository data and intelligent automation, helping developers and researchers turn raw version control history into actionable insights.


Organized by

School of Computer Science

On Sale Oct 27 at 8:00 PM