School of Computer Science – JLR Challenge #4 Technical Workshop
Mining Git Repositories with PyDriller – Part I: Understanding Git and Repository Mining Basics (1st Offering)
Mehedi Hasan Shanto
Date: Tuesday, October 28th, 2025
Time: 3:00 PM
Location: Workshop Space, 4th Floor - 300 Ouellette Ave., School of Computer Science Advanced Computing Hub
Abstract:
This workshop provides a hands-on introduction to mining and analyzing software repositories using PyDriller, a Python framework that simplifies access to Git data. Participants will first explore how Git records the evolution of software projects through commits, authors, timestamps, and code changes. The session will then demonstrate how PyDriller converts raw commit logs into structured, analyzable Python objects, making it easier to extract insights such as developer activity, commit frequency, and project evolution patterns.
Through interactive examples, attendees will learn how to connect to a GitHub repository, traverse commit histories, and extract essential metadata for software analytics. The workshop will emphasize the importance of version control data in empirical software engineering, showcasing how it can be used to support research, automate reporting, and drive evidence-based development practices. This first session sets the stage for deeper repository analysis in Part II, where participants will move from basic mining to advanced metrics and developer behavior analytics.
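As a rough preview of the kind of code the session works toward, the sketch below shows one way to traverse a repository with PyDriller and reduce each commit to a small record. It is not the workshop's own material: the helper names commit_records and mine_repository are hypothetical, and the repository URL in the commented example is only an illustrative choice. It relies on PyDriller's Repository class and on the hash, msg, author and author_date attributes of its commit objects.

```python
def commit_records(commits):
    """Turn commit objects into plain dicts of basic metadata.

    Works with any objects exposing .hash, .msg, .author.name and
    .author_date, which PyDriller's commit objects do.
    """
    records = []
    for commit in commits:
        lines = commit.msg.splitlines()
        records.append({
            "hash": commit.hash[:8],                 # abbreviated commit id
            "author": commit.author.name,
            "date": commit.author_date,              # a datetime object
            "subject": lines[0] if lines else "",    # first line of the message
        })
    return records


def mine_repository(path_or_url):
    """Traverse a repository with PyDriller and return its commit records."""
    # Imported here so commit_records() stays usable without PyDriller installed.
    from pydriller import Repository
    return commit_records(Repository(path_or_url).traverse_commits())


# Example call (assumes `pip install pydriller`, a local Git installation,
# and network access when a remote URL is given; the URL is illustrative):
# for row in mine_repository("https://github.com/ishepard/pydriller"):
#     print(row["hash"], row["date"].date(), row["author"], row["subject"])
```

Passing a remote URL makes PyDriller clone the repository to a temporary location first, so a local path is usually faster for repeated experiments.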
Workshop Outline:
1. Introduction to Repository Mining
- Why studying version control data matters
- Applications in research and industry
2. Git Fundamentals
- How Git tracks code history and collaboration
- Key commands and underlying concepts
3. Introduction to PyDriller
- Overview, installation, and core capabilities
- Understanding commits, authors, and metadata
4. Hands-on Demonstration
- Connecting to a repository
- Extracting commit messages, authors, and timestamps
5. Interactive Exercise & Discussion
- Running PyDriller on real repositories
- Exploring developer activity patterns
6. Wrap-Up & Next Steps
- Key takeaways and preview of advanced analysis in Part II
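To give a flavour of the "developer activity patterns" item in the outline above, here is a minimal sketch, not taken from the workshop material, that counts commits per author and per calendar month. The function name activity_summary is hypothetical; it accepts any iterable of objects with author.name and author_date attributes, which is what PyDriller's traverse_commits() yields.

```python
from collections import Counter


def activity_summary(commits):
    """Summarize commit activity by author and by calendar month."""
    commits = list(commits)  # materialize once, since generators can only be read once
    return {
        "total": len(commits),
        "by_author": Counter(c.author.name for c in commits),
        "by_month": Counter(c.author_date.strftime("%Y-%m") for c in commits),
    }


# Example use with PyDriller (assumes `pip install pydriller`; the path is a placeholder):
# from pydriller import Repository
# summary = activity_summary(Repository("path/to/local/repo").traverse_commits())
# print(summary["by_author"].most_common(5))
```

Counting by author name is a simplification: the same person may commit under several names or e-mail addresses, so real studies usually add an identity-merging step.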
Prerequisites:
Basic understanding of Python programming and familiarity with Git concepts.
Participants should have access to Jupyter Notebook or Google Colab for the live demo.
Biography:
Mehedi Hasan Shanto is a Ph.D. student in the School of Computer Science at the University of Windsor, specializing in software engineering, large language models (LLMs), and repository mining. His research focuses on understanding how AI and empirical methods can evaluate, predict, and automate software development activities. He has experience working with GitHub data, software analytics, and LLM-based evaluation frameworks. Shanto’s passion lies in bridging the gap between software repository data and intelligent automation, helping developers and researchers turn raw version control history into actionable insights.