The Open Science Toolbox: Reproducibility Using Git, Python, Zenodo, & More
Join us for a virtual training session to boost your open science skills with open science experts from DUTC!
Location
Online
About this event
- Event lasts 1 hour 30 minutes
Event Description:
The Open Science Toolbox: Make Reproducible Research Using Git, Python, Zenodo, and More
Technologies Covered in the Seminar
- Open data repositories (Zenodo, Data.gov) for locating research-ready datasets
- Pixi for managing reproducible software environments
- Git and GitHub for version control and collaboration
- Zenodo for archiving and citing code and datasets
- Python, with a focus on pandas for data wrangling and Matplotlib for visualization
Training Overview
Reproducibility in research starts with using the right tools. This 90-minute
seminar offers a practical introduction to essential technologies that help
researchers write clean, trackable, and shareable code. The session is
structured as a guided demo, with permanent access to a recorded walkthrough
and a companion repository of materials.
We will begin by reviewing how to find and evaluate public datasets.
Participants will learn how to assess licensing, source quality, and fitness
for use, using examples from popular repositories.
We will then look at how to set up a clean, reproducible computational
environment using Pixi. This makes it easy to install Python and the packages
needed for analysis without running into version conflicts.
From there, we will introduce version control using Git. Participants will see
how Git tracks changes to code and how, in tandem with GitHub, it supports
collaboration and prepares research outputs for sharing. We will also
demonstrate how to archive a project using Zenodo, making it easy for others to
cite and reuse.
Throughout the session, we will use Python to explore and visualize datasets.
Examples will focus on pandas for data processing and Matplotlib for creating
clear, publication-ready figures. The emphasis will be on writing transparent
code that others can inspect and reuse.
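To give a flavor of the kind of transparent analysis code described above, here is a minimal sketch. It is not taken from the seminar materials: the inline data is made up for illustration, and the function names (`load_and_clean`, `mean_by_site`, `plot_means`) and the output filename are hypothetical. It assumes pandas and Matplotlib are installed in the environment.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display, so the script runs on any machine
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values; a real project would read a downloaded dataset file.
RAW_CSV = """site,year,value
A,2020,1.0
A,2021,2.0
B,2020,3.0
B,2021,
B,2022,5.0
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Parse the CSV text and drop rows that have no measurement."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.dropna(subset=["value"]).reset_index(drop=True)


def mean_by_site(df: pd.DataFrame) -> pd.Series:
    """Return the mean of 'value' for each site."""
    return df.groupby("site")["value"].mean()


def plot_means(means: pd.Series, path: str) -> None:
    """Save a labeled bar chart of the per-site means to `path`."""
    fig, ax = plt.subplots(figsize=(4, 3))
    means.plot.bar(ax=ax)
    ax.set_xlabel("Site")
    ax.set_ylabel("Mean value")
    ax.set_title("Mean value by site")
    fig.tight_layout()
    fig.savefig(path, dpi=300)
    plt.close(fig)


if __name__ == "__main__":
    clean = load_and_clean(RAW_CSV)
    means = mean_by_site(clean)
    plot_means(means, "mean_by_site.png")
    print(means)
```

Keeping each step (load, summarize, plot) in a small named function is one way to make the workflow easy for others to inspect and rerun; other structures would work equally well.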
This seminar is for researchers who want to improve the way they manage code,
analyze data, and share results. No prior experience with these tools is
required, but even experienced users may pick up new ideas for structuring
their research workflows.
Agenda
Part 1: Find and Prepare Data
- Overview of reproducibility and why it matters
- Finding datasets on Zenodo, Kaggle, and Data.gov
- Assessing licenses, documentation, and data quality
Part 2: Environment Management and Coding Workflows
- Setting up a clean environment with Pixi
- Installing Python and core packages
- Using Git to track changes and manage projects
- Introduction to pandas and Matplotlib for basic data analysis
Part 3: Share Your Work
- Hosting code on GitHub
- Archiving and citing research materials using Zenodo
What to Expect
This is a demonstration-focused seminar. Participants will not need to install
any software or follow along in real time. While we will demonstrate practical Python
examples live, this is not a hands-on coding workshop. The focus is on giving
participants a clear picture of how these tools fit together in a reproducible
research workflow.
Frequently asked questions
You will need a computer with a stable internet connection and a willingness to learn and engage!
No problem! Here’s how you can still benefit:
- Register anyway, and we’ll send you the recording so you can watch at your convenience.
- Have questions? Our team is happy to provide follow-up support via email (openscience@dutc.io).