Gridded geospatial datasets, such as those produced by climate models, Earth system models, and remote sensing platforms, are among the largest datasets available today. They easily reach terabytes in size and continue to grow rapidly. These datasets pose unique challenges because of their high spatial resolution (down to centimeters) and high temporal frequency (down to minutes). Modern geospatial raster data adds further complexity through its high dimensionality: beyond space and time, dimensions may represent different experiments, model configurations, spectral bands, or environmental variables. Together, these characteristics make geospatial raster data particularly difficult to handle with traditional data processing approaches.
This workshop is the first in a two-part series on geospatial data analysis with Python. Part 1 focuses on Xarray, a library designed for working with labeled multi-dimensional arrays, which makes it well suited to gridded geospatial data. Part 2 will cover scaling geospatial data processing workflows on HPC resources, using Xarray together with Dask for parallel computation.
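As a rough illustration of what "labeled multi-dimensional arrays" means in practice, the sketch below builds a small synthetic Xarray `DataArray` with an extra "experiment" dimension alongside time, latitude, and longitude. The dimension names, coordinate values, and the variable name `air_temperature` are hypothetical and chosen only for this example; they are not taken from the workshop materials. The point is that selections and reductions refer to dimensions and coordinates by name rather than by integer axis position.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic data: 2 experiments x 3 time steps x 4 latitudes x 5 longitudes
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=288.0, scale=5.0, size=(2, 3, 4, 5))

temperature = xr.DataArray(
    data,
    dims=("experiment", "time", "lat", "lon"),
    coords={
        "experiment": ["historical", "ssp585"],  # made-up experiment labels
        "time": pd.date_range("2000-01-01", periods=3, freq="D"),
        "lat": np.linspace(-45.0, 45.0, 4),   # -45, -15, 15, 45
        "lon": np.linspace(0.0, 288.0, 5),    # 0, 72, 144, 216, 288
    },
    name="air_temperature",
    attrs={"units": "K"},
)

# Select one experiment by its label; the scalar selection drops that dimension
one_run = temperature.sel(experiment="ssp585")

# Look up the grid cell nearest to a coordinate value (lat 15, lon 72 here)
point = temperature.sel(lat=10.0, lon=70.0, method="nearest")

# Reduce over named dimensions instead of numeric axes
time_mean = temperature.mean(dim="time")
spatial_mean = temperature.mean(dim=["lat", "lon"])

print(one_run.dims)       # ('time', 'lat', 'lon')
print(point.dims)         # ('experiment', 'time')
print(time_mean.dims)     # ('experiment', 'lat', 'lon')
print(spatial_mean.dims)  # ('experiment', 'time')
```

With plain NumPy, the same operations would require remembering that time is axis 1 and converting latitude values into integer indices by hand. Keeping these labels attached to the data is the main design choice that makes Xarray convenient for high-dimensional gridded datasets.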
Prerequisites
- Basic Python programming knowledge
- Familiarity with scientific Python libraries (NumPy, Pandas) is helpful but not required
- Access to the RCC HPC environment (instructions will be provided prior to workshop)
Workshop Materials
All materials, including Jupyter notebooks, example datasets, and setup instructions, will be provided to participants before the workshop.