Run SUMMA models with CAMELS dataset on CyberGIS-Jupyter for Water (CJW)

Introduction

CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) is a large-sample hydrometeorological dataset that provides catchment attributes, forcings and GIS data for 671 small- to medium-sized basins across the CONUS (continental United States). HydroShare hosts a copy of CAMELS and exposes it through several public data access protocols (WMS, WFS and OPeNDAP) so that the dataset can be easily visualized, retrieved and subset in community modeling research. This notebook demonstrates how to set up SUMMA models with the CAMELS dataset from HydroShare using tools integrated in the CyberGIS-Jupyter for Water (CJW) environment, and how to execute ensemble model runs on a High Performance Computing (HPC) resource through the CyberGIS-Compute Service.

CAMELS dataset hosted on HydroShare

The CAMELS dataset is currently stored in two HydroShare resources.

  1. Resource "NLDAS Forcing NetCDF..." contains a shapefile "HCDN_nhru_final_671.shp" with the boundaries of the 671 basins and a NetCDF file "nldasForcing1980to2018.nc" of NLDAS forcings from 1980 to 2018.

CAMELS NLDAS resource

  2. Resource "CAMELS Basin Attributes" contains basin attributes in two NetCDF files: "attributes.camels.v2.nc" and "trialParams.camels.Oct2020_new.nc".

In HydroShare, shapefiles are represented as GeographicFeatureContentType and exposed through OGC WMS and WFS services using GeoServer; NetCDF files are represented as MultidimensionalContentType and exposed through the OPeNDAP protocol using the Hyrax Data Server.

Programmatically accessing or subsetting the data in a Jupyter Notebook environment usually requires one or more client tools that are compatible with the above protocols; these are introduced in later sections. For now, we simply use the web interfaces built into GeoServer and the Hyrax Data Server for a quick data preview.

CAMELS Basins

NLDAS Forcings

Set up SUMMA models with CAMELS dataset

In this section, we will set up a SUMMA model for a user-picked CAMELS basin. Several steps are required, each covered in one of the subsections that follow.

Pick a CAMELS Basin and Simulation Period

If you already know the "hru_id" of the CAMELS basin you are interested in, you may enter it below. Otherwise, you can select a basin interactively on the map. For the simulation period, start_datetime and end_datetime should be in "YYYY-MM-DD HH:MM" format and fall within "1980-01-01 00:00" to "2018-12-31 23:00" (the temporal coverage of the NLDAS forcings).
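The constraints above can be checked before any data is requested. The following is a minimal sketch, not the notebook's own cell: the helper name check_period and the example values (including the hru_id, which is an unverified placeholder) are assumptions made for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"
# Temporal coverage of the NLDAS forcings, as stated in the text above.
COVERAGE_START = datetime.strptime("1980-01-01 00:00", FMT)
COVERAGE_END = datetime.strptime("2018-12-31 23:00", FMT)


def check_period(start_datetime, end_datetime):
    """Parse both strings and verify they lie inside the forcing coverage."""
    start = datetime.strptime(start_datetime, FMT)  # raises ValueError on a bad format
    end = datetime.strptime(end_datetime, FMT)
    if start > end:
        raise ValueError("start_datetime must not be later than end_datetime")
    if start < COVERAGE_START or end > COVERAGE_END:
        raise ValueError("simulation period is outside the NLDAS forcing coverage")
    return start, end


hru_id = 1013500  # placeholder; replace with the hru_id of your basin
start_datetime = "1991-10-01 00:00"
end_datetime = "1991-10-31 23:00"
start, end = check_period(start_datetime, end_datetime)
print(hru_id, start, end)
```

Failing early on a malformed or out-of-range period avoids a remote request that would return an empty subset.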

Select a CAMELS basin on interactive map (optional)

Here we provide an interactive map for you to view the 671 CAMELS basins. By taking advantage of the OGC WFS service that HydroShare has set up on top of the basin shapefile "HCDN_nhru_final_671.shp", we can retrieve the basin geometry in GeoJSON and visualize it using ipyleaflet.
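As a rough illustration of the WFS request behind the map, the sketch below builds a standard WFS 2.0.0 GetFeature URL that asks for GeoJSON output. The base URL and the layer name are placeholders, not HydroShare's actual endpoint or workspace naming; only the query parameters follow the OGC WFS key-value convention as supported by GeoServer.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url, type_name, count=None):
    """Build a WFS 2.0.0 GetFeature request that returns GeoJSON."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": "application/json",  # GeoJSON output format in GeoServer
    }
    if count is not None:
        params["count"] = count  # optional cap on the number of features returned
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer name (assumptions, for illustration only).
url = wfs_getfeature_url(
    "https://geoserver.example.org/geoserver/wfs",
    "workspace:HCDN_nhru_final_671",
    count=671,
)
print(url)
```

The GeoJSON returned by such a request could then be handed to an ipyleaflet GeoJSON layer for display.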

You may hover over a basin to see its hru_id in the bottom-right corner. Click on a basin in the map to select it for modelling. A popup with more information about the basin (hru_id, lat, lon, area, elevation and perimeter) will appear in the upper-left corner. NCAR provides a high-resolution basin image and elevation map for some basins located on the west coast. For those basins, the popup shows clickable thumbnails that link to the originals.

Note: Clicking on a basin selects it for modelling and overwrites the "hru_id" variable set above.

In this section, we will subset the NLDAS forcing data to the user-selected basin over the requested simulation period. The original forcing contains data for the 671 CAMELS basins from 1980 to 2018 (39 years) in a single 6 GB NetCDF file, "nldasForcing1980to2018.nc". HydroShare exposes the NetCDF file through the OPeNDAP protocol using Hyrax, which lets users retrieve a portion of the data directly without downloading the whole file to the Jupyter environment. The public OPeNDAP access URL for a NetCDF file hosted on HydroShare has the following pattern:

http://hyrax.hydroshare.org/opendap/hyrax/{RESOURCE_ID}/data/contents/{NETCDF_FILE_NAME}

For the NLDAS CAMELS forcing file "nldasForcing1980to2018.nc", the access URL is:

http://hyrax.hydroshare.org/opendap/hyrax/a28685d2dd584fe5885fc368cb76ff2a/data/contents/nldasForcing1980to2018.nc
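The URL pattern can be filled in programmatically. This is a small sketch; the function name opendap_url is an assumption, while the pattern, resource ID and file name are taken from the text above.

```python
from urllib.parse import quote

HYRAX_PATTERN = (
    "http://hyrax.hydroshare.org/opendap/hyrax/"
    "{resource_id}/data/contents/{file_name}"
)


def opendap_url(resource_id, file_name):
    """Fill the HydroShare Hyrax URL pattern for one NetCDF file."""
    # quote() percent-encodes characters such as spaces in the file name.
    return HYRAX_PATTERN.format(resource_id=resource_id, file_name=quote(file_name))


forcing_url = opendap_url(
    "a28685d2dd584fe5885fc368cb76ff2a", "nldasForcing1980to2018.nc"
)
print(forcing_url)
```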

We need an OPeNDAP-compliant client tool to open the URL (pasting it directly into a browser only produces a warning message). Here we use xarray for this purpose.

The following cell shows the metadata of the remote NLDAS NetCDF file retrieved through OPeNDAP. Under the Dimensions tab, you can see that it has 671 basins and 341880 timesteps (14245 days * 24 timesteps/day).
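The timestep count quoted above can be reproduced from the coverage dates alone. The sketch below assumes hourly forcing with both endpoints included; the helper name hourly_timesteps is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_timesteps(start, end):
    """Count hourly timesteps from start to end, both endpoints included."""
    return (end - start) // timedelta(hours=1) + 1


first = datetime(1980, 1, 1, 0, 0)
last = datetime(2018, 12, 31, 23, 0)
n_steps = hourly_timesteps(first, last)
n_days = n_steps // 24
print(n_days, n_steps)  # 14245 341880
```

The same helper gives the expected length of the time dimension for any simulation period you choose.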

Here we use hru_id, start_datetime and end_datetime to subset the NLDAS forcing NetCDF file. The new Dimensions should show only 1 basin and the number of timesteps that matches the selected simulation period (number of days * 24 timesteps/day). Note that we also adjust the resulting file slightly (setting the variable "data_step" and removing the attribute "_NCProperties") to make it compatible with the SUMMA model.
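The subsetting step might look roughly like the sketch below, which runs on a tiny synthetic dataset instead of the remote file. Several details are assumptions rather than facts from this notebook: the dimension names "hru" and "time", the basin ID variable "hruId", the helper name subset_forcing, and the data_step value of 3600 seconds (chosen because the forcing is hourly). With the real data, the input would instead come from opening the OPeNDAP URL with xarray.

```python
import numpy as np
import xarray as xr


def subset_forcing(ds, hru_id, start_datetime, end_datetime):
    """Return a one-basin, one-period copy of a forcing dataset."""
    # Positional index of the requested basin; indexing with an array
    # keeps the "hru" dimension (length 1) instead of dropping it.
    idx = np.flatnonzero(ds["hruId"].values == hru_id)
    if idx.size != 1:
        raise ValueError(f"expected exactly one basin with hruId={hru_id}")
    subset = ds.isel(hru=idx).sel(time=slice(start_datetime, end_datetime))
    # Assumption: hourly forcing, so the time step is 3600 seconds.
    subset["data_step"] = xr.DataArray(3600.0)
    # Remove "_NCProperties" by assigning a fresh attribute dictionary,
    # which leaves the attributes of the source dataset untouched.
    subset.attrs = {k: v for k, v in subset.attrs.items() if k != "_NCProperties"}
    return subset


# Synthetic stand-in for the remote file: 3 basins, 72 hourly timesteps.
time = np.arange("1980-01-01T00", "1980-01-04T00", dtype="datetime64[h]")
time = time.astype("datetime64[ns]")
demo = xr.Dataset(
    {"airtemp": (("time", "hru"), np.full((time.size, 3), 280.0))},
    coords={"time": time, "hruId": ("hru", np.array([101, 102, 103]))},
    attrs={"_NCProperties": "placeholder"},
)

subset = subset_forcing(demo, 102, "1980-01-02 00:00", "1980-01-02 23:00")
print(dict(subset.sizes))
```

Because OPeNDAP access is lazy, only the selected basin and period would be transferred when the subset is loaded or written out, for example with subset.to_netcdf(...).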

Save forcing subset to local SUMMA model folder

Plot forcing subset

Prepare Attribute File for selected CAMELS Basin

SUMMA uses a number of files to specify model attributes and parameters. Although SUMMA's distinction between attributes and parameters is somewhat arbitrary, attributes generally describe characteristics of the model domain that are time-invariant during the simulation, such as GRU and HRU identifiers, spatial organization, and topography. The important point for understanding the organization of the SUMMA input files is that the values specified in the local attributes file do not overlap with those in the various parameter files. Thus, these values do not overwrite any attributes specified elsewhere. In contrast, the various parameter files are read in sequence (as explained in the next paragraph), and parameter values that are read in from the input files successively overwrite values that have been specified earlier.

Since the CAMELS basin attribute NetCDF file attributes.camels.v2.nc is small (70 KB), a copy is included in the SUMMA model folder. We just need to subset it to our selected basin.

Prepare Parameter File for selected CAMELS Basin

The trial parameters file is a NetCDF file that specifies model parameters for GRUs and individual HRUs. This enables the user to overwrite the default and/or Noah-MP parameter values with local-specific ones.
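The overwrite behaviour can be pictured as layering dictionaries, where each later source replaces values set by an earlier one. This is only a conceptual sketch: the parameter names and values are invented, the helper resolve_parameters is hypothetical, and the ordering of defaults before Noah-MP tables is an assumption; the text above only states that trial parameters overwrite both.

```python
def resolve_parameters(*layers):
    """Merge parameter layers in order; later layers overwrite earlier ones."""
    resolved = {}
    for layer in layers:
        resolved.update(layer)
    return resolved


# Invented names and values, for illustration only.
defaults = {"param_a": 1.0, "param_b": 2.0, "param_c": 3.0}
noah_mp_tables = {"param_b": 20.0}
trial_params = {"param_b": 200.0, "param_c": 300.0}

resolved = resolve_parameters(defaults, noah_mp_tables, trial_params)
print(resolved)  # {'param_a': 1.0, 'param_b': 200.0, 'param_c': 300.0}
```

A parameter that is absent from the trial parameters file keeps whatever value the earlier sources assigned to it.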

Since the CAMELS parameter NetCDF file trialParams.camels.Oct2020.nc is small (250 KB), a copy is included in the SUMMA model folder. We just need to subset it to our selected basin.

Create Initial Conditions

Run SUMMA model locally

Plot model outputs

Run Ensemble SUMMA model on HPC through CyberGIS-Compute Service

Build Ensemble

Create folders for job submission

Submit ensemble model to HPC for execution through CyberGIS-Compute Service

Monitor Job Status

Retrieve Ensemble Model Output

Plot Ensemble Model Output

Done