Tutorial: Running on an HPC
CUWALID Forecast Workflow Guide
This guide describes how to run the full CUWALID forecasting workflow, including generating JSON input files and submitting jobs to a compute cluster. This example uses SLURM, but it can be adapted to whichever job scheduler your HPC uses.
1. Generate JSON Files and Bash Scripts
Before you start running CUWALID, make sure you have created and have access to all parameter datasets, as well as the model parameter and settings files for each of the components that you plan to run.
You can either use the Input Helper tool, here, to create all the required files for each component, or you can find some templates in the following repository:
Run the JSON generator script, which also creates the .bash job scripts:
python -m cuwalid.tools.generate_json your_cuwalid_config.json
This will:
- Parse the configuration
- Generate STORM, StoPET, and DRYP input JSONs
- Save lists of generated files (a quick existence check is sketched after this list):
storm_jsons.txt
stopet_jsons.txt
dryp_jsons.txt
- Complete the pre-processing required for STORM and StoPET
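Before moving on, you can verify that every file referenced in these lists actually exists. A minimal sketch in plain bash (no CUWALID-specific behaviour is assumed):
#!/bin/bash
# Report any JSON file that is listed but missing on disk
for list in storm_jsons.txt stopet_jsons.txt dryp_jsons.txt; do
    echo "Checking $list ($(wc -l < "$list") entries)"
    while IFS= read -r json; do
        [ -f "$json" ] || echo "  MISSING: $json"
    done < "$list"
done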
2. Create the Bash Job Templates
Submit independent jobs (option 1)
After the JSON files have been generated, you may find it convenient to submit CUWALID jobs separately for each component, rather than running all components at the same time. This can help you maintain better control over each part of the forecasting system.
The following file can be used as a template to submit the climatological components, STORM and StoPET:
submit_meteo_jobs_slurm.bash
whereas for the hydrological component, DRYP, the following file can be used to submit jobs:
submit_dryp_jobs_slurm.bash
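If you do not have these templates to hand, such a submit script is typically just a loop over the JSON lists. Below is a minimal sketch of what submit_meteo_jobs_slurm.bash might contain; the job-script names match the generator shown in option 2 below, and the paths are assumptions to adapt to your setup:
#!/bin/bash
# Submit one STORM and one StoPET job per entry in the generated JSON lists.
# Run from inside bSub_runMe so the relative log paths in the job scripts resolve.
cd bSub_runMe
while IFS= read -r storm_json; do
    sbatch --export=STORM_JSON="$storm_json" cuwalid_run_storm.bash
done < ../storm_jsons.txt

while IFS= read -r stopet_json; do
    sbatch --export=STOPET_JSON="$stopet_json" cuwalid_run_stopet.bash
done < ../stopet_jsons.txt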
At the following link, you can find additional scripts to help you set up and submit jobs to the server:
Create the Bash Job Templates (option 2)
Run the bash generator script. An example of how you can do this is provided below, but the details will depend on your environment:
import os


def write_bash(job_name, command, logname=None, memory="4G", time="01:00:00"):
    """Write a bash script for submitting a job to a cluster.

    Parameters
    ----------
    job_name : str
        The name of the job.
    command : str
        The command to run in the job.
    logname : str, optional
        The name of the log file. If not provided, defaults to job_name.log.
    memory : str
        The amount of memory allocated for the job, e.g. "4G" for 4 gigabytes.
    time : str
        The maximum time the job is allowed to run, in the format HH:MM:SS.
    """
    path = f"bSub_runMe/cuwalid_run_{job_name}.bash"
    log_file = logname or f"{job_name}.log"
    lines = [
        "#!/bin/bash\n",
        f"#SBATCH --job-name=cuwalid_{job_name}\n",
        f"#SBATCH --time={time}\n",
        f"#SBATCH --mem={memory}\n",
        "#SBATCH --cpus-per-task=1\n",
        "#SBATCH --output=../bSub_logMe/%x.log\n",
        "#SBATCH --error=../bSub_logMe/%x.err\n",
        "\n",
        "module load miniconda3\n",
        "source activate cwld\n",
        f"{command} > ../bSub_logMe/{log_file} 2>&1\n",
    ]
    with open(path, "w") as f:
        f.writelines(lines)


def main():
    os.makedirs("bSub_runMe", exist_ok=True)
    os.makedirs("bSub_logMe", exist_ok=True)
    # Only generate STORM and StoPET job scripts, since post-processing is now
    # integrated. Each command reads its JSON path from the environment
    # variable exported at submission time (see "Submit the Workflow" below).
    write_bash("storm", 'python -m cuwalid.storm.main_storm "$STORM_JSON"')
    write_bash("stopet", 'python -m cuwalid.stopet.main_stopet_wrapper "$STOPET_JSON"')
    # DRYP job scripts are generated separately later
    print("Run `submit_all.sh` after generating dryp_jsons.txt.")


if __name__ == "__main__":
    main()
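Assuming the generator above is saved as make_job_scripts.py (the filename is hypothetical), a single run could then be submitted by hand like this:
# Generate the job scripts and log directories
python make_job_scripts.py
# Submit from inside bSub_runMe so the relative log paths resolve;
# the JSON filename here is illustrative
cd bSub_runMe
sbatch --export=STORM_JSON=/path/to/storm_ens01.json --parsable cuwalid_run_storm.bash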
Submit the Workflow
Make sure your JSON lists (storm_jsons.txt, stopet_jsons.txt, and dryp_jsons.txt) are in the root or properly pathed in the script.
Then you can run a bash file like this:
#!/bin/bash
set -e

# Navigate to the directory with .bash files
pushd bSub_runMe

# Read STORM JSON files and submit jobs for each
while IFS= read -r storm_json; do
    # Submit the STORM job and capture its job ID
    storm_jobid=$(sbatch --export=STORM_JSON="$storm_json" --parsable cuwalid_run_storm.bash)

    # Submit the StoPET job for the same JSON
    stopet_json=$(echo "$storm_json" | sed 's/storm/stopet/')
    stopet_jobid=$(sbatch --export=STOPET_JSON="$stopet_json" --parsable cuwalid_run_stopet.bash)

    # Now submit DRYP jobs; they wait until both STORM and StoPET are done
    while IFS= read -r dryp_json; do
        job_name=$(basename "$dryp_json" .json)
        bash_script="dryp_${job_name}.bash"  # we are already inside bSub_runMe
        cat <<EOF > "$bash_script"
#!/bin/bash
#SBATCH --job-name=dryp_${job_name}
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --cpus-per-task=1
#SBATCH --output=../bSub_logMe/dryp_${job_name}.log
#SBATCH --error=../bSub_logMe/dryp_${job_name}.err
#SBATCH --export=ALL
source ~/miniconda3/bin/activate
conda activate cwld
python -m cuwalid.dryp "$dryp_json"
EOF
        chmod +x "$bash_script"

        # Submit the DRYP job with a dependency on both STORM and StoPET
        sbatch --dependency=afterok:${storm_jobid}:${stopet_jobid} "$bash_script"
    done < path/to/dryp_jsons.txt
done < path/to/storm_jsons.txt

popd
Alternatively, you can download this CUWALID SLURM-type file from the following link, or a CUWALID QSUB-type equivalent.
This script will:
- Loop through each STORM JSON and submit cuwalid_run_storm.bash
- Submit the corresponding cuwalid_run_stopet.bash job in parallel
- Generate and submit DRYP jobs that depend on both the STORM and StoPET jobs completing

Each DRYP JSON file will get its own dynamically generated bash script in bSub_runMe/.
3. Complete the Forecasting
Because this system uses the same file structure as the main CUWALID system, you can complete the final forecasting steps after DRYP by simply running the normal CUWALID workflow. This is done by running the main_cuwalid script with a cuwalid_input.json. Here you can find all the information required to run CUWALID. In the configuration file, you need to enable the WaterCast components by activating the HydroCast and IMCast components, but please ensure you have disabled the running of STORM, StoPET, and DRYP in the configuration settings.
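As a sketch, the final step could be run like this; note that the module path cuwalid.main_cuwalid is an assumption based on the naming of the other entry points, so check the CUWALID documentation for the exact invocation:
# Run the final CUWALID step with HydroCast and IMCast enabled and
# STORM/StoPET/DRYP disabled in cuwalid_input.json.
# NOTE: the module path below is an assumption; verify it in the CUWALID docs.
python -m cuwalid.main_cuwalid cuwalid_input.json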
Job Dependencies
Job dependencies are managed with --dependency=afterok in SLURM (or with -hold_jid if you are using qsub):
- DRYP jobs wait for both STORM and StoPET jobs to finish
- STORM and StoPET run in parallel
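For reference, the two schedulers express the same dependency as follows (the job IDs and names here are illustrative):
# SLURM: start the DRYP job only after both jobs finish successfully
sbatch --dependency=afterok:${storm_jobid}:${stopet_jobid} dryp_job.bash
# SGE/qsub equivalent, holding on the submitted job names
qsub -hold_jid cuwalid_storm,cuwalid_stopet dryp_job.bash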
Logs
All logs will be saved in bSub_logMe/, including:
storm_*.log
stopet_*.log
dryp_*.log
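A quick way to scan them all for failures, using only standard tools:
# List any log files that mention an error (case-insensitive)
grep -il "error" bSub_logMe/*.log bSub_logMe/*.err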
Final Notes
- Make sure you have SLURM or qsub and conda properly configured in your HPC environment.
- Activate the cwld environment in your scripts.
- Double-check file paths in your config.json and script logic.
- If needed, modify memory/time parameters in the .bash headers.