4.2 Feature Selection for SWE Prediction Models


This section covers the criteria for selecting features in snow water equivalent (SWE) prediction models and the techniques and tools used to select them. The selection involves two steps:

  • Filtering: Select only the relevant columns needed for SWE prediction, such as weather conditions, geographic features, and snowpack measurements.

  • Renaming: Streamline column names for consistency and clarity (e.g., changing “Snow Water Equivalent (in) Start of Day Values” to “swe_value”).

Finally, save the resulting data to a CSV file.

import dask.dataframe as dd
input_csv = '../data/model_training_data.csv'
# List of columns you want to extract
selected_columns = ['date', 'lat', 'lon', 'etr', 'pr', 'rmax',
                    'rmin', 'tmmn', 'tmmx', 'vpd', 'vs', 
                    'elevation',
                    'slope', 'curvature', 'aspect', 'eastness',
                    'northness', 'Snow Water Equivalent (in) Start of Day Values']
# Read the CSV file into a Dask DataFrame
df = dd.read_csv(input_csv, usecols=selected_columns)

df = df.rename(columns={"Snow Water Equivalent (in) Start of Day Values": "swe_value"})

# Path of the cleaned output file; change it to the desired output file name
output_csv = '../data/model_training_cleaned.csv'

# Write the selected columns to a new CSV file
df.to_csv(output_csv, index=False, single_file=True)
['/Users/vangavetisaivivek/research/swe-workflow-book/book/data/model_training_cleaned.csv']
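The same filter-and-rename steps can be tried without the full training file. The sketch below is only illustrative: it uses pandas instead of Dask, and the in-memory CSV, its values, and the extra `station_name` column are made up for the example.

```python
import io

import pandas as pd

# Hypothetical miniature input: made-up values, plus one extra column to drop.
raw_csv = io.StringIO(
    "date,lat,lon,pr,station_name,Snow Water Equivalent (in) Start of Day Values\n"
    "2020-01-01,40.0,-105.5,1.2,demo_a,10.5\n"
    "2020-01-02,40.0,-105.5,0.0,demo_a,10.7\n"
)

SWE_COLUMN = "Snow Water Equivalent (in) Start of Day Values"
keep = ["date", "lat", "lon", "pr", SWE_COLUMN]

# Filtering: usecols keeps only the listed columns while parsing.
small_df = pd.read_csv(raw_csv, usecols=keep)

# Renaming: replace the long header with a short identifier.
small_df = small_df.rename(columns={SWE_COLUMN: "swe_value"})

print(list(small_df.columns))
```

Dropping unneeded columns at read time, rather than after loading, avoids parsing data that is discarded immediately, which matters more as the input file grows.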

The models are trained on the following features, which capture the major factors affecting SWE: date; location (latitude and longitude); etr (evapotranspiration); pr (precipitation); rmax (maximum relative humidity); rmin (minimum relative humidity); tmmn (minimum temperature); tmmx (maximum temperature); vpd (vapor pressure deficit); vs (wind speed); and the terrain variables elevation, slope, curvature, aspect, eastness, and northness.
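One way to keep this feature list and the data in sync is to record the abbreviations in a lookup table and check a file's header against it. This is a minimal sketch, not part of the original workflow: the names `FEATURE_DESCRIPTIONS`, `TARGET`, and `missing_columns` are hypothetical helpers introduced here.

```python
# Hypothetical lookup table built from the feature list above.
FEATURE_DESCRIPTIONS = {
    "date": "observation date",
    "lat": "latitude",
    "lon": "longitude",
    "etr": "evapotranspiration",
    "pr": "precipitation",
    "rmax": "maximum relative humidity",
    "rmin": "minimum relative humidity",
    "tmmn": "minimum temperature",
    "tmmx": "maximum temperature",
    "vpd": "vapor pressure deficit",
    "vs": "wind speed",
    "elevation": "elevation",
    "slope": "slope",
    "curvature": "curvature",
    "aspect": "aspect",
    "eastness": "eastness",
    "northness": "northness",
}

# Name of the prediction target after renaming.
TARGET = "swe_value"


def missing_columns(columns):
    """Return the expected feature/target names absent from `columns`."""
    expected = list(FEATURE_DESCRIPTIONS) + [TARGET]
    return [name for name in expected if name not in columns]


# Example: a header that lacks the target column.
print(missing_columns(list(FEATURE_DESCRIPTIONS)))
```

An empty list from `missing_columns` means every feature and the target are present, so the check can run on a file's header before training starts.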