1.6 Data Gallery

This course offers several tracks for students in the geosciences. The data sets can be downloaded and used throughout the course in the various exercises (e.g., classification, regression, clustering).

The tracks are: geophysical sciences (seismology and geodetic tracks), cryospheric sciences, atmospheric sciences, ocean sciences, hydrology, and forestry.

We provide a series of small, curated data sets for the course. These data sets are open access, each with its own license.

To download these and similar data, use the MLGeo-dataset repository that we created (https://github.com/UW-MLGEO/MLGeo-dataset).

How to Download a File from the MLGeo-dataset Repository

To download a file from a GitHub repository, follow these steps:

  1. Identify the File URL:

    • Navigate to the CSV file in the GitHub repository.

    • Click on the file to view its contents.

    • Click the “Raw” button to get the direct URL to the file.

  2. Construct the Download URL:

    The URL format is as follows, where file.csv stands for the name of the file you want:

    https://raw.githubusercontent.com/UW-MLGEO/MLGEO-dataset/main/data/file.csv
    
  3. Download the File: You can download the file with a web browser, wget, or curl. For example, to use wget, open your terminal and run:

    wget https://raw.githubusercontent.com/UW-MLGEO/MLGEO-dataset/main/data/EarthRocGranites.csv
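
The same steps can also be sketched in Python, which may be convenient inside a notebook. This is a minimal sketch using only the standard library, not the course's official download method. The helper names build_raw_url and download_file are hypothetical, and the owner, repository, branch, and folder values are taken from the URL pattern shown above.

```python
from pathlib import Path
from urllib.request import urlopen

# Values copied from the URL pattern in step 2.
OWNER = "UW-MLGEO"
REPO = "MLGEO-dataset"
BRANCH = "main"


def build_raw_url(filename: str, folder: str = "data") -> str:
    """Return the raw.githubusercontent.com URL for a file in the repository.

    Hypothetical helper: it only assembles the string shown in step 2.
    """
    return (
        f"https://raw.githubusercontent.com/{OWNER}/{REPO}/{BRANCH}/{folder}/{filename}"
    )


def download_file(url: str, destination: str, timeout: float = 30.0) -> Path:
    """Download `url` and write its bytes to `destination`; return the local path.

    Hypothetical helper: a failed request raises an exception from urllib.
    """
    target = Path(destination)
    with urlopen(url, timeout=timeout) as response:
        target.write_bytes(response.read())
    return target
```

With a network connection, calling download_file(build_raw_url("EarthRocGranites.csv"), "EarthRocGranites.csv") should save the same file that the wget command retrieves. Nothing is downloaded until download_file is called.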
    

Description of data

The collection of data aims to represent the diversity of data sets encountered in the geosciences.

The data include time series spanning a wide range of time scales (from seconds to 100 ka). For this class, the data are stored as CSV files; in practice, such data are typically stored in CSV, Arrow, HDF5 (H5), NetCDF, TileDB, miniSEED (mseed), and other discipline-specific formats.

Geoscientific Temporal Data

../_images/geocast-alldata.png

Fig. 6 Geoscience temporal data. The x-axis represents normalized time; the y-axis shows the normalized time series, each offset by its index in the data set. The data sets include extreme events, dynamic seismic waves, rising CO2, and seasonal patterns over 15+ years, such as hydrological and weather signals.
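
The layout described in the caption (normalized time on the x-axis, normalized series stacked by index on the y-axis) could be prepared along the following lines. This is only a sketch: the caption does not say which normalization was used, so min-max scaling onto [0, 1] is an assumption, the function names are hypothetical, and the numbers are synthetic placeholders rather than course data.

```python
def min_max_normalize(values):
    """Rescale a sequence of numbers linearly onto [0, 1] (assumed normalization)."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A constant series has no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


def stack_series(series_list):
    """Normalize each (times, values) pair and offset series i by i on the y-axis.

    Returns a list of (normalized_times, offset_values) pairs, so series 0
    occupies [0, 1], series 1 occupies [1, 2], and so on.
    """
    stacked = []
    for index, (times, values) in enumerate(series_list):
        x = min_max_normalize(times)
        y = [v + index for v in min_max_normalize(values)]
        stacked.append((x, y))
    return stacked


# Synthetic placeholder series with different time spans and amplitudes.
demo = [
    ([0, 5, 10], [2.0, 4.0, 6.0]),
    ([100, 150, 200], [10.0, 0.0, 5.0]),
]
stacked = stack_series(demo)
```

Each (x, y) pair in stacked can then be drawn as one line on shared axes. For large arrays, a vectorized NumPy version of the same arithmetic would likely be preferable.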

By eScience Institute, University of Washington
© Copyright 2022.