2.12 Dimensionality Reduction#

Ideally, one would not need to extract or select features in the input data. However, reducing the dimensionality as a separate pre-processing step may be advantageous:

  1. The complexity of the algorithm depends on the number of input dimensions and size of the data.

  2. If some features are unnecessary, not extracting them saves computing time.

  3. Simpler models are more robust on small datasets.

  4. Fewer features lead to a better understanding of the data.

  5. Visualization is easier in fewer dimensions.

Dimensionality reduction techniques fall into two categories:

1. Feature selection#

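  • Feature selection finds the subset of dimensions that explains the data with as little loss of information as possible, and so ends with a smaller dimensionality of the input data. A forward selection approach starts with the single variable that decreases the error the most and then adds variables one by one, each time keeping the one that reduces the error most. A backward selection approach starts with all variables and removes them one by one, each time dropping the one whose removal increases the error least.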
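As an illustration of forward and backward selection, the sketch below uses scikit-learn's SequentialFeatureSelector on a hypothetical toy regression problem (the arrays X_toy and y_toy are invented for this example and are not part of the climate data). It is one possible way to run the procedure, assuming a linear model and cross-validated error as the selection criterion.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Hypothetical toy problem: 5 candidate features, only features 0 and 2 drive y
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 2] + 0.1 * rng.normal(size=200)

# Forward selection: start from no features and add the most helpful one at each step
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward"
)
forward.fit(X_toy, y_toy)

# Backward selection: start from all features and remove the least helpful one at each step
backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="backward"
)
backward.fit(X_toy, y_toy)

forward_idx = np.flatnonzero(forward.get_support())
backward_idx = np.flatnonzero(backward.get_support())
print("Forward selection kept features:", forward_idx)
print("Backward selection kept features:", backward_idx)
```

Both directions should agree on this easy problem; on real data with correlated features they can return different subsets.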

# Import useful modules
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
# Generate synthetic climate data
np.random.seed(42)  # For reproducibility

n_samples = 1000
temperature = np.random.normal(loc=15, scale=10, size=n_samples)  # Mean 15°C, std 10°C
humidity = np.random.normal(loc=75, scale=15, size=n_samples)     # Mean 75%, std 15%
# Introduce correlation: precipitation is a function of humidity plus some noise
precipitation = 0.5 * humidity + np.random.normal(loc=0, scale=10, size=n_samples)  # Correlated with humidity
wind_speed = np.random.normal(loc=10, scale=5, size=n_samples)    # Mean 10m/s, std 5m/s
# Introduce correlation: solar radiation is a function of temperature plus some noise
solar_radiation = 0.8 * temperature + np.random.normal(loc=0, scale=5, size=n_samples)  # Correlated with temperature

# Combine into a DataFrame
climate_data = pd.DataFrame({
    'Temperature (°C)': temperature,
    'Humidity (%)': humidity,
    'Precipitation (mm)': precipitation,
    'Wind Speed (m/s)': wind_speed,
    'Solar Radiation (W/m2)': solar_radiation
})
climate_data.head()
Temperature (°C) Humidity (%) Precipitation (mm) Wind Speed (m/s) Solar Radiation (W/m2)
0 19.967142 95.990332 41.243383 0.460962 11.656245
1 13.617357 88.869505 42.989566 5.698075 10.737868
2 21.476885 75.894456 30.023029 7.931972 17.271593
3 30.230299 65.295948 29.568359 19.438438 26.547391
4 12.658466 85.473350 23.800528 12.782766 3.292481
# Calculate the correlation matrix
correlation_matrix = climate_data.corr()

# Display the correlation matrix with a color gradient
correlation_matrix.style.background_gradient(cmap='coolwarm')
  Temperature (°C) Humidity (%) Precipitation (mm) Wind Speed (m/s) Solar Radiation (W/m2)
Temperature (°C) 1.000000 -0.040400 -0.006884 -0.013321 0.840024
Humidity (%) -0.040400 1.000000 0.599756 -0.054698 -0.044765
Precipitation (mm) -0.006884 0.599756 1.000000 -0.016022 0.003570
Wind Speed (m/s) -0.013321 -0.054698 -0.016022 1.000000 -0.000992
Solar Radiation (W/m2) 0.840024 -0.044765 0.003570 -0.000992 1.000000

Which feature might we remove given these correlations?
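One simple way to approach this question is to list the pairs of features whose absolute correlation exceeds a chosen threshold. The helper below, highly_correlated_pairs, is a hypothetical function written for this sketch, and the 0.8 threshold is an arbitrary assumption rather than a recommended value.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, |r|) for every pair of columns with |r| above threshold."""
    corr = df.corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if r > threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs

# Hypothetical demo table: 'b' is nearly a copy of 'a', 'c' is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=500),
    "c": rng.normal(size=500),
})
pairs = highly_correlated_pairs(demo, threshold=0.8)
print(pairs)
```

Applied to the correlation matrix above, a threshold of 0.8 would flag temperature and solar radiation (0.84) but not humidity and precipitation (0.60), so one of the first pair is a natural candidate for removal.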

2. Feature extraction#

Feature extraction finds a new set of dimensions as combinations of the original dimensions. These methods can be supervised or unsupervised, depending on whether they use the output information. Examples are Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA).

3 Principal Component Analysis#

PCA is an unsupervised learning method that finds the mapping from the input to a lower dimensional space with minimum loss of information.

Principal Component Analysis (PCA) identifies the axis that accounts for the largest amount of variance in the data.

Let \(\mathbf{Y} = [\mathbf{y}_1,\cdots,\mathbf{y}_n] \) be the data matrix, measured \(n\) times over multiple fields of measurement (the length of each \(\mathbf{y}_i\)).

Each column of \(\mathbf{Y}\) represents a unique observation. Each row of \( \mathbf{Y} \) represents a single parameter.

To undertake PCA, we will

  1. Center the data by subtracting the mean of each row of \(\mathbf{Y} \). If the parameters have different units or scales, also standardize each row by dividing by its standard deviation.

  2. Calculate the covariance matrix of the de-meaned data \(\mathbf{C} = \frac{1}{n-1} \mathbf{Y}\mathbf{Y}^{\ast}\). By definition, the covariance matrix is symmetric and positive semi-definite, and thus can be diagonalized.

  3. Calculate the Singular Value Decomposition (SVD) of the covariance matrix \(\mathbf{C} \).

As the name implies, SVD decomposes the data covariance matrix \(\mathbf{C} \) into 3 terms:

\(\mathbf{C} = \mathbf{U} \Sigma \mathbf{V}^T ,\)

where the rows of \(\mathbf{V}^T\) contain the eigenvectors, or principal components, and the diagonal of \(\Sigma\) contains the corresponding eigenvalues. Because \(\mathbf{C}\) is symmetric and positive semi-definite, its SVD coincides with its eigendecomposition.

Principal components are normalized to unit length, and the data projected onto them are centered around zero.

The 1st principal component is the eigenvector with the largest eigenvalue; it points in the direction of highest variance.
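The three steps can be sketched with NumPy on a small hypothetical matrix (the array Y below is invented for illustration, with parameters on rows and observations on columns). The sketch also checks a standard identity: the eigenvalues of the covariance matrix equal the squared singular values of the centered data divided by \(n-1\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                   # number of observations (columns)
Y = rng.normal(size=(3, n))               # 3 parameters (rows), hypothetical data
Y[1] += 0.8 * Y[0]                        # correlate two of the parameters

# Step 1: center each row
Yc = Y - Y.mean(axis=1, keepdims=True)

# Step 2: covariance matrix (parameters x parameters)
C = Yc @ Yc.T / (n - 1)

# Step 3: SVD of the covariance matrix
U, S, VT = np.linalg.svd(C)

# Alternative route: SVD of the centered data itself
Ud, sd, _ = np.linalg.svd(Yc, full_matrices=False)

print("Eigenvalues from C:             ", S)
print("Squared singular values / (n-1):", sd**2 / (n - 1))
```

The two routes give the same directions up to a sign flip. Working on the data matrix directly avoids forming \(\mathbf{C}\), which can matter when the number of parameters is large.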

To demonstrate the application of PCA, we will start with some toy data: a two-dimensional (2D) point cloud made up of 10,000 observations, each with two parameters.

# Generating the toy data
xC = np.array([2, 1])      # Center of data (mean)
sig = np.array([2, 0.5])   # Principal axes
theta = np.pi/3            # Rotate cloud by pi/3
R = np.array([[np.cos(theta), -np.sin(theta)],     # Rotation matrix
              [np.sin(theta), np.cos(theta)]])
nPoints = 10000            # Create 10,000 points

# create the cloud of points by multiplying (np.matmul is also called by @)
X = R @ np.diag(sig) @ np.random.randn(2,nPoints) + np.diag(xC) @ np.ones((2,nPoints))
# plot the data
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(X[0,:],X[1,:], '.', color='k', alpha=0.125)
ax1.grid()
plt.xlim((-6, 8))
plt.ylim((-6,8))
ax1.set_aspect('equal')
../_images/2.12_dimensionality_reduction_8_0.png

Step 1: subtract the mean#

## Remove the mean of the data
Xavg = np.mean(X, axis=1)                 # Compute mean
B = X - np.tile(Xavg,(nPoints,1)).T       # Mean-subtracted data

plt.scatter(B[0,:],B[1,:], color='k', alpha=0.125)
../_images/2.12_dimensionality_reduction_10_1.png
# calculate the covariance matrix using matmul
# (dividing by nPoints instead of nPoints - 1 makes a negligible difference for 10,000 points)
covB = np.matmul(B,B.T)/nPoints
print(f"shape of B {B.shape} and shape of covB {covB.shape}")
print(covB)
shape of B (2, 10000) and shape of covB (2, 2)
[[1.19259352 1.64014365]
 [1.64014365 3.09080537]]

Step 2: Determine the SVD of the covariance matrix#

# Find principal components (SVD):
# covB is a small square matrix, so full_matrices has no effect on the output shapes here
# The rows of VT are the principal directions; S holds the variance along each of them

U, S, VT = np.linalg.svd(covB,full_matrices=False)
np.diag(S)
array([[4.0366594 , 0.        ],
       [0.        , 0.24673948]])
VT
array([[-0.4995708 , -0.86627306],
       [-0.86627306,  0.4995708 ]])

Step 3: explore the outcomes#

fig = plt.figure()
ax2 = fig.add_subplot(111)
ax2.plot(X[0,:],X[1,:], '.', color='k', alpha=0.125)   # Plot data to overlay PCA
ax2.grid()
plt.xlim((-6, 8))
plt.ylim((-6,8))
ax2.set_aspect('equal')

# Plot the unit eigenvectors (rows VT[0,:] and VT[1,:]) starting from the data mean
ax2.plot(np.array([Xavg[0], Xavg[0]+VT[0,0]]),
         np.array([Xavg[1], Xavg[1]+VT[0,1]]),'-',color='cyan',linewidth=2)
ax2.plot(np.array([Xavg[0], Xavg[0]+VT[1,0]]),
         np.array([Xavg[1], Xavg[1]+VT[1,1]]),'-',color='white',linewidth=2)

# Plot the same eigenvectors scaled by their eigenvalues (the variances stored in S)
ax2.plot(np.array([Xavg[0], Xavg[0]+VT[0,0]*S[0]]),
         np.array([Xavg[1], Xavg[1]+VT[0,1]*S[0]]),'-',color='cyan',linewidth=2)
ax2.plot(np.array([Xavg[0], Xavg[0]+VT[1,0]*S[1]]),
         np.array([Xavg[1], Xavg[1]+VT[1,1]*S[1]]),'-',color='white',linewidth=2)

plt.show()
../_images/2.12_dimensionality_reduction_17_0.png
# Let us project the original data
# Projecting the data onto the right singular vectors

projected = X.T.dot(VT.T)

plt.scatter(projected[:,0], projected[:,1], c='k', alpha=0.5)
ax = plt.gca()
ax.set_axisbelow(True)
ax.grid()
ax.set_aspect('equal')
../_images/2.12_dimensionality_reduction_18_0.png
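To connect the projection with dimensionality reduction, the sketch below keeps only the first principal direction, maps the 1D coordinates back to 2D, and measures what is lost. It regenerates a similar toy cloud under its own names (X_demo, B_demo are hypothetical) so that it runs on its own. The mean squared reconstruction error should equal the second eigenvalue, which is the variance that was discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pts = 10000
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# Same construction as the toy cloud above, regenerated so the block is self-contained
X_demo = R @ np.diag([2, 0.5]) @ rng.standard_normal((2, n_pts)) + np.array([[2.0], [1.0]])

mean = X_demo.mean(axis=1, keepdims=True)
B_demo = X_demo - mean
cov = B_demo @ B_demo.T / n_pts
U, S, VT = np.linalg.svd(cov)

pc1 = VT[0]                             # first principal direction (unit vector)
scores = pc1 @ B_demo                   # 1D coordinate of each point along pc1
X_rec = mean + np.outer(pc1, scores)    # back to 2D using only one component

lost = np.mean(np.sum((X_demo - X_rec) ** 2, axis=0))
print("Mean squared reconstruction error:", lost)
print("Second eigenvalue:                ", S[1])
print("Fraction of variance kept:        ", S[0] / S.sum())
```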

3.2 PCA on Climate Data#

climate_data.head()
Temperature (°C) Humidity (%) Precipitation (mm) Wind Speed (m/s) Solar Radiation (W/m2)
0 19.967142 95.990332 41.243383 0.460962 11.656245
1 13.617357 88.869505 42.989566 5.698075 10.737868
2 21.476885 75.894456 30.023029 7.931972 17.271593
3 30.230299 65.295948 29.568359 19.438438 26.547391
4 12.658466 85.473350 23.800528 12.782766 3.292481
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the data before applying PCA
scaler = StandardScaler()
climate_data_scaled = scaler.fit_transform(climate_data)
climate_data_scaled.shape
(1000, 5)
# Apply PCA
pca = PCA(n_components=2)  # Reduce to 2 dimensions for visualization
climate_pca = pca.fit_transform(climate_data_scaled)
climate_pca
array([[ 0.1040393 , -1.31195709],
       [-0.36471427, -0.903754  ],
       [ 0.92535981,  0.28148253],
       ...,
       [ 2.32755929,  1.3884194 ],
       [-0.43357006,  0.29246707],
       [ 0.60857732,  0.96030444]])
# Display the explained variance ratio
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Plot the PCA results
plt.figure(figsize=(10, 7))
plt.scatter(climate_pca[:, 0], climate_pca[:, 1], c='blue', alpha=0.5)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of Climate Data')
plt.grid(True)
plt.show()
Explained variance ratio: [0.36980642 0.31925119]
../_images/2.12_dimensionality_reduction_23_1.png

SVD can be computationally intensive for large matrices.

In that case, a randomized PCA can approximate the first principal components at a much lower cost. With the default setting svd_solver='auto', scikit-learn has traditionally switched to the randomized solver when the data are larger than 500 × 500 and the number of PCs requested is less than 80% of the smaller of the two dimensions; the exact policy depends on the scikit-learn version.
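The randomized solver can also be requested explicitly. The sketch below compares it with the exact solver on a hypothetical low-rank matrix (X_big and its sizes are invented for this example); svd_solver and random_state are existing parameters of scikit-learn's PCA.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical large matrix with 5 strong underlying components plus weak noise
rng = np.random.default_rng(0)
n_obs, n_feat, rank = 800, 600, 5
latent = rng.normal(size=(n_obs, rank)) * np.array([10.0, 8.0, 6.0, 4.0, 2.0])
mixing = rng.normal(size=(rank, n_feat))
X_big = latent @ mixing + 0.1 * rng.normal(size=(n_obs, n_feat))

# Exact solver: full SVD of the centered data
pca_full = PCA(n_components=5, svd_solver="full").fit(X_big)

# Randomized solver: approximates only the leading components
pca_rand = PCA(n_components=5, svd_solver="randomized", random_state=0).fit(X_big)

print("full:      ", pca_full.explained_variance_ratio_)
print("randomized:", pca_rand.explained_variance_ratio_)
```

When the leading components stand well above the noise, as assumed here, the two solvers should agree closely; the approximation is less accurate when the spectrum decays slowly.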

3.3 PCA on geospatial and temporal data#

Applying Principal Component Analysis (PCA) to spatial-temporal datasets yields two important quantities:

  • Empirical Orthogonal Functions: EOFs are the eigenvectors obtained from the covariance matrix of the dataset. Each EOF corresponds to a spatial pattern that explains a portion of the total variance in the data. In climate science, EOFs help identify dominant patterns like atmospheric circulation modes or temperature anomalies.

  • Principal Components (PCs): The time series associated with each EOF is called a principal component. PCs show how the amplitude of the corresponding EOF varies over time.

Together, EOFs and PCs describe the spatial-temporal variability of the dataset.
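A minimal sketch of this decomposition, assuming a hypothetical one-dimensional "space" and a field built from a single known spatial pattern (all names below are invented for the illustration): the EOF should recover the pattern, the PC should recover its amplitude in time, and their product plus the time mean gives a rank-1 reconstruction of the field.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_time, n_space = 120, 40

# Hypothetical field: one fixed spatial pattern whose amplitude oscillates in time
pattern = np.sin(np.linspace(0, np.pi, n_space))          # spatial pattern
amplitude = np.sin(2 * np.pi * np.arange(n_time) / 12)    # 12-step cycle
field = np.outer(amplitude, pattern) + 0.05 * rng.normal(size=(n_time, n_space))

pca_demo = PCA(n_components=1)
pc_demo = pca_demo.fit_transform(field)    # shape (n_time, 1): PC time series
eof_demo = pca_demo.components_            # shape (1, n_space): EOF spatial pattern

# Rank-1 reconstruction: time series times spatial pattern, plus the time mean
field_rec = pc_demo @ eof_demo + pca_demo.mean_

eof_corr = abs(np.corrcoef(eof_demo[0], pattern)[0, 1])
pc_corr = abs(np.corrcoef(pc_demo[:, 0], amplitude)[0, 1])
print("|corr(EOF 1, true pattern)|  :", eof_corr)
print("|corr(PC 1, true amplitude)| :", pc_corr)
```

The absolute value is taken because the sign of an EOF and of its PC can be flipped together without changing the reconstruction.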

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Step 1: Define the spatial grid
lat_points = 50
lon_points = 50
latitudes = np.linspace(-90, 90, lat_points)  # From South Pole to North Pole
longitudes = np.linspace(-180, 180, lon_points)  # Full range of longitudes

# Create a meshgrid for latitude and longitude
lon_grid, lat_grid = np.meshgrid(longitudes, latitudes)

# Step 2: Generate the latitudinal temperature gradient
# Assume temperatures decrease smoothly from the equator to the poles
latitudinal_gradient = np.cos(np.radians(lat_grid))  # Cosine of latitude for a smooth transition
latitudinal_gradient = latitudinal_gradient / np.max(latitudinal_gradient)  # Normalize


# Step 3: Generate the seasonal cycle
time_steps = 360  # 30 years of monthly data
months = np.arange(time_steps) % 12  # Months from 0 to 11
seasonal_cycle = np.sin(2 * np.pi * months / 12)  # 12-month cycle


# Step 4: Introduce a climate trend
# Simulate a small temperature increase over 30 years
trend = np.linspace(0, 0.5, time_steps)  # Temperature increases by 0.5 units over 30 years

# Step 5: Combine latitudinal gradient and seasonal cycle
# Initialize the data array
data = np.zeros((time_steps, lat_points, lon_points))

for t in range(time_steps):
    # Temperature at each time step is the product of latitudinal gradient and seasonal cycle, plus the trend
    data[t] = latitudinal_gradient * seasonal_cycle[t] + trend[t]

# Step 6: Add random noise to simulate variability
noise_level = 0.2  # Adjust the noise level as needed
random_noise = noise_level * np.random.randn(time_steps, lat_points, lon_points)
data += random_noise

# Visualize the synthetic climate data
plt.imshow(data[1], extent=[-180, 180, -90, 90], cmap='coolwarm')
plt.title('Synthetic Climate Data (Month 2)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.colorbar(label='Temperature', fraction=0.025, pad=0.04)
plt.show()
# make the colorbar smaller
../_images/2.12_dimensionality_reduction_25_0.png

PCA#

# Step 7: Reshape data for PCA
data_reshaped = data.reshape(time_steps, lat_points * lon_points)

# Step 8: Standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data_reshaped)

# Step 9: Apply PCA
pca = PCA(n_components=5)
principal_components = pca.fit_transform(data_standardized)

# Step 10: Interpret Results
# Reshape the PCA components (the EOFs) back to spatial dimensions for plotting
eofs = pca.components_.reshape(5, lat_points, lon_points)

# Plot the first EOF (Spatial Pattern)
plt.imshow(eofs[0], extent=[-180, 180, -90, 90], cmap='coolwarm')
plt.title('First EOF - Spatial Pattern')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.colorbar(label='Amplitude', fraction=0.025, pad=0.04)
plt.show()

# Plot the corresponding time series of the first principal component
plt.plot(principal_components[:, 0])
plt.title('First Principal Component Time Series')
plt.xlabel('Time (Months)')
plt.ylabel('Amplitude')
plt.show()

# Plot the second EOF (Spatial Pattern)
plt.imshow(eofs[1], extent=[-180, 180, -90, 90], cmap='coolwarm')
plt.title('Second EOF - Spatial Pattern')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.colorbar(label='Amplitude', fraction=0.025, pad=0.04)
plt.show()

# Plot the corresponding time series of the second principal component
plt.plot(principal_components[:, 1])
plt.title('Second Principal Component Time Series')
plt.xlabel('Time (Months)')
plt.ylabel('Amplitude')
plt.show()


# Plot the third EOF (Spatial Pattern)
plt.imshow(eofs[2], extent=[-180, 180, -90, 90], cmap='coolwarm')
plt.title('Third EOF - Spatial Pattern')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.colorbar(label='Amplitude', fraction=0.025, pad=0.04)
plt.show()

# Plot the corresponding time series of the third principal component
plt.plot(principal_components[:, 2])
plt.title('Third Principal Component Time Series')
plt.xlabel('Time (Months)')
plt.ylabel('Amplitude')
plt.show()


# Explained variance ratios
explained_variance = pca.explained_variance_ratio_
print("Explained Variance Ratios:", explained_variance)
../_images/2.12_dimensionality_reduction_27_0.png ../_images/2.12_dimensionality_reduction_27_1.png ../_images/2.12_dimensionality_reduction_27_2.png ../_images/2.12_dimensionality_reduction_27_3.png ../_images/2.12_dimensionality_reduction_27_4.png ../_images/2.12_dimensionality_reduction_27_5.png
Explained Variance Ratios: [0.72900329 0.04557341 0.00145263 0.00137667 0.00135938]
# Plot the scree plot
plt.plot(np.arange(1, 6), explained_variance, 'o-')
plt.title('Variance explained by each PC ')
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.show()
../_images/2.12_dimensionality_reduction_29_0.png

Note that the first two components are not cleanly separated. PCA has several limitations when applied to geospatial-temporal data.

1. Correlation Structure in Geospatial-Temporal Data#

  • Spatial and Temporal Autocorrelation: Geospatial data often exhibit strong spatial and temporal autocorrelation: nearby locations and consecutive time points tend to be similar. This can make it challenging for PCA to separate patterns, because PCA only guarantees that its components are linearly uncorrelated. In geospatial-temporal data the patterns are often interdependent, which can result in components that do not separate spatial and temporal influences cleanly.

  • Smooth Transitions: Many geospatial-temporal processes (e.g., climate variables) change smoothly over space and time. These gradual transitions may not produce distinct, orthogonal components when applying PCA, as the variance is spread across broad regions or time spans rather than concentrated in distinct patterns.

2. Mixing of Spatial and Temporal Variance#

  • Spatial and Temporal Dynamics Interact: In geospatial-temporal datasets, spatial variability and temporal variability are often intertwined. For example, seasonal patterns might dominate both spatial and temporal dimensions, leading PCA to pick components that capture a mix of spatial and temporal variance, rather than separating them into different components. The result is components that may reflect a combination of both dimensions, but not in a way that provides clear separation.

  • Complex, Non-linear Interactions: Many geospatial-temporal processes are driven by non-linear interactions (e.g., atmospheric processes, hydrology). PCA, being a linear method, is not well-suited to capturing non-linear relationships, leading to components that fail to distinguish between different driving forces of the variability.

4. Temporal Non-stationarity#

  • Changing Processes Over Time: In many geospatial-temporal datasets, the underlying processes are non-stationary, meaning the dominant patterns of variability change over time. PCA assumes that the variance structure is constant, which can lead to components that reflect averaged patterns over the entire dataset but fail to capture temporal shifts in dominant processes, making interpretation difficult.

5. Sensitivity to Data Preprocessing#

  • Scaling and Centering Issues: In geospatial-temporal data, variables often have different units, magnitudes, or scales (e.g., temperature, precipitation, vegetation indices). Proper scaling and centering of variables are crucial in PCA, but choosing an appropriate scaling method can be difficult when dealing with multiple variables or dimensions (e.g., space and time). Incorrect scaling can lead to overemphasis on one variable or dimension, causing poor separation of components.
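The effect of scaling can be seen on two hypothetical correlated variables that differ in magnitude by a factor of about 1000 (x1 and x2 below are invented for this sketch). Without standardization the first component is essentially the large-magnitude variable; with standardization both variables contribute equally.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Hypothetical variables: x1 is of order 1, x2 is correlated with x1 but ~1000x larger
x1 = rng.normal(size=n)
x2 = 1000.0 * (0.8 * x1 + 0.6 * rng.normal(size=n))
X_units = np.column_stack([x1, x2])

# First principal direction without and with standardization
raw_loadings = PCA(n_components=1).fit(X_units).components_[0]
scaled_loadings = PCA(n_components=1).fit(
    StandardScaler().fit_transform(X_units)
).components_[0]

print("PC1 loadings without scaling:", raw_loadings)
print("PC1 loadings with scaling:   ", scaled_loadings)
```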

6. Dimensionality Challenges#

  • High-Dimensional Data: Geospatial-temporal datasets often have a very high dimensionality, with many spatial points and time steps. In such cases, PCA can struggle because the variance is spread across many dimensions, leading to components that capture small amounts of variance and are difficult to interpret. This is often referred to as the “curse of dimensionality,” where the high number of variables makes it difficult to reduce the dataset to a small number of interpretable components.

4. Independent Component Analysis#

What is ICA and How Does It Differ from PCA? Independent Component Analysis (ICA) is a computational method for separating a multivariate signal into additive, independent non-Gaussian components. It is a type of blind source separation technique.

Key Differences Between ICA and PCA:#

  • Objective: PCA finds orthogonal axes (principal components) that maximize the variance in the data. ICA seeks statistically independent components, not necessarily orthogonal, by minimizing mutual information.

  • Assumptions: PCA assumes that the data is linearly correlated and relies on second-order statistics (covariance). ICA assumes that the underlying components are statistically independent and non-Gaussian.

  • Results: PCA components are uncorrelated but may not be independent. ICA components are both uncorrelated and independent, capturing more complex underlying structures.
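The distinction between uncorrelated and independent can be illustrated with a small hypothetical example: a variable drawn symmetrically around zero and its square are fully dependent, yet their correlation, the second-order statistic PCA relies on, is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)
y = x**2                      # y is completely determined by x, so they are dependent

corr_xy = np.corrcoef(x, y)[0, 1]        # second-order statistic, as used by PCA
corr_x2y = np.corrcoef(x**2, y)[0, 1]    # a nonlinear statistic reveals the dependence

print("corr(x, y)   =", corr_xy)
print("corr(x^2, y) =", corr_x2y)
```

ICA algorithms exploit higher-order statistics of this kind, which is why they can separate sources that PCA leaves mixed.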

ICA is used to estimate sources given noisy measurements. It is frequently used in Geodesy to isolate contributions from earthquakes and hydrology.

Advantages of ICA in Geosciences:#

  • Blind Source Separation: ICA can separate mixed signals into their original sources without prior knowledge, which is useful when the underlying processes are unknown.

  • Non-Gaussian Signals: Many geophysical processes generate non-Gaussian data (e.g., precipitation events, seismic signals), where ICA can be more effective than PCA.

  • Independent Components: Identifying independent factors can help in understanding and modeling complex geoscientific phenomena that are driven by multiple independent sources.

from scipy import signal
from sklearn.decomposition import FastICA

# Generate sample data
np.random.seed(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)

# create 3 source signals
s1 = np.sin(2 * time)  # Signal 1 : sinusoidal signal
s2 = np.sign(np.sin(3 * time))  # Signal 2 : square signal
s3 = signal.sawtooth(2 * np.pi * time)  # Signal 3: saw tooth signal


S = np.c_[s1, s2, s3]
S += 0.2 * np.random.normal(size=S.shape)  # Add noise
S /= S.std(axis=0)  # Standardize data

print(S)
[[ 0.495126    0.07841108 -1.31840023]
 [ 0.64019598  1.34570272 -1.94657351]
 [ 0.28913069  0.9500949  -1.646886  ]
 ...
 [-0.38561943 -0.71624672  1.34043406]
 [-0.50777458 -1.24052539  1.74176784]
 [-0.5550078  -0.90265774 -1.54534953]]
# Mix data
# create 3 signals at 3 receivers:
A = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]])  # Mixing matrix
X = np.dot(S, A.T)  # Generate observations
plt.figure(figsize=(10, 7))
plt.subplot(3, 1, 1)
plt.plot(S)
plt.title('True Sources')
plt.subplot(3, 1, 2)
plt.plot(X)
plt.title('Mixed Signals')
plt.tight_layout()
../_images/2.12_dimensionality_reduction_34_0.png
# Compute ICA
ica = FastICA(n_components=3)
S_ = ica.fit_transform(X)  # Reconstruct signals
A_ = ica.mixing_  # Get estimated mixing matrix
print(A_,A)

# For comparison, compute PCA
pca = PCA(n_components=3)
H = pca.fit_transform(X)  # Reconstruct signals based on orthogonal components
[[ 1.01396018  1.02993002 -0.94915864]
 [ 1.9826966   0.98533885 -0.47232592]
 [ 0.99746591  2.05242661 -1.37841317]] [[1.  1.  1. ]
 [0.5 2.  1. ]
 [1.5 1.  2. ]]
plt.figure(figsize=(11,8))
models = [X, S, S_, H]
names = ['Observations (mixed signal)',
         'True Sources',
         'ICA recovered signals', 
         'PCA recovered signals']
colors = ['red', 'steelblue', 'orange']
for ii, (model, name) in enumerate(zip(models, names), 1):
    plt.subplot(4, 1, ii)
    plt.title(name)
    for sig, color in zip(model.T, colors):
        plt.plot(sig, color=color)
plt.tight_layout()
plt.show()
../_images/2.12_dimensionality_reduction_36_0.png
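ICA recovers the sources only up to their order, sign, and scale, which is why the estimated mixing matrix printed above does not match the true one column for column. One way to compare the two, sketched below under the assumption that the true sources are known (which is only the case for synthetic tests), is to correlate each recovered component with each source. The block regenerates similar signals under its own names so that it runs on its own.

```python
import numpy as np
from scipy import signal
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
# Same three source shapes as above, regenerated so the block is self-contained
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), signal.sawtooth(2 * np.pi * t)]
sources += 0.2 * rng.normal(size=sources.shape)
sources /= sources.std(axis=0)

mixing = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]])
observed = sources @ mixing.T

recovered = FastICA(n_components=3, random_state=0).fit_transform(observed)

# Correlate every recovered component (rows) with every true source (columns)
n_src = sources.shape[1]
cross_corr = np.corrcoef(recovered.T, sources.T)[:n_src, n_src:]
best_match = np.abs(cross_corr).argmax(axis=1)   # which source each component resembles
signs = np.sign(cross_corr[np.arange(n_src), best_match])

for k in range(n_src):
    sign_label = "+" if signs[k] > 0 else "-"
    r_abs = abs(cross_corr[k, best_match[k]])
    print(f"ICA component {k} ~ {sign_label} source {best_match[k]} (|r| = {r_abs:.3f})")
```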
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA
from sklearn.preprocessing import StandardScaler

# Step 8: Center and scale the data (FastICA performs the whitening internally)
scaler = StandardScaler()
data_whitened = scaler.fit_transform(data_reshaped)

# Step 9: Apply ICA
n_components = 3
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
S_ = ica.fit_transform(data_whitened)  # Reconstructed signals
A_ = ica.mixing_  # Estimated mixing matrix

# Step 10: Reshape independent components for plotting
ICs = A_.T.reshape(n_components, lat_points, lon_points)

# Step 11: Plot the independent components
for i in range(n_components):
    plt.figure(figsize=(8, 4))
    plt.subplot(1, 2, 1)
    plt.imshow(ICs[i], extent=[-180, 180, -90, 90], cmap='coolwarm')
    plt.title(f'Independent Component {i+1} - Spatial Pattern')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.colorbar(label='Amplitude', fraction=0.025, pad=0.04)

    plt.subplot(1, 2, 2)
    plt.plot(S_[:, i])
    plt.title(f'Independent Component {i+1} - Time Series')
    plt.xlabel('Time (Months)')
    plt.ylabel('Amplitude')

    plt.tight_layout()
    plt.show()
../_images/2.12_dimensionality_reduction_37_0.png ../_images/2.12_dimensionality_reduction_37_1.png ../_images/2.12_dimensionality_reduction_37_2.png

6. Other Techniques#

  1. Random Projections https://scikit-learn.org/stable/modules/random_projection.html

  2. Multidimensional Scaling

  3. Isomap

  4. t-distributed Stochastic Neighbor Embedding (t-SNE)

  5. Linear discriminant analysis

!pip install earthaccess
!pip install "sliderule @ git+https://github.com/SlideRuleEarth/sliderule#subdirectory=clients/python"

import sliderule
import requests

URL = 'https://data.gesdisc.earthdata.nasa.gov/data/MERRA2/M2T1NXSLV.5.12.4/1980/01/MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4'

# Set the FILENAME string to the data file name, the LABEL keyword value, or any customized name. 
# Remember to include the same file extension as in the URL.
FILENAME = 'MERRA2_100.tavg1_2d_slv_Nx.19800101.nc4'

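result = requests.get(URL)

try:
    # Raise an HTTPError for 4xx/5xx responses (e.g. 401 when the request is not authenticated)
    result.raise_for_status()
    # The context manager closes the file even if the write fails
    with open(FILENAME, 'wb') as f:
        f.write(result.content)
    print('contents of URL written to ' + FILENAME)
except requests.exceptions.HTTPError:
    print('requests.get() returned an error code ' + str(result.status_code))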
requests.get() returned an error code 401
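A 401 status means the server rejected the request as unauthenticated. One common way to supply NASA Earthdata credentials is a `~/.netrc` entry for the Earthdata Login host. The sketch below only checks whether such an entry exists; it does not validate the credentials. The helper name `has_earthdata_credentials` is hypothetical, and using `~/.netrc` at all is an assumption about how this environment is configured.

```python
import netrc
from pathlib import Path

# Host name of the Earthdata Login service
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_credentials(netrc_path=None):
    """Return True if the netrc file has an entry for Earthdata Login.

    Hypothetical helper for illustration: it only checks that an entry
    exists, not that the username and password are valid.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return False
    try:
        auth = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    except netrc.NetrcParseError:
        # A malformed file is treated the same as missing credentials
        return False
    return auth is not None
```

Calling `has_earthdata_credentials()` before the download gives a quick hint about whether a 401 is likely caused by missing credentials rather than a wrong URL.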
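# Search NASA Earthdata with earthaccess instead of a raw HTTP request.
# Assumes `import earthaccess` and `earthaccess.login()` were run in an earlier cell.
# To download multiple files, change the second temporal parameter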
results = earthaccess.search_data(
    doi="10.5067/VJAFPLI1CSIV",
    temporal=('1980-01-01', '1980-01-01'), # This will stream one granule, but can be edited for a longer temporal extent
    bounding_box=(-180, 0, 180, 90)
)

downloaded_files = earthaccess.download(
    results,
    local_path='.', # Change this string to download to a different path
)
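To extend the search to several days, the `temporal` tuple needs a later end date. A minimal sketch of building that tuple follows, assuming the collection provides one granule per day as the single-day search above suggests. The helper `temporal_range` is hypothetical and not part of earthaccess.

```python
from datetime import date, timedelta


def temporal_range(start, n_days):
    """Return (start, end) ISO date strings covering n_days consecutive days.

    Hypothetical helper: the end date is inclusive, matching the
    ('1980-01-01', '1980-01-01') single-day form used above.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    first = date.fromisoformat(start)
    last = first + timedelta(days=n_days - 1)
    return (first.isoformat(), last.isoformat())
```

For example, `temporal_range('1980-01-01', 7)` could be passed as the `temporal` argument of `earthaccess.search_data` to request the first week of January 1980.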