[Figure: ML and domain science education to accelerate ML-based scientific discovery. Panel labels: "ML and domain science education"; "Increased data volume" (more variables and data sources, higher spatiotemporal resolution); "ML for advancing discovery" (ML methods and tools aid in detecting patterns and changes in large amounts of data).]
Geoscience has entered an era in which both in-situ and remote sensing observations are dense and global. Advances in observational techniques have dramatically increased spatial and temporal resolution and, with it, data quality and volume. ML tools can assist in handling large volumes of observations and in modeling, analyzing, and forecasting the environment by increasing the speed and accuracy of computations. With recent computational hardware innovations such as graphics processing units (GPUs), as well as innovations in ML methods themselves, ML is positioned to help geoscientists exploit untapped environmental observations and enhance Earth system modeling in timely and cost-effective ways. However, reproducible ML workflows require sophisticated coupling of advanced software, data processing pipelines, visualization tools, and rapidly evolving cloud-based cyberinfrastructure (CI).
Adopting an ML curriculum in classrooms has great potential to advance ML-based discovery in the geosciences through hands-on teaching and project work. We will cultivate the development of discipline-specific ML libraries, workflows, and communities of practice capable of sustaining future growth in ML cybertraining opportunities. To address the general lack of educational material on ML for geoscientific applications, we will assemble a team of geoscience educators to create a novel ML curriculum. These materials could then be included in several university courses to further broaden our impact on emerging ML communities.
Our GeoSMART implementation plan will guide participants through training in fundamental open source ML toolkits and data science skills. With these fundamental skills, participants will then progress through interactive events, including hackweeks, project-led and peer-to-peer mentoring activities, and incubators.
By building tools on open source and cloud-accessible platforms, and by partnering with colleges and institutions that currently lack computing resources for ML workflows, we will democratize access to our cybertraining materials and ensure that more people can help solve urgent geoscience challenges. There is a clear need for scientists who have both data science skills and domain knowledge, and the demand for these coupled skills is recognized in both academic and professional environments.
[Figure labels: "ML Adoption in Classrooms"; "ML Support for GeoScience Communities".]