Final Integrated Project in Machine Learning in Geoscience
Objective: Integrate the learning from the entire class into a single group project that demonstrates the understanding and skill needed to manipulate data and develop machine learning approaches to a scientific problem. The project is evaluated on how well it integrates AI-ready data preparation, classical machine learning (CML), and deep learning (DL) components into a cohesive whole, with a focus on scientific discussion, interpretation, reproducibility, and team contributions.
1. Report (5 Pages) - 40%
Content & Research Quality (25%) Quality, depth, and relevance of the research
Present a clear statement of an outstanding research question and place it in the context of an up-to-date literature review
Demonstrate the originality of the research
Present an AI-ready data set and a preliminary analysis with correlations or a description of basic data features, and discuss potential data imbalance within the context of the stated problem.
Demonstrate strong understanding of classic machine learning and/or deep learning with an example
Discuss the performance evaluation, both during model training and for generalization beyond the data and domain presented in the report.
Discuss the computational time for training and deployment
Discuss the appropriateness of computational resources needed
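The preliminary analysis and imbalance check asked for above can be sketched in a few lines. This is a minimal, standard-library-only illustration, not a required method: the function names `class_balance` and `pearson_r`, the class labels, and the toy feature values are all hypothetical.

```python
from collections import Counter
from math import sqrt


def class_balance(labels):
    """Return the fraction of samples that falls in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: labels for a two-class problem and two features.
labels = ["noise", "noise", "noise", "event"]
feature_a = [1.0, 2.0, 3.0, 4.0]
feature_b = [2.0, 4.0, 6.0, 8.0]

balance = class_balance(labels)
r = pearson_r(feature_a, feature_b)
print(balance)
print(round(r, 3))
```

A strongly skewed `balance` suggests that plain accuracy may be a misleading metric for the stated problem, and a correlation near 1 or -1 between two features suggests they carry largely redundant information.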
Structure and Organization (15%) Coherence, clarity, and flow of the report
Logical structure (introduction, body, conclusions, references)
Well-organized paragraphs with transitions between ideas
Key points and arguments are clearly presented and easy to follow
Figures have captions and clear labels and are referenced in the text.
Clarity and Writing Style (20%) Quality & Effectiveness in Writing
Clear, concise language
Minimal grammar or spelling errors
Professional and academic tone appropriate for the field
Critical Thinking & Analysis (20%) Depth of analysis and reflection on the research topic
The analysis goes beyond simple description and shows depth of thought
Acknowledges alternative perspectives or potential limitations in the research
Demonstrates original thinking and critical engagement with the research
Formatting & Citations (10%) Adherence to format guidelines and proper citation of sources
Follows formatting requirements (e.g., margins, font, length)
Correct use of citation style (e.g., APA, MLA, Chicago)
Correctly state the authors' CRediT (Contributor Roles Taxonomy) contributions: every enrolled student must be associated with a contribution, and the CRediT statement will assert that
2. GitHub Repository - 35%
Code Quality (15%) Quality and functionality of the scripts and code
The code is clean, well-documented, and follows good programming practices
All scripts run without errors (when the environment is properly set up)
Code is modular, with reusable functions where appropriate
Scripts achieve the intended outcomes (e.g., generating plots, performing analysis)
Reproducibility (25%) Ease of reproducing the analysis and results
The repository includes clear instructions (e.g., in a README.md) for setting up the environment and running the code
Jupyter notebooks, scripts, and any other files necessary to recreate the analysis are provided
Data (or instructions to access data) are included or referenced appropriately
Outputs (plots, tables) are reproducible using the code
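One common ingredient of reproducible outputs is fixing random seeds before any stochastic step. The sketch below is one possible approach, not a course requirement: the helper name `set_seed` is hypothetical, it seeds only Python's `random` module and, if NumPy happens to be installed, NumPy's legacy global generator. Deep learning frameworks generally need their own seeding calls in addition.

```python
import random


def set_seed(seed):
    """Seed Python's random module, and NumPy's global RNG if NumPy is installed."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy is optional in this sketch
    np.random.seed(seed)


# Drawing after re-seeding with the same value repeats the same sequence.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```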
Organization & Structure (20%) Organization and clarity of the repository structure
Repository is well-organized with clear folder structure (e.g., separate folders for code, data, results, etc.)
File and folder names are descriptive and intuitive
README file provides a clear overview of the repository and how to navigate it
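As an illustration of a clear folder structure, the sketch below creates an empty project skeleton. It assumes the layout named elsewhere in this rubric (`src/`, `data/`, `plots/`, a README, and an environment file); the helper name `scaffold` is hypothetical, and teams are free to organize their repository differently.

```python
import tempfile
from pathlib import Path


def scaffold(root, folders=("src", "data", "plots"),
             files=("README.md", "environment.yml")):
    """Create empty folders and placeholder files under root; return their names."""
    root = Path(root)
    for folder in folders:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in files:
        (root / name).touch(exist_ok=True)
    return sorted(p.name for p in root.iterdir())


# Demonstrate in a throwaway directory so nothing is written to the real project.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(tmp)
print(created)
```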
Documentation (15%) Clarity and completeness of documentation
README file clearly explains the project, dependencies, and setup instructions
Code and notebooks are well-documented, including comments explaining key sections
Scripts include docstrings for functions and appropriate inline comments
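A function documented in the way described above might look like the following. The function `standardize` is a hypothetical example, and the NumPy-style docstring layout is just one widely used convention, chosen here as an assumption.

```python
from math import sqrt


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores).

    Parameters
    ----------
    values : sequence of float
        Input samples; must contain at least two distinct values.

    Returns
    -------
    list of float
        Standardized values, in the same order as the input.

    Raises
    ------
    ValueError
        If the input is empty or has zero variance.
    """
    if len(values) == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    # Population standard deviation (divides by N, not N - 1).
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        raise ValueError("values have zero variance")
    return [(v - mean) / std for v in values]


scaled = standardize([1.0, 2.0, 3.0])
print(scaled)
```

The docstring states what the function does, its inputs, outputs, and failure modes, while the single inline comment is reserved for a choice a reader could not infer from the code alone.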
Environment Setup (10%) Provision of environment setup and dependency management
Includes a complete and working conda environment file (environment.yml) or a requirements.txt for virtual environments
Environment file lists all necessary dependencies with correct versions
Instructions for setting up the environment are clear and easy to follow
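To list dependencies with correct versions, it can help to read the versions that are actually installed in the working environment. The sketch below uses the standard-library `importlib.metadata` module; the helper names `installed_versions` and `as_requirements` and the example package list are hypothetical and should be replaced with the packages the project really imports.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Format installed packages as 'name==version' lines, skipping missing ones."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


# Hypothetical dependency list for illustration only.
found = installed_versions(["numpy", "pandas", "scikit-learn"])
for line in as_requirements(found):
    print(line)
```

The printed lines follow the `name==version` pinning syntax used in requirements.txt files. Packages that are not installed are skipped rather than guessed.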
Version Control Practices (5%) Effective use of Git and GitHub features
Commits are frequent, descriptive, and reflect the progression of the project.
Clear use of branches, if applicable (e.g., for different features or phases of the project)
Issues, pull requests, or other GitHub collaboration tools are used effectively.
3. Presentation (10-15 Minutes) - 25%
Content (20%) Quality and depth of research
Present a clear statement of an outstanding research question
Provide evidence and references to the scientific literature to support the key points
Present an AI-ready data set and preliminary analysis.
Demonstrate a classic machine learning example.
Demonstrate a deep learning example.
Discuss the computational time for training and deployment
Discuss the appropriateness of computational resources needed
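Reported training and deployment times are easier to discuss when they are measured in a consistent way. The following is a minimal sketch using `time.perf_counter`; the helper `time_call` and the stand-in workload `toy_training_step` are hypothetical, and a real project would time its own training and inference calls instead.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Run fn(*args, **kwargs) `repeats` times.

    Returns the last result and the best (smallest) wall-clock time in seconds.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


def toy_training_step(n):
    """Hypothetical stand-in for a model's training or inference call."""
    return sum(i * i for i in range(n))


value, seconds = time_call(toy_training_step, 100_000)
print(f"best of 3 runs: {seconds:.4f} s")
```

Taking the best of several runs reduces the influence of background load on the machine. Timings should be reported together with the hardware they were measured on so that the resource discussion is meaningful.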
Structure and Organization (15%) Coherence and flow of the presentation
Logical structure (report and presentation: intro, body, conclusion; software: readme, env file, src/, data/, plots/,…)
Key points are clearly distinguished and emphasized
Clarity and Delivery (20%) Effectiveness in communicating ideas
Clear articulation and pronunciation
Adequate volume and pace
Minimal reliance on notes, maintaining eye contact with the audience
Confident and professional demeanor
Visual Aids (10%) Effectiveness of any supporting visual materials (slides, charts, plots)
Visual aids enhance, rather than distract from, the presentation
Information is presented clearly, is easy to follow, and uses appropriate design principles
Slides do not overwhelm with text or complex visuals
Engagement and Interaction (10%) Ability to engage and interact with the audience
Encourages audience interaction through questions or active participation
Responds effectively to audience questions and comments
Critical Thinking and Analysis (15%) Depth of analysis and reflection on the research topic
Demonstrates original thinking and critical engagement with the research
Identifies limitations or future directions for research
Professionalism (10%) Overall professional quality of the presentation
Respects time limits and is well prepared with material
Appropriate attire and a respectful manner.
4. Overall Team Contributions - 10%
Evaluates how well the team worked together to deliver a cohesive project.
Assessed through peer evaluations, clear documentation of roles, and balance of contributions across all deliverables.
Summary of Weightage
Report: 40%
GitHub Repository: 35%
Presentation: 25%
Overall Team Contributions: 10%