Final Integrated Project in Machine Learning in Geoscience

Objective: Integrate what was learned throughout the course into a single group project that demonstrates the understanding and skills needed to manipulate data and to develop machine learning approaches to a scientific problem. The project is evaluated on how well AI-ready data preparation, classical machine learning (CML), and deep learning (DL) components are integrated into a cohesive whole, with a focus on scientific discussion, interpretation, reproducibility, and team contributions.


1. Report (5 Pages) - 40%

  • Content & Research Quality (25%) Quality, depth, and relevance of the research

    • Present a clear statement of an outstanding research question and place it in the context of an up-to-date literature review

    • Demonstrate the originality of the research

    • Present an AI-ready data set and a preliminary analysis (e.g., feature correlations or a description of the basic data features), and discuss potential data imbalance in the context of the stated problem.

    • Demonstrate a strong understanding of classical machine learning and/or deep learning through an example

    • Discuss performance evaluation both for training the model and for generalization beyond the data and domain presented in the report.

    • Discuss the computational time required for training and deployment

    • Discuss whether the computational resources needed are appropriate

  • Structure and Organization (15%) Coherence, clarity, and flow of the report

    • Logical structure (introduction, body, conclusions, references)

    • Well-organized paragraphs with transitions between ideas

    • Key points and arguments are clearly presented and easy to follow

    • Figures have captions and clear labels and are referenced in the text.

  • Clarity and Writing Style (20%) Quality & Effectiveness in Writing

    • Clear, concise language

    • Minimal grammar or spelling errors

    • Professional and academic tone appropriate for the field

  • Critical Thinking & Analysis (20%) Depth of analysis and reflection on the research topic

    • The analysis goes beyond simple description and shows depth of thought

    • Acknowledges alternative perspectives or potential limitations in the research

    • Demonstrates original thinking and critical engagement with the research

  • Formatting & Citations (10%) Adherence to format guidelines and proper citation of sources

    • Follows formatting requirements (e.g., margins, font, length)

    • Correct use of citation style (e.g., APA, MLA, Chicago)

    • Include a correct CRediT author contribution statement (every enrolled student must be associated with a contribution, and the CRediT statement must document it)


2. GitHub Repository - 35%

  • Code Quality (15%) Quality and functionality of the scripts and code

    • The code is clean, well-documented, and follows good programming practices

    • All scripts run without errors (when the environment is properly set up)

    • Code is modular, with reusable functions where appropriate

    • Scripts achieve the intended outcomes (e.g., generating plots, performing analysis)

  • Reproducibility (25%) Ease of reproducing the analysis and results

    • The repository includes clear instructions (e.g., in a README.md) for setting up the environment and running the code

    • Jupyter notebooks, scripts, and any other files necessary to recreate the analysis are provided

    • Data (or instructions to access data) are included or referenced appropriately

    • Outputs (plots, tables) are reproducible using the code

  • Organization & Structure (20%) Organization and clarity of the repository structure

    • Repository is well-organized with clear folder structure (e.g., separate folders for code, data, results, etc.)

    • File and folder names are descriptive and intuitive

    • README file provides a clear overview of the repository and how to navigate it

  • Documentation (15%) Clarity and completeness of documentation

    • README file clearly explains the project, dependencies, and setup instructions

    • Code and notebooks are well-documented, including comments explaining key sections

    • Scripts include docstrings for functions and appropriate inline comments

  • Environment Setup (10%) Provision of environment setup and dependency management

    • Includes a complete, working conda environment file (environment.yml) or a requirements.txt file for virtual environments

    • Environment file lists all necessary dependencies with correct versions

    • Instructions for setting up the environment are clear and easy to follow

  • Version Control Practices (5%) Effective use of Git and GitHub features

    • Commits are frequent, descriptive, and reflect the progression of the project.

    • Clear use of branches, if applicable (e.g., for different features or phases of the project)

    • Issues, pull requests, or other GitHub collaboration tools are used effectively.
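For the code-quality, documentation, and reproducibility criteria, the following is one hypothetical example of a small, reusable function: it has a NumPy-style docstring, validates its input, and uses a fixed random seed so that a train/test split (and every plot or table built on it) is identical on each run. The function name `train_test_split_indices` and the default seed are assumptions for illustration only; in practice, scikit-learn's `train_test_split` with a fixed `random_state` serves the same purpose.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Split sample indices into reproducible train and test sets.

    Parameters
    ----------
    n_samples : int
        Number of samples in the data set.
    test_fraction : float
        Fraction of samples assigned to the test set (0 < test_fraction < 1).
    seed : int
        Seed for the random number generator; fixing it makes the split,
        and therefore downstream plots and tables, reproducible.

    Returns
    -------
    tuple[list[int], list[int]]
        Sorted train indices and sorted test indices.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # local generator: leaves global random state untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    # The first n_test shuffled indices form the test set; the rest are for training.
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

Calling `train_test_split_indices(10, 0.2, seed=0)` always yields 8 training and 2 test indices, and repeated calls with the same `seed` return the same split.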


3. Presentation (10-15 Minutes) - 25%

  • Content (20%) Quality and depth of research

    • Present a clear statement of an outstanding research question

    • Provide evidence and references from the scientific literature to support the key points

    • Present an AI-ready data set and a preliminary analysis.

    • Demonstrate a classical machine learning example.

    • Demonstrate a deep learning example.

    • Discuss the computational time required for training and deployment

    • Discuss whether the computational resources needed are appropriate

  • Structure and Organization (15%) Coherence and flow of the presentation

    • Logical structure (report and presentation: introduction, body, conclusion; software: README, environment file, src/, data/, plots/, …)

    • Key points are clearly distinguished and emphasized

  • Clarity and Delivery (20%) Effectiveness in communicating ideas

    • Clear articulation and pronunciation

    • Adequate volume and pace

    • Minimal reliance on notes, maintaining eye contact with the audience

    • Confident and professional demeanor

  • Visual Aids (10%) Effectiveness of any supporting visual materials (slides, charts, plots)

    • Visual aids enhance the presentation rather than distract from it

    • Information is presented clearly, is easy to follow, and uses appropriate design principles

    • Slides do not overwhelm with text or complex visuals

  • Engagement and Interaction (10%) Ability to engage and interact with the audience

    • Encourages audience interaction through questions or active participation

    • Responds effectively to audience questions and comments

  • Critical Thinking and Analysis (15%) Depth of analysis and reflection on the research topic

    • Demonstrates original thinking and critical engagement with the research

    • Identifies limitations or future directions for research

  • Professionalism (10%) Overall professional quality of the presentation

    • Respects time limits and is well prepared with the material

    • Appropriate attire and a respectful manner.


4. Overall Team Contributions - 10%

  • Evaluates how well the team worked together to deliver a cohesive project.

  • Assessed through peer evaluations, clear documentation of roles, and balance of contributions across all deliverables.


Summary of Weights

  1. Report: 40%

  2. GitHub Repository: 35%

  3. Presentation: 25%

  4. Overall Team Contributions: 10%