Final Integrated Project in Machine Learning in Geoscience

Objective: Integrate what you have learned throughout the class into a single group project that demonstrates the understanding and skills needed to manipulate data and develop machine learning approaches to a scientific problem. The project is evaluated on how well it integrates AI-ready data preparation, classical machine learning (CML), and deep learning (DL) components into a cohesive whole, with a focus on scientific discussion, interpretation, reproducibility, and team contributions.


1. Report (5 Pages) - 40%

  • Content & Research Quality (25%) Quality, depth, and relevance of the research

    • Present a clear statement of an outstanding research question and place it in the context of an up-to-date literature review

    • Demonstrate the originality of the research

    • Present an AI-ready data set and a preliminary analysis with correlations or a description of basic data features, and discuss potential data imbalance within the context of the stated problem.

    • Demonstrate strong understanding of classic machine learning and/or deep learning with an example

    • Discuss the performance evaluation in the context of training the model and for generalization beyond the data and domain presented in the report.

    • Discuss the computational time for training and deploying

    • Discuss the appropriateness of computational resources needed

  • Structure and Organization (15%) Coherence, clarity, and flow of the report

    • Logical structure:

      • Introduction with scientific background to motivate the project and the state of the data (e.g., we are missing the measurements X and propose to predict it from Y),

      • Data: what is a data sample, what are the features, what are the dimensions, and what does the EDA show? If the problem is time series forecasting, explain how you organize the data for training by splitting long time series into shorter ones. Describe the train-validation-test split and discuss whether the test data are in-domain or out-of-domain.

      • Models: design and training strategies (e.g., for deep learning, give the optimizer, learning rate, and batch size, and show the learning curve),

      • Generalization statement: discuss the test performance and whether the model generalizes beyond the data and domain presented in the report,

      • Results: how is the model doing, and what do you think you can learn from it? How long does the training take, and how long does it take to infer/predict? How do you see this analysis contributing to a big data mining project?

      • Conclusions and outlook,

      • References.

    • Well-organized paragraphs with transitions between ideas

    • Key points and arguments are clearly presented and easy to follow

    • Figures have captions and clear labels and are referenced in the text.

  • Clarity and Writing Style (20%) Quality & Effectiveness in Writing

    • Clear, concise language

    • Minimal grammar or spelling errors

    • Professional and academic tone appropriate for the field

  • Critical Thinking & Analysis (20%) Depth of analysis and reflection on the research topic

    • The analysis goes beyond simple description and shows depth of thought. Address the following questions:

      • What have you learned from this?

      • What are the advantages and limitations of your approaches?

      • Critically, what do you think the ML methods can contribute to the overall research?

    • Acknowledges alternative perspectives or potential limitations in the research

    • Demonstrates original thinking and critical engagement with the research

  • Formatting & Citations (10%) Adherence to format guidelines and proper citation of sources

    • Follows formatting requirements (e.g., margins, font, length)

    • Correct use of citation style (e.g., APA, MLA, Chicago)

    • Correctly state the authors' CRediT contributions (every enrolled student must be associated with a contribution, and the CRediT statement will assert that)
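
As an illustration of the data-organization item in the report structure above, the following minimal sketch cuts one long time series into shorter fixed-length windows and then splits them chronologically into training, validation, and test sets. The function names (`make_windows`, `chronological_split`), the synthetic sine series, the window length of 24, and the 70/15/15 split fractions are all hypothetical choices made for illustration, not course requirements.

```python
import numpy as np


def make_windows(series, window, horizon=1):
    """Cut a 1-D series into (input window, future target) pairs.

    Each row of X holds `window` consecutive samples; the matching entry
    of y is the sample `horizon` steps after the end of that window.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - window - horizon + 1  # number of complete pairs
    if n <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n)])
    first_target = window + horizon - 1
    y = series[first_target:first_target + n]
    return X, y


def chronological_split(X, y, train=0.7, val=0.15):
    """Split windows in time order (no shuffling) into train/val/test."""
    n = len(X)
    i_train = int(n * train)
    i_val = int(n * (train + val))
    return (
        (X[:i_train], y[:i_train]),
        (X[i_train:i_val], y[i_train:i_val]),
        (X[i_val:], y[i_val:]),
    )


# Synthetic stand-in for a real measurement record (assumption).
series = np.sin(np.linspace(0, 20, 200))
X, y = make_windows(series, window=24)
train_set, val_set, test_set = chronological_split(X, y)
print(X.shape, len(train_set[0]), len(val_set[0]), len(test_set[0]))
```

Because the windows here overlap (stride of one sample), windows on either side of a split boundary share samples. Leaving a gap of at least one window length between the sets is one way to limit that leakage, and is worth discussing alongside the in-domain versus out-of-domain question.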


2. GitHub Repository - 35%

  • Code Quality (15%) Quality and functionality of the scripts and code

    • The code is clean, well-documented, and follows good programming practices

    • All scripts run without errors (when the environment is properly set up)

    • Code is modular, with reusable functions where appropriate

    • Scripts achieve the intended outcomes (e.g., generating plots, performing analysis)

  • Reproducibility (25%) Ease of reproducing the analysis and results

    • The repository includes clear instructions (e.g., in a README.md) for setting up the environment and running the code

    • Jupyter notebooks, scripts, and any other files necessary to recreate the analysis are provided

    • Data (or instructions to access data) are included or referenced appropriately

    • Output (plots, tables) are reproducible using the code

  • Organization & Structure (20%) Organization and clarity of the repository structure

    • Repository is well-organized with clear folder structure (e.g., separate folders for code, data, results, etc.)

    • File and folder names are descriptive and intuitive

    • README file provides a clear overview of the repository and how to navigate it

  • Documentation (15%) Clarity and completeness of documentation

    • README file clearly explains the project, dependencies, and setup instructions

    • Code and notebooks are well-documented, including comments explaining key sections

    • Scripts include docstrings for functions and appropriate inline comments

  • Environment Setup (10%) Provision of environment setup and dependency management

    • Includes a complete and working conda environment file (environment.yml) or requirements.txt for virtual environments

    • Environment file lists all necessary dependencies with correct versions

    • Instructions for setting up the environment are clear and easy to follow

  • Version Control Practices (5%) Effective use of Git and GitHub features

    • Commits are frequent, descriptive, and reflect the progression of the project.

    • Clear use of branches, if applicable (e.g., for different features or phases of the project)

    • Issues, pull requests, or other GitHub collaboration tools are used effectively.
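
The code-quality and documentation criteria above can be illustrated with a small, reusable function that carries a docstring and inline comments. This is a minimal sketch under stated assumptions: the helper name `time_call` and the stand-in function `toy_training` are hypothetical, and the example also shows one simple way to collect the wall-clock training and inference times that the report and presentation ask about.

```python
import time


def time_call(func, *args, **kwargs):
    """Run ``func`` once and measure how long the call takes.

    Parameters
    ----------
    func : callable
        The function to run, e.g. a training or prediction routine.
    *args, **kwargs
        Arguments passed through to ``func`` unchanged.

    Returns
    -------
    result : object
        Whatever ``func`` returns.
    elapsed : float
        Wall-clock duration of the call, in seconds.
    """
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def toy_training(n_steps):
    """Hypothetical stand-in for a real training loop."""
    total = 0.0
    for step in range(n_steps):
        total += step * 0.5
    return total


result, seconds = time_call(toy_training, 10_000)
print(f"toy training took {seconds:.4f} s")
```

Keeping the timing logic in one helper means the same function can wrap both model fitting and prediction, so the reported numbers are measured the same way.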


3. Presentation (10-15 Minutes) - 25%

  • Content (20%) Quality and depth of research

    • Present a clear statement of an outstanding research question

    • Provide evidence and references to the scientific literature to support the key points

    • Present an AI-ready data set and preliminary analysis

    • Demonstrate a classical machine learning example

    • Demonstrate a deep learning example

    • Discuss the computational time for training and deploying

    • Discuss the appropriateness of computational resources needed

  • Structure and Organization (15%) Coherence and flow of the presentation

    • Logical structure (report and presentation: intro, body, conclusion; software: readme, env file, src/, data/, plots/,…)

    • Key points are clearly distinguished and emphasized

  • Clarity and Delivery (20%) Effectiveness in communicating ideas

    • Clear articulation and pronunciation

    • Adequate volume and pace

    • Minimal reliance on notes, maintaining eye contact with the audience

    • Confident and professional demeanor

  • Visual Aids (10%) Effectiveness of any supporting visual materials (slides, charts, plots)

    • Visual aids enhance, rather than distract from, the presentation

    • Information is presented clearly, is easy to follow, and uses appropriate design principles

    • Slides do not overwhelm with text or complex visuals

  • Engagement and Interaction (10%) Ability to engage and interact with the audience

    • Encourages audience interaction through questions or active participation

    • Responds effectively to audience questions and comments

  • Critical Thinking and Analysis (15%) Depth of analysis and reflection on the research topic

    • Demonstrates original thinking and critical engagement with the research

    • Identifies limitations or future directions for research

  • Professionalism (10%) Overall, the professional quality of the presentation

    • Respects time limits and is well prepared with material

    • Appropriate attire and a respectful manner.


4. Overall Team Contributions - 10%

  • Evaluates how well the team worked together to deliver a cohesive project.

  • Assessed through peer evaluations, clear documentation of roles, and balance of contributions across all deliverables.


Summary of Weightage:

  1. Report: 40%

  2. GitHub Repository: 35%

  3. Presentation: 25%

  4. Overall Team Contributions: 10%