The final project for BC1016 provides an opportunity to bring together, apply, and communicate your knowledge of data science and statistics from this course. You will work in groups of 2, choose one of 4 provided datasets to analyze, and submit a writeup of your analysis and conclusions in the form of a Jupyter Notebook.

Datasets

Your team will pick one of four possible datasets to explore:

  1. NYC Restaurant Health Inspection (food) - A dataset with info on the results of NYC restaurant inspections, including the name of establishments, borough, cuisine, inspection grade, and types of violations [🔗 starter notebook & dataset]
  2. Family Planning (public health) - Data from a 1987 National Indonesia Contraceptive Prevalence survey which looks at the relationships between a mother and partner’s age, education, religious beliefs, and income level as well as contraceptive use and number of children [🔗 starter notebook & dataset]
  3. Spotify Top Streams (music) - A dataset of top streamed tracks from 2023, which includes information like a track’s artist and album, release date, and audio characteristics (like danceability, energy, and liveliness) [🔗 starter notebook & dataset]
  4. Seattle Pet Licenses (pets) - Data on registered pets (largely cats and dogs) in Seattle, including breed, pet owner’s zip code and income, and local parks by zip code [🔗 starter notebook & dataset]

We plan to release the starter notebooks for these datasets by Wednesday 4/9.

Project Milestones & Deadlines

  1. Group Declaration - due Fri 4/11 at 11:59pm

    1. Fill out the Google Form to indicate who your partner will be for the final project: https://forms.gle/CBDDY31zrDQLkGo18
  2. Project Proposal - due Mon 4/14 at 11:59pm

    1. Select a notebook & dataset to work on for the final project
    2. Complete the Introduction section of your selected final project notebook as per the instructions below (not including the Prediction section since we have not yet covered this topic)
    3. Submit a PDF of your Jupyter Notebook to Courseworks (each person on the team should submit the notebook using the file name convention firstPersonLastName-secondPersonLastName.pdf)
  3. Progress Report - due Fri 4/25 at 11:59pm

    1. You should be ~60% done with your final project (with at least the Intro, Exploratory Data Analysis, Hypothesis Testing, and Prediction sections started)
    2. You should list out what analysis remains and how you plan to approach it
    3. You should also share if you are running into any issues with your analysis that you might need assistance with or have questions about
    4. Submit both a PDF and your Jupyter Notebook file (.ipynb) to Courseworks (each person on the team should submit) using the file name convention firstPersonLastName-secondPersonLastName.zip/.ipynb
    5. Note: Labs on 4/23-4/24 will be work time for your Progress Report
  4. Final Project Report - due Fri 5/9 at 11:59pm

    1. Submit both a PDF and your Jupyter Notebook file (.ipynb) to Courseworks (each person on the team should submit) using the file name convention firstPersonLastName-secondPersonLastName.zip/.ipynb
    2. Complete peer review
      1. Complete the Peer Review section in the Notebook with your partner
      2. Individual: Submit self- and peer-assessment via Google Forms: https://forms.gle/7PT9yZysvMwUVQuq9
      3. Note: Peer review is required. Failure to complete the 2 steps of the peer review will lead to an automatic deduction of 20 points from your individual grade on the final project.
    3. Note: Labs on 4/30 & 5/1 will be consultation times for groups to get feedback from their TAs
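
If it helps to avoid typos in submission names, the file name convention from the milestones above can be sketched in Python. This is only an illustration: the function name `submission_filename` and the example last names are hypothetical and do not come from the course materials.

```python
def submission_filename(first_last_name: str, second_last_name: str, extension: str) -> str:
    """Build a name like firstPersonLastName-secondPersonLastName.pdf.

    Hypothetical helper: it only joins the two last names with a hyphen
    and appends the extension, following the convention described above.
    """
    # Accept the extension with or without a leading dot.
    ext = extension.strip().lstrip(".")
    return f"{first_last_name.strip()}-{second_last_name.strip()}.{ext}"


# Example with made-up last names.
print(submission_filename("Lovelace", "Hopper", "pdf"))
print(submission_filename("Lovelace", "Hopper", ".ipynb"))
```

Both teammates can use the same call so that the submitted file names match.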

Grading Breakdown