Prof. Dr. Ingo Claßen
Rated Exercise 1 - DSML

  • Via Zoom - Link will be sent by email
  • Start 28.11.2025 at 8:00 - Intro to exercise
  • End 28.11.2025 at 14:00 - Submission of solution
  • One solution per group
  • I will be available for video calls throughout the exercise

Machine Learning - Kaggle Competition

Don't forget to join the competition on Kaggle via "Late Submission".

Get Data

Mount gdrive

from google.colab import drive
drive.mount('/content/drive')

Import / Config

import pandas as pd
import os

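kaggle_dir = "/content/drive/My Drive/kaggle"  # folder that holds your kaggle.json API token
competition = "tbd"  # placeholder; replace with the competition's name
target_dir = f"{kaggle_dir}/{competition}"

# Tell the Kaggle CLI where to look for kaggle.json
os.environ['KAGGLE_CONFIG_DIR'] = kaggle_dir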

Download Data

!kaggle competitions download -c "{competition}"
!mkdir -p "{target_dir}"
!unzip "{competition}.zip" -d "{target_dir}"

Convert to Parquet

# Convert each downloaded CSV file to Parquet
for name in ["train", "test", "sample_submission"]:
    df = pd.read_csv(f"{target_dir}/{name}.csv")
    df.to_parquet(f"{target_dir}/{name}.parquet")

Remove CSV Files

!rm "{target_dir}/train.csv"
!rm "{target_dir}/test.csv"
!rm "{target_dir}/sample_submission.csv"

Work on ML Solution

Load Data

df_train = pd.read_parquet(f"{target_dir}/train.parquet")
df_test = pd.read_parquet(f"{target_dir}/test.parquet")
df_submission = pd.read_parquet(f"{target_dir}/sample_submission.parquet")

Exploratory Data Analysis

Analyse the data to gain insights and to find candidates for feature engineering.
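
A possible starting point is a per-column overview of data types, missing values, and distinct values. The sketch below is only an illustration: the helper summarize and the columns feature_a, feature_b, and target are hypothetical, and the small df_demo frame stands in for df_train so that the example runs on its own.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, share of missing values, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_share": df.isna().mean(),
        "n_unique": df.nunique(),  # NaN is not counted as a value
    })


# Tiny stand-in for df_train; the column names are hypothetical.
df_demo = pd.DataFrame({
    "feature_a": [1.0, 2.0, None, 4.0],
    "feature_b": ["x", "y", "x", "x"],
    "target": [0, 1, 0, 1],
})

summary = summarize(df_demo)
print(summary)

# Basic statistics of the numeric columns
print(df_demo.describe())
```

With the real data, summarize(df_train) would be the call. Columns with many missing values or very few distinct values are usually worth a closer look.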

Train Model

Split df_train into a training set and a validation set, then train your model on the training set. Use the validation set to tune hyperparameters and to detect overfitting.
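
The split and a simple hyperparameter comparison could be sketched as follows. This is a minimal example under several assumptions that the exercise does not state: a binary classification task, scikit-learn as the library, a random forest as the model, and accuracy as the metric. The synthetic data and the names f1, f2, f3 are hypothetical stand-ins for df_train.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_train; feature names and target are hypothetical.
rng = np.random.default_rng(42)
X_all = pd.DataFrame(rng.normal(size=(200, 3)), columns=["f1", "f2", "f3"])
y_all = (X_all["f1"] + X_all["f2"] > 0).astype(int)

# Hold out 20% of the rows for validation, keeping the class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

# Try a few values of one hyperparameter and record both scores.
results = {}
for max_depth in (2, 4, 8):
    model = RandomForestClassifier(
        n_estimators=50, max_depth=max_depth, random_state=42
    )
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    val_acc = accuracy_score(y_val, model.predict(X_val))
    results[max_depth] = (train_acc, val_acc)
    print(f"max_depth={max_depth}: train={train_acc:.3f}, val={val_acc:.3f}")

# Pick the setting with the best validation score, not the best training score.
best_depth = max(results, key=lambda d: results[d][1])
print("best max_depth:", best_depth)
```

A training score far above the validation score is a typical sign of overfitting. The metric should be replaced by the one the competition actually uses.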

Submit

Use df_test to generate predictions and save them in df_submission before submitting.
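
Filling the submission frame might look like the sketch below. The column names id, f1, and target, the predict placeholder, and the output file name are assumptions made for illustration; the real names follow from sample_submission and from your trained model.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for df_test and df_submission; all column names are hypothetical.
df_test_demo = pd.DataFrame({"id": [10, 11, 12], "f1": [0.2, -1.3, 0.7]})
df_submission_demo = pd.DataFrame({"id": [10, 11, 12], "target": [0, 0, 0]})


def predict(df: pd.DataFrame) -> pd.Series:
    """Placeholder for the trained model's predict call: a simple threshold rule."""
    return (df["f1"] > 0).astype(int)


# Overwrite the placeholder values; columns and row order of the sample file stay as they are.
df_submission_demo["target"] = predict(df_test_demo).to_numpy()

# Sanity checks before writing the file
assert len(df_submission_demo) == len(df_test_demo)
assert df_submission_demo["target"].notna().all()

# Write the CSV file without the pandas index column.
out_path = os.path.join(tempfile.gettempdir(), "submission.csv")
df_submission_demo.to_csv(out_path, index=False)
print(pd.read_csv(out_path))
```

The resulting file can be uploaded on the competition page or submitted with the Kaggle CLI (kaggle competitions submit -c <competition> -f <file> -m <message>).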