Don't forget to join the competition on Kaggle before downloading the data. If the competition has already ended, use "Late Submission".
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
import os
kaggle_dir = "/content/drive/My Drive/kaggle"
competition = "tbd"  # placeholder: replace with the competition's name as used by the Kaggle CLI
target_dir = f"{kaggle_dir}/{competition}"
# The Kaggle CLI looks for kaggle.json (the API token) in this directory.
os.environ['KAGGLE_CONFIG_DIR'] = kaggle_dir
!kaggle competitions download -c "{competition}"
!mkdir -p "{target_dir}"
!unzip "{competition}.zip" -d "{target_dir}"
# Convert each CSV to Parquet so later sessions load the data faster.
for name in ["train", "test", "sample_submission"]:
    pd.read_csv(f"{target_dir}/{name}.csv").to_parquet(f"{target_dir}/{name}.parquet")
!rm "{target_dir}/train.csv"
!rm "{target_dir}/test.csv"
!rm "{target_dir}/sample_submission.csv"
df_train = pd.read_parquet(f"{target_dir}/train.parquet")
df_test = pd.read_parquet(f"{target_dir}/test.parquet")
df_submission = pd.read_parquet(f"{target_dir}/sample_submission.parquet")
Analyse the data to find insights and to guide feature engineering.
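As a possible starting point for this analysis, here is a minimal sketch of a per-column summary. The helper `summarize` and the toy column names are hypothetical; in the notebook you would pass `df_train` instead of the toy frame.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })

# Toy frame standing in for df_train; the column names are made up.
toy = pd.DataFrame({
    "feature_a": [1.0, 2.0, None, 4.0],
    "feature_b": ["x", "y", "x", "x"],
    "target": [0, 1, 0, 1],
})
summary = summarize(toy)
print(summary)
```

Columns with many missing values or very few distinct values are often worth a closer look before deciding which features to build.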
Split df_train into training and validation sets, then train your model. Use the validation set to tune hyperparameters and avoid overfitting.
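The split can be sketched with pandas alone. This is only one way to do it: the helper `split_train_valid`, the 20% validation fraction, and the seed are assumptions. A plain random split may not suit every competition, for example time series or grouped data.

```python
import pandas as pd

def split_train_valid(df: pd.DataFrame, valid_frac: float = 0.2, seed: int = 42):
    """Randomly hold out valid_frac of the rows as a validation set.

    Assumes the index of df is unique, because rows are removed by index label.
    """
    valid = df.sample(frac=valid_frac, random_state=seed)
    train = df.drop(valid.index)
    return train, valid

# Toy frame standing in for df_train; the column names are made up.
toy = pd.DataFrame({"feature": range(10), "target": [0, 1] * 5})
train_part, valid_part = split_train_valid(toy)
print(len(train_part), len(valid_part))
```

Fixing the seed keeps the split reproducible, so validation scores stay comparable across hyperparameter runs.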
Use df_test to generate predictions and save them in df_submission before submitting.
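A minimal sketch of this last step, assuming the sample submission has one row per test row and a single prediction column. The helper `fill_submission`, the column name `target`, and the toy values are hypothetical, so check the real column names in `df_submission`.

```python
import pandas as pd

def fill_submission(submission: pd.DataFrame, predictions, target_col: str) -> pd.DataFrame:
    """Return a copy of the submission template with target_col overwritten."""
    if len(predictions) != len(submission):
        raise ValueError("need exactly one prediction per submission row")
    out = submission.copy()
    out[target_col] = list(predictions)
    return out

# Toy template standing in for df_submission; the values are made up.
template = pd.DataFrame({"id": [101, 102, 103], "target": [0, 0, 0]})
filled = fill_submission(template, [0.1, 0.9, 0.4], "target")

# With no path, to_csv returns the CSV text; pass a file path to write the file.
csv_text = filled.to_csv(index=False)
print(csv_text)
```

In the notebook, the filled frame would be written to a CSV file, which can then be uploaded on the competition page or submitted with `kaggle competitions submit`.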