Prof. Dr. Ingo Claßen
Applied ML - DSML

Two Sigma Connect: Rental Listing Inquiries

Kaggle Competition (link)

Create Kaggle account

Join competition

Data Preparation via Kaggle

Mount gdrive

from google.colab import drive
drive.mount('/content/drive')

Import / Config

import pandas as pd
import os

kaggle_dir = "/content/drive/My Drive/kaggle"
competition = "two-sigma-connect-rental-listing-inquiries"
target_dir = f"{kaggle_dir}/rent-two-sigma"

!mkdir -p "{kaggle_dir}"

os.environ['KAGGLE_CONFIG_DIR'] = kaggle_dir

Get Data

  • Open the Kaggle API documentation (link)
  • Go to the authentication section
  • Create kaggle.json and download it to your computer
  • Upload kaggle.json to the kaggle directory on your gdrive
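
Before running the download, it can help to confirm that the uploaded file is where the Kaggle CLI will look for it. The following is a minimal sketch, not part of the Kaggle tooling: the helper name `check_kaggle_credentials` is hypothetical, and it assumes the classic kaggle.json layout with a `username` and a `key` field.

```python
import json
import os


def check_kaggle_credentials(config_dir):
    """Return a list of problems found with kaggle.json (empty if none).

    Hypothetical helper; assumes kaggle.json holds "username" and "key".
    """
    path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(path):
        return [f"missing file: {path}"]
    try:
        with open(path, encoding="utf-8") as fh:
            creds = json.load(fh)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(creds, dict):
        return ["expected a JSON object"]
    # Report every expected field that is absent or empty
    return [f"missing field: {name}"
            for name in ("username", "key") if not creds.get(name)]
```

In the notebook this could be called as `check_kaggle_credentials(kaggle_dir)`; an empty list suggests the file is in place and readable.
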
!kaggle competitions download -c "{competition}"
!unzip {competition}.zip -d "{target_dir}"
!unzip "{target_dir}/train.json.zip" -d "{target_dir}"

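# Load the raw listings and keep an unfiltered copy as parquet
df = pd.read_json(f"{target_dir}/train.json")
df.to_parquet(f"{target_dir}/rent.parquet")

# Keep listings with a price strictly between 1,000 and 10,000
df = df[(df.price>1_000) & (df.price<10_000)]
# Drop listings whose coordinates are both 0 (missing location)
df = df[(df.longitude!=0) | (df.latitude!=0)]
# Keep listings inside a bounding box around New York City
df = df[(df['latitude']>40.55) & (df['latitude']<40.94) &
        (df['longitude']>-74.1) & (df['longitude']<-73.67)]
# Save only the numeric columns as the cleaned data set
df_num = df[['bedrooms','bathrooms','latitude','longitude','price']]
df_num.to_parquet(f"{target_dir}/rent-ideal.parquet")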
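
To see what each filter condition removes, the same three masks can be tried on a handful of made-up rows. This is only an illustrative sketch: the function name `filter_listings` and the toy values are assumptions, not part of the competition data.

```python
import pandas as pd


def filter_listings(df):
    """Apply the price, non-zero-coordinate and bounding-box filters."""
    price_ok = (df["price"] > 1_000) & (df["price"] < 10_000)
    has_coords = (df["longitude"] != 0) | (df["latitude"] != 0)
    in_box = ((df["latitude"] > 40.55) & (df["latitude"] < 40.94) &
              (df["longitude"] > -74.1) & (df["longitude"] < -73.67))
    return df[price_ok & has_coords & in_box]


# Made-up rows: only the first one satisfies all three conditions
toy = pd.DataFrame({
    "price":     [3000, 500, 4000, 2500, 15000],
    "latitude":  [40.75, 40.75, 0.0, 42.0, 40.70],
    "longitude": [-73.98, -73.98, 0.0, -71.0, -74.00],
})
kept = filter_listings(toy)
print(kept)
```

Row 1 fails the lower price bound, row 2 has zero coordinates, row 3 lies outside the bounding box, and row 4 exceeds the upper price bound.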

!rm "{target_dir}/images_sample.zip"
!rm "{target_dir}/Kaggle-renthop.torrent"
!rm "{target_dir}/sample_submission.csv.zip"
!rm "{target_dir}/test.json.zip"
!rm "{target_dir}/train.json"
!rm "{target_dir}/train.json.zip"
  • Disconnect and delete the runtime
  • Save notebook
  • Close notebook

Practical work with the ML book on explained.ai

  • Create a new Notebook

Install

!pip install -q rfpimp

Mount gdrive

from google.colab import drive
drive.mount('/content/drive')

Import / Config

import numpy as np
import pandas as pd

Load Data

kaggle_dir = "/content/drive/My Drive/kaggle"
target_dir = f"{kaggle_dir}/rent-two-sigma"
rent = pd.read_parquet(f"{target_dir}/rent-ideal.parquet")
rent.sample(5)
df = pd.read_parquet(f"{target_dir}/rent.parquet")
df.sample(5)

Hands-On

  • Train a random forest model (link)
  • Exploring and Denoising Your Data Set (link)
  • Categorically Speaking (link)
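
As a starting point for the first hands-on item, training a random forest on the four numeric features might look like the sketch below. It assumes scikit-learn is installed; the function name `train_rent_forest` is hypothetical, and the data frame is a synthetic stand-in with the same columns as rent-ideal.parquet, so the printed score says nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ["bedrooms", "bathrooms", "latitude", "longitude"]


def train_rent_forest(rent, n_estimators=100, seed=0):
    """Fit a random forest that predicts price from the numeric features."""
    X, y = rent[FEATURES], rent["price"]
    rf = RandomForestRegressor(n_estimators=n_estimators, n_jobs=-1,
                               oob_score=True, random_state=seed)
    rf.fit(X, y)
    return rf


# Synthetic stand-in data (assumption): price depends mostly on room counts
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "bedrooms": rng.integers(0, 5, n),
    "bathrooms": rng.integers(1, 3, n),
    "latitude": rng.uniform(40.55, 40.94, n),
    "longitude": rng.uniform(-74.1, -73.67, n),
})
demo["price"] = (1500 + 900 * demo["bedrooms"] + 400 * demo["bathrooms"]
                 + rng.normal(0, 100, n))

rf = train_rent_forest(demo)
# Out-of-bag R^2: an estimate of generalization without a separate test set
print(f"OOB R^2 on synthetic data: {rf.oob_score_:.3f}")
```

In the notebook, `train_rent_forest(rent)` would fit the same model on the data frame loaded from rent-ideal.parquet.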