Prof. Dr. Ingo Claßen
ML1 - DSML

Two Sigma Connect: Rental Listing Inquiries

Kaggle Competition (link)

Create Kaggle account

Join competition

Mount gdrive

from google.colab import drive
drive.mount('/content/drive')

Import / Config

import pandas as pd
import os

kaggle_dir = "/content/drive/My Drive/kaggle"
competition = "two-sigma-connect-rental-listing-inquiries"
target_dir = f"{kaggle_dir}/rent-two-sigma"

!mkdir -p  "{kaggle_dir}"

os.environ['KAGGLE_CONFIG_DIR'] = kaggle_dir

Data Preparation

  • Run this section only once
  • Skip its cells when you restart the notebook later
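One way to make the "only once" rule explicit is to guard the preparation cells with a check for the files they produce. The following is a minimal sketch, assuming the two Parquet files written in this section (rent.parquet and rent-ideal.parquet) are the outputs that matter; the helper name needs_preparation is hypothetical.

```python
from pathlib import Path


def needs_preparation(target_dir: str) -> bool:
    """Return True if any of the prepared Parquet files is missing."""
    # Assumption: these are the two files the preparation cells write.
    expected = ["rent.parquet", "rent-ideal.parquet"]
    return any(not (Path(target_dir) / name).exists() for name in expected)
```

A cell could then start with `if needs_preparation(target_dir):` so that a rerun of the whole notebook skips the download and conversion.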

  • Open the Kaggle API documentation (link)
  • Go to the Authentication section
  • Create kaggle.json
  • Upload kaggle.json to the kaggle directory on your gdrive
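Before calling the Kaggle CLI, it can help to confirm that the uploaded file sits where KAGGLE_CONFIG_DIR points and that it parses. The following is a minimal sketch, assuming kaggle.json holds the username and key fields of a Kaggle API token; check_kaggle_token is a hypothetical helper, not part of the Kaggle package.

```python
import json
from pathlib import Path


def check_kaggle_token(config_dir: str) -> bool:
    """Return True if config_dir contains a parseable kaggle.json token."""
    token = Path(config_dir) / "kaggle.json"
    if not token.is_file():
        return False
    try:
        data = json.loads(token.read_text())
    except json.JSONDecodeError:
        return False
    # Assumption: a valid token has non-empty "username" and "key" entries.
    return isinstance(data, dict) and all(data.get(k) for k in ("username", "key"))
```

Calling `check_kaggle_token(kaggle_dir)` right after the upload gives a quick yes/no answer before the download cell runs.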
!kaggle competitions download -c "{competition}"
!unzip {competition}.zip -d "{target_dir}"
!unzip "{target_dir}/train.json.zip" -d "{target_dir}"

df = pd.read_json(f"{target_dir}/train.json")
df.to_parquet(f"{target_dir}/rent.parquet")

df = df[(df.price>1_000) & (df.price<10_000)]
df = df[(df.longitude!=0) | (df.latitude!=0)]
df = df[(df['latitude']>40.55) & (df['latitude']<40.94) &
        (df['longitude']>-74.1) & (df['longitude']<-73.67)]
df_num = df[['bedrooms','bathrooms','latitude','longitude','price']]
df_num.to_parquet(f"{target_dir}/rent-ideal.parquet")

!rm "{target_dir}/images_sample.zip"
!rm "{target_dir}/Kaggle-renthop.torrent"
!rm "{target_dir}/sample_submission.csv.zip"
!rm "{target_dir}/test.json.zip"
!rm "{target_dir}/train.json"
!rm "{target_dir}/train.json.zip"

Training a random forest model

Link to ML book at explained.ai (link)

rent = pd.read_parquet(f"{target_dir}/rent-ideal.parquet")
rent
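As a starting point for this section, a random forest can be fitted on the four numeric columns stored in rent-ideal.parquet, with price as the target. The following is a minimal sketch using scikit-learn's RandomForestRegressor; the helper name fit_rent_forest, the FEATURES constant, and the hyperparameter values are assumptions for illustration, not settings prescribed by the course.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# The feature columns saved in rent-ideal.parquet; price is the target.
FEATURES = ['bedrooms', 'bathrooms', 'latitude', 'longitude']


def fit_rent_forest(rent: pd.DataFrame, n_estimators: int = 100) -> RandomForestRegressor:
    """Fit a random forest that predicts price from the numeric features."""
    X = rent[FEATURES]
    y = rent['price']
    # n_jobs=-1 uses all cores; random_state only makes reruns repeatable.
    rf = RandomForestRegressor(n_estimators=n_estimators, n_jobs=-1, random_state=0)
    rf.fit(X, y)
    return rf
```

With the data loaded above, `rf = fit_rent_forest(rent)` followed by `rf.score(rent[FEATURES], rent['price'])` reports the R² on the training data, which is an optimistic estimate; a held-out validation set gives a more honest figure.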

Exploring and Denoising Your Data Set

Link to ML book at explained.ai (link)

df = pd.read_parquet(f"{target_dir}/rent.parquet")
df

Categorically Speaking

Link to ML book at explained.ai (link)

df = pd.read_parquet(f"{target_dir}/rent.parquet")
df