How to find a house in Amsterdam
Living in Utrecht, I am currently planning to move to Amsterdam. After all, I will be closer to work and some friends. But finding a house in Amsterdam is not an easy thing. Anyone who has been through it will tell you: real estate in Amsterdam is hell! High prices, small surfaces and far more demand than supply.
That doesn't sound very good. I can already feel the hassle. So let's try to make my life a bit easier.
Let's see if some web scraping can help us there!
- Scrape the Pararius website (the most popular in the NL) to retrieve Amsterdam ads (addresses, surfaces, bedrooms, furnishing, dates, prices, links)
- Calculation of the new feature 'price/area'
- Creation of a GeoDataFrame with the new feature 'geometry' from the addresses (shapely POINT)
- Geographic filter on a specific area of interest in Amsterdam
- Calculation of new features: walking distances from the addresses to the work place and the train station
- Some plots of routes and ads on maps
Scraping job¶
Let's first retrieve how many pages exist for our search on Pararius (all houses/flats in Amsterdam in the 0-1500 price range)
import requests
#Root search URL parsing with BeautifulSoup
url = 'https://www.pararius.com/apartments/amsterdam/0-1500'
response = requests.get(url)
We use BeautifulSoup to parse the HTML page. The library does a great job of creating a tree object from the HTML and provides handy functions to access the tags and values of interest.
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
#Catch number of the last page associated to the search
last_page = html_soup.findAll(lambda tag: tag.name == 'a' and tag.get('class') == ['pagination__link'])[-1]
last_page = int(last_page.contents[0])
last_page
24
Let's scrape the first page. Note that we can later loop through all the pages and repeat the process.
#Root page url and pages subdirectories
url = 'https://www.pararius.com/apartments/amsterdam/0-1500'#+page (for the other pages)
response = requests.get(url)
#BeautifulSoup Html parsing
html_soup = BeautifulSoup(response.text, 'html.parser')
#Individual ads are stored in the li tag, class search-list__item search-list__item--listing
#Here the line retrieves all containers (ads) of that type
add_containers = html_soup.find_all('li', class_ = 'search-list__item search-list__item--listing')
We will now loop through all the ads of this page, access their specific links (because more information is displayed on the individual ad pages) and scrape the information that interests us.
Luckily for us, all the ad features are stored in HTML description list "dt" and "dd" tags (https://www.w3schools.com/tags/tag_dd.asp). We can just loop through all of them to form a dictionary.
from time import sleep
import numpy as np
import pandas as pd
from random import uniform
from shapely.geometry import Point
#Preparation of list for each feature
addresses = []
prices = []
dict_features = [] #will host all the features
links = []
geoms = []
#Loop through the containers corresponding to the n ads within a page
for container in add_containers:
    #The ad link
    link = 'https://www.pararius.com/'+container.find('a',class_ = 'listing-search-item__link listing-search-item__link--title',href=True)['href']
    links.append(link)
    # Pausing the execution with sleep is essential so as not to overload the website.
    # We don't want to create any problems or be refused access.
    sleep(uniform(0.2, 0.5))
    # Access and parse the ad page html
    response2 = requests.get(link)
    html_soup2 = BeautifulSoup(response2.text, 'html.parser')
    # The address (newlines and surrounding whitespace removed)
    address1 = container.section.h2.a.contents[0]
    address2 = html_soup2.findAll('div', class_ = "listing-detail-summary__location")[0].contents[0].replace('\n','').strip()
    addresses.append(address2)
    # Price
    price = float(html_soup2.findAll('meta', {'itemprop':'price'})[0]['content'])
    prices.append(price)
    # Loop through the <dt> tags and pair each one with the <dd> tag that follows it
    features = {i.contents[0]:i.findNext("dd").string for i in html_soup2.findAll('dt',class_ = 'listing-features__term')}
    dict_features.append(features)
    # Coordinates are stored in a shapely Point object (longitude, latitude)
    geom = Point([float(html_soup2.findAll('div', class_ = "detail-map map")[0]['data-detail-map-longitude']),
                  float(html_soup2.findAll('div', class_ = "detail-map map")[0]['data-detail-map-latitude'])])
    geoms.append(geom)
#Create Dataframe for features lists
ad_df = pd.DataFrame([addresses,prices,dict_features,links,geoms]).T
ad_df.columns = ['adress','Rental price 0','features','link','geometry']
# Create DataFrame from Series containing feature dictionaries
features_df = ad_df['features'].apply(pd.Series)
ad_df = pd.concat([ad_df.drop('features', axis=1), features_df],join='inner', axis=1)
Here is the DataFrame containing all the ads of the first page.
ad_df.head(5)
adress | Rental price 0 | link | geometry | Rental price | Offered since | Status | Available | Service costs | Deposit | ... | Balcony | Heating | Hot water | Smoking allowed | Duration | Plot area | Rental agreement | Description | Insulation | Garden description | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1012 SJ Amsterdam (Burgwallen-Nieuwe Zijde) | 1450.0 | https://www.pararius.com//apartment-for-rent/a... | POINT (4.89123 52.3747) | €1,450 per month | 2 months | For rent | Immediately | €45 | €2,250 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 1073 XW Amsterdam (Oude Pijp) | 1395.0 | https://www.pararius.com//apartment-for-rent/a... | POINT (4.89424 52.35708) | €1,395 per month | 20-06-2021 | For rent | From 01-08-2021 | €45 | €2,000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 1021 HV Amsterdam (IJplein/Vogelbuurt) | 1350.0 | https://www.pararius.com//apartment-for-rent/a... | POINT (4.91475 52.38352) | €1,350 per month | 02-07-2021 | For rent | Immediately | NaN | NaN | ... | Present | Central heating boiler | Central heating boiler | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1032 JX Amsterdam (Volewijck) | 1300.0 | https://www.pararius.com//apartment-for-rent/a... | POINT (4.91364 52.39249) | €1,300 per month | 02-07-2021 | For rent | Immediately | NaN | NaN | ... | Not present | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 1032 JX Amsterdam (Volewijck) | 1400.0 | https://www.pararius.com//apartment-for-rent/a... | POINT (4.91364 52.39249) | €1,400 per month | 02-07-2021 | For rent | Immediately | NaN | NaN | ... | Present | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 42 columns
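To make the feature extraction step more concrete, here is a minimal sketch of how "dt"/"dd" pairs can be turned into a dictionary. It deliberately uses only the standard library (html.parser) instead of BeautifulSoup so that it runs anywhere; the FeatureParser class and the sample markup are made up for illustration and are not taken from the Pararius site.

```python
from html.parser import HTMLParser


class FeatureParser(HTMLParser):
    """Collect the <dt>/<dd> pairs of a description list into a dictionary."""

    def __init__(self):
        super().__init__()
        self.features = {}
        self._tag = None      # 'dt' or 'dd' while we are inside one of them
        self._term = None     # last <dt> text seen, waiting for its <dd>
        self._buffer = []     # text fragments of the current tag

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._buffer).strip()
        if tag == "dt":
            self._term = text
        elif self._term is not None:
            # A <dd> closes: store it under the preceding <dt>
            self.features[self._term] = text
            self._term = None
        self._tag = None


# Made-up markup imitating the structure of an ad page (assumption)
sample_html = """
<dl>
  <dt class="listing-features__term">Living area</dt>
  <dd class="listing-features__description">45 m²</dd>
  <dt class="listing-features__term">Interior</dt>
  <dd class="listing-features__description">Furnished</dd>
</dl>
"""

parser = FeatureParser()
parser.feed(sample_html)
features = parser.features
print(features)
```

Each ad yields one such dictionary, and ads do not all expose the same terms, which is why the resulting table contains so many NaN values.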
We can now repeat the process for all the pages. Note that, for simplicity, I have wrapped the above process in a function, pararius_scrap_job.
from time import time
#Root search URL parsing with BeautifulSoup
url = 'https://www.pararius.com/apartments/amsterdam/0-1500'
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
#Catch number of the last page associated to the search
last_page = html_soup.findAll(lambda tag: tag.name == 'a' and tag.get('class') == ['pagination__link'])[-1]
last_page = int(last_page.contents[0])
# Preparing the monitoring of the loop
start_time = time()
rqsts = 0
#Scrap the root page
all_ad_df, rqsts = pararius_scrap_job('',rqsts)
elapsed_time = time() - start_time
print('Page: /page-1; Request:{}; Frequency: {} requests/s'.format(rqsts, rqsts / elapsed_time))
#Loop through page 2 to last_page, scrape them and append the results to the dataframe
for i in range(2,last_page+1):
    # Pause the loop to limit the load on the server
    sleep(uniform(0.2, 0.5))
    #Scraping job as described above
    tmp_df, rqsts = pararius_scrap_job('/page-'+str(i),rqsts)
    # Monitor the requests
    elapsed_time = time() - start_time
    print('Page: /page-{}; Request:{}; Frequency: {} requests/s'.format(str(i),rqsts, rqsts / elapsed_time))
    all_ad_df = pd.concat([all_ad_df,tmp_df], ignore_index=True)
Page: /page-1; Request:33; Frequency: 1.093296748569302 requests/s
Page: /page-2; Request:66; Frequency: 1.0944938216726854 requests/s
Page: /page-3; Request:99; Frequency: 1.0967526318850571 requests/s
Page: /page-4; Request:132; Frequency: 1.1024825372119682 requests/s
Page: /page-5; Request:165; Frequency: 1.1084387885922273 requests/s
Page: /page-6; Request:198; Frequency: 1.1046734063649017 requests/s
Page: /page-7; Request:231; Frequency: 1.1177566635732026 requests/s
Page: /page-8; Request:264; Frequency: 1.120144167341093 requests/s
Page: /page-9; Request:297; Frequency: 1.1173414456081974 requests/s
Page: /page-10; Request:330; Frequency: 1.1114715058250495 requests/s
Page: /page-11; Request:363; Frequency: 1.102194176711322 requests/s
Page: /page-12; Request:396; Frequency: 1.1007611486045312 requests/s
Page: /page-13; Request:429; Frequency: 1.0985490668297155 requests/s
Page: /page-14; Request:462; Frequency: 1.0965715276685213 requests/s
Page: /page-15; Request:495; Frequency: 1.0893074324790766 requests/s
Page: /page-16; Request:528; Frequency: 1.0915380395069763 requests/s
Page: /page-17; Request:561; Frequency: 1.0912513253495377 requests/s
Page: /page-18; Request:594; Frequency: 1.0912298457010519 requests/s
Page: /page-19; Request:627; Frequency: 1.0864240348033034 requests/s
Page: /page-20; Request:660; Frequency: 1.0872801947986621 requests/s
Page: /page-21; Request:693; Frequency: 1.086482765993822 requests/s
Page: /page-22; Request:726; Frequency: 1.0872751154851505 requests/s
Page: /page-23; Request:759; Frequency: 1.0867645378788682 requests/s
Page: /page-24; Request:774; Frequency: 1.0869129034832758 requests/s
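The paging loop above boils down to three things: building the URL of each result page, pausing between pages, and counting requests. The sketch below isolates that logic with the standard library only. The names page_urls, crawl and fake_scrape are hypothetical, and scrape_page stands in for a real scraping function (its signature is an assumption and differs from pararius_scrap_job, which is not shown here).

```python
from random import uniform
from time import sleep, time


def page_urls(root_url, last_page):
    """Yield the URL of every result page; page 1 is the root URL itself."""
    yield root_url
    for i in range(2, last_page + 1):
        yield f"{root_url}/page-{i}"


def crawl(root_url, last_page, scrape_page, pause=(0.2, 0.5)):
    """Call scrape_page(url) on every page, pausing between pages.

    scrape_page is a hypothetical callable returning (rows, n_requests).
    """
    rows, requests_made, start = [], 0, time()
    for n, url in enumerate(page_urls(root_url, last_page), start=1):
        if n > 1:
            # Random pause so that the server is not hit at a fixed rate
            sleep(uniform(*pause))
        page_rows, n_requests = scrape_page(url)
        rows.extend(page_rows)
        requests_made += n_requests
        elapsed = max(time() - start, 1e-9)  # avoid a division by zero
        print(f"Page {n}; requests: {requests_made}; "
              f"frequency: {requests_made / elapsed:.2f} requests/s")
    return rows, requests_made


def fake_scrape(url):
    """Stub used for the demo: pretends each page holds one ad and costs 2 requests."""
    return [url], 2


demo_rows, demo_requests = crawl("https://example.com/search", 3, fake_scrape, pause=(0, 0))
```

Passing the scraping function as an argument keeps the throttling and monitoring logic testable without touching the network.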
Let's create a GeoDataFrame from this table and save it as a GeoJSON
import geopandas as gpd
#store result in GeoJSON file
all_ad_gdf = gpd.GeoDataFrame(all_ad_df, geometry = 'geometry', crs="EPSG:4326")
all_ad_gdf.to_file("ad_Amsterdam.geojson", driver='GeoJSON')
Clean Data¶
I will skip the details of the data cleaning, the main tasks being casting strings to int, float, date, etc. This step is essential to perform the analysis and filtering later, especially for the datetime format.
all_ad_gdf.head(10)
adress | link | Rental price | Offered since | Status | Available | Service costs | Deposit | Specifics | Interior | ... | Energy rating | Smoking allowed | Plot area | Description | Insulation | Garden description | Central heating boiler | Income requirement | Share a house | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1012 SJ Amsterdam (Burgwallen-Nieuwe Zijde) | https://www.pararius.com//apartment-for-rent/a... | 1450 | 2021-05-04 | For rent | 2021-07-04 | €45 | €2,250 | Monumental building | Furnished | ... | None | None | None | None | None | None | None | None | None | POINT (4.89123 52.37470) |
1 | 1013 LL Amsterdam (Haarlemmerbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1200 | 2021-05-30 | For rent | 2021-07-04 | None | None | None | Furnished | ... | None | None | None | None | None | None | None | None | None | POINT (4.88763 52.38623) |
2 | 1021 HV Amsterdam (IJplein/Vogelbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1350 | 2021-07-02 | For rent | 2021-07-04 | None | None | None | Upholstered or furnished | ... | B | None | None | None | None | None | None | None | None | POINT (4.91475 52.38352) |
3 | 1032 JX Amsterdam (Volewijck) | https://www.pararius.com//apartment-for-rent/a... | 1300 | 2021-07-02 | For rent | 2021-07-04 | None | None | None | Upholstered | ... | None | None | None | None | None | None | None | None | None | POINT (4.91364 52.39249) |
4 | 1032 JX Amsterdam (Volewijck) | https://www.pararius.com//apartment-for-rent/a... | 1400 | 2021-07-02 | For rent | 2021-07-04 | None | None | None | Furnished | ... | None | None | None | None | None | None | None | None | None | POINT (4.91364 52.39249) |
5 | 1017 HA Amsterdam (Grachtengordel-Zuid) | https://www.pararius.com//apartment-for-rent/a... | 1450 | 2021-07-02 | For rent | 2021-07-04 | None | None | None | Furnished | ... | F | None | None | None | None | None | None | None | None | POINT (4.89322 52.36267) |
6 | 1031 AG Amsterdam (Volewijck) | https://www.pararius.com//apartment-for-rent/a... | 1250 | 2021-07-02 | For rent | NaT | €25 | None | None | None | ... | A | None | None | None | None | None | None | None | None | POINT (4.91488 52.39061) |
7 | 1076 VA Amsterdam (Stadionbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1425 | 2021-07-02 | For rent | 2021-07-04 | None | €2,850 | None | Furnished | ... | A | No | None | None | None | None | None | None | None | POINT (4.85971 52.34713) |
8 | 1058 LH Amsterdam (Hoofddorppleinbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-07-02 | For rent | 2021-07-16 | None | None | None | Upholstered | ... | B | None | None | None | None | None | None | None | None | POINT (4.84851 52.35340) |
9 | 1073 ER Amsterdam (Nieuwe Pijp) | https://www.pararius.com//apartment-for-rent/a... | 1400 | 2021-07-02 | For rent | 2021-07-09 | None | None | None | Upholstered | ... | None | None | 55 m² | None | None | None | None | None | None | POINT (4.90008 52.35296) |
10 rows × 44 columns
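Since the cleaning code is not shown, here is a rough sketch of the kind of casting it involves, based on the raw values visible in the first table ('€1,450 per month', '55 m²', 'From 01-08-2021', 'Immediately'). The helper names parse_euro, parse_area and parse_date are hypothetical, and relative values such as '2 months' are simply left as None here, which is an assumption rather than what the original cleaning did.

```python
import re
from datetime import date, datetime


def parse_euro(text):
    """'€1,450 per month' -> 1450; missing or non-numeric values -> None."""
    if not isinstance(text, str):
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def parse_area(text):
    """'55 m²' -> 55.0; missing values -> None."""
    if not isinstance(text, str):
        return None
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_date(text, today=None):
    """'From 01-08-2021' -> date(2021, 8, 1); 'Immediately' -> today; else None."""
    if not isinstance(text, str):
        return None
    match = re.search(r"\d{2}-\d{2}-\d{4}", text)
    if match:
        return datetime.strptime(match.group(), "%d-%m-%Y").date()
    if text.strip().lower() == "immediately":
        return today
    return None


print(parse_euro("€1,450 per month"))
print(parse_area("55 m²"))
print(parse_date("From 01-08-2021"))
```

With pandas, such helpers could be applied column by column, for example with Series.apply.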
Geographic filter of specific area in Amsterdam¶
Because Amsterdam is rather big, we are going to filter the ads to a more specific area.
#Read the saved GeoJSON back to check its content
gpd.read_file("ad_Amsterdam.geojson")
adress | link | Rental price | Offered since | Status | Available | Service costs | Deposit | Specifics | Interior | ... | Energy rating | Smoking allowed | Plot area | Description | Insulation | Garden description | Central heating boiler | Income requirement | Share a house | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1012 SJ Amsterdam (Burgwallen-Nieuwe Zijde) | https://www.pararius.com//apartment-for-rent/a... | 1450 | 2021-05-04T00:00:00 | For rent | 2021-07-04T00:00:00 | €45 | €2,250 | Monumental building | Furnished | ... | None | None | None | None | None | None | None | None | None | POINT (4.89123 52.37470) |
1 | 1013 LL Amsterdam (Haarlemmerbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1200 | 2021-05-30T00:00:00 | For rent | 2021-07-04T00:00:00 | None | None | None | Furnished | ... | None | None | None | None | None | None | None | None | None | POINT (4.88763 52.38623) |
2 | 1021 HV Amsterdam (IJplein/Vogelbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1350 | 2021-07-02T00:00:00 | For rent | 2021-07-04T00:00:00 | None | None | None | Upholstered or furnished | ... | B | None | None | None | None | None | None | None | None | POINT (4.91475 52.38352) |
3 | 1032 JX Amsterdam (Volewijck) | https://www.pararius.com//apartment-for-rent/a... | 1300 | 2021-07-02T00:00:00 | For rent | 2021-07-04T00:00:00 | None | None | None | Upholstered | ... | None | None | None | None | None | None | None | None | None | POINT (4.91364 52.39249) |
4 | 1032 JX Amsterdam (Volewijck) | https://www.pararius.com//apartment-for-rent/a... | 1400 | 2021-07-02T00:00:00 | For rent | 2021-07-04T00:00:00 | None | None | None | Furnished | ... | None | None | None | None | None | None | None | None | None | POINT (4.91364 52.39249) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
745 | 1066 RC Amsterdam (Sloter-/Riekerpolder) | https://www.pararius.com//apartment-for-rent/a... | 1350 | 2021-04-04T00:00:00 | Under offer | 2021-07-04T00:00:00 | None | €2,700 | None | Furnished | ... | C | None | 75 m² | None | Fully insulated | None | None | None | None | POINT (4.81001 52.34535) |
746 | 1079 PE Amsterdam (Scheldebuurt) | https://www.pararius.com//apartment-for-rent/a... | 1260 | 2021-04-04T00:00:00 | Under offer | 2021-07-04T00:00:00 | €80 | €1,430 | None | None | ... | A | None | None | None | None | None | None | None | None | POINT (4.89385 52.33880) |
747 | 1052 KH Amsterdam (Frederik Hendrikbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1495 | 2021-04-04T00:00:00 | Under offer | None | None | €3,000 | None | Furnished | ... | G | None | None | None | None | None | None | None | None | POINT (4.87365 52.37412) |
748 | 1017 VT Amsterdam (De Weteringschans) | https://www.pararius.com//apartment-for-rent/a... | 1400 | 2021-04-04T00:00:00 | Under offer | 2021-07-04T00:00:00 | None | None | None | Upholstered | ... | None | None | None | None | Partial double glazing | None | None | None | None | POINT (4.89911 52.36132) |
749 | 1076 CW Amsterdam (Stadionbuurt) | https://www.pararius.com//house-for-rent/amste... | 1400 | 2021-04-04T00:00:00 | Under offer | None | None | €3,300 | None | Upholstered | ... | None | None | None | None | Double glazing, Fully insulated | None | None | None | None | POINT (4.84801 52.34117) |
750 rows × 44 columns
Here is a handy website to draw a polygon and get its WKT.
from IPython.display import IFrame
IFrame(src="https://arthur-e.github.io/Wicket/sandbox-gmaps3.html", width='100%', height='600px')
import shapely.wkt
#Define polygon to filter our ads
filter_polygon_wkt = 'POLYGON((4.9580351217396235 52.349463547269636,4.950825343907592 52.340759453326854,4.91941131192517 52.33048012411839,4.905335079015014 52.331319342591414,4.884049068272827 52.3417033535761,4.8701444967396235 52.34096921067909,4.85627425728536 52.35397222330489,4.85352767525411 52.35806108990049,4.853012691123251 52.36193992133192,4.874985347373251 52.373259968432215,4.879963527304891 52.38441043375573,4.893696437461141 52.383781777115416,4.924252162558798 52.37634199381422,4.937985072715048 52.37424605391656,4.9580351217396235 52.349463547269636))'
filter_polygon = shapely.wkt.loads(filter_polygon_wkt)
Apply the geographic filter to the ads using the clip function of geopandas
#Keep only the ads located inside the filter polygon
filter_gdf = gpd.clip(all_ad_gdf,filter_polygon)
filter_gdf
adress | link | Rental price | Offered since | Status | Available | Service costs | Deposit | Interior | Living area | ... | Plot area | Insulation | Garden description | Rental agreement | Duration | Central heating boiler | Description | Income requirement | Share a house | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1073 XW Amsterdam (Oude Pijp) | https://www.pararius.com//apartment-for-rent/a... | 1395 | 2021-06-20 | For rent | 2021-08-01 | €45 | €2,000 | Furnished | 45 | ... | None | None | None | None | None | None | None | None | None | POINT (4.89424 52.35708) |
1 | 1012 SJ Amsterdam (Burgwallen-Nieuwe Zijde) | https://www.pararius.com//apartment-for-rent/a... | 1450 | 2021-05-02 | For rent | 2021-07-02 | €45 | €2,250 | Furnished | 70 | ... | None | None | None | None | None | None | None | None | None | POINT (4.89123 52.37470) |
2 | 1017 XB Amsterdam (De Weteringschans) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-07-02 | For rent | NaT | None | €2,700 | Upholstered or furnished | 45 | ... | None | None | None | None | None | None | None | None | None | POINT (4.89369 52.36006) |
3 | 1016 GX Amsterdam (Grachtengordel-West) | https://www.pararius.com//apartment-for-rent/a... | 1495 | 2021-07-02 | For rent | NaT | None | €2,990 | Upholstered | 65 | ... | None | None | None | None | None | None | None | None | None | POINT (4.88338 52.37313) |
5 | 1016 PZ Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 1200 | 2021-07-02 | For rent | 2021-08-01 | None | None | Furnished | 35 | ... | None | None | None | None | None | None | None | None | None | POINT (4.88108 52.37188) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
739 | 1016 LD Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 1350 | 2021-05-02 | Under offer | 2021-07-02 | None | None | Upholstered or furnished | 31 | ... | None | None | None | None | None | None | None | None | None | POINT (4.88125 52.37420) |
741 | 1016 LP Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-05-02 | Under offer | 2021-07-02 | €25 | €3,000 | Upholstered | 55 | ... | 55 m² | Partial double glazing | None | None | None | None | None | None | None | POINT (4.88219 52.37361) |
742 | 1079 PE Amsterdam (Scheldebuurt) | https://www.pararius.com//apartment-for-rent/a... | 1260 | 2021-04-02 | Under offer | 2021-07-02 | €80 | €1,430 | None | 55 | ... | None | None | None | None | None | None | None | None | None | POINT (4.89385 52.33880) |
743 | 1011 NB Amsterdam (Nieuwmarkt/Lastage) | https://www.pararius.com//apartment-for-rent/a... | 1350 | 2021-04-02 | Under option | 2021-07-02 | None | None | None | 46 | ... | None | None | None | None | None | None | Inpandig | None | None | POINT (4.90602 52.36946) |
745 | 1017 VT Amsterdam (De Weteringschans) | https://www.pararius.com//apartment-for-rent/a... | 1400 | 2021-04-02 | Under offer | 2021-07-02 | None | None | Upholstered | 35 | ... | None | Partial double glazing | None | None | None | None | None | None | None | POINT (4.89911 52.36132) |
328 rows × 44 columns
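For point geometries, clipping amounts to a point-in-polygon test on every ad. geopandas delegates this to shapely; the sketch below only illustrates the idea with a plain-Python ray casting test. The function point_in_polygon and the rectangular ring are assumptions made for the example (the rectangle is not the polygon drawn above), while the two coordinates come from the tables shown earlier.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: cast a horizontal ray from (x, y) and count the edges it crosses.

    ring is a list of (x, y) vertices; an odd number of crossings means inside.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge straddle the horizontal line passing through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up rectangle (longitude, latitude) roughly covering central Amsterdam
ring = [(4.85, 52.33), (4.96, 52.33), (4.96, 52.39), (4.85, 52.39)]

ads = {
    "Oude Pijp": (4.89424, 52.35708),
    "Sloter-/Riekerpolder": (4.81001, 52.34535),
}

kept = [name for name, (lon, lat) in ads.items() if point_in_polygon(lon, lat, ring)]
print(kept)
```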
Compute price per m2¶
The price per m2 is a good KPI that I want to add to our data.
filter_gdf['price/m2'] = filter_gdf['Rental price']/filter_gdf['Living area']
filter_gdf.sort_values('price/m2',inplace=True)
Compute walking distances from addresses to work place and train station¶
Calculate the walking distance from each accommodation to the work place and the train station using a networkx multidigraph, the OSMnx package and KDTree.
The function route_distance takes two shapely POINTs and a networkx multidigraph, computes the shortest route between them and returns the length of this route.
import osmnx as ox
import networkx as nx
def route_distance(point,point2,graph):
    #Retrieve the graph nodes nearest to the two points of interest
    source = ox.distance.nearest_nodes(graph, point.x,point.y)
    target = ox.distance.nearest_nodes(graph, point2.x,point2.y)
    #Compute the length of the shortest path between these two nodes
    try:
        route_length = nx.shortest_path_length(graph, source=source, target=target,method='dijkstra',weight='length')
        return route_length
    except nx.NetworkXNoPath:
        #No route connects the two nodes
        return None
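To show what happens under the hood, here is a small standard-library sketch of the two steps: snapping a point to its nearest node, then running Dijkstra with edge lengths as weights. It is not how OSMnx or networkx are implemented. The toy graph, the function names and the straight-line (Euclidean) nearest-node search are assumptions made for illustration.

```python
import heapq
import math


def nearest_node(nodes, x, y):
    """Return the id of the node closest to (x, y) by straight-line distance."""
    return min(nodes, key=lambda n: math.hypot(nodes[n][0] - x, nodes[n][1] - y))


def shortest_path_length(edges, source, target):
    """Dijkstra on a dict {node: [(neighbour, length), ...]}; None if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # outdated queue entry, a shorter route was already found
        for neighbour, length in edges.get(node, []):
            new_dist = dist + length
            if new_dist < best.get(neighbour, math.inf):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return None


# Toy street network: node coordinates and directed edges with lengths in meters
nodes = {"A": (0, 0), "B": (100, 0), "C": (100, 100), "D": (0, 100), "E": (500, 500)}
edges = {
    "A": [("B", 100.0), ("D", 100.0)],
    "B": [("C", 100.0)],
    "D": [("C", 150.0)],
    "C": [],
    "E": [],
}

source = nearest_node(nodes, 5, -3)     # a home close to node A
target = nearest_node(nodes, 95, 104)   # a work place close to node C
length = shortest_path_length(edges, source, target)
print(source, target, length)
```

Returning None for unreachable nodes mirrors the NetworkXNoPath handling in route_distance.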
Set fixed parameters for the route_distance function
#Create multidigraph from the area of interest using OSMnx
G = ox.graph_from_polygon(filter_polygon,network_type='all_private')
#Create shapely points of the work place and train station coordinates
work = Point(4.913039, 52.342523)
station = Point(4.918278, 52.346691)
Create the new features 'work_distance' and 'station_distance', the walking distances (in meters) from each accommodation to the work place and to the train station.
filter_gdf['work_distance'] = filter_gdf['geometry'].apply(lambda x: route_distance(x,work,G))
filter_gdf['station_distance'] = filter_gdf['geometry'].apply(lambda x: route_distance(x,station,G))
filter_gdf
adress | link | Rental price | Offered since | Status | Available | Service costs | Deposit | Interior | Living area | ... | Rental agreement | Duration | Central heating boiler | Description | Income requirement | Share a house | geometry | price/m2 | work_distance | station_distance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
73 | 1011 TM Amsterdam (Nieuwmarkt/Lastage) | https://www.pararius.com//room-for-rent/amster... | 713 | 2021-06-29 | For rent | 2021-07-02 | None | €713 | Upholstered | 107 | ... | Unlimited period | None | None | None | None | None | POINT (4.90989 52.37130) | 6.663551 | 3836.563 | 3271.782 |
174 | 1019 AB Amsterdam (Oostelijk Havengebied) | https://www.pararius.com//room-for-rent/amster... | 670 | 2021-06-22 | For rent | 2021-07-12 | None | None | Upholstered | 70 | ... | Unlimited period | Minimum of 12 months | None | None | None | None | POINT (4.93919 52.36710) | 9.571429 | 4052.944 | 3092.726 |
363 | 1016 TL Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 968 | 2021-06-04 | Rented under option | 2021-07-02 | €8 | €1,966 | Upholstered | 71 | ... | None | None | None | None | None | None | POINT (4.87609 52.37125) | 13.633803 | 4794.499 | 4757.363 |
303 | 1019 HM Amsterdam (Oostelijk Havengebied) | https://www.pararius.com//apartment-for-rent/a... | 1465 | 2021-06-11 | For rent | 2021-07-02 | €110 | None | None | 100 | ... | None | None | None | Parkeerplaats | None | None | POINT (4.92602 52.37601) | 14.650000 | 5059.115 | 4280.861 |
301 | 1079 RN Amsterdam (Scheldebuurt) | https://www.pararius.com//apartment-for-rent/a... | 1415 | 2021-06-11 | For rent | 2021-07-02 | €54 | None | None | 96 | ... | None | None | None | Parkeerkelder, parkeerplaats | None | None | POINT (4.90088 52.33892) | 14.739583 | 1059.617 | 2074.582 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
130 | 1018 CN Amsterdam (Weesperbuurt/Plantage) | https://www.pararius.com//apartment-for-rent/a... | 1150 | 2021-06-25 | For rent | NaT | None | None | Upholstered or furnished | 20 | ... | None | None | None | None | None | None | POINT (4.91275 52.36811) | 57.500000 | 3765.331 | 3200.550 |
658 | 1017 EG Amsterdam (Grachtengordel-Zuid) | https://www.pararius.com//apartment-for-rent/a... | 1495 | 2021-04-02 | For rent | 2021-07-02 | None | None | Upholstered | 25 | ... | None | None | None | None | None | None | POINT (4.88473 52.36655) | 59.800000 | 3901.483 | 3864.347 |
735 | 1071 AZ Amsterdam (Museumkwartier) | https://www.pararius.com//apartment-for-rent/a... | 1350 | 2021-05-02 | Under offer | 2021-07-02 | None | €2,700 | Upholstered or furnished | 20 | ... | None | None | None | None | None | None | POINT (4.87806 52.35844) | 67.500000 | 3801.044 | 3878.138 |
682 | 1054 GA Amsterdam (Vondelbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1431 | 2021-04-02 | For rent | NaT | None | None | Furnished | 20 | ... | None | None | None | None | None | None | POINT (4.87754 52.36090) | 71.550000 | 4003.594 | 3981.497 |
684 | 1017 SP Amsterdam (De Weteringschans) | https://www.pararius.com//room-for-rent/amster... | 899 | 2021-04-02 | For rent | 2021-07-02 | None | None | Furnished | 8 | ... | Temporary rental | Minimum of 3, maximum of 24 months | None | None | None | Suitable to share | POINT (4.88860 52.36115) | 112.375000 | 3611.508 | 3574.372 |
328 rows × 47 columns
A note on that part: OSMnx is a wonderful library that combines the networkx library with OpenStreetMap tools such as geopy, overpassAPI, etc. You can find this excellent project here: https://github.com/gboeing/osmnx
In this project, I use it to compute route distances, but other (maybe better) tools exist to do so. The best I have had the chance to use is the Open Source Routing Machine (http://project-osrm.org/), which supports car, bicycle and walking modes and takes more constraints into account. Written in C++, the tool is very performant. However, it is more complex to set up and requires the use of multiple docker containers.
Scoring¶
Sorting ads by preferences using MinMaxScaler and weighted average
from sklearn.preprocessing import MinMaxScaler
#Selecting the features for scoring
columns_to_scale = ['Rental price','price/m2','work_distance','station_distance']
#Applying a MinMaxScaler so all the features are scaled between 0 and 1
feature_scaled = MinMaxScaler().fit_transform(np.array(filter_gdf[columns_to_scale]))
#Setting the 'tradeoff' feature as a weighted average of our features. You can play with
#the feature importance by changing the weights.
filter_gdf['tradeoff'] = np.average(feature_scaled,axis=1, weights=[2,1,1,2])
#Let's sort the values by best to worst according to the tradeoff
filter_gdf.sort_values(by='tradeoff',inplace=True)
filter_gdf
adress | link | Rental price | Offered since | Status | Available | Service costs | Deposit | Interior | Living area | ... | Duration | Central heating boiler | Description | Income requirement | Share a house | geometry | price/m2 | work_distance | station_distance | tradeoff | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
708 | 1096 AH Amsterdam (Omval/Overamstel) | https://www.pararius.com//apartment-for-rent/a... | 340 | 2021-04-02 | For rent | 2021-07-02 | None | None | Furnished | 19 | ... | None | None | None | None | None | POINT (4.92909 52.33777) | 17.894737 | 2536.450 | 1707.878 | 0.164633 |
503 | 1096 AH Amsterdam (Omval/Overamstel) | https://www.pararius.com//apartment-for-rent/a... | 460 | 2021-05-02 | For rent | 2021-07-02 | None | None | Furnished | 25 | ... | None | None | None | None | None | POINT (4.92909 52.33777) | 18.400000 | 2536.450 | 1707.878 | 0.199913 |
422 | 1078 TT Amsterdam (IJselbuurt) | https://www.pararius.com//room-for-rent/amster... | 750 | 2021-05-21 | For rent | 2021-07-02 | None | None | Furnished | 14 | ... | Minimum of 6 months | None | None | None | None | POINT (4.90880 52.34862) | 53.571429 | 980.449 | 948.449 | 0.251561 |
133 | 1096 GJ Amsterdam (Omval/Overamstel) | https://www.pararius.com//apartment-for-rent/a... | 1050 | 2021-06-25 | For rent | 2021-07-02 | None | €2,100 | Upholstered | 28 | ... | None | None | None | None | None | POINT (4.92063 52.33893) | 37.500000 | 1611.074 | 993.695 | 0.333156 |
86 | 1097 HS Amsterdam (Frankendael) | https://www.pararius.com//apartment-for-rent/a... | 1405 | 2021-06-29 | For rent | 2021-07-02 | €42 | None | None | 91 | ... | None | None | Inpandig | None | None | POINT (4.92139 52.34733) | 15.439560 | 1401.621 | 199.740 | 0.350221 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
284 | 1015 VH Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-06-11 | For rent | NaT | None | None | None | 70 | ... | None | None | None | None | None | POINT (4.88117 52.38140) | 21.428571 | 5958.417 | 5802.417 | 0.829884 |
527 | 1053 GJ Amsterdam (Van Lennepbuurt) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-05-02 | For rent | 2021-07-02 | €10 | €1,500 | Upholstered | 40 | ... | None | None | None | None | None | POINT (4.86068 52.36454) | 37.500000 | 5553.286 | 5573.808 | 0.830819 |
463 | 1015 LB Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-05-14 | For rent | 2021-07-02 | None | None | Furnished | 50 | ... | None | None | None | None | None | POINT (4.88386 52.37897) | 30.000000 | 5791.598 | 5734.961 | 0.834832 |
368 | 1015 NT Amsterdam (Jordaan) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-06-04 | For rent | 2021-07-02 | None | €3,200 | Upholstered | 43 | ... | None | None | None | None | None | POINT (4.88138 52.37776) | 34.883721 | 5746.155 | 5699.148 | 0.839229 |
525 | 1058 XB Amsterdam (Westindische Buurt) | https://www.pararius.com//apartment-for-rent/a... | 1500 | 2021-05-02 | Rented under option | 2021-07-02 | None | None | Upholstered or furnished | 51 | ... | Minimum of 6, maximum of 24 months | None | None | None | None | POINT (4.85567 52.36140) | 29.411765 | 5962.812 | 6039.906 | 0.855794 |
328 rows × 48 columns
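As a worked example of the scoring, the sketch below reproduces the same arithmetic in plain Python: each column is rescaled to [0, 1] with (v - min) / (max - min), then the tradeoff is the weighted average of the scaled values, so a lower score is better. The three ads and their numbers are made up; only the column order and the weights [2, 1, 1, 2] come from the code above.

```python
def min_max_scale(values):
    """Rescale a column to [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tradeoff_scores(rows, weights):
    """Weighted average of the min-max scaled features of each row."""
    scaled_columns = [min_max_scale(column) for column in zip(*rows)]
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, scaled_row)) / total
        for scaled_row in zip(*scaled_columns)
    ]


# Made-up ads: [rental price, price/m2, work_distance, station_distance]
names = ["A", "B", "C"]
rows = [
    [1000, 20, 1000, 500],
    [1500, 30, 3000, 2500],
    [1250, 40, 2000, 1500],
]

scores = tradeoff_scores(rows, weights=[2, 1, 1, 2])
ranking = [name for _, name in sorted(zip(scores, names))]
print(ranking)
```

Ad A is the cheapest and the closest on every criterion, so all its scaled values are 0 and it ranks first. Ad B scores (2*1 + 0.5 + 1 + 2*1) / 6, the worst of the three.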
#Print the links of the top 30 ads, ordered from best to worst tradeoff score
filter_gdf.sort_values(by='tradeoff')['link'].values[:30]
array(['https://www.pararius.com//apartment-for-rent/amsterdam/e19c8118/duivendrechtsekade', 'https://www.pararius.com//apartment-for-rent/amsterdam/2d9100e7/duivendrechtsekade', 'https://www.pararius.com//room-for-rent/amsterdam/58ad4c30/holendrechtstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/8bf08c86/welnastraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/d4f0f5d4/maliebaan', 'https://www.pararius.com//apartment-for-rent/amsterdam/3da19a99/welnastraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/5feb746d/welnastraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/dea85afc/bart-de-ligtstraat', 'https://www.pararius.com//room-for-rent/amsterdam/873d5bfb/zeeburgerpad', 'https://www.pararius.com//apartment-for-rent/amsterdam/0a443600/clara-meyer-wichmannstraat', 'https://www.pararius.com//room-for-rent/amsterdam/24a643e9/foeliestraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/d04ab4bb/peelstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/c2f7a2f4/peelstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/ae259cb0/clara-meyer-wichmannstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/9da5fcae/hofmeyrstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/3565a788/kromme-mijdrechtstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/4f716bac/uithoornstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/be5c98ea/waverstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/4bace1a3/holendrechtstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/4cc54a17/kromme-mijdrechtstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/17f3385a/vrijheidslaan', 'https://www.pararius.com//apartment-for-rent/amsterdam/5beba351/hofmeyrstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/fdb80687/boterdiepstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/1430f46e/holendrechtstraat', 
'https://www.pararius.com//apartment-for-rent/amsterdam/43117190/berkelstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/7237002c/reitzstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/044e7c65/welnastraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/586ce50c/uiterwaardenstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/64cc5a60/eemsstraat', 'https://www.pararius.com//apartment-for-rent/amsterdam/b9e388a2/reggestraat'], dtype=object)
Some fancy plot¶
from shapely.geometry import Polygon, MultiPoint
import pyproj
import folium
import seaborn as sns
centre = MultiPoint(filter_gdf['geometry'].values).centroid
colors = sns.color_palette("flare",n_colors=len(filter_gdf)).as_hex()
filter_gdf.reset_index(drop=True,inplace=True)
my_map = folium.Map(location=[centre.y, centre.x],width='100%', height='100%', tiles="cartodbpositron",zoom_start=15)
for index, row in filter_gdf.iterrows():
    pt = row['geometry']
    info = "<a href='{}' target='_blank'>link</a>".format(row['link'])
    folium.CircleMarker(location=[pt.y, pt.x],
                        color=colors[index],
                        fill=True,
                        radius=5,
                        popup=info).add_to(my_map)
display(my_map)
Conclusion¶
Done! I made my house search easier. Will it help me get the best offers? Mmmmh, not quite sure, but it was a lot of fun to build.
I think this could benefit from more KPIs and from setting the feature weights in a more sophisticated way.
Another addition could be to set up a service that would email me the newest ads that fit my criteria.
Overall, this project showcases that scraping the internet is a valuable way to gather data. This is especially true for social media such as Twitter. However, the practice is still controversial, as it can be used by bots for some nasty business and can weigh heavily on website infrastructure. So please, use it reasonably ;-)