A few years back I had to deliver this task for a European project (the THERMOSS project). THERMOSS was an H2020 project that aimed at facilitating the adoption of District Heating and Cooling.
In this project, the first stage was to create geoclusters for a district heating and cooling catalogue; each geocluster groups areas with similar characteristics.
At the time I did not know Python very well and used MATLAB for the task. While the results were satisfying back then, I now want to reproduce them using Python and some well-known GIS packages.
So let's get to it...
1. The data
The table below gives the list of features that have been used to create the geoclusters. Those are believed to be the most influential parameters for heating and cooling loads.
Note: the environmental conditions are taken from a NetCDF dataset (see section 2)
Parameter | Source |
---|---|
Environmental conditions | |
Cold season mean temperature | |
Hot season mean temperature | |
Building typology | |
Residential building energy use | |
Non-residential building energy use | |
Single family unit U-value | http://webtool.building-typology.eu/#bm |
Multiple family unit U-value | http://webtool.building-typology.eu/#bm |
Single family unit floor area | http://webtool.building-typology.eu/#bm |
Multiple family unit floor area | http://webtool.building-typology.eu/#bm |
Building stock | |
Residential building stock | |
Single family stock | |
Economic considerations | |
Energy use | |
Gas prices for domestic consumers | |
Gas prices for industrial consumers | |
Electricity prices for domestic consumers | |
Electricity prices for industrial consumers | |
Share of renewable energy in gross final energy consumption | |
For the building typology and stock, some manual work was needed in Excel to produce a neat table with only the required information.
Thankfully, Eurostat provides some nice tables, ready to use. See the example below for the renewable energy share.
import requests
import pandas as pd
import io
url = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/T2020_RD330/?format=CSV"
urlData = requests.get(url).content
renew_df = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
renew_df.iloc[:,0] = renew_df.iloc[:,0].apply(lambda x: x.split(';')[-1]) #the first column bundles freq;nrg_bal;unit;geo, keep only the country code
renew_df.rename(columns={'freq;nrg_bal;unit;geoTIME_PERIOD': 'CODE'},inplace=True) #give that column a shorter name
renew_df
CODE | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | AL | 29.621 | 31.367 | 32.070 | 32.657 | 32.448 | 31.437 | 31.867 | 31.187 | 35.152 | 33.167 | 31.856 | 34.896 | 36.939 | 35.898 | 36.844 | 36.667 |
1 | AT | 22.554 | 24.355 | 26.277 | 28.145 | 28.790 | 31.041 | 31.207 | 31.553 | 32.736 | 32.666 | 33.553 | 33.502 | 33.374 | 33.141 | 33.806 | 33.626 |
2 | BA | 20.274 | 19.752 | 19.213 | 18.701 | 16.708 | 18.471 | 18.710 | 17.995 | 18.014 | 19.307 | 24.873 | 26.607 | 25.358 | 23.241 | 35.972 | 37.578 |
3 | BE | 1.890 | 2.332 | 2.633 | 3.101 | 3.590 | 4.715 | 6.002 | 6.275 | 7.089 | 7.650 | 8.043 | 8.026 | 8.752 | 9.113 | 9.478 | 9.924 |
4 | BG | 9.231 | 9.173 | 9.415 | 9.098 | 10.345 | 12.005 | 13.928 | 14.152 | 15.837 | 18.898 | 18.050 | 18.261 | 18.760 | 18.701 | 20.592 | 21.564 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
34 | SE | 38.677 | 40.265 | 42.040 | 43.551 | 44.288 | 47.476 | 46.595 | 48.135 | 50.027 | 50.792 | 51.817 | 52.947 | 53.328 | 54.157 | 54.651 | 56.391 |
35 | SI | 18.397 | 19.809 | 18.416 | 19.674 | 18.645 | 20.764 | 21.080 | 20.936 | 21.549 | 23.161 | 22.461 | 22.880 | 21.977 | 21.658 | 21.378 | 21.974 |
36 | SK | 6.391 | 6.360 | 6.584 | 7.766 | 7.723 | 9.368 | 9.099 | 10.348 | 10.453 | 10.133 | 11.713 | 12.883 | 12.029 | 11.465 | 11.896 | 16.894 |
37 | UK | 1.096 | 1.281 | 1.488 | 1.735 | 2.814 | 3.448 | 3.862 | 4.392 | 4.461 | 5.524 | 6.737 | 8.385 | 9.032 | 9.858 | 11.138 | 12.336 |
38 | XK | 20.541 | 19.773 | 19.512 | 18.812 | 18.429 | 18.230 | 18.230 | 17.598 | 18.625 | 18.823 | 19.544 | 18.484 | 24.472 | 23.082 | 24.616 | 25.686 |
39 rows × 17 columns
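Since the merged table further down keeps a single value per country, one option is to retain only the most recent year from this table. A small sketch, assuming the year columns come out of the CSV as plain strings and that the last column is the latest year (the renew_share name is only illustrative):
latest = renew_df.columns[-1] #assumed to be the most recent year
renew_share = renew_df.set_index('CODE')[[latest]].rename(columns={latest: 'renew share in %'})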
In addition to that, we will also be using an EU countries GeoJSON file found here, with all the country codes set to ISO alpha-2.
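Loading that GeoJSON with geopandas and indexing it by country code could look like this (a sketch; the file name and the name of the country-code column are assumptions):
import geopandas as gpd
eu_df = gpd.read_file("european-union-countries.geojson") #assumed file name
eu_df = eu_df.set_index("iso_a2").to_crs(epsg=3857) #'iso_a2' is the assumed country-code column; reproject to match the other layers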
import geopandas as gpd
import numpy as np
all_data = pd.concat([renew_df,energy_df,gas_df,elc_df,typo_df,stock_df,eu_df[['geometry']]],join='inner',axis=1).replace(': ',np.nan) #eurostat marks missing values with ':'
all_data_gdf = gpd.GeoDataFrame(all_data)
all_data_gdf
renew share in % | energy cons | residential gas price in euro | indus gas price in euro | residential elec price in euro | indus elec price in euro | single family U value | multiple family U value | single family area | mutiple family area | Residential share in % | Non-residential share in % | single family share in % | mutiple family share in % | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AT | 33.626 | 6695.287 | 18.3400 | 7.3744 | 0.1950 | 0.0621 | 0.978513 | 0.882793 | 145 | 418.0 | 61.6 | 38.4 | 47.5 | 52.5 | POLYGON ((1687803.014 6264215.777, 1696293.843... |
BE | 9.924 | 7899.465 | 15.3755 | 6.2508 | 0.2857 | 0.0816 | 1.019217 | 1.121330 | 220 | 1613.0 | 67.5 | 32.5 | 72.9 | 27.1 | POLYGON ((536053.133 6697902.835, 536858.496 6... |
BG | 21.564 | 2159.861 | 12.4631 | 8.2636 | 0.0955 | 0.0753 | 1.182875 | 0.989048 | 172 | 495.0 | 72.1 | 27.9 | 54.9 | 45.1 | POLYGON ((2551393.950 5439823.819, 2575025.606... |
CZ | 16.244 | 7006.728 | 16.2682 | 7.7802 | 0.1438 | 0.0677 | 0.813539 | 0.967243 | 112 | 681.0 | 64.9 | 35.1 | 44.3 | 55.7 | POLYGON ((1602756.662 6623613.894, 1605874.568... |
DE | 17.354 | 57743.118 | 17.5503 | 7.7126 | 0.3048 | 0.0761 | 0.722681 | 0.861979 | 173 | 1239.0 | 68.4 | 31.6 | 45.2 | 54.8 | MULTIPOLYGON (((750538.061 7090702.816, 751353... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
IT | 18.181 | 31612.006 | 21.3645 | 8.2732 | 0.2132 | 0.0829 | 1.295267 | 1.109430 | 154 | 884.0 | 89.0 | 11.0 | 26.0 | 74.0 | MULTIPOLYGON (((1404993.029 4230990.607, 14038... |
NL | 8.768 | 9307.869 | 25.5701 | 6.1856 | 0.1562 | 0.0607 | 1.075242 | 1.544523 | 163 | 2618.0 | 60.7 | 39.3 | 70.8 | 29.2 | MULTIPOLYGON (((-7596140.830 1348758.669, -759... |
PL | 12.164 | 18196.434 | 13.1310 | 9.3669 | 0.1446 | 0.0673 | 0.699272 | 0.729197 | 136 | 2186.0 | 67.0 | 33.0 | 33.1 | 66.9 | POLYGON ((2115307.046 7235751.798, 2157060.914... |
SE | 56.391 | 7363.914 | 32.8557 | 8.7466 | 0.1936 | 0.0643 | 0.255006 | NaN | 121 | 1207.0 | 66.6 | 33.4 | 44.1 | 55.9 | MULTIPOLYGON (((1744966.813 7582362.064, 17458... |
SI | 21.974 | 1057.443 | 15.8903 | 8.5024 | 0.1609 | 0.0619 | 0.788727 | 1.110169 | 201 | 1362.0 | 81.7 | 18.3 | 60.7 | 39.3 | POLYGON ((1819341.831 5895544.825, 1820883.526... |
15 rows × 15 columns
2. Handling NetCDF
For the NetCDF data, things are a bit more technical. NetCDF files are multidimensional arrays that let us store both the temporal and the spatial dimensions of a phenomenon; they can also be seen as temporally related raster layers. For that reason, we are going to use Python libraries such as xarray, rasterio and rioxarray.
First let's use xarray to load our data.
import xarray as xr
import rioxarray
import rasterio
with xr.open_dataset('tg_0.50deg_reg_1995-2016_v14.0.nc') as file:
dsCDF = file.rio.write_crs("epsg:4326", inplace=True)
dsCDF
<xarray.Dataset>
Dimensions:      (latitude: 101, longitude: 232, time: 7914)
Coordinates:
  * longitude    (longitude) float32 -40.25 -39.75 -39.25 ... 74.25 74.75 75.25
  * latitude     (latitude) float32 25.25 25.75 26.25 ... 74.25 74.75 75.25
  * time         (time) datetime64[ns] 1995-01-01 1995-01-02 ... 2016-08-31
    spatial_ref  int32 0
Data variables:
    tg           (time, latitude, longitude) float32 ...
Attributes:
    Ensembles_ECAD:  14.0
    Conventions:     CF-1.4
    References:      http://www.ecad.eu http://www.ecad.eu/download/ensemble...
    history:         Mon Oct 17 11:28:47 2016: ncks -a -d time,16436,24349 tg...
    NCO:             "4.5.3"
    grid_mapping:    spatial_ref
From there, we will create a typical year by grouping the data by day of year and taking the median for each lon/lat cell.
#Create typical year from the 20 years'set with daily data point (grid X 366 days)
typical_year = dsCDF.groupby('time.dayofyear').median('time')
We are now going to split our typical year into two seasons, cold and hot, so that for each cell we have the median temperature over each season.
#Split typical year into seasonal bins (grid X 2 seasons)
day_bins = [0,106,289,366] #Limits of the 2 seasons (day number over the year)
typical_Seasonal_year = typical_year.groupby_bins('dayofyear',day_bins,labels=['cold1','hot','cold2']).median('dayofyear') #Split the year into 3 bins
Cold = typical_Seasonal_year.sel(dayofyear_bins = ['cold2','cold1']).mean('dayofyear_bins') #Average of the 2 "Cold" season bins
Hot = typical_Seasonal_year.sel(dayofyear_bins = ['hot'])
<xarray.Dataset>
Dimensions:      (latitude: 101, longitude: 232)
Coordinates:
  * longitude    (longitude) float32 -40.25 -39.75 -39.25 ... 74.25 74.75 75.25
  * latitude     (latitude) float32 25.25 25.75 26.25 ... 74.25 74.75 75.25
    spatial_ref  int32 0
Data variables:
    tg           (latitude, longitude) float32 nan nan nan nan ... nan nan nan
Also, the area covered by the NetCDF is larger than what we actually need. Let's clip the Hot and Cold rasters to the area of interest.
#Note that this could have been done before defining Hot and Cold ¯\_(ツ)_/¯
Hot.rio.write_crs("epsg:4326", inplace=True) #set the CRS on our raster
Hot = Hot.rio.reproject(dst_crs='EPSG:3857') #reproject because the GeoJSON is in EPSG:3857
Hot = Hot.rio.clip(eu) #clip to the EU outline
Hot = Hot.where(Hot['tg'] != -9999.) #keep only the necessary values (drop the nodata fill)
Cold.rio.write_crs("epsg:4326", inplace=True)
... #idem
3. Binning
OK, so we have 16 features that we must transform into categorical features. For that, we split each of them into quantile-based bins. With 4 bins per feature, there are 4^16 = 4,294,967,296 possible geoclusters. More than enough! In reality there will be far fewer: tests have shown that 4 bins per feature results in 116 geoclusters, which is sufficient for now.
hot_bins = Hot.quantile([0,0.25,0.5,0.75,1]).tg.values #quartile edges of the hot-season temperatures
Hot_discr = np.digitize(Hot.tg.values, hot_bins) #assign each cell to a quartile bin
Hot_discr = Hot_discr.astype(np.int32) #rasterio's shapes() expects a supported integer dtype
#idem for Cold raster
Now let's vectorize those rasters, i.e. turn groups of cells sharing a bin into polygons.
from rasterio.features import shapes
west, east = min(Hot.x.values), max(Hot.x.values) #get corners coordinates
south, north = min(Hot.y.values), max(Hot.y.values) #get corners coordinates
width = len(Hot.x.values)
height = len(Hot.y.values)
res_x = np.diff(Hot.x.values).mean() #get the x resolution
res_y = np.diff(Hot.y.values).mean() #get the y resolution
transform = rasterio.transform.from_origin(west, north, res_x, -res_y) #Return an Affine transformation for a georeferenced raster
data = []
mask = Hot_discr != 5 #bin 5 catches the NaN (no data) cells, mask them out
for shp, val in shapes(Hot_discr, mask=mask, transform=transform):
    record = {'properties':{'bin': val}, 'geometry': shp}
    data.append(record)
#let's make a geodataframe from it
Hot_df = gpd.GeoDataFrame.from_features(data,crs=3857)
Hot_df = Hot_df.dissolve(by='bin')
Hot_temp in C | |
---|---|
bin | |
1.0 | MULTIPOLYGON (((-727543.440 7693137.472, -7242... |
2.0 | MULTIPOLYGON (((-352714.079 7639933.464, -3604... |
3.0 | MULTIPOLYGON (((-986903.059 5272566.153, -9883... |
4.0 | MULTIPOLYGON (((-1010018.459 4695661.164, -101... |
Let's also fit the data to the countries' outlines.
for index, row in Hot_df.iterrows(): #for each bin
    temp_poly = row.geometry.buffer(100*1000) #buffer the current geometry by 100 km
    temp_poly = temp_poly.intersection(eu) #intersect it with the EU geometry (outline)
    other_polys = Hot_df.loc[Hot_df.index != index].geometry.unary_union #union of all the other polygons
    to_remove = temp_poly.intersection(other_polys) #intersection to retrieve what has to be removed (because of the previous buffer)
    temp_poly = temp_poly.difference(to_remove) #remove the extra bits
    Hot_df.loc[index,['geometry']] = temp_poly #overwrite the geometry in the dataframe
Hot_df | Cold_df |
---|---|
*(map of the binned hot-season temperatures)* | *(map of the binned cold-season temperatures)* |
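These maps can be reproduced with geopandas' built-in plotting. A minimal sketch, where the colormap and figure size are my own choices:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
Hot_df.reset_index().plot(column='bin', cmap='coolwarm', legend=True, ax=axes[0]) #one colour per quartile bin
Cold_df.reset_index().plot(column='bin', cmap='coolwarm', legend=True, ax=axes[1])
axes[0].set_title('Hot season bins')
axes[1].set_title('Cold season bins')
plt.show()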
We binned our rasters. Let's bin our other variables.
num_cols = all_data.columns.drop('geometry') #discretize the numeric columns only
all_data_dis = all_data.copy()
all_data_dis[num_cols] = all_data_dis[num_cols].apply(lambda col: pd.qcut(col, 4, labels=False, duplicates='drop')) + 1 #qcut is a quantile-based discretization function; +1 so the bins range from 1 to 4 (and not 0 to 3)
renew share in % | energy cons | residential gas price in euro | indus gas price in euro | residential elec price in euro | indus elec price in euro | single family U value | multiple family U value | single family area | mutiple family area | Residential share in % | Non-residential share in % | single family share in % | mutiple family share in % | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AT | 4 | 2 | 2 | 1 | 3 | 1 | 2 | 2.0 | 2 | 1.0 | 1 | 4 | 2 | 3 | POLYGON ((1687803.014 6264215.777, 1696293.843... |
BE | 1 | 3 | 1 | 1 | 4 | 4 | 3 | 4.0 | 4 | 4.0 | 2 | 3 | 4 | 1 | POLYGON ((536053.133 6697902.835, 536858.496 6... |
BG | 3 | 1 | 1 | 3 | 1 | 3 | 4 | 3.0 | 3 | 1.0 | 3 | 2 | 2 | 2 | POLYGON ((2551393.950 5439823.819, 2575025.606... |
CZ | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2.0 | 1 | 2.0 | 1 | 4 | 2 | 3 | POLYGON ((1602756.662 6623613.894, 1605874.568... |
DE | 2 | 4 | 2 | 2 | 4 | 3 | 1 | 2.0 | 4 | 3.0 | 2 | 3 | 2 | 3 | MULTIPOLYGON (((750538.061 7090702.816, 751353... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
IT | 3 | 4 | 4 | 3 | 3 | 4 | 4 | 3.0 | 3 | 2.0 | 4 | 1 | 1 | 4 | MULTIPOLYGON (((1404993.029 4230990.607, 14038... |
NL | 1 | 3 | 4 | 1 | 2 | 1 | 4 | 4.0 | 3 | 4.0 | 1 | 4 | 4 | 1 | MULTIPOLYGON (((-7596140.830 1348758.669, -759... |
PL | 1 | 4 | 1 | 4 | 1 | 2 | 1 | 1.0 | 2 | 4.0 | 2 | 3 | 1 | 4 | POLYGON ((2115307.046 7235751.798, 2157060.914... |
SE | 4 | 2 | 4 | 4 | 2 | 2 | 1 | NaN | 1 | 3.0 | 1 | 4 | 1 | 4 | MULTIPOLYGON (((1744966.813 7582362.064, 17458... |
SI | 4 | 1 | 2 | 4 | 2 | 1 | 2 | 4.0 | 4 | 3.0 | 4 | 1 | 3 | 2 | POLYGON ((1819341.831 5895544.825, 1820883.526... |
4. Merge all in a final table
Now all our data are ready to come together. I am going to spare you the dataframe manipulations because they are not the topic here, but in the end the final table looks like this:
quartiles | geometry | ||
---|---|---|---|
1.0 | renew share in % | MULTIPOLYGON (((-7600964.855 1353927.589, -760... | |
energy cons | MULTIPOLYGON (((-1146332.292 6785384.989, -114... | ||
residential gas price in euro | MULTIPOLYGON (((532003.307 6697072.451, 530749... | ||
indus gas price in euro | MULTIPOLYGON (((-7601494.819 1355961.122, -760... | ||
residential elec price in euro | MULTIPOLYGON (((2089863.323 6360968.338, 20852... | ||
... | ... | ... | ... |
4.0 | Non-residential share in % | MULTIPOLYGON (((-7601703.180 1358036.485, -760... | |
single family share in % | MULTIPOLYGON (((-7600964.855 1353927.589, -760... | ||
mutiple family share in % | MULTIPOLYGON (((-1997145.494 3212924.432, -199... | ||
Cold_temp in C | MULTIPOLYGON (((-1160953.847 6780231.201, -115... | ||
Hot_temp in C | MULTIPOLYGON (((-1010018.459 4695661.164, -101... |
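For reference, one way such a table could be assembled is to take, for each feature and each quartile, the union of the matching country geometries. This is only a sketch of the idea, not the exact manipulations that were used:
records = []
for feature in all_data_dis.columns.drop('geometry'):
    for q, group in all_data_gdf.groupby(all_data_dis[feature]): #group countries by the bin they fall into for this feature
        records.append({'quartiles': float(q), 'feature': feature, 'geometry': group.geometry.unary_union})
final_data = gpd.GeoDataFrame(records, crs=3857).set_index(['quartiles', 'feature'])
#the Hot_df and Cold_df bins would then be appended in the same shape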
The final result can then be stored in a PostgreSQL database, a GeoJSON file, a Shapefile and more...
final_data.to_postgis(name, con, schema=None, if_exists='fail', index=False, index_label=None, chunksize=None, dtype=None)
final_data.to_file("data.shp")
final_data.to_file("data.geojson", driver='GeoJSON')
...
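to_postgis expects a SQLAlchemy engine as its con argument; for example (the connection string and table name below are placeholders):
from sqlalchemy import create_engine
con = create_engine("postgresql://user:password@localhost:5432/geoclusters") #placeholder credentials
final_data.to_postgis("geoclusters", con, if_exists="replace")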
5. Plotting
I first intended to use JavaScript for the plotting, with mapbox gl, turf and d3, to get a more interactive output, but I ran into some issues with the polygon intersections. So I went for something simpler in Python, using folium, geopy and ipywidgets.
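As a taste of that simpler route, here is a minimal folium sketch that puts one of the binned layers on an interactive map (the styling, the geopy geocoding and the ipywidgets controls of the original are left out):
import folium
m = folium.Map(location=[50, 10], zoom_start=4, tiles='cartodbpositron')
folium.GeoJson(Hot_df.reset_index().to_crs(epsg=4326)).add_to(m) #folium expects WGS84 coordinates
m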