Visualizing the purchasing power of wages in Swedish counties
This weekend I participated in the Hack for Sweden 2018 hackathon. The event focuses on using open data from Swedish government agencies to come up with ideas and applications that improve the environment, the general well-being of the population, the job market, safety, and more.
My team developed a framework to identify the best areas in the country to settle in for someone who wants to work remotely. One part of this concept required quantifying wages in each County in relative terms: a fair comparison of wages has to take the local cost of living into account. The optimal place to work is then where wages are high and the cost of living is low. We called the wage normalized by the cost of living the Smart Cash Index (SCI).
This seemed like a good small project for dealing with messy structured data and learning a bit about geocoding and mapping quantities.
Data sources
- Local cost of living: Statistiska centralbyrån (SCB)
- Mean wages for each County: Lönestatistik.se (in this analysis, we used the wages for programming jobs)
- Population of each municipality: Wikipedia (needed for applying the cost of living information)
The code
import os
import pandas as pd
folder = 'smart_cash_data'
# Loading local cost of living data
living_cost = pd.read_csv(os.path.join(folder, 'cost_of_living.csv'))
living_cost['2015'] = (living_cost['2015']*1000/12).astype(int)
living_cost.columns = ['Location', 'Cost of living (2015)']
living_cost
| | Location | Cost of living (2015) |
---|---|---|
0 | 00 Riket | 20783 |
1 | 0010 Stor-Stockholm | 23791 |
2 | 0020 Stor-Göteborg | 20650 |
3 | 0042 Kommuner med > 75000 inv (exkl Stor-Stock... | 19258 |
4 | 0043 Kommuner med < 75000 inv (exkl Stor-Stock... | 20516 |
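The conversion above multiplies by 1000 and divides by 12, which suggests that the raw `2015` column holds yearly amounts in thousands of kronor. That reading is an assumption on my part, since the raw file is not shown here. A minimal sketch with made-up values shows what the expression does, including the fact that `astype(int)` truncates rather than rounds:

```python
import pandas as pd

# Hypothetical raw values, ASSUMED to be thousands of SEK per year.
raw = pd.DataFrame({'Location': ['A', 'B'], '2015': [240.0, 300.5]})

# thousands of SEK per year -> SEK per month, truncated to whole kronor
monthly = (raw['2015'] * 1000 / 12).astype(int)
print(monthly.tolist())  # [20000, 25041]
```

If rounding to the nearest krona matters, calling `.round()` before `.astype(int)` would be one option.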
# Loading mean wages for each County
wages = pd.read_csv(os.path.join(folder, 'it_wages.csv'))
wages['Medellön'] = wages['Medellön'].str.replace(' kr', '').apply(pd.to_numeric)
wages = wages[['Län', 'Medellön']]
wages.columns = ['County', 'Mean wage']
wages
| | County | Mean wage |
---|---|---|
0 | Blekinge | 29.140 |
1 | Dalarna | 29.000 |
2 | Gotland | 25.825 |
3 | Gävleborg | 25.529 |
4 | Halland | 27.418 |
5 | Jämtland | 32.913 |
6 | Jönköping | 32.982 |
7 | Kalmar | 31.088 |
8 | Kronoberg | 29.843 |
9 | Norrbotten | 28.560 |
10 | Skåne | 32.538 |
11 | Stockholm | 35.194 |
12 | Södermanland | 32.189 |
13 | Uppsala | 30.704 |
14 | Värmland | 32.902 |
15 | Västerbotten | 27.211 |
16 | Västernorrland | 30.412 |
17 | Västmanland | 34.295 |
18 | Västra Götaland | 31.468 |
19 | Örebro | 28.848 |
20 | Östergötland | 28.661 |
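One thing to notice in the table above: a source string such as `29.140 kr` is parsed as the float `29.14`, so the wages end up in thousands of kronor if the dot is a thousands separator (which I assume it is). This does not affect the final index, because the min-max rescaling further down cancels any constant factor. If absolute kronor were needed, a sketch like the following could be used; the two sample strings are illustrative, not read from the actual file:

```python
import pandas as pd

# Illustrative raw strings in the style of the source column.
raw = pd.Series(['29.140 kr', '35.194 kr'])

# ASSUMPTION: '.' is a thousands separator, so remove it before converting.
wage_kr = (
    raw.str.replace(' kr', '', regex=False)
       .str.replace('.', '', regex=False)
       .astype(int)
)
print(wage_kr.tolist())  # [29140, 35194]
```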
I used wikitable2csv to convert the population table from Wikipedia into a .csv file.
# Loading and preprocessing local population information.
muni = pd.read_csv(os.path.join(folder, 'municipalities.csv'))
muni['County'] = muni['County'].str.replace(' County', '')
muni['Municipality'] = muni['Municipality'].str.replace(' Municipality', '')
muni = muni[['Municipality', 'County', 'Population']]
muni.sample(10)
| | Municipality | County | Population |
---|---|---|---|
134 | Malmö | Skåne | 311540 |
52 | Grums | Värmland | 8918 |
205 | Sundsvall | Västernorrland | 96977 |
158 | Nybro | Kalmar | 19466 |
33 | Emmaboda | Kalmar | 8969 |
162 | Nässjö | Jönköping | 29470 |
104 | Kristianstad | Skåne | 80948 |
282 | Örkelljunga | Skåne | 9658 |
181 | Sandviken | Gävleborg | 37179 |
136 | Malå | Västerbotten | 3170 |
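Since the wages and the municipalities come from different sources and are later joined on the County name, it can be worth checking that the names agree after stripping the suffixes. A minimal sketch of such a check, using toy stand-ins for the two tables (the rows are illustrative, and the variable names `missing_wage` and `unused_wage` are my own):

```python
import pandas as pd

# Toy stand-ins for the `wages` and `muni` tables.
wages = pd.DataFrame({'County': ['Skåne', 'Kalmar'],
                      'Mean wage': [32.538, 31.088]})
muni = pd.DataFrame({
    'Municipality': ['Malmö', 'Nybro', 'Gotland'],
    'County': ['Skåne', 'Kalmar', 'Gotland'],
})

# Counties that have municipalities but no wage entry, and vice versa.
missing_wage = sorted(set(muni['County']) - set(wages['County']))
unused_wage = sorted(set(wages['County']) - set(muni['County']))

print('No wage data for:', missing_wage)
print('Wage data never used for:', unused_wage)
```

Both lists being empty on the real data would mean every municipality gets a wage and no wage row is silently dropped.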
# Applying local cost of living information for each municipality
muni.loc[:, 'Cost of living'] = 20516
muni.loc[muni['Population'] > 75000, 'Cost of living'] = 19258
muni.loc[muni['County'] == 'Stockholm', 'Cost of living'] = 23791
muni.loc[muni['Municipality'] == 'Göteborg', 'Cost of living'] = 20650
# Filling wage values for each County
muni['Wages'] = 0.0
for county in wages['County']:
    muni.loc[muni['County'] == county, 'Wages'] = wages.loc[wages['County'] == county, 'Mean wage'].iloc[0]
# Smart cash is the wage normalized by the consumption index or cost of living
muni['smart_cash'] = muni['Wages']/muni['Cost of living']
# Mean by County and rescaling the smart_cash
smart_cash = muni.groupby('County')['smart_cash'].mean().reset_index()
spread = (smart_cash['smart_cash'].max() - smart_cash['smart_cash'].min())
smart_cash['smart_cash_idx'] = (smart_cash['smart_cash'] - smart_cash['smart_cash'].min()) / spread
smart_cash.drop('smart_cash', axis=1, inplace=True)
smart_cash
| | County | smart_cash_idx |
---|---|---|
0 | Blekinge | 0.390359 |
1 | Dalarna | 0.374491 |
2 | Gotland | 0.014647 |
3 | Gävleborg | 0.000000 |
4 | Halland | 0.262856 |
5 | Jämtland | 0.817978 |
6 | Jönköping | 0.844582 |
7 | Kalmar | 0.611139 |
8 | Kronoberg | 0.497652 |
9 | Norrbotten | 0.339726 |
10 | Skåne | 0.804676 |
11 | Stockholm | 0.527416 |
12 | Södermanland | 0.762402 |
13 | Uppsala | 0.596032 |
14 | Värmland | 0.831956 |
15 | Västerbotten | 0.185162 |
16 | Västernorrland | 0.566688 |
17 | Västmanland | 1.000000 |
18 | Västra Götaland | 0.658489 |
19 | Örebro | 0.375062 |
20 | Östergötland | 0.368715 |
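To make the steps above easier to follow, here is a compact sketch of the same pipeline on toy data: assign a cost of living per municipality, attach the County wage, take the ratio, average per County, and min-max rescale to the range 0 to 1. It uses `np.select` and `Series.map` instead of the sequential `.loc` assignments and the loop used above; that is a substitution of mine, not the code we ran at the hackathon. The municipalities, populations and wages are made up; only the four cost-of-living values come from the table earlier in the post.

```python
import numpy as np
import pandas as pd

# Toy municipalities (names, populations and wages are made up).
muni = pd.DataFrame({
    'Municipality': ['A', 'B', 'C', 'D'],
    'County': ['X', 'X', 'Y', 'Stockholm'],
    'Population': [10_000, 90_000, 20_000, 50_000],
})
wages = pd.DataFrame({'County': ['X', 'Y', 'Stockholm'],
                      'Mean wage': [30.0, 28.0, 35.0]})

# np.select picks the first matching condition, so the most specific
# rules are listed first and the population rule comes last.
conditions = [
    muni['Municipality'] == 'Göteborg',
    muni['County'] == 'Stockholm',
    muni['Population'] > 75_000,
]
muni['Cost of living'] = np.select(conditions, [20650, 23791, 19258],
                                   default=20516)

# Look up each municipality's County wage (NaN if the County is unknown).
muni['Wages'] = muni['County'].map(wages.set_index('County')['Mean wage'])

# Wage normalized by cost of living, averaged per County.
muni['smart_cash'] = muni['Wages'] / muni['Cost of living']
by_county = muni.groupby('County')['smart_cash'].mean()

# Min-max rescaling: the lowest County maps to 0, the highest to 1.
idx = (by_county - by_county.min()) / (by_county.max() - by_county.min())
print(idx)
```

In this toy example the County with the highest wage (Stockholm) does not get the highest index, because its cost of living is also the highest.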
Visualization
A choropleth map is what we need here: a thematic map in which areas (in this case, Counties) are shaded or patterned in proportion to the statistical variable being displayed. Here, the variable is of course the SCI.
Besides the data, we need the boundary coordinates of each County in GeoJSON format. The choropleth is built with the folium package, which produces interactive maps and is extremely easy to use.
import folium
geo = os.path.join(folder, 'sweden-counties.geojson')
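# Note: this was written against the folium release available in 2018. Newer releases
# appear to have dropped the 'Mapbox Bright' tiles and replaced Map.choropleth with
# the folium.Choropleth class, so adjust accordingly if this cell fails.
m = folium.Map(location=[62, 18], width='60%', zoom_start=4, detect_retina=True, tiles='Mapbox Bright')
m.choropleth(
    geo_data=geo,
    name='Smart Cash',
    data=smart_cash,
    columns=['County', 'smart_cash_idx'],
    key_on='feature.properties.name',  # GeoJSON property matched against 'County'
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0.5,
    highlight=True,
    legend_name=''
)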
folium.LayerControl().add_to(m)
m
Here, 0 indicates the County where the mean wage is lowest relative to the local cost of living. 1 is the opposite: the County where the wage has the maximum purchasing power.
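The choropleth joins the data to the shapes through `key_on`, so a County whose name is spelled differently in the GeoJSON file would not get a data-driven colour. A quick way to catch this is to compare the two sets of names before plotting. The sketch below uses a heavily trimmed, hypothetical GeoJSON string and illustrative data names; I am assuming the real file stores the County name under `properties.name`, as the `key_on` value implies.

```python
import json

# Hypothetical, trimmed GeoJSON; geometries are omitted for brevity.
geojson_text = '''
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "Skåne"}, "geometry": null},
  {"type": "Feature", "properties": {"name": "Gotland"}, "geometry": null}
]}
'''
geo_doc = json.loads(geojson_text)

# Names available on the map side (the property referenced by key_on).
geo_names = {f['properties']['name'] for f in geo_doc['features']}

# Illustrative names from the data side; one is deliberately mismatched.
data_names = {'Skåne', 'Gotlands län'}

unmatched = sorted(data_names - geo_names)
print('Not found in GeoJSON:', unmatched)
```

With the real files, `geo_doc` would come from `json.load` on the GeoJSON path and `data_names` from `set(smart_cash['County'])`.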
This is the end of this post! Let me know what you think in the comments.
More resources
- My team's submission to the hackathon and the public GitHub repository
- Swedish open data portal