Swedish local purchasing power

Visualizing the purchasing power of wages in Swedish counties.


This weekend I participated in the Hack For Sweden 2018 hackathon. It focuses on using open data from Swedish government agencies to come up with ideas and applications for improving the environment, the general well-being of the population, the job market, safety, etc.

My team developed a framework to identify the best areas in the country to settle in for anyone who wants to work remotely. One part of this concept required quantifying wages in each County in relative terms: a fair comparison has to take the local cost of living into account, so the optimal place to work is where wages are high and the cost of living is low. We called the wage normalized by the cost of living the Smart Cash Index (SCI).
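
As a minimal sketch of the idea, the ratio below compares two places. The function name `smart_cash` and all figures are hypothetical and only illustrate how a lower wage can still win once the cost of living is factored in.

```python
def smart_cash(wage, cost_of_living):
    """Wage divided by local cost of living (same currency and period)."""
    return wage / cost_of_living

# Hypothetical monthly figures, for illustration only.
big_city = smart_cash(wage=36000, cost_of_living=24000)
small_town = smart_cash(wage=33000, cost_of_living=20000)

print(f"big city: {big_city:.2f}, small town: {small_town:.2f}")
# The small town has the lower wage but the higher ratio.
```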

This seemed like a good small project for dealing with messy structured data and learning a bit of geocoding and mapping quantities.

Data sources

  • Local cost of living: Statistiska centralbyrån (SCB)
  • Mean wages for each County: Lönestatistik.se (in this analysis, we used the wages for programming jobs)
  • Population of each municipality: Wikipedia (needed for applying the cost of living information)

The code

import os
import pandas as pd

folder = 'smart_cash_data'

# Loading local cost of living data
living_cost = pd.read_csv(os.path.join(folder, 'cost_of_living.csv'))
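# The *1000/12 conversion suggests the source column holds yearly costs in thousands of SEK;
# the result is an integer cost in SEK per month.
living_cost['2015'] = (living_cost['2015']*1000/12).astype(int)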
living_cost.columns = ['Location', 'Cost of living (2015)']
living_cost
Location Cost of living (2015)
0 00 Riket 20783
1 0010 Stor-Stockholm 23791
2 0020 Stor-Göteborg 20650
3 0042 Kommuner med > 75000 inv (exkl Stor-Stock... 19258
4 0043 Kommuner med < 75000 inv (exkl Stor-Stock... 20516
# Loading mean wages for each County
wages = pd.read_csv(os.path.join(folder, 'it_wages.csv'))
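# Strip the ' kr' suffix and convert to numbers. The values parse as e.g. 29.140 (thousands of SEK);
# the absolute scale does not matter because the index is min-max rescaled later on.
wages['Medellön'] = pd.to_numeric(wages['Medellön'].str.replace(' kr', '', regex=False))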
wages = wages[['Län', 'Medellön']]
wages.columns = ['County', 'Mean wage']
wages
County Mean wage
0 Blekinge 29.140
1 Dalarna 29.000
2 Gotland 25.825
3 Gävleborg 25.529
4 Halland 27.418
5 Jämtland 32.913
6 Jönköping 32.982
7 Kalmar 31.088
8 Kronoberg 29.843
9 Norrbotten 28.560
10 Skåne 32.538
11 Stockholm 35.194
12 Södermanland 32.189
13 Uppsala 30.704
14 Värmland 32.902
15 Västerbotten 27.211
16 Västernorrland 30.412
17 Västmanland 34.295
18 Västra Götaland 31.468
19 Örebro 28.848
20 Östergötland 28.661

I used wikitable2csv to convert the population table from Wikipedia into a .csv file.

# Loading and preprocessing local population information.
muni = pd.read_csv(os.path.join(folder, 'municipalities.csv'))
muni['County'] = muni['County'].str.replace(' County', '')
muni['Municipality'] = muni['Municipality'].str.replace(' Municipality', '')
muni = muni[['Municipality', 'County', 'Population']]
muni.sample(10)
Municipality County Population
134 Malmö Skåne 311540
52 Grums Värmland 8918
205 Sundsvall Västernorrland 96977
158 Nybro Kalmar 19466
33 Emmaboda Kalmar 8969
162 Nässjö Jönköping 29470
104 Kristianstad Skåne 80948
282 Örkelljunga Skåne 9658
181 Sandviken Gävleborg 37179
136 Malå Västerbotten 3170
# Applying local cost of living information for each municipality
muni.loc[:, 'Cost of living'] = 20516
muni.loc[muni['Population'] > 75000, 'Cost of living'] = 19258
muni.loc[muni['County'] == 'Stockholm', 'Cost of living'] = 23791
muni.loc[muni['Municipality'] == 'Göteborg', 'Cost of living'] = 20650

# Filling in the wage for each County by looking it up in the wages table
# (municipalities in a County missing from `wages` would get NaN instead of 0)
muni['Wages'] = muni['County'].map(wages.set_index('County')['Mean wage'])

# Smart cash is the wage normalized by the consumption index or cost of living
muni['smart_cash'] = muni['Wages']/muni['Cost of living']

# Mean by County and rescaling the smart_cash
smart_cash = muni.groupby('County')['smart_cash'].mean().reset_index()
spread = (smart_cash['smart_cash'].max() - smart_cash['smart_cash'].min())
smart_cash['smart_cash_idx'] = (smart_cash['smart_cash'] - smart_cash['smart_cash'].min()) / spread
smart_cash.drop('smart_cash', axis=1, inplace=True)
smart_cash
County smart_cash_idx
0 Blekinge 0.390359
1 Dalarna 0.374491
2 Gotland 0.014647
3 Gävleborg 0.000000
4 Halland 0.262856
5 Jämtland 0.817978
6 Jönköping 0.844582
7 Kalmar 0.611139
8 Kronoberg 0.497652
9 Norrbotten 0.339726
10 Skåne 0.804676
11 Stockholm 0.527416
12 Södermanland 0.762402
13 Uppsala 0.596032
14 Värmland 0.831956
15 Västerbotten 0.185162
16 Västernorrland 0.566688
17 Västmanland 1.000000
18 Västra Götaland 0.658489
19 Örebro 0.375062
20 Östergötland 0.368715
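
The cost-of-living assignment above relies on the order of the `.loc` statements: later rules overwrite earlier ones. The sketch below spells out that precedence as a plain function. The function name is hypothetical; the four values are the SCB figures from the table earlier in this post, and the population arguments passed for Göteborg and Stockholm are placeholders, since those rules do not depend on population.

```python
def monthly_cost_of_living(municipality, county, population):
    """Return the cost-of-living tier (SEK per month) used in this post."""
    # Checked in reverse order of the .loc assignments, since the last one wins.
    if municipality == 'Göteborg':
        return 20650  # Stor-Göteborg
    if county == 'Stockholm':
        return 23791  # Stor-Stockholm
    if population > 75000:
        return 19258  # municipalities with more than 75,000 inhabitants
    return 20516      # all remaining municipalities

# Rows taken from the sample of the municipalities table
print(monthly_cost_of_living('Malmö', 'Skåne', 311540))
print(monthly_cost_of_living('Grums', 'Värmland', 8918))
```

Note that this approximates Stor-Göteborg by the Göteborg municipality alone, exactly as the original assignment does.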

Visualization

A choropleth map is what we need here: a thematic map in which areas (in this case, Counties) are shaded or patterned in proportion to the statistical variable being displayed. Here, that variable is of course the SCI.

Besides the data, we need the boundary coordinates of each County in GeoJSON format. The choropleth is built with the folium package, which produces interactive maps and is extremely easy to use.
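
One thing worth checking before plotting is that the County names in the data match the names stored in the GeoJSON features, since folium joins the two on that key. The sketch below does this with a tiny hand-made GeoJSON object; the feature list and the mismatching "Gotlands län" entry are invented for illustration and do not come from the actual file.

```python
# Tiny hypothetical GeoJSON; a real file would be read with json.load().
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Skåne"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Stockholm"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Gotlands län"}, "geometry": None},
    ],
}

# County names as they appear in the data (a subset, for illustration)
data_counties = {"Skåne", "Stockholm", "Gotland"}

# Names found under feature.properties.name
geo_names = {feature["properties"]["name"] for feature in geojson["features"]}

missing_in_geo = sorted(data_counties - geo_names)
missing_in_data = sorted(geo_names - data_counties)

print("In data but not in GeoJSON:", missing_in_geo)
print("In GeoJSON but not in data:", missing_in_data)
```

Any name listed in either direction would show up as an unshaded area on the map, so both lists should be empty before plotting.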

import folium

geo = os.path.join(folder, 'sweden-counties.geojson')
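# 'Mapbox Bright' is no longer bundled with recent folium releases;
# 'CartoDB positron' is a comparable light basemap.
m = folium.Map(location=[62, 18], width='60%', zoom_start=4, detect_retina=True, tiles='CartoDB positron')

# Recent folium releases replace Map.choropleth() with the folium.Choropleth class.
folium.Choropleth(
    geo_data=geo,
    name='Smart Cash',
    data=smart_cash,
    columns=['County', 'smart_cash_idx'],
    key_on='feature.properties.name',
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0.5,
    highlight=True,
    legend_name='Smart Cash Index (SCI)'
).add_to(m)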


folium.LayerControl().add_to(m)
m

Here, 0 marks the County where the mean wage is lowest relative to the local cost of living, and 1 marks the opposite: the County where the wage gives the maximum purchasing power.
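
The 0 to 1 range comes from min-max rescaling, so the index is purely relative: a value of 0 means "lowest among the Counties compared", not zero purchasing power. A minimal sketch with hypothetical ratios (the function name and the values are made up for illustration):

```python
def min_max_rescale(values):
    """Rescale a dict of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values.values()), max(values.values())
    spread = highest - lowest
    return {key: (value - lowest) / spread for key, value in values.items()}

# Hypothetical wage / cost-of-living ratios for three places
ratios = {"A": 1.0, "B": 1.5, "C": 2.0}
index = min_max_rescale(ratios)

print(index)  # {'A': 0.0, 'B': 0.5, 'C': 1.0}
```

Adding or removing a County changes the minimum or maximum and therefore shifts every other value, which is worth keeping in mind when comparing maps built from different data sets.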

This is the end of this post! Let me know what you think in the comments.
