Swedish local purchasing power

Visualization of wage purchasing power in Swedish counties.

This weekend I participated in the hackathon Hack For Sweden 2018. It focuses on using open data from Swedish government agencies to come up with ideas and applications for improving the environment, the general well-being of the population, the job market, safety, etc.

My team developed a framework to identify the best areas in the country to settle in for someone who wants to work remotely. One part of this concept required quantifying wages in each County in relative terms: a fair comparison of wages has to take the local cost of living into account. The optimal place to work is then where wages are high and the cost of living is low. We called the wage normalized by the cost of living the Smart Cash Index (SCI).
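Written out as a formula, following the computation in the code further down: let w_c be the mean wage in county c, k_m the cost of living assigned to municipality m, and M_c the set of municipalities in c. These symbols are just notation introduced here for illustration. The raw ratio is averaged over the municipalities of each county and then rescaled to the range from 0 to 1:

```latex
r_c = \frac{1}{|M_c|} \sum_{m \in M_c} \frac{w_c}{k_m},
\qquad
\mathrm{SCI}_c = \frac{r_c - \min_{c'} r_{c'}}{\max_{c'} r_{c'} - \min_{c'} r_{c'}}
```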

This seemed like a good small project for dealing with messy structured data and learning a bit about geocoding and mapping quantities.

Data sources

Local cost of living: Statistiska centralbyrån (SCB)
Mean wages for each County: Lönestatistik.se (in this analysis, we used the wages for programming jobs)
Population of each municipality: Wikipedia (needed for applying the cost of living information)

The code

import os
import pandas as pd

folder = 'smart_cash_data'

# Loading local cost of living data
living_cost = pd.read_csv(os.path.join(folder, 'cost_of_living.csv'))
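# Convert to a monthly amount (x1000 / 12); this assumes the '2015' column holds yearly costs in thousands of SEK
living_cost['2015'] = (living_cost['2015']*1000/12).astype(int)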
living_cost.columns = ['Location', 'Cost of living (2015)']
living_cost

	Location	Cost of living (2015)
0	00 Riket	20783
1	0010 Stor-Stockholm	23791
2	0020 Stor-Göteborg	20650
3	0042 Kommuner med > 75000 inv (exkl Stor-Stock...	19258
4	0043 Kommuner med < 75000 inv (exkl Stor-Stock...	20516

# Loading mean wages for each County
wages = pd.read_csv(os.path.join(folder, 'it_wages.csv'))
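# Strip the ' kr' suffix and parse as a number. Values such as 29.140 are read as floats,
# so they are presumably in thousands of SEK; the min-max rescaling further down makes the unit irrelevant.
wages['Medellön'] = wages['Medellön'].str.replace(' kr', '').apply(pd.to_numeric)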
wages = wages[['Län', 'Medellön']]
wages.columns = ['County', 'Mean wage']
wages

	County	Mean wage
0	Blekinge	29.140
1	Dalarna	29.000
2	Gotland	25.825
3	Gävleborg	25.529
4	Halland	27.418
5	Jämtland	32.913
6	Jönköping	32.982
7	Kalmar	31.088
8	Kronoberg	29.843
9	Norrbotten	28.560
10	Skåne	32.538
11	Stockholm	35.194
12	Södermanland	32.189
13	Uppsala	30.704
14	Värmland	32.902
15	Västerbotten	27.211
16	Västernorrland	30.412
17	Västmanland	34.295
18	Västra Götaland	31.468
19	Örebro	28.848
20	Östergötland	28.661

I used wikitable2csv to convert the population table from Wikipedia to a .csv file.

# Loading and preprocessing local population information.
muni = pd.read_csv(os.path.join(folder, 'municipalities.csv'))
muni['County'] = muni['County'].str.replace(' County', '')
muni['Municipality'] = muni['Municipality'].str.replace(' Municipality', '')
muni = muni[['Municipality', 'County', 'Population']]
muni.sample(10)

	Municipality	County	Population
134	Malmö	Skåne	311540
52	Grums	Värmland	8918
205	Sundsvall	Västernorrland	96977
158	Nybro	Kalmar	19466
33	Emmaboda	Kalmar	8969
162	Nässjö	Jönköping	29470
104	Kristianstad	Skåne	80948
282	Örkelljunga	Skåne	9658
181	Sandviken	Gävleborg	37179
136	Malå	Västerbotten	3170

# Applying local cost of living information for each municipality
muni.loc[:, 'Cost of living'] = 20516
muni.loc[muni['Population'] > 75000, 'Cost of living'] = 19258
muni.loc[muni['County'] == 'Stockholm', 'Cost of living'] = 23791
muni.loc[muni['Municipality'] == 'Göteborg', 'Cost of living'] = 20650
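
The order of these four assignments matters, because each later line overrides the earlier ones for the rows it matches. As a minimal sketch of the same rule for a single municipality, in plain Python: the function name `cost_of_living` is hypothetical and only used for illustration, while the numbers are the SCB values from the table above.

```python
def cost_of_living(municipality, county, population):
    """Return the monthly cost of living (2015) for one municipality.

    Mirrors the order of the pandas assignments: later rules override earlier ones.
    """
    cost = 20516                  # default: municipalities with < 75000 inhabitants
    if population > 75000:
        cost = 19258              # municipalities with > 75000 inhabitants
    if county == "Stockholm":
        cost = 23791              # Stor-Stockholm, applied to the whole county
    if municipality == "Göteborg":
        cost = 20650              # Stor-Göteborg, applied here only to Göteborg itself
    return cost

# Two rows from the sample shown earlier
print(cost_of_living("Malmö", "Skåne", 311540))
print(cost_of_living("Grums", "Värmland", 8918))
```

Note that a large municipality in Stockholm County still gets the Stockholm value, since the county rule is applied after the population rule.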

# Filling in the mean wage of each municipality's County (counties without a wage entry get 0)
muni['Wages'] = muni['County'].map(wages.set_index('County')['Mean wage']).fillna(0)

# Smart cash is the wage normalized by the consumption index or cost of living
muni['smart_cash'] = muni['Wages']/muni['Cost of living']

# Mean by County and rescaling the smart_cash
smart_cash = muni.groupby('County')['smart_cash'].mean().reset_index()
spread = (smart_cash['smart_cash'].max() - smart_cash['smart_cash'].min())
smart_cash['smart_cash_idx'] = (smart_cash['smart_cash'] - smart_cash['smart_cash'].min()) / spread
smart_cash.drop('smart_cash', axis=1, inplace=True)

smart_cash

	County	smart_cash_idx
0	Blekinge	0.390359
1	Dalarna	0.374491
2	Gotland	0.014647
3	Gävleborg	0.000000
4	Halland	0.262856
5	Jämtland	0.817978
6	Jönköping	0.844582
7	Kalmar	0.611139
8	Kronoberg	0.497652
9	Norrbotten	0.339726
10	Skåne	0.804676
11	Stockholm	0.527416
12	Södermanland	0.762402
13	Uppsala	0.596032
14	Värmland	0.831956
15	Västerbotten	0.185162
16	Västernorrland	0.566688
17	Västmanland	1.000000
18	Västra Götaland	0.658489
19	Örebro	0.375062
20	Östergötland	0.368715
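
The rescaling used for the index is a plain min-max normalization: the lowest raw ratio maps to 0, the highest to 1, and everything else falls linearly in between. Here is a minimal sketch in plain Python, with made-up raw ratios rather than the real data; the helper name `min_max_rescale` is hypothetical.

```python
def min_max_rescale(values):
    """Linearly rescale a dict of name -> number so that min maps to 0 and max to 1."""
    lo, hi = min(values.values()), max(values.values())
    spread = hi - lo
    if spread == 0:
        raise ValueError("all values are equal; the index is undefined")
    return {name: (v - lo) / spread for name, v in values.items()}

# Hypothetical raw ratios (wage / cost of living), not taken from the data above
raw = {"A": 1.0, "B": 1.5, "C": 2.0}
index = min_max_rescale(raw)

# Multiplying every raw value by a constant leaves the index unchanged
scaled = min_max_rescale({name: v * 1000 for name, v in raw.items()})
print(index)
print(index == scaled)
```

A consequence of this is that the index only ranks counties relative to each other: the unit of the wage does not matter, and a value of 0 does not mean zero purchasing power.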

Visualization

A choropleth map is what we need here: a thematic map in which areas (in this case, Counties) are shaded or patterned in proportion to the statistical variable being displayed. Here, the variable is of course the SCI.

Besides the data, we need the boundary coordinates of each County in GeoJSON format. The choropleth is built with the folium package, which produces interactive maps and is extremely easy to use.

import folium

geo = os.path.join(folder, 'sweden-counties.geojson')
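# 'Mapbox Bright' tiles were dropped from later folium releases; 'CartoDB positron' is a similar light basemap
m = folium.Map(location=[62, 18], width='60%', zoom_start=4, detect_retina=True, tiles='CartoDB positron')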

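# folium.Choropleth replaces the Map.choropleth method, which newer folium releases no longer provide
folium.Choropleth(
    geo_data=geo,
    name='Smart Cash',
    data=smart_cash,
    columns=['County', 'smart_cash_idx'],
    key_on='feature.properties.name',
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0.5,
    highlight=True,
    legend_name=''
).add_to(m)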


folium.LayerControl().add_to(m)
m

Here, 0 marks the County where the mean wage is lowest relative to the local cost of living, and 1 marks the opposite: the County where the wage has the greatest purchasing power.
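
One thing that can quietly go wrong with a map like this is a mismatch between the County names in the data and the names stored in the GeoJSON, since the two are joined through key_on='feature.properties.name'. A county whose name is spelled differently would typically not be colored by its value. Below is a minimal sketch of a check for this, using a tiny made-up GeoJSON; the helper `unmatched_names` is hypothetical and not part of folium.

```python
def unmatched_names(geojson, names, prop="name"):
    """Return the names that have no feature with a matching properties[prop]."""
    available = {f["properties"].get(prop) for f in geojson["features"]}
    return sorted(set(names) - available)

# Tiny hypothetical GeoJSON with two features (geometry left out for brevity)
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Skåne"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Gotland"}, "geometry": None},
    ],
}

missing = unmatched_names(toy_geojson, ["Skåne", "Gotland", "Örebro"])
print(missing)
```

With the real files, the same function could be fed the parsed sweden-counties.geojson (loaded with json.load) and the values of smart_cash['County']; an empty list means every County in the data has a shape to color.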

This is the end of this post! Let me know what you think in the comments.

More resources

My team's submission to the hackathon and the public GitHub repository
Swedish open data portal
