Spatial Correlation Analytics Between Population and COVID-19 Confirmed Cases in New York State

Authors: Weiye Chen & Shaohua Wang, University of Illinois at Urbana-Champaign

This Jupyter notebook demonstrates a spatial correlation analysis between population and COVID-19 confirmed cases in New York State.

We take the state of New York as our study area. This notebook uses geospatial libraries to show the spatial distribution of population and COVID-19 confirmed cases, as well as daily increases during the past week, in New York State, and demonstrates a spatial correlation analysis between population and the number of confirmed COVID-19 cases.

Data Preparation

The first part demonstrates how to prepare the population data and the COVID-19 data for New York State.

Set up the environment by importing libraries

This notebook depends on numpy, pandas, geopandas, shapely, plotly, cufflinks, seaborn, and other libraries available in CyberGISX-Jupyter. The first cell imports the libraries used to load, manipulate, and plot the population and COVID-19 data.

In [1]:
import pathlib
import os
import tarfile
import shutil
import zipfile
import json
import datetime
import warnings

import requests
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Plotting
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.figure_factory as ff
import plotly.express as px
import plotly.graph_objects as go
import cufflinks as cf

warnings.filterwarnings("ignore")

Population data

Population data for New York State

The population data are distributed as a zipped shapefile: https://www.arcgis.com/home/item.html?id=3b69769aa9b646a483af81d05e7702d2

U.S. Counties represents the counties of the United States in the 50 states, the District of Columbia, and Puerto Rico.

Originally extracted from this layer package: http://www.arcgis.com/home/item.html?id=a00d6b6149b34ed3b833e10fb72ef47b

In [2]:
%%time
file = pathlib.Path("USA_Counties_as_Shape.zip")
if file.exists ():
    print ("Population data exist")
else:
    print ("Population data not exist, Downloading the Population data...")
    !wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/22153815/USA_Counties_as_Shape.zip
Population data exist
CPU times: user 334 µs, sys: 234 µs, total: 568 µs
Wall time: 430 µs
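The cell above relies on the IPython shell escape !wget, although requests is already imported. A pure-Python alternative, sketched under the assumption that the same URL is used (the helper name ensure_download is hypothetical), streams the file to disk only when it is missing:

```python
import pathlib


def ensure_download(path, url, chunk_size=1 << 20):
    """Download `url` to `path` unless the file already exists.

    Returns True if a download happened, False if the file was already present.
    """
    path = pathlib.Path(path)
    if path.exists():
        return False
    import requests  # imported lazily: only needed when a download is required
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            # Write the body in chunks instead of loading it all into memory
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return True
```

Usage would be ensure_download("USA_Counties_as_Shape.zip", "https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/22153815/USA_Counties_as_Shape.zip"); unlike the shell command, it also raises an error on an HTTP failure instead of leaving a partial file unnoticed.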

Read the shapefile directly from the zip archive and keep only the counties in New York State (62 rows).

In [3]:
%%time
pop = gpd.read_file("zip://USA_Counties_as_Shape.zip")
pop = pop[pop.STATE_NAME=='New York']
pop
CPU times: user 3.71 s, sys: 190 ms, total: 3.9 s
Wall time: 3.91 s
Out[3]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE ... OWNER_OCC RENTER_OCC NO_FARMS07 AVG_SIZE07 CROP_ACR07 AVG_SALE07 SQMI Shape_Leng Shape_Area geometry
1828 Albany New York 36 001 36001 304204 570.6 305432 572.871183 237873 ... 72577 53674 498.0 123.0 32020.0 45.01 533.16 1.787027 0.151470 MULTIPOLYGON (((-73.87971 42.76647, -73.87986 ...
1829 Allegany New York 36 003 36003 48946 47.3 49415 47.771655 47085 ... 13563 4645 847.0 178.0 74635.0 54.39 1034.40 2.197075 0.292292 POLYGON ((-78.22079 42.52173, -78.21983 42.521...
1830 Bronx New York 36 005 36005 1385108 32939.5 1397295 33229.369798 386497 ... 93101 390348 1.0 -99.0 0.0 -99.00 42.05 1.289333 0.011631 MULTIPOLYGON (((-73.89687 40.79565, -73.89703 ...
1831 Broome New York 36 007 36007 200600 280.3 200701 280.461425 176444 ... 53260 28907 580.0 149.0 43575.0 51.53 715.61 2.473984 0.201904 POLYGON ((-75.86342 42.40935, -75.86251 42.399...
1832 Cattaraugus New York 36 009 36009 80317 60.7 80840 61.126654 74639 ... 23306 8957 1122.0 163.0 91562.0 66.98 1322.50 2.867147 0.373649 POLYGON ((-79.02148 42.53803, -79.01788 42.536...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1885 Washington New York 36 115 36115 63216 74.7 63763 75.353053 59815 ... 17722 6420 843.0 241.0 112016.0 133.17 846.19 3.037408 0.243164 POLYGON ((-73.37456 43.79744, -73.37352 43.796...
1886 Wayne New York 36 117 36117 93772 154.8 94142 155.362654 87148 ... 28106 8479 938.0 180.0 119662.0 180.13 605.95 2.669413 0.173682 MULTIPOLYGON (((-76.94668 43.25894, -76.94638 ...
1887 Westchester New York 36 119 36119 949113 2111.6 949931 2113.399929 646471 ... 213888 133344 106.0 80.0 2512.0 103.75 449.48 2.908143 0.124902 MULTIPOLYGON (((-73.76548 40.87745, -73.76569 ...
1888 Wyoming New York 36 121 36121 42155 70.7 42520 71.319546 38602 ... 11762 3739 761.0 287.0 157338.0 302.16 596.19 1.900056 0.169651 POLYGON ((-78.17165 42.87038, -78.16345 42.870...
1889 Yates New York 36 123 36123 25348 67.4 25464 67.754038 24647 ... 7193 2324 864.0 146.0 86596.0 102.29 375.83 1.511735 0.106828 POLYGON ((-76.94756 42.76441, -76.94773 42.759...

62 rows × 56 columns

COVID-19 Data

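The data are retrieved from the Johns Hopkins University CSSE COVID-19 data repository as a CSV file (time_series_covid19_confirmed_US.csv). Each row is a county (or a reporting category such as "Unassigned"), and each date is a column holding the cumulative number of confirmed cases.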

In [4]:
%%time
confirmed_cases = pd.read_csv(
    "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv"
)
confirmed_cases = confirmed_cases[confirmed_cases['Province_State'] == 'New York']
confirmed_cases
#confirmed_cases.head(5)
CPU times: user 59.8 ms, sys: 17.5 ms, total: 77.3 ms
Wall time: 449 ms
Out[4]:
UID iso2 iso3 code3 FIPS Admin2 Province_State Country_Region Lat Long_ ... 3/21/2020 3/22/2020 3/23/2020 3/24/2020 3/25/2020 3/26/2020 3/27/2020 3/28/2020 3/29/2020 3/30/2020
1833 84036001 US USA 840 36001.0 Albany New York US 42.600603 -73.977239 ... 88 123 127 146 152 171 187 195 205 217
1834 84036003 US USA 840 36003.0 Allegany New York US 42.257484 -78.027505 ... 2 2 2 2 2 2 2 2 6 7
1835 84036005 US USA 840 36005.0 Bronx New York US 40.852093 -73.862828 ... 0 0 0 0 0 0 0 0 0 0
1836 84036007 US USA 840 36007.0 Broome New York US 42.159032 -75.813261 ... 2 3 3 9 11 16 18 23 29 35
1837 84036009 US USA 840 36009.0 Cattaraugus New York US 42.247782 -78.679231 ... 0 0 0 0 0 0 0 1 4 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1892 84036119 US USA 840 36119.0 Westchester New York US 41.162784 -73.757417 ... 1387 1873 2894 3891 4691 5944 7187 7875 8519 9326
1893 84036121 US USA 840 36121.0 Wyoming New York US 42.701451 -78.221996 ... 2 2 2 4 4 7 7 7 8 8
1894 84036123 US USA 840 36123.0 Yates New York US 42.635055 -77.103699 ... 0 0 0 0 0 0 0 0 0 0
3181 84080036 US USA 840 80036.0 Out of NY New York US 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
3233 84090036 US USA 840 90036.0 Unassigned New York US 0.000000 0.000000 ... 0 23 107 0 0 0 0 0 0 0

64 rows × 80 columns

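Extract the date columns of the time series. The first 11 columns hold county metadata (UID, FIPS, Admin2, Lat, Long_, and so on), and the -1 in the slice also drops the last column (3/30/2020), so dates ends on 3/29/2020.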

In [5]:
columns = confirmed_cases.columns
dates = columns[11:-1]
dates
Out[5]:
Index(['1/22/2020', '1/23/2020', '1/24/2020', '1/25/2020', '1/26/2020',
       '1/27/2020', '1/28/2020', '1/29/2020', '1/30/2020', '1/31/2020',
       '2/1/2020', '2/2/2020', '2/3/2020', '2/4/2020', '2/5/2020', '2/6/2020',
       '2/7/2020', '2/8/2020', '2/9/2020', '2/10/2020', '2/11/2020',
       '2/12/2020', '2/13/2020', '2/14/2020', '2/15/2020', '2/16/2020',
       '2/17/2020', '2/18/2020', '2/19/2020', '2/20/2020', '2/21/2020',
       '2/22/2020', '2/23/2020', '2/24/2020', '2/25/2020', '2/26/2020',
       '2/27/2020', '2/28/2020', '2/29/2020', '3/1/2020', '3/2/2020',
       '3/3/2020', '3/4/2020', '3/5/2020', '3/6/2020', '3/7/2020', '3/8/2020',
       '3/9/2020', '3/10/2020', '3/11/2020', '3/12/2020', '3/13/2020',
       '3/14/2020', '3/15/2020', '3/16/2020', '3/17/2020', '3/18/2020',
       '3/19/2020', '3/20/2020', '3/21/2020', '3/22/2020', '3/23/2020',
       '3/24/2020', '3/25/2020', '3/26/2020', '3/27/2020', '3/28/2020',
       '3/29/2020'],
      dtype='object')
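The slice columns[11:-1] depends on the file having exactly 11 metadata columns. A sketch of a more robust alternative selects columns whose headers look like dates; the regular expression and the helper name date_columns are assumptions, written to accept both the M/D/YYYY headers seen here and two-digit years:

```python
import re

import pandas as pd

# Matches headers such as '3/29/2020' (and '3/29/20')
DATE_HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}(\d{2})?$")


def date_columns(df):
    """Return the column labels of `df` that look like M/D/Y date headers, in order."""
    return [col for col in df.columns if DATE_HEADER.match(str(col))]


example = pd.DataFrame(columns=["UID", "FIPS", "Admin2", "Lat", "Long_",
                                "3/28/2020", "3/29/2020", "3/30/2020"])
print(date_columns(example))
```

To reproduce dates exactly (without the most recent day), one would take date_columns(confirmed_cases)[:-1].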
In [6]:
pop["Admin2"]=pop["NAME"]
pop.shape
Out[6]:
(62, 57)
In [7]:
pop.describe()
Out[7]:
POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE BLACK AMERI_ES ASIAN HAWN_PI HISPANIC ... VACANT OWNER_OCC RENTER_OCC NO_FARMS07 AVG_SIZE07 CROP_ACR07 AVG_SALE07 SQMI Shape_Leng Shape_Area
count 6.200000e+01 62.000000 6.200000e+01 62.000000 6.200000e+01 62.000000 62.000000 62.000000 62.000000 62.000000 ... 62.000000 62.000000 62.000000 62.000000 62.000000 62.000000 62.000000 62.000000 62.000000 62.000000
mean 3.125500e+05 3015.661290 3.150453e+05 3046.961245 2.054996e+05 49577.419355 1724.290323 22907.161290 141.387097 55111.645161 ... 12747.548387 62868.338710 55159.967742 586.322581 157.387097 69594.919355 107.102097 783.529355 2.801870 0.223882
std 5.323563e+05 10840.494722 5.379665e+05 10959.168351 2.829881e+05 139015.384817 3601.447083 75989.530459 311.764537 146525.909445 ... 17493.169368 89108.349004 130407.707749 377.733785 101.719859 54875.206378 98.465447 510.676050 2.388439 0.148468
min 4.836000e+03 2.700000 4.822000e+03 2.667448 4.705000e+03 35.000000 11.000000 24.000000 2.000000 51.000000 ... 1853.000000 1826.000000 436.000000 0.000000 -99.000000 0.000000 -99.000000 22.790000 0.961021 0.006296
25% 5.124350e+04 70.550000 5.162950e+04 71.293766 4.906100e+04 1061.500000 138.750000 367.500000 12.250000 1323.250000 ... 4282.000000 15441.500000 6369.750000 335.500000 123.750000 25046.000000 58.330000 459.987500 1.865630 0.128504
50% 9.130100e+04 115.750000 9.166000e+04 116.497116 8.135600e+04 2871.000000 357.000000 950.000000 22.000000 2503.000000 ... 7127.500000 23658.500000 9969.000000 587.500000 176.500000 65498.000000 94.910000 663.910000 2.457530 0.189159
75% 2.310602e+05 409.725000 2.333172e+05 412.029970 2.065555e+05 14815.750000 1079.000000 8319.000000 103.250000 14103.000000 ... 11281.000000 63807.750000 30053.750000 830.250000 220.750000 106100.000000 132.610000 1029.810000 2.897894 0.292240
max 2.504700e+06 69586.400000 2.549696e+06 70267.792892 1.206297e+06 860083.000000 18260.000000 511787.000000 1530.000000 741413.000000 ... 83437.000000 393507.000000 662615.000000 1658.000000 333.000000 211164.000000 415.270000 2804.820000 18.351044 0.821949

8 rows × 50 columns

In [8]:
# Drop the 'Unassigned' row (cases not attributed to a county); the 'Out of NY' row remains
confirmed_cases = confirmed_cases[confirmed_cases['Admin2'] != 'Unassigned']
In [9]:
confirmed_cases.head(64)
Out[9]:
UID iso2 iso3 code3 FIPS Admin2 Province_State Country_Region Lat Long_ ... 3/21/2020 3/22/2020 3/23/2020 3/24/2020 3/25/2020 3/26/2020 3/27/2020 3/28/2020 3/29/2020 3/30/2020
1833 84036001 US USA 840 36001.0 Albany New York US 42.600603 -73.977239 ... 88 123 127 146 152 171 187 195 205 217
1834 84036003 US USA 840 36003.0 Allegany New York US 42.257484 -78.027505 ... 2 2 2 2 2 2 2 2 6 7
1835 84036005 US USA 840 36005.0 Bronx New York US 40.852093 -73.862828 ... 0 0 0 0 0 0 0 0 0 0
1836 84036007 US USA 840 36007.0 Broome New York US 42.159032 -75.813261 ... 2 3 3 9 11 16 18 23 29 35
1837 84036009 US USA 840 36009.0 Cattaraugus New York US 42.247782 -78.679231 ... 0 0 0 0 0 0 0 1 4 6
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1891 84036117 US USA 840 36117.0 Wayne New York US 43.154944 -77.029765 ... 3 3 3 6 7 8 11 12 12 15
1892 84036119 US USA 840 36119.0 Westchester New York US 41.162784 -73.757417 ... 1387 1873 2894 3891 4691 5944 7187 7875 8519 9326
1893 84036121 US USA 840 36121.0 Wyoming New York US 42.701451 -78.221996 ... 2 2 2 4 4 7 7 7 8 8
1894 84036123 US USA 840 36123.0 Yates New York US 42.635055 -77.103699 ... 0 0 0 0 0 0 0 0 0 0
3181 84080036 US USA 840 80036.0 Out of NY New York US 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0

63 rows × 80 columns

Spatial Analysis

This part demonstrates a spatial correlation analysis between population and COVID-19 confirmed cases in New York State.

Spatial distribution

In [10]:
from urllib.request import urlopen
import json
#with urlopen('https://raw.githubusercontent.com/cybergis/COVID_19/master/counties_update_new.geojson') as response:
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

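Show the spatial distribution of COVID-19 confirmed cases in New York State on 3/29/2020 as a Mapbox choropleth map with Plotly. Counties are colored on a logarithmic scale, np.log1p(cases), and the color bar labels are converted back to case counts. Rendering the interactive map may take a few seconds.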

In [ ]:
%%time
fig = go.Figure(
    go.Choroplethmapbox(
        geojson=counties, locations=confirmed_cases.FIPS, 
        z=np.log1p(confirmed_cases['3/29/2020']),
#         z=confirmed_cases['3/29/20'],
        colorscale="reds", marker_opacity=0.5, marker_line_width=0,
        ids = confirmed_cases['Admin2'],  
        name = 'Confirmed Cases',
        colorbar_thickness = 10,
        hoverinfo = 'text',
        text = confirmed_cases['Admin2'] + ', ' + confirmed_cases['Province_State'] + '<br>' + confirmed_cases['3/29/2020'].astype('str'),  # Plotly hover text breaks lines with <br>
#         showlegend = True,
        showscale = True,
        colorbar = dict(
            title = "# confirmed cases",
            titleside = 'top',
            tickmode = 'array',
            tickvals = np.arange(11),
            ticktext = np.round(np.exp(np.arange(0,11)) - 1),
            ticks = 'inside',
            outlinewidth = 0
        )
    ))
fig.update_layout(mapbox_style="carto-positron",
                  mapbox_zoom=5, #mapbox_center = {"lat": 37.0902, "lon": -95.7129},)
                  mapbox_center={"lat": 42.7, "lon": -76},
                 )
fig.update_layout(margin={"r":10,"t":10,"l":10,"b":10})

fig.show()
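The choropleth colors each county by np.log1p(cases), so the color bar ticks at 0, 1, ..., 10 are relabeled with exp(k) - 1. The sketch below (the helper name log_ticks is hypothetical; the case counts are from the 3/30/2020 column of the table above) shows that this back-transform is exactly np.expm1, the inverse of np.log1p:

```python
import numpy as np


def log_ticks(max_exponent):
    """Tick positions on a log1p color scale and the case counts they stand for."""
    tickvals = np.arange(max_exponent + 1)    # positions in log1p space: 0, 1, 2, ...
    ticktext = np.round(np.expm1(tickvals))   # back-transform: exp(k) - 1
    return tickvals, ticktext


cases = np.array([0, 7, 217, 9326])   # Bronx, Allegany, Albany, Westchester
z = np.log1p(cases)                   # the values actually colored on the map

tickvals, ticktext = log_ticks(10)
print(ticktext[:4])
```

Because log1p(9326) is about 9.1, ticks from 0 to 10 cover the full range of counties; log1p is preferred over log so that counties with 0 cases map to 0 instead of -inf.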

Show the same data as a Mapbox density map with Plotly. Each county contributes a point at its coordinates (Lat, Long_), weighted by the log-transformed case count. Rendering the interactive map typically takes well under a second.

In [ ]:
%%time 
fig = go.Figure(
    go.Densitymapbox(
        name = 'Density of Confirmed Cases',
        opacity = 0.7,
        z = np.log1p(confirmed_cases['3/29/2020']),
        lat = confirmed_cases['Lat'],
        lon = confirmed_cases['Long_'],
        colorscale = 'reds',
        radius = 30,
        
        text = confirmed_cases['Admin2'] + ', ' + confirmed_cases['Province_State'] + '<br>' + confirmed_cases['3/29/2020'].astype('str'),
        hoverinfo = 'text',
        colorbar = dict(
            title = "# confirmed cases",
            titleside = 'top',
            tickmode = 'array',
            tickvals = np.arange(11),
            ticktext = np.round(np.exp(np.arange(0,11)) - 1),
            ticks = 'inside',
            outlinewidth = 0
        )
    )
)
fig.update_layout(mapbox_style="carto-positron",
                  mapbox_zoom=5, #mapbox_center = {"lat": 37.0902, "lon": -95.7129},)
                  mapbox_center={"lat": 42.7, "lon": -76})
fig.update_layout(margin={"r":0.1,"t":0.1,"l":0.1,"b":0.1})

fig.show()

Plot the trend in the number of COVID-19 confirmed cases for every county in New York State. First, reshape the table so that each row is a date and each column is a county.

In [13]:
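# Index by county name, keep only the date columns (headers such as '3/29/2020'),
# and transpose so that rows are dates and columns are counties.
# (After set_index, only 10 metadata columns remain, so iloc[11:] would skip the first date.)
nyc_count = confirmed_cases.set_index('Admin2')
nyc_count = nyc_count.filter(regex=r'^\d{1,2}/\d{1,2}/\d{2,4}$').T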
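On a two-county excerpt (values for Albany and Broome taken from the confirmed-case table above), the reshaping step looks like this; it is a toy illustration, not the full table:

```python
import pandas as pd

# Wide layout, as in the JHU file: one row per county, one column per date
wide = pd.DataFrame({
    "Admin2": ["Albany", "Broome"],
    "Lat": [42.600603, 42.159032],
    "3/28/2020": [195, 23],
    "3/29/2020": [205, 29],
})

# Index by county, keep the date columns, transpose: rows = dates, columns = counties
trend = wide.set_index("Admin2")[["3/28/2020", "3/29/2020"]].T
print(trend)
```

In this layout each county becomes one line in the cufflinks plot, with dates on the x-axis.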

Drag a rectangle on the plot to zoom in, and hover over a line to see its values.

In [14]:
nyc_count[-30:].iplot(asFigure=True, xTitle="Date", yTitle="Confirmed Cases",
                title = "Trend in number of confirmed cases in New York",
               )

Plot the same trend with a logarithmic y-axis.

In [15]:
nyc_count[-30:].iplot(asFigure=True, xTitle="Date", yTitle="Confirmed Cases",
                title = "Trend in number of confirmed cases in New York (Log Scale)",
                     logy = True
               )

Spatio-temporal visualization

Map the number of confirmed cases on each day of the past week in New York State, using a slider to switch between days. Loading the interactive maps may take around 30 seconds.

In [ ]:
%%time
fig = go.Figure()
dates_ = dates[-7:]
for date in dates_:
    fig.add_trace(
        dict(
            type="choroplethmapbox",
            visible = False,
            geojson=counties, locations=confirmed_cases.FIPS, 
            z=np.log1p(confirmed_cases[date]),
            colorscale="reds", marker_opacity=0.5, marker_line_width=0,
            ids = confirmed_cases['Admin2'],  
            name = 'Confirmed Cases',
            colorbar_thickness = 10,
            hoverinfo = 'text',
            text = confirmed_cases['Admin2'] + ', ' + confirmed_cases['Province_State'] + '<br>' + confirmed_cases[date].astype('str'),
            showscale = True,
            zmin = 0,
            zmax = 11,
            colorbar = dict(
#                 title = "# confirmed cases",
                titleside = 'top',
                tickmode = 'array',
                tickvals = np.arange(11),
                ticktext = np.round(np.exp(np.arange(0,11)) - 1),
                ticks = 'inside',
                outlinewidth = 0,
                tickfont = {'color':'#a9a9a9'},
                x = 1
            )
        )
    )

steps = []
for i in range(len(fig.data)):
    step = dict(
        method='restyle',
        args=["visible", [False] * len(fig.data)],
        label = dates_[i],
    )
    step["args"][1][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Date: "},
    pad={"t": 0, 'l' : 50, 'r':50},
    lenmode = 'fraction',
    len = 0.8,
    transition = {'easing': 'sin'},
    font = {'color':'#a9a9a9'},
    steps=steps,
)]

fig.update_layout(
    sliders=sliders
)

fig.data[0].visible = True
    
fig.update_layout(
    mapbox_style="carto-positron",
    mapbox_zoom=5, #mapbox_center = {"lat": 37.0902, "lon": -95.7129},)
    mapbox_center={"lat": 42.7, "lon": -76},
    margin={"r":10,"t":50,"l":15,"b":10},
    title={
        'text': "Confirmed Cases during the Past Week in the State of New York",
        'xref': "container"
    },
)
fig.show()

Drag the slider to change the date shown.
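Both slider maps build one step per trace: a 'restyle' call that sets the 'visible' attribute to a list containing a single True. Factored into a helper (slider_steps is a hypothetical name, not part of Plotly), the logic of the step loop is:

```python
def slider_steps(labels):
    """Build Plotly slider steps that show exactly one trace per step."""
    steps = []
    for i, label in enumerate(labels):
        visible = [False] * len(labels)   # hide every trace...
        visible[i] = True                 # ...except the i-th one
        steps.append(dict(method="restyle", args=["visible", visible], label=label))
    return steps


steps = slider_steps(["3/27/2020", "3/28/2020", "3/29/2020"])
print(steps[1])
```

The resulting list could be passed as steps=... in the sliders dictionary of either map, which would remove the duplicated loop.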

Map the daily increases in confirmed cases during the past week in New York State, computed as the difference between consecutive days. Loading the interactive maps may take around 30 seconds.

In [ ]:
%%time
import warnings
warnings.filterwarnings("ignore")
fig = go.Figure()
dates_ = dates[-8:]
for i in range(1,8):
    date = dates_[i]
    yesterday = dates_[i-1]
    fig.add_trace(
        dict(
            type="choroplethmapbox",
            visible = False,
            geojson=counties, locations=confirmed_cases.FIPS, 
            z=np.log1p(confirmed_cases[date] - confirmed_cases[yesterday]),
            colorscale="reds", marker_opacity=0.5, marker_line_width=0,
            ids = confirmed_cases['Admin2'],  
            name = 'Confirmed Cases',
            colorbar_thickness = 10,
            hoverinfo = 'text',
            text = confirmed_cases['Admin2'] + ', ' + confirmed_cases['Province_State'] + ' - Daily Increase: ' + (confirmed_cases[date] - confirmed_cases[yesterday]).astype('str'),
            showscale = True,
            zmin = 0,
            zmax = 8,
            colorbar = dict(
#                 title = "# confirmed cases",
                titleside = 'top',
                tickmode = 'array',
                tickvals = np.arange(0,9),
                ticktext = np.round(np.exp(np.arange(0,9)) - 1),
                ticks = 'inside',
                outlinewidth = 0,
                tickfont = {'color':'#a9a9a9'},
                x = 1
            )
        )
    )

steps = []
for i in range(len(fig.data)):
    step = dict(
        method='restyle',
        args=["visible", [False] * len(fig.data)],
        label = dates_[i+1],
    )
    step["args"][1][i] = True  # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Date: "},
    pad={"t": 0, 'l' : 50, 'r':50},
    lenmode = 'fraction',
    len = 0.8,
    transition = {'easing': 'sin'},
    font = {'color':'#a9a9a9'},
    steps=steps,
)]

fig.update_layout(
    sliders=sliders
)

fig.data[0].visible = True
    
fig.update_layout(
    mapbox_style="carto-positron",
    mapbox_zoom=5, #mapbox_center = {"lat": 37.0902, "lon": -95.7129},)
    mapbox_center={"lat": 42.7, "lon": -76},
    margin={"r":10,"t":50,"l":15,"b":10},
    title={
        'text': "Daily Increases during the Past Week in the State of New York",
        'xref': "container"
    },
)

fig.show()
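The cell above subtracts consecutive columns one date at a time. A sketch of computing all daily increases at once with DataFrame.diff(axis=1) follows; the two-county excerpt uses values from the table above. It also checks for negative increases (possible after data corrections), for which np.log1p returns NaN (or -inf at exactly -1) and the county would be left uncolored:

```python
import pandas as pd

# Cumulative counts for two counties (values from the confirmed-case table above)
cumulative = pd.DataFrame(
    {"3/27/2020": [187, 18], "3/28/2020": [195, 23], "3/29/2020": [205, 29]},
    index=["Albany", "Broome"],
)

# Daily increase = today's cumulative count minus yesterday's, along the date axis;
# the first date has no previous day, so its column (all NaN) is dropped
daily = cumulative.diff(axis=1).iloc[:, 1:]

# Flag negative increases, which would break the log1p color scale
has_negative = bool((daily < 0).to_numpy().any())
print(daily)
print("Negative increases present:", has_negative)
```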

Spatial Correlation Analytics

In [16]:
sns.set(style='darkgrid', palette="deep", font_scale=1.1, rc={"figure.figsize": [10, 8]})
# histplot replaces seaborn's deprecated distplot; bars show the number of counties per bin
sns.histplot(pop['POP2012'], kde=False).set(xlabel='POP2012', ylabel='Count');
plt.savefig('POP2012_distplot.png')
In [17]:
sns.jointplot(x=pop['POP2012'], y=pop['POP2010']);
In [18]:
sns.jointplot(x=pop['POP2012'], y=pop['POP12_SQMI']);
In [19]:
%%time
merged_population = pop.merge(confirmed_cases, on=["Admin2"], how='outer')
merged_population.head()
CPU times: user 11.7 ms, sys: 1.48 ms, total: 13.2 ms
Wall time: 11.9 ms
Out[19]:
NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS_x POP2010 POP10_SQMI POP2012 POP12_SQMI WHITE ... 3/21/2020 3/22/2020 3/23/2020 3/24/2020 3/25/2020 3/26/2020 3/27/2020 3/28/2020 3/29/2020 3/30/2020
0 Albany New York 36 001 36001 304204.0 570.6 305432.0 572.871183 237873.0 ... 88 123 127 146 152 171 187 195 205 217
1 Allegany New York 36 003 36003 48946.0 47.3 49415.0 47.771655 47085.0 ... 2 2 2 2 2 2 2 2 6 7
2 Bronx New York 36 005 36005 1385108.0 32939.5 1397295.0 33229.369798 386497.0 ... 0 0 0 0 0 0 0 0 0 0
3 Broome New York 36 007 36007 200600.0 280.3 200701.0 280.461425 176444.0 ... 2 3 3 9 11 16 18 23 29 35
4 Cattaraugus New York 36 009 36009 80317.0 60.7 80840.0 61.126654 74639.0 ... 0 0 0 0 0 0 0 1 4 6

5 rows × 136 columns
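The merge above joins on county names. An alternative sketch joins on FIPS codes and uses indicator=True to list rows that did not match. It assumes that the shapefile stores FIPS as zero-padded text (as the CNTY_FIPS values such as 001 suggest) while the JHU file stores it as a float; the frames pop_small and cases_small are hypothetical excerpts using values from the tables above:

```python
import pandas as pd

pop_small = pd.DataFrame({"FIPS": ["36001", "36003"],
                          "NAME": ["Albany", "Allegany"],
                          "POP2012": [305432, 49415]})
cases_small = pd.DataFrame({"FIPS": [36001.0, 36003.0, 80036.0],
                            "Admin2": ["Albany", "Allegany", "Out of NY"],
                            "3/29/2020": [205, 6, 0]})

# Convert 36001.0 -> '36001' so both keys have the same type
# (this assumes no missing FIPS values in the New York rows)
cases_small["FIPS"] = cases_small["FIPS"].astype(int).astype(str).str.zfill(5)

merged = pop_small.merge(cases_small, on="FIPS", how="outer", indicator=True)

# Rows present in only one input are labeled 'left_only' or 'right_only'
unmatched = merged.loc[merged["_merge"] != "both", ["FIPS", "Admin2", "_merge"]]
print(unmatched)
```

Here the only unmatched row is 'Out of NY', which has no polygon and therefore no population values after the outer merge.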

Exploratory data analysis for population data and COVID-19 Confirmed Cases

In [20]:
%%time
fig, ax = plt.subplots(1,2, figsize=(18,18))
merged_population.plot(column='POP2012', scheme='Quantiles', k=5, cmap='YlGnBu', legend=True, ax=ax[0]);
merged_population.plot(column='3/29/2020', scheme='Quantiles', k=5, cmap='YlGnBu', legend=True, ax=ax[1]);
plt.tight_layout()
ax[0].set_title("Population Count")
ax[1].set_title("COVID-19 Confirmed Cases on 3/29/2020")
plt.savefig('comparison.png', bbox_inches="tight")
plt.show()
CPU times: user 3.21 s, sys: 258 ms, total: 3.47 s
Wall time: 3.92 s
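With scheme='Quantiles' and k=5, geopandas (through mapclassify) assigns each county to one of five classes containing roughly equal numbers of counties. A sketch of the same idea with pandas.qcut, on POP2012 values for five counties from the table above; mapclassify's boundary handling may differ slightly, so this illustrates the scheme rather than reproducing the maps:

```python
import pandas as pd


def quantile_classes(values, k=5):
    """Assign each value to one of k quantile classes (0 = lowest)."""
    return pd.qcut(values, q=k, labels=False)


pop2012 = pd.Series([305432, 49415, 1397295, 200701, 80840],
                    index=["Albany", "Allegany", "Bronx", "Broome", "Cattaraugus"])
classes = quantile_classes(pop2012, k=5)
print(classes)
```

Because classes are based on ranks, a quantile map hides how large the gaps between counties are, which matters for variables as skewed as population.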

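The two maps show similar spatial patterns, which suggests a spatial association between population and COVID-19 confirmed cases; this visual comparison is qualitative rather than a formal test. Note also that in this snapshot the Bronx row reports 0 confirmed cases despite its large population, so county-level comparisons within New York City should be read with care.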
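A common way to quantify spatial autocorrelation formally is global Moran's I. It is not part of the original notebook; the sketch below implements it with numpy on a hand-built toy weights matrix. For the real counties the weights would come from polygon adjacency (for example, contiguity weights built with PySAL's libpysal), and significance would need a permutation test (for example, the esda package):

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I for values `x` and a spatial weights matrix `w`."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()          # deviations from the mean
    n = x.size
    # I = (n / sum of weights) * (z' W z) / (z' z)
    return n * (z @ w @ z) / (w.sum() * (z @ z))


# Toy example: four areas in a row, each adjacent to its neighbors (binary weights)
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

print(morans_i([1, 2, 3, 4], w))   # positive: similar values sit next to each other
print(morans_i([1, 4, 1, 4], w))   # negative: neighbors always differ
```

Values above the null expectation of -1/(n - 1) indicate positive spatial autocorrelation.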

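Compute the Pearson correlation matrix between the population attributes (total population, density, sex, and race/ethnicity counts) and the cumulative confirmed cases for each date from 3/23/2020 to 3/30/2020, and plot it as a heatmap.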

In [21]:
%%time
columns = ['POP2012','POP12_SQMI','MALES','FEMALES','WHITE','BLACK','AMERI_ES','ASIAN','HAWN_PI','HISPANIC','OTHER','3/23/2020','3/24/2020','3/25/2020', '3/26/2020', 
           '3/27/2020', '3/28/2020','3/29/2020','3/30/2020']

# Pairwise Pearson correlation between columns (missing values are excluded pair by pair)
correlation = merged_population[columns].corr()

fig, ax = plt.subplots(figsize=(12,10))

sns.heatmap(correlation, xticklabels=columns,yticklabels=columns, ax=ax)

plt.show()
CPU times: user 343 ms, sys: 18.1 ms, total: 361 ms
Wall time: 359 ms
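After the outer merge, the 'Out of NY' row has no population values (NaN). DataFrame.corr excludes missing values pairwise, so such a row simply drops out of each population-versus-cases pair. A quick check on a toy frame (POP2012 and 3/29/2020 values for four counties from the tables above, plus one row with missing population):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "POP2012": [305432, 49415, 200701, 80840, np.nan],
    "3/29/2020": [205, 6, 29, 4, 0],
})

# Pearson r with the NaN row excluded automatically...
r_pairwise = df.corr().loc["POP2012", "3/29/2020"]
# ...matches the result on the complete rows only
r_complete = df.dropna().corr().loc["POP2012", "3/29/2020"]
print(r_pairwise, r_complete)
```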

The heatmap shows, across counties, the correlation of each population attribute, including population density (POP12_SQMI), with the cumulative number of COVID-19 confirmed cases in New York State.
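Population in New York counties is highly skewed (in the describe() output above, the POP2012 mean is about 315,045 while the median is 91,660), so Pearson's r can be dominated by a few very large counties. Passing method='spearman' to DataFrame.corr correlates ranks instead. A toy illustration, not the notebook's data:

```python
import pandas as pd

# One extreme value in x, otherwise a perfectly monotonic relationship
df = pd.DataFrame({"x": [1, 2, 3, 4, 100], "y": [1, 2, 3, 4, 5]})

pearson = df.corr(method="pearson").loc["x", "y"]    # pulled down by the extreme value
spearman = df.corr(method="spearman").loc["x", "y"]  # ranks agree perfectly
print(round(pearson, 3), spearman)
```

Applying merged_population[columns].corr(method='spearman') would give a rank-based counterpart to the heatmap above.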
