Coronavirus (COVID-19) Deaths: SQL Data Exploration
- sherry salek
- Jul 20, 2022
- 4 min read
Updated: Jul 20, 2022
According to John Hopkins Medicine Website, Coronavirus is a type of virus. There are many different kinds, and some cause disease. A coronavirus identified in 2019, SARS-CoV-2, has caused a pandemic of respiratory illness, called COVID-19.
We want to explore the number of confirmed deaths data set from Our World in Data Website.
About Our Data
We created two Excel datasets with a lot of information. We divided our excel to two different excels, one with total cases, total deaths, and population information, and the other one with the new vaccinations total vaccinations. The data has been gathered from January 2020 to July 2022. The main variables that we are going to analyze the follow variables:
Total Case
New Cases
Total Deaths
New Deaths
Population
Total Vaccinations
New Vaccination
Objectives
What percentage of people died with Covid-19 comparing to the people who had Covid-19 in Canada?
What percentage of people had Covid-19 comparing to the total population in Canada?
What countries have the highest infection rate compared to their population?
What countries have the highest deaths rate percentage?
What is the total cases, total deaths and death percentage compared to total cases worldwide?
What is the total amount of people in Canada that have been newly vaccinated per day?
We created two different datasets, one for the death information and on for vaccination information.
We are going to use Microsoft SQL Server Management to analyze our data. In order to import our two tables we used SQL Server import and export Wizard, we changed the files to excel workbook.
Let's take a look at our covid_deaths and covid_vaccinations tables:


Let's the examine the columns that we are interested and order them by location and date:

Data Exploration
Now we want to check the total cases and total deaths and comparing them in Canada.

When we check the latest data in July 17, 2022, we can see the total cases about 4 million people have been infected and the total deaths around 43 thousand with a rough estimate around 1.08%.
2. Now we want to look at the Canadian total cases versus the total population.

Canada population is roughly 38 million and according to the latest total cases number which is around 4 million, 0.1% of population got Covid-19 in July 2022.
3. Let's look at countries with the highest infection rate compared to their population. We use the Max() function to return the highest total cases. We also want the percentage population infected, grouping by location and population and order by percent population infected.

As we see here, Faeroe Island part of the Kingdom of Denmark has the highest percent infection since it has a small population. If you are curious about Canada:

4. Let's look at which countries have the highest death percentage. Total deaths column is nvarchar(255) type and we need to change the type to integer in order to show the maximum which is an aggregate function, so we need to cast it in order to convert it into int type. When we look at the data w found out that some continents are in the location by mistake and their continent cell is empty like the picture:

We need to define WHERE clause in order to filter the continents that are not NULL.

Sadly the United States is number one highest deaths and Brazil is the second country. You can find Canada deaths down here:

5. Let's look at the global numbers:

The above image clearly shows the Covid-19 total cases and total deaths in the world.
6. We are checking the total population of Canada versus the new vaccinations. We are using a JOIN clause to combine rows from two tables, based on two related columns, date and location.

As we can see above the number of new vaccinations per day are vary recently.
We want to add up the new vaccinations per day in Canada. We are using the PARTITION BY, which is a sub-clause of the OVER clause. The PARTITION BY clause divides a query’s result set into partitions.
The reason to use it is because we are breaking it up by location so that every time it gets to a new location, we want the count to start over. We also need to order by location and date to add up every single new vaccination to the total vaccination on each day. One more function is the CONVERT() function instead of CAST() function to convert the new_vaccinations into integer.

As you can see above, 30,498 people have been vaccinated on July 14, 2022 and this number will be added up to the rolling people vaccinated, resulting in 86,719,259 people have already been vaccinated in Canada. This number makes sense, since people got their second or third doses.
However we won't be able to add the rolling people vaccinated column to population to get the percentage, since the column has just been created. So we need to create a temporary table. We can use Common Table Expression (CTE), which is a powerful construct in SQL that helps simplify a query. CTEs work as virtual tables (with records and columns), created during the execution of a query, used by the query, and eliminated after query execution.
After creating the temporary table, we can select the desired column in the temp table to check the percentage in Canada.

Another way to create a temporary table with the same result is:
Data Visualization:
Global Numbers

Continent total death

Percent Population Infected Per Country

Comparing the percent population infected of five countries

Dashboard

Comentários