Data Visualization Examples

Caio Gasparine
Oct 29, 2020

Below are some examples of using Power BI to generate data visualizations from free data available on the web. All the raw data files are available for free on Kaggle.

#1 — US CENSUS DEMOGRAPHIC DATA

The analysis was performed only on the file acs2017_county_data.csv, which contains all the data used and presented in this work.

The file contains demographic and economic data, including ethnicity, income, field of work, means of transportation, sector of work, employment, and gender.

The file contains info about:

+320 million people

+226 million people are of voting age (71%)

+150 million people are employed

+21 million people are unemployed

THEME

The file contains county-level information for the US: total population, gender, ethnicity, income, % of workers, % by means of commuting, % of workers by sector, and % of unemployed people.

DATA FILE (acs2017_county_data.csv)

This data file contains information about roughly 3,200 counties across the 52 state-level areas it covers (the 50 states, the District of Columbia, and Puerto Rico), so Alaska and Puerto Rico are included.

DATA CLEANING

The data cleaning consisted of replacing the percentage values with integer values, to simplify the analysis.

The rows for areas outside the contiguous US, such as Alaska and Puerto Rico (less than 1% of the data), were removed because the analysis covered only the contiguous US territory.
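One way to read the two cleaning steps above is sketched below in Python. The article itself used Power BI, so this is an illustration and not the author's procedure. The column names `State`, `Employed`, and `Drive`, the helper names, and the three sample rows are assumptions made up for the example; check them against the real file before reusing the code.

```python
import csv
import io

# Made-up sample rows for illustration; the column names are assumed.
SAMPLE_CSV = """State,County,Employed,Drive
Alabama,Autauga County,24000,85.0
Alaska,Anchorage Municipality,150000,75.0
Puerto Rico,Adjuntas Municipio,3500,80.0
"""

# Areas the article names as removed from the analysis.
EXCLUDED_AREAS = {"Alaska", "Puerto Rico"}


def percent_to_count(percent, base):
    """Turn a percentage of `base` into a rounded integer count."""
    return round(float(percent) / 100 * int(base))


def clean_rows(rows):
    """Drop excluded areas and add an integer count next to the percentage."""
    cleaned = []
    for row in rows:
        if row["State"] in EXCLUDED_AREAS:
            continue
        row = dict(row)  # copy so the input rows are left untouched
        row["DriveCount"] = percent_to_count(row["Drive"], row["Employed"])
        cleaned.append(row)
    return cleaned


cleaned = clean_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in cleaned:
    print(row["State"], row["County"], row["DriveCount"])
```

Which base a percentage refers to (total population, workers, labor force) differs by column, so the pairing of `Drive` with `Employed` here is only a guess at a sensible base.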

DATASET LINK

https://www.kaggle.com/muonneutrino/us-census-demographic-data

QUESTIONS

1. Perform a correlation analysis between field of work and means of transportation across the regions of the USA.

2. What are the factors affecting unemployment?

3. What are the factors that impact the income of each state?

4. What is the relationship between region, income, and employment?

5. What is the voting population in each area?
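As a rough illustration of how question 1 could be approached outside Power BI, the sketch below computes a Pearson correlation coefficient in plain Python. The two lists are invented values standing in for a "field of work" percentage and a "means of transportation" percentage per county; they are not taken from the dataset, and the choice of Pearson correlation is an assumption about what "correlation analysis" means here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up county values for illustration only (not taken from the dataset):
# share of workers in professional occupations vs. share commuting by transit.
professional_pct = [25.0, 30.0, 35.0, 40.0, 45.0]
transit_pct = [0.5, 1.0, 2.5, 3.0, 6.0]

r = pearson(professional_pct, transit_pct)
print(f"Pearson r = {r:.2f}")
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests none; it says nothing about causation.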

FILE DATA STRUCTURE

Raw data — file structure (acs2017_county_data.csv)

DATA VISUALIZATION (1)

Simple view

DATA VISUALIZATION (2)

Comparative total population — considering men and women

#2 — FOREST FIRES IN BRAZIL

The downloaded file amazon.csv contains the number of forest fires in Brazil between 1998 and 2017.

The file contains info about:

• 19 years of data

• +690,000 forest fires registered

On average:

• +36,000 forest fires per year

• +3,000 forest fires per month

• +100 forest fires per day across the whole Brazilian territory
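The averages above follow from the two totals quoted in the list. The short Python sketch below reproduces the arithmetic; the 30-day month used for the daily figure is an assumption, chosen because it matches the "+100 per day" value.

```python
# Totals quoted in the article.
total_fires = 690_000  # "+690,000 forest fires registered"
years_of_data = 19     # "19 years of data"

per_year = total_fires / years_of_data
per_month = per_year / 12
per_day = per_month / 30  # assumption: 30-day months

print(f"{per_year:,.0f} per year")
print(f"{per_month:,.0f} per month")
print(f"{per_day:,.0f} per day")
```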

THEME

The file contains data about forest fires in Brazil, including the year, month, accumulated forest fires, and state (territory).

DATA FILE (amazon.csv)

This data file contains information about 23 states from the Brazilian territory and the total number of forest fires per state.

DATA CLEANING

The data cleaning removed some duplicated information, such as the date column. The year and month of each event are already recorded in their own columns, and the date column carried only the year, repeating the same day and month in every row of the file.
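A minimal Python sketch of this cleaning step follows. It first checks that the date column adds nothing beyond the year and only then drops it. The column names (`year`, `state`, `month`, `number`, `date`), the `YYYY-01-01` date pattern, and the sample rows are assumptions for illustration, not a description of the real file.

```python
import csv
import io

# Made-up sample rows; the column names are assumed for illustration.
SAMPLE_CSV = """year,state,month,number,date
1998,Acre,Janeiro,0,1998-01-01
1998,Acre,Fevereiro,0,1998-01-01
1999,Acre,Janeiro,0,1999-01-01
"""


def date_is_redundant(rows):
    """True when every `date` value carries nothing beyond the `year` column."""
    return all(row["date"] == f"{row['year']}-01-01" for row in rows)


def drop_column(rows, name):
    """Return copies of the rows without the given column."""
    return [{key: value for key, value in row.items() if key != name} for row in rows]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
if date_is_redundant(rows):
    rows = drop_column(rows, "date")
print(rows[0])
```

Checking before dropping keeps the step safe if a future version of the file starts storing real dates.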

DATASET LINK

https://www.kaggle.com/gustavomodelli/forest-fires-in-brazil

QUESTIONS

1. Which Brazilian states (territories) have the highest number of forest fires?

2. What is the trend in forest fires in Brazil, per state?

3. How much is the Amazon forest at risk, considering its extent across the affected areas (states)?

4. Where should the Brazilian government focus its greatest efforts to prevent forest fires?
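Question 1 comes down to summing the fire counts per state and sorting the totals. The Python sketch below shows the idea; the records are invented numbers for illustration only, and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up (state, fires) records for illustration; not real figures.
records = [
    ("Mato Grosso", 120),
    ("Acre", 15),
    ("Mato Grosso", 200),
    ("Bahia", 90),
    ("Acre", 25),
]


def total_by_state(rows):
    """Sum the number of fires reported for each state."""
    totals = defaultdict(int)
    for state, fires in rows:
        totals[state] += fires
    return dict(totals)


def rank_states(totals):
    """Return (state, total) pairs sorted from most to fewest fires."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_states(total_by_state(records))
for state, total in ranking:
    print(f"{state}: {total}")
```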

FILE DATA STRUCTURE

Raw data — file structure (amazon.csv)

DATA VISUALIZATION (1)

Table view — using Excel — conditional formatting

DATA VISUALIZATION (2)

Map of Brazil with forest fire data (Legal Amazon area highlighted)

DATA VISUALIZATION (3)

Power BI dashboard

#3 — WAVES MEASURING — BUOYS DATA

Australia map — highlighted: wave measuring buoys

The downloaded file Coastal Data System — Waves (Mooloolaba) 01–2017 to 06–2019.csv contains measured and calculated wave parameters collected by oceanographic wave-measuring buoys anchored at Mooloolaba.

The file contains info about:

+30 months of data collected

+43,000 rows of data

THEME

The file contains info about ocean waves, including the significant wave height (the average of the highest third of the waves), the maximum wave height, the zero upcrossing wave period, the peak energy wave period, the direction from which the peak period waves are coming, and an approximation of the sea surface temperature.

DATA FILE (Coastal Data System — Waves (Mooloolaba) 01–2017 to 06–2019.csv)

All the measurements were taken at 30-minute intervals.
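The 30-minute interval can be checked against the "+43,000 rows" figure quoted earlier. The sketch below counts the readings expected if the file covered every half hour from 1 January 2017 through 30 June 2019; that exact coverage is an assumption read from the file name, and the real file may have gaps.

```python
from datetime import date

READINGS_PER_DAY = 24 * 60 // 30  # one reading every 30 minutes -> 48 per day


def expected_rows(first_day, day_after_last):
    """Readings expected if no 30-minute slot is missing in the period."""
    return (day_after_last - first_day).days * READINGS_PER_DAY


# Assumed coverage: 2017-01-01 through 2019-06-30 inclusive.
rows = expected_rows(date(2017, 1, 1), date(2019, 7, 1))
print(rows)
```

The result, a little under 44,000, is in line with the row count reported for the file.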

DATA CLEANING

The data cleaning removed all rows with negative values (e.g. -99.9), which probably represent reading errors in the measuring equipment.
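The row filter described above might look like the following Python sketch. The column names (`Date/Time`, `Hs`, `Hmax`, `Tz`, `Tp`, `Peak Direction`, `SST`) and the sample readings are assumptions for illustration; only the -99.9 value comes from the article.

```python
import csv
import io

# Made-up sample rows; the column names are assumed for illustration.
SAMPLE_CSV = """Date/Time,Hs,Hmax,Tz,Tp,Peak Direction,SST
01/01/2017 00:00,-99.9,-99.9,-99.9,-99.9,-99.9,-99.9
01/01/2017 00:30,0.875,1.39,4.421,4.506,-99.9,-99.9
01/01/2017 01:00,0.763,1.15,4.52,5.513,49.0,25.65
"""

TIMESTAMP_COLUMN = "Date/Time"


def has_negative_reading(row):
    """True if any measurement column in the row holds a negative value."""
    return any(
        float(value) < 0
        for column, value in row.items()
        if column != TIMESTAMP_COLUMN
    )


def drop_negative_rows(rows):
    """Keep only the rows in which every measurement is non-negative."""
    return [row for row in rows if not has_negative_reading(row)]


valid_rows = drop_negative_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(len(valid_rows), "valid row(s)")
```

Dropping the whole row also discards any valid readings it contains, as in the second sample row. Blanking only the affected cells would be a gentler alternative if those partial readings matter.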

DATASET LINK

https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba

QUESTIONS

1. What is the relationship between the weather and the sea surface temperature?

2. Identify areas where it is difficult to navigate due to large, strong waves.

3. General oceanographic study, such as historical wave patterns and variations, including direction, strength, temperature, etc.

4. (Possible) With the same kind of data file for different locations, it would be possible to identify the best place to install equipment that generates energy from wave power.

FILE DATA STRUCTURE

Raw data — file structure (Coastal Data System — Waves (Mooloolaba) 01–2017 to 06–2019.csv)

DATA VISUALIZATION (1)

Table view — using Excel — conditional formatting

DATA VISUALIZATION (2)

Excel graph — highlighted: seasonality

#4 — BLUE MOUNTAIN (COLLINGWOOD) WEATHER

The downloaded file collingwood_1994_2019.csv contains daily weather data for the Blue Mountain (Collingwood) area between 1994-12-30 and 2019-06-21.

The file contains info about:

+25 years of data

+10,000 rows

12 columns of data

THEME

Weather conditions, including daily measurements of Max Temperature (°C), Min Temperature (°C), Mean Temperature (°C), Total Precip (mm), Dir of Max Gust (10s deg), Dir of Max Gust Flag, and Speed of Max Gust (km/h).

DATA FILE (collingwood_1994_2019.csv)

The file contains daily data about the weather in the Collingwood area — Blue Mountain.

DATA CLEANING

Some days do not have complete information, but if you work with grouped data (e.g. using max and min), the impact of the empty values can be minimized.
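The idea of grouping to soften the effect of empty values can be sketched in Python as follows. Daily records are grouped by month and the monthly max and min are taken over whatever values are present. The records and the function name are made up for illustration.

```python
from collections import defaultdict

# Made-up daily records (date, max temp in deg C, min temp in deg C).
# An empty string marks a missing value.
daily = [
    ("2019-01-01", "-2.5", "-10.0"),
    ("2019-01-02", "", "-12.5"),
    ("2019-01-03", "1.0", ""),
    ("2019-02-01", "", ""),
]


def monthly_extremes(records):
    """Highest max and lowest min per month, skipping empty values."""
    highs = defaultdict(list)
    lows = defaultdict(list)
    for day, max_temp, min_temp in records:
        month = day[:7]  # "YYYY-MM"
        if max_temp != "":
            highs[month].append(float(max_temp))
        if min_temp != "":
            lows[month].append(float(min_temp))
    months = sorted(set(highs) | set(lows))
    return {
        month: (
            max(highs[month]) if highs[month] else None,
            min(lows[month]) if lows[month] else None,
        )
        for month in months
    }


extremes = monthly_extremes(daily)
print(extremes)
```

A month with no usable values at all, like February in the sample, simply does not appear in the result, which makes the remaining gaps easy to spot.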

DATASET LINK

https://www.kaggle.com/metcalfepete/blue-mountain-collingwood-weather

QUESTIONS

0. Should I buy a ski pass next year? (Kaggle)

1. Perform a correlation analysis between the wind speed and the min temperature.

2. Identify the worst and best months (historically): the coldest and warmest weather.

3. Identify in which period of the year the wind is strongest.

4. General analysis of future weather trends based on past (historical) behavior.
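For question 4, one simple starting point is a least-squares line through yearly averages. The Python sketch below fits such a line; the yearly values are invented for illustration, and a straight-line fit is only a crude summary of past behavior, not a forecast method the article proposes.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept of ys as a linear function of xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly mean temperatures (deg C) for illustration only.
years = [2015, 2016, 2017, 2018]
mean_temps = [6.0, 6.5, 6.2, 6.9]

slope, intercept = linear_trend(years, mean_temps)
print(f"Trend: {slope:+.2f} deg C per year")
```

A positive slope indicates warming over the period and a negative slope cooling; with only a few years of data the sign can easily flip, so the full 1994 to 2019 range should be used.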

FILE DATA STRUCTURE

Raw data — file structure (collingwood_1994_2019.csv)

DATA VISUALIZATION (1)

DATA INTERPRETATION

Without a doubt, the most important part of analyzing the graphs is the interpretation.

REFERENCES

> Dataset #1 — US Census Demographic Data

https://www.kaggle.com/muonneutrino/us-census-demographic-data

> Dataset #2 — Forest Fires in Brazil

https://www.kaggle.com/gustavomodelli/forest-fires-in-brazil

• Brazilian Government — http://dados.gov.br/dataset/sistema-nacional-de-informacoes-florestais-snif

• O Eco (Legal Amazon definition) — https://www.oeco.org.br/dicionario-ambiental/28783-o-que-e-a-amazonia-legal/

> Dataset #3 — Waves Measuring Buoys Data

https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba

• Queensland Government Data — https://data.qld.gov.au/dataset

> Dataset #4 — Blue Mountain (Collingwood) Weather

https://www.kaggle.com/metcalfepete/blue-mountain-collingwood-weather

Thank you! ;-)
