Data Visualization Examples
Below are some examples of data visualizations built with Power BI from free data available on the web. All the raw data files are available for free on Kaggle.
#1 — US CENSUS DEMOGRAPHIC DATA
The analysis used only the file acs2017_county_data.csv, which contains all the data presented in this work.
The file contains demographic and economic data, including ethnicity, income, field of work, means of transportation, sector of work, employment area, and gender.
The file contains info about:
+320 million people
+226 million people are of voting age (71%)
+150 million people are employed
+21 million people are unemployed
THEME
The file describes US counties: total population, gender, ethnicity, income, % of workers, % by commuting method, % of workers by sector, and % of unemployed people.
DATA FILE (acs2017_county_data.csv)
This data file contains information about the roughly 3,200 counties in 52 state-level areas: the 50 US states (Alaska included), the District of Columbia, and Puerto Rico.
DATA CLEANING
The data cleaning replaced the percentage (%) values with integer values to simplify the analysis.
The info about areas outside the contiguous US, such as Alaska and Puerto Rico (less than 1% of the data), was removed because the analysis covered only the contiguous US territory.
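The cleaning itself was done in Power BI, but the same two steps can be sketched in Python for readers who want to reproduce them. This is only an illustration under assumptions: the column names (State, County, TotalPop, Hispanic), the sample rows, and the helper clean_rows are made up for the example, and "integer value" is read here as the head count obtained by applying the percentage to the total population.

```python
import csv
import io

# Hypothetical sample in the spirit of acs2017_county_data.csv;
# column names and figures are assumptions for illustration only.
SAMPLE = """State,County,TotalPop,Hispanic
Alabama,Example County A,50000,2.5
Alaska,Example County B,10000,7.0
Puerto Rico,Example Municipio,20000,99.0
Texas,Example County C,200000,40.0
"""

EXCLUDED = {"Alaska", "Puerto Rico"}  # areas dropped in the cleaning step
PERCENT_COLUMNS = ["Hispanic"]        # columns stored as % of TotalPop


def clean_rows(text):
    """Drop excluded areas and turn percentage columns into integer counts."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["State"] in EXCLUDED:
            continue
        total = int(row["TotalPop"])
        row["TotalPop"] = total
        for col in PERCENT_COLUMNS:
            # Assumption: the integer value is the share applied to TotalPop.
            row[col] = round(total * float(row[col]) / 100)
        cleaned.append(row)
    return cleaned


rows = clean_rows(SAMPLE)
for r in rows:
    print(r["State"], r["TotalPop"], r["Hispanic"])
```

Working with counts instead of percentages makes it possible to sum values across counties without weighting them by population first.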
DATASET LINK
QUESTIONS
1. Perform a correlation analysis between field of work and transportation used among the regions in the USA.
2. What are the factors affecting unemployment?
3. What are the factors that impact the income of each state?
4. What is the relationship between regions, income, and employment?
5. What is the voting population in each area?
FILE DATA STRUCTURE
DATA VISUALIZATION (1)
DATA VISUALIZATION (2)
#2 — FOREST FIRES IN BRAZIL
The downloaded file amazon.csv contains the number of forest fires in Brazil between 1998 and 2017.
The file contains info about:
• 19 years of data
• +690,000 forest fires registered
On average:
• +36,000 forest fires per year
• +3,000 forest fires per month
• +100 forest fires per day across the whole Brazilian territory
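The averages above follow from the totals by simple division. A quick check in Python, assuming 12 months per year and a 30-day month (both simplifications chosen for this example):

```python
# Figures taken from the text above (lower bounds, hence the "+").
total_fires = 690_000
years = 19

per_year = total_fires / years
per_month = per_year / 12
per_day = per_month / 30  # assumption: a 30-day month

print(f"per year:  {per_year:,.0f}")
print(f"per month: {per_month:,.0f}")
print(f"per day:   {per_day:,.0f}")
```

The results land just above 36,000, 3,000, and 100, which matches the rounded figures in the list.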
THEME
The file contains data about forest fires in Brazil, including the year, month, accumulated forest fires, and state (territory).
DATA FILE (amazon.csv)
This data file contains information about 23 states from the Brazilian territory and the total number of forest fires per state.
DATA CLEANING
The data cleaning removed duplicated information, such as the date column. The file already has the year and month of each event, and the date column carried only the year, repeating the same day and month on every row.
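As an illustration of this step outside Power BI, the sketch below first checks that the date column adds nothing beyond the year and then drops it. The column names, the sample rows, and the helpers date_is_redundant and drop_column are assumptions made for this example.

```python
import csv
import io

# Hypothetical rows shaped like amazon.csv; the column names and values
# are assumptions for illustration only.
SAMPLE = """year,state,month,number,date
1998,Acre,Janeiro,10,1998-01-01
1998,Acre,Fevereiro,20,1998-01-01
1999,Acre,Janeiro,30,1999-01-01
"""


def date_is_redundant(text):
    """True when every date only repeats the year (always January 1st)."""
    return all(
        row["date"] == f'{row["year"]}-01-01'
        for row in csv.DictReader(io.StringIO(text))
    )


def drop_column(text, column):
    """Return the rows of a CSV text without the given column."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row.pop(column, None)
        rows.append(row)
    return rows


if date_is_redundant(SAMPLE):
    rows = drop_column(SAMPLE, "date")
else:
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))

print(list(rows[0].keys()))
```

Checking before dropping avoids discarding a column that turns out to hold real day-level information in some rows.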
DATASET LINK
QUESTIONS
1. Which Brazilian states (territories) have the highest number of forest fires?
2. What is the trend in forest fires in Brazil, per state?
3. How much is the Amazon forest at risk, considering its extent across the affected areas (states)?
4. Where should the Brazilian government concentrate its efforts to prevent forest fires?
FILE DATA STRUCTURE
DATA VISUALIZATION (1)
DATA VISUALIZATION (2)
DATA VISUALIZATION (3)
#3 — WAVES MEASURING — BUOYS DATA
The downloaded file Coastal Data System — Waves (Mooloolaba) 01–2017 to 06–2019.csv contains measured and calculated wave parameters collected by oceanographic wave-measuring buoys anchored at Mooloolaba.
The file contains info about:
+30 months of data collected
+43,000 rows of data
THEME
The file contains info about ocean waves, including significant wave height (the average of the highest third of the waves), the maximum wave height, the zero up-crossing wave period, the peak energy wave period, the direction from which the peak period waves are coming, and an approximation of sea surface temperature.
DATA FILE (Coastal Data System — Waves (Mooloolaba) 01–2017 to 06–2019.csv)
All measurements were taken at 30-minute intervals.
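The 30-minute interval ties in with the row count given earlier. A rough estimate, assuming one reading every 30 minutes with no gaps from the first day of 01-2017 to the last day of 06-2019 (an assumption based on the file name, not on the data itself):

```python
from datetime import date

# Period taken from the file name: 01-2017 to 06-2019 inclusive.
start = date(2017, 1, 1)
end = date(2019, 7, 1)  # first day after the period

days = (end - start).days
readings_per_day = 24 * 2  # one reading every 30 minutes
expected_rows = days * readings_per_day

print(days, expected_rows)
```

The estimate is in line with the "+43,000 rows" reported for the file.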
DATA CLEANING
The data cleaning removed all rows with negative values (e.g. -99.9), which probably represent reading errors in the measuring equipment.
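The same filter can be sketched in Python. The column names (Hs, Hmax, Tz, Tp, SST), the sample rows, and the helper remove_negative_rows are assumptions made for this example; the direction column is left out of the sample to keep it short.

```python
import csv
import io

# Hypothetical rows; the column names and values are assumptions
# for illustration only.
SAMPLE = """Date/Time,Hs,Hmax,Tz,Tp,SST
01/01/2017 00:00,-99.9,-99.9,-99.9,-99.9,-99.9
01/01/2017 00:30,0.875,1.39,4.421,4.506,25.65
01/01/2017 01:00,0.763,1.15,4.52,5.513,-99.9
"""

MEASURE_COLUMNS = ["Hs", "Hmax", "Tz", "Tp", "SST"]


def remove_negative_rows(text):
    """Keep only the rows where every measurement is zero or positive."""
    kept = []
    for row in csv.DictReader(io.StringIO(text)):
        values = [float(row[col]) for col in MEASURE_COLUMNS]
        if all(v >= 0 for v in values):
            kept.append(row)
    return kept


rows = remove_negative_rows(SAMPLE)
for r in rows:
    print(r["Date/Time"], r["Hs"], r["SST"])
```

Note that this drops the whole row even when a single measurement is negative, which matches the cleaning described above but discards the valid values on that row.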
DATASET LINK
QUESTIONS
1. The relationship between the weather and the ocean surface temperature.
2. Identify areas where it is difficult to navigate due to large, strong waves.
3. General oceanographic study, such as historical wave patterns and variations, including direction, strength, temperature, etc.
4. (possible) With this same kind of data file for different locations, it would be possible to identify the best place to install equipment that generates energy from wave power.
FILE DATA STRUCTURE
DATA VISUALIZATION (1)
DATA VISUALIZATION (2)
#4 — Blue Mountain (Collingwood) Weather
The downloaded file collingwood_1994_2019.csv contains daily weather data for the Blue Mountain (Collingwood) area between 1994-12-30 and 2019-06-21.
The file contains info about:
+25 years of data
+10,000 rows
12 columns of data
THEME
Weather conditions, including daily measures of Max Temperature (°C), Min Temperature (°C), Mean Temperature (°C), Total Precip (mm), Dir of Max Gust (10s deg), Dir of Max Gust Flag, and Speed of Max Gust (km/h).
DATA FILE (collingwood_1994_2019.csv)
The file contains daily data about the weather in the Blue Mountain (Collingwood) area.
DATA CLEANING
Some days have incomplete information, but working with grouped data (e.g. using max and min) minimizes the impact of the empty values.
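To show why grouping softens the effect of empty cells, here is a minimal Python sketch that groups daily rows by month and takes the highest maximum and lowest minimum while skipping blanks. The column names (Date, Max Temp, Min Temp), the sample values, and the helper monthly_extremes are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical daily rows with some empty cells; names and values are
# assumptions for illustration only.
SAMPLE = """Date,Max Temp,Min Temp
2019-01-01,-2.0,-9.5
2019-01-02,,-12.0
2019-01-03,1.5,
2019-02-01,,
2019-02-02,3.0,-4.0
"""


def monthly_extremes(text):
    """Return {month: (highest max, lowest min)}, ignoring empty cells."""
    highs = defaultdict(list)
    lows = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        month = row["Date"][:7]  # "YYYY-MM"
        if row["Max Temp"]:
            highs[month].append(float(row["Max Temp"]))
        if row["Min Temp"]:
            lows[month].append(float(row["Min Temp"]))
    # Only months with at least one value in each column are reported.
    return {
        month: (max(highs[month]), min(lows[month]))
        for month in sorted(highs.keys() & lows.keys())
    }


extremes = monthly_extremes(SAMPLE)
for month, (high, low) in extremes.items():
    print(month, high, low)
```

A month still gets a result as long as some of its days have values, so a few incomplete days do not break the monthly view.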
DATASET LINK
QUESTIONS
0. Should I buy a ski pass next year? (Kaggle)
1. Perform a correlation analysis between the wind speed and the min temperature.
2. Identify the historically worst and best months (coldest and warmest weather).
3. Identify in which period of the year the wind is stronger.
4. General analysis of future weather trends based on past (historical) behavior.
FILE DATA STRUCTURE
DATA VISUALIZATION (1)
DATA INTERPRETATION
Without a doubt, the most important part of analyzing the graphs is the interpretation.
…
REFERENCES
> Dataset #1 — US Census Demographic Data
• https://www.kaggle.com/muonneutrino/us-census-demographic-data
> Dataset #2 — Forest Fires in Brazil
• https://www.kaggle.com/gustavomodelli/forest-fires-in-brazil
• Brazilian Government — http://dados.gov.br/dataset/sistema-nacional-de-informacoes-florestais-snif
• https://www.oeco.org.br/dicionario-ambiental/28783-o-que-e-a-amazonia-legal/
> Dataset #3 — Waves Measuring Buoys Data
• https://www.kaggle.com/jolasa/waves-measuring-buoys-data-mooloolaba
• Queensland Government Data — https://data.qld.gov.au/dataset
> Dataset #4 — Blue Mountain (Collingwood) Weather
• https://www.kaggle.com/metcalfepete/blue-mountain-collingwood-weather
Thank you! ;-)