Agile Data Science - Data Visualization

Data visualization plays a significant role in data science and can be treated as one of its components. Data science involves more than building predictive models. It also covers explaining those models and using them to understand data and make decisions. Data visualization is an essential part of presenting data convincingly.

From the data science perspective, data visualization highlights the changes and trends in the data.

Consider the following guidelines for effective data visualization −

  • Position data along a common scale.

  • Bars are more effective than circles and squares for comparing values.

  • Use color purposefully in scatter plots.

  • Use a pie chart to show proportions of a whole.

  • A sunburst visualization works well for hierarchical data.
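To make the guidelines on common scales and bars more concrete, here is a minimal sketch that draws the same values once as a bar chart and once as a pie chart. The category names and percentages are hypothetical, invented only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical shares, invented purely for illustration
labels = ["A", "B", "C", "D"]
shares = [28, 26, 24, 22]

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(8, 3))

# Bars share one baseline and one scale, so small differences stay visible
ax_bar.bar(labels, shares, color="steelblue")
ax_bar.set_ylabel("Share (%)")
ax_bar.set_title("Bar chart")

# The same values as a pie chart: each slice is a part of the whole
ax_pie.pie(shares, labels=labels, autopct="%.0f%%")
ax_pie.set_title("Pie chart")

fig.tight_layout()
```

Calling plt.show() afterwards displays both panels. The pie chart conveys that the four parts add up to a whole, while the bar chart makes it easier to rank values that are close together.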

Agile work benefits from a simple scripting language for data visualization. Because Python is already widely used in data science, it is the suggested language for data visualization.

Example 1

The following example visualizes GDP for selected years. “Matplotlib” is a widely used Python library for data visualization. It can be installed with pip by running: pip install matplotlib


Consider the following code to understand this −

import matplotlib.pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')

# add a title
plt.title("Nominal GDP")

# add a label to the y-axis
plt.ylabel("Billions of $")

# display the chart
plt.show()

The above code generates a line chart with a solid green line and circular markers, plotting GDP in billions of dollars against the year.


There are many ways to customize charts, including axis labels, line styles and point markers. The next example plots time-stamped data and tidies the date labels on the x-axis so that they stay readable.
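As a rough sketch of such customization, the GDP chart from Example 1 can be redrawn with a different line style and marker, plus labels on both axes. The specific styling choices (dashed blue line, square markers, dotted grid) are illustrative assumptions, not recommendations from the original example.

```python
import matplotlib.pyplot as plt

# Same GDP figures as in Example 1
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

fig, ax = plt.subplots()

# Dashed blue line with square markers instead of the solid green line
ax.plot(years, gdp, color="tab:blue", marker="s", linestyle="--",
        label="Nominal GDP")

# Label both axes and put one tick on each plotted year
ax.set_title("Nominal GDP")
ax.set_xlabel("Year")
ax.set_ylabel("Billions of $")
ax.set_xticks(years)

# A light dotted grid and a legend make the values easier to read off
ax.grid(True, linestyle=":")
ax.legend()
```

Calling plt.show() afterwards displays the chart. This sketch uses the Axes methods (ax.plot, ax.set_xlabel) rather than the plt functions, which tends to be easier to manage once a figure holds more than one plot.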

Example 2

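import datetime
import random
import matplotlib.pyplot as plt

# make up some data: twelve hourly timestamps with noisy, rising values
# (the start time is missing in the original; the current time is assumed)
start = datetime.datetime.now()
x = [start + datetime.timedelta(hours=i) for i in range(12)]
y = [i + random.gauss(0, 1) for i, _ in enumerate(x)]

# plot
plt.plot(x, y)

# beautify the x-labels
plt.gcf().autofmt_xdate()

plt.show()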

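The above code generates a line chart of the values against hourly timestamps. Because the data is random, the exact shape of the line differs on each run.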
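The x-label formatting in Example 2 can be taken a step further. The sketch below is one possible variant, assuming a fixed start time and a seeded random generator (both hypothetical choices) so that the chart is the same on every run, and using DateFormatter from matplotlib.dates to show only hours and minutes on the ticks.

```python
import datetime
import random

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

random.seed(42)  # assumed seed so the made-up data repeats between runs
start = datetime.datetime(2024, 1, 1, 8, 0)  # hypothetical fixed start time

# Twelve hourly timestamps with noisy, rising values
x = [start + datetime.timedelta(hours=i) for i in range(12)]
y = [i + random.gauss(0, 1) for i in range(12)]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")

# Show only hour and minute on each tick, then tilt the labels
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()

ax.set_xlabel("Time of day")
ax.set_ylabel("Value")
```

Calling plt.show() afterwards displays the chart. The format string "%H:%M" follows the usual strftime conventions and can be changed to include the date when the data spans several days.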


