Mastering Charts with Matplotlib in Python

Data visualization plays a crucial role in understanding and interpreting complex datasets. Python, with its powerful libraries, provides an excellent environment for data analytics and visualization. One such library, Matplotlib, is a go-to tool for creating a wide variety of charts and plots. In this tutorial, we will explore the fundamentals of Matplotlib and learn how to create and customize different types of charts for effective data analysis.

Matplotlib is a versatile and widely used plotting library in the Python ecosystem. Whether you’re a data scientist, analyst, or enthusiast, mastering Matplotlib empowers you to communicate insights visually. From simple line plots to sophisticated 3D visualizations, Matplotlib has you covered.

Exploring Line Plots with Matplotlib

Basic Line Plot:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Basic Line Plot
plt.plot(x, y, label='Line 1')

# Customize the plot
plt.title('Basic Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()

# Show the plot
plt.show()

Multiple Lines on One Plot:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 2, 1, 2, 1]

# Multiple Lines on One Plot
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')

# Customize the plot
plt.title('Multiple Lines on One Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()

# Show the plot
plt.show()

Line Style and Color:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Line Style and Color
plt.plot(x, y, linestyle='--', color='red', marker='o', label='Dashed Line')

# Customize the plot
plt.title('Line Style and Color')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()

# Show the plot
plt.show()

Types of Line Styles and Markers in Line Charts

Line Styles:

Solid Line:
plt.plot(x, y, linestyle='-', label='Solid Line')
Dashed Line:
plt.plot(x, y, linestyle='--', label='Dashed Line')
Dotted Line:
plt.plot(x, y, linestyle=':', label='Dotted Line')
Dash-Dot Line:
plt.plot(x, y, linestyle='-.', label='Dash-Dot Line')

Line Colors:

plt.plot(x, y, color='red', label='Red Line')

Markers:

Markers indicate specific data points on the line. You can use various markers:

Circle:

plt.plot(x, y, marker='o', label='Circle Marker')

Square:

plt.plot(x, y, marker='s', label='Square Marker')

Triangle Up:

plt.plot(x, y, marker='^', label='Triangle Up Marker')
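
As a rough sketch of how these options combine, the snippet below draws the four line styles on one figure, each paired with one of the markers listed above. The vertical shift applied to each line is my own addition so the lines stay apart; it is not part of the original sample data.

```python
import matplotlib.pyplot as plt

# Sample data (same as the earlier line plot examples)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# (linestyle, marker, label) combinations built from the options listed above
styles = [
    ('-', 'o', 'Solid + Circle'),
    ('--', 's', 'Dashed + Square'),
    (':', '^', 'Dotted + Triangle Up'),
    ('-.', 'o', 'Dash-Dot + Circle'),
]

plt.figure()
lines = []
for offset, (linestyle, marker, label) in enumerate(styles):
    # Assumption for illustration: shift each line up so the styles do not overlap
    shifted_y = [value + 2 * offset for value in y]
    line, = plt.plot(x, shifted_y, linestyle=linestyle, marker=marker, label=label)
    lines.append(line)

# Customize the plot
plt.title('Line Styles and Markers Compared')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
```

Call plt.show() afterwards to display the figure, as in the examples above.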

Bar Charts

Bar charts represent categorical data with rectangular bars, where each bar’s length or height corresponds to the value it represents. The examples below cover basic, stacked, and grouped bar charts using Matplotlib in Python.

Basic Bar Chart:

import matplotlib.pyplot as plt

# Sample data
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values = [30, 50, 20, 40]

# Basic Bar Chart
plt.bar(categories, values, color='blue')

# Customize the plot
plt.title('Basic Bar Chart')
plt.xlabel('Categories')
plt.ylabel('Values')

# Show the plot
plt.show()

Stacked Bar Chart:

import matplotlib.pyplot as plt

# Sample data
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values1 = [30, 50, 20, 40]
values2 = [10, 20, 30, 40]

# Stacked Bar Chart
plt.bar(categories, values1, color='blue', label='Group 1')
plt.bar(categories, values2, bottom=values1, color='orange', label='Group 2')

# Customize the plot
plt.title('Stacked Bar Chart')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend()

# Show the plot
plt.show()
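
With more than two series, each new layer has to start at the combined height of all the layers beneath it, not just the first one. The sketch below keeps a running total in a variable I have called bottom; the 'Group 3' values are invented purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['Category A', 'Category B', 'Category C', 'Category D']

# Hypothetical data: 'Group 3' is made up for this sketch
series = {
    'Group 1': [30, 50, 20, 40],
    'Group 2': [10, 20, 30, 40],
    'Group 3': [5, 10, 15, 20],
}

plt.figure()
bottom = np.zeros(len(categories))  # running height of each stack so far
for label, values in series.items():
    plt.bar(categories, values, bottom=bottom, label=label)
    # The next series starts where this one ended
    bottom = bottom + np.array(values)

# Customize the plot
plt.title('Stacked Bar Chart with Three Groups')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend()
```

After the loop, bottom holds the total height of each stack. Call plt.show() to display the figure.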

Grouped Bar Chart:

import numpy as np
import matplotlib.pyplot as plt

# Sample data
categories = ['Category A', 'Category B', 'Category C', 'Category D']
values1 = [30, 50, 20, 40]
values2 = [10, 20, 30, 40]

# Grouped Bar Chart
bar_width = 0.35  # Width of each bar
index = np.arange(len(categories))  # Generating an array of evenly spaced values representing the categories

# Creating the first set of bars (Group 1)
plt.bar(index, values1, width=bar_width, color='blue', label='Group 1')

# Creating the second set of bars (Group 2), shifted to the right by bar_width
plt.bar(index + bar_width, values2, width=bar_width, color='orange', label='Group 2')

# Customize the plot
plt.title('Grouped Bar Chart')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.xticks(index + bar_width / 2, categories)  # Setting the x-axis ticks at the center of each group
plt.legend()  # Displaying the legend

# Show the plot
plt.show()

Creating the Grouped Bar Chart:

plt.bar(index, values1, width=bar_width, color='blue', label='Group 1'): Creates the first set of bars (Group 1).
plt.bar(index + bar_width, values2, width=bar_width, color='orange', label='Group 2'): Creates the second set of bars (Group 2), shifted to the right by bar_width.
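
The same shifting idea extends to any number of groups: group i is moved right by i bar widths, and the tick sits halfway between the first and last bar of each cluster. Below is a minimal sketch under that assumption; the 'Group 3' values and the choice to fill 80% of each category slot are my own, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['Category A', 'Category B', 'Category C', 'Category D']

# Hypothetical data: 'Group 3' is made up for this sketch
groups = {
    'Group 1': [30, 50, 20, 40],
    'Group 2': [10, 20, 30, 40],
    'Group 3': [25, 15, 35, 20],
}

index = np.arange(len(categories))
# Assumption: the bars of one cluster together fill 80% of the category slot
bar_width = 0.8 / len(groups)

plt.figure()
containers = []
for i, (label, values) in enumerate(groups.items()):
    # Shift group i to the right by i bar widths
    containers.append(
        plt.bar(index + i * bar_width, values, width=bar_width, label=label)
    )

# Centre each tick between the first and last bar of its cluster
tick_positions = index + bar_width * (len(groups) - 1) / 2
plt.xticks(tick_positions, categories)

# Customize the plot
plt.title('Grouped Bar Chart with Three Groups')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend()
```

Adding or removing an entry in groups changes the bar width and tick positions automatically. Call plt.show() to display the figure.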

Scatter Plots: Unveiling Relationships

Scatter plots are powerful tools for visualizing relationships between two variables. Below is an example of creating a scatter plot using Matplotlib in Python:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Scatter Plot
plt.scatter(x, y, color='red', marker='o', label='Data Points')

# Customize the plot
plt.title('Scatter Plot: Unveiling Relationships')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()

# Show the plot
plt.show()
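
To put a number on the relationship a scatter plot suggests, one option is to compute the Pearson correlation coefficient and overlay a least-squares line. This is only a sketch using NumPy's np.corrcoef and np.polyfit on the same sample data; the trend line is an addition of mine and not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data (same values as above, as arrays so we can do arithmetic on them)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Pearson correlation: values near +1 or -1 indicate a strong linear relationship
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

plt.figure()
plt.scatter(x, y, color='red', marker='o', label='Data Points')
plt.plot(x, slope * x + intercept, linestyle='--', color='gray',
         label=f'Linear fit (r = {r:.2f})')

# Customize the plot
plt.title('Scatter Plot with Trend Line')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
```

Because the sample points lie exactly on a line, the fit passes through every point. Call plt.show() to display the figure.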

Adding Text Annotations to a Scatter Plot

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
data_labels = ['Point 1', 'Point 2', 'Point 3', 'Point 4', 'Point 5']

# Scatter Plot with Text Annotations
plt.scatter(x, y, color='purple', marker='o', label='Data Points')

# Adding text annotations
for i, label in enumerate(data_labels):
    plt.annotate(label, (x[i], y[i]), textcoords="offset points", xytext=(0,5), ha='center')

# Customize the plot
plt.title('Scatter Plot with Text Annotations')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()

# Show the plot
plt.show()

Histogram

import matplotlib.pyplot as plt

# Sample data
data = [2, 5, 7, 10, 5, 8, 3, 7, 6, 9, 11, 5, 7]

# Create histogram
plt.hist(data, bins=10, edgecolor='black', color='skyblue')

# Customize the plot
plt.title('Histogram Example')
plt.xlabel('Values')
plt.ylabel('Frequency')

# Show the plot
plt.show()
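
With whole-number data like this, passing an integer such as bins=10 splits the range into equal-width bins whose edges fall between the integers. A possible alternative, sketched below, is to pass explicit bin edges so that each integer value gets its own bar. plt.hist returns the counts and edges, which makes it easy to check what was plotted.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data (same as above)
data = [2, 5, 7, 10, 5, 8, 3, 7, 6, 9, 11, 5, 7]

# One bin per integer: edges at 2, 3, ..., 12
bin_edges = np.arange(min(data), max(data) + 2)

plt.figure()
# plt.hist returns the bin counts, the bin edges, and the drawn bar patches
counts, edges, _ = plt.hist(data, bins=bin_edges, edgecolor='black', color='skyblue')

# Customize the plot
plt.title('Histogram with One Bin per Integer')
plt.xlabel('Values')
plt.ylabel('Frequency')
```

Here counts[i] is the number of values falling between edges[i] and edges[i + 1]. Call plt.show() to display the figure.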

Multiple Histograms

import matplotlib.pyplot as plt
import numpy as np

# Sample data
data1 = np.random.randn(1000)
data2 = np.random.randn(1000) + 2  # Shift the second dataset

# Create multiple histograms
plt.hist(data1, bins=30, edgecolor='black', color='skyblue', alpha=0.7, label='Dataset 1')
plt.hist(data2, bins=30, edgecolor='black', color='orange', alpha=0.7, label='Dataset 2')

# Customize the plot
plt.title('Multiple Histograms Example')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.legend()

# Show the plot
plt.show()

Pie Chart

import matplotlib.pyplot as plt

# Sample data
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [30, 20, 15, 35]

# Create a pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=['skyblue', 'orange', 'lightgreen', 'lightcoral'])

# Customize the plot
plt.title('Pie Chart Example')

# Show the plot
plt.show()

  • plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=['skyblue', 'orange', 'lightgreen', 'lightcoral']): This line creates a pie chart. sizes is a list of data values, labels is a list of category labels, autopct adds percentage labels, startangle rotates the pie chart, and colors specifies the color for each category.
  • plt.title('Pie Chart Example'): Adds a title to the plot.
  • plt.show(): Displays the pie chart.
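
The values passed to plt.pie do not need to add up to 100: each wedge is sized by its share of the total, and autopct prints that share. The sketch below uses hypothetical raw counts, chosen by me so the shares match the example above, and computes the percentages by hand for comparison.

```python
import matplotlib.pyplot as plt

labels = ['Category A', 'Category B', 'Category C', 'Category D']
# Hypothetical raw counts (they sum to 200, not 100)
counts = [60, 40, 30, 70]

# The share each wedge should display, formatted the same way as autopct='%1.1f%%'
total = sum(counts)
expected = [f'{100 * count / total:.1f}%' for count in counts]
print(expected)

plt.figure()
plt.pie(counts, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Pie Chart from Raw Counts')
```

The printed list should match the percentage labels drawn on the wedges. Call plt.show() to display the figure.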

Box Plot

A box plot, also known as a box-and-whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It can provide insights into the overall shape of the data, its central value, and variability. It’s particularly useful for highlighting outliers and for comparing distributions across different groups.

Here’s how the components of a box plot are typically represented:

  • Minimum: The smallest value in the dataset, excluding outliers.
  • First Quartile (Q1): Also known as the lower quartile, it is the median of the lower half of the dataset. It marks the 25th percentile.
  • Median: The middle value of the dataset when it is sorted in ascending order. It marks the 50th percentile.
  • Third Quartile (Q3): Also known as the upper quartile, it is the median of the upper half of the dataset. It marks the 75th percentile.
  • Maximum: The largest value in the dataset, excluding outliers.
  • Outliers: These are data points that fall significantly outside the overall distribution of the dataset. They are typically indicated by small circles or stars outside the “whiskers.”

The “box” part of the box plot represents the interquartile range (IQR), which is the distance between the first and third quartiles (Q3 - Q1). The “whiskers” are lines that extend from the box to the highest and lowest values, excluding outliers. Outliers are usually determined by any data point that is more than 1.5 times the IQR above the third quartile or below the first quartile.

Box plots are widely used in statistical analysis for their simplicity and ability to convey complex information succinctly. They are helpful for comparing distributions between several groups or datasets and can be easily interpreted at a glance, making them a popular choice for exploratory data analysis.

import matplotlib.pyplot as plt

# Dataset
scores = [78, 82, 85, 90, 91, 93, 95, 97, 100, 102, 105, 108]

# Creating the box plot
plt.boxplot(scores)

plt.title('Test Scores Box Plot')
plt.ylabel('Scores')

# Display the plot
plt.show()
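
The five-number summary and the 1.5 × IQR outlier rule described above can also be worked out by hand for the same scores. This is a minimal sketch using np.percentile with its default linear interpolation, so the quartiles can differ slightly from the “median of each half” definition (which would give Q1 = 87.5 here). The variable names are my own.

```python
import numpy as np

# Dataset (same as the box plot above)
scores = [78, 82, 85, 90, 91, 93, 95, 97, 100, 102, 105, 108]

# Quartiles via NumPy's default (linear interpolation) percentile method
q1, median, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

# Whiskers end at the most extreme values that are not outliers
inliers = [s for s in scores if lower_fence <= s <= upper_fence]
whisker_low, whisker_high = min(inliers), max(inliers)

print('Q1:', q1, 'Median:', median, 'Q3:', q3, 'IQR:', iqr)
print('Fences:', lower_fence, upper_fence)
print('Whiskers:', whisker_low, whisker_high)
print('Outliers:', outliers)
```

For this dataset no score falls outside the fences, so the whiskers reach the smallest and largest scores and no outlier markers are expected on the plot.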