Data Visualization with Matplotlib & Seaborn

This lesson introduces Matplotlib and Seaborn, two powerful Python libraries for creating informative and visually appealing data visualizations. You'll learn how to transform raw data into insightful charts and graphs, effectively communicating your findings and gaining a deeper understanding of your data.

Learning Objectives

  • Understand the purpose and benefits of data visualization.
  • Learn to install and import Matplotlib and Seaborn.
  • Create basic plots using Matplotlib and Seaborn (line plots, bar plots, scatter plots, histograms, box plots).
  • Customize plots with titles, labels, legends, and styling options.

Lesson Content

Introduction to Data Visualization

Data visualization is the graphical representation of information and data. It uses visual elements like charts, graphs, and maps to help you see and understand patterns, trends, and outliers in data. This is crucial for exploratory data analysis (EDA), communicating results, and making informed decisions. By visualizing your data, you can tell compelling stories and uncover hidden insights that might be missed by simply looking at numbers.

Why Visualize?

  • Exploration: Quickly identify patterns, trends, and relationships in your data.
  • Communication: Clearly and concisely present findings to others.
  • Decision-Making: Support informed decisions by visualizing key data points.
  • Insight Generation: Discover unexpected insights and formulate hypotheses.

Why Matplotlib and Seaborn?

  • Matplotlib: The foundation. Provides the basic building blocks for creating plots.
  • Seaborn: Built on top of Matplotlib, offering a high-level interface and aesthetically pleasing default styles, making it easier to create complex and informative visualizations.

Installing and Importing Libraries

Before you can use Matplotlib and Seaborn, you need to install them. Open your terminal or command prompt and run the following commands:

pip install matplotlib
pip install seaborn

Once installed, import them into your Python script or Jupyter Notebook:

import matplotlib.pyplot as plt # Commonly abbreviated as plt
import seaborn as sns # Commonly abbreviated as sns
import pandas as pd # Import pandas (we'll use this for sample datasets)

# Check versions (optional)
import matplotlib # The version string lives on the top-level package, not on pyplot
print(f"Matplotlib version: {matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")

plt and sns are the conventional aliases for matplotlib.pyplot and seaborn. Pandas is a data analysis library, often used to load and manipulate data before visualization.

Creating Basic Plots with Matplotlib

Let's create some basic plots using Matplotlib. We'll start with line plots, bar plots, and scatter plots.

Line Plots: Useful for showing trends over time or continuous data.

# Sample data (temperature over time)
time = [1, 2, 3, 4, 5]
temperature = [20, 22, 25, 23, 26]

plt.plot(time, temperature) # Create the line plot
plt.xlabel('Time (hours)') # Add x-axis label
plt.ylabel('Temperature (°C)') # Add y-axis label
plt.title('Temperature Over Time') # Add a title
plt.show() # Display the plot

Bar Plots: Useful for comparing categorical data.

# Sample data (sales by product)
products = ['A', 'B', 'C', 'D']
sales = [100, 150, 75, 120]

plt.bar(products, sales)
plt.xlabel('Product')
plt.ylabel('Sales')
plt.title('Sales by Product')
plt.show()

Scatter Plots: Useful for showing the relationship between two variables.

# Sample data (height vs. weight)
height = [160, 170, 165, 175, 180]
weight = [60, 70, 65, 75, 80]

plt.scatter(height, weight)
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs. Weight')
plt.show()

Customizing Plots with Matplotlib

Customize your plots to make them more informative and visually appealing. You can adjust titles, labels, legends, colors, markers, and more.

# Customize line plot
plt.plot(time, temperature, color='red', linestyle='--', marker='o') # Customize line appearance
plt.xlabel('Time (hours)')
plt.ylabel('Temperature (°C)')
plt.title('Temperature Over Time (Customized)')
plt.legend(['Temperature']) # Add a legend (key for the line)
plt.grid(True) # Add grid lines
plt.show()

# Customize bar plot
plt.bar(products, sales, color='skyblue') # Customize bar color
plt.xlabel('Product')
plt.ylabel('Sales')
plt.title('Sales by Product (Customized)')
plt.xticks(rotation=45) # Rotate x-axis labels for readability
plt.show()

# Customize scatter plot
plt.scatter(height, weight, color='green', marker='x') # Customize scatter appearance
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs. Weight (Customized)')
plt.show()
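When a plot holds more than one series, the legend is easier to maintain if each series carries its own label. The following is a minimal sketch of that pattern, assuming a second series (the indoor values are made up for illustration and are not part of the lesson data):

```python
import matplotlib.pyplot as plt

time = [1, 2, 3, 4, 5]
outdoor = [20, 22, 25, 23, 26]  # same values as the lesson's temperature list
indoor = [21, 21, 22, 22, 23]   # hypothetical second series, for illustration only

plt.figure(figsize=(8, 5))
# label= attaches a name to each line; plt.legend() collects those names
plt.plot(time, outdoor, color='red', linestyle='--', marker='o', label='Outdoor')
plt.plot(time, indoor, color='blue', linestyle='-', marker='s', label='Indoor')
plt.xlabel('Time (hours)')
plt.ylabel('Temperature (°C)')
plt.title('Temperature Over Time')
legend = plt.legend()  # no list of names needed, labels come from the plot calls
legend_labels = [text.get_text() for text in legend.get_texts()]
plt.grid(True)
```

Call plt.show() afterwards to display the figure. Compared with passing a list to plt.legend(), per-line labels stay correct if you later reorder or remove a series.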

Introduction to Seaborn for Enhanced Visualizations

Seaborn builds on Matplotlib and provides a higher-level interface and attractive default styles. Its statistical plots, such as histograms and box plots, are well suited to exploring data distributions.

Histograms: Show the distribution of a single variable.

# Example using Seaborn with sample data
data = pd.DataFrame({'value': [10, 12, 15, 18, 20, 11, 13, 16, 19, 21]}) # Create sample data using Pandas
sns.histplot(data=data, x='value', bins=5) # Creates histogram with 5 bins
plt.title('Histogram of Values (Seaborn)')
plt.show()
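To see what bins=5 means in practice, it can help to compute the bins by hand. This is a rough sketch using NumPy directly, on the assumption that an integer bin count is turned into equal-width bins spanning the minimum to the maximum value; it is not Seaborn's own code:

```python
import numpy as np

values = [10, 12, 15, 18, 20, 11, 13, 16, 19, 21]  # same numbers as the 'value' column

# Split the range [min, max] into 5 equal-width bins and count values in each
counts, edges = np.histogram(values, bins=5)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count} value(s)")
```

Each printed row corresponds to one bar of the histogram: the bar's width is the bin range and its height is the count.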

Box Plots: Show the distribution of data, including quartiles and outliers.

sns.boxplot(data=data, x='value')
plt.title('Box Plot of Values (Seaborn)')
plt.show()
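The quartiles and outlier limits that a box plot draws can also be computed numerically. The sketch below assumes the common convention of linear-interpolated quartiles and whiskers limited to 1.5 times the interquartile range (IQR); plotting libraries may be configured differently:

```python
import pandas as pd

values = pd.Series([10, 12, 15, 18, 20, 11, 13, 16, 19, 21])  # the 'value' column

q1 = values.quantile(0.25)      # lower edge of the box
median = values.quantile(0.50)  # line inside the box
q3 = values.quantile(0.75)      # upper edge of the box
iqr = q3 - q1                   # height of the box

# Points beyond these fences are the ones usually drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Outliers: {outliers.tolist()}")
```

For this small dataset no value falls outside the fences, which is why the box plot shows no separate outlier points.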

Seaborn also integrates well with Pandas DataFrames, so you can plot columns directly by name. Most Seaborn plotting functions also accept built-in color palettes and styling options.
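As an illustration of both points, here is a small sketch that plots DataFrame columns by name, colors the points by a category, and applies a built-in style. The DataFrame, its column names, and the 'group' column are hypothetical and exist only for this example:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical measurements; 'group' exists only to demonstrate hue=
people = pd.DataFrame({
    'height_cm': [160, 170, 165, 175, 180, 168],
    'weight_kg': [60, 70, 65, 75, 80, 72],
    'group': ['A', 'B', 'A', 'B', 'A', 'B'],
})

sns.set_theme(style='whitegrid')  # built-in style, applies to all later plots

plt.figure(figsize=(8, 5))
# Columns are referenced by name; hue= colors points by category using the palette
ax = sns.scatterplot(data=people, x='height_cm', y='weight_kg',
                     hue='group', palette='deep')
ax.set_title('Height vs. Weight by Group')
```

Seaborn returns a Matplotlib Axes object, so the usual Matplotlib customizations still apply. Note that sns.set_theme changes the style globally for the rest of the session.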

Using Sample Datasets (Pandas)

Pandas is a powerful library for data manipulation. Rather than loading a file, we'll build a small illustrative dataset by hand, in the tabular format you'll typically encounter, and then create visualizations from it.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample DataFrame
data = {'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
        'Population': [8419000, 3971000, 2746000, 2326000, 1680000],
        'Temperature': [25, 30, 20, 35, 40]}

df = pd.DataFrame(data)

# Bar plot of population
plt.figure(figsize=(10, 6)) # Adjust figure size
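sns.barplot(x='City', y='Population', data=df, hue='City', palette='viridis', legend=False) # hue + legend=False avoids the palette deprecation warning (requires seaborn 0.13+)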
plt.title('Population by City')
plt.xlabel('City')
plt.ylabel('Population')
plt.xticks(rotation=45) # Rotate city names for readability
plt.show()

# Scatter plot of temperature vs population. Notice the added customization.
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Temperature', y='Population', data=df, color='orange')
plt.title('Temperature vs. Population')
plt.xlabel('Temperature (Celsius)')
plt.ylabel('Population')
plt.grid(True)
plt.show()

# Histogram of temperatures
plt.figure(figsize=(8, 6))
sns.histplot(df['Temperature'], bins=5, kde=True, color='skyblue') # kde=True adds a kernel density estimate line
plt.title('Temperature Distribution')
plt.xlabel('Temperature')
plt.ylabel('Frequency')
plt.show()

This example demonstrates Seaborn's barplot, scatterplot, and histplot functions, along with basic customization, on a simple Pandas DataFrame. Adjust the DataFrame and the plots to fit your needs.
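In practice the data usually comes from a file rather than a hand-built dictionary. The sketch below assumes a CSV layout matching the sample DataFrame and reads it from an in-memory string as a stand-in for a file on disk; with a real file you would pass its path to pd.read_csv instead:

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (contents mirror the sample DataFrame)
csv_text = """City,Population,Temperature
New York,8419000,25
Los Angeles,3971000,30
Chicago,2746000,20
Houston,2326000,35
Phoenix,1680000,40
"""

df = pd.read_csv(io.StringIO(csv_text))

# Quick checks before plotting: column types and the largest cities
print(df.dtypes)
print(df.sort_values('Population', ascending=False).head(3))
```

Once loaded, df can be passed to the same plotting calls used in this section.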
