**Introduction to Pandas: DataFrames**

This lesson introduces Pandas, a powerful Python library for data manipulation and analysis. You'll learn how to create and manipulate DataFrames, the core data structure in Pandas, to organize and work with data efficiently.

Learning Objectives

  • Understand the purpose and importance of the Pandas library in data science.
  • Create Pandas DataFrames from various data sources (lists, dictionaries, CSV files).
  • Access and modify data within a DataFrame using different methods (e.g., column selection, indexing).
  • Describe basic DataFrame operations such as viewing data (head, tail, describe) and checking data types.

Lesson Content

Introduction to Pandas

Pandas is a fundamental Python library for data analysis. It provides flexible data structures and tools designed to make working with structured data fast and easy. Think of it as a spreadsheet on steroids, but programmable! It excels at tasks like cleaning, transforming, and analyzing data.

To use Pandas, you first need to import it: import pandas as pd. The as pd part is a common convention and allows you to refer to Pandas functions as pd.function_name().

Creating DataFrames

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it like a table or a spreadsheet. There are several ways to create DataFrames.
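
To make "two-dimensional, labeled" concrete, the minimal sketch below inspects a small DataFrame's shape, its row labels (the index), its column labels, and the data type of each column. The 'Height' column is a hypothetical addition, used only to show a third data type alongside text and integers.

```python
import pandas as pd

# Three columns, each with its own type: text, integers, and floats.
# "Height" is a hypothetical column added only for illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 28],
    "Height": [1.65, 1.80, 1.75],
})

print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # column labels
print(list(df.index))    # row labels: a default integer index 0, 1, 2
print(df.dtypes)         # one data type per column
```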

From Lists:

```python
import pandas as pd
data = [['Alice', 25, 'New York'], ['Bob', 30, 'London'], ['Charlie', 28, 'Paris']]
columns = ['Name', 'Age', 'City']
df = pd.DataFrame(data, columns=columns)
print(df)
```

From Dictionaries:

```python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28], 'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)
```
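
The list and dictionary constructors differ only in orientation: a list of lists is read row by row, while a dictionary supplies one column per key. As a quick sanity check, this sketch builds the same table both ways and compares the results with DataFrame.equals, which checks that shape, labels, values, and data types all match.

```python
import pandas as pd

# Row-oriented: each inner list is one row; column labels are given separately
rows = [["Alice", 25, "New York"], ["Bob", 30, "London"], ["Charlie", 28, "Paris"]]
df_rows = pd.DataFrame(rows, columns=["Name", "Age", "City"])

# Column-oriented: each dictionary key becomes a column label
cols = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 28], "City": ["New York", "London", "Paris"]}
df_cols = pd.DataFrame(cols)

# Both constructions describe the same table
print(df_rows.equals(df_cols))  # True
```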

From CSV files:

Pandas can read data directly from CSV files. First, ensure you have a CSV file (e.g., 'data.csv') in your working directory. Then:

```python
import pandas as pd
df = pd.read_csv('data.csv') # replace with your csv file name
print(df)
```
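
If you do not have a CSV file at hand, a minimal sketch like the following lets you try read_csv anyway. It wraps some hypothetical CSV text in io.StringIO, which works because read_csv accepts file-like objects as well as file paths. With a real file, you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a file such as data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,28,Paris
"""

# The first line becomes the column labels; each following line becomes a row
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # Age is parsed as an integer column
```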

Accessing DataFrame Data

Once you have a DataFrame, you'll need to know how to access its data.

  • Selecting a Column: Use bracket notation with the column name. df['Name'] will give you a Pandas Series containing all the names.
  • Selecting Multiple Columns: Pass a list of column names inside the brackets (note the double brackets): df[['Name', 'Age']]. The result is a DataFrame rather than a Series.
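  • Selecting Rows: Use .iloc for position-based indexing with integers (e.g., df.iloc[0] for the first row, df.iloc[0:2] for the first two rows, since the stop position is excluded), and .loc for label-based indexing using the row labels stored in the index (with .loc, the stop label of a slice is included).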
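  • Accessing a Specific Value: Combine row and column selection. With the default integer index, df['Name'][0] retrieves the first value in the 'Name' column, but this chained indexing looks up the row label 0 rather than the position and is unreliable for assigning values. Prefer df.loc[0, 'Name'] (by label) or df.iloc[0, 0] (by position).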
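
```python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28], 'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print("Name column:", df['Name'])      # a single column, returned as a Series
print("First row:", df.iloc[0])         # the row at position 0, also a Series
print("Bob's age:", df.loc[1, 'Age'])   # row label 1, column 'Age'
```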
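
The sketch below contrasts position-based and label-based access on the lesson's DataFrame with its default integer index, and also covers the "modify data" objective: assigning a value through .loc and adding a new column. The column name 'AgeNextYear' is a hypothetical example.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28], 'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)  # default row labels: 0, 1, 2

# Slicing: .iloc excludes the stop position, .loc includes the stop label
by_position = df.iloc[0:2]  # positions 0 and 1
by_label = df.loc[0:1]      # labels 0 and 1
print(by_position.equals(by_label))  # True: both hold the first two rows

# Single values: both accessors take [row, column]
bobs_age = df.loc[1, 'Age']     # row label 1, column label 'Age'
charlies_city = df.iloc[2, 2]   # third row, third column
print(bobs_age, charlies_city)  # 30 Paris

# Modifying data: assign through .loc instead of chained indexing like df['Age'][1] = 31
df.loc[1, 'Age'] = 31

# Adding a column computed from an existing one (hypothetical column name)
df['AgeNextYear'] = df['Age'] + 1
print(df)
```

Using a single .loc call with both the row and the column keeps the assignment aimed at the DataFrame itself, which is why it is generally preferred over chaining two sets of brackets.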

Basic DataFrame Operations

Pandas offers several helpful methods and attributes for quickly examining your data:

  • .head(): Displays the first few rows (default: 5). df.head()
  • .tail(): Displays the last few rows (default: 5). df.tail()
  • .describe(): Generates descriptive statistics (count, mean, std, min, max, quartiles) for numeric columns. df.describe()
  • .info(): Prints a concise summary of the DataFrame, including the index range, the data type of each column, the number of non-null values, and memory usage. df.info()
  • .dtypes: An attribute (no parentheses) that returns the data type of each column. df.dtypes

```python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 28, 22], 'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)
print("Head:", df.head(2))
print("Descriptive Statistics:", df.describe())
print("Data Types:", df.dtypes)
```
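
A few more inspection calls, sketched on the same four-row table: .tail() with an explicit row count; .info(), which prints its summary itself and returns None (so wrapping it in print() would also print "None"); and describe(include='all'), which extends the summary to non-numeric columns.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 28, 22], 'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

print(df.tail(2))  # last two rows: Charlie and David

# .info() writes its report directly to the output and returns None
df.info()

# By default, describe() only covers numeric columns (here, just 'Age')
stats = df.describe()
print(stats.loc['mean', 'Age'])  # 26.25

# include='all' adds count/unique/top/freq rows for the text columns as well
print(df.describe(include='all'))
```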