Introduction to Pandas

In this lesson, you'll be introduced to Pandas, a powerful Python library used for data manipulation and analysis. We'll explore the fundamental building blocks of Pandas: DataFrames and Series, learning how to create, access, and manipulate them.

Learning Objectives

  • Understand the purpose and importance of the Pandas library in data science.
  • Learn to create Pandas Series and DataFrames.
  • Understand how to access and select data within DataFrames using various methods.
  • Become familiar with basic DataFrame operations, like viewing data and checking data types.

Lesson Content

Introduction to Pandas

Pandas is a Python library built for data analysis. It provides flexible data structures designed to make working with labeled or relational data both intuitive and efficient. Think of it as a programmable spreadsheet: you can load, clean, reshape, and summarize data with a few lines of code.

To use Pandas, you'll first need to import it. The common practice is to import Pandas with the alias pd:

import pandas as pd

Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floats, Python objects, etc.). It's like a column in a spreadsheet. You can create a Series from a list, a NumPy array, or even a dictionary.

Creating a Series:

import pandas as pd

# From a list
data = [10, 20, 30, 40, 50]
series1 = pd.Series(data)
print(series1)

# From a dictionary
data_dict = {'a': 10, 'b': 20, 'c': 30}
series2 = pd.Series(data_dict)
print(series2)
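
The text above also mentions NumPy arrays as a source. The following is a minimal sketch of that case; the variable names, the values, and the index labels 'x', 'y', 'z' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values only; any one-dimensional NumPy array works here
values = np.array([1.5, 2.5, 3.5])

# Passing index= replaces the default 0, 1, 2, ... labels with custom ones
series3 = pd.Series(values, index=['x', 'y', 'z'], name='measurements')
print(series3)
```

When no index is passed, as in series1 above, Pandas assigns the integer labels 0, 1, 2, and so on.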

Accessing Series Elements:

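You can access elements by their index label using square brackets. With the default integer index (0, 1, 2, ...), the labels coincide with positions, so this looks like list indexing. With a custom index, such as the dictionary keys in series2, you use the labels instead.

print(series1[0]) # Accessing the element with label 0 (the default integer index)
print(series2['b']) # Accessing the element with label 'b'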
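
Square brackets can be ambiguous when the labels are themselves integers, so it may help to be explicit. The sketch below uses .loc for label-based access and .iloc for position-based access on a Series; the name prices and its values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical Series with string labels
prices = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = prices.loc['b']      # label-based: the element labeled 'b'
by_position = prices.iloc[0]    # position-based: the first element

# Label slices include the end label; position slices exclude the end position
label_slice = prices.loc['a':'b']
position_slice = prices.iloc[0:1]

print(by_label, by_position)
print(label_slice)
print(position_slice)
```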

Pandas DataFrames

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet or a SQL table. It's the most commonly used Pandas object.

Creating a DataFrame:

import pandas as pd

# From a dictionary of lists
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 28],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)
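
A dictionary of lists is one input shape among several. As a sketch of an alternative, assuming the same three people, a DataFrame can also be built from a list of dictionaries, where each dictionary is one row. The name df_records is made up for this example.

```python
import pandas as pd

# Each dictionary describes one row; the keys become column names
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'London'},
    {'Name': 'Charlie', 'Age': 28, 'City': 'Paris'},
]
df_records = pd.DataFrame(records)

print(df_records)
print(df_records.shape)  # (number of rows, number of columns)
```

This shape is often convenient when the rows arrive one at a time, for example from parsed JSON.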

Accessing DataFrame Elements:

  • Accessing Columns: Use bracket notation ([]) with the column name.

    print(df['Name'])

  • Accessing Rows: Use the loc or iloc attributes.

    • loc: Accesses rows by label.

      print(df.loc[0]) # Accessing the row whose index label is 0

    • iloc: Accesses rows by integer position.

      print(df.iloc[1]) # Accessing the row at position 1
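
With the default index, loc and iloc look interchangeable because the labels 0, 1, 2 match the positions. The difference shows up once the index holds other labels. The sketch below assumes a small DataFrame with the made-up row labels 'r1', 'r2', 'r3'.

```python
import pandas as pd

# Illustrative data with a custom (non-integer) row index
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]},
    index=['r1', 'r2', 'r3'],
)

row_by_label = people.loc['r2']      # row whose index label is 'r2'
row_by_position = people.iloc[1]     # second row, regardless of its label
bob_age = people.loc['r2', 'Age']    # single cell: row label, then column name
name_and_age = people[['Name', 'Age']]  # a list of names selects several columns

print(row_by_label)
print(bob_age)
print(name_and_age)
```

Here people.loc[1] would raise a KeyError, because no row carries the label 1.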

Basic DataFrame Operations

Pandas provides many functions to inspect and understand your data. Here are a few essential ones:

  • .head(): Returns the first rows of the DataFrame (5 by default; pass a number, as in df.head(10), to change that).

    print(df.head())

  • .tail(): Returns the last rows of the DataFrame (5 by default; pass a number to change that).

    print(df.tail())

  • .info(): Prints a concise summary of the DataFrame, including each column's data type, its count of non-null values, and the overall memory usage.

    df.info()

  • .dtypes: Shows the data type of each column. It is an attribute, so it is used without parentheses.

    print(df.dtypes)
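
The inspection tools above can be tried together on the small DataFrame from earlier in the lesson. This is a minimal sketch; the names first_two and last_one are made up, and .shape is an extra attribute not covered above that reports the number of rows and columns.

```python
import pandas as pd

# Same sample data as in the DataFrame section
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 28],
                   'City': ['New York', 'London', 'Paris']})

first_two = df.head(2)   # only the first two rows
last_one = df.tail(1)    # only the last row

print(first_two)
print(last_one)
print(df.shape)          # (rows, columns)
print(df.dtypes)         # one data type per column
df.info()                # prints its summary directly, so no print() is needed
```

Because this DataFrame has only three rows, head() and tail() without arguments would both show the whole table.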
