Introduction to NumPy

This lesson introduces NumPy, a fundamental library in Python for numerical computing. You will learn how to create, manipulate, and perform calculations on NumPy arrays, which are essential for data science tasks. By the end of this lesson, you will be able to perform basic array operations and understand the benefits of using NumPy.

Learning Objectives

  • Understand the purpose and importance of NumPy in data science.
  • Create NumPy arrays from lists and other data structures.
  • Perform basic array operations such as indexing, slicing, and arithmetic operations.
  • Describe the key differences between NumPy arrays and Python lists.


Introduction to NumPy

NumPy (Numerical Python) is a library that provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. It is the foundation for many other data science libraries in Python, such as pandas and scikit-learn. NumPy's efficiency comes from performing operations on entire arrays at once (vectorization) using optimized routines implemented in C. This makes computations significantly faster than performing the same tasks with Python lists.

To use NumPy, you first need to import it. The standard convention is to import it as np:

import numpy as np
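To make the vectorization claim concrete, the sketch below doubles 100,000 numbers twice: once with a Python list comprehension and once with a single NumPy expression. The timing lines use the standard-library timeit module. The variable names are illustrative, and the measured times vary by machine, so no timing output is shown.

```python
import timeit

import numpy as np

values = list(range(100_000))
values_array = np.array(values)

# Python list: the loop runs in the interpreter, one element at a time
doubled_list = [v * 2 for v in values]

# NumPy array: one vectorized expression; the loop runs in compiled code
doubled_array = values_array * 2

print(doubled_list[:5])
# Output: [0, 2, 4, 6, 8]
print(doubled_array[:5])
# Output: [0 2 4 6 8]

# Rough timing comparison (10 repetitions each); results depend on your machine
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=10)
array_time = timeit.timeit(lambda: values_array * 2, number=10)
print(f"list comprehension: {list_time:.4f} s, NumPy: {array_time:.4f} s")
```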

Creating NumPy Arrays

You can create NumPy arrays from Python lists using the np.array() function:

import numpy as np

# Creating a 1-dimensional array
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(my_array)
# Output: [1 2 3 4 5]

# Creating a 2-dimensional array (matrix)
my_list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
my_2d_array = np.array(my_list_of_lists)
print(my_2d_array)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
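Every array also records its dimensions and element type. The sketch below inspects the 2D array from the example above through its ndim, shape, size, and dtype attributes, and shows that np.array() also accepts a tuple. The exact integer dtype is platform dependent (commonly int64), so its output is left unstated.

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(my_2d_array.ndim)   # number of dimensions
# Output: 2
print(my_2d_array.shape)  # (rows, columns)
# Output: (3, 3)
print(my_2d_array.size)   # total number of elements
# Output: 9
print(my_2d_array.dtype)  # an integer type; the exact width depends on the platform

# np.array() also accepts other sequences, such as tuples
from_tuple = np.array((1.5, 2.5))
print(from_tuple)
# Output: [1.5 2.5]
print(from_tuple.dtype)
# Output: float64
```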

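NumPy also provides functions that create arrays with specific values, such as np.zeros(), np.ones(), and np.arange(). By default, np.zeros() and np.ones() create floating-point arrays, which is why their printed values end with a decimal point (0. and 1.):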

import numpy as np

# Create an array of zeros
zeros_array = np.zeros(5) # Creates an array with 5 zeros
print(zeros_array)
# Output: [0. 0. 0. 0. 0.]

# Create an array of ones
ones_array = np.ones((2,3)) # Creates a 2x3 array with ones
print(ones_array)
# Output:
# [[1. 1. 1.]
#  [1. 1. 1.]]

# Create an array with a sequence of numbers (similar to range())
range_array = np.arange(0, 10, 2) # Start, Stop, Step
print(range_array)
# Output: [0 2 4 6 8]
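A few variations on these functions are worth knowing. The sketch below shows np.arange() with a single argument, the dtype parameter (used here to request integers instead of the default floats), and a floating-point step, which the built-in range() does not accept.

```python
import numpy as np

# With one argument, np.arange() starts at 0, like range(5)
print(np.arange(5))
# Output: [0 1 2 3 4]

# dtype=int overrides the default floating-point type of np.zeros()
int_zeros = np.zeros(3, dtype=int)
print(int_zeros)
# Output: [0 0 0]

# Unlike range(), np.arange() accepts a floating-point step
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())
# Output: [0.0, 0.25, 0.5, 0.75]
```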

Array Indexing and Slicing

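Accessing and modifying elements of a NumPy array works much like it does with Python lists, with two notable differences: a multi-dimensional array is indexed with comma-separated indices inside a single pair of brackets (for example, my_2d_array[0, 1]), and a slice returns a view that shares data with the original array rather than a copy. Use indexing to access individual elements and slicing to access subarrays.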

import numpy as np

my_array = np.array([10, 20, 30, 40, 50])

# Indexing
print(my_array[0]) # Access the first element (index 0)
# Output: 10
print(my_array[2]) # Access the third element (index 2)
# Output: 30

# Slicing
print(my_array[1:4]) # Access elements from index 1 to 3 (excluding 4)
# Output: [20 30 40]
print(my_array[:3]) # Access elements from the beginning up to index 2 (excluding 3)
# Output: [10 20 30]
print(my_array[2:]) # Access elements from index 2 to the end
# Output: [30 40 50]

# Modifying elements
my_array[0] = 100
print(my_array)
# Output: [100  20  30  40  50]

# Indexing and slicing a 2D array
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(my_2d_array[0, 1])  # Access element in the first row, second column
# Output: 2
print(my_2d_array[1:]) # Slicing from the second row
# Output:
# [[4 5 6]
#  [7 8 9]]
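Several of the key differences between list and array indexing can be seen in the sketch below: negative indices behave as they do for lists, a slice of a NumPy array is a view that shares data with the original (whereas a list slice is a copy), .copy() produces an independent subarray, and a colon selects an entire column of a 2D array. The variable names are illustrative.

```python
import numpy as np

my_array = np.array([10, 20, 30, 40, 50])

# Negative indices count from the end, as with lists
print(my_array[-1])
# Output: 50

# A NumPy slice is a view: changing it changes the original array
view = my_array[1:3]
view[0] = 999
print(my_array)
# Output: [ 10 999  30  40  50]

# .copy() creates an independent subarray
independent = my_array[1:3].copy()
independent[0] = -1
print(my_array)  # unchanged by modifying the copy
# Output: [ 10 999  30  40  50]

# By contrast, a list slice is a copy
py_list = [10, 20, 30]
list_slice = py_list[0:2]
list_slice[0] = 999
print(py_list)
# Output: [10, 20, 30]

# ":" selects every row, so this picks the second column
my_2d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(my_2d_array[:, 1])
# Output: [2 5 8]
```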

Array Arithmetic

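NumPy lets you perform arithmetic on entire arrays at once, without writing explicit loops. This is known as vectorization, and it is one of the most significant advantages of NumPy over Python lists for numerical computations. Operations between two arrays of the same shape are applied element-wise, and an operation between an array and a single number (a scalar) applies that number to every element.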

import numpy as np

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([6, 7, 8, 9, 10])

# Addition
sum_array = array1 + array2
print(sum_array)
# Output: [ 7  9 11 13 15]

# Subtraction
difference_array = array2 - array1
print(difference_array)
# Output: [5 5 5 5 5]

# Multiplication by a scalar (every element is multiplied by 2)
product_array = array1 * 2
print(product_array)
# Output: [ 2  4  6  8 10]

# Division
division_array = array2 / array1
print(division_array)
# Output: [6.         3.5        2.66666667 2.25       2.        ]
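The same operators behave very differently on Python lists, which is why vectorization matters. The sketch below contrasts the two: on lists, + concatenates and * repeats, so element-wise arithmetic needs an explicit loop, while on arrays the operators act element by element.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]

# On lists, + concatenates and * repeats the list
print(list1 + list2)
# Output: [1, 2, 3, 4, 5, 6]
print(list1 * 2)
# Output: [1, 2, 3, 1, 2, 3]

# Element-wise addition of lists needs an explicit loop (here, zip in a list comprehension)
print([a + b for a, b in zip(list1, list2)])
# Output: [5, 7, 9]

# On NumPy arrays, the same operators work element-wise
array1 = np.array(list1)
array2 = np.array(list2)
print(array1 + array2)
# Output: [5 7 9]
print(array1 * 2)
# Output: [2 4 6]
print(array1 ** 2)
# Output: [1 4 9]
```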

Differences Between NumPy Arrays and Python Lists

While both NumPy arrays and Python lists can store collections of data, there are several key differences:

  • Data Type: NumPy arrays are designed to store data of the same data type (e.g., integers, floats), making them more memory-efficient and enabling faster computations. Python lists can hold elements of different data types.
  • Efficiency: NumPy operations run in optimized, compiled C code, making them much faster for numerical computations, especially on large datasets. Python lists do not support element-wise arithmetic directly, so the same work requires Python-level loops, which are considerably slower.
  • Functionality: NumPy provides a large library of mathematical functions that operate on whole arrays, along with submodules for linear algebra (np.linalg), Fourier transforms (np.fft), and random number generation (np.random). Python lists have no built-in mathematical operations of their own and rely on general-purpose built-ins such as sum(), min(), and max().
  • Memory Usage: Because all elements share one data type and are stored as raw values in a single block of memory, NumPy arrays generally consume less memory than Python lists, which store references to separate Python objects. The difference is most noticeable with large datasets.
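The short sketch below touches on each point in the list above: mixing integers and floats in np.array() produces a single float dtype, whole-array functions such as np.sqrt() and np.linalg.solve() replace hand-written loops, and the nbytes attribute reports the size of an array's data buffer. The list-size estimate using sys.getsizeof() is only rough (CPython shares small integer objects, for example), and both memory figures vary by platform, so no output is shown for them.

```python
import sys

import numpy as np

# Data type: mixed ints and floats are converted to a single type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
# Output: float64

# Functionality: math functions operate on whole arrays
values = np.array([1.0, 4.0, 9.0, 16.0])
print(np.sqrt(values))
# Output: [1. 2. 3. 4.]
print(values.mean())
# Output: 7.5

# Solve the linear system 2x + y = 3, x + 3y = 5 with np.linalg
coefficients = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(coefficients, rhs)
print(solution)  # approximately x = 0.8, y = 1.4

# Memory usage: rough comparison for 1,000 integers (figures depend on the platform)
n = 1_000
numbers_list = list(range(n))
numbers_array = np.arange(n)
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)
print(f"list: about {list_bytes} bytes, array data: {numbers_array.nbytes} bytes")
```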