Simple Data Analysis with NumPy

This lesson introduces NumPy, a powerful Python library for numerical computing and data analysis. You'll learn how to create and manipulate NumPy arrays, perform basic statistical calculations, and understand fundamental data analysis techniques using this library.

Learning Objectives

  • Create NumPy arrays from Python lists and with NumPy's built-in creation functions.
  • Perform basic arithmetic operations on NumPy arrays.
  • Calculate descriptive statistics (mean, median, standard deviation) using NumPy.
  • Understand the concept of array indexing and slicing.

Lesson Content

Introduction to NumPy

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object (called 'ndarray') and tools for working with these arrays. An ndarray stores elements of a single type in a compact block of memory, and its operations run in compiled code. As a result, NumPy is typically much faster and more memory-efficient than Python lists for numerical work, which makes it essential for data science.

To use NumPy, you first need to import it:

```python
import numpy as np
```

This line imports the NumPy library and gives it the alias np, which is the standard convention.

Creating NumPy Arrays

You can create NumPy arrays in several ways:

  • From Lists:
    ```python
    import numpy as np

    my_list = [1, 2, 3, 4, 5]
    my_array = np.array(my_list)
    print(my_array)        # Output: [1 2 3 4 5]
    print(type(my_array))  # Output: <class 'numpy.ndarray'>
    ```

  • Using np.zeros(), np.ones(), and np.empty():
    ```python
    import numpy as np

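    zeros_array = np.zeros(5)  # Array of 5 zeros (float64 by default)
    ones_array = np.ones(3)    # Array of 3 ones (float64 by default)
    empty_array = np.empty(4)  # 4 uninitialized elements: contents are arbitrary until assigned

    print(zeros_array)  # Output: [0. 0. 0. 0. 0.]
    print(ones_array)   # Output: [1. 1. 1.]
    print(empty_array)  # Output: arbitrary values left over in memory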
    ```

  • Using np.arange():
    ```python
    import numpy as np

    arange_array = np.arange(0, 10, 2)  # Starts at 0, stops before 10, step of 2
    print(arange_array)  # Output: [0 2 4 6 8]
    ```
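The same constructors also handle nested lists and element types. Below is a small sketch showing how a nested list becomes a two-dimensional array, how NumPy infers a single dtype for all elements, and how a dtype can be requested explicitly. The variable names are illustrative only.

```python
import numpy as np

# A nested list becomes a two-dimensional array (2 rows, 3 columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2

# NumPy picks one element type (dtype) for the whole array
ints = np.array([1, 2, 3])     # an integer dtype (exact width depends on platform)
mixed = np.array([1, 2.5, 3])  # the integers are upcast to float
print(mixed.dtype)   # float64

# The dtype can also be requested explicitly
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit.dtype)  # float32

# zeros/ones accept a shape tuple to build multidimensional arrays
grid = np.zeros((2, 3))
print(grid)
```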

Array Operations

NumPy allows you to perform operations on entire arrays at once (vectorized operations), which is much faster than looping through lists.
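Here is a minimal sketch of that contrast, using illustrative variable names. The list version needs an explicit comprehension, while the array version applies the operation to every element in a single expression. The last line shows that `*` on a plain list repeats the list instead of multiplying its elements.

```python
import numpy as np

values = [1, 2, 3, 4]

# With a plain list, element-wise arithmetic needs a loop or comprehension
doubled_list = [v * 2 for v in values]

# With an array, the operation applies to every element at once
doubled_array = np.array(values) * 2

print(doubled_list)   # [2, 4, 6, 8]
print(doubled_array)  # [2 4 6 8]

# Careful: * on a list repeats the list rather than doubling each element
print(values * 2)     # [1, 2, 3, 4, 1, 2, 3, 4]
```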

  • Arithmetic Operations:
    ```python
    import numpy as np

    array1 = np.array([1, 2, 3])
    array2 = np.array([4, 5, 6])

    sum_array = array1 + array2      # Element-wise addition
    product_array = array1 * array2  # Element-wise multiplication
    print(sum_array)      # Output: [5 7 9]
    print(product_array)  # Output: [ 4 10 18]
    ```

  • Broadcasting: NumPy can combine arrays of different shapes when their shapes are compatible; the smaller operand is conceptually stretched to match the larger one. The simplest case is adding a scalar to an array:
    ```python
    import numpy as np

    array = np.array([1, 2, 3])
    added_array = array + 5  # Adds 5 to each element
    print(added_array)  # Output: [6 7 8]
    ```
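Broadcasting also works between two arrays. NumPy compares their shapes starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below uses illustrative names. It adds a length-3 array to every row of a 2×3 array, then shows the error raised when the shapes do not line up.

```python
import numpy as np

# Shape (2, 3): two rows, three columns
matrix = np.array([[1, 2, 3],
                   [10, 20, 30]])

# Shape (3,): matches the last dimension of matrix
row_offsets = np.array([100, 200, 300])

# row_offsets is applied to each row of matrix
result = matrix + row_offsets
print(result)
# [[101 202 303]
#  [110 220 330]]

# Shapes (2, 3) and (2,) are incompatible: last dimensions 3 and 2 differ
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("Cannot broadcast:", err)
```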

Descriptive Statistics with NumPy

NumPy provides functions for calculating descriptive statistics:

  • np.mean(): Calculates the average.
  • np.median(): Calculates the middle value (when sorted).
  • np.std(): Calculates the standard deviation (measure of spread).

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

mean_value = np.mean(data)
median_value = np.median(data)
std_value = np.std(data)

print(f"Mean: {mean_value}")                 # Output: Mean: 3.0
print(f"Median: {median_value}")             # Output: Median: 3.0
print(f"Standard Deviation: {std_value}")    # Output: Standard Deviation: 1.4142135623730951
```
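Two parameters are worth knowing when you move beyond this example. First, np.std() divides by n by default (ddof=0), which gives the population standard deviation. Passing ddof=1 divides by n - 1 instead, giving the sample standard deviation that many statistics courses use. Second, on a two-dimensional array the axis argument chooses whether to compute per column (axis=0) or per row (axis=1). The scores array below is made-up illustrative data.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Population standard deviation (default ddof=0): divides by n
print(np.std(data))          # about 1.414
# Sample standard deviation (ddof=1): divides by n - 1
print(np.std(data, ddof=1))  # about 1.581

# Hypothetical scores: each row is a student, each column a test
scores = np.array([[70, 80, 90],
                   [60, 75, 90]])
print(np.mean(scores, axis=0))  # per column: 65, 77.5, 90
print(np.mean(scores, axis=1))  # per row: 80, 75
```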

Array Indexing and Slicing

Accessing elements in NumPy arrays is similar to lists, using indexing and slicing.

  • Indexing: Accessing a single element.
    ```python
    import numpy as np

    array = np.array([10, 20, 30, 40, 50])
    first_element = array[0]  # Accesses the first element (index 0)
    print(first_element)  # Output: 10
    ```

  • Slicing: Accessing a range of elements.
    ```python
    import numpy as np

    array = np.array([10, 20, 30, 40, 50])
    slice_array = array[1:4]  # Elements at indices 1, 2 and 3 (the stop index 4 is excluded)
    print(slice_array)  # Output: [20 30 40]
    ```
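Array indexing goes a little further than list indexing. The sketch below shows negative indices, a slice with a step, and selecting elements with a boolean condition. It also shows one important difference from lists: a slice of a NumPy array is a view that shares memory with the original, so modifying the slice modifies the array. Call .copy() when you need an independent array. Variable names are illustrative.

```python
import numpy as np

array = np.array([10, 20, 30, 40, 50])

# Negative indices count from the end
print(array[-1])          # 50

# A third slice value sets the step
print(array[::2])         # [10 30 50]

# A boolean condition selects the matching elements
print(array[array > 25])  # [30 40 50]

# Array slices are views: writing to the slice changes the original
slice_view = array[1:4]
slice_view[0] = 99
print(array)              # [10 99 30 40 50]

# .copy() produces an independent array
slice_copy = array[1:4].copy()
slice_copy[0] = -1
print(array)              # still [10 99 30 40 50]
```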
