numpy

SKILL.md

NumPy - Numerical Python

The fundamental package for numerical computing in Python, providing multi-dimensional arrays and fast operations.

When to Use

  • Working with multi-dimensional arrays and matrices
  • Performing element-wise operations on arrays
  • Linear algebra computations (matrix multiplication, eigenvalues, SVD)
  • Random number generation and statistical distributions
  • Fourier transforms and signal processing basics
  • Mathematical operations (trigonometric, exponential, logarithmic)
  • Broadcasting operations across different array shapes
  • Vectorizing Python loops for performance
  • Reading and writing numerical data to files
  • Building numerical algorithms and simulations
  • Serving as foundation for pandas, scikit-learn, SciPy

Reference Documentation

Official docs: https://numpy.org/doc/
Search patterns: np.array, np.zeros, np.dot, np.linalg, np.random, np.broadcast

Core Principles

Use NumPy For

Task Function Example
Create arrays array, zeros, ones np.array([1, 2, 3])
Mathematical ops +, *, sin, exp np.sin(arr)
Linear algebra dot, linalg.inv np.dot(A, B)
Statistics mean, std, percentile np.mean(arr)
Random numbers random.rand, random.normal np.random.rand(10)
Indexing [], boolean, fancy arr[arr > 0]
Broadcasting Automatic arr + scalar
Reshaping reshape, flatten arr.reshape(2, 3)

Do NOT Use For

  • String manipulation (use built-in str or pandas)
  • Complex data structures (use pandas DataFrame)
  • Symbolic mathematics (use SymPy)
  • Deep learning (use PyTorch, TensorFlow)
  • Sparse matrices (use scipy.sparse)

Quick Reference

Installation

# pip
pip install numpy

# conda
conda install numpy

# Specific version
pip install numpy==1.26.0

Standard Imports

import numpy as np

# Common submodules
from numpy import linalg as la
from numpy import random as rand
from numpy import fft

# Never import *
# from numpy import *  # DON'T DO THIS!

Basic Pattern - Array Creation

import numpy as np

# From list
arr = np.array([1, 2, 3, 4, 5])

# Zeros and ones
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))

# Range
range_arr = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]

# Linspace
linspace_arr = np.linspace(0, 1, 5)  # [0, 0.25, 0.5, 0.75, 1]

print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Dtype: {arr.dtype}")

Basic Pattern - Array Operations

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise operations
c = a + b      # [5, 7, 9]
d = a * b      # [4, 10, 18]
e = a ** 2     # [1, 4, 9]

# Mathematical functions
f = np.sin(a)
g = np.exp(a)

print(f"Sum: {c}")
print(f"Product: {d}")

Basic Pattern - Linear Algebra

import numpy as np

# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Dot product
C = np.dot(A, B)  # or A @ B

# Matrix inverse
A_inv = np.linalg.inv(A)

# Eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f"Matrix product:\n{C}")
print(f"Eigenvalues: {eigenvalues}")

Critical Rules

✅ DO

  • Use vectorization - Avoid Python loops, use array operations
  • Specify dtype explicitly - For memory efficiency and precision control
  • Use views when possible - Avoid unnecessary copies
  • Broadcast properly - Understand broadcasting rules
  • Check array shapes - Use .shape frequently
  • Use axis parameter - For operations along specific dimensions
  • Pre-allocate arrays - Don't grow arrays in loops
  • Use appropriate dtypes - int32, float64, complex128, etc.
  • Copy when needed - Use .copy() for independent arrays
  • Use built-in functions - They're optimized in C

❌ DON'T

  • Loop over arrays - Use vectorization instead
  • Grow arrays dynamically - Pre-allocate instead
  • Use Python lists for math - Convert to arrays first
  • Ignore memory layout - C-contiguous vs Fortran-contiguous matters
  • Mix dtypes carelessly - Know implicit type promotion rules
  • Modify arrays during iteration - Can cause undefined behavior
  • Use == for array comparison - Use np.array_equal() or np.allclose()
  • Assume views vs copies - Check with .base attribute
  • Ignore NaN handling - Use np.nanmean(), np.nanstd(), etc.
  • Use outdated APIs - Check for deprecated functions

Anti-Patterns (NEVER)

import numpy as np

# ❌ BAD: Python loops
result = []
for i in range(len(arr)):
    result.append(arr[i] * 2)
result = np.array(result)

# ✅ GOOD: Vectorization
result = arr * 2

# ❌ BAD: Growing arrays
result = np.array([])
for i in range(1000):
    result = np.append(result, i)  # Very slow!

# ✅ GOOD: Pre-allocate
result = np.zeros(1000)
for i in range(1000):
    result[i] = i

# Even better: Use arange
result = np.arange(1000)

# ❌ BAD: Comparing arrays with ==
if arr1 == arr2:  # This is ambiguous!
    print("Equal")

# ✅ GOOD: Use appropriate comparison
if np.array_equal(arr1, arr2):
    print("Equal")

# Or for floating point
if np.allclose(arr1, arr2, rtol=1e-5):
    print("Close enough")

# ❌ BAD: Ignoring dtypes
arr = np.array([1, 2, 3])
arr[0] = 1.5  # Silently truncates to 1!

# ✅ GOOD: Explicit dtype
arr = np.array([1, 2, 3], dtype=float)
arr[0] = 1.5  # Now works correctly

# ❌ BAD: Unintentional modification
a = np.array([1, 2, 3])
b = a  # b is just a reference!
b[0] = 999  # Also modifies a!

# ✅ GOOD: Explicit copy
a = np.array([1, 2, 3])
b = a.copy()  # b is independent
b[0] = 999  # a is unchanged

Array Creation

Basic Array Creation

import numpy as np

# From Python list
arr1 = np.array([1, 2, 3, 4, 5])

# From nested list (2D)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# Specify dtype
arr3 = np.array([1, 2, 3], dtype=np.float64)
arr4 = np.array([1, 2, 3], dtype=np.int32)

# From tuple
arr5 = np.array((1, 2, 3))

# Complex numbers
arr6 = np.array([1+2j, 3+4j])

print(f"1D array: {arr1}")
print(f"2D array:\n{arr2}")
print(f"Float array: {arr3}")

Special Array Creation

import numpy as np

# Zeros
zeros = np.zeros((3, 4))  # 3x4 array of zeros

# Ones
ones = np.ones((2, 3, 4))  # 2x3x4 array of ones

# Empty (uninitialized)
empty = np.empty((2, 2))  # Faster but values are garbage

# Full (constant value)
full = np.full((3, 3), 7)  # 3x3 array filled with 7

# Identity matrix
identity = np.eye(4)  # 4x4 identity matrix

# Diagonal matrix
diag = np.diag([1, 2, 3, 4])

print(f"Zeros shape: {zeros.shape}")
print(f"Identity:\n{identity}")

Range-Based Creation

import numpy as np

# Arange (like Python range)
a = np.arange(10)           # [0, 1, 2, ..., 9]
b = np.arange(2, 10)        # [2, 3, 4, ..., 9]
c = np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
d = np.arange(0, 1, 0.1)    # [0, 0.1, 0.2, ..., 0.9]

# Linspace (linearly spaced)
e = np.linspace(0, 1, 5)    # [0, 0.25, 0.5, 0.75, 1]
f = np.linspace(0, 10, 100) # 100 points from 0 to 10

# Logspace (logarithmically spaced)
g = np.logspace(0, 2, 5)    # [1, 10^0.5, 10, 10^1.5, 100]

# Geomspace (geometrically spaced)
h = np.geomspace(1, 1000, 4) # [1, 10, 100, 1000]

print(f"Arange: {a}")
print(f"Linspace: {e}")

Array Copies and Views

import numpy as np

original = np.array([1, 2, 3, 4, 5])

# View (shares memory)
view = original[:]
view[0] = 999  # Modifies original!

# Copy (independent)
copy = original.copy()
copy[0] = 777  # Doesn't affect original

# Check if array is a view
print(f"Is view? {view.base is original}")
print(f"Is copy? {copy.base is None}")

# Some operations create views, some create copies
slice_view = original[1:3]  # View
boolean_copy = original[original > 2]  # Copy!

Array Indexing and Slicing

Basic Indexing

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Single element
print(arr[0])   # 10
print(arr[-1])  # 50 (last element)

# Slicing
print(arr[1:4])    # [20, 30, 40]
print(arr[:3])     # [10, 20, 30]
print(arr[2:])     # [30, 40, 50]
print(arr[::2])    # [10, 30, 50] (every 2nd element)

# Negative indices
print(arr[-3:-1])  # [30, 40]

# Reverse
print(arr[::-1])   # [50, 40, 30, 20, 10]

Multi-Dimensional Indexing

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Single element
print(arr[0, 0])   # 1
print(arr[1, 2])   # 6
print(arr[-1, -1]) # 9

# Row slicing
print(arr[0])      # [1, 2, 3] (first row)
print(arr[1, :])   # [4, 5, 6] (second row)

# Column slicing
print(arr[:, 0])   # [1, 4, 7] (first column)
print(arr[:, 1])   # [2, 5, 8] (second column)

# Sub-array
print(arr[0:2, 1:3])  # [[2, 3], [5, 6]]

# Every other element
print(arr[::2, ::2])  # [[1, 3], [7, 9]]

Boolean Indexing

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Boolean condition
mask = arr > 5
print(mask)  # [False, False, False, False, False, True, True, True, True, True]

# Boolean indexing
filtered = arr[arr > 5]
print(filtered)  # [6, 7, 8, 9, 10]

# Multiple conditions (use & and |, not 'and' and 'or')
result = arr[(arr > 3) & (arr < 8)]
print(result)  # [4, 5, 6, 7]

# Or condition
result = arr[(arr < 3) | (arr > 8)]
print(result)  # [1, 2, 9, 10]

# Negation
result = arr[~(arr > 5)]
print(result)  # [1, 2, 3, 4, 5]

Fancy Indexing

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Index with array of integers
indices = np.array([0, 2, 4])
result = arr[indices]
print(result)  # [10, 30, 50]

# 2D fancy indexing
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

rows = np.array([0, 2])
cols = np.array([1, 2])
result = arr2d[rows, cols]  # Elements at (0,1) and (2,2)
print(result)  # [2, 9]

# Combining boolean and fancy indexing
mask = arr > 25
indices_of_large = np.where(mask)[0]
print(indices_of_large)  # [2, 3, 4]

Array Operations

Element-wise Operations

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Arithmetic operations
print(a + b)    # [6, 8, 10, 12]
print(a - b)    # [-4, -4, -4, -4]
print(a * b)    # [5, 12, 21, 32]
print(a / b)    # [0.2, 0.333..., 0.428..., 0.5]
print(a ** 2)   # [1, 4, 9, 16]
print(a // b)   # [0, 0, 0, 0] (floor division)
print(a % b)    # [1, 2, 3, 4] (modulo)

# With scalars
print(a + 10)   # [11, 12, 13, 14]
print(a * 2)    # [2, 4, 6, 8]

Mathematical Functions

import numpy as np

x = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])

# Trigonometric
sin_x = np.sin(x)
cos_x = np.cos(x)
tan_x = np.tan(x)

# Inverse trig
arcsin_x = np.arcsin([0, 0.5, 1])

# Exponential and logarithm
arr = np.array([1, 2, 3, 4])
exp_arr = np.exp(arr)
log_arr = np.log(arr)
log10_arr = np.log10(arr)

# Rounding
floats = np.array([1.2, 2.7, 3.5, 4.9])
print(np.round(floats))   # [1, 3, 4, 5]
print(np.floor(floats))   # [1, 2, 3, 4]
print(np.ceil(floats))    # [2, 3, 4, 5]

# Absolute value
print(np.abs([-1, -2, 3, -4]))  # [1, 2, 3, 4]

Aggregation Functions

import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Sum
print(np.sum(arr))         # 45 (all elements)
print(np.sum(arr, axis=0)) # [12, 15, 18] (column sums)
print(np.sum(arr, axis=1)) # [6, 15, 24] (row sums)

# Mean
print(np.mean(arr))        # 5.0

# Standard deviation
print(np.std(arr))         # ~2.58

# Min and max
print(np.min(arr))         # 1
print(np.max(arr))         # 9
print(np.argmin(arr))      # 0 (index of min)
print(np.argmax(arr))      # 8 (index of max)

# Median and percentiles
print(np.median(arr))      # 5.0
print(np.percentile(arr, 25))  # 3.0 (25th percentile)

Broadcasting

Broadcasting Rules

import numpy as np

# Scalar and array
arr = np.array([1, 2, 3, 4])
result = arr + 10  # Broadcast scalar to array shape
print(result)  # [11, 12, 13, 14]

# 1D and 2D
arr1d = np.array([1, 2, 3])
arr2d = np.array([[10], [20], [30]])

result = arr1d + arr2d
print(result)
# [[11, 12, 13],
#  [21, 22, 23],
#  [31, 32, 33]]

# Broadcasting example: standardization
data = np.random.randn(100, 3)  # 100 samples, 3 features
mean = np.mean(data, axis=0)    # Mean of each column
std = np.std(data, axis=0)      # Std of each column
standardized = (data - mean) / std  # Broadcasting!

Explicit Broadcasting

import numpy as np

# Using broadcast_to
arr = np.array([1, 2, 3])
broadcasted = np.broadcast_to(arr, (4, 3))
print(broadcasted)
# [[1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3],
#  [1, 2, 3]]

# Using newaxis
arr1d = np.array([1, 2, 3])
col_vector = arr1d[:, np.newaxis]  # Shape (3, 1)
row_vector = arr1d[np.newaxis, :]  # Shape (1, 3)

# Outer product using broadcasting
outer = col_vector * row_vector
print(outer)
# [[1, 2, 3],
#  [2, 4, 6],
#  [3, 6, 9]]

Linear Algebra

Matrix Operations

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = np.dot(A, B)  # Traditional
C = A @ B         # Modern syntax (Python 3.5+)

# Element-wise multiplication
D = A * B  # Not matrix multiplication!

# Matrix transpose
A_T = A.T

# Trace (sum of diagonal)
trace = np.trace(A)

# Matrix power
A_squared = np.linalg.matrix_power(A, 2)

print(f"Matrix product:\n{C}")
print(f"Transpose:\n{A_T}")
print(f"Trace: {trace}")

Solving Linear Systems

import numpy as np

# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

# Solve for x
x = np.linalg.solve(A, b)
print(f"Solution: {x}")  # [2, 3]

# Verify solution
print(f"Verification: {np.allclose(A @ x, b)}")  # True

# Matrix inverse
A_inv = np.linalg.inv(A)
print(f"Inverse:\n{A_inv}")

# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}")

Eigenvalues and Eigenvectors

import numpy as np

# Square matrix
A = np.array([[1, 2], [2, 1]])

# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)

print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify: A * v = λ * v
for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    
    left = A @ v
    right = lam * v
    
    print(f"Eigenvalue {i}: {np.allclose(left, right)}")

Singular Value Decomposition (SVD)

import numpy as np

# Any matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# SVD: A = U @ S @ Vt
U, s, Vt = np.linalg.svd(A)

# Reconstruct original matrix
S = np.zeros((2, 3))
S[:2, :2] = np.diag(s)
A_reconstructed = U @ S @ Vt

print(f"Original:\n{A}")
print(f"Reconstructed:\n{A_reconstructed}")
print(f"Close? {np.allclose(A, A_reconstructed)}")

# Singular values
print(f"Singular values: {s}")

Matrix Norms

import numpy as np

A = np.array([[1, 2], [3, 4]])

# Frobenius norm (default)
norm_fro = np.linalg.norm(A)

# 1-norm (max column sum)
norm_1 = np.linalg.norm(A, ord=1)

# Infinity norm (max row sum)
norm_inf = np.linalg.norm(A, ord=np.inf)

# 2-norm (spectral norm)
norm_2 = np.linalg.norm(A, ord=2)

print(f"Frobenius: {norm_fro:.4f}")
print(f"1-norm: {norm_1:.4f}")
print(f"2-norm: {norm_2:.4f}")
print(f"inf-norm: {norm_inf:.4f}")

Random Number Generation

Basic Random Generation

import numpy as np

# Set seed for reproducibility
np.random.seed(42)

# Random floats [0, 1)
rand_uniform = np.random.rand(5)  # 1D array of 5 elements
rand_2d = np.random.rand(3, 4)    # 3x4 array

# Random integers
rand_int = np.random.randint(0, 10, size=5)  # [0, 10)
rand_int_2d = np.random.randint(0, 100, size=(3, 3))

# Random normal distribution
rand_normal = np.random.randn(1000)  # Mean=0, std=1
rand_normal_custom = np.random.normal(loc=5, scale=2, size=1000)

# Random choice
choices = np.random.choice(['a', 'b', 'c'], size=10)
weighted_choices = np.random.choice([1, 2, 3], size=100, p=[0.1, 0.3, 0.6])

Statistical Distributions

import numpy as np

# Uniform distribution [low, high)
uniform = np.random.uniform(low=0, high=10, size=1000)

# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000)

# Exponential distribution
exponential = np.random.exponential(scale=2, size=1000)

# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000)

# Poisson distribution
poisson = np.random.poisson(lam=3, size=1000)

# Beta distribution
beta = np.random.beta(a=2, b=5, size=1000)

# Chi-squared distribution
chisq = np.random.chisquare(df=2, size=1000)

Modern Random Generator (numpy.random.Generator)

import numpy as np

# Create generator
rng = np.random.default_rng(seed=42)

# Generate random numbers
rand = rng.random(size=10)
ints = rng.integers(low=0, high=100, size=10)
normal = rng.normal(loc=0, scale=1, size=10)

# Shuffle array in-place
arr = np.arange(10)
rng.shuffle(arr)

# Sample without replacement
sample = rng.choice(100, size=10, replace=False)

print(f"Random: {rand}")
print(f"Shuffled: {arr}")

Reshaping and Manipulation

Reshaping Arrays

import numpy as np

# Original array
arr = np.arange(12)  # [0, 1, 2, ..., 11]

# Reshape
arr_2d = arr.reshape(3, 4)
arr_3d = arr.reshape(2, 2, 3)

# Automatic dimension calculation with -1
arr_auto = arr.reshape(3, -1)  # Automatically calculates 4

# Flatten to 1D
flat = arr_2d.flatten()  # Returns copy
flat = arr_2d.ravel()    # Returns view if possible

# Transpose
arr_t = arr_2d.T

print(f"Original shape: {arr.shape}")
print(f"2D shape: {arr_2d.shape}")
print(f"3D shape: {arr_3d.shape}")

Stacking and Splitting

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# Vertical stacking (vstack)
vstacked = np.vstack([a, b, c])
print(vstacked)
# [[1, 2, 3],
#  [4, 5, 6],
#  [7, 8, 9]]

# Horizontal stacking (hstack)
hstacked = np.hstack([a, b, c])
print(hstacked)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Column stack
col_stacked = np.column_stack([a, b, c])

# Concatenate (more general)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concat_axis0 = np.concatenate([arr1, arr2], axis=0)
concat_axis1 = np.concatenate([arr1, arr2], axis=1)

# Splitting
arr = np.arange(12)
split = np.split(arr, 3)  # Split into 3 equal parts
print(split)  # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]

File I/O

Text Files

import numpy as np

# Save to text file
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

np.savetxt('data.txt', data)
np.savetxt('data.csv', data, delimiter=',')
np.savetxt('data_formatted.txt', data, fmt='%.2f')

# Load from text file
loaded = np.loadtxt('data.txt')
loaded_csv = np.loadtxt('data.csv', delimiter=',')

# Skip header rows
loaded_skip = np.loadtxt('data.txt', skiprows=1)

# Load specific columns
loaded_cols = np.loadtxt('data.csv', delimiter=',', usecols=(0, 2))

Binary Files (.npy, .npz)

import numpy as np

# Save single array
arr = np.random.rand(100, 100)
np.save('array.npy', arr)

# Load single array
loaded = np.load('array.npy')

# Save multiple arrays (compressed)
arr1 = np.random.rand(10, 10)
arr2 = np.random.rand(20, 20)
np.savez('arrays.npz', first=arr1, second=arr2)

# Load multiple arrays
loaded = np.load('arrays.npz')
loaded_arr1 = loaded['first']
loaded_arr2 = loaded['second']

# Compressed save
np.savez_compressed('arrays_compressed.npz', arr1=arr1, arr2=arr2)

Advanced Techniques

Universal Functions (ufuncs)

import numpy as np

# Ufuncs operate element-wise
arr = np.array([1, 2, 3, 4, 5])

# Built-in ufuncs
result = np.sqrt(arr)
result = np.exp(arr)
result = np.log(arr)

# Custom ufunc
def my_func(x):
    return x**2 + 2*x + 1

vectorized = np.vectorize(my_func)
result = vectorized(arr)

# More efficient: define true ufunc
@np.vectorize
def better_func(x):
    return x**2 + 2*x + 1

Structured Arrays

import numpy as np

# Define dtype
dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('weight', 'f8')])

# Create structured array
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 70.2),
    ('Charlie', 35, 82.1)
], dtype=dt)

# Access by field name
names = data['name']
ages = data['age']

# Sort by field
sorted_data = np.sort(data, order='age')

print(f"Names: {names}")
print(f"Sorted by age:\n{sorted_data}")

Memory Layout and Performance

import numpy as np

# C-contiguous (row-major, default)
arr_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')

# Fortran-contiguous (column-major)
arr_f = np.array([[1, 2, 3], [4, 5, 6]], order='F')

# Check memory layout
print(f"C-contiguous? {arr_c.flags['C_CONTIGUOUS']}")
print(f"F-contiguous? {arr_c.flags['F_CONTIGUOUS']}")

# Make contiguous
arr_made_c = np.ascontiguousarray(arr_f)
arr_made_f = np.asfortranarray(arr_c)

# Memory usage
print(f"Memory (bytes): {arr_c.nbytes}")
print(f"Item size: {arr_c.itemsize}")

Advanced Indexing with ix_

import numpy as np

arr = np.arange(20).reshape(4, 5)

# Select specific rows and columns
rows = np.array([0, 2])
cols = np.array([1, 3, 4])

# ix_ creates open mesh
result = arr[np.ix_(rows, cols)]
print(result)
# [[1, 3, 4],
#  [11, 13, 14]]

# Equivalent to
# result = arr[[0, 2]][:, [1, 3, 4]]

Practical Workflows

Statistical Analysis

import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)

# Descriptive statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
var = np.var(data)

# Percentiles
q25, q50, q75 = np.percentile(data, [25, 50, 75])

# Histogram
counts, bins = np.histogram(data, bins=20)

# Correlation coefficient
data2 = data + np.random.normal(0, 5, size=1000)
corr = np.corrcoef(data, data2)[0, 1]

print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Std: {std:.2f}")
print(f"IQR: [{q25:.2f}, {q75:.2f}]")
print(f"Correlation: {corr:.3f}")

Monte Carlo Simulation

import numpy as np

def estimate_pi(n_samples=1000000):
    """Estimate π using Monte Carlo method."""
    # Random points in [0, 1] x [0, 1]
    x = np.random.rand(n_samples)
    y = np.random.rand(n_samples)
    
    # Check if inside quarter circle
    inside = (x**2 + y**2) <= 1
    
    # Estimate π
    pi_estimate = 4 * np.sum(inside) / n_samples
    
    return pi_estimate

# Estimate π
pi_est = estimate_pi(10000000)
print(f"π estimate: {pi_est:.6f}")
print(f"Error: {abs(pi_est - np.pi):.6f}")

Polynomial Fitting

import numpy as np

# Generate noisy data
x = np.linspace(0, 10, 50)
y_true = 2*x**2 + 3*x + 1
y_noisy = y_true + np.random.normal(0, 10, size=50)

# Fit polynomial (degree 2)
coeffs = np.polyfit(x, y_noisy, deg=2)
print(f"Coefficients: {coeffs}")  # Should be close to [2, 3, 1]

# Predict
y_pred = np.polyval(coeffs, x)

# Evaluate fit quality
residuals = y_noisy - y_pred
rmse = np.sqrt(np.mean(residuals**2))
print(f"RMSE: {rmse:.2f}")

# Create polynomial object
poly = np.poly1d(coeffs)
print(f"Polynomial: {poly}")

Image Processing Basics

import numpy as np

# Create synthetic image (grayscale)
image = np.random.rand(100, 100)

# Apply transformations
# Rotate 90 degrees
rotated = np.rot90(image)

# Flip vertically
flipped_v = np.flipud(image)

# Flip horizontally
flipped_h = np.fliplr(image)

# Transpose
transposed = image.T

# Normalize to [0, 255]
normalized = ((image - image.min()) / (image.max() - image.min()) * 255).astype(np.uint8)

print(f"Original shape: {image.shape}")
print(f"Value range: [{image.min():.2f}, {image.max():.2f}]")

Distance Matrices

import numpy as np

# Points in 2D
points = np.random.rand(100, 2)

# Pairwise distances (broadcasting)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))

print(f"Distance matrix shape: {distances.shape}")
print(f"Max distance: {distances.max():.4f}")

# Find nearest neighbors
for i in range(5):
    # Exclude self (distance = 0)
    dists = distances[i].copy()
    dists[i] = np.inf
    nearest = np.argmin(dists)
    print(f"Point {i} nearest to point {nearest}, distance: {distances[i, nearest]:.4f}")

Sliding Window Operations

import numpy as np

def sliding_window_view(arr, window_size):
    """Create sliding window views of array."""
    shape = (arr.shape[0] - window_size + 1, window_size)
    strides = (arr.strides[0], arr.strides[0])
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

# Time series data
data = np.random.rand(100)

# Create sliding windows
windows = sliding_window_view(data, window_size=10)

# Compute statistics for each window
window_means = np.mean(windows, axis=1)
window_stds = np.std(windows, axis=1)

print(f"Number of windows: {len(windows)}")
print(f"First window mean: {window_means[0]:.4f}")

Performance Optimization

Vectorization Examples

import numpy as np
import time

# Bad: Python loop
def sum_python_loop(arr):
    total = 0
    for x in arr:
        total += x**2
    return total

# Good: Vectorized
def sum_vectorized(arr):
    return np.sum(arr**2)

# Benchmark
arr = np.random.rand(1000000)

start = time.time()
result1 = sum_python_loop(arr)
time_loop = time.time() - start

start = time.time()
result2 = sum_vectorized(arr)
time_vec = time.time() - start

print(f"Loop time: {time_loop:.4f}s")
print(f"Vectorized time: {time_vec:.4f}s")
print(f"Speedup: {time_loop/time_vec:.1f}x")

Memory-Efficient Operations

import numpy as np

# Bad: Creates intermediate arrays
def inefficient(arr):
    temp1 = arr * 2
    temp2 = temp1 + 5
    temp3 = temp2 ** 2
    return temp3

# Good: In-place operations
def efficient(arr):
    result = arr.copy()
    result *= 2
    result += 5
    result **= 2
    return result

# Even better: Single expression (optimized by NumPy)
def most_efficient(arr):
    return (arr * 2 + 5) ** 2

Using numexpr for Complex Expressions

import numpy as np

# For very large arrays and complex expressions,
# numexpr can be faster (requires installation)

# Without numexpr
a = np.random.rand(10000000)
b = np.random.rand(10000000)
result = 2*a + 3*b**2 - np.sqrt(a)

# With numexpr (if installed)
# import numexpr as ne
# result = ne.evaluate('2*a + 3*b**2 - sqrt(a)')

Common Pitfalls and Solutions

NaN Handling

import numpy as np

arr = np.array([1, 2, np.nan, 4, 5, np.nan])

# Problem: Regular functions return NaN
mean = np.mean(arr)  # Returns nan

# Solution: Use nan-safe functions
mean = np.nanmean(arr)  # Returns 3.0
std = np.nanstd(arr)
sum_val = np.nansum(arr)

# Check for NaN
has_nan = np.isnan(arr).any()
where_nan = np.where(np.isnan(arr))[0]

# Remove NaN
arr_clean = arr[~np.isnan(arr)]

print(f"Mean (nan-safe): {mean}")
print(f"NaN positions: {where_nan}")

Integer Division Pitfall

import numpy as np

# Problem: Integer division with integers
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
result = a / b  # With Python 3, this is fine

# But be careful with older code or explicit int types
a_int = np.array([1, 2, 3], dtype=np.int32)
b_int = np.array([2, 2, 2], dtype=np.int32)

# In NumPy, / always gives float result
result_float = a_int / b_int  # [0.5, 1, 1.5]

# Use // for integer division
result_int = a_int // b_int  # [0, 1, 1]

print(f"Float division: {result_float}")
print(f"Integer division: {result_int}")

Array Equality

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])

# Problem: Can't use == directly for array comparison
# if a == b:  # ValueError!

# Solution 1: Element-wise comparison
equal_elements = a == b  # Boolean array

# Solution 2: Check if all elements equal
all_equal = np.all(a == b)

# Solution 3: array_equal
array_equal = np.array_equal(a, b)

# Solution 4: For floating point, use allclose
c = a + 1e-10
close_enough = np.allclose(a, c, rtol=1e-5, atol=1e-8)

print(f"All equal: {all_equal}")
print(f"Arrays equal: {array_equal}")
print(f"Close enough: {close_enough}")

Memory Leaks with Views

import numpy as np

# Problem: Large array kept in memory
large_array = np.random.rand(1000000, 100)
small_view = large_array[0:10]  # Just 10 rows

# large_array is kept in memory because small_view references it!
del large_array  # Doesn't free memory!

# Solution: Make a copy
large_array = np.random.rand(1000000, 100)
small_copy = large_array[0:10].copy()
del large_array  # Now memory is freed

# Check if it's a view
print(f"Is view? {small_view.base is not None}")
print(f"Is copy? {small_copy.base is None}")

This comprehensive NumPy guide covers 50+ examples across all major array operations and numerical computing workflows!

Weekly Installs
10
GitHub Stars
6
First Seen
Feb 8, 2026
Installed on
codex9
opencode8
github-copilot8
kimi-cli8
gemini-cli8
amp8