numpy
NumPy - Numerical Python
The fundamental package for numerical computing in Python, providing multi-dimensional arrays and fast operations.
When to Use
- Working with multi-dimensional arrays and matrices
- Performing element-wise operations on arrays
- Linear algebra computations (matrix multiplication, eigenvalues, SVD)
- Random number generation and statistical distributions
- Fourier transforms and signal processing basics
- Mathematical operations (trigonometric, exponential, logarithmic)
- Broadcasting operations across different array shapes
- Vectorizing Python loops for performance
- Reading and writing numerical data to files
- Building numerical algorithms and simulations
- Serving as foundation for pandas, scikit-learn, SciPy
Reference Documentation
Official docs: https://numpy.org/doc/
Search patterns: np.array, np.zeros, np.dot, np.linalg, np.random, np.broadcast
Core Principles
Use NumPy For
| Task | Function | Example |
|---|---|---|
| Create arrays | array, zeros, ones |
np.array([1, 2, 3]) |
| Mathematical ops | +, *, sin, exp |
np.sin(arr) |
| Linear algebra | dot, linalg.inv |
np.dot(A, B) |
| Statistics | mean, std, percentile |
np.mean(arr) |
| Random numbers | random.rand, random.normal |
np.random.rand(10) |
| Indexing | [], boolean, fancy |
arr[arr > 0] |
| Broadcasting | Automatic | arr + scalar |
| Reshaping | reshape, flatten |
arr.reshape(2, 3) |
Do NOT Use For
- String manipulation (use built-in str or pandas)
- Complex data structures (use pandas DataFrame)
- Symbolic mathematics (use SymPy)
- Deep learning (use PyTorch, TensorFlow)
- Sparse matrices (use scipy.sparse)
Quick Reference
Installation
# pip
pip install numpy
# conda
conda install numpy
# Specific version
pip install numpy==1.26.0
Standard Imports
import numpy as np
# Common submodules
from numpy import linalg as la
from numpy import random as rand
from numpy import fft
# Never import *
# from numpy import * # DON'T DO THIS!
Basic Pattern - Array Creation
import numpy as np
# From list
arr = np.array([1, 2, 3, 4, 5])
# Zeros and ones
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))
# Range
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Linspace
linspace_arr = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]
print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Dtype: {arr.dtype}")
Basic Pattern - Array Operations
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise operations
c = a + b # [5, 7, 9]
d = a * b # [4, 10, 18]
e = a ** 2 # [1, 4, 9]
# Mathematical functions
f = np.sin(a)
g = np.exp(a)
print(f"Sum: {c}")
print(f"Product: {d}")
Basic Pattern - Linear Algebra
import numpy as np
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Dot product
C = np.dot(A, B) # or A @ B
# Matrix inverse
A_inv = np.linalg.inv(A)
# Eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Matrix product:\n{C}")
print(f"Eigenvalues: {eigenvalues}")
Critical Rules
✅ DO
- Use vectorization - Avoid Python loops, use array operations
- Specify dtype explicitly - For memory efficiency and precision control
- Use views when possible - Avoid unnecessary copies
- Broadcast properly - Understand broadcasting rules
- Check array shapes - Use
.shapefrequently - Use axis parameter - For operations along specific dimensions
- Pre-allocate arrays - Don't grow arrays in loops
- Use appropriate dtypes - int32, float64, complex128, etc.
- Copy when needed - Use
.copy()for independent arrays - Use built-in functions - They're optimized in C
❌ DON'T
- Loop over arrays - Use vectorization instead
- Grow arrays dynamically - Pre-allocate instead
- Use Python lists for math - Convert to arrays first
- Ignore memory layout - C-contiguous vs Fortran-contiguous matters
- Mix dtypes carelessly - Know implicit type promotion rules
- Modify arrays during iteration - Can cause undefined behavior
- Use == for array comparison - Use
np.array_equal()ornp.allclose() - Assume views vs copies - Check with
.baseattribute - Ignore NaN handling - Use
np.nanmean(),np.nanstd(), etc. - Use outdated APIs - Check for deprecated functions
Anti-Patterns (NEVER)
import numpy as np
# ❌ BAD: Python loops
result = []
for i in range(len(arr)):
result.append(arr[i] * 2)
result = np.array(result)
# ✅ GOOD: Vectorization
result = arr * 2
# ❌ BAD: Growing arrays
result = np.array([])
for i in range(1000):
result = np.append(result, i) # Very slow!
# ✅ GOOD: Pre-allocate
result = np.zeros(1000)
for i in range(1000):
result[i] = i
# Even better: Use arange
result = np.arange(1000)
# ❌ BAD: Comparing arrays with ==
if arr1 == arr2: # This is ambiguous!
print("Equal")
# ✅ GOOD: Use appropriate comparison
if np.array_equal(arr1, arr2):
print("Equal")
# Or for floating point
if np.allclose(arr1, arr2, rtol=1e-5):
print("Close enough")
# ❌ BAD: Ignoring dtypes
arr = np.array([1, 2, 3])
arr[0] = 1.5 # Silently truncates to 1!
# ✅ GOOD: Explicit dtype
arr = np.array([1, 2, 3], dtype=float)
arr[0] = 1.5 # Now works correctly
# ❌ BAD: Unintentional modification
a = np.array([1, 2, 3])
b = a # b is just a reference!
b[0] = 999 # Also modifies a!
# ✅ GOOD: Explicit copy
a = np.array([1, 2, 3])
b = a.copy() # b is independent
b[0] = 999 # a is unchanged
Array Creation
Basic Array Creation
import numpy as np
# From Python list
arr1 = np.array([1, 2, 3, 4, 5])
# From nested list (2D)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# Specify dtype
arr3 = np.array([1, 2, 3], dtype=np.float64)
arr4 = np.array([1, 2, 3], dtype=np.int32)
# From tuple
arr5 = np.array((1, 2, 3))
# Complex numbers
arr6 = np.array([1+2j, 3+4j])
print(f"1D array: {arr1}")
print(f"2D array:\n{arr2}")
print(f"Float array: {arr3}")
Special Array Creation
import numpy as np
# Zeros
zeros = np.zeros((3, 4)) # 3x4 array of zeros
# Ones
ones = np.ones((2, 3, 4)) # 2x3x4 array of ones
# Empty (uninitialized)
empty = np.empty((2, 2)) # Faster but values are garbage
# Full (constant value)
full = np.full((3, 3), 7) # 3x3 array filled with 7
# Identity matrix
identity = np.eye(4) # 4x4 identity matrix
# Diagonal matrix
diag = np.diag([1, 2, 3, 4])
print(f"Zeros shape: {zeros.shape}")
print(f"Identity:\n{identity}")
Range-Based Creation
import numpy as np
# Arange (like Python range)
a = np.arange(10) # [0, 1, 2, ..., 9]
b = np.arange(2, 10) # [2, 3, 4, ..., 9]
c = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
d = np.arange(0, 1, 0.1) # [0, 0.1, 0.2, ..., 0.9]
# Linspace (linearly spaced)
e = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]
f = np.linspace(0, 10, 100) # 100 points from 0 to 10
# Logspace (logarithmically spaced)
g = np.logspace(0, 2, 5) # [1, 10^0.5, 10, 10^1.5, 100]
# Geomspace (geometrically spaced)
h = np.geomspace(1, 1000, 4) # [1, 10, 100, 1000]
print(f"Arange: {a}")
print(f"Linspace: {e}")
Array Copies and Views
import numpy as np
original = np.array([1, 2, 3, 4, 5])
# View (shares memory)
view = original[:]
view[0] = 999 # Modifies original!
# Copy (independent)
copy = original.copy()
copy[0] = 777 # Doesn't affect original
# Check if array is a view
print(f"Is view? {view.base is original}")
print(f"Is copy? {copy.base is None}")
# Some operations create views, some create copies
slice_view = original[1:3] # View
boolean_copy = original[original > 2] # Copy!
Array Indexing and Slicing
Basic Indexing
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Single element
print(arr[0]) # 10
print(arr[-1]) # 50 (last element)
# Slicing
print(arr[1:4]) # [20, 30, 40]
print(arr[:3]) # [10, 20, 30]
print(arr[2:]) # [30, 40, 50]
print(arr[::2]) # [10, 30, 50] (every 2nd element)
# Negative indices
print(arr[-3:-1]) # [30, 40]
# Reverse
print(arr[::-1]) # [50, 40, 30, 20, 10]
Multi-Dimensional Indexing
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Single element
print(arr[0, 0]) # 1
print(arr[1, 2]) # 6
print(arr[-1, -1]) # 9
# Row slicing
print(arr[0]) # [1, 2, 3] (first row)
print(arr[1, :]) # [4, 5, 6] (second row)
# Column slicing
print(arr[:, 0]) # [1, 4, 7] (first column)
print(arr[:, 1]) # [2, 5, 8] (second column)
# Sub-array
print(arr[0:2, 1:3]) # [[2, 3], [5, 6]]
# Every other element
print(arr[::2, ::2]) # [[1, 3], [7, 9]]
Boolean Indexing
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Boolean condition
mask = arr > 5
print(mask) # [False, False, False, False, False, True, True, True, True, True]
# Boolean indexing
filtered = arr[arr > 5]
print(filtered) # [6, 7, 8, 9, 10]
# Multiple conditions (use & and |, not 'and' and 'or')
result = arr[(arr > 3) & (arr < 8)]
print(result) # [4, 5, 6, 7]
# Or condition
result = arr[(arr < 3) | (arr > 8)]
print(result) # [1, 2, 9, 10]
# Negation
result = arr[~(arr > 5)]
print(result) # [1, 2, 3, 4, 5]
Fancy Indexing
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Index with array of integers
indices = np.array([0, 2, 4])
result = arr[indices]
print(result) # [10, 30, 50]
# 2D fancy indexing
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
rows = np.array([0, 2])
cols = np.array([1, 2])
result = arr2d[rows, cols] # Elements at (0,1) and (2,2)
print(result) # [2, 9]
# Combining boolean and fancy indexing
mask = arr > 25
indices_of_large = np.where(mask)[0]
print(indices_of_large) # [2, 3, 4]
Array Operations
Element-wise Operations
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Arithmetic operations
print(a + b) # [6, 8, 10, 12]
print(a - b) # [-4, -4, -4, -4]
print(a * b) # [5, 12, 21, 32]
print(a / b) # [0.2, 0.333..., 0.428..., 0.5]
print(a ** 2) # [1, 4, 9, 16]
print(a // b) # [0, 0, 0, 0] (floor division)
print(a % b) # [1, 2, 3, 4] (modulo)
# With scalars
print(a + 10) # [11, 12, 13, 14]
print(a * 2) # [2, 4, 6, 8]
Mathematical Functions
import numpy as np
x = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])
# Trigonometric
sin_x = np.sin(x)
cos_x = np.cos(x)
tan_x = np.tan(x)
# Inverse trig
arcsin_x = np.arcsin([0, 0.5, 1])
# Exponential and logarithm
arr = np.array([1, 2, 3, 4])
exp_arr = np.exp(arr)
log_arr = np.log(arr)
log10_arr = np.log10(arr)
# Rounding
floats = np.array([1.2, 2.7, 3.5, 4.9])
print(np.round(floats)) # [1, 3, 4, 5]
print(np.floor(floats)) # [1, 2, 3, 4]
print(np.ceil(floats)) # [2, 3, 4, 5]
# Absolute value
print(np.abs([-1, -2, 3, -4])) # [1, 2, 3, 4]
Aggregation Functions
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Sum
print(np.sum(arr)) # 45 (all elements)
print(np.sum(arr, axis=0)) # [12, 15, 18] (column sums)
print(np.sum(arr, axis=1)) # [6, 15, 24] (row sums)
# Mean
print(np.mean(arr)) # 5.0
# Standard deviation
print(np.std(arr)) # ~2.58
# Min and max
print(np.min(arr)) # 1
print(np.max(arr)) # 9
print(np.argmin(arr)) # 0 (index of min)
print(np.argmax(arr)) # 8 (index of max)
# Median and percentiles
print(np.median(arr)) # 5.0
print(np.percentile(arr, 25)) # 3.0 (25th percentile)
Broadcasting
Broadcasting Rules
import numpy as np
# Scalar and array
arr = np.array([1, 2, 3, 4])
result = arr + 10 # Broadcast scalar to array shape
print(result) # [11, 12, 13, 14]
# 1D and 2D
arr1d = np.array([1, 2, 3])
arr2d = np.array([[10], [20], [30]])
result = arr1d + arr2d
print(result)
# [[11, 12, 13],
# [21, 22, 23],
# [31, 32, 33]]
# Broadcasting example: standardization
data = np.random.randn(100, 3) # 100 samples, 3 features
mean = np.mean(data, axis=0) # Mean of each column
std = np.std(data, axis=0) # Std of each column
standardized = (data - mean) / std # Broadcasting!
Explicit Broadcasting
import numpy as np
# Using broadcast_to
arr = np.array([1, 2, 3])
broadcasted = np.broadcast_to(arr, (4, 3))
print(broadcasted)
# [[1, 2, 3],
# [1, 2, 3],
# [1, 2, 3],
# [1, 2, 3]]
# Using newaxis
arr1d = np.array([1, 2, 3])
col_vector = arr1d[:, np.newaxis] # Shape (3, 1)
row_vector = arr1d[np.newaxis, :] # Shape (1, 3)
# Outer product using broadcasting
outer = col_vector * row_vector
print(outer)
# [[1, 2, 3],
# [2, 4, 6],
# [3, 6, 9]]
Linear Algebra
Matrix Operations
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
C = np.dot(A, B) # Traditional
C = A @ B # Modern syntax (Python 3.5+)
# Element-wise multiplication
D = A * B # Not matrix multiplication!
# Matrix transpose
A_T = A.T
# Trace (sum of diagonal)
trace = np.trace(A)
# Matrix power
A_squared = np.linalg.matrix_power(A, 2)
print(f"Matrix product:\n{C}")
print(f"Transpose:\n{A_T}")
print(f"Trace: {trace}")
Solving Linear Systems
import numpy as np
# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
# Solve for x
x = np.linalg.solve(A, b)
print(f"Solution: {x}") # [2, 3]
# Verify solution
print(f"Verification: {np.allclose(A @ x, b)}") # True
# Matrix inverse
A_inv = np.linalg.inv(A)
print(f"Inverse:\n{A_inv}")
# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}")
Eigenvalues and Eigenvectors
import numpy as np
# Square matrix
A = np.array([[1, 2], [2, 1]])
# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")
# Verify: A * v = λ * v
for i in range(len(eigenvalues)):
lam = eigenvalues[i]
v = eigenvectors[:, i]
left = A @ v
right = lam * v
print(f"Eigenvalue {i}: {np.allclose(left, right)}")
Singular Value Decomposition (SVD)
import numpy as np
# Any matrix
A = np.array([[1, 2, 3],
[4, 5, 6]])
# SVD: A = U @ S @ Vt
U, s, Vt = np.linalg.svd(A)
# Reconstruct original matrix
S = np.zeros((2, 3))
S[:2, :2] = np.diag(s)
A_reconstructed = U @ S @ Vt
print(f"Original:\n{A}")
print(f"Reconstructed:\n{A_reconstructed}")
print(f"Close? {np.allclose(A, A_reconstructed)}")
# Singular values
print(f"Singular values: {s}")
Matrix Norms
import numpy as np
A = np.array([[1, 2], [3, 4]])
# Frobenius norm (default)
norm_fro = np.linalg.norm(A)
# 1-norm (max column sum)
norm_1 = np.linalg.norm(A, ord=1)
# Infinity norm (max row sum)
norm_inf = np.linalg.norm(A, ord=np.inf)
# 2-norm (spectral norm)
norm_2 = np.linalg.norm(A, ord=2)
print(f"Frobenius: {norm_fro:.4f}")
print(f"1-norm: {norm_1:.4f}")
print(f"2-norm: {norm_2:.4f}")
print(f"inf-norm: {norm_inf:.4f}")
Random Number Generation
Basic Random Generation
import numpy as np
# Set seed for reproducibility
np.random.seed(42)
# Random floats [0, 1)
rand_uniform = np.random.rand(5) # 1D array of 5 elements
rand_2d = np.random.rand(3, 4) # 3x4 array
# Random integers
rand_int = np.random.randint(0, 10, size=5) # [0, 10)
rand_int_2d = np.random.randint(0, 100, size=(3, 3))
# Random normal distribution
rand_normal = np.random.randn(1000) # Mean=0, std=1
rand_normal_custom = np.random.normal(loc=5, scale=2, size=1000)
# Random choice
choices = np.random.choice(['a', 'b', 'c'], size=10)
weighted_choices = np.random.choice([1, 2, 3], size=100, p=[0.1, 0.3, 0.6])
Statistical Distributions
import numpy as np
# Uniform distribution [low, high)
uniform = np.random.uniform(low=0, high=10, size=1000)
# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000)
# Exponential distribution
exponential = np.random.exponential(scale=2, size=1000)
# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000)
# Poisson distribution
poisson = np.random.poisson(lam=3, size=1000)
# Beta distribution
beta = np.random.beta(a=2, b=5, size=1000)
# Chi-squared distribution
chisq = np.random.chisquare(df=2, size=1000)
Modern Random Generator (numpy.random.Generator)
import numpy as np
# Create generator
rng = np.random.default_rng(seed=42)
# Generate random numbers
rand = rng.random(size=10)
ints = rng.integers(low=0, high=100, size=10)
normal = rng.normal(loc=0, scale=1, size=10)
# Shuffle array in-place
arr = np.arange(10)
rng.shuffle(arr)
# Sample without replacement
sample = rng.choice(100, size=10, replace=False)
print(f"Random: {rand}")
print(f"Shuffled: {arr}")
Reshaping and Manipulation
Reshaping Arrays
import numpy as np
# Original array
arr = np.arange(12) # [0, 1, 2, ..., 11]
# Reshape
arr_2d = arr.reshape(3, 4)
arr_3d = arr.reshape(2, 2, 3)
# Automatic dimension calculation with -1
arr_auto = arr.reshape(3, -1) # Automatically calculates 4
# Flatten to 1D
flat = arr_2d.flatten() # Returns copy
flat = arr_2d.ravel() # Returns view if possible
# Transpose
arr_t = arr_2d.T
print(f"Original shape: {arr.shape}")
print(f"2D shape: {arr_2d.shape}")
print(f"3D shape: {arr_3d.shape}")
Stacking and Splitting
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])
# Vertical stacking (vstack)
vstacked = np.vstack([a, b, c])
print(vstacked)
# [[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]]
# Horizontal stacking (hstack)
hstacked = np.hstack([a, b, c])
print(hstacked) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Column stack
col_stacked = np.column_stack([a, b, c])
# Concatenate (more general)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concat_axis0 = np.concatenate([arr1, arr2], axis=0)
concat_axis1 = np.concatenate([arr1, arr2], axis=1)
# Splitting
arr = np.arange(12)
split = np.split(arr, 3) # Split into 3 equal parts
print(split) # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]
File I/O
Text Files
import numpy as np
# Save to text file
data = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
np.savetxt('data.txt', data)
np.savetxt('data.csv', data, delimiter=',')
np.savetxt('data_formatted.txt', data, fmt='%.2f')
# Load from text file
loaded = np.loadtxt('data.txt')
loaded_csv = np.loadtxt('data.csv', delimiter=',')
# Skip header rows
loaded_skip = np.loadtxt('data.txt', skiprows=1)
# Load specific columns
loaded_cols = np.loadtxt('data.csv', delimiter=',', usecols=(0, 2))
Binary Files (.npy, .npz)
import numpy as np
# Save single array
arr = np.random.rand(100, 100)
np.save('array.npy', arr)
# Load single array
loaded = np.load('array.npy')
# Save multiple arrays (compressed)
arr1 = np.random.rand(10, 10)
arr2 = np.random.rand(20, 20)
np.savez('arrays.npz', first=arr1, second=arr2)
# Load multiple arrays
loaded = np.load('arrays.npz')
loaded_arr1 = loaded['first']
loaded_arr2 = loaded['second']
# Compressed save
np.savez_compressed('arrays_compressed.npz', arr1=arr1, arr2=arr2)
Advanced Techniques
Universal Functions (ufuncs)
import numpy as np
# Ufuncs operate element-wise
arr = np.array([1, 2, 3, 4, 5])
# Built-in ufuncs
result = np.sqrt(arr)
result = np.exp(arr)
result = np.log(arr)
# Custom ufunc
def my_func(x):
return x**2 + 2*x + 1
vectorized = np.vectorize(my_func)
result = vectorized(arr)
# More efficient: define true ufunc
@np.vectorize
def better_func(x):
return x**2 + 2*x + 1
Structured Arrays
import numpy as np
# Define dtype
dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('weight', 'f8')])
# Create structured array
data = np.array([
('Alice', 25, 55.5),
('Bob', 30, 70.2),
('Charlie', 35, 82.1)
], dtype=dt)
# Access by field name
names = data['name']
ages = data['age']
# Sort by field
sorted_data = np.sort(data, order='age')
print(f"Names: {names}")
print(f"Sorted by age:\n{sorted_data}")
Memory Layout and Performance
import numpy as np
# C-contiguous (row-major, default)
arr_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')
# Fortran-contiguous (column-major)
arr_f = np.array([[1, 2, 3], [4, 5, 6]], order='F')
# Check memory layout
print(f"C-contiguous? {arr_c.flags['C_CONTIGUOUS']}")
print(f"F-contiguous? {arr_c.flags['F_CONTIGUOUS']}")
# Make contiguous
arr_made_c = np.ascontiguousarray(arr_f)
arr_made_f = np.asfortranarray(arr_c)
# Memory usage
print(f"Memory (bytes): {arr_c.nbytes}")
print(f"Item size: {arr_c.itemsize}")
Advanced Indexing with ix_
import numpy as np
arr = np.arange(20).reshape(4, 5)
# Select specific rows and columns
rows = np.array([0, 2])
cols = np.array([1, 3, 4])
# ix_ creates open mesh
result = arr[np.ix_(rows, cols)]
print(result)
# [[1, 3, 4],
# [11, 13, 14]]
# Equivalent to
# result = arr[[0, 2]][:, [1, 3, 4]]
Practical Workflows
Statistical Analysis
import numpy as np
# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)
# Descriptive statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
var = np.var(data)
# Percentiles
q25, q50, q75 = np.percentile(data, [25, 50, 75])
# Histogram
counts, bins = np.histogram(data, bins=20)
# Correlation coefficient
data2 = data + np.random.normal(0, 5, size=1000)
corr = np.corrcoef(data, data2)[0, 1]
print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Std: {std:.2f}")
print(f"IQR: [{q25:.2f}, {q75:.2f}]")
print(f"Correlation: {corr:.3f}")
Monte Carlo Simulation
import numpy as np
def estimate_pi(n_samples=1000000):
"""Estimate π using Monte Carlo method."""
# Random points in [0, 1] x [0, 1]
x = np.random.rand(n_samples)
y = np.random.rand(n_samples)
# Check if inside quarter circle
inside = (x**2 + y**2) <= 1
# Estimate π
pi_estimate = 4 * np.sum(inside) / n_samples
return pi_estimate
# Estimate π
pi_est = estimate_pi(10000000)
print(f"π estimate: {pi_est:.6f}")
print(f"Error: {abs(pi_est - np.pi):.6f}")
Polynomial Fitting
import numpy as np
# Generate noisy data
x = np.linspace(0, 10, 50)
y_true = 2*x**2 + 3*x + 1
y_noisy = y_true + np.random.normal(0, 10, size=50)
# Fit polynomial (degree 2)
coeffs = np.polyfit(x, y_noisy, deg=2)
print(f"Coefficients: {coeffs}") # Should be close to [2, 3, 1]
# Predict
y_pred = np.polyval(coeffs, x)
# Evaluate fit quality
residuals = y_noisy - y_pred
rmse = np.sqrt(np.mean(residuals**2))
print(f"RMSE: {rmse:.2f}")
# Create polynomial object
poly = np.poly1d(coeffs)
print(f"Polynomial: {poly}")
Image Processing Basics
import numpy as np
# Create synthetic image (grayscale)
image = np.random.rand(100, 100)
# Apply transformations
# Rotate 90 degrees
rotated = np.rot90(image)
# Flip vertically
flipped_v = np.flipud(image)
# Flip horizontally
flipped_h = np.fliplr(image)
# Transpose
transposed = image.T
# Normalize to [0, 255]
normalized = ((image - image.min()) / (image.max() - image.min()) * 255).astype(np.uint8)
print(f"Original shape: {image.shape}")
print(f"Value range: [{image.min():.2f}, {image.max():.2f}]")
Distance Matrices
import numpy as np
# Points in 2D
points = np.random.rand(100, 2)
# Pairwise distances (broadcasting)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))
print(f"Distance matrix shape: {distances.shape}")
print(f"Max distance: {distances.max():.4f}")
# Find nearest neighbors
for i in range(5):
# Exclude self (distance = 0)
dists = distances[i].copy()
dists[i] = np.inf
nearest = np.argmin(dists)
print(f"Point {i} nearest to point {nearest}, distance: {distances[i, nearest]:.4f}")
Sliding Window Operations
import numpy as np
def sliding_window_view(arr, window_size):
"""Create sliding window views of array."""
shape = (arr.shape[0] - window_size + 1, window_size)
strides = (arr.strides[0], arr.strides[0])
return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
# Time series data
data = np.random.rand(100)
# Create sliding windows
windows = sliding_window_view(data, window_size=10)
# Compute statistics for each window
window_means = np.mean(windows, axis=1)
window_stds = np.std(windows, axis=1)
print(f"Number of windows: {len(windows)}")
print(f"First window mean: {window_means[0]:.4f}")
Performance Optimization
Vectorization Examples
import numpy as np
import time
# Bad: Python loop
def sum_python_loop(arr):
total = 0
for x in arr:
total += x**2
return total
# Good: Vectorized
def sum_vectorized(arr):
return np.sum(arr**2)
# Benchmark
arr = np.random.rand(1000000)
start = time.time()
result1 = sum_python_loop(arr)
time_loop = time.time() - start
start = time.time()
result2 = sum_vectorized(arr)
time_vec = time.time() - start
print(f"Loop time: {time_loop:.4f}s")
print(f"Vectorized time: {time_vec:.4f}s")
print(f"Speedup: {time_loop/time_vec:.1f}x")
Memory-Efficient Operations
import numpy as np
# Bad: Creates intermediate arrays
def inefficient(arr):
temp1 = arr * 2
temp2 = temp1 + 5
temp3 = temp2 ** 2
return temp3
# Good: In-place operations
def efficient(arr):
result = arr.copy()
result *= 2
result += 5
result **= 2
return result
# Even better: Single expression (optimized by NumPy)
def most_efficient(arr):
return (arr * 2 + 5) ** 2
Using numexpr for Complex Expressions
import numpy as np
# For very large arrays and complex expressions,
# numexpr can be faster (requires installation)
# Without numexpr
a = np.random.rand(10000000)
b = np.random.rand(10000000)
result = 2*a + 3*b**2 - np.sqrt(a)
# With numexpr (if installed)
# import numexpr as ne
# result = ne.evaluate('2*a + 3*b**2 - sqrt(a)')
Common Pitfalls and Solutions
NaN Handling
import numpy as np
arr = np.array([1, 2, np.nan, 4, 5, np.nan])
# Problem: Regular functions return NaN
mean = np.mean(arr) # Returns nan
# Solution: Use nan-safe functions
mean = np.nanmean(arr) # Returns 3.0
std = np.nanstd(arr)
sum_val = np.nansum(arr)
# Check for NaN
has_nan = np.isnan(arr).any()
where_nan = np.where(np.isnan(arr))[0]
# Remove NaN
arr_clean = arr[~np.isnan(arr)]
print(f"Mean (nan-safe): {mean}")
print(f"NaN positions: {where_nan}")
Integer Division Pitfall
import numpy as np
# Problem: Integer division with integers
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
result = a / b # With Python 3, this is fine
# But be careful with older code or explicit int types
a_int = np.array([1, 2, 3], dtype=np.int32)
b_int = np.array([2, 2, 2], dtype=np.int32)
# In NumPy, / always gives float result
result_float = a_int / b_int # [0.5, 1, 1.5]
# Use // for integer division
result_int = a_int // b_int # [0, 1, 1]
print(f"Float division: {result_float}")
print(f"Integer division: {result_int}")
Array Equality
import numpy as np
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])
# Problem: Can't use == directly for array comparison
# if a == b: # ValueError!
# Solution 1: Element-wise comparison
equal_elements = a == b # Boolean array
# Solution 2: Check if all elements equal
all_equal = np.all(a == b)
# Solution 3: array_equal
array_equal = np.array_equal(a, b)
# Solution 4: For floating point, use allclose
c = a + 1e-10
close_enough = np.allclose(a, c, rtol=1e-5, atol=1e-8)
print(f"All equal: {all_equal}")
print(f"Arrays equal: {array_equal}")
print(f"Close enough: {close_enough}")
Memory Leaks with Views
import numpy as np
# Problem: Large array kept in memory
large_array = np.random.rand(1000000, 100)
small_view = large_array[0:10] # Just 10 rows
# large_array is kept in memory because small_view references it!
del large_array # Doesn't free memory!
# Solution: Make a copy
large_array = np.random.rand(1000000, 100)
small_copy = large_array[0:10].copy()
del large_array # Now memory is freed
# Check if it's a view
print(f"Is view? {small_view.base is not None}")
print(f"Is copy? {small_copy.base is None}")
This comprehensive NumPy guide covers 50+ examples across all major array operations and numerical computing workflows!
More from tondevrel/scientific-agent-skills
xgboost-lightgbm
Industry-standard gradient boosting libraries for tabular data and structured datasets. XGBoost and LightGBM excel at classification and regression tasks on tables, CSVs, and databases. Use when working with tabular machine learning, gradient boosting trees, Kaggle competitions, feature importance analysis, hyperparameter tuning, or when you need state-of-the-art performance on structured data.
193opencv
Open Source Computer Vision Library (OpenCV) for real-time image processing, video analysis, object detection, face recognition, and camera calibration. Use when working with images, videos, cameras, edge detection, contours, feature detection, image transformations, object tracking, optical flow, or any computer vision task.
142ortools
Google Optimization Tools. An open-source software suite for optimization, specialized in vehicle routing, flows, integer and linear programming, and constraint programming. Features the world-class CP-SAT solver. Use for vehicle routing problems (VRP), scheduling, bin packing, knapsack problems, linear programming (LP), integer programming (MIP), network flows, constraint programming, combinatorial optimization, resource allocation, shift scheduling, job-shop scheduling, and discrete optimization problems.
74matplotlib
The foundational library for creating static, animated, and interactive visualizations in Python. Highly customizable and the industry standard for publication-quality figures. Use for 2D plotting, scientific data visualization, heatmaps, contours, vector fields, multi-panel figures, LaTeX-formatted plots, custom visualization tools, and plotting from NumPy arrays or Pandas DataFrames.
70scipy
Comprehensive guide for SciPy - the fundamental library for scientific and technical computing in Python. Use for integration, optimization, interpolation, linear algebra, signal processing, statistics, ODEs, Fourier transforms, and advanced scientific algorithms. Built on NumPy and essential for research and engineering.
51plotly
A high-level interactive graphing library for Python. Ideal for web-based visualizations, 3D plots, and complex interactive dashboards. Built on plotly.js, it allows users to zoom, pan, and hover over data points in a browser-based environment. Use for interactive charts, web applications, Jupyter notebooks, 3D data visualization, geographic maps, financial charts, animations, time-series analysis, and building production-ready dashboards with Dash.
50