numpy
SKILL.md
NumPy - Numerical Python
The fundamental package for numerical computing in Python, providing multi-dimensional arrays and fast operations.
When to Use
- Working with multi-dimensional arrays and matrices
- Performing element-wise operations on arrays
- Linear algebra computations (matrix multiplication, eigenvalues, SVD)
- Random number generation and statistical distributions
- Fourier transforms and signal processing basics
- Mathematical operations (trigonometric, exponential, logarithmic)
- Broadcasting operations across different array shapes
- Vectorizing Python loops for performance
- Reading and writing numerical data to files
- Building numerical algorithms and simulations
- Serving as foundation for pandas, scikit-learn, SciPy
Reference Documentation
Official docs: https://numpy.org/doc/
Search patterns: np.array, np.zeros, np.dot, np.linalg, np.random, np.broadcast
Core Principles
Use NumPy For
| Task | Function | Example |
|---|---|---|
| Create arrays | array, zeros, ones |
np.array([1, 2, 3]) |
| Mathematical ops | +, *, sin, exp |
np.sin(arr) |
| Linear algebra | dot, linalg.inv |
np.dot(A, B) |
| Statistics | mean, std, percentile |
np.mean(arr) |
| Random numbers | random.rand, random.normal |
np.random.rand(10) |
| Indexing | [], boolean, fancy |
arr[arr > 0] |
| Broadcasting | Automatic | arr + scalar |
| Reshaping | reshape, flatten |
arr.reshape(2, 3) |
Do NOT Use For
- String manipulation (use built-in str or pandas)
- Complex data structures (use pandas DataFrame)
- Symbolic mathematics (use SymPy)
- Deep learning (use PyTorch, TensorFlow)
- Sparse matrices (use scipy.sparse)
Quick Reference
Installation
# pip
pip install numpy
# conda
conda install numpy
# Specific version
pip install numpy==1.26.0
Standard Imports
import numpy as np
# Common submodules
from numpy import linalg as la
from numpy import random as rand
from numpy import fft
# Never import *
# from numpy import * # DON'T DO THIS!
Basic Pattern - Array Creation
import numpy as np
# From list
arr = np.array([1, 2, 3, 4, 5])
# Zeros and ones
zeros = np.zeros((3, 4))
ones = np.ones((2, 3))
# Range
range_arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Linspace
linspace_arr = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]
print(f"Array: {arr}")
print(f"Shape: {arr.shape}")
print(f"Dtype: {arr.dtype}")
Basic Pattern - Array Operations
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise operations
c = a + b # [5, 7, 9]
d = a * b # [4, 10, 18]
e = a ** 2 # [1, 4, 9]
# Mathematical functions
f = np.sin(a)
g = np.exp(a)
print(f"Sum: {c}")
print(f"Product: {d}")
Basic Pattern - Linear Algebra
import numpy as np
# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Dot product
C = np.dot(A, B) # or A @ B
# Matrix inverse
A_inv = np.linalg.inv(A)
# Eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Matrix product:\n{C}")
print(f"Eigenvalues: {eigenvalues}")
Critical Rules
✅ DO
- Use vectorization - Avoid Python loops, use array operations
- Specify dtype explicitly - For memory efficiency and precision control
- Use views when possible - Avoid unnecessary copies
- Broadcast properly - Understand broadcasting rules
- Check array shapes - Use
.shapefrequently - Use axis parameter - For operations along specific dimensions
- Pre-allocate arrays - Don't grow arrays in loops
- Use appropriate dtypes - int32, float64, complex128, etc.
- Copy when needed - Use
.copy()for independent arrays - Use built-in functions - They're optimized in C
❌ DON'T
- Loop over arrays - Use vectorization instead
- Grow arrays dynamically - Pre-allocate instead
- Use Python lists for math - Convert to arrays first
- Ignore memory layout - C-contiguous vs Fortran-contiguous matters
- Mix dtypes carelessly - Know implicit type promotion rules
- Modify arrays during iteration - Can cause undefined behavior
- Use == for array comparison - Use
np.array_equal()ornp.allclose() - Assume views vs copies - Check with
.baseattribute - Ignore NaN handling - Use
np.nanmean(),np.nanstd(), etc. - Use outdated APIs - Check for deprecated functions
Anti-Patterns (NEVER)
import numpy as np
# ❌ BAD: Python loops
result = []
for i in range(len(arr)):
result.append(arr[i] * 2)
result = np.array(result)
# ✅ GOOD: Vectorization
result = arr * 2
# ❌ BAD: Growing arrays
result = np.array([])
for i in range(1000):
result = np.append(result, i) # Very slow!
# ✅ GOOD: Pre-allocate
result = np.zeros(1000)
for i in range(1000):
result[i] = i
# Even better: Use arange
result = np.arange(1000)
# ❌ BAD: Comparing arrays with ==
if arr1 == arr2: # This is ambiguous!
print("Equal")
# ✅ GOOD: Use appropriate comparison
if np.array_equal(arr1, arr2):
print("Equal")
# Or for floating point
if np.allclose(arr1, arr2, rtol=1e-5):
print("Close enough")
# ❌ BAD: Ignoring dtypes
arr = np.array([1, 2, 3])
arr[0] = 1.5 # Silently truncates to 1!
# ✅ GOOD: Explicit dtype
arr = np.array([1, 2, 3], dtype=float)
arr[0] = 1.5 # Now works correctly
# ❌ BAD: Unintentional modification
a = np.array([1, 2, 3])
b = a # b is just a reference!
b[0] = 999 # Also modifies a!
# ✅ GOOD: Explicit copy
a = np.array([1, 2, 3])
b = a.copy() # b is independent
b[0] = 999 # a is unchanged
Array Creation
Basic Array Creation
import numpy as np
# From Python list
arr1 = np.array([1, 2, 3, 4, 5])
# From nested list (2D)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
# Specify dtype
arr3 = np.array([1, 2, 3], dtype=np.float64)
arr4 = np.array([1, 2, 3], dtype=np.int32)
# From tuple
arr5 = np.array((1, 2, 3))
# Complex numbers
arr6 = np.array([1+2j, 3+4j])
print(f"1D array: {arr1}")
print(f"2D array:\n{arr2}")
print(f"Float array: {arr3}")
Special Array Creation
import numpy as np
# Zeros
zeros = np.zeros((3, 4)) # 3x4 array of zeros
# Ones
ones = np.ones((2, 3, 4)) # 2x3x4 array of ones
# Empty (uninitialized)
empty = np.empty((2, 2)) # Faster but values are garbage
# Full (constant value)
full = np.full((3, 3), 7) # 3x3 array filled with 7
# Identity matrix
identity = np.eye(4) # 4x4 identity matrix
# Diagonal matrix
diag = np.diag([1, 2, 3, 4])
print(f"Zeros shape: {zeros.shape}")
print(f"Identity:\n{identity}")
Range-Based Creation
import numpy as np
# Arange (like Python range)
a = np.arange(10) # [0, 1, 2, ..., 9]
b = np.arange(2, 10) # [2, 3, 4, ..., 9]
c = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
d = np.arange(0, 1, 0.1) # [0, 0.1, 0.2, ..., 0.9]
# Linspace (linearly spaced)
e = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]
f = np.linspace(0, 10, 100) # 100 points from 0 to 10
# Logspace (logarithmically spaced)
g = np.logspace(0, 2, 5) # [1, 10^0.5, 10, 10^1.5, 100]
# Geomspace (geometrically spaced)
h = np.geomspace(1, 1000, 4) # [1, 10, 100, 1000]
print(f"Arange: {a}")
print(f"Linspace: {e}")
Array Copies and Views
import numpy as np
original = np.array([1, 2, 3, 4, 5])
# View (shares memory)
view = original[:]
view[0] = 999 # Modifies original!
# Copy (independent)
copy = original.copy()
copy[0] = 777 # Doesn't affect original
# Check if array is a view
print(f"Is view? {view.base is original}")
print(f"Is copy? {copy.base is None}")
# Some operations create views, some create copies
slice_view = original[1:3] # View
boolean_copy = original[original > 2] # Copy!
Array Indexing and Slicing
Basic Indexing
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Single element
print(arr[0]) # 10
print(arr[-1]) # 50 (last element)
# Slicing
print(arr[1:4]) # [20, 30, 40]
print(arr[:3]) # [10, 20, 30]
print(arr[2:]) # [30, 40, 50]
print(arr[::2]) # [10, 30, 50] (every 2nd element)
# Negative indices
print(arr[-3:-1]) # [30, 40]
# Reverse
print(arr[::-1]) # [50, 40, 30, 20, 10]
Multi-Dimensional Indexing
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Single element
print(arr[0, 0]) # 1
print(arr[1, 2]) # 6
print(arr[-1, -1]) # 9
# Row slicing
print(arr[0]) # [1, 2, 3] (first row)
print(arr[1, :]) # [4, 5, 6] (second row)
# Column slicing
print(arr[:, 0]) # [1, 4, 7] (first column)
print(arr[:, 1]) # [2, 5, 8] (second column)
# Sub-array
print(arr[0:2, 1:3]) # [[2, 3], [5, 6]]
# Every other element
print(arr[::2, ::2]) # [[1, 3], [7, 9]]
Boolean Indexing
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Boolean condition
mask = arr > 5
print(mask) # [False, False, False, False, False, True, True, True, True, True]
# Boolean indexing
filtered = arr[arr > 5]
print(filtered) # [6, 7, 8, 9, 10]
# Multiple conditions (use & and |, not 'and' and 'or')
result = arr[(arr > 3) & (arr < 8)]
print(result) # [4, 5, 6, 7]
# Or condition
result = arr[(arr < 3) | (arr > 8)]
print(result) # [1, 2, 9, 10]
# Negation
result = arr[~(arr > 5)]
print(result) # [1, 2, 3, 4, 5]
Fancy Indexing
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
# Index with array of integers
indices = np.array([0, 2, 4])
result = arr[indices]
print(result) # [10, 30, 50]
# 2D fancy indexing
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
rows = np.array([0, 2])
cols = np.array([1, 2])
result = arr2d[rows, cols] # Elements at (0,1) and (2,2)
print(result) # [2, 9]
# Combining boolean and fancy indexing
mask = arr > 25
indices_of_large = np.where(mask)[0]
print(indices_of_large) # [2, 3, 4]
Array Operations
Element-wise Operations
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Arithmetic operations
print(a + b) # [6, 8, 10, 12]
print(a - b) # [-4, -4, -4, -4]
print(a * b) # [5, 12, 21, 32]
print(a / b) # [0.2, 0.333..., 0.428..., 0.5]
print(a ** 2) # [1, 4, 9, 16]
print(a // b) # [0, 0, 0, 0] (floor division)
print(a % b) # [1, 2, 3, 4] (modulo)
# With scalars
print(a + 10) # [11, 12, 13, 14]
print(a * 2) # [2, 4, 6, 8]
Mathematical Functions
import numpy as np
x = np.array([0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])
# Trigonometric
sin_x = np.sin(x)
cos_x = np.cos(x)
tan_x = np.tan(x)
# Inverse trig
arcsin_x = np.arcsin([0, 0.5, 1])
# Exponential and logarithm
arr = np.array([1, 2, 3, 4])
exp_arr = np.exp(arr)
log_arr = np.log(arr)
log10_arr = np.log10(arr)
# Rounding
floats = np.array([1.2, 2.7, 3.5, 4.9])
print(np.round(floats)) # [1, 3, 4, 5]
print(np.floor(floats)) # [1, 2, 3, 4]
print(np.ceil(floats)) # [2, 3, 4, 5]
# Absolute value
print(np.abs([-1, -2, 3, -4])) # [1, 2, 3, 4]
Aggregation Functions
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Sum
print(np.sum(arr)) # 45 (all elements)
print(np.sum(arr, axis=0)) # [12, 15, 18] (column sums)
print(np.sum(arr, axis=1)) # [6, 15, 24] (row sums)
# Mean
print(np.mean(arr)) # 5.0
# Standard deviation
print(np.std(arr)) # ~2.58
# Min and max
print(np.min(arr)) # 1
print(np.max(arr)) # 9
print(np.argmin(arr)) # 0 (index of min)
print(np.argmax(arr)) # 8 (index of max)
# Median and percentiles
print(np.median(arr)) # 5.0
print(np.percentile(arr, 25)) # 3.0 (25th percentile)
Broadcasting
Broadcasting Rules
import numpy as np
# Scalar and array
arr = np.array([1, 2, 3, 4])
result = arr + 10 # Broadcast scalar to array shape
print(result) # [11, 12, 13, 14]
# 1D and 2D
arr1d = np.array([1, 2, 3])
arr2d = np.array([[10], [20], [30]])
result = arr1d + arr2d
print(result)
# [[11, 12, 13],
# [21, 22, 23],
# [31, 32, 33]]
# Broadcasting example: standardization
data = np.random.randn(100, 3) # 100 samples, 3 features
mean = np.mean(data, axis=0) # Mean of each column
std = np.std(data, axis=0) # Std of each column
standardized = (data - mean) / std # Broadcasting!
Explicit Broadcasting
import numpy as np
# Using broadcast_to
arr = np.array([1, 2, 3])
broadcasted = np.broadcast_to(arr, (4, 3))
print(broadcasted)
# [[1, 2, 3],
# [1, 2, 3],
# [1, 2, 3],
# [1, 2, 3]]
# Using newaxis
arr1d = np.array([1, 2, 3])
col_vector = arr1d[:, np.newaxis] # Shape (3, 1)
row_vector = arr1d[np.newaxis, :] # Shape (1, 3)
# Outer product using broadcasting
outer = col_vector * row_vector
print(outer)
# [[1, 2, 3],
# [2, 4, 6],
# [3, 6, 9]]
Linear Algebra
Matrix Operations
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication
C = np.dot(A, B) # Traditional
C = A @ B # Modern syntax (Python 3.5+)
# Element-wise multiplication
D = A * B # Not matrix multiplication!
# Matrix transpose
A_T = A.T
# Trace (sum of diagonal)
trace = np.trace(A)
# Matrix power
A_squared = np.linalg.matrix_power(A, 2)
print(f"Matrix product:\n{C}")
print(f"Transpose:\n{A_T}")
print(f"Trace: {trace}")
Solving Linear Systems
import numpy as np
# Solve Ax = b
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
# Solve for x
x = np.linalg.solve(A, b)
print(f"Solution: {x}") # [2, 3]
# Verify solution
print(f"Verification: {np.allclose(A @ x, b)}") # True
# Matrix inverse
A_inv = np.linalg.inv(A)
print(f"Inverse:\n{A_inv}")
# Determinant
det = np.linalg.det(A)
print(f"Determinant: {det}")
Eigenvalues and Eigenvectors
import numpy as np
# Square matrix
A = np.array([[1, 2], [2, 1]])
# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")
# Verify: A * v = λ * v
for i in range(len(eigenvalues)):
lam = eigenvalues[i]
v = eigenvectors[:, i]
left = A @ v
right = lam * v
print(f"Eigenvalue {i}: {np.allclose(left, right)}")
Singular Value Decomposition (SVD)
import numpy as np
# Any matrix
A = np.array([[1, 2, 3],
[4, 5, 6]])
# SVD: A = U @ S @ Vt
U, s, Vt = np.linalg.svd(A)
# Reconstruct original matrix
S = np.zeros((2, 3))
S[:2, :2] = np.diag(s)
A_reconstructed = U @ S @ Vt
print(f"Original:\n{A}")
print(f"Reconstructed:\n{A_reconstructed}")
print(f"Close? {np.allclose(A, A_reconstructed)}")
# Singular values
print(f"Singular values: {s}")
Matrix Norms
import numpy as np
A = np.array([[1, 2], [3, 4]])
# Frobenius norm (default)
norm_fro = np.linalg.norm(A)
# 1-norm (max column sum)
norm_1 = np.linalg.norm(A, ord=1)
# Infinity norm (max row sum)
norm_inf = np.linalg.norm(A, ord=np.inf)
# 2-norm (spectral norm)
norm_2 = np.linalg.norm(A, ord=2)
print(f"Frobenius: {norm_fro:.4f}")
print(f"1-norm: {norm_1:.4f}")
print(f"2-norm: {norm_2:.4f}")
print(f"inf-norm: {norm_inf:.4f}")
Random Number Generation
Basic Random Generation
import numpy as np
# Set seed for reproducibility
np.random.seed(42)
# Random floats [0, 1)
rand_uniform = np.random.rand(5) # 1D array of 5 elements
rand_2d = np.random.rand(3, 4) # 3x4 array
# Random integers
rand_int = np.random.randint(0, 10, size=5) # [0, 10)
rand_int_2d = np.random.randint(0, 100, size=(3, 3))
# Random normal distribution
rand_normal = np.random.randn(1000) # Mean=0, std=1
rand_normal_custom = np.random.normal(loc=5, scale=2, size=1000)
# Random choice
choices = np.random.choice(['a', 'b', 'c'], size=10)
weighted_choices = np.random.choice([1, 2, 3], size=100, p=[0.1, 0.3, 0.6])
Statistical Distributions
import numpy as np
# Uniform distribution [low, high)
uniform = np.random.uniform(low=0, high=10, size=1000)
# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000)
# Exponential distribution
exponential = np.random.exponential(scale=2, size=1000)
# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000)
# Poisson distribution
poisson = np.random.poisson(lam=3, size=1000)
# Beta distribution
beta = np.random.beta(a=2, b=5, size=1000)
# Chi-squared distribution
chisq = np.random.chisquare(df=2, size=1000)
Modern Random Generator (numpy.random.Generator)
import numpy as np
# Create generator
rng = np.random.default_rng(seed=42)
# Generate random numbers
rand = rng.random(size=10)
ints = rng.integers(low=0, high=100, size=10)
normal = rng.normal(loc=0, scale=1, size=10)
# Shuffle array in-place
arr = np.arange(10)
rng.shuffle(arr)
# Sample without replacement
sample = rng.choice(100, size=10, replace=False)
print(f"Random: {rand}")
print(f"Shuffled: {arr}")
Reshaping and Manipulation
Reshaping Arrays
import numpy as np
# Original array
arr = np.arange(12) # [0, 1, 2, ..., 11]
# Reshape
arr_2d = arr.reshape(3, 4)
arr_3d = arr.reshape(2, 2, 3)
# Automatic dimension calculation with -1
arr_auto = arr.reshape(3, -1) # Automatically calculates 4
# Flatten to 1D
flat = arr_2d.flatten() # Returns copy
flat = arr_2d.ravel() # Returns view if possible
# Transpose
arr_t = arr_2d.T
print(f"Original shape: {arr.shape}")
print(f"2D shape: {arr_2d.shape}")
print(f"3D shape: {arr_3d.shape}")
Stacking and Splitting
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])
# Vertical stacking (vstack)
vstacked = np.vstack([a, b, c])
print(vstacked)
# [[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]]
# Horizontal stacking (hstack)
hstacked = np.hstack([a, b, c])
print(hstacked) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Column stack
col_stacked = np.column_stack([a, b, c])
# Concatenate (more general)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
concat_axis0 = np.concatenate([arr1, arr2], axis=0)
concat_axis1 = np.concatenate([arr1, arr2], axis=1)
# Splitting
arr = np.arange(12)
split = np.split(arr, 3) # Split into 3 equal parts
print(split) # [array([0, 1, 2, 3]), array([4, 5, 6, 7]), array([8, 9, 10, 11])]
File I/O
Text Files
import numpy as np
# Save to text file
data = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
np.savetxt('data.txt', data)
np.savetxt('data.csv', data, delimiter=',')
np.savetxt('data_formatted.txt', data, fmt='%.2f')
# Load from text file
loaded = np.loadtxt('data.txt')
loaded_csv = np.loadtxt('data.csv', delimiter=',')
# Skip header rows
loaded_skip = np.loadtxt('data.txt', skiprows=1)
# Load specific columns
loaded_cols = np.loadtxt('data.csv', delimiter=',', usecols=(0, 2))
Binary Files (.npy, .npz)
import numpy as np
# Save single array
arr = np.random.rand(100, 100)
np.save('array.npy', arr)
# Load single array
loaded = np.load('array.npy')
# Save multiple arrays (compressed)
arr1 = np.random.rand(10, 10)
arr2 = np.random.rand(20, 20)
np.savez('arrays.npz', first=arr1, second=arr2)
# Load multiple arrays
loaded = np.load('arrays.npz')
loaded_arr1 = loaded['first']
loaded_arr2 = loaded['second']
# Compressed save
np.savez_compressed('arrays_compressed.npz', arr1=arr1, arr2=arr2)
Advanced Techniques
Universal Functions (ufuncs)
import numpy as np
# Ufuncs operate element-wise
arr = np.array([1, 2, 3, 4, 5])
# Built-in ufuncs
result = np.sqrt(arr)
result = np.exp(arr)
result = np.log(arr)
# Custom ufunc
def my_func(x):
return x**2 + 2*x + 1
vectorized = np.vectorize(my_func)
result = vectorized(arr)
# More efficient: define true ufunc
@np.vectorize
def better_func(x):
return x**2 + 2*x + 1
Structured Arrays
import numpy as np
# Define dtype
dt = np.dtype([('name', 'U20'), ('age', 'i4'), ('weight', 'f8')])
# Create structured array
data = np.array([
('Alice', 25, 55.5),
('Bob', 30, 70.2),
('Charlie', 35, 82.1)
], dtype=dt)
# Access by field name
names = data['name']
ages = data['age']
# Sort by field
sorted_data = np.sort(data, order='age')
print(f"Names: {names}")
print(f"Sorted by age:\n{sorted_data}")
Memory Layout and Performance
import numpy as np
# C-contiguous (row-major, default)
arr_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')
# Fortran-contiguous (column-major)
arr_f = np.array([[1, 2, 3], [4, 5, 6]], order='F')
# Check memory layout
print(f"C-contiguous? {arr_c.flags['C_CONTIGUOUS']}")
print(f"F-contiguous? {arr_c.flags['F_CONTIGUOUS']}")
# Make contiguous
arr_made_c = np.ascontiguousarray(arr_f)
arr_made_f = np.asfortranarray(arr_c)
# Memory usage
print(f"Memory (bytes): {arr_c.nbytes}")
print(f"Item size: {arr_c.itemsize}")
Advanced Indexing with ix_
import numpy as np
arr = np.arange(20).reshape(4, 5)
# Select specific rows and columns
rows = np.array([0, 2])
cols = np.array([1, 3, 4])
# ix_ creates open mesh
result = arr[np.ix_(rows, cols)]
print(result)
# [[1, 3, 4],
# [11, 13, 14]]
# Equivalent to
# result = arr[[0, 2]][:, [1, 3, 4]]
Practical Workflows
Statistical Analysis
import numpy as np
# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=100, scale=15, size=1000)
# Descriptive statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)
var = np.var(data)
# Percentiles
q25, q50, q75 = np.percentile(data, [25, 50, 75])
# Histogram
counts, bins = np.histogram(data, bins=20)
# Correlation coefficient
data2 = data + np.random.normal(0, 5, size=1000)
corr = np.corrcoef(data, data2)[0, 1]
print(f"Mean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Std: {std:.2f}")
print(f"IQR: [{q25:.2f}, {q75:.2f}]")
print(f"Correlation: {corr:.3f}")
Monte Carlo Simulation
import numpy as np
def estimate_pi(n_samples=1000000):
"""Estimate π using Monte Carlo method."""
# Random points in [0, 1] x [0, 1]
x = np.random.rand(n_samples)
y = np.random.rand(n_samples)
# Check if inside quarter circle
inside = (x**2 + y**2) <= 1
# Estimate π
pi_estimate = 4 * np.sum(inside) / n_samples
return pi_estimate
# Estimate π
pi_est = estimate_pi(10000000)
print(f"π estimate: {pi_est:.6f}")
print(f"Error: {abs(pi_est - np.pi):.6f}")
Polynomial Fitting
import numpy as np
# Generate noisy data
x = np.linspace(0, 10, 50)
y_true = 2*x**2 + 3*x + 1
y_noisy = y_true + np.random.normal(0, 10, size=50)
# Fit polynomial (degree 2)
coeffs = np.polyfit(x, y_noisy, deg=2)
print(f"Coefficients: {coeffs}") # Should be close to [2, 3, 1]
# Predict
y_pred = np.polyval(coeffs, x)
# Evaluate fit quality
residuals = y_noisy - y_pred
rmse = np.sqrt(np.mean(residuals**2))
print(f"RMSE: {rmse:.2f}")
# Create polynomial object
poly = np.poly1d(coeffs)
print(f"Polynomial: {poly}")
Image Processing Basics
import numpy as np
# Create synthetic image (grayscale)
image = np.random.rand(100, 100)
# Apply transformations
# Rotate 90 degrees
rotated = np.rot90(image)
# Flip vertically
flipped_v = np.flipud(image)
# Flip horizontally
flipped_h = np.fliplr(image)
# Transpose
transposed = image.T
# Normalize to [0, 255]
normalized = ((image - image.min()) / (image.max() - image.min()) * 255).astype(np.uint8)
print(f"Original shape: {image.shape}")
print(f"Value range: [{image.min():.2f}, {image.max():.2f}]")
Distance Matrices
import numpy as np
# Points in 2D
points = np.random.rand(100, 2)
# Pairwise distances (broadcasting)
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))
print(f"Distance matrix shape: {distances.shape}")
print(f"Max distance: {distances.max():.4f}")
# Find nearest neighbors
for i in range(5):
# Exclude self (distance = 0)
dists = distances[i].copy()
dists[i] = np.inf
nearest = np.argmin(dists)
print(f"Point {i} nearest to point {nearest}, distance: {distances[i, nearest]:.4f}")
Sliding Window Operations
import numpy as np
def sliding_window_view(arr, window_size):
"""Create sliding window views of array."""
shape = (arr.shape[0] - window_size + 1, window_size)
strides = (arr.strides[0], arr.strides[0])
return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
# Time series data
data = np.random.rand(100)
# Create sliding windows
windows = sliding_window_view(data, window_size=10)
# Compute statistics for each window
window_means = np.mean(windows, axis=1)
window_stds = np.std(windows, axis=1)
print(f"Number of windows: {len(windows)}")
print(f"First window mean: {window_means[0]:.4f}")
Performance Optimization
Vectorization Examples
import numpy as np
import time
# Bad: Python loop
def sum_python_loop(arr):
total = 0
for x in arr:
total += x**2
return total
# Good: Vectorized
def sum_vectorized(arr):
return np.sum(arr**2)
# Benchmark
arr = np.random.rand(1000000)
start = time.time()
result1 = sum_python_loop(arr)
time_loop = time.time() - start
start = time.time()
result2 = sum_vectorized(arr)
time_vec = time.time() - start
print(f"Loop time: {time_loop:.4f}s")
print(f"Vectorized time: {time_vec:.4f}s")
print(f"Speedup: {time_loop/time_vec:.1f}x")
Memory-Efficient Operations
import numpy as np
# Bad: Creates intermediate arrays
def inefficient(arr):
temp1 = arr * 2
temp2 = temp1 + 5
temp3 = temp2 ** 2
return temp3
# Good: In-place operations
def efficient(arr):
result = arr.copy()
result *= 2
result += 5
result **= 2
return result
# Even better: Single expression (optimized by NumPy)
def most_efficient(arr):
return (arr * 2 + 5) ** 2
Using numexpr for Complex Expressions
import numpy as np
# For very large arrays and complex expressions,
# numexpr can be faster (requires installation)
# Without numexpr
a = np.random.rand(10000000)
b = np.random.rand(10000000)
result = 2*a + 3*b**2 - np.sqrt(a)
# With numexpr (if installed)
# import numexpr as ne
# result = ne.evaluate('2*a + 3*b**2 - sqrt(a)')
Common Pitfalls and Solutions
NaN Handling
import numpy as np
arr = np.array([1, 2, np.nan, 4, 5, np.nan])
# Problem: Regular functions return NaN
mean = np.mean(arr) # Returns nan
# Solution: Use nan-safe functions
mean = np.nanmean(arr) # Returns 3.0
std = np.nanstd(arr)
sum_val = np.nansum(arr)
# Check for NaN
has_nan = np.isnan(arr).any()
where_nan = np.where(np.isnan(arr))[0]
# Remove NaN
arr_clean = arr[~np.isnan(arr)]
print(f"Mean (nan-safe): {mean}")
print(f"NaN positions: {where_nan}")
Integer Division Pitfall
import numpy as np
# Problem: Integer division with integers
a = np.array([1, 2, 3])
b = np.array([2, 2, 2])
result = a / b # With Python 3, this is fine
# But be careful with older code or explicit int types
a_int = np.array([1, 2, 3], dtype=np.int32)
b_int = np.array([2, 2, 2], dtype=np.int32)
# In NumPy, / always gives float result
result_float = a_int / b_int # [0.5, 1, 1.5]
# Use // for integer division
result_int = a_int // b_int # [0, 1, 1]
print(f"Float division: {result_float}")
print(f"Integer division: {result_int}")
Array Equality
import numpy as np
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])
# Problem: Can't use == directly for array comparison
# if a == b: # ValueError!
# Solution 1: Element-wise comparison
equal_elements = a == b # Boolean array
# Solution 2: Check if all elements equal
all_equal = np.all(a == b)
# Solution 3: array_equal
array_equal = np.array_equal(a, b)
# Solution 4: For floating point, use allclose
c = a + 1e-10
close_enough = np.allclose(a, c, rtol=1e-5, atol=1e-8)
print(f"All equal: {all_equal}")
print(f"Arrays equal: {array_equal}")
print(f"Close enough: {close_enough}")
Memory Leaks with Views
import numpy as np
# Problem: Large array kept in memory
large_array = np.random.rand(1000000, 100)
small_view = large_array[0:10] # Just 10 rows
# large_array is kept in memory because small_view references it!
del large_array # Doesn't free memory!
# Solution: Make a copy
large_array = np.random.rand(1000000, 100)
small_copy = large_array[0:10].copy()
del large_array # Now memory is freed
# Check if it's a view
print(f"Is view? {small_view.base is not None}")
print(f"Is copy? {small_copy.base is None}")
This comprehensive NumPy guide covers 50+ examples across all major array operations and numerical computing workflows!
Weekly Installs
10
Repository
tondevrel/scien…t-skillsGitHub Stars
6
First Seen
Feb 8, 2026
Security Audits
Installed on
codex9
opencode8
github-copilot8
kimi-cli8
gemini-cli8
amp8