NumPy — Arrays, Broadcasting & Linear Algebra
Why NumPy Exists: The Python Performance Problem
Python lists are flexible — they can hold any object, any size, any type. But that flexibility has a cost. A Python list is an array of pointers, each pointing to a separate heap-allocated Python object. When you sum a list of integers, Python must:
- Follow a pointer to each integer object
- Unbox the C integer from the Python wrapper
- Add it
- Box the result into a new Python wrapper
- Manage garbage collection
NumPy sidesteps all of this. A NumPy array stores values as a contiguous block of raw C memory — 64-bit floats packed one after another, no pointers, no Python object overhead. Operations on this memory:
- Are implemented in C (and Fortran for linear algebra)
- Use SIMD (Single Instruction, Multiple Data) CPU instructions to process multiple elements simultaneously
- Avoid the Python GIL for computation (only the coordination layer is Python)
The result: NumPy operations are typically 50–500x faster than equivalent Python loops.
Creating Arrays
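A quick sketch of the most common constructors — the dtype and shape choices here are illustrative:

```python
import numpy as np

# From a Python list — dtype is inferred from the elements
a = np.array([1, 2, 3])

# Pre-sized arrays filled with a constant
zeros = np.zeros((2, 3))            # 2x3 array of 0.0
ones = np.ones(4, dtype=np.int32)   # four 1s as 32-bit ints

# Ranges and evenly spaced grids
r = np.arange(0, 10, 2)             # [0 2 4 6 8] — stop is exclusive
lin = np.linspace(0.0, 1.0, 5)      # 5 points, both endpoints included

print(a.dtype, zeros.shape, r, lin)
```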
Indexing, Slicing, and Fancy Indexing
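A minimal sketch of the three indexing styles and their view/copy behavior:

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a *view* — no data is copied
view = arr[2:5]
view[0] = 99            # also changes arr[2], because memory is shared

# Fancy indexing with an integer array returns a *copy*
picked = arr[[0, 3, 7]]

# Boolean indexing selects by condition (also a copy)
evens = arr[arr % 2 == 0]

print(arr[2], picked, evens)
```

Use `.copy()` on a slice when you need a slice that is independent of the original.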
Universal Functions and Axis Operations
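A short sketch of element-wise ufuncs and axis reductions; the key point is which axis "collapses":

```python
import numpy as np

m = np.array([[1.,  2.,  3.,  4.],
              [5.,  6.,  7.,  8.],
              [9., 10., 11., 12.]])   # shape (3, 4)

# ufuncs apply element-wise, in compiled C
roots = np.sqrt(m)                    # same shape as m

# axis=0 collapses the rows -> one result per column, shape (4,)
col_sums = m.sum(axis=0)

# axis=1 collapses the columns -> one result per row, shape (3,)
row_means = m.mean(axis=1)

print(col_sums, row_means)
```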
Broadcasting: The Most Powerful (and Confusing) Feature
Broadcasting is the mechanism that lets NumPy operate on arrays of different shapes. Understanding it is essential for writing concise, loop-free NumPy code.
The Broadcasting Rules:
- If the arrays have different numbers of dimensions, pad the smaller shape on the left with 1s
- Dimensions of size 1 are stretched to match the other array's size in that dimension
- If shapes still don't match after stretching, raise an error
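The three rules above can be traced on concrete shapes — here (3, 4) against (4,) and (3, 1), plus one deliberately incompatible pair:

```python
import numpy as np

a = np.ones((3, 4))               # shape (3, 4)
b = np.arange(4)                  # shape (4,) -> padded to (1, 4) -> stretched to (3, 4)
c = np.arange(3).reshape(3, 1)    # shape (3, 1) -> stretched to (3, 4)

print((a + b).shape)              # (3, 4)
print((a + c).shape)              # (3, 4)

# Rule 3 in action: (3, 4) vs (3,) -> (1, 3); sizes 4 and 3 clash
try:
    a + np.arange(3)
except ValueError as e:
    print("broadcast error:", e)
```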
Reshaping and Stacking
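A brief sketch of reshaping (which returns a view when the memory layout allows) and the two main stacking directions:

```python
import numpy as np

a = np.arange(6)

# reshape: -1 tells NumPy to infer the remaining dimension
m = a.reshape(2, 3)          # shape (2, 3)
n = a.reshape(-1, 2)         # shape (3, 2), first dim inferred

# Stacking: vstack grows rows, hstack grows columns
v = np.vstack([m, m])        # shape (4, 3)
h = np.hstack([m, m])        # shape (2, 6)

print(m.shape, n.shape, v.shape, h.shape)
```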
Linear Algebra
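A minimal sketch of the `np.linalg` staples, using a small 2x2 system as the example:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])

# Solve Ax = b directly — preferred over inv(A) @ b for speed and stability
x = np.linalg.solve(A, b)
print(x)                          # [2. 3.]
print(np.allclose(A @ x, b))      # True

# Other common operations
print(np.linalg.det(A))           # determinant (≈ 5.0)
eigvals, eigvecs = np.linalg.eig(A)
```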
Performance: Vectorized vs Loop Benchmark
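A sketch of one way to measure the gap — summing a million squares with a Python loop versus a single `np.dot` call. The exact speedup depends on the machine; the 50–500x range above is typical, not guaranteed:

```python
import time
import numpy as np

n = 1_000_000
data = np.random.rand(n)
as_list = data.tolist()

# Pure-Python loop: every iteration boxes and unboxes a float object
t0 = time.perf_counter()
total_loop = 0.0
for v in as_list:
    total_loop += v * v
t_loop = time.perf_counter() - t0

# One vectorized call executed entirely in C
t0 = time.perf_counter()
total_vec = float(np.dot(data, data))
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s  "
      f"speedup: {t_loop / t_vec:.0f}x")
```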
PROJECT: Neural Network Forward Pass from Scratch
A neural network is just a sequence of matrix multiplications and non-linear functions. NumPy is all you need to implement one:
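A minimal sketch of such a forward pass. The layer sizes (4 → 8 → 3), ReLU hidden activation, and softmax output are illustrative assumptions, not a prescribed architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Non-linearity: zero out negative values element-wise
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Weights and biases for a 2-layer network: 4 inputs -> 8 hidden -> 3 outputs
W1 = rng.normal(0.0, 0.1, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.1, (8, 3)); b2 = np.zeros(3)

def forward(X):
    h = relu(X @ W1 + b1)        # hidden layer: matmul + non-linearity
    return softmax(h @ W2 + b2)  # output layer: matmul + softmax

X = rng.normal(size=(5, 4))      # a batch of 5 samples
probs = forward(X)
print(probs.shape)               # (5, 3)
print(probs.sum(axis=1))         # each row is a probability distribution
```

Note that broadcasting does the bias addition: `X @ W1` has shape (5, 8) and `b1` has shape (8,), so `b1` is stretched across the batch dimension automatically.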
Key Takeaways
- NumPy's speed comes from memory layout: contiguous C arrays plus SIMD CPU instructions, not Python-level tricks — this is why copying to a list and back is expensive
- Slicing returns views, not copies: `arr[0:5]` shares memory with `arr` — modifying it modifies the original; use `.copy()` when you need independence
- Boolean indexing is the most practical pattern: `arr[arr > 0]` is cleaner and faster than any loop filter, and `arr[mask] = value` is the idiomatic way to conditionally set values
- Broadcasting follows three strict rules: pad left with 1s, stretch size-1 dimensions, error on incompatible sizes — once internalized, it replaces most explicit loops
- `axis=0` reduces rows, `axis=1` reduces columns: a (3, 4) array summed on `axis=0` gives shape (4,); summed on `axis=1` gives shape (3,) — think "collapse along this axis"
- `@` is matrix multiply, `*` is element-wise: confusing them is the most common NumPy bug — always check shapes before and after
- `np.linalg.solve(A, b)` beats `inv(A) @ b`: computing the inverse is slower and numerically less stable than solving directly; use `solve` for systems of equations
- Vectorization is a mindset shift: instead of asking "how do I loop over elements?", ask "what array operation produces the result?" — the answer is almost always faster