
But what is a neural network? | Deep learning chapter 1

Below is a short summary and detailed review of this video written by FutureFactual:

Understanding Neural Networks: Structure for Handwritten Digit Recognition | 3Blue1Brown

Overview

In this video, 3Blue1Brown introduces the structure of a simple neural network that can learn to recognize handwritten digits, mapping 784 input activations to 10 digit outputs through two hidden layers of 16 neurons each. The discussion focuses on the network's structure rather than on training details, aiming to make the underlying math feel motivated and intuitive.

  • 784 input neurons correspond to 28 by 28 pixel grayscale values
  • Two hidden layers with 16 neurons each act as intermediate feature detectors
  • The last layer has 10 output neurons representing digits 0–9
  • Activations are squashed to 0–1 values by a sigmoid function
  • Learning is described as adjusting a large set of weights and biases, not hand tuning

Introduction and Relevance

3Blue1Brown presents a beginner-friendly visualization of a neural network designed to recognize handwritten digits. The video emphasizes the structure of a vanilla feedforward network rather than the learning process, and uses the familiar 28 by 28 pixel input image as a concrete example. By the end of the segment, the viewer should grasp how a network can transform raw pixels into digit predictions through successive layers of computation.

Network Architecture

The input layer contains 784 neurons, one for each pixel in a 28 by 28 grayscale image. The network includes two hidden layers, each consisting of 16 neurons. The final output layer has 10 neurons, corresponding to the digits 0 through 9. The activations in each layer are values between 0 and 1, interpreted as the degree to which a unit is active, that is, how strongly it has detected a particular pattern in the input.

In the specific example shown, the network has already been trained so that a particular input image produces a distinctive cascade of activations through the layers, culminating in the maximum activation in one of the 10 output neurons, which is taken as the network’s digit guess.
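
To make the scale concrete, a quick back-of-the-envelope count (not quoted from the video, but implied by the layer sizes above) shows how many adjustable weights and biases a 784-16-16-10 network carries:

    # Layer sizes as described: 784 inputs, two hidden layers of 16 neurons, 10 outputs.
    layer_sizes = [784, 16, 16, 10]

    # One weight per connection between consecutive layers, one bias per non-input neuron.
    n_weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    n_biases = sum(layer_sizes[1:])

    print(n_weights, n_biases, n_weights + n_biases)  # 12960 42 13002

Roughly 13,000 adjustable numbers in total, which is why "learning" has to mean an automatic procedure for setting them rather than hand tuning.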

Weights, Biases and the Matrix View

Connections between layers carry weights, numbers that scale how much each input activation contributes to the next layer's activations. A bias is added before applying the activation function. For a given layer, the weighted sum of the previous layer's activations plus the bias is then squashed by a nonlinear function to produce the next layer's activations. The video uses a compact matrix-vector notation: the activations of a layer form a column vector, the weights form a matrix, and the biases form another vector. Multiplying by the weight matrix, adding the bias vector, and applying the nonlinearity yields the next layer's activations, which makes the math concise and well suited to software libraries that optimize matrix operations.
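
As a minimal sketch of that matrix view for a single layer (in NumPy, with random placeholder weights and biases rather than trained values), the computation might look like this:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def next_layer(a_prev, W, b):
        # Weighted sum of the previous layer's activations plus the biases,
        # squashed into the 0-1 range by the nonlinearity.
        return sigmoid(W @ a_prev + b)

    # First hidden layer: 16 neurons, each connected to all 784 pixel activations.
    W1 = np.random.randn(16, 784)   # placeholder weights
    b1 = np.random.randn(16)        # placeholder biases
    a0 = np.random.rand(784)        # stand-in for the 784 pixel values
    a1 = next_layer(a0, W1, b1)     # 16 activations, each between 0 and 1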

Activation Functions and Biases

The sigmoid function is introduced as a classic way to squash real numbers into the 0 to 1 range. A bias term shifts the input to the activation function to control when a neuron starts to fire. The presentation also notes that while sigmoid was historically common, modern networks frequently use the ReLU activation due to ease of training and performance in deep architectures.
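
For reference, both activation functions mentioned here fit in a couple of lines (a plain NumPy sketch, not code from the video):

    import numpy as np

    def sigmoid(z):
        # Squashes any real number into the interval (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        # Rectified linear unit: zero for negative inputs, identity for positive ones.
        return np.maximum(0.0, z)

In either case the bias simply shifts the input: with a bias of -10, for example, the weighted sum has to exceed 10 before sigmoid(weighted_sum + bias) rises above 0.5, which is the "starts to fire" threshold described above.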

Why a Layered Structure Works

The speaker motivates the layered structure by connecting it to how humans parse digits into components such as loops and edges. The idea is that hidden units might learn to detect simple subcomponents (edges, strokes) that combine to form more complex patterns (loops for 9 or 8). Although this is a heuristic, the layered design provides a natural path from raw pixels to abstract concepts that correspond to digits, enabling the system to generalize across similar patterns in different images.

From Picture to Code

Beyond the intuitive narrative, the video emphasizes that the network is ultimately a function: it takes 784 inputs and returns 10 outputs. The compact matrix notation makes that function precise and lends itself to efficient implementation with linear algebra libraries that are highly optimized for matrix multiplication.
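
Read this way, the whole forward pass collapses into a few lines. The sketch below is hypothetical, using random untrained parameters purely to show the shapes involved, and chains three matrix multiplications and activations to map 784 pixel values to 10 output activations:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def network(pixels, params):
        # params is a list of (W, b) pairs, one per layer after the input.
        a = pixels
        for W, b in params:
            a = sigmoid(W @ a + b)
        return a  # 10 activations; the index of the largest is the digit guess

    sizes = [784, 16, 16, 10]
    params = [(np.random.randn(n_out, n_in), np.random.randn(n_out))
              for n_in, n_out in zip(sizes, sizes[1:])]

    output = network(np.random.rand(784), params)
    guess = int(np.argmax(output))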

Looking Ahead: Learning and Beyond

The video teases the next installment, which will cover how the network learns the appropriate weights and biases from data. It also hints at comparisons with newer activation functions like ReLU and discusses the broader implications and limitations of neural networks in practical tasks. The presenter acknowledges the broader ecosystem of resources and tools for experimenting with these ideas on a computer.

To find out more about the video and 3Blue1Brown, go to: But what is a neural network? | Deep learning chapter 1.