Written by Paul
Activation functions in artificial neural networks determine a neuron's output by transforming its input signal, typically the weighted sum of its inputs plus a bias.
Activation functions non-linearly transform input signals as they pass through neurons. This enables neural networks to learn complex patterns and tackle non-linear problems.
Role of Activation Functions
- Introducing Non-linearity: Activation functions allow neural networks to learn patterns beyond simple linear combinations of their inputs. Without them, stacking layers gains nothing: a composition of linear (affine) layers is itself a single linear (affine) map, so a deep network would be no more expressive than a one-layer network.
- Constraining Output Range: Some activation functions keep a neuron's output within a fixed range. For example, the sigmoid function maps outputs into (0, 1) and the hyperbolic tangent into (-1, 1), which prevents activations from growing without bound and can help keep training stable.
- Forming Decision Boundaries: In classification tasks, non-linear activation functions are what allow a network to form curved or piecewise decision boundaries; without them, the network can only separate classes with linear (hyperplane) boundaries.
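The claim under Introducing Non-linearity can be checked numerically. Below is a minimal NumPy sketch with arbitrary layer sizes and random weights, chosen only for illustration: two stacked layers with no activation between them compute exactly the same function as a single layer with weights \(W = W_2 W_1\) and bias \(b = W_2 b_1 + b_2\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers with no activation function between them (sizes are arbitrary)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

# The same mapping collapsed into one affine layer
W = W2 @ W1
b = W2 @ b1 + b2

def one_linear_layer(x):
    return W @ x + b

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), one_linear_layer(x)))  # True
```

Inserting a non-linear function such as ReLU between the two layers breaks this equivalence in general, which is what lets additional depth add expressive power.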
Types of Activation Functions
Unit Step Function
- Outputs 1 if the input is greater than or equal to 0; otherwise, it outputs 0.
- It is used in simple models like perceptrons.
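- Although non-linear, it is not differentiable at 0 and its derivative is 0 everywhere else, so gradient-based training (backpropagation) receives no useful signal. For this reason it is rarely used in modern deep learning.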
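As a sketch of how the unit step is used in a perceptron, the NumPy code below applies it to a weighted sum. The weights \(w = (1, 1)\) and bias \(b = -1.5\) are hypothetical values picked by hand so that the neuron computes logical AND; a real perceptron would learn them with the perceptron learning rule rather than by gradient descent.

```python
import numpy as np

def unit_step(x):
    """1.0 where x >= 0, else 0.0 (elementwise)."""
    return np.where(np.asarray(x) >= 0, 1.0, 0.0)

# Hypothetical hand-picked weights: the weighted sum x1 + x2 - 1.5 is
# non-negative only when both inputs are 1, so the neuron computes AND.
w = np.array([1.0, 1.0])
b = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = unit_step(inputs @ w + b)
print(outputs.tolist())  # [0.0, 0.0, 0.0, 1.0]
```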
Sigmoid Function
- Also called the logistic function, it is an S-shaped ("sigmoid") curve that squashes any real input into the open interval (0, 1).
- Formula: \(\sigma(x) = \frac{1}{1 + e^{-x}}\)
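- It introduces non-linearity, and because its output can be interpreted as a probability, it is commonly used in the output layer for binary classification.
- However, it is prone to the vanishing gradient problem: its derivative, \(\sigma(x)(1 - \sigma(x))\), is at most 0.25 and approaches 0 for large \(|x|\), so gradients shrink as they are propagated back through many layers, which hampers learning in deep networks.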
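The sketch below implements the sigmoid and its derivative, \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\), in NumPy; the sample inputs are arbitrary. It applies the formula directly, so for very large negative inputs `np.exp` may emit an overflow warning, although the result still tends to the correct limit of 0.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^{-x}), elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(xs))       # values strictly between 0 and 1
print(sigmoid_grad(xs))  # largest (0.25) at x = 0, close to 0 at x = +/-10

# Backpropagating through n sigmoid layers multiplies n such derivatives,
# so the product of the activation factors alone is at most 0.25 ** n
# (the weight matrices also scale the gradient; this ignores them).
print(0.25 ** 10)  # 9.5367431640625e-07
```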
Hyperbolic Tangent (Tanh) Function
- Restricts output between -1 and 1 in an S-shaped curve.
- Formula: \(\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)
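- Similar to the sigmoid function (in fact, \(\tanh(x) = 2\sigma(2x) - 1\)) but zero-centered. Its derivative, \(1 - \tanh^2(x)\), peaks at 1 rather than 0.25, which eases the vanishing gradient problem to some extent, though tanh still saturates for large \(|x|\).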
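A short NumPy check of the tanh facts above, on arbitrary sample points: it compares `np.tanh` with the exponential formula, confirms that tanh is a shifted and rescaled sigmoid, \(\tanh(x) = 2\sigma(2x) - 1\), and evaluates the derivative \(1 - \tanh^2(x)\) at 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2

xs = np.linspace(-3.0, 3.0, 7)

# Agrees with the exponential formula
tanh_formula = (np.exp(xs) - np.exp(-xs)) / (np.exp(xs) + np.exp(-xs))
print(np.allclose(np.tanh(xs), tanh_formula))  # True

# tanh is a shifted, rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(np.tanh(xs), 2.0 * sigmoid(2.0 * xs) - 1.0))  # True

# Its derivative peaks at 1 for x = 0 (versus 0.25 for the sigmoid)
print(tanh_grad(0.0))  # 1.0
```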
ReLU (Rectified Linear Unit) Function
- Outputs the input directly if it is positive; otherwise, it outputs 0.
- Formula: \(f(x) = \max(0, x)\)
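- Cheap to compute, and its gradient is exactly 1 for every positive input, so it does not saturate there. This helps it train well in deep networks, and it is one of the most widely used activation functions today, especially in hidden layers.
- However, it can suffer from the dead ReLU problem: if a neuron's input becomes negative for every training example, its output and gradient are both 0, so its weights stop updating and the neuron stops learning.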
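A minimal NumPy sketch of ReLU and its gradient. The value of the gradient at exactly \(x = 0\) is a convention (0 is used here). The last lines illustrate the dead ReLU problem with a hypothetical weight, bias, and input batch chosen so that the pre-activation is negative for every input.

```python
import numpy as np

def relu(x):
    """max(0, x), elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """1 where x > 0, else 0 (the value at exactly x = 0 is a convention)."""
    return (np.asarray(x) > 0).astype(float)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(xs).tolist())       # [0.0, 0.0, 0.0, 0.5, 2.0]
print(relu_grad(xs).tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0]

# A "dead" neuron: with this hypothetical weight and large negative bias,
# the pre-activation is negative for every input in the batch, so the
# gradient reaching w and b is zero and gradient descent never updates them.
batch = np.array([0.1, 0.7, 1.5])
w, b = 1.0, -5.0
print(relu_grad(w * batch + b).tolist())  # [0.0, 0.0, 0.0]
```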
Leaky ReLU Function
- A variation of ReLU that introduces a small gradient for negative inputs to mitigate the dead ReLU problem.
- Formula: \(f(x) = \max(\alpha x, x)\), where \(\alpha\) is a small constant (e.g., 0.01).
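A minimal NumPy sketch of Leaky ReLU, using the example value \(\alpha = 0.01\) as the default and implementing the formula \(\max(\alpha x, x)\) directly, which is valid for \(0 < \alpha < 1\). The key difference from plain ReLU is the gradient: for negative inputs it is \(\alpha\) rather than 0, so a neuron stuck in the negative region still receives a small update.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """max(alpha * x, x), elementwise; assumes 0 < alpha < 1."""
    x = np.asarray(x, dtype=float)
    return np.maximum(alpha * x, x)

def leaky_relu_grad(x, alpha=0.01):
    """1 where x > 0, alpha elsewhere (the value at x = 0 is a convention)."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, alpha)

xs = np.array([-1.0, 0.0, 2.0])
print(leaky_relu(xs).tolist())       # [-0.01, 0.0, 2.0]
print(leaky_relu_grad(xs).tolist())  # [0.01, 0.01, 1.0]
```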
Softmax Function
- Primarily used in the output layer for multi-class classification, where each input must be assigned to exactly one of several classes.
- Converts a vector of raw output scores (logits) into probabilities that are all positive and sum to 1.
- Formula: \(\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}\)
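The formula above can be sketched in NumPy as follows. The implementation subtracts the largest score before exponentiating; this is a standard numerical-stability trick rather than part of the formula, and it leaves the result unchanged because the common factor cancels in the ratio. The example logits are arbitrary.

```python
import numpy as np

def softmax(x):
    """Softmax over the last axis.

    Subtracting the maximum before exponentiating prevents overflow and does
    not change the result: the common factor e^{-max} cancels in the ratio.
    """
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)                         # approximately [0.659 0.242 0.099]
print(np.isclose(probs.sum(), 1.0))  # True

# The naive formula would overflow here (e^1000 is inf in float64)
print(softmax(np.array([1000.0, 1000.0])))  # [0.5 0.5]
```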
Criteria for Choosing Activation Functions
- Problem Type: In the output layer, Sigmoid is the usual choice for binary classification, while Softmax is typical for multi-class classification.
- Network Depth: For deep networks, ReLU or its variants are often preferred in hidden layers because they mitigate the vanishing gradient problem.
- Output Range: Choose an activation function that matches the desired output range (e.g., Sigmoid for a 0-1 range).
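To illustrate the Problem Type criterion, the sketch below uses a hypothetical raw score `z` and computes a binary prediction with a single sigmoid output and a two-class prediction with softmax. With two classes and the second score fixed at 0, softmax reduces exactly to the sigmoid, since \(\frac{e^{z}}{e^{z} + e^{0}} = \frac{1}{1 + e^{-z}}\), so the two choices agree in the binary case.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    shifted = x - np.max(x, axis=-1, keepdims=True)  # for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

z = 0.8  # hypothetical raw score from the network's last layer

# Binary classification: one output unit; sigmoid gives P(class 1)
p_binary = sigmoid(z)

# Multi-class view of the same problem: one unit per class, softmax over them
p_multi = softmax(np.array([z, 0.0]))

print(np.isclose(p_multi[0], p_binary))  # True
print(np.isclose(p_multi.sum(), 1.0))    # True
```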
No single activation function is best for every task: each trades off computational cost, output range, and gradient behavior, so the right choice depends on the layer's role and the depth of the network.