Activation Functions

Written by Paul
Activation functions in artificial neural networks determine a neuron's output by applying a fixed function to the signal the neuron receives, typically the weighted sum of its inputs plus a bias.
Because most activation functions are non-linear, they let neural networks learn complex patterns and tackle non-linear problems.
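
A minimal sketch of a single neuron follows. It assumes the common formulation in which the activation is applied to a weighted sum of the inputs plus a bias. The names `neuron`, `weights`, and `bias` are illustrative only and are not taken from any library.

```python
import math
from typing import Callable, Sequence

def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    """Return activation(sum_i w_i * x_i + b) for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Pre-activation: weighted sum of the input signals plus a bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function turns the pre-activation into the neuron's output.
    return activation(z)

# Same inputs and weights, two different activations.
linear_out = neuron([1.0, 2.0], [0.5, -0.25], 0.1, lambda z: z)  # 0.1
tanh_out = neuron([1.0, 2.0], [0.5, -0.25], 0.1, math.tanh)      # tanh(0.1)
```

Swapping the `activation` argument is all it takes to change the neuron's behavior. The weighted sum stays the same.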

Role of Activation Functions

  • Introducing Non-linearity: Activation functions let neural networks learn patterns beyond simple linear combinations of their inputs. Without them, stacking multiple layers still yields a linear (affine) function, because a composition of linear maps is itself linear. A deep network would then be no more expressive than a single layer.
  • Constraining Output Range: Some activation functions keep a neuron's output within a fixed range. For example, the sigmoid function maps its output into (0, 1) and the hyperbolic tangent maps it into (-1, 1). Bounded activations can help keep the network numerically stable.
  • Forming Decision Boundaries: In classification tasks, the non-linearity of activation functions lets a network form curved or piecewise decision boundaries instead of a single straight line or hyperplane.
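
The first point can be checked with a toy example. The sketch below uses hypothetical scalar "layers" of the form x → w·x + b with arbitrarily chosen weights. It shows that two such layers collapse into one affine map, and that inserting a ReLU between them breaks the collapse.

```python
from typing import Callable

# Hypothetical scalar "layers": each one is an affine map x -> w * x + b.
def affine(w: float, b: float) -> Callable[[float], float]:
    return lambda x: w * x + b

def relu(x: float) -> float:
    return max(0.0, x)

layer1 = affine(2.0, 1.0)   # 2x + 1
layer2 = affine(3.0, -4.0)  # 3x - 4

# Without an activation: 3(2x + 1) - 4 = 6x - 1, a single affine map.
collapsed = affine(6.0, -1.0)

def stacked_linear(x: float) -> float:
    return layer2(layer1(x))

# With ReLU between the layers, the result is piecewise linear:
# flat at -4 for x <= -0.5, and 6x - 1 above that.
def stacked_relu(x: float) -> float:
    return layer2(relu(layer1(x)))

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert stacked_linear(x) == collapsed(x)

print(stacked_linear(-2.0), stacked_relu(-2.0))  # -13.0 -4.0
```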

Types of Activation Functions

Unit Step Function

  • Outputs 1 if the input is greater than or equal to 0; otherwise, it outputs 0.
  • It is used in simple models like perceptrons.
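  • Although non-linear, it is not differentiable at 0, and its derivative is 0 everywhere else. Gradient-based training therefore receives no useful signal from it, so it is rarely used in modern deep learning.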
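
A small sketch of the unit step and its derivative, using plain Python scalars. The function names are illustrative. The point is that wherever the derivative exists it is 0, which is why backpropagation cannot train through this function.

```python
def unit_step(x: float) -> int:
    """Return 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def unit_step_derivative(x: float) -> float:
    """Derivative of the unit step: 0 for every x != 0, undefined at x = 0."""
    if x == 0:
        raise ValueError("the unit step is not differentiable at 0")
    return 0.0

print([unit_step(x) for x in [-0.3, 0.0, 2.5]])  # [0, 1, 1]
```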

Sigmoid Function

  • An S-shaped ("sigmoid") function that maps any real input into the range (0, 1).
  • Formula: \(\sigma(x) = \frac{1}{1 + e^{-x}}\)
  • It introduces non-linearity and is commonly used in the output layer for binary classification, where its output can be read as a probability.
  • However, its derivative is at most 0.25 and approaches 0 for large |x|. As a result, it can suffer from the vanishing gradient problem, which hampers learning in deep networks.
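
A minimal standard-library sketch of the sigmoid and its derivative σ(x)(1 − σ(x)). The branch on the sign of x is an assumption of this sketch, not part of the formula above. It selects the algebraically equivalent form e^x / (1 + e^x) for negative inputs so that `math.exp` never overflows.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), arranged so math.exp cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_derivative(x: float) -> float:
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:>5}: sigmoid={sigmoid(x):.5f}, derivative={sigmoid_derivative(x):.5f}")

# Upper bound on the product of ten sigmoid derivatives (ignoring the weights):
print(0.25 ** 10)  # 9.5367431640625e-07
```

The derivative shrinks rapidly as |x| grows. Backpropagation multiplies one such factor per layer, which is the mechanism behind the vanishing gradient problem.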

Hyperbolic Tangent (Tanh) Function

  • Restricts output between -1 and 1 in an S-shaped curve.
  • Formula: \(\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)
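  • Similar to the sigmoid function but zero-centered (symmetric around the origin), with a maximum derivative of 1 instead of 0.25. This can reduce the vanishing gradient problem to some extent, although tanh still saturates for large |x|.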
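
The sketch below uses Python's built-in `math.tanh` instead of evaluating the exponential formula directly, because `math.exp` raises `OverflowError` for large arguments. It checks two properties: tanh is a rescaled sigmoid, tanh(x) = 2σ(2x) − 1, and it is an odd (zero-centered) function. The helper names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    """Plain logistic sigmoid; fine for the moderate inputs used below."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_derivative(x: float) -> float:
    """d/dx tanh(x) = 1 - tanh(x)^2, which peaks at 1 when x = 0."""
    t = math.tanh(x)
    return 1.0 - t * t

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
for x in [-3.0, -0.5, 0.0, 0.5, 3.0]:
    assert math.isclose(math.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0, abs_tol=1e-12)
    # Zero-centered: tanh is odd, so tanh(-x) = -tanh(x).
    assert math.isclose(math.tanh(-x), -math.tanh(x), abs_tol=1e-15)

print(tanh_derivative(0.0))  # 1.0
```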

ReLU (Rectified Linear Unit) Function

  • Outputs the input directly if it is positive; otherwise, it outputs 0.
  • Formula: \(f(x) = \max(0, x)\)
  • Simple to compute, and its gradient is 1 for every positive input, so it does not saturate there. It works well in deep networks, making it one of the most widely used activation functions today.
  • However, it can cause the dead ReLU problem: if a neuron's input is negative for every training example, its gradient is always zero and its weights stop updating.
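
A minimal sketch of ReLU and its gradient on plain Python floats. The derivative is undefined at exactly 0, and this sketch assumes the convention of using 0 there. The pre-activation values are hypothetical. They illustrate a "dead" neuron whose input is negative for every example, so every gradient through it is 0.

```python
def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Gradient of ReLU: 1 for x > 0, 0 for x < 0 (0 at x == 0 is an assumed convention)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical pre-activations of one neuron across three training examples.
pre_activations = [-2.0, -0.7, -3.1]
print([relu(z) for z in pre_activations])       # [0.0, 0.0, 0.0]
print([relu_grad(z) for z in pre_activations])  # [0.0, 0.0, 0.0]
```

Since every gradient is 0, no weight update reaches this neuron, and it stays inactive unless other parameters change its inputs.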

Leaky ReLU Function

  • A variation of ReLU that introduces a small gradient for negative inputs to mitigate the dead ReLU problem.
  • Formula: \(f(x) = \max(\alpha x, x)\), where \(\alpha\) is a small constant (e.g., 0.01).
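
A short sketch of Leaky ReLU and its gradient, assuming 0 < α < 1. Under that assumption, max(αx, x) equals x for positive inputs and αx for negative ones. The default `alpha=0.01` mirrors the example constant above. Using α as the gradient at exactly 0 is a convention chosen for this sketch.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: max(alpha * x, x), assuming 0 < alpha < 1."""
    return max(alpha * x, x)

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Gradient: 1 for x > 0, alpha otherwise (alpha at x == 0 is an assumed convention)."""
    return 1.0 if x > 0 else alpha

print(leaky_relu(3.0))         # 3.0
print(leaky_relu(-2.0))        # -0.02
print(leaky_relu_grad(-2.0))   # 0.01
```

Unlike plain ReLU, the gradient for negative inputs is α rather than 0. A neuron stuck in the negative region can therefore still receive small updates.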

Softmax Function

  • Primarily used in the output layer for multi-class classification, where one of multiple classes needs to be selected.
  • Converts a vector of raw output scores (logits) into probabilities that are all positive and sum to 1.
  • Formula: \(\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}\)
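
A minimal standard-library sketch of the softmax formula above. It subtracts the largest score before exponentiating. This is a common numerical-stability step that leaves the result unchanged, because the factor e^{-max} cancels between numerator and denominator. The function name and sample scores are illustrative.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax(x_i) = e^{x_i} / sum_j e^{x_j}, computed with the max subtracted first."""
    if not logits:
        raise ValueError("softmax needs at least one value")
    m = max(logits)
    # Shifting by the max keeps math.exp from overflowing on large scores.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)                      # three positive values that sum to 1 (up to rounding)
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Without the shift, `math.exp(1000.0)` would raise `OverflowError` on the second example.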

Criteria for Choosing Activation Functions

  • Problem Type: Sigmoid is often used for binary classification, while Softmax is typical for multi-class classification.
  • Network Depth: For deep networks, ReLU or its variants are often preferred to mitigate the vanishing gradient problem.
  • Output Range: Choose an activation function that matches the desired output range (e.g., Sigmoid for a 0-1 range).

Each function involves trade-offs among computational cost, numerical stability, and learning efficiency, so the best choice depends on the specific needs of the task at hand.
ā† Go home