Activation Functions
Activation functions in artificial neural networks determine a neuron's output by transforming its incoming signal according to a specific rule.
Because this transformation is typically non-linear, activation functions enable neural networks to learn complex patterns and tackle non-linear problems.
Role of Activation Functions
- Introducing Non-linearity: Activation functions allow neural networks to learn complex patterns beyond simple linear combinations. Without non-linear activations, stacking multiple layers still collapses into a single linear mapping, which limits the network's problem-solving ability (see the sketch after this list).
- Constraining Output Range: Activation functions can keep a neuron's output within a specific range. For example, the sigmoid function limits output between 0 and 1, while the hyperbolic tangent function limits it between -1 and 1, helping to maintain network stability.
- Forming Decision Boundaries: Activation functions play a crucial role in forming decision boundaries for data classification tasks.
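To illustrate the first point, here is a minimal NumPy sketch (the variable names and layer sizes are purely illustrative) showing that two stacked linear layers without an activation reduce to a single linear layer, while inserting a non-linearity breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # small batch: 4 inputs, 3 features each
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

# Two linear layers with no activation in between...
two_linear = x @ W1 @ W2
# ...are equivalent to one linear layer with the combined weight matrix.
one_linear = x @ (W1 @ W2)
print(np.allclose(two_linear, one_linear))   # True: no extra expressive power

# Inserting a non-linearity (ReLU here) breaks this collapse.
with_relu = np.maximum(0, x @ W1) @ W2
print(np.allclose(with_relu, one_linear))    # False in general
```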
Types of Activation Functions
Unit Step Function
- Outputs 1 if the input is greater than or equal to 0; otherwise, it outputs 0.
- It is used in simple models like perceptrons.
- Although non-linear, it is not differentiable at zero and its gradient is zero everywhere else, so it is incompatible with gradient-based training and is rarely used in modern deep learning.
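A minimal NumPy sketch of the unit step function as described above:

```python
import numpy as np

def unit_step(x):
    """Return 1 where x >= 0, else 0 (applied element-wise)."""
    return np.where(x >= 0, 1, 0)

print(unit_step(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0 0 1 1]
```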
Sigmoid Function
- An S-shaped ("sigmoid") curve that restricts output to values between 0 and 1.
- Formula: \(\sigma(x) = \frac{1}{1 + e^{-x}}\)
- It introduces non-linearity and is commonly used for binary classification problems.
- However, it can suffer from the vanishing gradient problem, which hampers learning in deep networks.
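A small NumPy sketch of the sigmoid and its derivative; the derivative never exceeds 0.25, which hints at why gradients shrink as they propagate back through many sigmoid layers.

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative: sigma(x) * (1 - sigma(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))       # approx [0.0067, 0.5, 0.9933]
print(sigmoid_grad(x))  # approx [0.0066, 0.25, 0.0066]
```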
Hyperbolic Tangent (Tanh) Function
- Restricts output between -1 and 1 in an S-shaped curve.
- Formula: \(\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\)
- Similar to the sigmoid function but zero-centered, which can ease the vanishing gradient problem to some extent, although the function still saturates for large positive or negative inputs.
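A brief NumPy check that tanh is zero-centered and bounded in (-1, 1), and that it can be expressed in terms of the sigmoid:

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(np.tanh(x))                            # symmetric around 0, values in (-1, 1)
# tanh can also be written via the sigmoid: tanh(x) = 2*sigmoid(2x) - 1
print(2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0)  # matches np.tanh(x)
```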
ReLU (Rectified Linear Unit) Function
- Outputs the input directly if it is positive; otherwise, it outputs 0.
- Formula: \(f(x) = \max(0, x)\)
- Simple to compute and works well in deep networks, making it the most widely used activation function today.
- However, it can suffer from the dead ReLU problem: because the gradient is zero for negative inputs, a neuron whose inputs stay negative outputs 0 and stops learning.
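A minimal NumPy sketch of ReLU and its gradient; the zero gradient for non-positive inputs is what underlies the dead ReLU issue mentioned above.

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x), element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient is 1 for positive inputs and 0 otherwise."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(relu(x))       # [0.  0.  0.  0.1 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]  -- no gradient flows for x <= 0
```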
Leaky ReLU Function
- A variation of ReLU that introduces a small gradient for negative inputs to mitigate the dead ReLU problem.
- Formula: \(f(x) = \max(\alpha x, x)\), where \(\alpha\) is a small constant (e.g., 0.01).
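A corresponding sketch of Leaky ReLU with \(\alpha = 0.01\) (the example value above); negative inputs keep a small, non-zero slope instead of being zeroed out.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """f(x) = max(alpha * x, x): negative inputs are scaled by alpha rather than set to 0."""
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(leaky_relu(x))  # [-0.02  -0.001  0.     0.1    2.   ]
```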
Softmax Function
- Primarily used in the output layer for multi-class classification, where one of multiple classes needs to be selected.
- Converts output values to probabilities, ensuring they sum to 1 across the outputs.
- Formula: \(\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}\)
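A numerically stable NumPy sketch of softmax; subtracting the maximum before exponentiating is a common trick that does not change the result.

```python
import numpy as np

def softmax(x):
    """Convert a vector of scores into probabilities that sum to 1."""
    shifted = x - np.max(x)      # improves numerical stability; output is unchanged
    exps = np.exp(shifted)
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)         # approx [0.659, 0.242, 0.099]
print(probs.sum())   # 1.0
```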
Criteria for Choosing Activation Functions
- Problem Type: Sigmoid is often used for binary classification, while Softmax is typical for multi-class classification.
- Network Depth: For deep networks, ReLU or its variants are often preferred to avoid the vanishing gradient problem.
- Output Range: Choose an activation function that matches the desired output range (e.g., Sigmoid for a 0-1 range).
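A hypothetical sketch (layer sizes and names are illustrative, not from the text) that ties these criteria together: ReLU in the hidden layer, and either sigmoid or softmax at the output depending on the problem type.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))                # one example with 8 features
W_hidden = rng.normal(size=(8, 16))
W_binary = rng.normal(size=(16, 1))        # binary classification head
W_multi = rng.normal(size=(16, 3))         # 3-class classification head

hidden = np.maximum(0.0, x @ W_hidden)     # ReLU in the hidden layer

# Binary problem: sigmoid gives a probability in (0, 1).
p_binary = 1.0 / (1.0 + np.exp(-(hidden @ W_binary)))

# Multi-class problem: softmax gives probabilities that sum to 1.
logits = hidden @ W_multi
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()

print(p_binary.shape, float(probs.sum()))  # (1, 1) 1.0
```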
Each function plays a unique role, balancing complexity, stability, and learning efficiency based on the specific needs of the neural network task at hand.