Visualising Activation Functions in Neural Networks
In neural networks, activation functions determine the output of a node from a given set of inputs; non-linear activation functions are what allow the network to model complex non-linear behaviours. As most neural networks are optimised using some form of gradient descent, activation functions need to be differentiable (or at least differentiable almost everywhere; see ReLU). Furthermore, activation functions whose gradients saturate or grow very large can contribute to vanishing and exploding gradients during training. As such, neural networks tend to employ a select few activation functions (identity, sigmoid, ReLU and their variants).
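As a concrete sketch of these requirements (assuming NumPy; the function names here are illustrative and not part of this page's code), here are two common activation functions and their derivatives. Note that ReLU is not differentiable at x = 0, where implementations conventionally pick a value of 0 or 1.

```python
import numpy as np

def sigmoid(x):
    # Smooth and differentiable everywhere, but saturates for large |x|,
    # which is one source of vanishing gradients in deep networks.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # Piecewise linear; differentiable everywhere except at x = 0.
    return np.maximum(0.0, x)

def relu_derivative(x):
    # The derivative is undefined at x = 0; the convention used here is 0.
    return np.where(x > 0.0, 1.0, 0.0)
```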
Select an activation function from the menu below to plot it and its first derivative. Some properties relevant for neural networks are provided in the boxes on the right.
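If you want to reproduce such a plot offline, a minimal sketch using NumPy and Matplotlib (a stand-in for the interactive chart, shown here with the sigmoid) might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-6.0, 6.0, 500)
f = 1.0 / (1.0 + np.exp(-x))   # sigmoid
df = f * (1.0 - f)             # its first derivative

fig, ax = plt.subplots()
ax.plot(x, f, label="sigmoid")
ax.plot(x, df, label="first derivative")
ax.set_xlabel("x")
ax.legend()
plt.show()
```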
Note: we recommend viewing this page in Chrome for the best experience; on Firefox and IE, the equations in the boxes may not render.
More theoretical than practical, this activation function (the binary step) mimics the all-or-nothing firing of biological neurons. It's not useful for neural networks, as its derivative is zero everywhere except at 0, where it is undefined; this means that gradient-based approaches to optimisation are not feasible.
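Written out (using the common convention that the step takes the value 1 at 0 and above), the function and its derivative are:

```latex
f(x) =
\begin{cases}
  0 & x < 0 \\
  1 & x \ge 0
\end{cases}
\qquad
f'(x) =
\begin{cases}
  0 & x \ne 0 \\
  \text{undefined} & x = 0
\end{cases}
```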
If you spot any errors or want your fancy activation function included, then please get in touch! Thanks for reading!!!