A Large-Scale Study of Activation Functions in Modern Deep Neural Network Architectures for Efficient Convergence

Authors

  • Andrinandrasana David Rasamoelina Dept. of Cybernetics and Artificial Intelligence, FEI TU of Kosice, Slovak Republic https://orcid.org/0000-0002-4318-9507
  • Ivan Cík Dept. of Cybernetics and Artificial Intelligence, FEI TU of Kosice, Slovak Republic
  • Peter Sincak Faculty of Mechanical Engineering and Informatics, University of Miskolc, Hungary
  • Marián Mach Dept. of Cybernetics and Artificial Intelligence, FEI TU of Kosice, Slovak Republic
  • Lukáš Hruška Dept. of Cybernetics and Artificial Intelligence, FEI TU of Kosice, Slovak Republic

DOI:

https://doi.org/10.4114/intartif.vol25iss70pp95-109

Keywords:

Activation Function, Computer Vision, Deep Learning

Abstract

Activation functions play an important role in the convergence of learning algorithms based on neural networks. They
provide neural networks with nonlinearity and the ability to fit complex data. However, no in-depth study exists in the
literature on the behavior of activation functions in modern architectures. Therefore, in this research, we compare the 18 most used activation functions on multiple datasets (CIFAR-10, CIFAR-100, CALTECH-256) using 4 different models (EfficientNet,
ResNet, a variation of ResNet using the bag of tricks, and MobileNet V3). Furthermore, we explore the shape of the loss
landscape of those different architectures with various activation functions. Lastly, based on the results of our experiments,
we introduce a new locally quadratic activation function, Hytana, alongside one variation, Parametric Hytana, which
outperform common activation functions and address the dying ReLU problem.
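This page does not give the formula for Hytana itself, but the dying ReLU problem the abstract refers to is well known: once a neuron's pre-activations become negative for all inputs, ReLU outputs zero and passes zero gradient, so the neuron can never recover. A minimal NumPy sketch (generic illustration, not the paper's method):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: 0 for x < 0, 1 for x > 0.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU keeps a small negative slope, so gradients still flow.
    return np.where(x > 0, 1.0, alpha)

# A neuron whose pre-activations are all negative ("dead" neuron):
pre_activations = np.array([-2.0, -0.5, -1.3])
print(relu(pre_activations))       # all outputs are 0
print(relu_grad(pre_activations))  # all gradients are 0: no weight update
print(leaky_relu_grad(pre_activations))  # small nonzero gradients
```

Activation functions with nonzero gradient on the negative side (Leaky ReLU here, or a locally quadratic function such as the paper's Hytana) avoid this trap because the weights of such a neuron can still be updated.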

Published

2022-12-08

How to Cite

Rasamoelina, A. D., Cík, I., Sincak, P., Mach, M., & Hruška, L. (2022). A Large-Scale Study of Activation Functions in Modern Deep Neural Network Architectures for Efficient Convergence. Inteligencia Artificial, 25(70), 95–109. https://doi.org/10.4114/intartif.vol25iss70pp95-109