Neural Network Technical Report
Fundamental Design Choices and Implementation of an Abstract Neural Network
Written by Nathan Berglas in August 2024
Executive Summary
The purpose of this report is to detail specific design choices, as well as the process of creating an abstract neural network tool from scratch. When using pre-existing neural network tools, such as TensorFlow, one often misses crucial understanding that can only be gained by building everything oneself. By examining these design choices across a range of purposes, one can fully understand the aspects and techniques required for a functional and efficient neural network.
The scope of this report covers everything required to get a basic neural network functioning, along with minor optimizations. That includes implementation, network structure, loss functions, activation functions, testing data, and regularization. These are the major points, and this report covers each in enough detail for one to understand the effect of each technique.
The research methodology included building an original neural network tool from scratch. Research findings were then tested using this implementation, which allowed learning through research as well as refining intuition and understanding through experimentation. In order to properly analyze the differences, a range of testing data was used, from the XOR function to a more complex data set generated in Python.
In conclusion, the best way to use neural networks effectively is to understand them at the most basic level. One should consider all of the fundamental design choices, such as layer and neuron count, loss functions, activation functions, training data sets, and regularization.
Introduction
The scope of this report covers the design choices and fundamentals of creating one's own neural network from scratch. This project is designed to create a tool that allows a user to build a neural network for any purpose and configure it to work effectively. The second goal is to describe the crucial techniques needed to create a custom neural network, so that users can approach industry-standard tools, such as TensorFlow, with better intuition and understanding.
A potential scenario could be a user creating a neural network to recognize handwriting. Detailed in this report are the necessary fundamentals to create such a network, and next steps for increasing efficiency.
The stakes of not understanding the fundamentals are high: the result can be either a non-functional neural network, or one that falls prey to common issues discussed in this report, such as vanishing gradients and overfitting.
The sources for this report include many papers in machine learning journals, as well as course content and online websites dedicated to teaching neural networks.
The objectives of this report are threefold. Primarily, this report describes each of the major design choices required in any neural network. With this knowledge, the reader will understand what is necessary for the implementation of a neural network, and will gain a deeper look into its inner workings. The second objective is to survey the options available to users when configuring and creating neural networks. This is crucial for an effective neural network, and depends on the intended use. Because of the broad scope of this report, many options are considered and recommended for different purposes. The third, and final, objective is to discuss the implementation of a neural network in the programming language C. This is discussed primarily in the method section. In conducting experiments for this report, a neural network was built from scratch in C to test hypotheses and to support further analysis.
Background Information
Before analyzing neural networks, a description will be provided. A neural network is a machine learning model that uses connected neurons in a layer structure to emulate a human brain. Machine learning is a branch of artificial intelligence; more simply, a machine learning model is a program that can train itself and learn.
A neural network has an input layer, an output layer, and a predefined number of hidden layers. A layer is a collection of neurons. The input layer is the first layer and has the same number of neurons as inputs to the neural network. The output layer is the last layer and has the same number of neurons as outputs of the neural network. The hidden layers are the layers in the middle, neither input nor output; the user never directly interacts with them.
Each neuron in a neural network has three attributes: a value (initially undefined), a bias, and an activation function. The bias is a number, typically between -1 and 1, that adjusts the neural network's preference for activating that neuron. In addition, each pair of neurons in adjacent layers is connected with an edge, which has a single attribute: a weight. The weight is a floating point number, typically between -1 and 1. It determines the preference a neuron in the previous layer has for passing its output into the neuron in the next layer.
Consider an example where a neural network has two input neurons, two output neurons, and two hidden layers with three neurons each. This neural network would have a total of 10 neurons, and thus 10 biases, values, and activation functions. Each pair of neurons in adjacent layers would be connected with an edge, for a total of 21 edges. This is a small example; in complex neural networks, there can be hundreds to thousands of neurons and edges.
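To make the edge count concrete, it follows from multiplying the sizes of each pair of adjacent layers: 2 × 3 + 3 × 3 + 3 × 2 = 6 + 9 + 6 = 21 edges.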
Another important aspect of a neural network is how it is run. When one uses a neural network, one provides an input and receives an output. The input, for example (1, 0), gives the value of each neuron in the input layer. Therefore, in this case, the first input layer neuron's value would be 1, and the second input layer neuron's value would be 0. The neural network then runs an algorithm called forward propagation that 'propagates' these input values through the network; once done, the output is the value(s) of the output layer's neurons. In this case, if the first output layer neuron's value is 0.35, and the second output layer neuron's value is 0.12, then the output of the neural network is (0.35, 0.12). (It should be noted that the output of a neural network is often normalized so that the sum of the outputs is 1. This is not always the case.)
The final important fact about neural networks is a neuron's activation function. The activation function is a mathematical function that introduces non-linearity into a neural network. The value of a neuron is passed through the activation function, which, for functions such as the sigmoid, also clamps the value between 0 and 1. This is further discussed later in the report. Once that is done, the neuron has its new value.
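Concretely, a neuron's incoming value is the weighted sum of the previous layer's values plus the neuron's own bias, and it is this sum that is passed through the activation function. As an illustration only, and not an excerpt from the source code in Appendix A, a sigmoid activation can be written in C as the standard definition below:

#include <math.h>

/* Sigmoid activation: squashes any real input into the range (0, 1). */
double sigmoid(double x) {
    return 1.0 / (1.0 + exp(-x));
}

For example, a pre-activation value of 0.8 becomes sigmoid(0.8) ≈ 0.69.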
The most important factors that affect the neural network are the number of layers, the number of neurons in each layer, the weights of each edge, the bias of each neuron, and the activation function of each neuron. These terms are additionally described in the glossary.
With this background knowledge, one can follow the implementation of a neural network in C, as well as understand the design choices involved.
Method
There are many options when conducting tests; detailed here is the method used in this report. C was chosen to implement the neural network due to its speed and control over heap memory. A common alternative is Python. The choice of language does not change the results, but the implementation will differ. The following description of the method contains the information necessary to reproduce the results.
In C, a structure was defined for the neural network, a layer, and a neuron. A neuron has an array of weights, the length of which is the number of neurons in the next layer; the ith element of the array is the weight of the edge connecting the neuron to the ith neuron in the next layer. The neuron structure also has a bias and a value. The layer structure contains an array of the neurons in that layer. Finally, the neural network structure has an input layer, an array of hidden layers, and an output layer.
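The exact definitions are in the source code in Appendix A; a minimal sketch of structures along these lines, with field names that are assumptions for illustration, could look like the following:

typedef struct Neuron {
    double *weights;   /* weight of the edge to each neuron in the next layer */
    double bias;       /* the neuron's bias */
    double value;      /* the neuron's current value */
} Neuron;

typedef struct Layer {
    Neuron **neuron;   /* the neurons in this layer */
    int size;          /* number of neurons in this layer */
} Layer;

typedef struct NeuralNetwork {
    Layer *input;      /* input layer */
    Layer **hidden;    /* array of hidden layers */
    int hidden_count;  /* number of hidden layers */
    Layer *output;     /* output layer */
} NeuralNetwork;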
In the source code, the neural network variable is named 'nn'. As an example of accessing data, to read the weight from the first input neuron to the second neuron in the first hidden layer, one would use the following syntax: 'nn->input->neuron[0]->weights[1]'.
In Appendix A, a portion of the code can be found. The tools necessary for implementing a neural network are array traversal, indexing, and loops. These are relatively simple C skills that a beginner or intermediate user should be comfortable using and understanding. The full source code can also be found in Appendix A.
There are two algorithms that are essential for making a neural network: forward propagation and backpropagation. Forward propagation is the algorithm responsible for running the neural network, that is, getting an output from an input. Backpropagation is the training algorithm that passes an 'error signal' through the network from output to input. By comparing the output of a neural network to the expected output, one can quantify the network's error. By calculating how changes to the weights and biases affect that error, one can minimize it. This process is called backpropagation, and it is a much more complex algorithm than forward propagation. The implementations of backpropagation and forward propagation can be found in Appendix A.
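The full implementations are in Appendix A. Purely as an illustration of the idea, and using the hypothetical structures sketched above with a sigmoid activation assumed for every neuron, a forward propagation pass might be written roughly as follows:

#include <math.h>

/* Propagate values from layer 'prev' into layer 'next': each neuron in  */
/* 'next' takes the weighted sum of the previous layer's values plus its */
/* own bias, then applies the sigmoid activation.                        */
static void propagate_layer(Layer *prev, Layer *next) {
    for (int j = 0; j < next->size; j++) {
        double sum = next->neuron[j]->bias;
        for (int i = 0; i < prev->size; i++) {
            sum += prev->neuron[i]->value * prev->neuron[i]->weights[j];
        }
        next->neuron[j]->value = 1.0 / (1.0 + exp(-sum));
    }
}

/* Run the network left to right: input layer, each hidden layer, output. */
void forward_propagate(NeuralNetwork *nn) {
    Layer *prev = nn->input;
    for (int h = 0; h < nn->hidden_count; h++) {
        propagate_layer(prev, nn->hidden[h]);
        prev = nn->hidden[h];
    }
    propagate_layer(prev, nn->output);
}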
Other functionality was added for ease of use, including storing training data in binary files to be loaded by the program, as well as the ability to save and load trained neural networks as binary files. All of the discussed features and usage are documented and can be found in the source code, the link to which is provided in Appendix A.
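The exact binary file format is documented alongside the source code. Purely to illustrate the idea, writing one layer's parameters to an open binary file could be done along these lines, again using the hypothetical structures from above:

#include <stdio.h>

/* Write one layer's biases and weights to an open binary file.        */
/* 'next_size' is the number of neurons in the following layer, i.e.   */
/* the length of each neuron's weight array (0 for the output layer).  */
static void save_layer(FILE *f, Layer *layer, int next_size) {
    for (int i = 0; i < layer->size; i++) {
        fwrite(&layer->neuron[i]->bias, sizeof(double), 1, f);
        if (next_size > 0) {
            fwrite(layer->neuron[i]->weights, sizeof(double), next_size, f);
        }
    }
}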
Discussion
There are many factors and techniques that can be researched and implemented in neural networks, but covered here are the most fundamental and universal ones for any application. From this research, it is clear that the design choices depend heavily on the goal, and that designing a single neural network that serves every purpose effectively is impossible. What can be done instead is to create a tool that allows the user to tailor a neural network to fit their goal.
For this framework or tool, options must be offered, and the research makes clear which ones. For example, if the tool offered only the sigmoid activation function, then a user who wished to create a convolutional neural network would not have the option of using the ReLU activation function. This would be a hindrance, and for the tool to function properly, there must be a base shelf of available options.
The complexity and usefulness of the tool depend on how many options are offered, and in general the more the better. Covered in this report are the features deemed crucial. From research and testing, one can conclude that the base shelf requires multiple options for network size, loss functions, activation functions, training data, and regularization.
Conclusion and Recommendations
When designing a neural network, one will encounter many design choices. The optimal choice depends on the goal. That being said, there are popular options for each fundamental design choice.
The first, and debatably most fundamental, design choice is the size of a neural network in terms of the number of layers and the number of neurons. For a simple goal, something like an XOR function, no more than 15 neurons and 2 hidden layers are necessary, if that (Venkatachalapathy and Mallikarjunaiah, 2023). (Note that this is only an estimate; when deciding on neuron and layer counts, it is incredibly difficult, if not impossible, to know the optimal size of a neural network immediately without brute force testing.) For a more complex goal, such as handwriting recognition or speech to text, it is impossible to achieve the needed complexity with a network of only 15 neurons. At the cost of training speed, such a network would require more layers and more neurons.
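To make the sizing concrete, the XOR case could be configured with something like the following declarations; the exact layout is an assumption for illustration, not the report's actual configuration:

#include <stddef.h>

/* Hypothetical sizing for an XOR network: 2 inputs, two hidden layers of */
/* 3 neurons each, and 1 output, well under the 15-neuron estimate above. */
int input_size = 2;
int hidden_sizes[] = {3, 3};
size_t hidden_count = sizeof(hidden_sizes) / sizeof(hidden_sizes[0]);
int output_size = 1;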
Secondly, there are multiple options for the loss function, the most common being mean squared error. The loss function depends on the goal and must be hand picked. For example, LeCun et al. (1998) used a customized mean squared error loss function to recognize handwriting for commercial use in banks, reading millions of cheques per day.
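For reference, the plain mean squared error over a network's outputs takes only a few lines of C; this is a textbook sketch, not the customized loss used by LeCun et al.:

/* Mean squared error between the network's outputs and the expected  */
/* (target) outputs, averaged over the n output neurons.              */
double mean_squared_error(const double *output, const double *target, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = output[i] - target[i];
        sum += diff * diff;
    }
    return sum / n;
}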
The other important function in a neural network is the activation, or non-linearity, function in each neuron. Similarly, the best choice depends on both the layer and the end goal of the neural network. Choices include the popular sigmoid, Tanh, ReLU, ELU, Swish, and Mish. For example, in a convolutional neural network, a neuron in a hidden layer would do well with ReLU.
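As minimal C sketches of two of these options, using the standard textbook definitions rather than excerpts from the source code:

#include <math.h>

/* ReLU: passes positive values through unchanged and zeroes out negatives. */
double relu(double x) {
    return x > 0.0 ? x : 0.0;
}

/* Tanh: squashes values into the range (-1, 1). */
double tanh_activation(double x) {
    return tanh(x);
}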
Finally, to ensure backpropagation works as intended, one must have proper training data and regularization. Training data depends heavily on the goal, but having over a thousand test cases is a wise minimum for a complex goal such as reading handwriting. In addition, it is recommended to train on only 80% of the data and reserve the remaining 20% for validation. Other regularization methods can be employed, such as dropout, early stopping, weight decay, and batch normalization.
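The 80/20 split itself is straightforward; assuming the training cases have already been shuffled, it can be computed as in the sketch below (the function is illustrative, not part of the report's tool):

/* Split 'total' training cases into a training portion and a validation */
/* portion, using roughly 80% for training and holding out the rest.     */
void split_counts(int total, int *train_count, int *validation_count) {
    *train_count = (total * 80) / 100;
    *validation_count = total - *train_count;
}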
With these recommendations, one could create a basic neural network for any purpose. For further research, there are advanced design choices such as alternate methods of backpropagation, more experimental choices of activation and loss functions, and deep learning. (Deep learning is beyond the scope of this report; simply put, it involves very large neural networks, i.e. 15 or more hidden layers.)
References
Dubey, S. R., Singh, S. K., & Chaudhuri, B. B. (2022). Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing, 503, 92–108. https://doi.org/10.1016/j.neucom.2022.06.111
Fahlman, S., & Lebiere, C. (1989). The Cascade-Correlation Learning Architecture. In Advances in Neural Information Processing Systems 2. https://dl.acm.org/doi/10.5555/109230.107380
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, 70, 1321–1330. https://proceedings.mlr.press/v70/guo17a.html
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791
Li, F.-F., & Adeli, E. (2024). CS231n: Deep Learning for Computer Vision. Stanford University. https://cs231n.stanford.edu/
Palaniappan, J. R. (2023). A Wide Analysis of Loss Functions for Image Classification Using Convolution Neural Network. 2023 3rd International Conference on Innovative Sustainable Computational Technologies (CISCT), 1–6. https://doi.org/10.1109/CISCT57197.2023.10351209
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958. http://jmlr.org/papers/v15/srivastava14a.html
Venkatachalapathy, P., & Mallikarjunaiah, S. M. (2023). A Deep Learning Neural Network Framework for Solving Singular Nonlinear Ordinary Differential Equations. International Journal of Applied and Computational Mathematics, 9(5), 68. https://doi.org/10.1007/s40819-023-01563-x
Wang, Q., Ma, Y., Zhao, K., & Tian, Y. (2022). A Comprehensive Survey of Loss Functions in Machine Learning. Annals of Data Science, 9(2), 187–212. https://doi.org/10.1007/s40745-020-00253-5
Wolfram Language Documentation. (n.d.). Training Neural Networks with Regularization. Wolfram Language & System Documentation Center. https://reference.wolfram.com/language/tutorial/NeuralNetworksRegularization.html
Wu, Q., & Nakayama, K. (1997). Avoiding weight-illgrowth: Cascade correlation algorithm with local regularization. Proceedings of International Conference on Neural Networks (ICNN'97), 3, 1954–1959. https://doi.org/10.1109/ICNN.1997.614198