This is a good time to talk about stochastic processes in machine learning. I am not a Ph.D. researcher in the field; I am merely pursuing an MS in Data Science at Indiana U. However, I understand the broad contours of what is happening, and how data science teams use machine learning in real-world projects.
Perhaps you have heard of the astonishing success of AlphaGo, the Google DeepMind program that recently defeated one of the all-time great Go grandmasters in a five-game match. Like other deep learning systems, AlphaGo's neural networks used stochastic gradient descent together with backpropagation to “learn” to play Go better than any human. AlphaGo applied a stochastic mathematical process to an extremely large data set, with a feedback mechanism (backpropagation of the evaluation error) that allowed the network to iterate until it converged on an optimized equilibrium.
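For anyone who has not seen that loop spelled out, here is a minimal sketch in Python. This is emphatically not AlphaGo's code; the tiny XOR task, the network size, the learning rate, and the random seed are all illustrative assumptions on my part. But the loop itself (evaluate the output, backpropagate the error, nudge the weights, repeat) is the stochastic process with feedback that I am describing:

```python
import numpy as np

# Minimal sketch of stochastic gradient descent + backpropagation.
# NOT AlphaGo's code: a one-hidden-layer network learning XOR, just to
# illustrate the loop: evaluate -> backpropagate error -> nudge weights.

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # targets

# Random initial weights: the "rudimentary inputs."
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate (an arbitrary illustrative choice)

for step in range(20000):
    # "Stochastic": each step looks at one randomly chosen example.
    i = rng.integers(len(X))
    x, t = X[i:i+1], y[i:i+1]

    # Forward pass: the network produces its current answer.
    h = sigmoid(x @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Evaluation + backpropagation: the error between answer and target
    # is pushed backward through the layers to obtain gradients.
    dp = (p - t) * p * (1 - p)
    dh = (dp @ W2.T) * h * (1 - h)

    # Gradient descent: nudge every weight a little downhill.
    W2 -= lr * (h.T @ dp); b2 -= lr * dp.sum(axis=0)
    W1 -= lr * (x.T @ dh); b1 -= lr * dh.sum(axis=0)

# After many iterations the outputs should approach [0, 1, 1, 0].
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))
```

Notice that no single step is intelligent; whatever skill emerges is entirely the product of random variation filtered through the feedback loop.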
To me, this bears a striking resemblance to the biological evolution of life. Stochastic gradient descent plays the same role that the biological mechanisms of genetic variation (including drift) play in evolution. Backpropagation plus an evaluation function plays the same role as natural selection. The resulting optimized equilibrium resembles the ecosystem of life.
The AlphaGo result demonstrates how complex “intelligence” can emerge from rudimentary inputs. No one on the AlphaGo team can tell you what heuristics, strategies, or tactics AlphaGo is employing as it captures the stones of the world’s top Go masters. This is an important point: the “intelligence” that emerges from the stochastic mathematical process with bounded feedback is far greater than the intelligence of the inputs. In this, neural network processing resembles biological evolution, in which marvelously ingenious systems have emerged from seemingly simple inputs.
Moreover, we can obtain a useful perspective on epistemology from AlphaGo and other neural networks. If you were an internal observer situated on a node of the network, you could observe some of the data, some of the processing, and some of the results. The stochastic gradient descent would seem irreducibly random. You would also observe that this randomness, combined with some constraints on processing, yielded a remarkable equilibrium.
The logic behind it all, however, would be quite beyond your grasp. You would certainly not be able to infer a network designer. You would observe a kind of design, but you would not be able to infer that the design had an external origin. You would observe only that the design seemed to emerge as if by miracle from simple, often random inputs.
The situation of the internal observer is analogous to our situation as humans. We can observe the random processes, the constraints, and the resulting equilibrium. We can measure them, perhaps even use scientific methodology to reverse-engineer some of the equations embodied in the steps of the gradient descent.
An external observer, on the other hand (someone who knows the code, the data structures, and the network topology), would be able to see the overall logic and how it all works together toward a particular purpose. In my analogy, the external observer is God.
To complete the analogy, I would say that we, like internal observers of the network, can only understand the existence of the designer by the designer’s self-revelation to us. Our scientific methodologies can help us understand much, but they do not provide the means to discern the designer or the ultimate design. Our ability to affirm an overall logic, an overall purpose, and even an overall designer comes by trusting the revelation that the designer provides to us.
Not at all. I am simply insisting on distinguishing between what can be learned from science and what can only be apprehended by faith.
I hope you find my modest contribution useful. Blessings on you and yours, Eddie!
EDIT:
You raise a question about the results of randomization in software programs.
Your question assumes a static system, i.e., a system whose every computation is specified in advance by its code. It is not surprising that random modifications to the code of such a system typically increase its entropy and decrease its functionality.
A neural network is not that kind of system. It is a dynamic system whose computations evolve: an evaluation function scores the network's output, backpropagation feeds the resulting error back through the network, and stochastic gradient descent adjusts the weights accordingly.
The dynamic system model seems much closer to what we observe in biology than the static system model.
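To make the contrast concrete, here is a small Python sketch. It uses simple random mutation with a keep-if-better evaluation step (a hill climb) rather than literal backpropagation, and the fifty-dimensional "target function" is my own hypothetical stand-in, but it isolates the point: the very same random modifications that degrade a static system optimize a dynamic one once a feedback mechanism is present.

```python
import numpy as np

# Static vs. dynamic systems under random modification (illustrative).
# "blind" takes every random mutation with no feedback; "guided" takes
# the same kind of mutations but keeps only those the evaluation
# function scores as improvements.

rng = np.random.default_rng(1)
target = rng.normal(size=50)  # hypothetical function the system should perform

def error(params):
    """Evaluation function: squared distance from the target behavior."""
    return float(np.sum((params - target) ** 2))

blind = np.zeros(50)   # static system: mutations accumulate unchecked
guided = np.zeros(50)  # dynamic system: mutations filtered by feedback

for _ in range(5000):
    mutation = rng.normal(scale=0.1, size=50)

    # Static system: every random change is kept, whatever it does.
    blind += mutation

    # Dynamic system: feedback keeps only changes that reduce the error.
    candidate = guided + mutation
    if error(candidate) < error(guided):
        guided = candidate

print(f"no feedback:   error = {error(blind):10.1f}")   # tends to grow
print(f"with feedback: error = {error(guided):10.3f}")  # shrinks toward 0
```

The randomness is identical in both cases; only the presence of the evaluation-and-feedback mechanism separates rising entropy from rising functionality.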