\(\newcommand{\R}{\mathbb{R}} \newcommand{\Z}{\mathbb{Z}}\) Life is full of uncertainties, yet it constantly requires decisions and explanations. The Parsimony Principle is a philosophical heuristic for approaching those uncertainties.
Ockham’s razor does not state that “from two explanations, the simpler one is more likely to be true.” The razor is not a tool for deciding what is true.
Let us define the simplicity of a hypothesis as the cost of testing it. The cost can be the computation needed to test the hypothesis or the resources spent training the people who test it. We can only prove the hypotheses we can afford to test; therefore, a hypothesis must be simple enough, relative to our resources, to be provable. So we have no choice but to adopt simplicity when developing hypotheses. We do not like simplicity because it is inherently good or faithful; we like it because it is affordable. In this spirit, we will define the razor as the tool that simplifies the test of a hypothesis in each case. In science, the razor is sometimes the experimental setup we create, which makes it simpler to see whether our hypothesis is correct. This way, even the most complicated truths can be understood with time. This reading also fits the razor analogy better, since it cuts away the parts that make it harder for the collective eyes of science to see the truth.
Under this definition, a hypothesis does not have to be easy to explain or interpret; what makes it easier to test might simply be the short runtime of the testing software, which is why one could try to make the definition of simplicity more specific. Ideally, one would define simplicity as the relative ease of explaining a concept to a layperson, but we do not think this kind of simplicity is independent of efficiency; in other words, we may find something easier to understand only because we were already introduced to similar concepts, and those concepts spread precisely because they were cheap from a resource-cost point of view.
Although the definition we gave for the razor is very abstract, we can see instances of it in mathematical formulations. For example, consider robust ML, a field concerned with providing guarantees for the reliability of models; one such guarantee is a proof of whether a model will change its prediction under a small change in the input.
Let \(f:\R^m\rightarrow \R\) be a fully connected neural network for binary classification with \(\operatorname{ReLU}\) activations in every layer except the last. Let \(f(\mathbf{x})\geq 0\) indicate class A (labeled \(+1\)) and \(f(\mathbf{x})<0\) indicate class B (labeled \(-1\)). For an input \(\mathbf{x}\in \R^m\) and all perturbations \(\mathbf{\tilde x}\) such that \(\|\mathbf{\tilde x}-\mathbf{x}\|_{\infty} \leq \epsilon\), where \(\epsilon\) is the allowed distance from the given input, we try to prove that \(\operatorname{sign} f(\mathbf{x}) = \operatorname{sign} f(\mathbf{\tilde x}) = c\), where \(c\) is the label of \(\mathbf{x}\) in our dataset. To guarantee this, we solve
\[\min_{\mathbf{\tilde x}} \operatorname{sign}(f(\mathbf{x}))\, f(\mathbf{\tilde x}) \quad \text{subject to} \quad \|\mathbf{\tilde x}-\mathbf{x}\|_\infty \leq \epsilon.\]Note that \(\operatorname{sign}(f(\mathbf{x}))\) is a constant in this minimization, so the optimal value is the worst-case margin. After finding an optimal \(\mathbf{\tilde x}\), we check whether the prediction has changed. If \(f(\mathbf{x})f(\mathbf{\tilde x}) > 0\), the model is robust for that input; if \(f(\mathbf{x})f(\mathbf{\tilde x}) < 0\), then \(\mathbf{\tilde x}\) is an adversarial example for that input.
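To make the minimization concrete, here is a minimal sketch (assuming PyTorch, with the network `f`, the input `x`, and the radius `eps` given) that heuristically minimizes \(\operatorname{sign}(f(\mathbf{x}))\,f(\mathbf{\tilde x})\) over the \(\ell_\infty\) ball with projected gradient descent. Because the search is local, its result only upper-bounds the true minimum: a negative value is a concrete adversarial example, while a positive value is merely a failure to find one and does not certify robustness.

```python
import torch

def worst_case_margin(f, x, eps, steps=100, step_size=0.01):
    """Heuristically minimize sign(f(x)) * f(x_tilde) over the l_inf ball of radius eps."""
    sign = torch.sign(f(x)).detach()  # constant in the minimization
    x_tilde = x.clone()
    for _ in range(steps):
        x_tilde = x_tilde.detach().requires_grad_(True)
        margin = (sign * f(x_tilde)).sum()
        grad, = torch.autograd.grad(margin, x_tilde)
        # signed gradient step that decreases the margin,
        # then projection back onto the l_inf ball around x
        x_tilde = torch.clamp(x_tilde - step_size * grad.sign(), x - eps, x + eps)
    return (sign * f(x_tilde)).sum().item()  # > 0: no attack found; < 0: adversarial example
```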
Solving this minimization exactly is called \emph{exact certification}, and it turns out to be an NP-complete problem.
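One way to build intuition for the hardness is a standard mixed-integer encoding used by exact certifiers: assuming pre-activation bounds \(l \leq z \leq u\) with \(l < 0 < u\), a single unit \(y = \max(0, z)\) can be written with one binary variable \(a\) as

\[y \geq 0, \qquad y \geq z, \qquad y \leq u\,a, \qquad y \leq z - l\,(1-a), \qquad a \in \{0,1\},\]

so the certification problem becomes a mixed-integer program whose number of binary variables grows with the number of \(\operatorname{ReLU}\) units, which is where the combinatorial blow-up comes from.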
After this hardness result, much follow-up work relaxes the exact problem, trading completeness for cheaper, more scalable certification.
Putting complex phenomena into formal and testable settings aligns with the parsimony principle. Having to make some reductions along the way adds to the sentiment of not multiplying entities without necessity. To be more specific, in the last example we presented works that try to make ML models less of a black box by putting them into a formal, testable setting. There are other notable related applications in ML; for example, the field of differential privacy focuses on probabilistic guarantees that a model will not memorize sensitive information.
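To give a flavor of such a guarantee, recall the standard definition: a randomized mechanism \(M\) is \(\epsilon\)-differentially private if, for any two datasets \(D\) and \(D'\) differing in a single record and any set of outputs \(S\),

\[\Pr[M(D) \in S] \leq e^{\epsilon} \Pr[M(D') \in S],\]

so no single individual's data can shift the output distribution by more than a factor of \(e^{\epsilon}\); here, too, a complex promise about memorization is reduced to a precise, provable inequality.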