Discussion about this post

suman suhag

I’m delighted to see high-caliber mathematicians and theoretical physicists getting interested in the theory behind deep learning.

One theoretical puzzle is why the kind of non-convex optimization needed to train deep neural nets works so reliably. Naive intuition suggests that optimizing a non-convex function is difficult because we can get trapped in local minima and slowed down by plateaus and saddle points. While plateaus and saddle points can be a problem, local minima never seem to cause trouble. Our intuition fails because we picture an energy landscape in low dimension (e.g., 2 or 3), whereas the objective function of a deep neural net often lives in 100 million dimensions or more. It’s hard to build a box in 100 million dimensions. That’s a lot of walls. There is a body of theoretical work in this direction from my NYU lab (look for Anna Choromanska as first author) and from Yoshua Bengio’s lab, using mathematical tools from random matrix theory and statistical mechanics.
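A toy numerical illustration of the "boxes need many walls" intuition (my sketch, not taken from the Choromanska et al. papers): model the Hessian at a random critical point as a random symmetric (GOE-like) matrix and estimate how often all of its eigenvalues share one sign, which is what a genuine local minimum or maximum requires. Already at modest dimension, almost every critical point is a saddle.

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_all_same_sign(d, trials=2000):
    """Fraction of random symmetric d x d 'Hessians' whose eigenvalues
    all share one sign, i.e. whose critical point would be a pure
    minimum or maximum rather than a saddle."""
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((d, d))
        h = (a + a.T) / 2  # symmetric matrix standing in for a Hessian
        ev = np.linalg.eigvalsh(h)
        if np.all(ev > 0) or np.all(ev < 0):
            count += 1
    return count / trials

for d in (2, 5, 10):
    print(d, frac_all_same_sign(d))
```

The fraction collapses rapidly with dimension (for random matrices it decays exponentially in d²), which is the heart of the argument that in 100-million-dimensional loss surfaces, bad local minima are vanishingly rare compared to saddles.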

Another interesting theoretical question is why multiple layers help. Any boolean function of a finite number of bits can be implemented with 2 layers (using the conjunctive or disjunctive normal form of the function). But the vast majority of boolean functions require an exponential number of minterms in those formulas (i.e., an exponential number of hidden units in a 2-layer neural net). As computer programmers, we all know that many functions become simple if we allow ourselves to run multiple sequential steps to compute the function (multiple layers of computation). That’s a hand-wavy argument for having multiple layers. It’s not clear how to make a more formal argument in the context of neural net-like architectures.
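The parity function is the standard concrete instance of this trade-off. The sketch below (my illustration, not from the comment) counts the minterms a flat DNF/2-layer representation needs — one per odd-parity input, 2^(n-1) of them — and contrasts it with a "deep" computation that just folds XOR across the bits, one tiny step per layer.

```python
from itertools import product

def parity_dnf_minterms(n):
    """Minterms of the n-bit parity function: one conjunction per input
    with parity 1. A 2-layer (DNF-style) net needs one hidden unit per
    minterm, so the flat representation is exponentially wide."""
    return [bits for bits in product((0, 1), repeat=n) if sum(bits) % 2 == 1]

def parity_layered(bits):
    """'Deep' computation: fold XOR left to right, one small step per layer.
    Only n - 1 two-input gates, instead of 2**(n-1) wide conjunctions."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

n = 10
print(len(parity_dnf_minterms(n)))  # 512 minterms, i.e. 2**(n-1)
print(parity_layered((1, 0, 1, 1)))  # 1 — odd number of set bits
```

The same function that costs 512 hidden units in two layers costs nine XOR steps in depth — the programmer's intuition from the comment, made concrete.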

Ricardo Almon

Vlatko Vedral is right to reject the pedagogical mythology of Copenhagen. Interference disappears not because of observers, knowledge, or “looking,” but because which-path information becomes physically encoded in correlations. Entanglement, not complementarity-as-metaphor, does the real work here. On this point, the essay is both correct and necessary.
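A minimal two-qubit sketch of that point (my illustration, not Vedral's): put the system qubit in an equal superposition of the two paths, then let a CNOT copy the path into a single environment qubit. The which-path information is now physically encoded in a correlation, and the off-diagonal coherence of the system's reduced density matrix — and with it the interference — vanishes, with no observer anywhere.

```python
import numpy as np

# System qubit in an equal superposition of the two paths.
plus = np.array([1.0, 1.0]) / np.sqrt(2)

def visibility(rho):
    """Interference visibility = magnitude of the off-diagonal coherence."""
    return 2 * abs(rho[0, 1])

# No which-path record: pure superposition, full interference.
rho_pure = np.outer(plus, plus.conj())
print(visibility(rho_pure))  # ≈ 1.0, full fringe contrast

# Environment qubit starts in |0>; a CNOT copies the path into it,
# physically encoding which-path information as a correlation.
state = np.kron(plus, [1.0, 0.0])  # |+> ⊗ |0>
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
entangled = cnot @ state

# Reduced state of the system: trace out the environment qubit.
psi = entangled.reshape(2, 2)       # rows: system, columns: environment
rho_sys = psi @ psi.conj().T
print(visibility(rho_sys))  # ≈ 0.0 — coherence gone, nobody looked
```

Nothing in the calculation references knowledge or measurement: the loss of interference is pure unitary entanglement followed by restriction to the subsystem, which is exactly the point the essay defends.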

Where the argument overreaches is in presenting this clarification as a closure. Entanglement explains how local coherence is lost under interaction. It does not explain why particular decompositions into subsystems become physically or epistemically privileged, why some correlations function as records, or why classical appearances stabilize at all. These questions do not reintroduce observers or collapse; they arise precisely after those notions have been removed.

Vedral’s claim that quantum mechanics is “as objective as Newtonian physics” is therefore true only in a restricted dynamical sense. Quantum dynamics is objective, but quantum theory does not itself specify the conditions under which objectivity emerges as a shared, stable structure. Entanglement is necessary for decoherence, but not sufficient for appearance.

In short, replacing complementarity with entanglement is progress. Treating entanglement as the end of the foundational story is not. The problem has not vanished; it has shifted—from dynamics to the epistemic conditions under which dynamics become reality for anyone.

That is where the unfinished work remains.

