The differences between wide and deep approaches on different problems are quite interesting. On recommender systems, the wide architecture was linked more to **memorization**, while the deep architecture was linked more to **generalization** (https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html). On image classification, the wide architecture was shown to be significantly better at classifying ImageNet classes related to **"structure", "scenes", or "geological formation"**, while the deep architecture was significantly better on classes related to **"consumer goods"**; on the remaining classes there was no significant difference (https://ai.googleblog.com/2021/05/do-wide-and-deep-networks-learn-same.html). It would be interesting to see how these findings translate to the Leela architecture, or to chess in general. The terms "memorization", "generalization", and "structure" also have analogues in learning chess, so it kind of begs a hypothesis about how the wide vs. deep architecture difference would manifest in Leela.
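To make the wide-vs-deep distinction concrete, here is a minimal sketch in plain Python (not Leela's actual network code; the layer sizes are illustrative assumptions) of how a fully connected net's parameter count depends on spending the budget on fewer, wider layers versus more, narrower ones:

```python
def mlp_param_count(layer_sizes):
    """Total parameters (weights + biases) of a fully connected net
    whose layer widths are given by layer_sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Illustrative shapes only: "wide" = few hidden layers with many units,
# "deep" = many hidden layers with fewer units.
wide = [64, 512, 512, 10]                            # 2 hidden layers of 512
deep = [64, 128, 128, 128, 128, 128, 128, 128, 10]   # 7 hidden layers of 128

print("wide params:", mlp_param_count(wide))
print("deep params:", mlp_param_count(deep))
```

The point of the sketch is just that "wide" and "deep" are two different ways to allocate capacity, which is why the linked posts can meaningfully compare them at similar budgets.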