May 21, 2026

KAN — Why Kolmogorov-Arnold Networks Are the Next Generation of Interpretable ML

MLPs are universal approximators but black boxes. KAN architectures are universal approximators that are also interpretable and convertible into closed-form mathematical expressions. This article looks at what that means for regulated ML applications and how opua deploys the architecture as a shadow layer and audit bridge.

Anyone who has followed machine learning since 2012 knows the dominant architecture of this era: the multi-layer perceptron, or MLP. Implemented in deep-learning frameworks such as PyTorch and TensorFlow and complemented by convolutions, transformer blocks, and attention mechanisms, the MLP is the universal building block that powers everything from image classification to language models to marketing-mix models. The theoretical foundation is the Universal Approximation Theorem from the late 1980s: a sufficiently large MLP with non-linear activation functions can approximate any continuous function on a compact domain to arbitrary precision. This mathematical guarantee has underpinned the dominance of MLP-based architectures for the fourteen years since.

The downside of this dominance is also the central criticism of it: MLPs are hostile to interpretation. The learned weights between nodes are high-dimensional matrices without intuitive meaning. What did the network learn? Which features matter? How does it react to a particular input? For an MLP, these questions can only be answered approximately, through auxiliary methods such as SHAP, LIME, or saliency maps, and each of these methods is itself an approximate procedure with its own assumptions. For regulated applications, where compliance officers, auditors, or investors demand mathematical clarity about the decision logic, this interpretive layer is insufficient.

In April 2024, Ziming Liu and colleagues at MIT introduced an alternative architecture that addresses this weakness at the root: Kolmogorov-Arnold Networks, KAN for short. The theoretical foundation traces back to the Kolmogorov-Arnold Representation Theorem of 1957. It states that every continuous multivariate function on a bounded domain can be represented as a finite composition of continuous univariate functions and addition. What lay dormant as a purely mathematical result in approximation theory for more than six decades now serves as an ML construction principle in KAN architectures. Instead of fixed activation functions at nodes and learnable weights on edges, KANs invert the setup: the activation functions themselves are learned, parametrized as spline functions on the edges.
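For reference, the representation theorem is usually written as follows for a continuous function `f` on the n-dimensional unit cube. The symbols `\Phi_q` (outer functions) and `\phi_{q,p}` (inner functions) are the conventional names from the literature, not notation taken from this article.

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad f : [0,1]^n \to \mathbb{R} \ \text{continuous}
```

Every `\Phi_q` and `\phi_{q,p}` is a continuous function of a single variable, so the only genuinely multivariate operation in the formula is addition. A KAN keeps this shape but makes the univariate functions learnable and allows more than two layers.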

The practical effect of this inversion is profound. Each edge of a KAN represents a one-dimensional function, parametrized as a spline and plottable as a curve. Anyone inspecting a trained KAN can do more than examine weights: they can directly visualize the learned 1D function for each edge. What did the network learn? In a KAN, it is a collection of spline curves, each encoding a concrete input-output relationship. This plottability is more than a diagnostic tool. It opens a second, even more important path: symbolic regression.
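To make the "one learnable 1D function per edge" idea concrete, here is a minimal NumPy sketch of a single KAN-style layer. It is an illustration under simplifying assumptions, not the PyKAN implementation: the class name `PiecewiseLinearKANLayer` is hypothetical, and each edge function is a piecewise-linear curve evaluated with `np.interp` rather than a higher-order B-spline.

```python
import numpy as np

class PiecewiseLinearKANLayer:
    """Toy KAN layer: one learnable 1D function per edge (input i -> output j)."""

    def __init__(self, n_in, n_out, grid_min=-1.0, grid_max=1.0, n_knots=8, seed=0):
        rng = np.random.default_rng(seed)
        # Shared knot positions for all edges (assumption made for brevity).
        self.knots = np.linspace(grid_min, grid_max, n_knots)
        # values[j, i, k] is the value of edge function phi_{j,i} at knot k.
        # These knot values are the trainable parameters of the layer.
        self.values = rng.normal(scale=0.1, size=(n_out, n_in, n_knots))

    def edge(self, j, i, x):
        # Piecewise-linear interpolation; inputs outside the grid are clamped
        # to the boundary values by np.interp.
        return np.interp(x, self.knots, self.values[j, i])

    def forward(self, x):
        x = np.asarray(x, dtype=float)  # shape (batch, n_in)
        n_out, n_in, _ = self.values.shape
        out = np.zeros((x.shape[0], n_out))
        for j in range(n_out):
            for i in range(n_in):
                # A node only sums the outputs of its incoming edge functions.
                out[:, j] += self.edge(j, i, x[:, i])
        return out

    def edge_curve(self, j, i, n_points=200):
        # Sample one edge function on its grid so it can be plotted as a curve.
        xs = np.linspace(self.knots[0], self.knots[-1], n_points)
        return xs, self.edge(j, i, xs)

layer = PiecewiseLinearKANLayer(n_in=2, n_out=1)
y = layer.forward([[0.3, -0.5]])
xs, ys = layer.edge_curve(0, 1)  # curve of the edge from input 1 to output 0
```

The pair `xs, ys` returned by `edge_curve` is exactly what one would hand to a plotting library to inspect a single edge.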

Symbolic regression is the process of converting a numerically learned function into a closed-form mathematical expression. A spline curve that, after training a KAN, looks like a logarithmic saturation pattern can be transformed via symbolic-regression methods into a concrete formula, for instance `f(x) = 0.45 * log(1 + x / 2000) + 0.03`. What in an MLP would be a high-dimensional weight tensor becomes in a KAN a readable mathematical relationship. For ML engineers who must explain models in production systems, this is a qualitative leap. For regulated applications it is more: a concrete formula can be examined with formal-verification methods for properties that the weights of an MLP do not expose.
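The conversion step can be sketched in a few lines. The example below is a deliberately small stand-in for a real symbolic-regression tool: it samples a curve, tries each entry of a hypothetical candidate library (`CANDIDATES`) at a few hypothetical input scales (`SCALES`), fits `a * g(x / scale) + b` by linear least squares, and keeps the candidate with the lowest squared error. The sampled curve is generated from the example formula above purely for illustration.

```python
import numpy as np

# Hypothetical candidate library: name -> univariate function applied to x / scale.
CANDIDATES = {
    "log1p": np.log1p,
    "sqrt": np.sqrt,
    "identity": lambda z: z,
    "tanh": np.tanh,
}
# Hypothetical grid of input scales to try for each candidate.
SCALES = (500.0, 1000.0, 2000.0, 4000.0)

def fit_closed_form(x, y, candidates=CANDIDATES, scales=SCALES):
    """Pick the candidate g and scale c minimizing ||a * g(x / c) + b - y||^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for name, g in candidates.items():
        for c in scales:
            # For fixed g and c the model is linear in (a, b): solve by least squares.
            design = np.column_stack([g(x / c), np.ones_like(x)])
            (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
            sse = float(np.sum((design @ np.array([a, b]) - y) ** 2))
            if best is None or sse < best["sse"]:
                best = {"name": name, "scale": c, "a": float(a), "b": float(b), "sse": sse}
    return best

# Stand-in for a learned spline curve: samples of the example formula.
x = np.linspace(0.0, 10_000.0, 200)
y = 0.45 * np.log1p(x / 2000.0) + 0.03
result = fit_closed_form(x, y)
print(result["name"], result["scale"], round(result["a"], 3), round(result["b"], 3))
```

On this noise-free input the search should select `log1p` with scale 2000 and recover the coefficients of the example formula. On a real learned spline the fit is only approximate, so the residual error belongs in the audit trail next to the formula.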

In the opua brand family, KAN is concretely deployed as a shadow-scoring layer. In MMM-Wizard, the marketing-mix-modeling platform for SMEs, the Bayesian pipeline runs on Google Meridian as the primary model. In parallel, a KAN shadow model trains on the same data. Both models produce channel recommendations. If the recommendations for a channel diverge by more than fifteen percent, the system flags a model discrepancy and requests manual review. The value of this setup does not lie in the KAN shadow model being more precise than Meridian; both architectures have distinct strengths and weaknesses. The value lies in the fact that two completely differently structured models must agree before a recommendation goes into production. An edge case that fools the Bayesian pipeline is less likely to fool the KAN in the same way, and vice versa.
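The discrepancy check itself is simple to express. The sketch below assumes that "diverge by more than fifteen percent" means relative deviation of the shadow recommendation from the primary one; the article does not spell out the exact definition, and the function name `flag_discrepancies` and the channel names are hypothetical.

```python
def flag_discrepancies(primary, shadow, threshold=0.15):
    """Return {channel: relative divergence} for channels above the threshold.

    primary and shadow map channel names to recommended values (e.g. budgets).
    Divergence is measured relative to the primary model (an assumption).
    """
    flagged = {}
    for channel, p in primary.items():
        s = shadow[channel]
        if p == 0:
            # Avoid division by zero: any non-zero shadow value counts as divergent.
            divergence = 0.0 if s == 0 else float("inf")
        else:
            divergence = abs(s - p) / abs(p)
        if divergence > threshold:
            flagged[channel] = divergence
    return flagged

# Hypothetical recommendations from the primary model and the KAN shadow model.
primary = {"search": 100.0, "social": 200.0, "ai_assistant": 50.0}
shadow = {"search": 110.0, "social": 260.0, "ai_assistant": 50.0}
needs_review = flag_discrepancies(primary, shadow)
```

Here `search` deviates by 10 percent and passes, while `social` deviates by 30 percent and would be routed to manual review.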

Strategically even more important is the symbolic-regression pipeline that makes KAN outputs convertible into Lean 4 theorems. If the learned response function for the channel AI-Assistant is formalized via symbolic regression as `f(spend) = 0.45 * log(1 + spend / 2000) + 0.03`, Lean 4 can be used to check this formula for concrete properties: is it monotonically increasing? Does it show diminishing returns as spend grows? Does it remain within a physically meaningful range over the spend levels of interest? For a closed-form expression these properties can be proven mathematically; for an MLP they can only be tested empirically. This closes the loop between interpretable ML and formal verification: KAN delivers the closed form, Lean 4 delivers the mathematical guarantee.
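The properties in question follow from a short calculation, written out here on paper rather than as a Lean 4 proof. It assumes that `log` denotes the natural logarithm and that spend `s` is non-negative; these are the statements a formal proof would need to establish.

```latex
f(s) = 0.45 \,\ln\!\left(1 + \frac{s}{2000}\right) + 0.03, \qquad s \ge 0

f'(s) = \frac{0.45}{2000 + s} > 0
\quad\Rightarrow\quad f \text{ is strictly increasing}

f''(s) = -\frac{0.45}{(2000 + s)^2} < 0
\quad\Rightarrow\quad f \text{ is concave (diminishing marginal returns)}

f(0) = 0.03, \qquad \lim_{s \to \infty} f(s) = \infty
```

The last line matters for the range question: a logarithmic response flattens but has no finite ceiling, so a bound on the output can only be proven over a bounded spend interval.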

Academically, Holger von Ellerts, co-founder of the opua brand family and lecturer on AI at Swiss universities, formalized this pattern in a 2025 transfer thesis at the Lucerne University of Applied Sciences. The thesis, titled `KAN Shadow Scoring and Lean 4 Property Verification for the Nexbid Auction Engine`, uses the nexbid auction engine as an example to show how KAN as an interpretable shadow layer and Lean 4 as a verification layer can work in concert. Transferring the method to marketing-mix modeling and equity research is non-trivial but structurally clean: both application domains produce multivariate output functions that can be learned as spline representations and verified as formulas.

For ML engineers and data scientists wanting to evaluate KAN, two options are practical. The first is PyKAN, the official reference implementation from the MIT team, based on PyTorch. It is well documented, has an active community, and integrates into existing PyTorch pipelines. The second is the family of JAX-based reimplementations, which are interesting for research environments and easier to combine with Bayesian pipelines like Meridian. Both support symbolic regression as a downstream step. Anyone wanting to try the pattern for their own models should start with simple univariate regression, train the KAN, visualize the spline plots, and then extract the closed form via symbolic regression.
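Before installing a framework, the first two steps of that workflow can be tried in plain NumPy. The sketch below is not PyKAN and uses none of its API: it fits a single piecewise-linear edge function to noisy univariate data by least squares on a "hat function" basis, which is the simplest stand-in for training one spline edge. The helper names, the synthetic data, and the knot count are all assumptions made for illustration.

```python
import numpy as np

def hat_basis(x, knots):
    """Design matrix of piecewise-linear hat functions, one column per knot."""
    x = np.asarray(x, dtype=float)
    eye = np.eye(len(knots))
    # Interpolating the k-th unit vector yields the k-th hat function.
    return np.column_stack([np.interp(x, knots, eye[k]) for k in range(len(knots))])

def fit_edge_function(x, y, knots):
    """Least-squares knot values so that np.interp(x, knots, values) approximates y."""
    values, *_ = np.linalg.lstsq(hat_basis(x, knots), y, rcond=None)
    return values

# Synthetic univariate data (assumption): a log-shaped response plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10_000.0, size=1000)
y = 0.45 * np.log1p(x / 2000.0) + 0.03 + rng.normal(scale=0.002, size=x.size)

# "Train" the edge: solve for the knot values of the piecewise-linear curve.
knots = np.linspace(0.0, 10_000.0, 20)
values = fit_edge_function(x, y, knots)

# Sample the learned curve on a dense grid; this is what would be plotted.
xs = np.linspace(0.0, 10_000.0, 200)
curve = np.interp(xs, knots, values)
```

Plotting `xs` against `curve` gives the visual check described above, and the sampled curve is a suitable input for a closed-form fitting step. A real KAN trains many such edges jointly by gradient descent, which this single-edge least-squares fit does not attempt to reproduce.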

For AI strategists and data scientists making strategic architectural decisions, the core message is pragmatic. MLPs will not disappear: they are the right choice for many applications, and their tooling maturity is unmatched. But for a growing class of applications, including regulated industries, audit-bound recommendations, and interpretable AI that has to be explained to investors, KAN is strategically valuable. In 2026 the architecture is not yet mature enough to replace MLPs in the majority of applications. But it is mature enough to deliver measurable added value in shadow setups, cross-validation pipelines, and symbolic-regression workflows. Anyone building the pattern now will have, in two years, a lead over competitors who remain MLP-only.

Anyone wanting to see Holger's HSLU transfer thesis for more detailed methodology can reach out via audit@digital-opua.ch. The thesis covers the mathematical derivation of the Kolmogorov-Arnold theorem, the KAN architecture, the symbolic-regression pipeline, and the construction of Lean 4 theorems across roughly 80 pages. It is licensed Apache 2.0 and is available to universities, compliance officers, and ML teams who want to transfer the pattern to their own projects.
