Prüfstand — AI Testing Framework
Prüfstand is our framework for systematically testing AI systems: prompt variations, model comparisons, and quality scoring via LLM-as-a-Judge. We use it for quality assurance of our own products.
Electron GUI · Python Backend · Vitest · Pytest
Why systematic testing matters
AI models are non-deterministic: the same prompt can produce different outputs from one run to the next. Without structured test setups, output quality can drift unnoticed in production systems. Prüfstand lets us detect regressions before customers notice.
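As a rough illustration of the idea, the sketch below samples a model several times per prompt, averages a judge's score, and flags prompts whose average falls below a stored baseline. It is not Prüfstand's actual API: `mean_judge_score`, `find_regressions`, the `tolerance` threshold, and the stub model and judge are all hypothetical names chosen for this example. A real setup would call a model and an LLM judge instead of the stubs.

```python
from statistics import mean
from typing import Callable, Dict

# Hypothetical signatures, assumed for this sketch only.
Generate = Callable[[str], str]        # prompt -> model output
Judge = Callable[[str, str], float]    # (prompt, output) -> score in [0, 1]


def mean_judge_score(generate: Generate, judge: Judge, prompt: str, runs: int = 5) -> float:
    """Sample the model `runs` times and average the judge's scores.

    Averaging over several runs smooths out run-to-run variation.
    """
    return mean(judge(prompt, generate(prompt)) for _ in range(runs))


def find_regressions(
    generate: Generate,
    judge: Judge,
    baseline: Dict[str, float],
    runs: int = 5,
    tolerance: float = 0.05,
) -> Dict[str, float]:
    """Return prompts whose mean score is more than `tolerance` below baseline."""
    regressions = {}
    for prompt, expected in baseline.items():
        score = mean_judge_score(generate, judge, prompt, runs)
        if score < expected - tolerance:
            regressions[prompt] = score
    return regressions


# Stub model and judge so the sketch runs without any external service.
def fake_generate(prompt: str) -> str:
    return prompt.upper()


def fake_judge(prompt: str, output: str) -> float:
    return 0.9 if "SUMMARIZE" in output else 0.5


# Baseline scores recorded from an earlier, accepted run (made-up values).
baseline_scores = {"summarize this text": 0.9, "translate this text": 0.8}

if __name__ == "__main__":
    print(find_regressions(fake_generate, fake_judge, baseline_scores, runs=3))
```

Comparing a mean over several runs against a baseline, rather than a single output against a fixed string, is one way to keep such a check stable despite non-deterministic outputs. The number of runs and the tolerance are tuning choices.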