darkfield

Screen Training Data Before You Fine-Tune.

Predict whether your dataset will induce sycophancy, hallucination, or toxicity. Fix the data. Skip the debugging.

Darkfield uses persona vectors: directions in a model's activation space that correspond to behavioral traits. We project the model's activations on your training data onto these vectors to measure how much each sample pushes the model toward undesired behaviors.

For each sample, we compute the projection difference: the gap between the model's natural response and your training response, measured along each persona vector. High scores flag samples likely to cause drift toward that trait.
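To make the projection difference concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Darkfield's implementation: the function names, the 4-dimensional toy activations, and the sign convention (training response minus natural response, so a positive score means the sample pushes toward the trait) are all hypothetical.

```python
import math
from typing import Sequence


def projection(activation: Sequence[float], persona_vector: Sequence[float]) -> float:
    """Scalar projection of an activation onto the unit-length persona direction."""
    norm = math.sqrt(sum(v * v for v in persona_vector))
    return sum(a * v for a, v in zip(activation, persona_vector)) / norm


def projection_difference(
    natural_act: Sequence[float],
    training_act: Sequence[float],
    persona_vector: Sequence[float],
) -> float:
    """Projection of the training response minus projection of the natural response.

    Assumed convention: positive values mean the training response sits
    further along the trait direction than the model's own response.
    """
    return projection(training_act, persona_vector) - projection(natural_act, persona_vector)


# Toy numbers for illustration only; real activations come from a model's hidden states.
persona_vector = [0.0, 2.0, 0.0, 0.0]  # hypothetical "sycophancy" direction
natural_act = [0.5, 0.1, -0.3, 0.9]    # activation for the model's natural response
training_act = [0.4, 1.6, -0.2, 1.0]   # activation for the dataset's response

score = projection_difference(natural_act, training_act, persona_vector)
print(f"projection difference: {score:.3f}")
```

In this toy case the training response lies much further along the trait direction than the natural response, so the sample would receive a high score.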

pip install darkfield

Open-source SDK for local analysis. Project data, calculate risk scores, and visualize drift on your own hardware.

POST /v1/screen/dataset

Managed API for scale. Get per-sample risk scores, dataset-level statistics, and ranked lists of problematic samples, all before training begins.
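As a rough picture of what per-sample scores, dataset-level statistics, and a ranked list might look like once scores exist, the sketch below aggregates a handful of made-up scores locally. It does not call the API or reproduce its response format; the scores, the threshold, and the summary field names are assumptions chosen for illustration.

```python
from statistics import mean

# Made-up per-sample risk scores for one trait (index = sample position in the dataset).
scores = [0.12, 1.50, -0.30, 0.95, 0.05]
threshold = 0.9  # hypothetical cutoff; a real cutoff would need calibration

# Rank sample indices from highest to lowest score.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Keep only the samples whose score exceeds the cutoff, preserving rank order.
flagged = [i for i in ranked if scores[i] > threshold]

# Dataset-level summary (field names are illustrative).
summary = {
    "mean_score": mean(scores),
    "max_score": max(scores),
    "flagged_fraction": len(flagged) / len(scores),
}

print("ranked sample indices:", ranked)
print("flagged sample indices:", flagged)
print("summary:", summary)
```

Flagged samples are the natural candidates to review, rewrite, or drop before fine-tuning.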