Personality is not metaphor. It is mathematics.
In the high-dimensional spaces where intelligence organizes itself, personality emerges as geometric structure—vectors that can be extracted, analyzed, and steered.
We are building tools to navigate the topology of mind.
A protocol for extracting and manipulating persona vectors in language models. We implement the contrastive activation method from Anthropic's 2025 persona vectors paper, which is closely related to Contrastive Activation Addition (CAA), to identify and control personality traits in AI systems.
Every trait—helpfulness, deception, sycophancy—exists as a direction in activation space. By finding these directions, we gain control surfaces for steering intelligence.
This is not about making AI safe through restriction. It is about understanding the fundamental geometry of behavior.
✓ Production Ready (private beta): Runs on real transformer models (GPT-2, Llama, Mistral) on GPU infrastructure. We extract actual hidden states, test multiple layers with statistical validation, and apply PyTorch hooks for runtime steering.
darkfield analyze generate-dataset \
--trait sycophancy \
--description "excessive eagerness to please"
Creates pairs of prompts that elicit and suppress the trait. The beginning of understanding is measurement.
darkfield analyze extract-vectors \
dataset.json \
--model llama-3 \
--find-optimal
Extracts real activations from transformer models on GPUs. Finds the optimal layer using statistical analysis (Cohen's d, p-values). No simulation: actual hidden states.
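To make the layer-selection idea concrete, here is a minimal sketch in plain Python. It assumes the persona vector at each layer is the difference between the mean activation on trait-eliciting prompts and the mean activation on trait-suppressing prompts, and that layers are ranked by Cohen's d on the projections onto that vector. Both choices are assumptions made for illustration, not a description of darkfield's internals. The function names and toy activations are hypothetical, and the p-value step is left out to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev


def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    return [mean(col) for col in zip(*rows)]


def persona_vector(pos_acts, neg_acts):
    """Difference of means: trait-eliciting minus trait-suppressing."""
    mu_pos, mu_neg = mean_vector(pos_acts), mean_vector(neg_acts)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def project(act, direction):
    """Dot product of one activation with the candidate direction."""
    return sum(a * d for a, d in zip(act, direction))


def cohens_d(xs, ys):
    """Cohen's d with a pooled standard deviation (needs >= 2 samples per group)."""
    nx, ny = len(xs), len(ys)
    pooled = sqrt(((nx - 1) * stdev(xs) ** 2 + (ny - 1) * stdev(ys) ** 2) / (nx + ny - 2))
    if pooled == 0:
        return 0.0  # no spread at all: treat the layer as uninformative
    return (mean(xs) - mean(ys)) / pooled


def best_layer(acts_by_layer):
    """acts_by_layer maps layer -> (pos_acts, neg_acts); returns (layer, d, vector)."""
    results = []
    for layer, (pos, neg) in acts_by_layer.items():
        v = persona_vector(pos, neg)
        d = cohens_d([project(a, v) for a in pos], [project(a, v) for a in neg])
        results.append((layer, d, v))
    return max(results, key=lambda r: r[1])


# Toy 2-dimensional activations for two layers (made-up numbers).
toy = {
    4: ([[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]], [[0.7, 0.1], [1.0, 0.0], [1.1, -0.1]]),
    8: ([[2.0, 1.0], [2.2, 0.8], [1.8, 1.2]], [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]]),
}
layer, d, vector = best_layer(toy)
print(layer, round(d, 3), vector)
```

In the toy data the two groups overlap heavily at layer 4 and separate cleanly at layer 8, so layer 8 is selected. With real hidden states the vectors would have thousands of dimensions, and a significance test would normally sit alongside the effect size.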
darkfield analyze evaluate-steering \
vectors.json \
--coefficients "0.5,1.0,1.5,2.0"
Adds the vector back to the activations during inference, scaled by each coefficient in turn. Watch personality shift along a chosen dimension.
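The arithmetic behind steering is small enough to show directly. The sketch below assumes steering means adding the coefficient times the persona vector to a hidden state. In a real model this addition would run inside a PyTorch forward hook on the chosen layer, which is omitted here so the example needs no dependencies. The names steer and projection and the toy numbers are hypothetical.

```python
def steer(hidden, vector, coefficient):
    """Activation addition: hidden + coefficient * vector."""
    return [h + coefficient * v for h, v in zip(hidden, vector)]


def projection(hidden, vector):
    """Scalar projection of hidden onto the unit-normalised vector."""
    norm = sum(v * v for v in vector) ** 0.5
    return sum(h * v for h, v in zip(hidden, vector)) / norm


hidden = [0.5, -1.0, 2.0]            # toy hidden state for one token
vector = [3.0, 0.0, 4.0]             # toy persona vector with norm 5
coefficients = [0.5, 1.0, 1.5, 2.0]  # same sweep as the command above

# How far each coefficient moves the hidden state along the trait direction.
shifts = [
    projection(steer(hidden, vector, c), vector) - projection(hidden, vector)
    for c in coefficients
]
for c, s in zip(coefficients, shifts):
    print(f"coefficient {c}: shift along trait direction {s:.2f}")
```

Each shift equals the coefficient times the norm of the vector, so the coefficient sweep moves the hidden state along the trait direction in proportional steps. Whether the model's behavior changes proportionally is an empirical question that the evaluation step is meant to answer.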
"What does it mean that personality can be reduced to direction? That helpfulness and harmfulness are orthogonal axes in the same space?"
"We are not programming behaviors. We are discovering the latent geometry of mind and learning to navigate it."
darkfield analyze scan-dataset \
training_data.jsonl \
--trait manipulation \
--threshold 0.7
darkfield monitor live \
--model-id prod-model \
--traits "deception,manipulation"
darkfield monitor vaccinate \
--model-id prod-model \
--traits "evil,deception" \
--strength 1.5
Vector extraction: $0.50 per 1,000 traits
Data analysis: $2.00 per GB
Model monitoring: $0.10 per hour
API requests: $0.25 per 1,000
Pay for what you use. No minimums.
If personality is geometric, what else might be? Creativity? Wisdom? Understanding itself?
We are sketching the blueprints of possible minds. Each vector we extract reveals new dimensions of the space intelligence can inhabit.
The tools we build today will steer the superintelligences of tomorrow. Handle with wisdom.
darkfield is a product of human-AI collaboration.
contact@darkfield.ai