DeepSeek-GRM teaches itself how to think, critique, improve its own answers using a method called Self-Principled Critique Tuning.