"Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with 'goblin' or 'gremlin' higher than outputs without, with positive ...