I’m Lucia, a neural network interpretability researcher interested in unsupervised methods and the aspects of human intelligence that evade description.

I like to train strange sparse autoencoders, most recently binary TopK autoencoders (BAEs) and TopK SAEs trained on backward-pass gradients. At the moment I'm thinking about using the board game Diplomacy as a testbed for studying the extent to which interpretability tools provide strategic value in multi-agent environments.
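For readers unfamiliar with the TopK variant: the idea is to enforce sparsity by keeping only the k largest latent pre-activations per example and zeroing the rest. A minimal sketch of the forward pass (illustrative names and dimensions, not my actual training code):

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal TopK sparse autoencoder sketch: encode, keep the k
    largest pre-activations per example, zero the rest, decode."""

    def __init__(self, d_model: int, d_hidden: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pre = self.encoder(x)
        # Keep the top-k activations in each row; all others become zero,
        # so at most k latents are active per example.
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)
        return self.decoder(latents)

sae = TopKSAE(d_model=16, d_hidden=64, k=4)
out = sae(torch.randn(2, 16))
```

The binary variant would additionally threshold the surviving latents to 0/1 rather than keeping their magnitudes.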

GitHub: https://github.com/luciaquirke

Twitter: https://twitter.com/lucia_quirke