lexical note hlt · lot-summer-2025

probing-llms-day2

Probing classifier (also known as a ‘probe’): a small supervised model trained on top of frozen LM representations to predict linguistic properties.

Questions probing tries to answer:

  • What do layers represent?
  • What do attention heads focus on? (Using BERTology visualization tools)
  • How is linguistic structure (syntax, morphology, semantics) captured in the model?

BERTology: the study of how models like BERT work internally (the same methods are also applied to GPT models)

  1. Take a latent representation (hidden state) from one layer of the frozen model
  2. Train a classifier on that representation to predict the linguistic property of interest
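
The two steps above can be sketched in a few lines. This is a minimal illustration, not a real probing setup: the vectors are synthetic stand-ins for hidden states from a frozen LM layer, the binary property and the function names (fake_layer_representations, train_probe, accuracy) are hypothetical, and the probe is a plain logistic regression trained with SGD.

```python
import math
import random


def fake_layer_representations(n, dim, rng):
    """Stand-in for step 1. In a real probe these vectors would be hidden
    states read from one layer of a frozen LM; here they are synthetic."""
    examples = []
    for _ in range(n):
        label = rng.randint(0, 1)  # hypothetical binary linguistic property
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += 3.0 if label == 1 else -3.0  # property is linearly encoded in dim 0
        examples.append((vec, label))
    return examples


def probe_logit(w, b, vec):
    """Linear score of the probe for one representation."""
    return sum(wi * xi for wi, xi in zip(w, vec)) + b


def train_probe(examples, dim, epochs=20, lr=0.1):
    """Step 2: fit a logistic-regression probe with plain SGD.
    Only the probe's weights change; the representations stay fixed."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for vec, label in examples:
            z = max(-30.0, min(30.0, probe_logit(w, b, vec)))  # clamp to avoid overflow
            grad = 1.0 / (1.0 + math.exp(-z)) - label
            w = [wi - lr * grad * xi for wi, xi in zip(w, vec)]
            b -= lr * grad
    return w, b


def accuracy(w, b, examples):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum((probe_logit(w, b, vec) > 0) == (label == 1) for vec, label in examples)
    return correct / len(examples)


rng = random.Random(0)
train = fake_layer_representations(400, 8, rng)
test = fake_layer_representations(200, 8, rng)
w, b = train_probe(train, dim=8)
print(f"held-out probe accuracy: {accuracy(w, b, test):.2f}")
```

With a real model, only fake_layer_representations would change: it would return the layer's activations for labeled tokens, and the probe would be trained once per layer to compare what each layer encodes.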

Pitfalls:

  • Probing performance may say more about the probe than the representation
    • Include controls (e.g. a control task with randomly assigned labels) to test whether the probe is just memorizing
  • Probe performance depends not just on the model architecture, but also on the dataset the model was trained on
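
One way to run such a control is sketched below, under stated assumptions. Each hypothetical word type gets a random label (a control task), and a deliberately high-capacity probe (1-nearest-neighbour) is scored on both the real task and the control. The difference between the two accuracies is often called selectivity in the probing literature. All data here is synthetic and the function names are illustrative only.

```python
import random


def make_word_types(n_types, dim, rng):
    """Hypothetical word types, each with a fixed synthetic 'representation'."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_types)]


def sample_tokens(types, labels, per_type, rng, noise=0.05):
    """Tokens are noisy copies of their type vector and inherit the type's label."""
    tokens = []
    for base, label in zip(types, labels):
        for _ in range(per_type):
            tokens.append(([x + rng.gauss(0.0, noise) for x in base], label))
    return tokens


def nearest_neighbour_probe(train, vec):
    """High-capacity probe: return the label of the closest training token."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], vec))[1]


def probe_accuracy(train, test):
    """Held-out accuracy of the nearest-neighbour probe."""
    hits = sum(nearest_neighbour_probe(train, vec) == label for vec, label in test)
    return hits / len(test)


def selectivity(types, task_labels, rng):
    """Return (task accuracy, control accuracy, task minus control)."""
    control_labels = [rng.randint(0, 1) for _ in types]  # random label per word type
    accs = {}
    for name, labels in (("task", task_labels), ("control", control_labels)):
        train = sample_tokens(types, labels, 4, rng)
        test = sample_tokens(types, labels, 2, rng)
        accs[name] = probe_accuracy(train, test)
    return accs["task"], accs["control"], accs["task"] - accs["control"]


rng = random.Random(0)
types = make_word_types(100, 8, rng)
task_labels = [1 if base[0] > 0 else 0 for base in types]  # property encoded in dim 0
task_acc, control_acc, sel = selectivity(types, task_labels, rng)
print(f"task={task_acc:.2f} control={control_acc:.2f} selectivity={sel:.2f}")
```

If the probe scores about as well on random labels as on the real task, its accuracy mostly reflects its own capacity to memorize word identity, not what the representation encodes. A lower-capacity probe, such as a linear one, would be expected to score worse on the control.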