probing-llms-day2
Probing classifier (also called a ‘probe’): a small supervised model trained on top of frozen LM representations to predict linguistic properties.
- What do layers represent?
- What do attention heads focus on? (Using BERTology visualization tools)
- How is linguistic structure (syntax, morphology, semantics) captured in the model?
BERTology: the study of how models like BERT work internally (the same methods are also applied to GPT models)
- Take a latent representation (hidden state) from one layer of the frozen model
- Train a classifier with that representation as input and the linguistic property as the target
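The two steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the "hidden states" are synthetic random vectors standing in for representations extracted from a frozen model, the binary property is a made-up one that is linearly encoded in them, and the names `train_linear_probe` and `probe_accuracy` are hypothetical. With a real model, the features would instead come from one layer's hidden states.

```python
import numpy as np

def train_linear_probe(features, labels, lr=0.5, epochs=300):
    """Fit a logistic-regression probe on fixed features by gradient descent."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - labels  # per-example gradient of cross-entropy w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(features, labels, w, b):
    """Fraction of examples where the probe's prediction matches the label."""
    preds = (features @ w + b) > 0
    return float((preds == (labels > 0.5)).mean())

# ASSUMPTION: synthetic stand-in for frozen hidden states (one row per token).
rng = np.random.default_rng(0)
n_tokens, hidden_dim = 400, 16
hidden_states = rng.normal(size=(n_tokens, hidden_dim))

# ASSUMPTION: a made-up binary property that is linearly encoded in the features.
labels = (hidden_states[:, 0] + 0.5 * hidden_states[:, 1] > 0).astype(float)

# The representations are never updated; only the probe's w and b are trained.
train_X, test_X = hidden_states[:300], hidden_states[300:]
train_y, test_y = labels[:300], labels[300:]
w, b = train_linear_probe(train_X, train_y)
test_acc = probe_accuracy(test_X, test_y, w, b)
print(f"held-out probe accuracy: {test_acc:.2f}")
```

Accuracy is measured on held-out tokens, so a high score reflects information that is readable from the representation rather than a fit to the training examples alone.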
Pitfalls:
- Probing performance may say more about the probe than the representation
- Include controls (e.g. the same probe trained on randomly assigned labels) to test whether the probe is just memorizing
- Performance depends not just on the model architecture, but also on the dataset the model was trained on
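One way to set up such a control, sketched under stated assumptions: train the same probe twice, once on the real labels and once on labels assigned at random per word type, and compare held-out accuracy. The gap between the two is often called selectivity. Everything below is synthetic and hypothetical (the word-type vectors, the "real" property, and the helper names `fit_probe`, `accuracy`, `run_probe`); it only illustrates the shape of the comparison.

```python
import numpy as np

def fit_probe(X, y, lr=0.5, epochs=300):
    """Logistic-regression probe trained by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def accuracy(X, y, w, b):
    return float((((X @ w + b) > 0) == (y > 0.5)).mean())

# ASSUMPTION: synthetic "representations", one noisy vector per token of each word type.
rng = np.random.default_rng(0)
n_types, dim, tokens_per_type = 50, 8, 20
type_vecs = rng.normal(size=(n_types, dim))
type_ids = np.repeat(np.arange(n_types), tokens_per_type)
X = type_vecs[type_ids] + 0.1 * rng.normal(size=(len(type_ids), dim))

# Real task (hypothetical): a property linearly encoded in the representation.
real_by_type = (type_vecs[:, 0] > 0).astype(float)
# Control task: each word type gets a random label unrelated to the representation,
# so the probe can only do well by memorizing which type maps to which label.
control_by_type = rng.integers(0, 2, size=n_types).astype(float)

# Random token-level split: every word type can appear in both train and test.
order = rng.permutation(len(type_ids))
train, test = order[:800], order[800:]

def run_probe(labels_by_type):
    y = labels_by_type[type_ids]
    w, b = fit_probe(X[train], y[train])
    return accuracy(X[test], y[test], w, b)

real_acc = run_probe(real_by_type)
control_acc = run_probe(control_by_type)
selectivity = real_acc - control_acc
print(f"real={real_acc:.2f} control={control_acc:.2f} selectivity={selectivity:.2f}")
```

If the control accuracy comes close to the real-task accuracy, the probe is expressive enough to memorize, and its score on the real task says little about the representation itself.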