What is pretraining and why is it critical?

"Pretraining teaches models the basics before specialization." - Jacob Devlin

How It Works:

Pretraining exposes a model to vast amounts of unlabeled data, from which it learns general patterns, such as word co-occurrence statistics or low-level visual features, that form the foundation for later fine-tuning on specific tasks.
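
As a rough illustration, here is a minimal PyTorch sketch of that two-stage pattern: a toy encoder is pretrained with masked-token prediction on unlabeled sequences, then reused for a small labeled classification task. The names and sizes (TinyEncoder, the 15% mask rate, the random tensors standing in for data) are hypothetical simplifications, not the recipe of any specific model.

```python
import torch
import torch.nn as nn

# Hypothetical toy encoder; real systems use much larger Transformer stacks.
class TinyEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, tokens):
        return self.layer(self.embed(tokens))

vocab_size, mask_id = 1000, 0
encoder = TinyEncoder(vocab_size)
mlm_head = nn.Linear(64, vocab_size)            # predicts the original token at masked positions

# --- Pretraining: masked-token prediction on unlabeled sequences ---
tokens = torch.randint(1, vocab_size, (8, 16))  # stand-in for unlabeled text
masked = tokens.clone()
mask = torch.rand(masked.shape) < 0.15          # hide ~15% of positions
masked[mask] = mask_id

logits = mlm_head(encoder(masked))
pretrain_loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
pretrain_loss.backward()                        # one illustrative gradient step

# --- Fine-tuning: reuse the pretrained encoder on a small labeled task ---
classifier = nn.Linear(64, 2)                   # e.g. two sentiment classes
features = encoder(tokens).mean(dim=1)          # pooled sequence representation
finetune_loss = nn.functional.cross_entropy(classifier(features), torch.randint(0, 2, (8,)))
```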

Key Benefits:

  • Data efficiency: downstream tasks can be fine-tuned with far fewer labeled examples.
  • Robustness: the model acquires broad language or vision priors that transfer across tasks.
  • Speed: downstream training converges faster from a pretrained starting point.

Real-World Use Cases:

  • BERT: pretrained on English Wikipedia and BooksCorpus with masked language modeling before fine-tuning on NLP tasks (see the sketch after this list).
  • Vision Transformers (ViT): pretrained on large image datasets such as ImageNet-21k before fine-tuning on downstream image tasks.
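
In practice, reusing a pretrained model usually means downloading released weights rather than pretraining from scratch. The sketch below assumes the Hugging Face transformers library is installed; it loads BERT's publicly released pretrained checkpoint and attaches a fresh two-class head for fine-tuning. The example sentences and labels are made up for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Weights come from BERT's Wikipedia/BooksCorpus pretraining;
# the classification head on top is freshly initialized.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)  # returns loss and logits
outputs.loss.backward()                  # one illustrative fine-tuning step
```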

FAQs

Can I skip pretraining?
You can train from scratch, but reaching comparable accuracy typically requires far more labeled data and compute; in most cases it is cheaper to start from publicly available pretrained weights.

How long does it take?
It depends on model and data scale: small models can be pretrained in hours to days on a single GPU, while large language and vision models take days to weeks on large accelerator clusters.