From Teacher to Student: Model Distillation for Cost-Effective LLM Deployment
It’s no surprise that AI models are getting bigger, trained on ever more data with ever more parameters. For example, while OpenAI’s GPT-3.5 has 175 billion parameters and was trained on over 570 GB of data from various sources, GPT-4 is widely estimated to have close to 1 trillion parameters and to have been trained on terabytes of data. Model distillation offers a way to capture much of a large model’s capability in a smaller, cheaper student model: the student is trained to imitate the teacher’s outputs rather than only the raw labels.
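To make the teacher-to-student idea concrete, here is a minimal sketch of the classic distillation loss (Hinton et al., 2015) in PyTorch. It blends a soft-label term, which matches the student’s temperature-scaled output distribution to the teacher’s, with a standard hard-label cross-entropy term. The function name and the `temperature` and `alpha` hyperparameters are illustrative choices, not values from any particular deployment:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label KL term against the teacher with hard-label CE.

    `temperature` softens both distributions so the teacher's relative
    preferences among classes are visible; `alpha` balances the two terms.
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    # The T^2 factor rescales gradients to stay comparable with the
    # hard-label term (per Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    return alpha * soft + (1 - alpha) * hard
```

In practice the teacher runs in inference mode (no gradients) to produce `teacher_logits`, and only the smaller student is updated, which is what makes the resulting model cheap to deploy.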