It's a journey through 3 key phases:
1️⃣ Self-Supervised Learning (Understanding Language)
The model is trained on massive text datasets (Wikipedia, blogs, websites). This is where the transformer architecture comes into the picture, which you can simply think of as a neural network that sees words and predicts what comes next.
For example:
"A flash flood watch will be in effect all _____."
The model ranks possible answers like "night," "day," or even "giraffe." Over time, it gets really good at picking the right one.
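That ranking step can be sketched in a few lines. The scores (logits) below are made up for illustration; a real model computes them with its neural network, then turns them into probabilities with a softmax:

```python
import math

# Hypothetical scores the network might assign to each candidate word for
# "A flash flood watch will be in effect all _____." (numbers are invented)
logits = {"night": 5.1, "day": 4.7, "giraffe": -2.0}

# Softmax turns raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

# The model "ranks" candidates by probability; training nudges the scores
# so that plausible continuations like "night" come out on top.
ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])  # -> night
```

Training adjusts the weights so the probability of the correct next word goes up, one example at a time.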
2️⃣ Supervised Learning (Understanding Instructions)
Next, we teach it how humans like their answers. Thousands of examples of questions and well-crafted responses are fed to the model. This step is smaller but crucial: it's where the model learns to align with human intent.
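A single training example in this phase is just a prompt paired with a good response. The example below is entirely made up, but it shows the shape of the data:

```python
# A hypothetical training example for the supervised phase:
# a human-written prompt paired with a well-crafted response.
example = {
    "prompt": "Explain what a flash flood watch means.",
    "response": "A flash flood watch means conditions are favorable "
                "for flash flooding in your area; stay alert.",
}

# During fine-tuning, the model is trained to continue the prompt with the
# given response, token by token -- the same objective as pre-training,
# but on curated question/answer pairs instead of raw web text.
training_text = example["prompt"] + "\n" + example["response"]
print(training_text)
```

The key point: the learning mechanism doesn't change, only the data does.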
3️⃣ Reinforcement Learning (Improving Behavior)
Finally, the model learns to improve its behavior based on feedback. Humans rate its answers (thumbs up or thumbs down), and the model adjusts.
This helps it avoid harmful or wrong answers and focus on being helpful, honest, and safe.
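Here's a toy sketch of that feedback loop, not real RLHF (which trains a separate reward model), just the intuition that thumbs up (+1) and thumbs down (-1) nudge the model's preferences:

```python
# Toy feedback loop: each answer style gets a thumbs up (+1) or
# thumbs down (-1), and its preference score moves accordingly.
feedback = [
    ("helpful answer", +1),
    ("harmful answer", -1),
    ("helpful answer", +1),
]

scores = {}
learning_rate = 0.1
for answer, rating in feedback:
    # Nudge the score for this kind of answer in the rated direction.
    scores[answer] = scores.get(answer, 0.0) + learning_rate * rating

# Answers that earn thumbs up accumulate higher scores, steering the
# model toward producing more of them.
print(scores)
```

In the real pipeline, those ratings train a reward model that then guides the language model's updates, but the direction of the nudge is the same idea.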
Through this process, the model learns patterns and relationships in language, which are stored as numerical weights. These weights are then compressed into the parameter file, the core of what makes the model function.
⚡️ So what happens when you ask a question?
The model breaks your question into tokens (small pieces of text, turned into numbers). It processes these numbers through its neural networks and predicts the most likely response.
For example:
"What should I eat today?" might turn into numbers like [123, 11, 45, 78], which the model uses to calculate the next best words to give you the answer.
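Tokenization is just a lookup from text pieces to IDs. Here's a minimal stand-in with a made-up vocabulary (real tokenizers use learned subword pieces, not whole words, and the IDs below are invented):

```python
# Made-up vocabulary mapping words to IDs. Real tokenizers (BPE and
# friends) split text into subword pieces and have vocabularies of
# tens of thousands of entries.
vocab = {"what": 123, "should": 11, "i": 45, "eat": 78, "today": 902, "?": 7}

def tokenize(text: str) -> list[int]:
    # Lowercase, split the question mark off, look up each piece.
    words = text.lower().replace("?", " ?").split()
    return [vocab[w] for w in words]

print(tokenize("What should I eat today?"))
# -> [123, 11, 45, 78, 902, 7]
```

The model never sees the text itself, only these numbers.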
⚡️ But here's something important: every model has a token limit, a maximum number of tokens it can handle at once. This varies between smaller and larger models. Once it reaches that limit, it forgets the earlier context and focuses only on the most recent tokens.
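That "forgetting" behaves like a sliding window over the token stream. A sketch (the limit of 8 is made up; real models handle thousands to millions of tokens):

```python
# Sliding context window: once the limit is hit, only the most
# recent tokens survive. TOKEN_LIMIT of 8 is illustrative only.
TOKEN_LIMIT = 8

def fit_to_context(tokens: list[int], limit: int = TOKEN_LIMIT) -> list[int]:
    # Drop the oldest tokens; keep the newest ones.
    return tokens[-limit:]

history = list(range(12))          # 12 tokens of conversation so far
print(fit_to_context(history))     # -> [4, 5, 6, 7, 8, 9, 10, 11]
```

This is why long conversations can drift: the earliest messages literally fall out of the window.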
Finally, you can imagine an LLM as just two files:
➡️ Parameter file: This is the big file, where all the knowledge lives. Think of it like a giant zip file containing everything the model has learned about language.
➡️ Run file: This is the set of instructions needed to use the parameter file. It defines the model's architecture, handles text tokenization, and manages how the model generates outputs.
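A toy version of that two-file picture, purely illustrative (real parameter files are binary blobs of billions of numbers, and the run file implements a full transformer forward pass):

```python
import json
import os
import tempfile

# "Parameter file": just learned numbers, no logic. The values here
# are invented stand-ins for real model weights.
params = {"weights": [0.12, -0.5, 0.33]}
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w") as f:
    json.dump(params, f)

# "Run file" logic: load the parameters and use them to generate.
with open(path) as f:
    loaded = json.load(f)

def generate(token_id: int) -> float:
    # A real run file would compute a transformer forward pass;
    # here we just index into the stored weights.
    return loaded["weights"][token_id % len(loaded["weights"])]

print(generate(4))  # -> -0.5
```

The separation matters: the same small run file can drive parameter files of very different sizes.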
Thatโs a very simple way to break down how LLMs work!
These models are the backbone of AI agents, so let's not forget about them 🚀