<aside> 💡

Query no. 1: What does "semantic meaning" actually mean in the context of LLMs?

What I understood: it captures the relationships between two or more words.

For example, suppose a sentence mentions both "dogs" and "puppies". We need a way to tell us that these two words are closely related to each other. That is what vector embeddings do: words with similar meanings are mapped to nearby vectors.

</aside>
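The "closely linked" idea above can be made concrete with cosine similarity between embedding vectors. A minimal sketch, assuming tiny 4-dimensional toy embeddings invented for illustration (real models learn vectors with hundreds of dimensions):

```python
import numpy as np

# Toy embeddings, made up for illustration only.
embeddings = {
    "dog":   np.array([0.8, 0.1, 0.9, 0.2]),
    "puppy": np.array([0.7, 0.2, 0.8, 0.3]),
    "car":   np.array([0.1, 0.9, 0.0, 0.8]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_dog_puppy = cosine_similarity(embeddings["dog"], embeddings["puppy"])
sim_dog_car = cosine_similarity(embeddings["dog"], embeddings["car"])

# Related words end up with a higher similarity score.
print(sim_dog_puppy > sim_dog_car)  # True
```

With learned embeddings the same computation is what lets a model treat "dog" and "puppy" as near-synonyms while keeping "car" far away.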

<aside> 💡

How are the emergent abilities of large language models developed?

</aside>

<aside> 💡

What are emergent abilities of Large Language Models?

What I understood:

</aside>

Pretraining LLMs vs Finetuning LLMs

Pretraining an LLM simply means training the model on a large, diverse dataset.

Finetuning an LLM simply means refining the model by training it on a narrower dataset that is specific to a particular task or domain.
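The key difference between the two stages shows up in the data itself: pretraining needs no labels, because raw text provides its own targets. A minimal sketch (the sentence is invented for illustration):

```python
# Pretraining is self-supervised: the model learns to predict the next
# word, so every position in raw text is automatically a training example.
raw_text = "the quick brown fox jumps"
tokens = raw_text.split()

# Each training pair: (context so far, next word to predict).
pretraining_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

print(pretraining_pairs[0])  # (['the'], 'quick')
```

Finetuning, by contrast, requires someone to supply explicit labels or answers, which is why finetuning datasets are much smaller and narrower.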


Steps for building an LLM:

  1. Train on a large corpus of text data (raw text).

    1. [Raw text = regular text without any labelling information]
  2. The first training stage of an LLM is also called pretraining.

    1. [This creates an initial pretrained LLM (a base/foundation model)]
    2. Example: GPT-3 is a pretrained model capable of text completion.
  3. After obtaining the pretrained LLM, we can further train it on labelled data.

  4. There are 2 popular categories of finetuning:

    1. Instruction finetuning
      1. The labelled dataset consists of instruction-answer pairs - e.g. text translation, airline customer support
    2. Finetuning for classification tasks
      1. The labelled dataset consists of texts and associated labels - e.g. emails -> spam vs. not spam
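The two finetuning categories above differ only in the shape of the labelled data. A minimal sketch, with dataset entries invented for illustration:

```python
# Instruction finetuning: each example pairs an instruction with the
# desired answer (values here are made up).
instruction_example = {
    "instruction": "Translate to German: 'Good morning'",
    "response": "Guten Morgen",
}

# Classification finetuning: each example pairs a text with a class label.
classification_example = {
    "text": "Congratulations, you won a free prize!",
    "label": "spam",
}
```

In both cases the starting point is the same pretrained base model; only the labelled dataset (and the output the model is trained to produce) changes.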

Most modern LLMs rely on the original Transformer architecture. Although that architecture was originally developed for machine translation, it was later found to work well for many other purposes too.