LLM

Made with Stable Diffusion, 2024 by Leapfrog.cl

ChatGPT is possibly the most famous LLM (large language model). It is based on GPT foundation models (GPT-3.5, GPT-4) that were fine-tuned for conversational use.[1] GPT stands for Generative Pre-trained Transformer, a class of natural language processing models developed by OpenAI and designed to understand and generate human-like text. GPT models are pre-trained on huge datasets; the "pre-training phase involves learning the structure and nuances of language, including grammar, semantics, and context."[2]
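
To make this concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small, publicly available GPT-2 checkpoint (neither is named in this article; they are just convenient stand-ins), of a pre-trained GPT-style model generating human-like text from a prompt:

    from transformers import pipeline

    # Load a small, publicly available GPT-style model for text generation.
    generator = pipeline("text-generation", model="gpt2")

    # The pre-trained model continues the prompt one token at a time.
    result = generator("Large language models are", max_new_tokens=30)
    print(result[0]["generated_text"])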

What is an LLM?

LLM is a general term for a range of large-scale language models designed for natural language processing tasks; GPT models are a subset. LLMs are not limited to a single architecture such as the Transformer used in GPT models: they can be built on various architectures, including recurrent neural networks (RNNs) and convolutional neural networks (CNNs). LLMs are considered a form of generative AI and are very large deep learning models that are pre-trained and can then be fine-tuned for specific tasks or domains. This fine-tuning process "tailors the model's capabilities to particular applications, such as language translation, text completion, or question-answering".[2][3]
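
To illustrate such fine-tuned, task-specific models, here is a minimal sketch, again assuming the Hugging Face transformers library and two publicly available fine-tuned checkpoints (t5-small for translation and a DistilBERT model fine-tuned on SQuAD for question answering), neither of which is mentioned in the article:

    from transformers import pipeline

    # A model fine-tuned for English-to-German translation.
    translator = pipeline("translation_en_to_de", model="t5-small")
    print(translator("Large language models can translate text.")[0]["translation_text"])

    # A model fine-tuned for extractive question answering.
    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
    answer = qa(question="What are LLMs pre-trained on?",
                context="LLMs are very large deep learning models pre-trained on huge datasets.")
    print(answer["answer"])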

LLMs can be pre-trained and then fine-tuned for specific purposes. "Pre-training and fine-tuning are key steps in developing large language models. Pre-training involves training a large language model for general purposes with a large data set, while fine-tuning involves training the model for specific aims with a much smaller data set."[2]
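
The split can also be sketched in code. The example below is a simplified illustration rather than a real training recipe: pre-training is assumed to have already happened (we only download a checkpoint trained on a large corpus), and fine-tuning then runs on a tiny, invented labelled data set for sentiment classification:

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Pre-training already happened elsewhere: download the general-purpose checkpoint.
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    # Fine-tuning: a much smaller, task-specific data set (hypothetical examples).
    examples = [("great product, works well", 1), ("broke after one day", 0)]
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    model.train()
    for text, label in examples:
        batch = tokenizer(text, return_tensors="pt")
        outputs = model(**batch, labels=torch.tensor([label]))
        outputs.loss.backward()   # gradients come only from the small task data set
        optimizer.step()
        optimizer.zero_grad()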

There are three types of LLMs:

1. Generic (or raw) language models, which predict the next token (word), like an autocomplete in a search box.
2. Instruction-tuned models, trained to predict a response to the instructions given in the input.
3. Dialog-tuned models, trained to hold a dialog by predicting the next response.

Each type calls for a different prompt design to perform well. "Chain of thought reasoning" is one method for improving answers: "models are better at getting the right answer when they first output text that explains the reason for the answer."[2]
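
As a hypothetical illustration of chain-of-thought prompting (the questions and wording below are invented for this glossary entry), compare a direct prompt with one that shows the model a worked, reasoned answer first:

    # Direct prompt: the model is asked only for the final answer.
    direct_prompt = (
        "Q: A bus holds 40 people. How many people fit in 3 buses?\n"
        "A:"
    )

    # Chain-of-thought prompt: a worked example spells out the reasoning first,
    # which tends to lead the model to more accurate final answers.
    cot_prompt = (
        "Q: A shop sells pens in packs of 12. How many pens are in 4 packs?\n"
        "A: Each pack has 12 pens and there are 4 packs, so 12 * 4 = 48. "
        "The answer is 48.\n\n"
        "Q: A bus holds 40 people. How many people fit in 3 buses?\n"
        "A:"
    )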

Prompt design & engineering

"Prompt design involves creating a clear, concise, and informative prompt for the desired task, while prompt engineering focuses on improving performance. This may involve using domain-specific knowledge, providing examples of the desired output, or using keywords that are known to be effective for the specific system"[3], as well as adjusting the system's parameters to improve performance. In short, it is the task of developing prompts that guide a model to perform a specialized task: a process of structuring the input so that the response is accurate and effective.

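A hypothetical example (the review texts and labels are invented) of moving from a plainly designed prompt to an engineered one that fixes the output format and provides examples of the desired output:

    # Prompt design: a clear, concise description of the task.
    basic_prompt = (
        'Classify the sentiment of this review: "The battery dies within an hour."'
    )

    # Prompt engineering: the same task, with an explicit output format and
    # a few examples of the desired output to steer the model.
    engineered_prompt = (
        "You are a product-review classifier. "
        "Answer with exactly one word: positive or negative.\n\n"
        'Review: "Arrived quickly and works perfectly."\n'
        "Sentiment: positive\n\n"
        'Review: "Stopped charging after two days."\n'
        "Sentiment: negative\n\n"
        'Review: "The battery dies within an hour."\n'
        "Sentiment:"
    )
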
[1] "ChatGPT", Wikipedia.

[2] Recommended video: "Introduction to large language models", Google Cloud Tech.

[3] "Understanding the Difference Between GPT and LLM", blog.stackademic.com.
 

 

Recommended video(s)

  • "Introduction to large language models", Google Cloud Tech

Further reading(s)

  • Attention Is All You Need
  • Artificial intelligence the future of content management and the web, Dries Buy…

