
LLM Architectures

LLM architectures can be broadly categorized into encoders, decoders, and encoder-decoder models. Each type serves different purposes in natural language processing.

Key Characteristics

  • Probabilistic Nature: An LLM assigns a probability to every candidate next token given the context, then generates text by either greedily picking the most probable token or sampling from that distribution (see the sketch after this list).
  • Scale: The size of an LLM, defined by the number of parameters, impacts its ability to understand and generate language.
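
To make the probabilistic nature concrete, the sketch below asks a small causal language model for its next-token distribution. This is a minimal sketch, assuming the Hugging Face transformers library and PyTorch; GPT-2 is used only as a lightweight stand-in, and the same pattern applies to any decoder model.

    # Minimal sketch (assumes: pip install transformers torch).
    # GPT-2 is a small stand-in for any decoder-only LLM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

    # Softmax over the final position's logits yields a probability
    # distribution over every candidate next token.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(idx.item())!r:>12}  p={p.item():.3f}")

Greedy decoding would always take the top entry; sampling strategies instead draw from this distribution, which is what makes generation non-deterministic.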

Encoders

Encoders transform a sequence of tokens into a vector representation (an embedding) that captures the semantic meaning of the text. These models are essential for tasks like text embedding, semantic search, and classification; a short embedding sketch follows the list below.

  • MiniLM: A lightweight model designed for efficiency.
  • Embed-light: Optimized for creating embeddings with reduced computational overhead.
  • BERT (Bidirectional Encoder Representations from Transformers): Captures context from both directions to improve understanding.
  • RoBERTa (Robustly Optimized BERT Pretraining Approach): Enhances BERT by optimizing training techniques.
  • DistilBERT: A smaller, faster distilled version of BERT with comparable performance.
  • SBERT (Sentence-BERT): Tailored for generating sentence embeddings.
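
As an illustration of the embedding use case, here is a minimal sketch, assuming the sentence-transformers library with the all-MiniLM-L6-v2 checkpoint (any SBERT-style encoder above would serve), that embeds two sentences and compares them.

    # Minimal sketch (assumes: pip install sentence-transformers).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    sentences = ["The cat sat on the mat.", "A feline rested on the rug."]
    embeddings = model.encode(sentences)  # one 384-dim vector per sentence

    # Cosine similarity close to 1 indicates semantically similar text,
    # even though the two sentences share almost no words.
    score = util.cos_sim(embeddings[0], embeddings[1])
    print(f"cosine similarity: {score.item():.3f}")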

Decoders

Decoders generate text autoregressively, predicting one token at a time conditioned on the input sequence and everything generated so far. They are integral to tasks such as text generation and completion; a generation sketch follows the list below.

  • GPT-4: A state-of-the-art model known for its advanced text generation capabilities.
  • Llama: A model focusing on efficiency and scalability.
  • BLOOM: Designed for diverse text generation tasks.
  • Falcon: A model optimized for generating coherent and contextually relevant text.
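
To show the autoregressive loop in action, here is a minimal sketch, again assuming the transformers library; GPT-2 stands in for the larger decoder models listed above.

    # Minimal sketch (assumes: pip install transformers torch).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("Once upon a time", return_tensors="pt")

    # generate() repeatedly appends the chosen next token to the context,
    # which is exactly the decoder's next-token prediction loop.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,  # sample from the distribution instead of argmax
        top_p=0.9,       # nucleus sampling
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))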

Encoder-Decoder Models

These models combine the two components: an encoder first maps the input into a representation, and a decoder then generates output text conditioned on that representation. A short T5 sketch follows the list below.

  • T5 (Text-To-Text Transfer Transformer): Converts various NLP tasks into a text-to-text format.
  • UL2 (Unifying Language Learning Paradigms): A model designed for general language understanding and generation.
  • BART (Bidirectional and Auto-Regressive Transformers): Combines the benefits of BERT and GPT for enhanced text generation.
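
As a concrete encoder-decoder example, T5 selects a task purely through a text prefix. A minimal sketch, assuming the transformers library and the t5-small checkpoint:

    # Minimal sketch (assumes: pip install transformers torch sentencepiece).
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    # T5 casts every task as text-to-text; the prefix selects the task.
    text = "translate English to German: The house is wonderful."
    inputs = tokenizer(text, return_tensors="pt")

    # The encoder reads the whole input first; the decoder then generates
    # the output sequence conditioned on that encoded representation.
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))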

Historical Tasks and Model Types

Each architecture has historically been suited to different tasks, summarized in the table below:

Task                        Encoders   Decoders   Encoder-Decoder
Embedding text              Yes        No         No
Abstractive QA              No         Yes        Yes
Extractive QA               Yes        Maybe      Yes
Translation                 No         Maybe      Yes
Creative writing            No         Yes        No
Abstractive Summarization   No         Yes        Yes
Extractive Summarization    Yes        Maybe      Yes
Chat                        No         Yes        No
Forecasting                 No         No         No
Code                        No         Yes        Yes

This categorization helps in selecting the appropriate model for specific NLP tasks.
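
For programmatic selection, the matrix above can be encoded as a plain lookup table. The sketch below is illustrative only: the data comes from the table, while the helper name suitable_architectures is hypothetical.

    # The task/architecture matrix as a lookup table (data from the table
    # above; the helper name is purely illustrative).
    SUITABILITY = {
        "embedding text":            ("Yes", "No",    "No"),
        "abstractive qa":            ("No",  "Yes",   "Yes"),
        "extractive qa":             ("Yes", "Maybe", "Yes"),
        "translation":               ("No",  "Maybe", "Yes"),
        "creative writing":          ("No",  "Yes",   "No"),
        "abstractive summarization": ("No",  "Yes",   "Yes"),
        "extractive summarization":  ("Yes", "Maybe", "Yes"),
        "chat":                      ("No",  "Yes",   "No"),
        "forecasting":               ("No",  "No",    "No"),
        "code":                      ("No",  "Yes",   "Yes"),
    }
    ARCHITECTURES = ("encoder", "decoder", "encoder-decoder")

    def suitable_architectures(task: str) -> list[str]:
        """Return architectures rated Yes or Maybe for the given task."""
        ratings = SUITABILITY[task.lower()]
        return [arch for arch, r in zip(ARCHITECTURES, ratings) if r != "No"]

    print(suitable_architectures("Translation"))  # ['decoder', 'encoder-decoder']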