LLM Vocabulary & Terms

Core Concepts

Foundation Model

LLM designed to generate and understand human-like text across a wide range of use-cases.

Transformer

A popular LLM design known for its attention mechanism and parallel processing abilities.

Prompting

Providing carefully crafted inputs to an LLM to generate desired outputs.

Context-Length

Maximum number of input words/tokens an LLM can consider when generating an output.

Few-Shot Learning

Providing very few examples to an LLM to assist it in performing a specific task.

Zero-Shot Learning

Providing only task instructions to the LLM, relying solely on its pre-existing knowledge.

RAG Components

RAG (Retrieval-Augmented Generation)

Appending retrieved information to improve LLM response.

Knowledge Base (KB)

Collection of documents from which relevant information is retrieved in RAG.

Vector Database

Stores vector representations of the KB, aiding the retrieval of relevant information in RAG.

Chunking

Breaking the KB into smaller pieces for efficient storage and retrieval during RAG.

Indexing

Organizing and storing KB chunks in a structured manner for efficient retrieval.

Embedding Model

An LLM that converts KB text chunks into numerical format called vectors/embeddings.

Finding the most relevant KB chunks based on vector similarity scores for a given input query.

Retrieval

Approach used to rank and fetch KB chunks from the vector search. This will serve as additional context for the LLM.

Agent Concepts

AGI (Artificial General Intelligence)

Artificial General Intelligence aims to create machines that can learn and reason like humans across various tasks.

LLM Agent

LLM applications that can execute complex tasks by combining LLMs with modules like planning and memory.

Agent Memory

A module that stores the agent’s past experiences and interactions with the user and environment.

Agent Planning

Module that divides the agent’s tasks into smaller steps to address the user’s request efficiently.

Function Calling

Ability of LLM agents to request information from external tools and APIs in order to execute a task.

Ethics & Governance

LLM Bias

Systematic prejudices in the LLM’s predictions, often stemming from training data.

XAI

Explainable AI. Making the model’s outputs understandable and transparent to humans.

Responsible AI

Ensuring ethical, fair, and transparent development and use of AI systems.

AI Governance

Legal policies & frameworks that regulate the development & deployment of AI systems.

Compliance

Ensuring adherence to legal requirements in the development & deployment of AI systems.

GDPR

General Data Protection Regulation protecting individuals’ privacy rights and governing data handling in the EU.

Alignment

Ensuring that the outputs of LLMs are consistent with human values and intentions.

Model Ethics

Ensuring ethical behavior (transparency, fairness, accountability etc.) when deploying public-facing AI.

PII

Personally Identifiable Information. Should not be stored or used without proper processes and user consent.

LLMOps

Managing and optimizing operations for LLM deployment and maintenance.

Privacy-preserving AI

Methods to train and use LLMs while safeguarding sensitive data privacy.

Adversarial Defense

Methods to prevent malicious attempts to manipulate LLMs, ensuring their security.

Security

Adversarial Attacks

Deliberate attempts to trick LLMs with carefully crafted inputs, causing them to make mistakes.

Black-Box Attacks

Trying to attack an LLM without knowing its internal workings or parameters.

White-Box Attacks

Attacking an LLM with full knowledge of its internal architecture and parameters.

Vulnerability

Weaknesses or flaws in LLMs that can be exploited for malicious purposes.

Deep-fakes

Synthetic media generated by LLMs, often used to create realistic but fake images or videos.

Jailbreaking

Attempting to bypass security measures around an LLM to make it produce unsafe outputs.

Prompt Injection

Hijacking the LLM’s original prompts to make it perform unintended tasks.

Prompt Leaking

Tricking an LLM to reveal information from its training or inner workings.

Red-Teaming

Assessing the security and robustness of LLMs through simulated adversarial attacks.

Robustness

The ability of an LLM to perform accurately despite encountering adversarial inputs.

Watermarking

Embedding hidden markers into LLM-generated content to track its origin or authenticity.

Learning Paradigms

Unsupervised Learning

Learning patterns and structures from data without specific guidance or labels.

Supervised Learning

Learning from labeled examples & associating inputs with correct outputs.

Reinforcement Learning

Learning through trial and error, with rewards or penalties based on generated outputs.

Meta-Learning

Learning to learn by extracting general knowledge from diverse tasks and applying it to new ones.

Multi-task Learning

Learning to perform multiple tasks & sharing knowledge between related tasks for better performance.

Zero-Shot Learning

Providing only task instructions to the LLM relying solely on its pre-existing knowledge.

Few-Shot Learning

Learning from a small number of examples for new tasks and adapting quickly with minimal data.

Online Learning

Continuously learning from incoming data streams and updating knowledge in real-time.

Continual Learning

Learning sequentially from a stream of tasks or data without forgetting previously learned knowledge.

Federated Learning

Training across multiple decentralized devices without sharing raw data, preserving user privacy.

Adversarial Learning

Training against adversaries or competing models to improve robustness and performance.

Active Learning

Interacting with humans or the environment to select and label the most useful data for training.

Additional Resources

Types of LLMs Lingo