
Pre-trained Language Models

Pre-trained language models are foundation models trained on large general-purpose datasets, which can then be prompted or fine-tuned for specific tasks. Here’s an overview of popular models:

Proprietary Models

OpenAI Models

  • GPT-4 - Large language model with multimodal (text and image input) capabilities
  • GPT-3.5 - Powers ChatGPT, good balance of performance and cost
  • DALL-E 3 - Text-to-image generation model
  • Documentation

Anthropic Models

Google Models

Other Proprietary

Open Source Models

Foundation Models

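  • Llama 2 - Meta’s openly available model (weights released under Meta’s community license)
  • Mistral - High-performance, efficient model from Mistral AI
  • Falcon - Model family from the Technology Innovation Institute (TII)
  • MPT - MosaicML’s model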

Specialized Models

Model Categories

By Size

  • Small (under 10B parameters)

    • Mistral 7B
    • Llama 2 7B
    • MPT-7B
    • T5-small
    • BERT-small
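  • Medium (10B-100B parameters)

    • Llama 2 70B
    • Falcon-40B
    • Claude 2 (size undisclosed)
    • PaLM 2 (size undisclosed)
  • Large (100B+ parameters)

    • BLOOM-176B
    • Falcon-180B
    • PaLM (540B)
    • GPT-4 (size undisclosed)
    • Claude 3 (size undisclosed)
    • Gemini Ultra (size undisclosed)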
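
The size tiers above can be sketched as a small helper. This is only an illustration: the function name `size_tier` is hypothetical, the thresholds assume that anything under 10B parameters counts as small, and the parameter counts are the nominal figures from the model names rather than exact values.

```python
def size_tier(num_params: float) -> str:
    """Map a parameter count to a size tier (assumed thresholds: 10B and 100B)."""
    if num_params < 10e9:
        return "small"
    if num_params < 100e9:
        return "medium"
    return "large"


# Nominal parameter counts read off the model names (approximate).
models = {"Mistral 7B": 7e9, "Llama 2 70B": 70e9, "Falcon-180B": 180e9}

tiers = {name: size_tier(count) for name, count in models.items()}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

Models whose vendors do not publish a parameter count cannot be classified this way and have to be placed by estimate.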

By Access

  • Commercial API

    • GPT-4/3.5
    • Claude 3/2
    • Gemini
    • Cohere Command
    • AI21 Jurassic
    • IBM Granite
  • Open Source

    • Llama 2
    • Mistral
    • BLOOM
    • Falcon
    • MPT
  • Research Only

    • PaLM
    • LaMDA
    • Gopher
    • Chinchilla
    • Megatron-Turing NLG
  • Fine-tunable

    • Llama 2
    • MPT
    • BERT
    • RoBERTa
    • T5

By Capability

  • Text Generation

    • GPT-4
    • Claude 3
    • Llama 2
    • PaLM 2
    • Mistral
    • IBM Granite
  • Code Generation

    • CodeLlama
    • StarCoder
    • Amazon CodeWhisperer
    • GitHub Copilot
    • GPT-4 (code)
  • Image Generation

    • DALL-E 3
    • Stable Diffusion
    • Midjourney
    • Google Imagen
    • Parti
  • Multi-modal

    • GPT-4V
    • Claude 3
    • Gemini
    • CogVLM
    • LLaVA
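
One way to work with these categories is to record them as data and filter on them. The sketch below is a minimal illustration, not a complete catalog: `ModelEntry`, `CATALOG`, and `find_models` are hypothetical names, and the entries only restate a few of the placements listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelEntry:
    name: str
    access: str               # e.g. "commercial_api" or "open_source"
    capabilities: frozenset   # e.g. {"text", "code", "multimodal"}


# A handful of entries copied from the lists above (illustrative subset).
CATALOG = [
    ModelEntry("GPT-4", "commercial_api", frozenset({"text", "code"})),
    ModelEntry("Claude 3", "commercial_api", frozenset({"text", "multimodal"})),
    ModelEntry("Gemini", "commercial_api", frozenset({"multimodal"})),
    ModelEntry("Llama 2", "open_source", frozenset({"text"})),
    ModelEntry("Mistral", "open_source", frozenset({"text"})),
]


def find_models(access: Optional[str] = None,
                capability: Optional[str] = None) -> List[str]:
    """Return the names of catalog entries matching the given filters."""
    return [
        m.name
        for m in CATALOG
        if (access is None or m.access == access)
        and (capability is None or capability in m.capabilities)
    ]


print(find_models(access="open_source", capability="text"))
print(find_models(capability="multimodal"))
```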

Selection Criteria

Technical Factors

  • Model size and requirements
  • Inference speed
  • Fine-tuning capabilities
  • Deployment options
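
To make "model size and requirements" concrete, a rough lower bound on memory is the parameter count multiplied by the bytes used per parameter. The sketch below assumes common precisions (4 bytes for fp32, 2 for fp16, 1 for int8, 0.5 for int4) and counts the weights only; activations, the KV cache, and framework overhead come on top. The name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter at each precision (assumed values).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Estimate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp16", "int8", "int4"):
    print(f"7B model, {dtype}: ~{weight_memory_gb(7e9, dtype):.1f} GB")
```

Under these assumptions a 7B-parameter model needs roughly 14 GB for its weights in fp16, which is why quantized variants are common on consumer hardware.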

Business Factors

  • Licensing terms
  • Cost structure
  • Support availability
  • Privacy considerations
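
For commercial APIs, cost structure usually comes down to per-token pricing. The following sketch estimates a monthly bill from request volume and token counts. The function name `monthly_api_cost` is hypothetical and the prices are placeholders, not any vendor's actual rates, so substitute the current figures from the provider's pricing page.

```python
def monthly_api_cost(requests_per_day: int,
                     input_tokens: int,
                     output_tokens: int,
                     price_in_per_1k: float,
                     price_out_per_1k: float,
                     days: int = 30) -> float:
    """Estimate monthly spend for a per-token-priced API."""
    per_request = (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k
    return requests_per_day * per_request * days


# Placeholder prices (USD per 1,000 tokens), chosen only for illustration.
estimate = monthly_api_cost(
    requests_per_day=1000,
    input_tokens=500,
    output_tokens=200,
    price_in_per_1k=0.01,
    price_out_per_1k=0.03,
)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

Self-hosted open models replace this per-token cost with hardware and operations costs, so the comparison depends heavily on expected volume.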

Additional Resources

Documentation

Benchmarks