Pre-trained Language Models
Pre-trained language models are foundational AI models trained on vast amounts of data. Here’s an overview of popular models:
Proprietary Models
OpenAI Models
- GPT-4 - Latest large language model with multimodal capabilities
- GPT-3.5 - Powers ChatGPT, good balance of performance and cost
- DALL-E 3 - Text-to-image generation model
- Documentation
Anthropic Models
- Claude 3 - Latest model with strong reasoning capabilities
- Claude 2 - Previous generation with safety focus
- Documentation
Google Models
- Gemini - Multi-modal capabilities
- PaLM 2 - Powers Google products
- Documentation
Other Proprietary
- Cohere Command - Enterprise-focused model
- AI21 Jurassic - Specialized for specific tasks
- Amazon Titan - AWS integrated model
- IBM Granite - Enterprise AI model series
Open Source Models
Foundation Models
- Llama 2 - Meta’s open source model
- Mistral - High performance efficient model
- Falcon - TII’s model
- MPT - MosaicML’s model
Specialized Models
- CodeLlama - Code generation
- StarCoder - Programming focused
- Stable Diffusion - Image generation
- Whisper - Speech recognition
Model Categories
By Size
-
Small (1B-10B parameters)
- Mistral 7B
- Llama 2 7B
- MPT-7B
- T5-small
- BERT-small
-
Medium (10B-100B parameters)
- Llama 2 70B
- Claude 2
- PaLM 2
- BLOOM-176B
- Falcon-40B
-
Large (100B+ parameters)
- GPT-4
- Claude 3
- Gemini Ultra
- Falcon-180B
- PaLM
By Access
-
Commercial API
- GPT-4/3.5
- Claude 3/2
- Gemini
- Cohere Command
- AI21 Jurassic
- IBM Granite
-
Open Source
- Llama 2
- Mistral
- BLOOM
- Falcon
- MPT
-
Research Only
- PaLM
- LaMDA
- Gopher
- Chinchilla
- Megatron-Turing NLG
-
Fine-tunable
- Llama 2
- MPT
- BERT
- RoBERTa
- T5
By Capability
-
Text Generation
- GPT-4
- Claude 3
- Llama 2
- PaLM 2
- Mistral
- IBM Granite
-
Code Generation
- CodeLlama
- StarCoder
- Amazon CodeWhisperer
- GitHub Copilot
- GPT-4 (code)
-
Image Generation
- DALL-E 3
- Stable Diffusion
- Midjourney
- Google Imagen
- Parti
-
Multi-modal
- GPT-4V
- Claude 3
- Gemini
- CogVLM
- LLaVA
Selection Criteria
Technical Factors
- Model size and requirements
- Inference speed
- Fine-tuning capabilities
- Deployment options
Business Factors
- Licensing terms
- Cost structure
- Support availability
- Privacy considerations
Additional Resources
Documentation
Benchmarks
- Open LLM Leaderboard - Open source rankings
- LMSYS Leaderboard - Interactive evaluations
- Vellum LLM Leaderboard - Comprehensive benchmarks
- Artificial Analysis - Performance analysis
- KLU.ai Leaderboard - Task-specific benchmarks