Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens. It tops the leaderboard for (pre-trained) open-access models and rivals proprietary models like PaLM-2. While a definitive ranking is still difficult, it is considered roughly on par with PaLM-2 Large, making Falcon 180B one of the most capable publicly known LLMs.
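For those who want to try it, the weights are on the Hugging Face Hub. A minimal generation sketch with transformers, assuming access to the gated tiiuae/falcon-180B repo and enough GPU memory to shard the bfloat16 weights (roughly 400GB across several GPUs):

```python
# Minimal generation sketch (assumes access to the gated tiiuae/falcon-180B
# repo and ~400GB of GPU memory for the bfloat16 weights).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all available GPUs
)

inputs = tokenizer("The three laws of robotics are", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```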
Stable Diffusion XL (SDXL 1.0) is designed to produce photorealistic outputs with enhanced detail and composition compared to previous SD models, such as SD 1.5 and 2.1. Key improvements in SDXL 1.0 include more realistic image generation, improved face generation, legible text within images, and the ability to produce aesthetically pleasing art from shorter prompts.
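A minimal text-to-image sketch with diffusers, assuming a CUDA GPU and the stabilityai/stable-diffusion-xl-base-1.0 base checkpoint (the prompt is illustrative):

```python
# Minimal SDXL text-to-image sketch using diffusers.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# SDXL is designed to do well even with short prompts.
image = pipe(prompt="a close-up portrait of an astronaut, golden hour").images[0]
image.save("astronaut.png")
```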
Learn about the training pipeline of GPT assistants like ChatGPT, from tokenization to pretraining, supervised finetuning, and Reinforcement Learning from Human Feedback (RLHF). Dive deeper into practical techniques and mental models for the effective use of these models, including prompting strategies, finetuning, the rapidly growing ecosystem of tools, and their future extensions.
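As a concrete taste of the first stage, tokenization, here is a small sketch using OpenAI's tiktoken library; the choice of the cl100k_base encoding and the sample string are assumptions for illustration:

```python
# Tokenization, the first stage of the pipeline: text is mapped to integer
# token IDs via byte-pair encoding before any training happens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
tokens = enc.encode("Reinforcement Learning from Human Feedback")
print(tokens)              # a short list of integer token IDs
print(enc.decode(tokens))  # round-trips back to the original string
```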
Stable Diffusion is a text-to-image diffusion model that uses a frozen CLIP ViT-L/14 text encoder, similar to Google's Imagen, to condition generation on text prompts. It is relatively lightweight, with an 860M-parameter UNet and a 123M-parameter text encoder, and runs on a GPU with at least 10GB VRAM.
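Those component sizes can be checked directly by loading the pipeline with diffusers and counting parameters; a minimal sketch, assuming the original CompVis/stable-diffusion-v1-4 checkpoint:

```python
# Count parameters of the UNet and CLIP text encoder to confirm the
# ~860M / ~123M figures quoted above (checkpoint choice is illustrative).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

n_unet = sum(p.numel() for p in pipe.unet.parameters())
n_text = sum(p.numel() for p in pipe.text_encoder.parameters())
print(f"UNet: {n_unet / 1e6:.0f}M parameters")          # ~860M
print(f"Text encoder: {n_text / 1e6:.0f}M parameters")  # ~123M
```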
Whisper is an automatic speech recognition (ASR) system trained on a massive 680,000-hour multilingual and multitask dataset collected from the web. This extensive and diverse dataset improves Whisper's robustness to accents, background noise, and technical language, and enables transcription in multiple languages as well as translation into English. The open-sourced models and inference code are intended to serve as a foundation for building practical applications and for further research on robust speech processing.
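A minimal transcription-and-translation sketch with the open-source whisper package; the model size ("base") and file name are illustrative assumptions:

```python
# Transcribe, then translate, an audio file with the whisper package.
import whisper

model = whisper.load_model("base")

# Transcription in the audio's original language.
result = model.transcribe("audio.mp3")
print(result["text"])

# Translation of non-English speech into English.
translated = model.transcribe("audio.mp3", task="translate")
print(translated["text"])
```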