What is Falcon-180B?
Falcon 180B, released by TII, is part of the Falcon family and a scaled-up version of Falcon 40B. It incorporates architectural innovations such as multi-query attention for improved inference scalability. Falcon 180B was trained on 3.5 trillion tokens using up to 4096 GPUs simultaneously via Amazon SageMaker, for a total of approximately 7,000,000 GPU hours. This makes Falcon 180B 2.5 times larger than Llama 2, trained with 4x more compute.
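To make the multi-query attention idea concrete, here is a minimal, illustrative PyTorch sketch (the layer names and sizes are hypothetical, not Falcon's actual implementation): every query head attends against a single shared key/value head, which shrinks the key/value cache during generation and is what improves inference scalability.

```python
import math
import torch

# Illustrative multi-query attention (MQA) sketch with made-up dimensions.
# Key idea: all query heads share one key head and one value head, so the
# KV cache is n_heads times smaller than in standard multi-head attention.
batch, seq, n_heads, head_dim = 2, 16, 8, 64
hidden = n_heads * head_dim
x = torch.randn(batch, seq, hidden)

q_proj = torch.nn.Linear(hidden, n_heads * head_dim)  # one projection per query head
kv_proj = torch.nn.Linear(hidden, 2 * head_dim)       # a single shared key/value head

q = q_proj(x).view(batch, seq, n_heads, head_dim).transpose(1, 2)  # (b, h, s, d)
k, v = kv_proj(x).split(head_dim, dim=-1)                          # (b, s, d) each
k = k.unsqueeze(1)                                                 # (b, 1, s, d): broadcast over heads
v = v.unsqueeze(1)

scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)            # (b, h, s, s)
out = torch.softmax(scores, dim=-1) @ v                           # (b, h, s, d)
out = out.transpose(1, 2).reshape(batch, seq, hidden)             # merge heads back
```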
The dataset used for Falcon 180B primarily consists of web data from RefinedWeb (~85%) and a mix of curated data, including conversations, technical papers, and a small portion of code (~3%). The pretraining dataset is so large that even 3.5 trillion tokens account for less than one epoch.
The chat model is fine-tuned on a combination of several large-scale chat and instruction datasets. Commercial use of Falcon 180B is permitted, but under very restrictive conditions that exclude any "hosting use." Check the license and consult your legal team before using it commercially.
How to use Falcon-180B?
Falcon 180B is available in the Hugging Face ecosystem, starting with Transformers version 4.33. You can easily try the Big Falcon Model (180 billion parameters!) in the demo Space on Hugging Face.
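To run it yourself, the snippet below is a minimal sketch of loading the model with Transformers (>= 4.33). It assumes you have been granted access to the gated tiiuae/falcon-180B repository and have a multi-GPU machine with enough memory to hold the bfloat16 weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Base model; use "tiiuae/falcon-180B-chat" for the chat-tuned variant.
model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across all available GPUs
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
output = generator("Falcon 180B is", max_new_tokens=50, do_sample=True, top_k=10)
print(output[0]["generated_text"])
```

If the full-precision weights do not fit, the same call can load quantized weights (for example with `load_in_8bit=True` via bitsandbytes) at the cost of some quality.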