Mixtral 8x7B is a sparse mixture-of-experts (SMoE) model developed by Mistral AI. Mixtral has 46.7 billion total parameters but only uses 12.9 billion parameters per token. This approach increases the model's parameter count while controlling cost and latency: the model processes inputs and generates outputs at the same speed and cost as a 12.9 billion parameter model. Mixtral has a 32k-token context window and handles multiple languages (English, French, Italian, German, and Spanish). The model shows strong performance in code generation as well as language tasks. Mistral AI states that Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference, making it the "strongest open-weight model with a permissive license," and that it matches or outperforms GPT-3.5 on most standard benchmarks.
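The gap between total and active parameters follows from the architecture: only two of the eight feed-forward experts in each layer run for a given token, while the attention layers and embeddings are always used. A back-of-the-envelope sketch, assuming the commonly reported Mixtral hyperparameters (hidden size 4096, 32 layers, feed-forward size 14336, grouped-query attention with a 1024-dimensional key/value projection, 32000-token vocabulary) and ignoring small terms such as norms and router weights, roughly reproduces the quoted figures:

```python
# Rough parameter count for a Mixtral-style sparse MoE decoder.
# The hyperparameters below are assumptions based on commonly reported
# Mixtral 8x7B values; norms and router weights are ignored.
hidden = 4096        # model (embedding) dimension
layers = 32          # number of decoder layers
ffn = 14336          # feed-forward (expert) hidden dimension
n_experts = 8        # experts per layer
top_k = 2            # experts activated per token
kv_dim = 1024        # key/value projection width under grouped-query attention
vocab = 32000

expert_params = 3 * hidden * ffn                       # gate, up, down matrices
attn_params = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q, O, K, V projections
embed_params = 2 * vocab * hidden                      # input embedding + output head

total = layers * (n_experts * expert_params + attn_params) + embed_params
active = layers * (top_k * expert_params + attn_params) + embed_params

print(f"total  ~= {total / 1e9:.1f}B parameters")   # ~46.7B
print(f"active ~= {active / 1e9:.1f}B per token")   # ~12.9B
```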
Results published by Mistral AI show that Mixtral matches or outperforms Llama 2 70B and GPT-3.5 on most benchmarks.
Mixtral is a sparse mixture-of-experts network. It is a decoder-only model that picks from a set of eight distinct groups of parameters, giving it the designation "8x7B." At each layer, for every token, a router network chooses two of these groups (the "experts") and combines their outputs additively. The model is pre-trained on data extracted from the open Web, with experts and routers trained simultaneously. CoreWeave and Scaleway provided technical support during the training of Mixtral.
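The routing step can be illustrated with a short sketch. The code below is a simplified, hypothetical top-2 mixture-of-experts layer, not Mistral AI's implementation: a linear router scores each token, keeps the two highest-scoring experts, and sums their outputs weighted by a softmax over the two selected scores. The layer sizes in the usage example are deliberately small.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Simplified sparse mixture-of-experts layer with top-2 routing."""

    def __init__(self, hidden=4096, ffn=14336, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward "expert" per distinct group of parameters
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(n_experts)
        ])
        # Router: one score per expert for each token
        self.router = nn.Linear(hidden, n_experts, bias=False)

    def forward(self, x):                        # x: (tokens, hidden)
        scores = self.router(x)                  # (tokens, n_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)  # renormalise over the chosen two
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Example: route 4 tokens through a small instance of the layer
layer = Top2MoELayer(hidden=64, ffn=128, n_experts=8)
tokens = torch.randn(4, 64)
print(layer(tokens).shape)  # torch.Size([4, 64])
```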
Mixtral can be fine-tuned into an instruction-following model, and Mistral AI released Mixtral 8x7B Instruct alongside the original model. Mixtral 8x7B Instruct has gone through supervised fine-tuning and direct preference optimisation (DPO) for instruction following. It reaches a score of 8.30 on MT-Bench. Mistral AI states this score makes it "the best open-source model, with a performance comparable to GPT-3.5."
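Direct preference optimisation trains on preference pairs directly: for a prompt with a preferred and a rejected response, the loss pushes the policy to assign relatively more probability to the preferred response than a frozen reference model does. The sketch below shows the standard DPO objective, not Mistral AI's training code; the log-probability inputs and the `beta` temperature are placeholders.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective on per-sequence log-probabilities.

    Each argument is a tensor of shape (batch,) holding log p(response | prompt)
    under either the policy being trained or the frozen reference model.
    """
    # Log-ratio of policy vs. reference for preferred and rejected responses
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Maximise the margin between the two ratios, scaled by beta
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up log-probabilities
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)
```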
The model was released on December 11, 2023, with open weights, and Mixtral 8x7B is licensed under Apache 2.0. Users can access Mixtral 8x7B through Mistral AI's "mistral-small" endpoint, available in beta, or download the weights from the Hugging Face repository. Mixtral can be deployed with a fully open-source stack: Mistral AI has submitted changes to the vLLM project that integrate Megablocks CUDA kernels for efficient inference.
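As an illustration of the open-source route, the snippet below serves the instruction-tuned weights from Hugging Face with vLLM. It is a sketch under stated assumptions: the repository id `mistralai/Mixtral-8x7B-Instruct-v0.1`, the `[INST]` prompt format, and the sampling settings are taken as given, and running the full model requires substantial GPU memory.

```python
# Sketch: serving Mixtral 8x7B Instruct locally with vLLM.
# Assumes the Hugging Face repo id below and enough GPU memory for the weights.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=256)

# The instruct model expects prompts wrapped in [INST] ... [/INST] tags
prompt = "[INST] Explain what a sparse mixture-of-experts model is. [/INST]"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```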