Vicuna is an open-source chatbot trained by fine-tuning the 13-billion-parameter LLaMA model on user-shared conversations collected from ShareGPT. Trained between March 2023 and April 2023 and released in early April 2023, Vicuna is an auto-regressive language model based on the transformer architecture. Vicuna was developed by members from UC Berkeley, Carnegie Mellon University, Stanford, Mohamed bin Zayed University of Artificial Intelligence, and UC San Diego. Preliminary evaluations from the team behind Vicuna, using GPT-4 as a judge, suggest Vicuna-13B achieves more than 90% of the quality of ChatGPT and Google Bard while outperforming other models such as LLaMA and Stanford Alpaca in more than 90% of cases.
The team behind Vicuna collected roughly seventy thousand conversations from ShareGPT.com, a website where users can share their ChatGPT conversations, using public APIs. Next, they enhanced the training scripts provided by Alpaca to handle multi-round conversations and long sequences. Improvements included the following:
- Memory optimizations—to enable Vicuna's understanding of long context, the max context length was expanded from 512 in Alpaca to 2048, which substantially increases GPU memory requirements.
- Multi-round conversations—the training loss was adjusted to account for multi-round conversations, computing the fine-tuning loss solely on the chatbot's output.
- Cost reduction via spot instances—with a 40x larger dataset and 4x sequence length for training, costs were reduced by leveraging cheaper spot instances with auto-recovery for preemptions and automatic zone switching. This reduces the cost of training the 7B model from $500 to around $140 and the 13B model from around $1,000 to $300.
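The multi-round loss adjustment above can be sketched in code. This is a hypothetical illustration, not Vicuna's actual training script: tokens from user turns are masked with the ignore index (-100) that cross-entropy implementations such as PyTorch's conventionally skip, so the fine-tuning loss is computed only on the chatbot's own tokens.

```python
IGNORE_INDEX = -100  # convention skipped by PyTorch's CrossEntropyLoss

def build_labels(turns):
    """turns: list of (role, token_ids) pairs for one conversation.

    Returns (input_ids, labels): the full token sequence, plus labels in
    which every non-assistant token is replaced by IGNORE_INDEX so it
    contributes nothing to the fine-tuning loss.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # train on the chatbot's output
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # mask user turns
    return input_ids, labels

# A toy two-round conversation with made-up token ids:
conversation = [
    ("user", [11, 12, 13]),
    ("assistant", [21, 22]),
    ("user", [14]),
    ("assistant", [23, 24, 25]),
]
ids, labels = build_labels(conversation)
# labels: [-100, -100, -100, 21, 22, -100, 23, 24, 25]
```

The same masking idea extends naturally to arbitrarily many rounds, which is what makes multi-round fine-tuning on shared conversations tractable.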
Training was done on eight A100 GPUs in a single day. The demo of the model was served on a lightweight distributed system. Preliminary evaluations of the model used GPT-4 to judge model outputs on a set of eighty diverse questions. Like other large language models, Vicuna is weak at tasks involving mathematics and has limitations in ensuring the factual accuracy of its outputs. It also has not been optimized to reduce toxicity or bias.
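The GPT-4-as-judge evaluation can be sketched roughly as follows. The prompt wording and 1-10 scoring scheme here are illustrative assumptions, not the exact prompts the Vicuna team used: for each benchmark question, both models' answers are placed in a single prompt and the judge model is asked to score each.

```python
def build_judge_prompt(question, answer_a, answer_b):
    """Assemble a pairwise-comparison prompt for a judge model.

    Illustrative only: the real evaluation's prompt text differs.
    """
    return (
        "You are a helpful and impartial judge.\n"
        f"Question: {question}\n\n"
        f"Assistant A's answer: {answer_a}\n\n"
        f"Assistant B's answer: {answer_b}\n\n"
        "Rate each answer from 1 to 10 for helpfulness, relevance, and "
        "accuracy. Reply with exactly two numbers, e.g. '8 7'."
    )

def parse_scores(judge_reply):
    """Extract the two numeric scores from the judge's reply."""
    a, b = judge_reply.split()[:2]
    return float(a), float(b)

prompt = build_judge_prompt("Explain TCP slow start.", "Answer A...", "Answer B...")
score_a, score_b = parse_scores("8 9")
```

In practice the prompt would be sent to the GPT-4 API and the scores aggregated over all eighty questions to produce the reported relative-quality percentages.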
The first Vicuna release in April 2023 included training, serving, and evaluation code on GitHub. Additionally, the team released the Vicuna-13B model weights. The initial online demo is a research preview intended for non-commercial use only, subject to the model license of LLaMA. The code is released under the Apache License 2.0. On April 28, 2023, Stability AI released the first large-scale open-source chatbot trained via reinforcement learning from human feedback (RLHF), based on Vicuna.