WizardLM

WizardLM is a family of instruction-following LLMs, including WizardLM, WizardCoder, and WizardMath, powered by Evol-Instruct, a method of generating open-domain instructions.

Is a: Software, AI Project

Software attributes

Created/Discovered by: Microsoft, Peking University
Industry: Artificial Intelligence (AI), Generative AI

AI Project attributes

Hugging Face ID: WizardLM
Technologies Used: LLaMA

Other attributes

Email Address: caxu@microsoft.com
Launch Date: April 24, 2023

Overview

WizardLM is a family of instruction-following large language models (LLMs) powered by Evol-Instruct, a method that uses LLMs instead of humans to automatically mass-produce open-domain instructions for fine-tuning. The family of models includes WizardLM, WizardCoder, and WizardMath. WizardLM and the Evol-Instruct method were introduced in an April 2023 paper from researchers at Microsoft and Peking University led by Can Xu, a senior applied scientist at Microsoft's STCA (Software Technology Center at Asia) working in the S+D NLP Science Group.

Instructions are used to train or fine-tune LLMs. This requires open-domain instruction-following data provided by human annotators. However, the manual creation of instructions is time-consuming and labor-intensive. Evol-Instruct leverages LLMs to automatically generate large amounts of instruction data with varying levels of complexity. Starting with an initial set of instructions, the team from Microsoft STCA used Evol-Instruct to rewrite them step by step into more complex instructions. The generated instruction data was then used to fine-tune the LLaMA LLM to produce WizardLM.

Evol-Instruct

Initial attempts to train LLMs for NLP tasks relied on a small number of hand-written instructions accompanying each task. These closed-domain instructions have two main limitations: all samples in an NLP dataset share only a few common instructions, and each instruction asks for only one task (e.g., translation or summarization). LLMs have achieved better results, performing more complicated and diverse tasks, when trained on open-domain instruction data generated by human users. However, collecting this data is expensive and time-consuming, and the result is skewed: because experts make up only a small proportion of annotators, the instruction data tends towards easy or moderate examples.

Evol-Instruct is an automatic method capable of mass-producing open-domain instructions (including more complicated instructions) using LLMs instead of humans. The diagram below shows a running example of Evol-Instruct starting with a simple instruction and then randomly selecting in-depth evolving (blue line) or in-breadth evolving (red line) to generate new and more complicated instructions.

Example of instructions generated using Evol-Instruct starting from a single, simple instruction.

In-depth evolving includes five types of operations: adding constraints, deepening, concretizing, increasing reasoning steps, and complicating input. In-breadth evolving is a mutation, i.e., it generates a completely new instruction based on the given instruction. All six operations are implemented by prompting an LLM. An instruction eliminator was developed to filter out failed instructions created by the LLM, a step known as elimination evolving. The evolutionary process is repeated for several rounds to generate instruction data spanning a range of complexity.
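The evolution loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt templates, the uniform random choice among the six operations, the `is_failed` check, the policy of retrying a failed instruction in the next round, and the `llm` callable (any function mapping a prompt string to a completion string) are all hypothetical choices made for the example.

```python
import random
from typing import Callable, List

# Hypothetical prompt templates, one per in-depth operation.
# The paper's actual prompts are not reproduced here.
IN_DEPTH_TEMPLATES = {
    "add_constraints": "Rewrite the instruction, adding one more constraint:\n{instruction}",
    "deepening": "Rewrite the instruction so it probes the topic in more depth:\n{instruction}",
    "concretizing": "Rewrite the instruction, replacing general concepts with specific ones:\n{instruction}",
    "increase_reasoning": "Rewrite the instruction so it needs more reasoning steps:\n{instruction}",
    "complicate_input": "Rewrite the instruction so it includes more complex input:\n{instruction}",
}
# In-breadth evolving: a mutation that yields a completely new instruction.
IN_BREADTH_TEMPLATE = "Write a brand-new instruction inspired by:\n{instruction}"


def evolve_once(instruction: str, llm: Callable[[str], str], rng: random.Random) -> str:
    """Apply one randomly chosen operation (assumed uniform over all six)."""
    templates = list(IN_DEPTH_TEMPLATES.values()) + [IN_BREADTH_TEMPLATE]
    prompt = rng.choice(templates).format(instruction=instruction)
    return llm(prompt).strip()


def is_failed(original: str, evolved: str) -> bool:
    """Toy stand-in for the instruction eliminator (assumed criteria)."""
    return not evolved or evolved == original


def evol_instruct(seeds: List[str], llm: Callable[[str], str],
                  rounds: int = 4, seed: int = 0) -> List[str]:
    """Return the seed instructions plus every successfully evolved one."""
    rng = random.Random(seed)
    pool = list(seeds)
    current = list(seeds)
    for _ in range(rounds):
        next_round = []
        for instruction in current:
            evolved = evolve_once(instruction, llm, rng)
            if is_failed(instruction, evolved):
                # Assumption: a failed instruction is retried unchanged next round.
                next_round.append(instruction)
            else:
                next_round.append(evolved)
                pool.append(evolved)
        current = next_round
    return pool


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; echoes the instruction line."""
    return prompt.splitlines()[-1] + " (harder)"


if __name__ == "__main__":
    for item in evol_instruct(["1+1=?"], stub_llm, rounds=3):
        print(item)
```

Keeping every intermediate instruction in `pool`, rather than only the final round, is what gives the resulting dataset a mix of simple and complex instructions.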

WizardLM models

WizardLM is an LLM built to validate the Evol-Instruct method by fine-tuning the open-source LLaMA model on evolved instructions. In their April 2023 paper, the team behind WizardLM compared the model's performance with leading work on instruction fine-tuning. The instruction datasets used for comparison were the data used by Alpaca (generated using self-instruct) and the 70k ShareGPT dataset (shared by real users) used by Vicuna.

Because previous instruction-following test datasets contained a low proportion of difficult instructions, the team created a new difficulty-balanced test dataset, the Evol-Instruct test set. Human annotators and GPT-4 were used to evaluate Alpaca, Vicuna, ChatGPT, and WizardLM on the Evol-Instruct test set and Vicuna's test set. The paper reports that instructions from Evol-Instruct were superior to those from the human-created ShareGPT and that WizardLM outperforms Vicuna. Additionally, labelers preferred WizardLM outputs over those from ChatGPT on complex test instructions.
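A pairwise preference evaluation of this kind is typically summarized as win, tie, and loss rates per difficulty level. The sketch below shows one way to tally such judgements. The record format (a difficulty bucket plus a verdict for the model under test) and the bucket names are assumptions for illustration, and the sample data is invented toy input, not results from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

VERDICTS = ("win", "tie", "loss")


def win_rates(judgements: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (difficulty_bucket, verdict) pairs into per-bucket rates."""
    counts: Dict[str, Counter] = {}
    for bucket, verdict in judgements:
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
        counts.setdefault(bucket, Counter())[verdict] += 1
    rates = {}
    for bucket, tally in counts.items():
        total = sum(tally.values())
        rates[bucket] = {v: tally[v] / total for v in VERDICTS}
    return rates


# Toy judgements, invented purely to exercise the function.
toy_judgements = [
    ("easy", "win"), ("easy", "loss"),
    ("hard", "win"), ("hard", "win"), ("hard", "tie"), ("hard", "loss"),
]

if __name__ == "__main__":
    for bucket, rates in win_rates(toy_judgements).items():
        print(bucket, rates)
```

Reporting rates per bucket rather than one overall number is what makes a difficulty-balanced test set useful: it shows whether a model's advantage holds on the hard instructions specifically.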

The table below shows the WizardLM models that have been released alongside their evaluation and license. Evaluation uses the MT-Bench, AlpacaEval, GSM8k, and HumanEval benchmarks.

WizardLM models

| Model | MT-Bench | AlpacaEval | GSM8k | HumanEval | License |
| --- | --- | --- | --- | --- | --- |
| WizardLM-13B-V1.0 | 6.35 | 75.31% | | 24.0 | Non-commercial |
| WizardLM-13B-V1.1 | 6.76 | 86.32% | | 25.0 | Non-commercial |
| WizardLM-13B-V1.2 | 7.06 | 89.17% | 55.3% | 36.6 | Llama 2 License |
| WizardLM-30B-V1.0 | 7.01 | | | 37.8 | Non-commercial |
| WizardLM-70B-V1.0 | 7.78 | 92.91% | 77.6% | 50.6 | Llama 2 License |

WizardCoder

WizardCoder is the result of adapting the Evol-Instruct method to code. Most existing models performing code-related tasks are pre-trained solely on extensive raw code data without instruction finetuning. In a paper released in June 2023, the WizardLM team demonstrated the capabilities of WizardCoder and the extension of Evol-Instruct to code-related instructions.

The table below shows the WizardCoder models that have been released alongside their evaluation and license. Evaluation is based on OpenAI's HumanEval dataset of 164 programming challenges and the MBPP (Mostly Basic Python Programming) benchmark, which consists of around 1,000 crowd-sourced Python programming problems.

WizardCoder models

| Model | HumanEval | MBPP | License |
| --- | --- | --- | --- |
| WizardCoder-15B-V1.0 | 59.8 | 50.6 | OpenRAIL-M |
| WizardCoder-1B-V1.0 | 23.8 | 28.6 | OpenRAIL-M |
| WizardCoder-3B-V1.0 | 34.8 | 37.4 | OpenRAIL-M |
| WizardCoder-Python-13B-V1.0 | 64.0 | 55.6 | Llama 2 |
| WizardCoder-Python-34B-V1.0 | 73.2 | 61.2 | Llama 2 |
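Scores on HumanEval and MBPP are commonly reported as pass@k, the probability that at least one of k sampled solutions passes a problem's unit tests. The source does not state which metric the table uses, so the sketch below assumes pass@1 percentages. It implements the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples per problem and c the number that pass. The function names and the per-problem `(n, c)` input format are choices made for this example.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: List[Tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, expressed as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Toy input: three problems, ten samples each, with 10, 5, and 0 passing.
    print(benchmark_score([(10, 10), (10, 5), (10, 0)], k=1))
```

With a single greedy sample per problem (n = 1, k = 1), the estimator reduces to the plain fraction of problems solved.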

WizardMath

WizardMath, first described in an August 2023 paper, is a fine-tuned version of LLaMA-2 using a proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to generate instructions for math tasks. Most open-source models are only pre-trained on large-scale internet data, without specific math-related optimization.

The table below shows the WizardMath models that have been released alongside their evaluation and license. Evaluation uses the GSM8k and MATH benchmarks.

WizardMath models

| Model | GSM8k | MATH | License |
| --- | --- | --- | --- |
| WizardMath-13B-V1.0 | 63.9 | 14.0 | Llama 2 |
| WizardMath-70B-V1.0 | 81.6 | 22.7 | Llama 2 |
| WizardMath-7B-V1.0 | 54.9 | 10.7 | Llama 2 |
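GSM8k scores like those above are accuracy percentages: the share of problems where the model's final answer matches the reference. GSM8k reference solutions end with a line of the form `#### <answer>`. The sketch below scores predictions on that basis. Taking the last number in the model's output as its answer is a simplifying assumption for illustration, not necessarily the extraction rule used by the WizardMath authors, and the function names are hypothetical.

```python
import re
from typing import List, Optional

# Matches integers and decimals, allowing thousands separators such as 1,000.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[str]:
    """Return the last number found in text, with commas stripped."""
    matches = _NUMBER.findall(text)
    return matches[-1].replace(",", "") if matches else None


def gsm8k_accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions whose final number equals the reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    correct = 0
    for prediction, reference in zip(predictions, references):
        gold = last_number(reference.split("####")[-1])
        guess = last_number(prediction)  # assumption: the answer is the last number
        if gold is not None and guess is not None and float(guess) == float(gold):
            correct += 1
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy examples, invented for illustration.
    preds = ["She has 9 + 9 = 18 eggs, so the answer is 18.", "The total is 7"]
    refs = ["9 + 9 = 18\n#### 18", "3 + 5 = 8\n#### 8"]
    print(gsm8k_accuracy(preds, refs))
```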

Further Resources

| Title | Author | Link | Date |
| --- | --- | --- | --- |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang | https://arxiv.org/abs/2306.08568 | June 14, 2023 |
| WizardLM: Empowering Large Language Models to Follow Complex Instructions | Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Daxin Jiang | https://arxiv.org/abs/2304.12244 | April 24, 2023 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang | https://arxiv.org/abs/2308.09583 | August 18, 2023 |
