WizardLM

Is a: Software, AI Project

Software attributes

Created/Discovered by: Microsoft, Peking University
Industry: Artificial Intelligence (AI), Generative AI

AI Project attributes

Hugging Face ID: WizardLM
Technologies Used: LLaMA

Other attributes

Email Address: caxu@microsoft.com
Launch Date: April 24, 2023

Overview

WizardLM is a family of instruction-following large language models (LLMs) trained with Evol-Instruct, a method that uses LLMs instead of humans to automatically mass-produce open-domain instructions for fine-tuning. The family includes WizardLM, WizardCoder, and WizardMath. WizardLM and the Evol-Instruct method were introduced in an April 2023 paper by researchers at Microsoft and Peking University led by Can Xu, a senior applied scientist in the S+D NLP Science Group at Microsoft's Software Technology Center Asia (STCA).

Instruction-following LLMs are trained or fine-tuned on open-domain instruction data, which has typically been written by human annotators. Creating instructions manually is time-consuming and labor-intensive. Evol-Instruct instead uses LLMs to automatically generate large amounts of instruction data at varying levels of complexity. Starting from an initial set of instructions, the Microsoft STCA team used Evol-Instruct to rewrite them step by step into more complex instructions. The generated instruction data was then used to fine-tune the LLaMA LLM, producing WizardLM.

Evol-Instruct

Initial attempts to train LLMs to follow instructions for NLP tasks relied on a small number of hand-written instructions accompanying each task. These closed-domain instructions have two drawbacks: all samples in an NLP dataset share only a few common instructions, and each instruction asks for only one task (e.g., translation or summarization). LLMs have achieved better results, performing more complicated and diverse tasks, when trained on open-domain instruction data generated by human users. However, collecting this data is expensive and time-consuming, and the data is skewed: experts make up a small share of annotators, so the resulting instructions tend toward easy or moderate difficulty.

Evol-Instruct is an automatic method capable of mass-producing open-domain instructions (including more complicated instructions) using LLMs instead of humans. The diagram below shows a running example of Evol-Instruct starting with a simple instruction and then randomly selecting in-depth evolving (blue line) or in-breadth evolving (red line) to generate new and more complicated instructions.

Figure: Example of instructions generated using Evol-Instruct starting from a single, simple instruction.

In-depth evolving includes five types of operations: adding constraints, deepening, concretizing, increasing reasoning steps, and complicating input. In-breadth evolving is a mutation, i.e., it generates a completely new instruction based on the given instruction. All six operations are implemented by prompting an LLM. An instruction eliminator filters out failed instructions produced by the LLM, a step known as elimination evolving. The evolutionary process is repeated for several rounds to generate instruction data spanning a range of complexity.
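The evolve-and-eliminate loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the helper names (`evolve`, `is_failed`, `evol_instruct`, `fake_llm`), the elimination heuristics, and the choice to carry a parent instruction forward when its evolution fails are all hypothetical, and `fake_llm` merely stands in for a real LLM call.

```python
import random
from typing import Callable, List

# Hypothetical one-line prompts for the five in-depth operations.
# Real prompts would be longer; the wording here is illustrative only.
IN_DEPTH_PROMPTS = [
    "Rewrite the instruction by adding one more constraint.",
    "Rewrite the instruction so it explores the topic in more depth.",
    "Rewrite the instruction, replacing general concepts with specific ones.",
    "Rewrite the instruction so that it needs multiple reasoning steps.",
    "Rewrite the instruction so that it includes more complex input data.",
]
# Hypothetical prompt for the single in-breadth (mutation) operation.
IN_BREADTH_PROMPT = "Create a brand-new instruction in the same domain as the one provided."

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def evolve(instruction: str, llm: LLM, rng: random.Random) -> str:
    """Apply one randomly chosen operation out of the six."""
    prompt = rng.choice(IN_DEPTH_PROMPTS + [IN_BREADTH_PROMPT])
    return llm(f"{prompt}\n\nGiven instruction: {instruction}")


def is_failed(original: str, evolved: str) -> bool:
    """Toy instruction eliminator (assumed heuristics): reject empty or unchanged output."""
    text = evolved.strip()
    return not text or text == original.strip()


def evol_instruct(seeds: List[str], llm: LLM, rounds: int = 4, seed: int = 0) -> List[str]:
    """Run several evolution rounds and return seeds plus all surviving evolutions."""
    rng = random.Random(seed)
    pool = list(seeds)      # everything kept as training instructions
    current = list(seeds)   # instructions to evolve in the next round
    for _ in range(rounds):
        next_generation = []
        for instruction in current:
            candidate = evolve(instruction, llm, rng)
            if is_failed(instruction, candidate):
                next_generation.append(instruction)  # assumption: retry the parent later
            else:
                next_generation.append(candidate)
                pool.append(candidate)
        current = next_generation
    return pool


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: returns the given instruction with a marker appended."""
    instruction = prompt.rsplit("Given instruction: ", 1)[1]
    return instruction + " (evolved)"


if __name__ == "__main__":
    for item in evol_instruct(["What is 1+1?"], fake_llm, rounds=3):
        print(item)
```

Because each round evolves the previous round's output, the pool ends up containing instructions at several complexity levels, which matches the range of difficulty the method aims for.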

WizardLM models

WizardLM is an LLM built to validate the Evol-Instruct method by fine-tuning the open-source LLaMA model on evolved instructions. In their April 2023 paper, the team behind WizardLM compared the model's performance with leading work on instruction fine-tuning. The instruction datasets compared against WizardLM's were the data used by Alpaca (generated using self-instruct) and the 70k ShareGPT dataset (shared by real users) used by Vicuna.

Because previous instruction-following test datasets contained a low proportion of difficult instructions, the team created a new difficulty-balanced test dataset named the Evol-Instruct testset. Human annotators and GPT-4 were used to evaluate Alpaca, Vicuna, ChatGPT, and WizardLM on the Evol-Instruct testset and Vicuna's testset. The paper reports that instructions from Evol-Instruct were superior to those from the human-created ShareGPT and that WizardLM outperforms Vicuna. Additionally, labelers preferred WizardLM outputs over those from ChatGPT on complex test instructions.
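Pairwise preference judgments of this kind are commonly summarized as win, tie, and loss rates. The sketch below shows one way such a tally could be computed. The function name `preference_rates`, the label vocabulary, and the sample judgments are assumptions made for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("win", "tie", "loss")  # assumed label set for one model-vs-model comparison


def preference_rates(judgments: Iterable[str]) -> Dict[str, float]:
    """Return the share of each label among the judgments for one model pairing."""
    counts = Counter(judgments)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments given")
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    # Made-up judgments from the point of view of the first model in the pair.
    print(preference_rates(["win", "win", "tie", "loss"]))
```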

The table below lists the released WizardLM models with their evaluation results and licenses. Models are evaluated on the MT-Bench, AlpacaEval, GSM8k, and HumanEval benchmarks.

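WizardLM models

| Model | MT-Bench | AlpacaEval | GSM8k | HumanEval | License |
| --- | --- | --- | --- | --- | --- |
| WizardLM-13B-V1.0 | 6.35 | 75.31% | - | 24.0 | Non-commercial |
| WizardLM-13B-V1.1 | 6.76 | 86.32% | - | 25.0 | Non-commercial |
| WizardLM-13B-V1.2 | 7.06 | 89.17% | 55.3% | 36.6 | Llama 2 License |
| WizardLM-30B-V1.0 | 7.01 | - | - | 37.8 | Non-commercial |
| WizardLM-70B-V1.0 | 7.78 | 92.91% | 77.6% | 50.6 | Llama 2 License |

A dash (-) indicates that no score is listed.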

WizardCoder

WizardCoder is the result of adapting the Evol-Instruct method to code. Most existing models performing code-related tasks are pre-trained solely on extensive raw code data without instruction finetuning. In a paper released in June 2023, the WizardLM team demonstrated the capabilities of WizardCoder and the extension of Evol-Instruct to code-related instructions.

The table below lists the released WizardCoder models with their evaluation results and licenses. Models are evaluated on OpenAI's HumanEval dataset, which consists of 164 programming challenges, and on the MBPP (Mostly Basic Python Problems) benchmark, which consists of around 1,000 crowd-sourced Python programming problems.

WizardCoder models

| Model | HumanEval | MBPP | License |
| --- | --- | --- | --- |
| WizardCoder-15B-V1.0 | 59.8 | 50.6 | OpenRAIL-M |
| WizardCoder-1B-V1.0 | 23.8 | 28.6 | OpenRAIL-M |
| WizardCoder-3B-V1.0 | 34.8 | 37.4 | OpenRAIL-M |
| WizardCoder-Python-13B-V1.0 | 64.0 | 55.6 | Llama 2 |
| WizardCoder-Python-34B-V1.0 | 73.2 | 61.2 | Llama 2 |

WizardMath

WizardMath, first described in an August 2023 paper, is a version of Llama 2 fine-tuned for math tasks using Reinforcement Learning from Evol-Instruct Feedback (RLEIF), a method proposed by the team that applies Evol-Instruct to math instructions. The motivation is that most open-source models are only pre-trained on large-scale internet data, without specific math-related optimization.

The table below lists the released WizardMath models with their evaluation results and licenses. Models are evaluated on the GSM8k and MATH benchmarks.

WizardMath models

| Model | GSM8k | MATH | License |
| --- | --- | --- | --- |
| WizardMath-13B-V1.0 | 63.9 | 14.0 | Llama 2 |
| WizardMath-70B-V1.0 | 81.6 | 22.7 | Llama 2 |
| WizardMath-7B-V1.0 | 54.9 | 10.7 | Llama 2 |

Further Resources

| Title | Author | Link | Date |
| --- | --- | --- | --- |
| WizardCoder: Empowering Code Large Language Models with Evol-Instruct | Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang | https://arxiv.org/abs/2306.08568 | June 14, 2023 |
| WizardLM: Empowering Large Language Models to Follow Complex Instructions | Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Daxin Jiang | https://arxiv.org/abs/2304.12244 | April 24, 2023 |
| WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang | https://arxiv.org/abs/2308.09583 | August 18, 2023 |