JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars
JetMoE-8B was trained for less than $0.1 million (*) yet outperforms LLaMA2-7B from Meta AI, a company with multi-billion-dollar training resources. LLM training can be much cheaper than previously thought.
It is fully open-sourced and academia-friendly because:
It only uses public datasets for training, and the code is open-sourced. No proprietary resources are needed.
It can be fine-tuned on a very limited compute budget (e.g., a consumer-grade GPU) that most labs can afford.
JetMoE-8B has only 2.2B active parameters during inference, which drastically lowers the computational cost. Compared to a model with similar inference computation, such as Gemma-2B, JetMoE-8B achieves consistently better performance.
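To illustrate why only a fraction of the parameters is active per token, here is a generic sketch of sparse top-k expert routing in PyTorch. This is not JetMoE's actual implementation (JetMoE also applies sparse routing to attention, which is omitted here), and the layer sizes, expert count, and top-k value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy top-k routed mixture of MLP experts (illustrative sizes, not JetMoE's exact code)."""
    def __init__(self, dim=1024, hidden=4096, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (num_tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # only the selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(4, 1024)                      # each token touches only 2 of the 8 experts
print(SparseMoE()(tokens).shape)                   # torch.Size([4, 1024])
```

Because each token activates only top_k of the num_experts MLPs, the per-token compute scales with the active parameters rather than the total parameter count.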
(*) We used a 96×H100 GPU cluster for 2 weeks, which cost ~$0.08 million.
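As a rough sanity check (the hourly rate is our assumption, not from the source): 96 GPUs × 14 days × 24 hours ≈ 32,000 GPU-hours, which at roughly $2.5 per H100-hour comes to about $0.08 million.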
We use the same evaluation methodology as the Open LLM Leaderboard. For the MBPP code benchmark, we follow the evaluation methodology of the LLaMA2 and DeepSeekMoE papers (an example harness invocation is sketched after the table). The results are shown below:
| Model | Active Params | Training Tokens | Open LLM Leaderboard Avg | ARC | HellaSwag | MMLU | TruthfulQA | WinoGrande | GSM8K | MBPP | HumanEval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shot | | | | 25 | 10 | 5 | 0 | 5 | 5 | 3 | 0 |
| Metric | | | | acc_norm | acc_norm | acc | mc2 | acc | acc | Pass@1 | Pass@1 |
| LLaMA2-7B | 7B | 2T | 51.0 | 53.1 | 78.6 | 46.9 | 38.8 | 74 | 14.5 | 20.8 | 12.8 |
| LLaMA-13B | 13B | 1T | 51.4 | 56.2 | 80.9 | 47.7 | 39.5 | 76.2 | 7.6 | 22.0 | 15.8 |
| DeepseekMoE-16B | 2.8B | 2T | 51.1 | 53.2 | 79.8 | 46.3 | 36.1 | 73.7 | 17.3 | 34.0 | 25.0 |
| Gemma-2B | 2B | 2T | 46.4 | 48.4 | 71.8 | 41.8 | 33.1 | 66.3 | 16.9 | 28.0 | 24.4 |
| JetMoE-8B | 2.2B | 1.25T | 53.0 | 48.7 | 80.5 | 49.2 | 41.7 | 70.2 | 27.8 | 34.2 | 14.6 |
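As a concrete illustration of the leaderboard-style setup described above, here is a hypothetical invocation of EleutherAI's lm-evaluation-harness for a single task (25-shot ARC). The task name and flags follow the v0.4-style CLI and may differ across harness versions; this is not the authors' documented command.

```bash
# Hypothetical lm-evaluation-harness run (25-shot ARC); flags and task names vary by version.
lm_eval --model hf \
    --model_args pretrained=jetmoe/jetmoe-8b \
    --tasks arc_challenge \
    --num_fewshot 25 \
    --batch_size 8
```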
MT-Bench scores:

| Model | MT-Bench Score |
|---|---|
| GPT-4 | 9.014 |
| GPT-3.5-turbo | 7.995 |
| Claude-v1 | 7.923 |
| JetMoE-8B-chat | 6.681 |
| Llama-2-13b-chat | 6.650 |
| Vicuna-13b-v1.3 | 6.413 |
| Wizardlm-13b | 6.353 |
| Llama-2-7b-chat | 6.269 |
To our surprise, despite the lower training cost and computation, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B. Compared to a model with similar training and inference computation, like Gemma-2B, JetMoE-8B achieves better performance.
To load the models, you need to install this package:
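The original install command did not survive here; a reasonable assumption is installing the jetmoe package directly from its GitHub repository:

```bash
# Assumed install path (the JetMoE code is hosted at https://github.com/myshell-ai/JetMoE);
# adjust if the package is distributed differently, e.g. a local `pip install -e .` from a clone.
pip install git+https://github.com/myshell-ai/JetMoE.git
```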
Then you can load the model with the following code:
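The original snippet did not survive extraction; below is a minimal sketch using the Hugging Face transformers API, assuming the jetmoe/jetmoe-8b checkpoint and trust_remote_code support. The authors' original snippet may instead register custom JetMoE model classes from the jetmoe package.

```python
# Minimal loading sketch (assumed checkpoint name and API usage, not the authors' exact snippet).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained("jetmoe/jetmoe-8b", trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```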
Please refer to the technical report for model and training details.
If you have great ideas but need more resources (GPU, data, funding, etc.), feel free to contact MyShell.ai. MyShell.ai is open to collaborations and is actively supporting high-quality open-source projects.