JetMoE

JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars


JetMoE-8B was trained at a cost of less than $0.1 million (*) yet outperforms LLaMA2-7B from Meta AI, which had multi-billion-dollar training resources. LLM training can be much cheaper than previously thought.

It is fully open-sourced and academia-friendly because:

  • It uses only public datasets for training, and the code is open-sourced. No proprietary resources are needed.

  • It can be fine-tuned on a very limited compute budget (e.g., a consumer-grade GPU) that most labs can afford.

JetMoE-8B has only 2.2B active parameters during inference, which drastically lowers the computational cost. Compared to a model with similar inference computation, such as Gemma-2B, JetMoE-8B achieves consistently better performance.
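A common rule of thumb estimates forward-pass compute at roughly 2 FLOPs per active parameter per token, so the "active params" figure translates directly into inference cost. A quick illustrative comparison under that approximation (real cost also depends on sequence length, attention, and implementation details):

```python
# Rough per-token inference FLOPs using the common ~2 * active_params
# rule of thumb (an approximation, not an exact cost model).
def flops_per_token(active_params: float) -> float:
    return 2 * active_params

jetmoe = flops_per_token(2.2e9)  # JetMoE-8B: 2.2B active parameters
llama2 = flops_per_token(7.0e9)  # LLaMA2-7B: all 7B parameters active

print(f"JetMoE-8B : {jetmoe:.1e} FLOPs/token")
print(f"LLaMA2-7B : {llama2:.1e} FLOPs/token")
print(f"ratio     : {jetmoe / llama2:.2f}")  # ~0.31x the dense-model cost
```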

(*) We used a 96×H100 GPU cluster for 2 weeks, which cost ~$0.08 million.
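The footnote's figure is easy to sanity-check with back-of-the-envelope arithmetic; assuming a hypothetical rate of $2.50 per H100 GPU-hour (an illustrative number, not a quoted price):

```python
# Back-of-the-envelope training cost for 96 H100s over 2 weeks,
# assuming a hypothetical $2.50 per GPU-hour rate.
gpus = 96
hours = 14 * 24        # 2 weeks of wall-clock time
rate = 2.50            # USD per GPU-hour (assumed)

cost = gpus * hours * rate
print(f"${cost:,.0f}")  # $80,640 -> ~$0.08 million
```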

Benchmarks

We use the same evaluation methodology as the Open LLM Leaderboard. For the MBPP code benchmark, we use the same evaluation methodology as the LLaMA2 and DeepSeek-MoE papers. The results are shown below:

| Model | Active Params | Training Tokens | Open LLM Leaderboard Avg | ARC | Hellaswag | MMLU | TruthfulQA | WinoGrande | GSM8k | MBPP | HumanEval |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shot | | | | 25 | 10 | 5 | 0 | 5 | 5 | 3 | 0 |
| Metric | | | | acc_norm | acc_norm | acc | mc2 | acc | acc | Pass@1 | Pass@1 |
| LLaMA2-7B | 7B | 2T | 51.0 | 53.1 | 78.6 | 46.9 | 38.8 | 74.0 | 14.5 | 20.8 | 12.8 |
| LLaMA-13B | 13B | 1T | 51.4 | 56.2 | 80.9 | 47.7 | 39.5 | 76.2 | 7.6 | 22.0 | 15.8 |
| DeepseekMoE-16B | 2.8B | 2T | 51.1 | 53.2 | 79.8 | 46.3 | 36.1 | 73.7 | 17.3 | 34.0 | 25.0 |
| Gemma-2B | 2B | 2T | 46.4 | 48.4 | 71.8 | 41.8 | 33.1 | 66.3 | 16.9 | 28.0 | 24.4 |
| JetMoE-8B | 2.2B | 1.25T | 53.0 | 48.7 | 80.5 | 49.2 | 41.7 | 70.2 | 27.8 | 34.2 | 14.6 |

| Model | MT-Bench Score |
|---|---|
| GPT-4 | 9.014 |
| GPT-3.5-turbo | 7.995 |
| Claude-v1 | 7.923 |
| JetMoE-8B-chat | 6.681 |
| Llama-2-13b-chat | 6.650 |
| Vicuna-13b-v1.3 | 6.413 |
| Wizardlm-13b | 6.353 |
| Llama-2-7b-chat | 6.269 |

To our surprise, despite its lower training cost and computation, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B. Compared to a model with similar training and inference computation, such as Gemma-2B, JetMoE-8B achieves better performance.
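The "Open LLM Leaderboard Avg" column is simply the mean of the six leaderboard benchmarks (ARC, Hellaswag, MMLU, TruthfulQA, WinoGrande, GSM8k); a quick sanity check on JetMoE-8B's row reproduces the reported value:

```python
# Open LLM Leaderboard average = mean of the six benchmark scores.
jetmoe_scores = {
    "ARC": 48.7, "Hellaswag": 80.5, "MMLU": 49.2,
    "TruthfulQA": 41.7, "WinoGrande": 70.2, "GSM8k": 27.8,
}
avg = sum(jetmoe_scores.values()) / len(jetmoe_scores)
print(round(avg, 1))  # 53.0, matching the table
```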

Model Usage

To load the models, you first need to install this package:

pip install -e .

Then you can load the model with the following code:

from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, AutoModelForSequenceClassification
from jetmoe import JetMoEForCausalLM, JetMoEConfig, JetMoEForSequenceClassification

# Register the JetMoE architecture with the transformers Auto* classes
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)
AutoModelForSequenceClassification.register(JetMoEConfig, JetMoEForSequenceClassification)

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')

Model Details

Please refer to the technical report for model and training details.

Collaboration

If you have great ideas but need more resources (GPU, data, funding, etc.), feel free to contact MyShell.ai via . MyShell.ai is open to collaboration and actively supports high-quality open-source projects.

  • GitHub
  • Website
  • HuggingFace
  • Online Demo on Lepton AI
  • Technical Report: https://arxiv.org/pdf/2404.07413.pdf

Zengyi Qin