JetMoE

JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars


GitHub / Website / HuggingFace / Online Demo on Lepton AI / Technical Report

JetMoE-8B was trained at a cost of less than $0.1 million(*), yet it outperforms LLaMA2-7B from Meta AI, a company with multi-billion-dollar training resources. LLM training can be much cheaper than people previously thought.

It is fully open-source and academia-friendly because:

  • It only uses public datasets for training, and the code is open-sourced. No proprietary resources are needed.

  • It can be finetuned with a very limited compute budget (e.g., a consumer-grade GPU) that most labs can afford; a finetuning sketch follows this list.
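
To illustrate the second bullet (this is an assumption about tooling, not the authors' recipe), a parameter-efficient LoRA setup with the peft library is one common way to finetune an 8B model on a single consumer GPU. The `target_modules` names below are placeholders, and the loading step assumes the jetmoe class registration shown in Model Usage further down:

```python
# Illustrative sketch only: LoRA finetuning setup via the peft library, so
# most of JetMoE-8B's weights stay frozen and the job fits on a consumer GPU.
# Assumes the jetmoe class registration from the "Model Usage" section below.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained(
    "jetmoe/jetmoe-8b",
    torch_dtype=torch.bfloat16,  # half precision to roughly halve memory use
)

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # placeholders: inspect print(model)
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapters require gradients
```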

JetMoE-8B has only 2.2B active parameters during inference, which drastically lowers the computational cost. Compared with a model of similar inference computation, such as Gemma-2B, JetMoE-8B achieves consistently better performance.

(*) We used a 96×H100 GPU cluster for 2 weeks, which cost ~$0.08 million (roughly 32,000 GPU-hours, i.e. about $2.5 per H100-hour).

Benchmarks

We use the same evaluation methodology as in the Open LLM Leaderboard. For the MBPP code benchmark, we use the same evaluation methodology as in the LLaMA2 and DeepSeek-MoE papers. The results are shown below:

| Model | Active Params | Training Tokens | Open LLM Leaderboard Avg | ARC | Hellaswag | MMLU | TruthfulQA | WinoGrande | GSM8k | MBPP | HumanEval |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Shot | | | | 25 | 10 | 5 | 0 | 5 | 5 | 3 | 0 |
| Metric | | | | acc_norm | acc_norm | acc | mc2 | acc | acc | Pass@1 | Pass@1 |
| LLaMA2-7B | 7B | 2T | 51.0 | 53.1 | 78.6 | 46.9 | 38.8 | 74 | 14.5 | 20.8 | 12.8 |
| LLaMA-13B | 13B | 1T | 51.4 | 56.2 | 80.9 | 47.7 | 39.5 | 76.2 | 7.6 | 22.0 | 15.8 |
| DeepseekMoE-16B | 2.8B | 2T | 51.1 | 53.2 | 79.8 | 46.3 | 36.1 | 73.7 | 17.3 | 34.0 | 25.0 |
| Gemma-2B | 2B | 2T | 46.4 | 48.4 | 71.8 | 41.8 | 33.1 | 66.3 | 16.9 | 28.0 | 24.4 |
| JetMoE-8B | 2.2B | 1.25T | 53.0 | 48.7 | 80.5 | 49.2 | 41.7 | 70.2 | 27.8 | 34.2 | 14.6 |
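
As a reproducibility aid (an assumption about tooling, not the authors' exact setup), leaderboard-style numbers like these can be approximated with EleutherAI's lm-evaluation-harness, which backs the Open LLM Leaderboard. Each benchmark uses its own shot count from the Shot row, so one call per task is needed; the task name and `trust_remote_code` flag below are illustrative:

```python
# Hypothetical sketch: scoring one leaderboard task with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). Run one call per benchmark,
# matching the shot count in the table's Shot row.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jetmoe/jetmoe-8b,trust_remote_code=True",
    tasks=["arc_challenge"],  # the ARC column
    num_fewshot=25,           # 25-shot, per the Shot row
)
print(results["results"])
```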

We also report MT-Bench scores for the chat-tuned model:

| Model | MT-Bench Score |
| --- | --- |
| GPT-4 | 9.014 |
| GPT-3.5-turbo | 7.995 |
| Claude-v1 | 7.923 |
| JetMoE-8B-chat | 6.681 |
| Llama-2-13b-chat | 6.650 |
| Vicuna-13b-v1.3 | 6.413 |
| Wizardlm-13b | 6.353 |
| Llama-2-7b-chat | 6.269 |

To our surprise, despite the lower training cost and computation, JetMoE-8B performs even better than LLaMA2-7B, LLaMA-13B, and DeepseekMoE-16B. Compared to a model with similar training and inference computation, like Gemma-2B, JetMoE-8B achieves better performance.

Model Usage

To load the models, you need to install this package first (run it from the root of the JetMoE repository, since `pip install -e .` installs the current directory):

```bash
pip install -e .
```

Then you can load the model with the following code:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig, AutoModelForSequenceClassification
from jetmoe import JetMoEForCausalLM, JetMoEConfig, JetMoEForSequenceClassification

# Register the JetMoE classes so that the transformers Auto* loaders
# recognize the "jetmoe" model type.
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)
AutoModelForSequenceClassification.register(JetMoEConfig, JetMoEForSequenceClassification)

# Load the tokenizer and the base model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
```
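
The snippet above stops at loading. As a quick smoke test, a minimal generation sketch using the standard transformers generate API could look like this (the prompt and decoding settings are illustrative, not from the original page):

```python
# Minimal sketch: greedy decoding with the tokenizer/model loaded above.
inputs = tokenizer("The key advantage of mixture-of-experts models is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```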

Model Details

Please refer to the technical report (https://arxiv.org/pdf/2404.07413.pdf) for model and training details.

Collaboration

If you have great ideas but need more resources (GPU, data, funding, etc.), you are welcome to contact MyShell.ai via Zengyi Qin. MyShell.ai is open to collaboration and actively supports high-quality open-source projects.