
Crawler Widget

Last updated 4 months ago

In this section, we will build a webpage summarizer agent using the Crawler widget in pro config. The Crawler widget crawls a webpage and returns its raw content. We can then pass this content to an LLM such as GPT-3.5 or GPT-4 to summarize it in a couple of paragraphs.

Before we build the summarizer using pro config, let’s take a look at the Crawler widget specifically.

Using the Crawler Widget

Head to the Crawler widget page on the MyShell app. You should see a screen similar to the following:

Crawler widget page on MyShell.

Now, click on Start and you should see a form similar to the following:

Input form for Crawler widget.

Response from the Crawler widget.

This doesn’t look pretty. Worry not! We just wanted to show you what the response of the Crawler widget looks like. It is a JSON object with a markdown_string property that holds the content of the webpage, along with other details. It’s always good practice to see how a widget responds before using it in pro config.
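
To make the response shape concrete, here is a minimal Python sketch of pulling the page content out of such a response. Only the markdown_string property comes from the response described above; the sample values and the other keys (url, title) are hypothetical placeholders, not the widget's actual output.

```python
import json

# Hypothetical sample shaped like the Crawler widget response.
# Only "markdown_string" is named in this tutorial; the other keys
# and all values are made up for illustration.
sample_response = """
{
  "url": "https://docs.myshell.ai",
  "title": "Example page",
  "markdown_string": "# Example page\\n\\nSome page content."
}
"""


def extract_markdown(raw_response: str) -> str:
    """Return the page content stored under markdown_string."""
    data = json.loads(raw_response)
    # Fail loudly if the property is missing rather than handing
    # an empty prompt to an LLM later on.
    if "markdown_string" not in data:
        raise KeyError("response has no markdown_string property")
    return data["markdown_string"]


content = extract_markdown(sample_response)
print(content.splitlines()[0])
```

Inspecting a response this way before wiring the widget into pro config helps you decide which parts of it are worth passing to the LLM.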

Now, let’s use this widget in pro config and use an LLM to summarize this JSON for us.

Using Crawler widget in Pro Config

Using this widget in pro config is quite simple, because its only input is the URL. The module configuration for this widget should look like this:

{
  "widget_id": "1781991963803181056",
  "url": "<url_goes_here>",
  "output_name": "<variable_name_goes_here>"
}
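
As a rough illustration of how this fragment slots into a task, the Python sketch below assembles a full task entry around it. The field names and the widget ID are taken from this tutorial; the helper name crawler_task is hypothetical and not part of any MyShell tooling.

```python
import json

# Crawler widget ID as used in this tutorial.
CRAWLER_WIDGET_ID = "1781991963803181056"


def crawler_task(name: str, url: str, output_name: str) -> dict:
    """Build one pro config task that calls the Crawler widget.

    Hypothetical helper for illustration only. `url` may be a literal
    URL or a pro config expression such as "{{ url }}".
    """
    return {
        "name": name,
        "module_type": "AnyWidgetModule",
        "module_config": {
            "widget_id": CRAWLER_WIDGET_ID,
            "url": url,
            "output_name": output_name,
        },
    }


task = crawler_task("crawl_website", "{{ url }}", "crawled_content")
print(json.dumps(task, indent=2))
```

Generating the task from a helper like this is optional; it is simply a way to avoid retyping the widget ID when several states crawl pages.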

Now, let's use this configuration in a pro config boilerplate:

{
  "type": "automata",
  "id": "web-page-summarizer",
  "initial": "homepage",
  "properties": {
    "cache": true
  },
  "states": {
    "homepage": {
      "inputs": {
        "url": {
          "type": "text",
          "user_input": true
        }
      },
      "tasks": [
        {
          "name": "crawl_website",
          "module_type": "AnyWidgetModule",
          "module_config": {
            "widget_id": "1781991963803181056",
            "url": "{{ url }}",
            "output_name": "crawled_content"
          }
        },
        {
          "name": "summarize_crawled_website",
          "module_type": "AnyWidgetModule",
          "module_config": {
            "widget_id": "1744214047475109888",
            "system_prompt": "Provided is a crawled webpage, provide the user with the summary of the content of the webpage. Avoid mentioning about HTML specific things and stick to content summaries.",
            "user_prompt": "{{ JSON.stringify(crawled_content) }}",
            "output_name": "summary"
          }
        }
      ],
      "render": {
        "text": "{{ summary }}",
        "buttons": [
          {
            "content": "Summarize another page",
            "on_click": "go_to_homepage"
          }
        ]
      },
      "transitions": {
        "go_to_homepage": "homepage"
      }
    }
  }
}

In the above code, we are doing the following:

  • Asking the user for a URL to crawl through an input prompt.

  • Running the first task, which crawls the provided URL using the Crawler widget (widget ID: 1781991963803181056) and saves the response object in a variable called crawled_content.

  • Running the second task, which converts the crawled_content object to a JSON string, passes it as the user prompt to the GPT-4 widget (widget ID: 1744214047475109888), and asks it to summarize the webpage. The output from GPT-4 is saved in summary.

  • Outputting the summary as a chat message through render.
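
The data flow in these steps can be sketched in plain Python. This is only an illustration of the order of operations, not how MyShell executes a pro config: fake_crawl and fake_summarize are hypothetical stand-ins for the Crawler and GPT-4 widgets, and json.dumps plays the role of JSON.stringify.

```python
import json
from typing import Callable


def run_homepage_state(
    url: str,
    crawl: Callable[[str], dict],
    summarize: Callable[[str, str], str],
) -> str:
    """Mirror the two tasks of the homepage state and return the render text."""
    # Task 1: crawl the URL; the result plays the role of crawled_content.
    crawled_content = crawl(url)
    # Task 2: stringify the object and send it as the user prompt.
    system_prompt = "Summarize the content of the crawled webpage."
    user_prompt = json.dumps(crawled_content)
    summary = summarize(system_prompt, user_prompt)
    # Render: the summary becomes the chat message text.
    return summary


# Hypothetical stand-ins for the two widgets, which really run on MyShell.
def fake_crawl(url: str) -> dict:
    return {"markdown_string": f"# Page at {url}\n\nHello from the page."}


def fake_summarize(system_prompt: str, user_prompt: str) -> str:
    page = json.loads(user_prompt)
    first_line = page["markdown_string"].splitlines()[0]
    return f"Summary of: {first_line.lstrip('# ')}"


print(run_homepage_state("https://docs.myshell.ai", fake_crawl, fake_summarize))
```

The point of the sketch is the ordering: the second task can only run once the first has stored its output, which is why crawl_website comes before summarize_crawled_website in the tasks array.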

Note that the pro config above sets "cache": true under properties: we are using cache mode during testing, which is helpful for debugging. If you want to learn more, check the cache mode page in the documentation.

Earlier, on the Crawler widget page, we entered a URL (“https://docs.myshell.ai”, the MyShell documentation homepage), clicked Generate, and after a few seconds of loading got the raw JSON response shown earlier. Now, if you run the pro config and pass the same URL, you should see a proper summary of the page instead:

Response from the webpage summarizer agent.

Of course, you can play around with different LLMs and optimize how they work to get better results that match your expectations. This tutorial aimed to help you get to the finish line; feel free to add levels of complexity to this agent and make the ideal webpage summarizer!

Conclusion

The Crawler widget is a powerful widget that lets you access external data and process it in your pro config.