MyShell
  • About MyShell
    • What is MyShell
    • MyShell in a Nutshell
    • Quickstart
  • Explore AI Agents
    • Image Generation
    • Video Generation
    • Meme Generation
    • Role-Playing Game
    • Character
    • Utility
  • Create AI Agents
    • Classic Mode
      • Enhanced Prompt
      • Knowledge Base
      • Telegram Integration
    • Pro Config Mode
      • Core Concepts
      • Tutorial
        • Tutorial Structure
        • Hello World with Pro Config
        • Building Workflow
        • Transitions
        • Expressions and Variables
        • Integration with Any Widget
        • An Advanced Example
      • Basic
        • Common
        • Atomic State
        • Transition
        • Automata
        • Modules
      • Advanced
        • Cron Pusher
        • Neutral Language To SD Prompt
        • Advanced Input Validation
        • Advanced Memory Manager in Prompt Widget
      • Tools
        • AutoConfig Agent
        • Cache Mode
        • Knowledge Base Agent
        • Crawler Widget
      • Example
        • Homeless With You
        • Random Routing
        • Function Calling
      • API Reference
        • Atomic State
        • Transition
        • Automata
        • Context
        • Module
          • AnyWidget Module
            • Prompt Widget
            • LLM Widget
            • TTS Widget
            • Code Runner Widget
            • Melo TTS
            • Age Transformation
            • ChatImg
            • GIF Generation
            • Music Generation
          • LLM Module
          • LLM Function Module
          • TTS Module
          • Google Search Module
        • Widgets
          • Bark TTS
          • Champ
          • CoinGecko
          • ControlNet with Civitai
          • Crawler
          • Crypto News
          • Data Visualizer
          • Email Sender
          • Google Flight Search
          • Google Hotel Search
          • Google Image Search
          • Google Map Search
          • Google News Search
          • Google Scholar Search
          • Google Search
          • GroundedSAM
          • Image Text Fuser
          • Information Extractor - OpenAI Schema Generator
          • Information Extractor
          • Instagram Search
          • JSON to Table
          • LinkedIn
          • MS Word to Markdown
          • Markdown to MS Word
          • Markdown to PDF
          • Mindmap Generator
          • Notion Database
          • OCR
          • Pdf to Markdown
          • RMBG
          • Stabel-Video-Diffusion
          • Stable Diffusion Inpaint
          • Stable Diffusion Recommend
          • Stable Diffusion Transform
          • Stable Diffusion Upscale
          • Stable Diffusion with 6 fixed category
          • Stable Diffusion with Civitai
          • Storydiffusion
          • Suno Lyrics Generator
          • Suno Music Generator
          • Table to Markdown
          • TripAdvisor
          • Twitter Search
          • UDOP: Document Question Answering
          • Weather forecasting
          • Whisper large-v3
          • Wikipedia
          • Wolfram Alpha Search
          • Yelp Search
          • YouTube Downloader
          • YouTube Transcriber
          • Youtube Search
      • FAQs
      • Changelog
    • ShellAgent Mode
      • Download and Installation
      • App Builder
      • Workflow
      • Build Custom Widget
      • Publish to MyShell
      • Customized Pricing For Your Agent
      • Example
        • Child Book X Agent w/ DeepSeek
        • Kids Book NFT AI Agent w/ BNB Chain
        • DeFAI Agent w/ BNB Chain
  • Shell Launchpad
    • How to Launch a Token
    • Trade Agent Tokens
  • Tokenomics
    • $SHELL Basics
    • $SHELL Token Utility
    • How to Obtain $SHELL
    • Roadmap
  • Open-source AI Framework/SDK
    • ShellAgent
    • OpenVoice
    • MeloTTS
    • JetMoE
    • AIlice
  • Links
Powered by GitBook
On this page
  • Try it in the Widget Center
  • Usage
  • Detailed Guidelines
  1. Create AI Agents
  2. Pro Config Mode
  3. API Reference
  4. Widgets

UDOP: Document Question Answering

UDOP, a unified model for document classification, layout parsing and visual question answering by Microsoft.

PreviousTwitter SearchNextWeather forecasting

Last updated 1 year ago

Try it in the Widget Center

Click this to try this widget and copy the Pro Config template.

Usage

<TODO: enter description here, and remove useless inputs>

Input Parameters

Name
Type
Description
Default
Required

image

string

image url for udop to understand.

prompt

string

The text prompt in natural language. You can use `Question answering. xxxxxx` to ask question about the input image

Question answering. In which date is the report made?

Output Parameters

Name
Type
Description
File Type

response

string

The answer given. According to the image and prompt

Output Example

{
   "response": "9/3/92"  // example input image=https://replicate.delivery/pbxt/KZczgVh1gAp7xfPP79GdZoGdKEoekcLPiqSqE6bEgM5pThGD/example_1.jpg  prompt=Question answering. In which date is the report made?
}
{
   "results": "<the example results of this widget>"
}

Detailed Guidelines

url
default_url