# Pdf to Markdown

## Try it in the Widget Center

Open the [widget page](https://app.myshell.ai/robot-workshop/widget/1781991889463336960) to try this widget and copy its Pro Config template.

## Usage

Converts the input PDF file into a Markdown string.

**Input Parameters**

<table><thead><tr><th>Name</th><th>Type</th><th>Description</th><th>Default</th><th data-type="checkbox">Required</th></tr></thead><tbody><tr><td>document</td><td><code>string</code></td><td>The input file (PDF, EPUB, MOBI, XPS, or FB2). The PDF must not contain blank pages.</td><td></td><td>true</td></tr><tr><td>page_range</td><td><code>integer</code></td><td>The last page to parse. Defaults to -1, which parses all pages.</td><td>-1</td><td>false</td></tr><tr><td>parallel_factor</td><td><code>integer</code></td><td>The parallel factor to use for OCR.</td><td>1</td><td>false</td></tr><tr><td>lang</td><td><code>string</code></td><td>The language to use for OCR.</td><td>English</td><td>false</td></tr></tbody></table>
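As a rough illustration of how the defaults and the required flag in the table combine, the sketch below assembles an input dictionary in Python. The helper `build_inputs` and the example URL are hypothetical and not part of the widget or of Pro Config. The sketch also assumes that `document` is passed as a URL or path string, which the table does not state.

```python
from typing import Any, Dict

# Defaults copied from the Input Parameters table.
DEFAULTS: Dict[str, Any] = {
    "page_range": -1,       # -1 means "parse all pages"
    "parallel_factor": 1,   # parallel factor used for OCR
    "lang": "English",      # OCR language
}


def build_inputs(document: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge caller overrides with the documented defaults."""
    if not document:
        # `document` is the only parameter marked as required.
        raise ValueError("'document' is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {"document": document, **DEFAULTS, **overrides}
    for name in ("page_range", "parallel_factor"):
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"'{name}' must be an integer")
    return params


# Assumed example value; replace with your own file reference.
inputs = build_inputs("https://example.com/sample.pdf", page_range=3)
print(inputs)
```

Only the parameters that differ from the defaults need to be supplied; everything else falls back to the values listed in the table.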

**Output Parameters**

| Name             | Type     | Description                      | File Type |
| ---------------- | -------- | -------------------------------- | --------- |
| markdown\_string | `string` | The markdown that was generated. |           |
| metadata\_string | `string` | The metadata of the PDF file.    |           |

**Output Example**

{% tabs %}
{% tab title="success" %}
{% code fullWidth="false" %}

```json
{
  "markdown_string": "\n## An H1 Header\n\nParagraphs are separated by a blank line.\n\n2nd paragraph. Italic, **bold**, and monospace. Itemized lists look like:\nthis one that one the other one Note that - not considering the asterisk - the actual text content starts at 4- columns in.\n\nBlock quotes are written like so. They can span multiple paragraphs, if you like.\n\nUse 3 dashes for an em-dash. Use 2 dashes for ranges (ex., “it’s all in chapters 12– 14”). Three dots … will be converted to an ellipsis. Unicode is supported. ☺\n\n## An H2 Header\n\nHere’s a numbered list:\n\n1. first item 2. second item 3. third item\nNote again how the actual text starts at 4 columns in (4 characters from the left side). Here’s a code sample:\n# Let me re-iterate ... for i in 1 .. 10 { do-something(i) }\nAs you probably guessed, indented 4 spaces. By the way, instead of indenting the block, you can use delimited blocks, if you like:\ndefine foobar() {\n    print \"Welcome to flavor country!\"; }\n(which makes copying \u0026 pasting easier). You can optionally mark the delimited block for Pandoc to syntax highlight it:\nimport time\n# Quick, count to ten!\n\nfor i in range(10): # (but not *too* quick) time.sleep(0.5) print i\n\n## An H3 Header\n\nNow a nested list:\n1. First, get these ingredients:\ncarrots celery lentils\n\n2. Boil some water. 3. Dump everything in the pot and follow this algorithm:\nfind wooden spoon uncover pot stir cover pot balance wooden spoon precariously on pot handle wait 10 minutes goto first step (or shut off burner when done)\nDo not bump wooden spoon or it will fall.\n\nNotice again how text always lines up on 4-space indents (including that last line which continues item 3 above). Here’s a link to a website, to a local doc, and to a section heading in the current doc. Here’s a footnote 1.\n\nTables can look like this:\nShoes, their sizes, and what they’re made of size material color\n9\nleather brown\n10\nhemp canvas natural\n11\nglass transparent\n(The above is the caption for the table.) Pandoc also supports multi-line tables:\n\n| keyword                   |\n|---------------------------|\n| red                       |\n| Sunsets, apples, and      |\n| other red or reddish      |\n| things.                   |\n| green                     |\n| Leaves, grass, frogs      |\n| and other things it’s not |\n| easy being.               |\n\nA horizontal rule follows. Here’s a definition list: apples Good for making applesauce. oranges Citrus! tomatoes There’s no “e” in tomatoe.\n\nAgain, text is indented 4 spaces. (Put a blank line between each term/definition pair to spread things out more.) Here’s a “line block”: Line one Line too Line tree and images can be specified like so:\n\n## Example Image\n\nInline math equations go in like so: ω = dϕ/dt. Display math should get its own line and be put in in double-dollarsigns:\n\n## I = ∫Ρr2Dv\n\nAnd note that you can backslash-escape any punctuation characters which you wish to be displayed literally, ex.: `foo`, *bar*, etc.\n\n1. Footnote text goes here.↩︎",
  "metadata_string": "{\"language\": \"English\", \"filetype\": \"pdf\", \"toc\": [[1, \"An h1 header\", 1], [2, \"An h2 header\", 1], [3, \"An h3 header\", 1]], \"pages\": 3, \"ocr_stats\": {\"ocr_pages\": 0, \"ocr_failed\": 0, \"ocr_success\": 0}, \"block_stats\": {\"header_footer\": 0, \"code\": 0, \"table\": 1, \"equations\": {\"successful_ocr\": 0, \"unsuccessful_ocr\": 0, \"equations\": 0}}, \"postprocess_stats\": {\"edit\": {}}}"
}
```

{% endcode %}
{% endtab %}
{% endtabs %}
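In the example above, `metadata_string` is itself a JSON document serialized into a string, so reading fields such as `pages` or `toc` takes a second decoding step. The following Python sketch shows one way to do that on an abbreviated stand-in for the response. The interpretation of each `toc` entry as `[level, title, page]` is an assumption based on the sample values, not something this page specifies.

```python
import json

# Abbreviated stand-in for the widget response; the real output is much longer.
raw_response = json.dumps({
    "markdown_string": (
        "\n## An H1 Header\n\nParagraphs are separated by a blank line."
        "\n\n## An H2 Header\n"
    ),
    "metadata_string": json.dumps({
        "language": "English",
        "filetype": "pdf",
        "toc": [[1, "An h1 header", 1], [2, "An h2 header", 1]],
        "pages": 3,
    }),
})

# First decode: the response object with two string fields.
result = json.loads(raw_response)

# Second decode: metadata_string is a JSON document stored as a string.
metadata = json.loads(result["metadata_string"])

# Collect the "## " headings found in the generated Markdown.
headings = [
    line[3:]
    for line in result["markdown_string"].splitlines()
    if line.startswith("## ")
]

# Assumed layout of each toc entry: [level, title, page].
toc_titles = [title for _level, title, _page in metadata["toc"]]

print(metadata["pages"], headings, toc_titles)
```

Note that in the sample output every heading is emitted at the `##` level regardless of its level in `toc`, so the `toc` entries are the more reliable source for the heading hierarchy.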


