Whisper large-v3

Transcribe audio from a URL using faster whisper-large-v3, without any post-processing.

Try it in the Widget Center

Click this url to try this widget and copy the Pro Config template.

Usage


Input Parameters

| Name | Type | Description | Default | Required |
| --- | --- | --- | --- | --- |
| voice_url | string | The URL of the audio. | | |
| chunk_length_s | integer | The length, in seconds, of each chunk sent to Whisper. | 30 | |
| batch_size | integer | The number of chunks processed in parallel, used to speed up transcription. | 24 | |
| return_timestamps | boolean | Whether to return timestamp information for the transcription. | True | |
| diarize | boolean | Whether to diarize the audio and return speaker identification for each sentence. If true, `return_timestamps` must also be true. | False | |
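
The input parameters above can be assembled and checked before the widget is called. The following is a minimal sketch only: the helper name `build_whisper_inputs` is hypothetical, the positivity check on `chunk_length_s` and `batch_size` is an assumption rather than documented behavior, and how the resulting dictionary is actually submitted to the widget is not covered here.

```python
from typing import Any, Dict


def build_whisper_inputs(
    voice_url: str,
    chunk_length_s: int = 30,
    batch_size: int = 24,
    return_timestamps: bool = True,
    diarize: bool = False,
) -> Dict[str, Any]:
    """Assemble the widget inputs using the documented names and defaults.

    Hypothetical helper for illustration; it only builds and validates a dict.
    """
    if not voice_url:
        raise ValueError("voice_url must be a non-empty URL string")
    # Assumption: non-positive sizes are never meaningful for chunking/batching.
    if chunk_length_s <= 0 or batch_size <= 0:
        raise ValueError("chunk_length_s and batch_size must be positive")
    # Documented constraint: diarization requires timestamps.
    if diarize and not return_timestamps:
        raise ValueError("diarize=True requires return_timestamps=True")
    return {
        "voice_url": voice_url,
        "chunk_length_s": chunk_length_s,
        "batch_size": batch_size,
        "return_timestamps": return_timestamps,
        "diarize": diarize,
    }


# Placeholder URL; replace with a real audio file URL.
inputs = build_whisper_inputs("https://example.com/audio.mp3", diarize=True)
print(inputs)
```

Validating the `diarize` / `return_timestamps` dependency on the client side surfaces the mistake before any audio is processed.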

Output Parameters

| Name | Type | Description | File Type |
| --- | --- | --- | --- |
| text | string | The transcription of the given audio. | |
| chunks | array | Returned only if `return_timestamps` is set to true. Chunks of varying lengths, each containing information such as text, timestamp, and speaker. | |

Output Example

Input: https://cdn.myshell.ai/audio/chat/embed_obj/40295/20240423/57aea65228cc4f63895a35a262f8133f.mp3

{
  "chunks": [
    {
      "speaker": null,
      "text": " On the show today, cuts both ways.",
      "timestamp": [
        0,
        4.8
      ]
    },
    {
      "speaker": null,
      "text": " And so far, we have talked about humans.",
      "timestamp": [
        5.6,
        7.94
      ]
    },
    {
      "speaker": null,
      "text": " Now let's talk about machines and the strengths and weaknesses of artificial intelligence.",
      "timestamp": [
        8.48,
        15.22
      ]
    },
    {
      "speaker": null,
      "text": " Because there are so many things that make bots like ChatGPT amazing.",
      "timestamp": [
        15.32,
        21.06
      ]
    },
    {
      "speaker": null,
      "text": " It's able to pass the bar exam or college exams as long as plenty of those exam examples",
      "timestamp": [
        21.46,
        27.9
      ]
    },
    {
      "speaker": null,
      "text": " were in the training data. It can also write poetry, compose music, summarize vast amounts",
      "timestamp": [
        27.9,
        33.92
      ]
    },
    {
      "speaker": null,
      "text": " of data in a fluid, human-like way. Some people are genuinely wowed by that, like, oh my goodness,",
      "timestamp": [
        33.92,
        40.68
      ]
    },
    {
      "speaker": null,
      "text": " Chachapiti is so creative. And making powerful AI so easy to use has resurfaced a long-running debate.",
      "timestamp": [
        41,
        48.62
      ]
    },
    {
      "speaker": null,
      "text": " Whether AI will save us or steal all our jobs and kill us.",
      "timestamp": [
        49.74,
        55.02
      ]
    },
    {
      "speaker": null,
      "text": " I'd say either is a possibility.",
      "timestamp": [
        55.58,
        58.48
      ]
    },
    {
      "speaker": null,
      "text": " This is Yejin Choi.",
      "timestamp": [
        58.9,
        60.42
      ]
    },
    {
      "speaker": null,
      "text": " And we don't know what's going to happen for sure.",
      "timestamp": [
        60.42,
        64.08
      ]
    },
    {
      "speaker": null,
      "text": " Choi. And we don't know what's going to happen for sure. She's an AI expert, a MacArthur Genius Award winner, and a professor at the University of Washington. And that means a lot is up to us",
      "timestamp": [
        67.04,
        74.68
      ]
    },
    {
      "speaker": null,
      "text": " to shape the future. So Yejin, you gave a TED Talk recently about AI and a big conundrum as you see it. AI today is unbelievably intelligent",
      "timestamp": [
        74.68,
        87.98
      ]
    },
    {
      "speaker": null,
      "text": " and then shockingly stupid. You used an example of drying laundry to demonstrate",
      "timestamp": [
        87.98,
        93.72
      ]
    },
    {
      "speaker": null,
      "text": " how literal it can be. So suppose I left five clothes to dry out in the sun and it took them",
      "timestamp": [
        93.72,
        101.34
      ]
    },
    {
      "speaker": null,
      "text": " five hours to dry completely. How long would it take to dry 30 clothes?",
      "timestamp": [
        101.34,
        106.66
      ]
    },
    {
      "speaker": null,
      "text": " Yeah, it doesn't matter how many pieces of clothes. It takes the same five hours.",
      "timestamp": [
        107.28,
        111.64
      ]
    },
    {
      "speaker": null,
      "text": " But...",
      "timestamp": [
        112.94,
        113.3
      ]
    },
    {
      "speaker": null,
      "text": " GPT-4, the newest, greatest AI system, says 30 hours. Not good.",
      "timestamp": [
        113.3,
        118.18
      ]
    },
    {
      "speaker": null,
      "text": " ChatGPT supposedly learns from its mistakes. But I tried asking it something similar,",
      "timestamp": [
        119.62,
        124.86
      ]
    },
    {
      "speaker": null,
      "text": " with a couple linguistic variations, and things didn't go any better.",
      "timestamp": [
        124.86,
        129.08
      ]
    },
    {
      "speaker": null,
      "text": " If it takes seven hours to dry five pieces of clothes in the sun, how long will it take to dry ten clothes?",
      "timestamp": [
        129.98,
        138.18
      ]
    },
    {
      "speaker": null,
      "text": " The drying time in your question seems to be directly proportional to the number of clothes.",
      "timestamp": [
        138.52,
        142.86
      ]
    },
    {
      "speaker": null,
      "text": " So, if you have 10 pieces of clothes,",
      "timestamp": [
        143.16,
        145.32
      ]
    },
    {
      "speaker": null,
      "text": " it will take 14 hours to dry all of them, assuming the conditions remain the same.",
      "timestamp": [
        145.52,
        149.7
      ]
    },
    {
      "speaker": null,
      "text": " Okay, let's ask again. Here's a new one. If it takes one shirt three hours and one pair of pants",
      "timestamp": [
        150.12,
        157.1
      ]
    },
    {
      "speaker": null,
      "text": " four hours to dry in the sun, how long would it take two shirts to dry? In this scenario,",
      "timestamp": [
        157.1,
        164.32
      ]
    },
    {
      "speaker": null,
      "text": " given that it takes three hours to dry one shirt, it would take twice that time to dry two shirts.",
      "timestamp": [
        164.64,
        169.28
      ]
    },
    {
      "speaker": null,
      "text": " So it would take six hours to dry two shirts.",
      "timestamp": [
        169.5,
        171.94
      ]
    },
    {
      "speaker": null,
      "text": " Oh, poor chat GPT.",
      "timestamp": [
        172.44,
        173.94
      ]
    },
    {
      "speaker": null,
      "text": " You are not going to have a lot of clothes to wear.",
      "timestamp": [
        174.04,
        175.82
      ]
    },
    {
      "speaker": null,
      "text": " You and I, once we understood a concept, then no matter how we rephrase the question, no matter how we ask the question",
      "timestamp": [
        176.96,
        186.38
      ]
    },
    {
      "speaker": null,
      "text": " differently, to us, it's the same question. So we can answer them correctly. It's strange why",
      "timestamp": [
        186.38,
        193.96
      ]
    },
    {
      "speaker": null,
      "text": " such an impressive AI that can even pass the bar exam struggles with little variations of the same",
      "timestamp": [
        193.96,
        201.88
      ]
    },
    {
      "speaker": null,
      "text": " question that requires just common sense. But it's not surprising if you know how AI is trained.",
      "timestamp": [
        201.88,
        208.74
      ]
    },
    {
      "speaker": null,
      "text": " It's trained to predict which word will come next.",
      "timestamp": [
        209.8,
        213.16
      ]
    },
    {
      "speaker": null,
      "text": " It's just reading a lot of data and try to learn the patterns behind the data.",
      "timestamp": [
        213.68,
        218.44
      ]
    },
    {
      "speaker": null,
      "text": " So it's not trained to do critical reasoning.",
      "timestamp": [
        218.58,
        221.4
      ]
    },
    {
      "speaker": null,
      "text": " And having common sense means applying reasoning to all sorts of scenarios,",
      "timestamp": [
        221.4,
        227.12
      ]
    },
    {
      "speaker": null,
      "text": " which computers can't do, at least not like humans. So common sense is what's strikingly",
      "timestamp": [
        227.48,
        235.02
      ]
    },
    {
      "speaker": null,
      "text": " easy for humans, but surprisingly hard for machines. It's everyday knowledge that you and",
      "timestamp": [
        235.02,
        241.6
      ]
    },
    {
      "speaker": null,
      "text": " I have about different objects and events that we interact",
      "timestamp": [
        241.6,
        245.5
      ]
    },
    {
      "speaker": null,
      "text": " with in life. And it's been a longstanding challenge in AI field. Yeah, I like drawing",
      "timestamp": [
        245.5,
        253.98
      ]
    },
    {
      "speaker": null,
      "text": " inspirations from humans because when children grow up, it's not the case that we just feed them",
      "timestamp": [
        253.98,
        260.32
      ]
    },
    {
      "speaker": null,
      "text": " with internet data and then let them figure out on their own. Actually, the outcome of that",
      "timestamp": [
        260.32,
        265.92
      ]
    },
    {
      "speaker": null,
      "text": " would be pretty horrible. Yes. And so what do we do to prevent it is to tell them in a more",
      "timestamp": [
        265.92,
        273.44
      ]
    },
    {
      "speaker": null,
      "text": " declarative form what's right and what's wrong. You mean like don't hit somebody? Yeah. For",
      "timestamp": [
        273.44,
        280.2
      ]
    },
    {
      "speaker": null,
      "text": " example, we tell them that it's not right to kill people or, you know, it's not polite to yell at people, even if they get angry.",
      "timestamp": [
        280.2,
        289.48
      ]
    },
    {
      "speaker": null,
      "text": " We teach them a lot of these things from early on in their lives.",
      "timestamp": [
        289.68,
        294.1
      ]
    },
    {
      "speaker": null,
      "text": " So if most AI models are learning from the vast amount of information that's available online,",
      "timestamp": [
        294.94,
        300
      ]
    }
  ],
  "text": " On the show today, cuts both ways. And so far, we have talked about humans. Now let's talk about machines and the strengths and weaknesses of artificial intelligence. Because there are so many things that make bots like ChatGPT amazing. It's able to pass the bar exam or college exams as long as plenty of those exam examples were in the training data. It can also write poetry, compose music, summarize vast amounts of data in a fluid, human-like way. Some people are genuinely wowed by that, like, oh my goodness, Chachapiti is so creative. And making powerful AI so easy to use has resurfaced a long-running debate. Whether AI will save us or steal all our jobs and kill us. I'd say either is a possibility. This is Yejin Choi. And we don't know what's going to happen for sure. Choi. And we don't know what's going to happen for sure. She's an AI expert, a MacArthur Genius Award winner, and a professor at the University of Washington. And that means a lot is up to us to shape the future. So Yejin, you gave a TED Talk recently about AI and a big conundrum as you see it. AI today is unbelievably intelligent and then shockingly stupid. You used an example of drying laundry to demonstrate how literal it can be. So suppose I left five clothes to dry out in the sun and it took them five hours to dry completely. How long would it take to dry 30 clothes? Yeah, it doesn't matter how many pieces of clothes. It takes the same five hours. But... GPT-4, the newest, greatest AI system, says 30 hours. Not good. ChatGPT supposedly learns from its mistakes. But I tried asking it something similar, with a couple linguistic variations, and things didn't go any better. If it takes seven hours to dry five pieces of clothes in the sun, how long will it take to dry ten clothes? The drying time in your question seems to be directly proportional to the number of clothes. 
So, if you have 10 pieces of clothes, it will take 14 hours to dry all of them, assuming the conditions remain the same. Okay, let's ask again. Here's a new one. If it takes one shirt three hours and one pair of pants four hours to dry in the sun, how long would it take two shirts to dry? In this scenario, given that it takes three hours to dry one shirt, it would take twice that time to dry two shirts. So it would take six hours to dry two shirts. Oh, poor chat GPT. You are not going to have a lot of clothes to wear. You and I, once we understood a concept, then no matter how we rephrase the question, no matter how we ask the question differently, to us, it's the same question. So we can answer them correctly. It's strange why such an impressive AI that can even pass the bar exam struggles with little variations of the same question that requires just common sense. But it's not surprising if you know how AI is trained. It's trained to predict which word will come next. It's just reading a lot of data and try to learn the patterns behind the data. So it's not trained to do critical reasoning. And having common sense means applying reasoning to all sorts of scenarios, which computers can't do, at least not like humans. So common sense is what's strikingly easy for humans, but surprisingly hard for machines. It's everyday knowledge that you and I have about different objects and events that we interact with in life. And it's been a longstanding challenge in AI field. Yeah, I like drawing inspirations from humans because when children grow up, it's not the case that we just feed them with internet data and then let them figure out on their own. Actually, the outcome of that would be pretty horrible. Yes. And so what do we do to prevent it is to tell them in a more declarative form what's right and what's wrong. You mean like don't hit somebody? Yeah. 
For example, we tell them that it's not right to kill people or, you know, it's not polite to yell at people, even if they get angry. We teach them a lot of these things from early on in their lives. So if most AI models are learning from the vast amount of information that's available online,"
}
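
A result shaped like the example above can be turned into a readable, timestamped transcript. This is a sketch under stated assumptions: the function name `format_chunks` is hypothetical, the `sample` dictionary is a trimmed copy of the example output, and a missing or null `speaker` is rendered as `unknown` purely for display.

```python
from typing import Any, Dict, List


def format_chunks(result: Dict[str, Any]) -> List[str]:
    """Render each chunk as '[start-end] speaker: text'."""
    lines = []
    # `chunks` is only present when return_timestamps was true.
    for chunk in result.get("chunks") or []:
        start, end = chunk["timestamp"]
        # speaker is null unless diarization was requested.
        speaker = chunk.get("speaker") or "unknown"
        # Chunk text carries a leading space in the example output.
        text = chunk["text"].strip()
        lines.append(f"[{start:.2f}-{end:.2f}] {speaker}: {text}")
    return lines


# Trimmed copy of the example output (first two chunks only).
sample = {
    "chunks": [
        {"speaker": None, "text": " On the show today, cuts both ways.", "timestamp": [0, 4.8]},
        {"speaker": None, "text": " And so far, we have talked about humans.", "timestamp": [5.6, 7.94]},
    ],
    "text": " On the show today, cuts both ways. And so far, we have talked about humans.",
}

for line in format_chunks(sample):
    print(line)
```

When `return_timestamps` is false and no `chunks` field is returned, the function yields an empty list and the plain `text` field can be used instead.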

Detailed Guidelines
