Amazon Transcribe POC

Taegu Kang
  • pricing

Billed by audio duration // 0.024 USD per minute // ap-northeast-2 (Seoul)
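
The per-minute rate above makes a rough cost estimate easy. The sketch below assumes a flat 0.024 USD per minute, prorated by the second; the function name is made up for illustration, and free tier, minimum charges, and volume discounts are ignored, so treat the result as a ballpark figure only.

```python
from decimal import Decimal

# Rate quoted in this post for ap-northeast-2; check the current AWS price list.
USD_PER_MINUTE = Decimal("0.024")


def estimate_transcribe_cost(audio_seconds: int) -> Decimal:
    """Rough cost in USD, assuming flat per-minute pricing prorated by the second."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return USD_PER_MINUTE * Decimal(audio_seconds) / Decimal(60)


# A 12-minute video, the length used in this POC:
print(estimate_transcribe_cost(12 * 60))
```

Using Decimal instead of float keeps the arithmetic exact, which matters once per-file estimates are summed over many files.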

  • performance

Converting a 12-minute MP4 video to SRT takes 1 minute.

  • input

Amazon S3 // MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

  • output

Amazon S3 // SRT, Text, VTT
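
The S3-in, S3-out flow above can be sketched with boto3. This is a minimal sketch, not the exact code used in this POC: the bucket names, job name, and helper function names are hypothetical, and the ko-KR language code is an assumption. It asks for SRT and VTT subtitles through the Subtitles parameter of start_transcription_job; the transcript itself is written to the output bucket as a JSON file.

```python
# Media formats listed in this post, as lowercase file extensions.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "amr", "ogg", "webm"}


def build_job_request(job_name, media_uri, output_bucket, language_code="ko-KR"):
    """Build keyword arguments for start_transcription_job.

    The media format is inferred from the file extension of the S3 URI.
    """
    media_format = media_uri.rsplit(".", 1)[-1].lower()
    if media_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported media format: {media_format}")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,  # assumption: Korean audio
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,  # results are written to this bucket
        "Subtitles": {"Formats": ["srt", "vtt"]},
    }


def start_job(request, region="ap-northeast-2"):
    """Submit the job. Requires boto3 and AWS credentials with Transcribe access."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("transcribe", region_name=region)
    return client.start_transcription_job(**request)


# Hypothetical names; replace with your own bucket and object key.
request = build_job_request(
    "poc-transcribe-job",
    "s3://my-input-bucket/video.mp4",
    "my-output-bucket",
)
print(request["MediaFormat"], request["Subtitles"]["Formats"])
# To actually submit: start_job(request)
```

Keeping the request builder separate from the AWS call makes it possible to check the parameters locally before anything is billed.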

  • alternative

Naver CLOVA Note - https://clovanote.naver.com/

Google STT AI - https://cloud.google.com/speech-to-text?hl=ko

OpenAI Whisper - https://platform.openai.com/docs/guides/speech-to-text

  • note

If the rest of your stack runs on AWS, Amazon Transcribe is the one to use. Its advantage in network cost and architecture is decisive.

Real-time (streaming) transcription is also possible, but it is not considered here because its performance and accuracy are poor.

  • reference

https://aws.amazon.com/transcribe/

https://aws.amazon.com/ko/blogs/korea/amazon-transcribe-now-supports-speech-to-text-in-korean/

https://www.awsgeek.com/Amazon-Transcribe/

  • comparison

| Service           | Pricing (60s) | Limit |
|-------------------|---------------|-------|
| Amazon Transcribe | 32 KRW        | 2 GB  |
| OpenAI Whisper    | 8 KRW         | 25 MB |
| Clova Voice       | 60 KRW        | 2 GB  |