The best tools for transcribing audio with artificial intelligence


Media and research company Charter has launched Work Tech, a newsletter offering reviews of technology products. Work Tech will provide independent, research-based assessments of technologies and management tools being created for the workplace, from artificial intelligence-based transcription to virtual meeting platforms for remote work.

To develop Work Tech, Charter has partnered with Glenn Fleishman, a veteran tech journalist with thirty years of product review experience. Work Tech will be published weekly.

Research and Analysis Group InfoLight.UA publishes a translation of Charter’s review of AI audio transcription tools.

One of the most useful everyday applications of artificial intelligence is audio transcription based on neural networks, which are constantly improving.

AI-enabled transcription gained additional value during the pandemic, as video conferences could be easily recorded and transcripts shared with colleagues who were unable to join them. As hybrid work has become a long-term trend, automated transcription has grown with it: the tools are much better than they were three years ago, the cost is lower, and the usefulness is proven. As companies try to reduce both the number of people in meetings and the number of meetings in general, AI transcription eliminates FOMOOM: the fear of missing out on meetings.

We tested several commercially available AI-based transcription tools and found that Fireflies, Rev Max, and Sonix provide very accurate transcription. Sonix and Rev Max were a little better at spelling people’s names, while Fireflies is by far the cheapest solution for anyone who transcribes many dozens of hours of meetings and uploaded audio each month.

The accuracy of these three programs exceeds the level required for routine business purposes. They are cheap and fairly easy to use, so they are a perfectly acceptable option for creating searchable transcripts of everyday audio recordings, such as your team meetings and brainstorming sessions. They also remove the cognitive load and cost of assigning someone to take notes or write up a debrief after the event.

For those who need verbatim transcripts for business, legal, journalistic, and other purposes, our selection offers three programs whose quality is high enough to go from near-accurate to accurate with little effort. Our selections also include tools for interactive transcript cleanup, which make it easy to check the audio for any word or phrase while reviewing or improving your results. If you need accurate word-for-word results, taking the time to clean up a transcript can be much cheaper than ordering one produced by people. (Some services, such as Rev, offer human transcription for a fee.)

Live captioning, available in major video conferencing tools, and post-event transcription both increase accessibility in the enterprise for people who are deaf or hard of hearing, or who process information differently, such as people with ADHD. Only one of our picks, Rev Max, provided live captions, and only in Zoom, as an alternative to Zoom’s built-in captions.

Our picks


Sonix

Sharing best-in-class transcription accuracy with Fireflies, Sonix focuses on producing accurate meeting transcripts with each speaker correctly identified. Being the best comes at a price, however: Sonix charges for every minute used, at one of two rates depending on the plan.

Pros:

  • Extremely high accuracy.
  • Correct spelling of many proper names without a dictionary.
  • Almost perfect recognition of individual speakers.

Cons:

  • The fixed hourly cost makes it quite expensive for intensive monthly use.
  • Offers extensive video conferencing integration, but the documentation is lacking.
  • The audio player didn’t work in Safari, only in Chrome.


Rev Max

With excellent transcription and speaker recognition, Rev.com’s automated service is only a step or two behind its competitors. For those who spend endless hours in Zoom, however, unlimited transcription of Zoom meetings can be a significant price advantage over Sonix. (If you need Rev.com quality for fewer than two hours a month, use Temi: it’s identical in every way except for its flat rate of $0.25 per minute.)

Pros:

  • Very high accuracy.
  • Excellent speaker recognition.
  • Integration with Zoom for live captions or post-meeting processing.

Cons:

  • Slightly more expensive per minute beyond 20 hours per month for use outside Zoom.
  • Some audio functions did not work in Safari, but worked in Chrome.


Fireflies

Transcription fidelity that rivals Sonix, excellent speaker identification, and even a near-perfect transcription of an AI meeting put Fireflies near the top of the list in terms of quality. The service also includes unlimited meeting transcriptions and 8,000 minutes of uploaded audio in its $18 per month plan.

Pros:

  • Extremely high accuracy.
  • Excellent speaker differentiation.
  • Significantly cheaper than all other services for heavy monthly use that mixes uploaded audio and transcribed meetings.

Cons:

  • Fireflies’ name recognition was slightly worse than that of its competitors, though not egregiously so.
  • Requires integration with Google Calendar or Outlook even to create an account, and there seems to be no way around this requirement.

How we researched

Of the companies that either focus entirely on AI-based transcription or offer it alongside human transcription or other services, we decided to take a closer look at six: Fireflies, MeetGeek, Otter.ai, Rev Max, Sonix, and Trint. (See “How we chose what to review” below for the rationale behind our choice.) We also looked at Temi, which is owned by Rev and is identical to Rev Max except for the price.

We considered the cost and limitations of each service, as well as the range of services they offer and their integration with video conferencing services such as Meet, Teams, Webex, and Zoom.

For our test audio, we used the Spoken Word Turing Test: a podcast of eight fast-talking participants with overlapping commentary recorded on three continents. While everyone spoke English, accents included American (from three parts of the country), New Zealand, and Canadian (Alberta), as well as non-native speakers from Germany and Sweden. Two of the speakers, from Wisconsin and Alberta, had voices that were difficult to distinguish by ear.

The three services we selected were able not only to distinguish between all speakers, including the two most similar-sounding voices, but also to transcribe speech correctly regardless of accent. This included the correct spelling of some non-English words.

What to watch out for: With large companies competing intensely on AI technology, expect further innovation and new capabilities. We would like to see better summaries (see below). Also, keep an eye on Whisper by OpenAI: the company has released this high-quality transcription engine as free and open-source software.

Our recommendations

AI-based voice recognition aims to provide a close enough reproduction of speech that it can be used in real time or later reviewed for reading, searching, and summarizing. No machine learning system promises 100% accuracy. Most services that provide an estimate usually claim that it is in the range of 90%-95%, which our testing has shown to be an accurate indicator. The services we liked the most are at the top of this range.
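The article does not say how its accuracy figures were measured. One common way to put a number on transcription accuracy is word error rate (WER): the word-level edit distance between a trusted reference transcript and the machine transcript, divided by the number of reference words. The sketch below is a minimal illustration under that assumption; the function names and sample sentences are made up for the example and are not taken from any of the services reviewed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, clamped at zero (WER can exceed 1.0)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    # Hypothetical sample: one wrong word out of thirteen.
    reference = ("we tested several transcription tools on a podcast "
                 "with eight fast talking participants")
    hypothesis = ("we tested several transcription tools on a podcast "
                  "with eight fast walking participants")
    print(f"accuracy: {accuracy(reference, hypothesis):.1%}")
```

A real evaluation would also normalize punctuation and numerals before comparing, which this sketch deliberately leaves out.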

Each of our selections offers at least the following:

  • Speaker identification: Transcripts track multiple speakers and clearly label them, allowing you to change them later. Our top three apps did a great job with this task.
  • Meeting integration: If a meeting recording is automatically converted into a transcript without additional effort, that’s a plus. Our top three apps provide this capability, although Fireflies and Sonix integrate with the widest range of meeting services, while Rev Max only works directly with Zoom.
  • Editing and annotating the transcript: All recommended services offer an editing interface to improve the transcript. Rev Max and Sonix provide better annotation capabilities than Fireflies.
  • Export to different formats: Getting a transcript from the system is generally easy in all the systems we have tested. Our top three programs support Word, PDF, and one or more standard time code-based subtitle formats.
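To make "time code-based subtitle format" concrete, here is a minimal sketch that writes speaker-labeled segments in the widely used SubRip (SRT) layout: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text. The segment tuples and the "Speaker: text" labeling are assumptions for illustration only; they do not reflect the export code or data model of any service reviewed here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, speaker, text) tuples as SRT cues."""
    cues = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{speaker}: {text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical transcript segments.
    segments = [
        (0.0, 2.5, "Speaker 1", "Welcome, everyone."),
        (2.5, 4.0, "Speaker 2", "Thanks for having us."),
    ]
    print(to_srt(segments))
```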

Many of them also offer artificial intelligence-based summaries or keyword analysis, which let you get the gist of a transcript at a glance and search for key information. However, these features are still in their infancy and vary too much in quality and usefulness across transcripts and services to be considered a criterion. (We liked the summaries from Fireflies, but even they contained some ridiculous findings.) This can change quickly.

We’ve also highlighted specific features that may be important to your decision among our three options, or other services (listed below) that have minor or major shortcomings but meet other needs:

Zoom integration: All of them allow you to upload audio and integrate with Zoom. Fireflies and Sonix have automatic post-meeting processing for other popular video conferencing systems, and Rev Max offers live captions for Zoom as an alternative to Zoom’s built-in captioning service.

Languages: Some services offer only English with an American or British accent, while others offer a wider range, up to recognizing 30 or more languages. All three services recognize an impressive number of accents, both native and foreign. However, if you need a non-English transcription, Sonix or Fireflies are the right starting points.

Editing: Sonix and Rev Max stand out from Fireflies due to their better tools for editing transcripts and annotations. All three programs provide a wide range of export options, including at least one popular option for video subtitles.

Price: Later in the article, we’ll provide comparisons for 20 hours and 60 hours of usage per month of the six services we researched, but from our top three, we rate as follows:

  • Fireflies is the cheapest per minute in every scenario, for both small and large monthly transcription volumes. It offers an unlimited number of meeting transcriptions and 8,000 minutes (133.3 hours) of uploaded audio in the cheapest plan.
  • Sonix costs seven to 10 times more per minute than Fireflies because the company charges a flat hourly rate for all use: $10 per hour for pay-as-you-go and $5 per hour for a $22 per month ($100 per year) subscription.
  • Rev Max has the highest per-minute cost for uploaded files after 20 hours: $0.25 per minute. But this is largely offset by the inclusion of unlimited Zoom meeting transcription as part of a single-tier service costing $29.95 per month.
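The arithmetic behind these comparisons can be sketched as follows, using only the prices quoted above. Two assumptions are ours, not the article's: all usage is treated as uploaded audio (the worst case for Rev Max, whose Zoom transcription is unlimited), and "after 20 hours" is read as meaning that 20 hours of uploads are included in the Rev Max fee. The function names are hypothetical.

```python
def fireflies_cost(upload_hours: float) -> float:
    """$18/month plan with 8,000 uploaded minutes included."""
    if upload_hours * 60 > 8_000:
        # The article gives no overage price, so refuse to guess one.
        raise ValueError("usage exceeds the 8,000 included upload minutes")
    return 18.00


def sonix_pay_as_you_go_cost(hours: float) -> float:
    """$10 per hour, no subscription."""
    return 10.00 * hours


def sonix_subscription_cost(hours: float) -> float:
    """$22 per month subscription plus $5 per hour."""
    return 22.00 + 5.00 * hours


def rev_max_cost(upload_hours: float) -> float:
    """$29.95 per month; uploads cost $0.25 per minute after 20 hours.

    ASSUMPTION: the first 20 upload hours are covered by the monthly fee.
    """
    extra_minutes = max(0.0, upload_hours - 20) * 60
    return 29.95 + 0.25 * extra_minutes


if __name__ == "__main__":
    for hours in (20, 60):
        print(f"{hours} hours of uploaded audio per month:")
        print(f"  Fireflies:           ${fireflies_cost(hours):.2f}")
        print(f"  Sonix pay-as-you-go: ${sonix_pay_as_you_go_cost(hours):.2f}")
        print(f"  Sonix subscription:  ${sonix_subscription_cost(hours):.2f}")
        print(f"  Rev Max:             ${rev_max_cost(hours):.2f}")
```

If most of your hours are Zoom meetings rather than uploads, the Rev Max figure drops toward its flat monthly fee, so it is worth rerunning the numbers with your own mix.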

You’ll need to check your needs and sign up for a free trial of each service to determine which one works best for you.

Among all of our top services and almost all of the others we tested, the cost can be extremely low, both in terms of value for money and compared to human transcription. Manual transcription costs from $0.75 to $2.50 per minute, depending on the level of accuracy required, the turnaround time, and the amount of industry jargon in the source material. Services that offer human transcription during a meeting cost from $150 to $180 per hour ($2.50 to $3 per minute), with a minimum payment.
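As a quick check on the units, the hourly rates for live human transcription convert to per-minute rates as shown below, next to the $0.25 per-minute automated rate quoted for Temi. This is only a back-of-the-envelope sketch; the helper names are hypothetical and minimum payments are ignored.

```python
def per_minute(hourly_rate: float) -> float:
    """Convert an hourly rate in dollars to a per-minute rate."""
    return hourly_rate / 60


def meeting_cost(rate_per_minute: float, minutes: float = 60) -> float:
    """Cost of transcribing a meeting of the given length."""
    return rate_per_minute * minutes


if __name__ == "__main__":
    live_low, live_high = per_minute(150), per_minute(180)
    print(f"live human transcription: ${live_low:.2f} to ${live_high:.2f} per minute")
    print(f"one-hour meeting, manual at $0.75/min:    ${meeting_cost(0.75):.2f}")
    print(f"one-hour meeting, automated at $0.25/min: ${meeting_cost(0.25):.2f}")
```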

Automatic transcription is instant for live events, although the quality may be lower compared to on-demand or post-meeting processing. Offline transcription can take from a few seconds to more than a minute for each minute of source audio: an hour-long meeting can be ready from 10 to 60 minutes later, depending on the service’s promises and capabilities. We tested only those services that allow direct uploads in addition to meeting integration options.
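The turnaround range above follows from how long a service spends on each minute of audio. A minimal sketch, with a hypothetical function name and the processing speed as an input you would measure or take from the service's stated turnaround:

```python
def ready_after_minutes(audio_minutes: float,
                        processing_seconds_per_audio_minute: float) -> float:
    """Estimated wait, in minutes, before an uploaded recording is transcribed."""
    return audio_minutes * processing_seconds_per_audio_minute / 60


if __name__ == "__main__":
    # An hour-long meeting at 10 and at 60 seconds of processing per audio minute.
    for speed in (10, 60):
        wait = ready_after_minutes(60, speed)
        print(f"{speed} s per audio minute: ready after about {wait:.0f} minutes")
```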

Mobile applications

If you need a mobile app for recording audio and reviewing the resulting transcripts, Fireflies and Sonix do not have smartphone options. Rev offers apps both under its own name and as part of its separate flat-rate Temi service, but these iPhone and Android apps only provide recording and service request functions. They do not offer integrated support for viewing and editing transcripts.

If a mobile application that displays transcripts is critical to your needs, consider Trint. The company offers full-featured recording apps for iOS and Android, with transcription quality almost as good as Sonix and Fireflies. The application synchronizes transcriptions with a central web application, allowing you to view them on the go. In our test, Trint received a low score for speaker identification.

However, Trint’s price may come as a surprise. The starter plan costs $60 per month ($576 per year) for seven files per month. Upgrade to the Advanced plan for $75 per month ($720 per year), and transcription is unlimited.

How we chose what to review

After compiling an exhaustive list of automated transcription services, we decided to take a closer look at Fireflies, MeetGeek, Otter.ai, Rev Max, Sonix, and Trint, as they had the right combination of integration with video conferencing services.

MeetGeek is focused on meetings and its transcription quality is high, but it doesn’t let you change speaker assignments after transcription. Otter.ai was a pioneer in AI-powered transcription, but its current transcription quality and speaker recognition were the worst we tested.

Speak AI and Speechtext.ai do not offer integration with meeting software and were not included in our study. Chorus and Gong include forms of AI transcription as part of their customer experience management toolkits and cannot be evaluated separately.

Several services had a very narrow focus and did not fit our business focus: Alice (investigative journalism), Beey (professional video subtitling), and scribe.com (medical documentation and telemedicine). TranscribeMe notes that it is not ideal for low-quality audio, which is often the case in video conferencing and other ad hoc recordings. Verbit.ai doesn’t offer standard monthly plans; it only provides customized usage rates.
