AI Speech Engines

Speaker Detection

Partition audio files into segments to separate the words spoken by each speaker.

Speaker Detection runs on the aiWARE Enterprise AI platform, which orchestrates a diverse ecosystem of ready-to-deploy machine learning models to transform audio, video, text, and other data sources into actionable intelligence, at scale, with no AI expertise. With aiWARE, leverage digital workers to save manual review time, gain valuable data insights, and cognitively enrich end-to-end workflows.

Speaker detection engines (also known as speaker separation or diarization engines) in the Veritone cognitive engine ecosystem distinguish between multiple speakers in audio and video by partitioning recordings and streams into segments, one speaker per segment.

Speaker detection determines when speakers change and, where possible, which segments belong to the same person. Unlike speaker recognition, it does not identify who the speakers are. Speaker detection and recognition engines can be used in concert.
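To make the idea of speaker-partitioned segments concrete, here is a minimal sketch in Python. It assumes a hypothetical output shape (a list of time-stamped segments carrying anonymous labels such as "Speaker 1") and hypothetical helper names; it is not the aiWARE engine output format. It shows how word timings from a transcript could be grouped by the speaker segment they fall into.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarization segment (hypothetical shape, not the aiWARE schema)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds, exclusive
    speaker: str   # anonymous label; detection does not say who this is


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing (hypothetical shape)."""
    start: float
    end: float
    text: str


def speaker_at(segments, t):
    """Return the label of the segment covering time t, or None if no segment does."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.speaker
    return None


def words_by_speaker(segments, words):
    """Group words under the speaker whose segment contains each word's midpoint."""
    grouped = defaultdict(list)
    for w in words:
        midpoint = (w.start + w.end) / 2
        label = speaker_at(segments, midpoint)
        grouped[label or "Unknown"].append(w.text)
    return {label: " ".join(texts) for label, texts in grouped.items()}


# Illustrative data only: "Speaker 1" appears twice, which is how a detection
# engine can indicate that two segments belong to the same (unnamed) person.
segments = [
    Segment(0.0, 2.0, "Speaker 1"),
    Segment(2.0, 4.5, "Speaker 2"),
    Segment(4.5, 6.0, "Speaker 1"),
]
words = [
    Word(0.2, 0.6, "Hello"),
    Word(0.7, 1.1, "there"),
    Word(2.1, 2.5, "Hi"),
    Word(4.6, 5.0, "Welcome"),
]
result = words_by_speaker(segments, words)
```

In this sketch the labels stay anonymous; replacing "Speaker 1" with a real name is the job of a speaker recognition engine or of manual labeling.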

Speaker Detection Features:

Speaker Separated Transcripts
Export spoken word recordings as text transcripts in plain text, Microsoft Word, Timed Text Markup Language (TTML), WebVTT, and SubRip text formats via Veritone applications.
Searchable Results
Quickly identify audio segments where individuals of interest are speaking, using searchable speaker detection engine output via API and Veritone applications.
Files or Stream Support
Detect speakers in short-form or long-form audio from audio and video recordings, streamed recordings, or live data streams.
Powered by an AI Ecosystem
Leverage advanced speaker detection machine learning algorithms from the Veritone managed cognitive engine ecosystem — including algorithms from Veritone, niche providers, and industry giants.
Assign and Edit Speakers Detected
Assign labels to speakers, edit existing speaker labels, and delete labels and spoken words for specific speakers in speaker detection transcripts via Veritone applications.
Near Real-Time Processing
Process audio and video files in near real-time for use cases requiring quick speaker detection turnaround.
Flexible Deployment
Deploy in a new application or integrate into an existing one in the cloud via aiWARE GraphQL APIs. A subset of engines can also be deployed on-premises via a Docker container.
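As an illustration of what a speaker-separated transcript can look like in one of the export formats listed above, the following sketch renders segments as WebVTT, using the format's voice tag (`<v …>`) to carry the speaker label. The function names and the cue tuples are assumptions made for this example; this is not how Veritone applications produce their exports.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Render (start, end, speaker, text) tuples as a WebVTT document.

    Note: this sketch does not escape '&' or '<' in the text, which a
    production exporter would need to do.
    """
    lines = ["WEBVTT", ""]
    for start, end, speaker, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(f"<v {speaker}>{text}")  # voice tag carries the speaker label
        lines.append("")                      # blank line ends the cue
    return "\n".join(lines)


# Illustrative cues only.
cues = [
    (0.0, 2.0, "Speaker 1", "Hello there."),
    (2.0, 4.5, "Speaker 2", "Hi."),
]
vtt = to_webvtt(cues)
print(vtt)
```

SubRip output would follow the same pattern, with a numeric index before each cue and a comma instead of a period before the milliseconds.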

Speaker Recognition

Identify speakers in audio based on recordings of their voice.

Speaker Recognition runs on the aiWARE Enterprise AI platform, which orchestrates a diverse ecosystem of ready-to-deploy machine learning models to transform audio, video, text, and other data sources into actionable intelligence, at scale, with no AI expertise. With aiWARE, leverage digital workers to save manual review time, gain valuable data insights, and cognitively enrich end-to-end workflows.

Speaker recognition engines (often referred to as speaker identification engines) in the Veritone cognitive engine ecosystem identify when speakers change and who those speakers are in a piece of audio.

Speaker recognition expands upon the capabilities of speaker detection engines: in addition to specifying the points in time at which a person started and stopped speaking, it identifies the individual whose voice was detected.

Speaker Recognition Features:

Trainable with Custom Libraries
Create custom models from unique voice identifiers and metadata, using the Veritone Library application or your own, to identify a custom set of speakers in audio files.
Files or Stream Support
Recognize speakers in short-form or long-form audio from audio and video recordings, streamed recordings, or live data streams.
Powered by an AI Ecosystem
Leverage advanced speaker recognition machine learning algorithms from the Veritone managed cognitive engine ecosystem — including algorithms from Veritone, niche providers, and industry giants.
Near Real-Time Processing
Process audio and video files in near real-time for use cases requiring a quick speaker recognition turnaround.
Flexible Deployment
Deploy in a new application or integrate into an existing one in the cloud via aiWARE GraphQL APIs. A subset of engines can also be deployed on-premises via a Docker container.

Transcription

Convert speech in audio or video files in 70 different languages and dialects into text transcripts.

Transcription runs on the aiWARE Enterprise AI platform, which orchestrates a diverse ecosystem of ready-to-deploy machine learning models to transform audio, video, text, and other data sources into actionable intelligence, at scale, with no AI expertise. With aiWARE, leverage digital workers to save manual review time, gain valuable data insights, and cognitively enrich end-to-end workflows.

Transcription — often referred to as speech-to-text — engines in the Veritone cognitive engine ecosystem convert spoken words in audio and video recordings into readable and searchable text. They are built and trained to transcribe different languages and dialects. Text analytics can then be applied to those transcriptions for further insight into what was spoken.

Transcription Features:

Broad Language Support
Convert speech to text in 70 different natural languages and dialects in the cloud, with a subset that can be deployed locally, to support a diverse user-base, workforce, or population.
Machine & Manual
Choose the right option for your use case: machine transcription with AI, or manual transcription of your audio data by Veritone partners through aiWARE.
Near Real-Time Processing
Process audio and video files in near real-time for use cases requiring quick text extraction.
Flexible Deployment
Deploy in a new or existing application in the cloud via aiWARE GraphQL APIs. A subset of engines can also be deployed on-premises via a Docker container.
Text Transcripts
Export spoken word recordings as text transcripts in plain text, Microsoft Word, Timed Text Markup Language (TTML), WebVTT, and SubRip text formats via Veritone applications.
Searchable Results
Quickly find the keywords you are looking for within transcripts, using searchable transcription engine output via API or Veritone applications.
Files or Stream Support
Transform short-form or long-form audio from audio and video recordings, streamed recordings, or live data streams into text.
Powered by an AI Ecosystem
Leverage advanced transcription machine learning algorithms from the Veritone managed cognitive engine ecosystem — including algorithms from Veritone, niche providers, and industry giants.
