Enable Transcription and Summarisation

This guide explains how to enable 100ms post-call transcription with speaker labels and an AI-generated summary. The feature is currently in Beta. If you're looking to enable live transcription instead, refer to this documentation.

Getting Started

Since a recording is a prerequisite for generating a transcript, you will need to start the recording either through the SDK during the session, or through the Recording API or Live Stream API with recording configured. You can also use auto-start in Recordings to auto-transcribe all room sessions, provided transcription is enabled.
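
For reference, here is a minimal sketch of starting a recording through the server-side Recording API. It assumes the POST /v2/recordings/room/<room_id>/start endpoint and a management token; the room ID, token, and request body fields are placeholders, so verify the exact request shape against the Recording API reference.

    // start-recording.js: illustrative sketch; verify the endpoint and fields against the Recording API reference.
    const ROOM_ID = "<room_id>";                   // placeholder: the room whose session you want to record
    const MANAGEMENT_TOKEN = "<management_token>"; // placeholder: server-side management token

    async function startRecording() {
      const response = await fetch(
        `https://api.100ms.live/v2/recordings/room/${ROOM_ID}/start`,
        {
          method: "POST",
          headers: {
            Authorization: `Bearer ${MANAGEMENT_TOKEN}`,
            "Content-Type": "application/json",
          },
          // An empty body starts the recording with the template's defaults;
          // optional fields such as resolution are documented in the API reference.
          body: JSON.stringify({}),
        }
      );
      if (!response.ok) {
        throw new Error(`Failed to start recording: ${response.status}`);
      }
      return response.json();
    }

    startRecording().then(console.log).catch(console.error);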

Flowchart: the entire workflow of transcript and summary generation and their consumption.

Enabling Transcription and Summarisation

Method 1: Dashboard Implementation

You can enable transcription for all the rooms under a particular template.

  1. Access an existing template via the sidebar.

  2. Navigate to the Transcription (Beta) tab in the template configuration.

  3. In the second card, titled Post Call Transcription, enable the Transcribe Recordings toggle.

  4. Enabling Post Call Transcription exposes an additional configuration, Output Modes, just below the toggle. Use it to set the file format(s) of the transcript output. The following file formats are offered:

    • Text (.txt)
    • Subtitle (.srt)
    • Structured (.json)
    The example output for the above can be seen here.
  5. In the same card, enable the Summarise Transcripts toggle. This applies the default summary settings.

  6. Save the configuration.

  7. Join a room to initiate a session, then start the recording (or a live stream with recording enabled) using the SDK or API, as sketched below. If it's your first time joining a 100ms room, you'll find the Start Recording option in the created room. For more information on creating room templates, refer to this documentation.
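
If you prefer to start the recording from the client, the sketch below uses the same React SDK shown later in this guide. It assumes hmsActions.startRTMPOrRecording with record set to true starts a browser-composite recording for the current session; the exact config fields can vary between SDK versions, so verify them against the web SDK reference.

    // StartRecordingButton.jsx: illustrative sketch; config fields may vary across SDK versions.
    import { useHMSActions } from "@100mslive/react-sdk";

    export function StartRecordingButton() {
      const hmsActions = useHMSActions();

      const startRecording = async () => {
        try {
          // record: true requests a browser-composite recording of the current session
          await hmsActions.startRTMPOrRecording({ record: true });
        } catch (err) {
          console.error("Failed to start recording", err);
        }
      };

      return <button onClick={startRecording}>Start Recording</button>;
    }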

Advanced Transcription Settings

When you enable the Transcribe Recordings toggle, you will see a new card appear below with Advanced Transcription Configuration as its heading. It contains advanced settings that apply to both Live Transcription (HLS) and Post Call Transcription.

Advanced settings

  • Custom Vocabulary: Add non-dictionary words that are expected to be spoken to enhance recognition. Useful for names, abbreviations, slang, technical jargon, and more (see the sketch after this list).
  • Language: Configure the primary spoken language to be transcribed. This hints the AI model to transcribe more accurately. Currently, only English is supported, which is the default. Support for more languages will follow soon.
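
As an illustration, the Custom Vocabulary entries presumably map to the customVocabulary array in the template's transcription settings (the same object shown in the defaults below); the specific words here are hypothetical examples.

    {
      "transcription": {
        "customVocabulary": ["100ms", "WebRTC", "SFU", "Priya"]
      }
    }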

Note - Dashboard Implementation Default Values

The following are the default values used in the template for transcription and summary. If you want to understand more about these and use custom values, you can refer to our Policy API.

{ "transcription": { "modes": ["recorded"], "outputModes": ["txt", "srt","json"], "customVocabulary": [], "summary": { "enabled": true, "context": "", "sections": [ { "title": "Agenda", "format": "bullets" }, { "title": "Key Points Discussed", "format": "bullets" }, { "title": "Follow Up Action Items", "format": "bullets" }, { "title": "Short Summary", "format": "paragraph" } ], "temperature": 0.5 } } }

Example Output Files

Transcripts can be generated as a txt, srt, or json file. Summaries are generated in the json file format only. The following are example outputs for reference:

John: Hello, hello, hello! How's your day been?
Sarah: Hey, long time no see! What have you been up to?
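
For comparison, a subtitle (.srt) output follows the standard SubRip layout of numbered cues with timestamps. The snippet below is an illustrative sketch built from the dialogue above using that standard format, not an exact 100ms output; the timestamps are made up.

    1
    00:00:01,000 --> 00:00:04,500
    John: Hello, hello, hello! How's your day been?

    2
    00:00:05,000 --> 00:00:08,200
    Sarah: Hey, long time no see! What have you been up to?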

Consuming Transcripts and Summaries

The transcripts and summaries will be saved as Recording Assets. If you’ve configured storage on 100ms, they’ll be stored in your cloud bucket. Otherwise, they’ll be stored in 100ms’ storage for the same duration as the recording.

There are three ways to consume the generated transcripts and summaries.

Method 1: Dashboard

Once you've recorded a session with transcription and summary enabled, you can expect the recording assets to be ready within roughly 20% of the recording's duration.

To access transcripts and summaries on the dashboard:

  1. Navigate to the Sessions tab in the sidebar to view previous sessions.
  2. Locate the session with transcription enabled. The Recording Status column will indicate the status of the recording.
  3. Click on the Completed status of the chosen session ID.
  4. This will open the Session Details page. Access the Recording Log to find the available assets and view them.
  5. Click on View Assets to open a pop-up with pre-signed URLs for the recording, chat, transcripts, and summary.
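
If you'd rather script this retrieval instead of clicking through the dashboard, here is a minimal sketch that lists the recording assets for a room and requests a short-lived download URL for the transcript. It assumes the Recording Assets endpoints GET /v2/recording-assets and GET /v2/recording-assets/<asset_id>/presigned-url, a management token, and a "transcript" asset type; verify the endpoints, response shape, and type names against the Recording Assets API reference.

    // fetch-transcript.js: illustrative sketch; verify endpoints, response shape, and asset types against the API reference.
    const ROOM_ID = "<room_id>";                   // placeholder
    const MANAGEMENT_TOKEN = "<management_token>"; // placeholder
    const headers = { Authorization: `Bearer ${MANAGEMENT_TOKEN}` };

    async function getTranscriptUrl() {
      // 1. List recording assets generated for the room's sessions
      //    (the `data` array in the response is an assumption).
      const listRes = await fetch(
        `https://api.100ms.live/v2/recording-assets?room_id=${ROOM_ID}`,
        { headers }
      );
      const { data: assets = [] } = await listRes.json();

      // 2. Pick the transcript asset; summaries and chat would be picked the same way.
      const transcript = assets.find((asset) => asset.type === "transcript");
      if (!transcript) return null;

      // 3. Ask for a short-lived pre-signed URL to download it.
      const urlRes = await fetch(
        `https://api.100ms.live/v2/recording-assets/${transcript.id}/presigned-url`,
        { headers }
      );
      const { url } = await urlRes.json();
      return url;
    }

    getTranscriptUrl().then((url) => console.log("Transcript URL:", url));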


Limitations

  1. The transcription and summary won't be available immediately. Processing and delivery take a minimum of 5 minutes and roughly 20% of the recording duration.
  2. This feature does not work with SFU Recordings (room-composite-legacy).
  3. Presently, you can only input a maximum of 6 sections in the summary through the API.
  4. Summaries may be generated incompletely for recordings longer than 90 minutes.
  5. This feature only supports the English language as of now.

Frequently Asked Questions (FAQs)

  1. How many languages are supported?

Presently, only English is supported, but support for other popular languages such as French, Portuguese, and Spanish is coming soon.

  2. What happens if multiple languages are being spoken in the live stream?

The parts spoken in the selected language will be transcribed, though some gibberish text may appear due to hallucinations by the AI.

  3. Can multiple summaries be generated or re-generated with different prompts for the same transcript?

This is not possible at this point in time, but we intend to add the ability to re-run transcription and summarisation so that users can test and build their own summaries.

  4. What can be done if the speaker label is not working?

There are two possible options here:

  1. If you are using an older web SDK version, please update to the latest. Refer to our web documentation here.
  2. Add the following snippet to enable speaker logging.
    • Create a new file src/components/AudioLevel/BeamSpeakerLabelsLogging.jsx containing the following code snippet.
    // BeamSpeakerLabelsLogging.jsx
    // Enables speaker-label logging when the app is running as the headless beam (recording) client.
    import { useEffect } from "react";
    import { useHMSActions } from "@100mslive/react-sdk";
    import { useIsHeadless } from "../AppData/useUISettings";

    export function BeamSpeakerLabelsLogging() {
      const hmsActions = useHMSActions();
      const isHeadless = useIsHeadless();

      useEffect(() => {
        if (isHeadless) {
          hmsActions.enableBeamSpeakerLabelsLogging();
        }
      }, [hmsActions, isHeadless]);

      return null;
    }
    • Register <BeamSpeakerLabelsLogging /> in AppRoutes along with its import statement, as sketched below:
      import { BeamSpeakerLabelsLogging } from "./components/AudioLevel/BeamSpeakerLabelsLogging";
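
For example, the registration might look like the snippet below; the surrounding structure of AppRoutes depends on your app, so treat this as a sketch rather than the exact file contents.

      import { BeamSpeakerLabelsLogging } from "./components/AudioLevel/BeamSpeakerLabelsLogging";

      function AppRoutes() {
        return (
          <>
            {/* Mount once so the headless beam client logs speaker labels */}
            <BeamSpeakerLabelsLogging />
            {/* ...existing routes and providers... */}
          </>
        );
      }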
