So, you’ve built an amazing conversational AI app using Agora, maybe following our main Conversational AI Guide. Your users can chat with an AI, just like talking to a person. But what about seeing the conversation? That’s where text streaming comes in.
This guide focuses on adding real-time text transcriptions to your audio-based AI conversations. Think of it as subtitles for your AI chat.
Why bother adding text when the primary interaction is voice? Good question! Text transcriptions make the conversation accessible to users who are deaf or hard of hearing, let users catch anything they missed, and keep the app usable in noisy (or must-stay-quiet) environments. In short, it's a game-changer.
Ready to add this superpower to your app? Let’s dive in.
Adding text streaming involves three main players in your codebase: the Agora RTC client (the raw data source), the MessageEngine (the processor that makes sense of the data), and your UI components: the ConversationComponent, which holds state, and the ConvoTextStream, which displays it.
Here’s a simplified view of how they communicate with each other:
Essentially, the raw data comes from Agora, the MessageEngine makes sense of it, updates the main ConversationComponent's state, which then passes the necessary info down to the ConvoTextStream to show the user.
Understanding how a single transcription message travels from the network to the screen is key:
This efficient pipeline ensures that text appears smoothly and in real-time, correctly handling messages that are still being typed out (“in-progress”), fully delivered (“completed”), or cut off (“interrupted”).
The MessageEngine needs to understand different kinds of transcription messages flowing through the RTC data channel. Here are the main ones:
```typescript
// Represents a transcription of the user's speech
export interface IUserTranscription extends ITranscriptionBase {
  object: ETranscriptionObjectType.USER_TRANSCRIPTION; // Identifies as "user.transcription"
  final: boolean; // Is this the final, complete transcription? (true/false)
}
```
This tells us what the speech-to-text system thinks the user said. The final flag is important – intermediate results might change slightly.
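As an illustrative sketch (not Agora's actual API), one common way to handle that is to let non-final transcriptions overwrite an interim buffer while a final one commits the text. The UserTranscript shape and applyTranscription helper below are hypothetical names for this example.

```typescript
// Hypothetical shapes for this sketch only.
interface UserTranscript {
  text: string;
  final: boolean;
}

function applyTranscription(
  committed: string[],
  interim: string,
  t: UserTranscript
): { committed: string[]; interim: string } {
  if (t.final) {
    // Final text joins the committed history; the interim buffer clears.
    return { committed: [...committed, t.text], interim: '' };
  }
  // Non-final text simply replaces the interim buffer, since it may still change.
  return { committed, interim: t.text };
}
```

This keeps the UI from "stacking up" partial guesses: only one interim line is ever shown, and it is replaced until the recognizer commits.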
```typescript
// Represents a transcription of the AI agent's speech
export interface IAgentTranscription extends ITranscriptionBase {
  object: ETranscriptionObjectType.AGENT_TRANSCRIPTION; // Identifies as "assistant.transcription"
  quiet: boolean; // Was this generated during a quiet period? (Useful for debugging)
  turn_seq_id: number; // Unique ID for this conversational turn
  turn_status: EMessageStatus; // Is this message IN_PROGRESS, END, or INTERRUPTED?
}
```
This is the text the AI is generating, often sent word-by-word or phrase-by-phrase to the text-to-speech engine and to our MessageEngine for display. The turn_status is crucial for knowing when the AI starts and finishes speaking.
```typescript
// Signals that a previous message was interrupted
export interface IMessageInterrupt {
  object: ETranscriptionObjectType.MSG_INTERRUPTED; // Identifies as "message.interrupt"
  message_id: string; // Which message got interrupted?
  data_type: 'message';
  turn_id: number; // The turn ID of the interrupted message
  start_ms: number; // Timestamp info
  send_ts: number; // Timestamp info
}
```
This happens if, for example, the user starts talking while the AI is still speaking. The MessageEngine uses this to mark the AI's interrupted message accordingly in the UI.
The MessageEngine intelligently handles these different message types, juggling them with an internal queue and state management so your UI component doesn't have to worry about the raw complexity.
The MessageEngine (lib/message.ts) is where the magic happens. You don't need to build it; Agora provides it. Its main jobs are to listen for raw transcription data on the RTC data channel, decode and order the incoming chunks, track each message's status, and deliver a clean, ready-to-render message list to your UI via a callback.
Every message tracked by the engine has a status:
```typescript
export enum EMessageStatus {
  IN_PROGRESS = 0, // Still being received/streamed (e.g., AI is talking)
  END = 1, // Finished normally.
  INTERRUPTED = 2, // Cut off before completion.
}
```
This helps your UI know how to display each message (e.g., add a “…” or a pulsing animation for IN_PROGRESS messages).
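For example, a tiny helper (hypothetical, not part of the Agora SDK) could map a status to a display hint; the numeric values mirror EMessageStatus shown above.

```typescript
// Status values mirror EMessageStatus from the engine.
const IN_PROGRESS = 0;
const END = 1;
const INTERRUPTED = 2;

// Hypothetical helper: pick a trailing display hint for a message bubble.
function statusSuffix(status: number): string {
  if (status === IN_PROGRESS) return '…'; // still streaming: show an ellipsis
  if (status === INTERRUPTED) return ' [interrupted]'; // cut off mid-turn
  return ''; // END: nothing extra needed
}
```

Your UI could then render `message.text + statusSuffix(message.status)`, or swap the suffix for a pulsing CSS animation.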
The engine can process incoming agent text in different ways:
```typescript
export enum EMessageEngineMode {
  TEXT = 'text', // Treats each agent message chunk as a complete block. Simpler, less "streaming" feel.
  WORD = 'word', // Processes agent messages word-by-word if timing info is available. Gives that nice streaming effect.
  AUTO = 'auto', // The engine decides! If word timings are present, it uses WORD mode; otherwise, TEXT mode. (Recommended)
}
```
Using AUTO mode is generally the easiest way to start. The engine adapts based on the data it receives from the backend conversational AI service. If the service sends detailed word timings, you get smooth streaming; if not, it falls back gracefully to showing text blocks.
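Conceptually, the AUTO decision boils down to a check like this sketch. The AgentChunk shape and resolveMode function are illustrative only, not the engine's real internals.

```typescript
type Mode = 'text' | 'word';

// Hypothetical shape of an incoming agent text chunk for this sketch.
interface AgentChunk {
  text: string;
  words?: { word: string; start_ms: number }[]; // word timings, if the backend sends them
}

function resolveMode(chunk: AgentChunk): Mode {
  // Word-level timings present? Stream word-by-word. Otherwise fall back to text blocks.
  return chunk.words && chunk.words.length > 0 ? 'word' : 'text';
}
```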
The MessageEngine ultimately provides your application (via its callback) with a list of message objects ready for display:
```typescript
export interface IMessageListItem {
  uid: number | string; // Who sent this? User's numeric UID or the agent's UID (often 0 or a string like "Agent").
  turn_id: number; // Helps keep track of conversational turns.
  text: string; // The actual words to display.
  status: EMessageStatus; // The current status (IN_PROGRESS, END, INTERRUPTED).
}
```
Your UI component just needs to render a list of these objects.
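For instance, even outside of React you can turn the list into plain text lines; this formatTranscript helper is a hypothetical sketch over the IMessageListItem shape described above.

```typescript
// Mirrors the IMessageListItem shape for this standalone sketch.
interface Item {
  uid: number | string;
  turn_id: number;
  text: string;
  status: number;
}

// Hypothetical helper: format messages as "Agent: ..." / "You: ..." lines
// (useful for logging or a bare-bones transcript view).
function formatTranscript(items: Item[], agentUID: number | string): string[] {
  return items.map(
    (m) => `${m.uid === agentUID || m.uid === 0 ? 'Agent' : 'You'}: ${m.text}`
  );
}
```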
You’ll typically initialize the MessageEngine within your main ConversationComponent, probably inside a useEffect hook that runs once the Agora RTC client is ready.
```typescript
// Inside ConversationComponent.tsx
const client = useRTCClient(); // Get the Agora client instance
const [messageList, setMessageList] = useState<IMessageListItem[]>([]);
const [currentInProgressMessage, setCurrentInProgressMessage] =
  useState<IMessageListItem | null>(null);
const messageEngineRef = useRef<MessageEngine | null>(null);
const agentUID = process.env.NEXT_PUBLIC_AGENT_UID || 'Agent'; // Get your agent's expected UID

useEffect(() => {
  // Only initialize once the client exists and we haven't already started the engine
  if (client && !messageEngineRef.current) {
    console.log('Initializing MessageEngine...');
    // Create the engine instance
    const engine = new MessageEngine(
      client,
      EMessageEngineMode.AUTO, // Use AUTO mode for adaptive streaming
      // This callback function is the critical link!
      // It receives the updated message list whenever something changes.
      (updatedMessages: IMessageListItem[]) => {
        // 1. Always sort messages by turn_id to ensure chronological order
        const sortedMessages = [...updatedMessages].sort(
          (a, b) => a.turn_id - b.turn_id
        );
        // 2. Find the *latest* message that's still streaming (if any).
        //    We handle this separately for smoother UI updates during streaming.
        const inProgressMsg = sortedMessages.findLast(
          (msg) => msg.status === EMessageStatus.IN_PROGRESS
        );
        // 3. Update component state:
        //    - messageList gets all *completed* or *interrupted* messages.
        //    - currentInProgressMessage gets the single *latest* streaming message.
        setMessageList(
          sortedMessages.filter(
            (msg) => msg.status !== EMessageStatus.IN_PROGRESS
          )
        );
        setCurrentInProgressMessage(inProgressMsg || null);
      }
    );
    // Store the engine instance in a ref
    messageEngineRef.current = engine;
    // Start the engine's processing loop.
    // legacyMode: false is recommended for newer setups.
    messageEngineRef.current.run({ legacyMode: false });
    console.log('MessageEngine started.');
  }
  // Cleanup function: Stop the engine when the component unmounts
  return () => {
    if (messageEngineRef.current) {
      console.log('Cleaning up MessageEngine...');
      messageEngineRef.current.cleanup();
      messageEngineRef.current = null;
    }
  };
}, [client]); // Dependency array ensures this runs when the client is ready
```
Let's break down that crucial callback function: it sorts the incoming messages by turn_id, pulls out the single latest in-progress message for separate handling, and stores everything that's completed or interrupted in messageList.
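Stripped of React, the callback's core logic is just a sort-and-partition step, sketched here as a pure function. The Msg shape mirrors IMessageListItem, with status 0 standing in for IN_PROGRESS.

```typescript
// Minimal stand-in for IMessageListItem (status 0 = IN_PROGRESS).
interface Msg {
  turn_id: number;
  text: string;
  status: number;
}

function partitionMessages(msgs: Msg[]): {
  completed: Msg[];
  inProgress: Msg | null;
} {
  // 1. Sort chronologically by turn.
  const sorted = [...msgs].sort((a, b) => a.turn_id - b.turn_id);
  // 2. Latest still-streaming message (reverse + find avoids relying on findLast).
  const inProgress = [...sorted].reverse().find((m) => m.status === 0) ?? null;
  // 3. Everything finished or interrupted goes to the main list.
  const completed = sorted.filter((m) => m.status !== 0);
  return { completed, inProgress };
}
```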
Now, let’s look at the example ConvoTextStream component (components/ConvoTextStream.tsx). Its job is to take the message data from the ConversationComponent and make it look like a chat interface.
It needs data from its parent (ConversationComponent):
```typescript
interface ConvoTextStreamProps {
  // All the messages that are done (completed or interrupted)
  messageList: IMessageListItem[];
  // The single message currently being streamed by the AI (if any)
  currentInProgressMessage?: IMessageListItem | null;
  // The UID of the AI agent (so we can style its messages differently)
  agentUID: string | number | undefined;
}
```
These props are directly populated from the state variables (messageList, currentInProgressMessage) that our MessageEngine callback updates in the ConversationComponent.
A good chat UI needs more than just displaying text. Our example focuses on smart auto-scrolling, graceful handling of the in-progress streaming message, and simple open/close and expand controls.
Users hate losing their place when new messages arrive, unless they’re already at the bottom wanting to see the latest.
```typescript
// Ref for the scrollable chat area
const scrollRef = useRef<HTMLDivElement>(null);
// State to track if we should automatically scroll down
const [shouldAutoScroll, setShouldAutoScroll] = useState(true);

// Function to force scroll to the bottom
const scrollToBottom = () => {
  scrollRef.current?.scrollTo({
    top: scrollRef.current.scrollHeight,
    behavior: 'smooth', // Optional: make it smooth
  });
};

// Detects when the user scrolls manually
const handleScroll = () => {
  if (!scrollRef.current) return;
  const { scrollHeight, scrollTop, clientHeight } = scrollRef.current;
  // Is the user within ~100px of the bottom?
  const isNearBottom = scrollHeight - scrollTop - clientHeight < 100;
  // Only auto-scroll if the user is near the bottom
  if (isNearBottom !== shouldAutoScroll) {
    setShouldAutoScroll(isNearBottom);
  }
};

// Effect to actually perform the auto-scroll when needed
useEffect(() => {
  // Check if a new message arrived OR if we should be auto-scrolling
  const hasNewMessage = messageList.length > prevMessageLengthRef.current; // Track previous length
  if ((hasNewMessage || shouldAutoScroll) && scrollRef.current) {
    scrollToBottom();
  }
  // Update previous length ref for next render
  prevMessageLengthRef.current = messageList.length;
}, [messageList, currentInProgressMessage?.text, shouldAutoScroll]); // Re-run when messages change or scroll state changes

// Remember to attach the onScroll handler to the scrollable div:
// <div ref={scrollRef} onScroll={handleScroll}>...</div>
```
This logic ensures that new messages scroll smoothly into view when the user is at (or near) the bottom, while a user who has scrolled up to read earlier messages is never yanked back down.
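The heart of that behavior is the near-bottom check, which is easy to pull out and test as a pure function; the 100px threshold matches the snippet above.

```typescript
// Pure version of the "is the user near the bottom?" check.
function isNearBottom(
  scrollHeight: number,
  scrollTop: number,
  clientHeight: number,
  threshold = 100
): boolean {
  // Distance between the bottom of the visible area and the bottom of the content.
  return scrollHeight - scrollTop - clientHeight < threshold;
}
```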
The previous useEffect for scrolling might trigger on every single word update if using WORD mode. This can feel jittery. We can improve this by only scrolling significantly when enough new text has arrived.
```typescript
// --- Add these refs ---
const prevMessageTextRef = useRef(''); // Track the text of the last in-progress message
const significantChangeScrollTimer = useRef<NodeJS.Timeout | null>(null); // Timer ref

// --- New function to check for significant change ---
const hasContentChangedSignificantly = (threshold = 20): boolean => {
  if (!currentInProgressMessage) return false;
  const currentText = currentInProgressMessage.text || '';
  const textLengthDiff = currentText.length - prevMessageTextRef.current.length;
  // Only trigger if a decent chunk of text arrived
  const hasSignificantChange = textLengthDiff >= threshold;
  // Update the ref *only if* it changed significantly (or the message finished)
  if (
    hasSignificantChange ||
    currentInProgressMessage.status !== EMessageStatus.IN_PROGRESS
  ) {
    prevMessageTextRef.current = currentText;
  }
  return hasSignificantChange;
};

// --- Modify the scrolling useEffect ---
useEffect(() => {
  const hasNewCompleteMessage = messageList.length > prevMessageLengthRef.current;
  const streamingContentChanged = hasContentChangedSignificantly(); // Use the new check

  // Clear any pending scroll timer if conditions change
  if (significantChangeScrollTimer.current) {
    clearTimeout(significantChangeScrollTimer.current);
    significantChangeScrollTimer.current = null;
  }

  if (
    (hasNewCompleteMessage || (streamingContentChanged && shouldAutoScroll)) &&
    scrollRef.current
  ) {
    // Introduce a small delay to batch scrolls during rapid streaming
    significantChangeScrollTimer.current = setTimeout(() => {
      scrollToBottom();
      significantChangeScrollTimer.current = null;
    }, 50); // 50ms delay, adjust as needed
  }

  prevMessageLengthRef.current = messageList.length;

  // Cleanup timer on unmount
  return () => {
    if (significantChangeScrollTimer.current) {
      clearTimeout(significantChangeScrollTimer.current);
    }
  };
}, [messageList, currentInProgressMessage?.text, shouldAutoScroll]);
```
This refined approach checks if more than, say, 20 characters have been added to the streaming message before triggering a scroll, making the experience smoother. It also uses a small setTimeout to batch scrolls that happen in quick succession.
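The threshold check itself reduces to a simple length comparison, shown here as a standalone function for clarity.

```typescript
// Has enough new text arrived since the last scroll to justify another one?
function changedSignificantly(
  prevText: string,
  currentText: string,
  threshold = 20
): boolean {
  return currentText.length - prevText.length >= threshold;
}
```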
We need to decide when and how to show the currentInProgressMessage:
```typescript
// Helper to decide if the streaming message should be shown
const shouldShowStreamingMessage = (): boolean => {
  return (
    // Is there an in-progress message?
    currentInProgressMessage !== null &&
    // Is it *actually* in progress?
    currentInProgressMessage.status === EMessageStatus.IN_PROGRESS &&
    // Does it have any text content yet?
    currentInProgressMessage.text.trim().length > 0
  );
};

// In the JSX, combine the lists for rendering:
const allMessagesToRender = [...messageList];
if (shouldShowStreamingMessage() && currentInProgressMessage) {
  // Add the streaming message to the end of the list to be rendered
  allMessagesToRender.push(currentInProgressMessage);
}
// Then map over `allMessagesToRender`:
// {allMessagesToRender.map((message, index) => ( ... render message bubble ... ))}
```
This ensures we only render the streaming message bubble when it’s actively receiving non-empty text.
Basic UI controls enhance usability:
```typescript
const [isOpen, setIsOpen] = useState(false); // Is the chat window visible?
const [isChatExpanded, setIsChatExpanded] = useState(false); // Is it in expanded mode?
const hasSeenFirstMessageRef = useRef(false); // Track if the user has interacted or seen the first message

// Toggle chat open/closed
const toggleChat = () => {
  const newState = !isOpen;
  setIsOpen(newState);
  // If opening, mark that the user has now 'seen' the chat
  if (newState) {
    hasSeenFirstMessageRef.current = true;
  }
};

// Toggle between normal and expanded height
const toggleChatExpanded = () => {
  setIsChatExpanded(!isChatExpanded);
};

// --- Auto-Open Logic ---
useEffect(() => {
  const hasAnyMessage = messageList.length > 0 || shouldShowStreamingMessage();
  // If there's a message, we haven't opened it yet automatically, and it's currently closed...
  if (hasAnyMessage && !hasSeenFirstMessageRef.current && !isOpen) {
    setIsOpen(true); // Open it!
    hasSeenFirstMessageRef.current = true; // Mark as seen/auto-opened
  }
}, [messageList, currentInProgressMessage, isOpen]); // Rerun when messages or open state change
```
This includes logic to automatically pop open the chat window the first time a message appears, but only if the user hasn’t manually closed it or interacted with it before.
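The auto-open condition can be isolated as a pure predicate, which makes the behavior easy to reason about and test in isolation.

```typescript
// Should the chat window pop open automatically right now?
function shouldAutoOpen(
  hasAnyMessage: boolean, // is there anything to show?
  hasSeenFirst: boolean, // has the user already opened (or been shown) the chat?
  isOpen: boolean // is the window currently open?
): boolean {
  return hasAnyMessage && !hasSeenFirst && !isOpen;
}
```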
The core rendering logic maps over the combined message list (allMessagesToRender) and creates styled divs for each message:
```tsx
// Inside the map function:
<div
  key={`${message.turn_id}-${message.uid}-${index}`} // More robust key
  ref={index === allMessagesToRender.length - 1 ? lastMessageRef : null} // Ref for potential scrolling logic
  className={cn(
    'flex items-start gap-2 w-full mb-2', // Basic layout styles
    // Is this message from the AI? Align left. Otherwise, align right.
    message.uid === 0 || message.uid.toString() === agentUID
      ? 'justify-start'
      : 'justify-end'
  )}
>
  {/* Conditionally render an avatar based on sender if needed */}
  {/* {isAgent && <Avatar ... />} */}
  {/* Message Bubble */}
  <div
    className={cn(
      'max-w-[80%] rounded-xl px-3 py-2 text-sm md:text-base shadow-sm', // Slightly softer corners, shadow
      isAgent ? 'bg-gray-100 text-gray-800' : 'bg-blue-500 text-white',
      // Optional: Dim user message slightly if interrupted while IN_PROGRESS
      message.status === EMessageStatus.IN_PROGRESS && !isAgent && 'opacity-80'
    )}
  >
    {message.text}
  </div>
</div>
```
This uses Tailwind CSS and the shadcn cn utility for conditional classes to align agent messages left and user messages right, style the bubbles differently per sender, and optionally dim a user message that was interrupted mid-stream.
Integrating the ConvoTextStream into your main ConversationComponent is straightforward once the MessageEngine is initialized and managing state.
```tsx
// Inside ConversationComponent's return statement
// ... other UI elements like connection status, microphone button ...
return (
  <div className="relative flex flex-col h-full">
    {/* ... Other UI ... */}
    {/* Pass the state managed by MessageEngine's callback */}
    <ConvoTextStream
      messageList={messageList}
      currentInProgressMessage={currentInProgressMessage}
      agentUID={agentUID} // Pass the agent's UID
    />
    {/* ... Microphone Button etc ... */}
  </div>
);
```
And that’s the core integration! The MessageEngine handles the data flow from RTC, updates the state in ConversationComponent, which then passes the formatted data down to ConvoTextStream for display.
The provided ConvoTextStream is a starting point. You'll likely want to customize its appearance.
Modify the tailwindcss classes within ConvoTextStream.tsx to match your app's design system. Change colors, fonts, padding, border-radius, etc.
```tsx
// Example: Change AI bubble color
message.uid === 0 || message.uid.toString() === agentUID
  ? 'bg-purple-100 text-purple-900' // Changed from gray
  : 'bg-blue-500 text-white',
```
Adjust the positioning (fixed, absolute), size (w-96), background (bg-white), shadows (shadow-lg), etc., of the main chat container (#chatbox div and its children).
Modify the toggleChatExpanded function and the associated conditional classes (isChatExpanded && 'expanded') to change how the chat window resizes or behaves when expanded. You might want it to take up more screen space or dock differently.
For the curious, here’s a slightly more detailed look at how the MessageEngine processes a single stream-message event from the Agora RTC data channel:
This shows the internal steps: receiving raw data, decoding it, updating or creating messages in an internal queue/store, and finally triggering your callback function to update the React component state.
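To make those steps concrete, here's a deliberately minimal sketch of that pattern: decode, upsert by turn, and notify. This MiniEngine is NOT Agora's implementation; it assumes chunks arrive as UTF-8 JSON over the data channel, which is an assumption made only for this illustration.

```typescript
// Stand-in for one decoded transcription chunk (illustrative shape).
interface Chunk {
  turn_id: number;
  text: string;
  status: number; // 0 = in progress, 1 = end, 2 = interrupted
}

// NOT Agora's MessageEngine: a toy illustration of the same update loop.
class MiniEngine {
  private turns = new Map<number, Chunk>();
  private onUpdate: (msgs: Chunk[]) => void;

  constructor(onUpdate: (msgs: Chunk[]) => void) {
    this.onUpdate = onUpdate;
  }

  handleChunk(raw: Uint8Array): void {
    // 1. Decode the raw bytes (assumed UTF-8 JSON for this sketch).
    const chunk: Chunk = JSON.parse(new TextDecoder().decode(raw));
    // 2. Upsert into the per-turn store: later chunks for a turn replace earlier ones.
    this.turns.set(chunk.turn_id, chunk);
    // 3. Notify the subscriber with the full current list (the "callback" step).
    this.onUpdate([...this.turns.values()]);
  }
}
```

The key idea mirrored here is that each incoming chunk replaces the stored state for its turn, so the callback always receives a consistent, deduplicated view of the conversation.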
```tsx
/* eslint-disable react-hooks/exhaustive-deps */
'use client';

import { useState, useEffect, useRef, useCallback } from 'react';
import { Button } from '@/components/ui/button';
import {
  MessageCircle,
  X,
  Expand, // Added icon for expand
  Shrink, // Added icon for shrink
} from 'lucide-react';
import { cn } from '@/lib/utils';
import { IMessageListItem, EMessageStatus } from '@/lib/message'; // Assuming types are here

interface ConvoTextStreamProps {
  messageList: IMessageListItem[];
  currentInProgressMessage?: IMessageListItem | null;
  agentUID: string | number | undefined; // Allow number or string
}

export default function ConvoTextStream({
  messageList,
  currentInProgressMessage = null,
  agentUID,
}: ConvoTextStreamProps) {
  const [isOpen, setIsOpen] = useState(false);
  const [shouldAutoScroll, setShouldAutoScroll] = useState(true);
  const scrollRef = useRef<HTMLDivElement>(null);
  const prevMessageLengthRef = useRef(messageList.length);
  const prevMessageTextRef = useRef('');
  const [isChatExpanded, setIsChatExpanded] = useState(false);
  const hasSeenFirstMessageRef = useRef(false);
  const significantChangeScrollTimer = useRef<NodeJS.Timeout | null>(null);

  // --- Scrolling Logic ---
  const scrollToBottom = useCallback(() => {
    if (scrollRef.current) {
      scrollRef.current.scrollTo({
        top: scrollRef.current.scrollHeight,
        behavior: 'smooth',
      });
    }
  }, []);

  const handleScroll = useCallback(() => {
    if (!scrollRef.current) return;
    const { scrollHeight, scrollTop, clientHeight } = scrollRef.current;
    const isNearBottom = scrollHeight - scrollTop - clientHeight < 150; // Increased threshold slightly
    if (isNearBottom !== shouldAutoScroll) {
      setShouldAutoScroll(isNearBottom);
    }
  }, [shouldAutoScroll]);

  const hasContentChangedSignificantly = useCallback(
    (threshold = 20): boolean => {
      if (!currentInProgressMessage) return false;
      const currentText = currentInProgressMessage.text || '';
      // Only compare if the message is actually in progress
      const baseText =
        currentInProgressMessage.status === EMessageStatus.IN_PROGRESS
          ? prevMessageTextRef.current
          : currentText;
      const textLengthDiff = currentText.length - baseText.length;
      const hasSignificantChange = textLengthDiff >= threshold;
      // Update ref immediately if it's a significant change or message finished/interrupted
      if (
        hasSignificantChange ||
        currentInProgressMessage.status !== EMessageStatus.IN_PROGRESS
      ) {
        prevMessageTextRef.current = currentText;
      }
      return hasSignificantChange;
    },
    [currentInProgressMessage]
  );

  useEffect(() => {
    const hasNewCompleteMessage =
      messageList.length > prevMessageLengthRef.current;
    // Check significance *only* if we should be auto-scrolling
    const streamingContentChanged =
      shouldAutoScroll && hasContentChangedSignificantly();

    if (significantChangeScrollTimer.current) {
      clearTimeout(significantChangeScrollTimer.current);
      significantChangeScrollTimer.current = null;
    }

    if ((hasNewCompleteMessage || streamingContentChanged) && scrollRef.current) {
      // Debounce scrolling slightly
      significantChangeScrollTimer.current = setTimeout(() => {
        scrollToBottom();
        significantChangeScrollTimer.current = null;
      }, 50);
    }

    prevMessageLengthRef.current = messageList.length;

    return () => {
      if (significantChangeScrollTimer.current) {
        clearTimeout(significantChangeScrollTimer.current);
      }
    };
  }, [
    messageList,
    currentInProgressMessage?.text,
    shouldAutoScroll,
    scrollToBottom,
    hasContentChangedSignificantly,
  ]);

  // --- Component Logic ---
  const shouldShowStreamingMessage = useCallback((): boolean => {
    return (
      currentInProgressMessage !== null &&
      currentInProgressMessage.status === EMessageStatus.IN_PROGRESS &&
      currentInProgressMessage.text.trim().length > 0
    );
  }, [currentInProgressMessage]);

  const toggleChat = useCallback(() => {
    const newState = !isOpen;
    setIsOpen(newState);
    if (newState) {
      hasSeenFirstMessageRef.current = true; // Mark as seen if manually opened
    }
  }, [isOpen]);

  const toggleChatExpanded = useCallback(() => {
    setIsChatExpanded(!isChatExpanded);
    // Attempt to scroll to bottom after expanding/shrinking
    setTimeout(scrollToBottom, 50);
  }, [isChatExpanded, scrollToBottom]);

  // Auto-open logic
  useEffect(() => {
    const hasAnyMessage =
      messageList.length > 0 || shouldShowStreamingMessage();
    if (hasAnyMessage && !hasSeenFirstMessageRef.current && !isOpen) {
      setIsOpen(true);
      hasSeenFirstMessageRef.current = true;
    }
  }, [
    messageList,
    currentInProgressMessage,
    isOpen,
    shouldShowStreamingMessage,
  ]);

  // Combine messages for rendering
  const allMessagesToRender = [...messageList];
  if (shouldShowStreamingMessage() && currentInProgressMessage) {
    allMessagesToRender.push(currentInProgressMessage);
  }

  // --- JSX ---
  return (
    // Use a more descriptive ID if needed, ensure z-index is appropriate
    <div
      id="agora-text-stream-chatbox"
      className="fixed bottom-24 right-4 md:right-8 z-50"
    >
      {isOpen ? (
        <div
          className={cn(
            'bg-white rounded-lg shadow-xl w-80 md:w-96 flex flex-col text-black transition-all duration-300 ease-in-out', // Adjusted width and added transition
            // Dynamic height based on expanded state
            isChatExpanded ? 'h-[60vh] max-h-[500px]' : 'h-80'
          )}
        >
          {/* Header */}
          <div className="p-2 border-b flex justify-between items-center shrink-0 bg-gray-50 rounded-t-lg">
            <Button
              variant="ghost"
              size="icon"
              onClick={toggleChatExpanded}
              aria-label={isChatExpanded ? 'Shrink chat' : 'Expand chat'}
            >
              {isChatExpanded ? (
                <Shrink className="h-4 w-4" />
              ) : (
                <Expand className="h-4 w-4" />
              )}
            </Button>
            <h3 className="font-semibold text-sm md:text-base">Conversation</h3>
            <Button
              variant="ghost"
              size="icon"
              onClick={toggleChat}
              aria-label="Close chat"
            >
              <X className="h-4 w-4" />
            </Button>
          </div>
          {/* Message Area */}
          <div
            className="flex-1 overflow-y-auto scroll-smooth" // Use overflow-y-auto
            ref={scrollRef}
            onScroll={handleScroll}
          >
            <div className="p-3 md:p-4 space-y-3">
              {allMessagesToRender.map((message, index) => {
                const isAgent =
                  message.uid === 0 ||
                  message.uid?.toString() === agentUID?.toString();
                return (
                  <div
                    key={`${message.turn_id}-${message.uid}-${index}`} // Use index as last resort for key part
                    className={cn(
                      'flex items-start gap-2 w-full',
                      isAgent ? 'justify-start' : 'justify-end'
                    )}
                  >
                    {/* Optional: Render avatar only for AI or based on settings */}
                    {/* {isAgent && <Avatar ... />} */}
                    {/* Message Bubble */}
                    <div
                      className={cn(
                        'max-w-[80%] rounded-xl px-3 py-2 text-sm md:text-base shadow-sm', // Slightly softer corners, shadow
                        isAgent
                          ? 'bg-gray-100 text-gray-800'
                          : 'bg-blue-500 text-white'
                      )}
                    >
                      {message.text}
                    </div>
                  </div>
                );
              })}
              {/* Add a small spacer at the bottom */}
              <div className="h-2" />
            </div>
          </div>
          {/* Optional Footer Area (e.g., for input later) */}
          {/* <div className="p-2 border-t shrink-0">...</div> */}
        </div>
      ) : (
        // Floating Action Button (FAB) to open chat
        <Button
          onClick={toggleChat}
          className="rounded-full w-14 h-14 flex items-center justify-center bg-blue-600 hover:bg-blue-700 text-white shadow-lg hover:scale-105 transition-all duration-200"
          aria-label="Open chat"
        >
          <MessageCircle className="h-6 w-6" />
        </Button>
      )}
    </div>
  );
}
```
This example component provides a floating button to open the chat, an expandable chat window with a header, smart auto-scrolling with debounced updates during streaming, auto-open on the first message, and distinct bubble styling for agent and user messages.
Remember to adapt the styling, icons (lucide-react used here), and specific UX behaviors to fit your application's needs.
Let's quickly recap the data flow within the context of the entire application: Agora's RTC data channel delivers raw transcription chunks, the MessageEngine decodes and orders them, the ConversationComponent's callback updates React state, and ConvoTextStream renders the result.
This cycle repeats rapidly as the conversation progresses, creating the real-time text streaming effect.
Now you’ve got the tools and understanding to add slick, real-time text streaming to your Agora conversational AI application! This isn’t just a cosmetic addition; it significantly boosts usability and accessibility.
Your next steps: drop the MessageEngine and ConvoTextStream into your own conversational AI app, customize the styling to match your design system, and experiment with the TEXT, WORD, and AUTO engine modes.
For deeper dives into specific Agora features related to this, check out the Official Documentation for “Live Subtitles”, which covers the underlying data channel mechanisms.
Happy coding, and enjoy building more engaging and accessible conversational experiences!