05/08/2025 | News release | Archived content
Conversational AI is revolutionizing how people interact with artificial intelligence. Instead of carefully crafting text prompts, users can have natural, real-time voice conversations with AI agents. This opens exciting opportunities for more intuitive and efficient interactions.
Many developers have already invested significant time building custom LLM workflows for text-based agents. Agora's Conversational AI Engine allows you to connect these existing workflows to an Agora channel, enabling real-time voice conversations without abandoning your current AI infrastructure.
In this guide, I'll walk you through building a Go server that handles the connection between your users and Agora's Conversational AI. By the end, you'll have a production-ready backend that can power voice-based AI conversations for your applications.
Before getting started, make sure you have:
Let's start by setting up our Golang project with the necessary dependencies. First, create a new directory and initialize a Go module:
mkdir agora-convo-ai-go-server
cd agora-convo-ai-go-server
go mod init github.com/AgoraIO-Community/convo-ai-go-server
Next, we'll add the key dependencies for our server:
go get github.com/gin-gonic/gin
go get github.com/joho/godotenv
go get github.com/AgoraIO-Community/go-tokenbuilder
Create the initial directory structure, and as we go through the guide, we'll fill these directories with the files we need.
mkdir -p convoai token_service http_headers validation
touch .env
Your project directory should now have a structure like this:
├── convoai/
├── token_service/
├── http_headers/
├── validation/
├── .env
├── go.mod
├── go.sum
Start by setting up the main application file, which will be the entry point for our server. We'll then load the environment variables, set up the configuration, and initialize the router with the appropriate middleware and routes.
Create the main.go file:
touch main.go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/joho/godotenv"
)

func loadConfig() (*convoai.ConvoAIConfig, error) {
	config := &convoai.ConvoAIConfig{
		// Agora Configuration
		AppID:          os.Getenv("AGORA_APP_ID"),
		AppCertificate: os.Getenv("AGORA_APP_CERTIFICATE"),
		CustomerID:     os.Getenv("AGORA_CUSTOMER_ID"),
		CustomerSecret: os.Getenv("AGORA_CUSTOMER_SECRET"),
		BaseURL:        os.Getenv("AGORA_CONVO_AI_BASE_URL"),
		AgentUID:       os.Getenv("AGENT_UID"),
		// LLM Configuration
		LLMModel: os.Getenv("LLM_MODEL"),
		LLMURL:   os.Getenv("LLM_URL"),
		LLMToken: os.Getenv("LLM_TOKEN"),
		// TTS Configuration
		TTSVendor: os.Getenv("TTS_VENDOR"),
	}

	// Microsoft TTS Configuration
	if msKey := os.Getenv("MICROSOFT_TTS_KEY"); msKey != "" {
		config.MicrosoftTTS = &convoai.MicrosoftTTSConfig{
			Key:       msKey,
			Region:    os.Getenv("MICROSOFT_TTS_REGION"),
			VoiceName: os.Getenv("MICROSOFT_TTS_VOICE_NAME"),
			Rate:      os.Getenv("MICROSOFT_TTS_RATE"),
			Volume:    os.Getenv("MICROSOFT_TTS_VOLUME"),
		}
	}

	// ElevenLabs TTS Configuration
	if elKey := os.Getenv("ELEVENLABS_API_KEY"); elKey != "" {
		config.ElevenLabsTTS = &convoai.ElevenLabsTTSConfig{
			Key:     elKey,
			VoiceID: os.Getenv("ELEVENLABS_VOICE_ID"),
			ModelID: os.Getenv("ELEVENLABS_MODEL_ID"),
		}
	}

	// Modalities Configuration
	config.InputModalities = os.Getenv("INPUT_MODALITIES")
	config.OutputModalities = os.Getenv("OUTPUT_MODALITIES")

	return config, nil
}

func setupServer() *http.Server {
	log.Println("Starting setupServer")

	if err := godotenv.Load(); err != nil {
		log.Println("Warning: Error loading .env file. Using existing environment variables.")
	}

	// Load configuration
	config, err := loadConfig()
	if err != nil {
		log.Fatal("Failed to load configuration:", err)
	}

	// TODO: Validate environment configuration

	// Server Configuration
	serverPort := os.Getenv("PORT")
	if serverPort == "" {
		serverPort = "8080"
	}

	// CORS Configuration
	corsAllowOrigin := os.Getenv("CORS_ALLOW_ORIGIN")

	// Set up router with headers
	router := gin.Default()

	// TODO: Register headers

	// TODO: Initialize services & register routes

	// Register healthcheck route
	router.GET("/ping", Ping)

	// Configure and start the HTTP server
	server := &http.Server{
		Addr:    ":" + serverPort,
		Handler: router,
	}

	log.Println("Server setup completed")
	log.Println("- listening on port", serverPort)
	return server
}

func main() {
	server := setupServer()

	// Start the server in a separate goroutine to handle graceful shutdown.
	go func() {
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %s\n", err)
		}
	}()

	// Prepare to handle graceful shutdown.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, os.Interrupt, syscall.SIGTERM)

	// Wait for a shutdown signal.
	<-quit
	log.Println("Shutting down server...")

	// Attempt to gracefully shutdown the server with a timeout of 5 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := server.Shutdown(ctx); err != nil {
		log.Fatal("Server forced to shutdown:", err)
	}
	log.Println("Server exiting")
}

// Ping is a handler function that serves as a basic health check endpoint.
func Ping(c *gin.Context) {
	c.JSON(200, gin.H{
		"message": "pong",
	})
}

Note: We load the PORT from the environment variables; it defaults to 8080 if not set in your .env file.
Let's test our basic Go server by running:
go run main.go
You should see "Server setup completed" and "- listening on port 8080" in your console.
You can now visit http://localhost:8080/ping in your browser to verify the server is working. You should see {"message": "pong"} as the response.
To test the server using curl, run:
curl http://localhost:8080/ping
You should see the response: {"message": "pong"}
Next, let's define the types needed for our ConvoAI service. Create a file called convoai-types.go in the convoai directory.
touch convoai/convoai-types.go
Add the following types:
package convoai

// InviteAgentRequest represents the request body for inviting an AI agent
type InviteAgentRequest struct {
	RequesterID      string   `json:"requester_id"`
	ChannelName      string   `json:"channel_name"`
	RtcCodec         *int     `json:"rtc_codec,omitempty"`
	InputModalities  []string `json:"input_modalities,omitempty"`
	OutputModalities []string `json:"output_modalities,omitempty"`
}

// RemoveAgentRequest represents the request body for removing an AI agent
type RemoveAgentRequest struct {
	AgentID string `json:"agent_id"`
}

// TTSVendor represents the text-to-speech vendor type
type TTSVendor string

const (
	TTSVendorMicrosoft  TTSVendor = "microsoft"
	TTSVendorElevenLabs TTSVendor = "elevenlabs"
)

// TTSConfig represents the text-to-speech configuration
type TTSConfig struct {
	Vendor TTSVendor   `json:"vendor"`
	Params interface{} `json:"params"`
}

// AgoraStartRequest represents the request to start a conversation
type AgoraStartRequest struct {
	Name       string     `json:"name"`
	Properties Properties `json:"properties"`
}

// Properties represents the configuration properties for the conversation
type Properties struct {
	Channel          string    `json:"channel"`
	Token            string    `json:"token"`
	AgentRtcUID      string    `json:"agent_rtc_uid"`
	RemoteRtcUIDs    []string  `json:"remote_rtc_uids"`
	EnableStringUID  bool      `json:"enable_string_uid"`
	IdleTimeout      int       `json:"idle_timeout"`
	ASR              ASR       `json:"asr"`
	LLM              LLM       `json:"llm"`
	TTS              TTSConfig `json:"tts"`
	VAD              VAD       `json:"vad"`
	AdvancedFeatures Features  `json:"advanced_features"`
}

// ASR represents the Automatic Speech Recognition configuration
type ASR struct {
	Language string `json:"language"`
	Task     string `json:"task"`
}

// LLM represents the Large Language Model configuration
type LLM struct {
	URL              string          `json:"url"`
	APIKey           string          `json:"api_key"`
	SystemMessages   []SystemMessage `json:"system_messages"`
	GreetingMessage  string          `json:"greeting_message"`
	FailureMessage   string          `json:"failure_message"`
	MaxHistory       int             `json:"max_history"`
	Params           LLMParams       `json:"params"`
	InputModalities  []string        `json:"input_modalities"`
	OutputModalities []string        `json:"output_modalities"`
}

// SystemMessage represents a system message in the conversation
type SystemMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// LLMParams represents the parameters for the Large Language Model
type LLMParams struct {
	Model       string  `json:"model"`
	MaxTokens   int     `json:"max_tokens"`
	Temperature float64 `json:"temperature"`
	TopP        float64 `json:"top_p"`
}

// VAD represents the Voice Activity Detection configuration
type VAD struct {
	SilenceDurationMS   int     `json:"silence_duration_ms"`
	SpeechDurationMS    int     `json:"speech_duration_ms"`
	Threshold           float64 `json:"threshold"`
	InterruptDurationMS int     `json:"interrupt_duration_ms"`
	PrefixPaddingMS     int     `json:"prefix_padding_ms"`
}

// Features represents advanced features configuration
type Features struct {
	EnableAIVAD bool `json:"enable_aivad"`
	EnableBHVS  bool `json:"enable_bhvs"`
}

// InviteAgentResponse represents the response for an agent invitation
type InviteAgentResponse struct {
	AgentID  string `json:"agent_id"`
	CreateTS int64  `json:"create_ts"`
	Status   string `json:"status"`
}

// RemoveAgentResponse represents the response for an agent removal
type RemoveAgentResponse struct {
	Success bool   `json:"success"`
	AgentID string `json:"agent_id"`
}

// ConvoAIConfig holds all configuration for the ConvoAI service
type ConvoAIConfig struct {
	// Agora Configuration
	AppID          string
	AppCertificate string
	CustomerID     string
	CustomerSecret string
	BaseURL        string
	AgentUID       string

	// LLM Configuration
	LLMModel string
	LLMURL   string
	LLMToken string

	// TTS Configuration
	TTSVendor     string
	MicrosoftTTS  *MicrosoftTTSConfig
	ElevenLabsTTS *ElevenLabsTTSConfig

	// Modalities Configuration
	InputModalities  string
	OutputModalities string
}

// MicrosoftTTSConfig holds Microsoft TTS specific configuration
type MicrosoftTTSConfig struct {
	Key       string `json:"key"`
	Region    string `json:"region"`
	VoiceName string `json:"voice_name"`
	Rate      string `json:"rate"`
	Volume    string `json:"volume"`
}

// ElevenLabsTTSConfig holds ElevenLabs TTS specific configuration
type ElevenLabsTTSConfig struct {
	Key     string `json:"key"`
	VoiceID string `json:"voice_id"`
	ModelID string `json:"model_id"`
}
These new types give some insight into all the parts we'll be assembling in the next steps. We'll take the client request, and use it to configure the AgoraStartRequest and send it to Agora's Conversational AI Engine. Agora's Convo AI engine will add the agent to the conversation.
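To make that mapping concrete, here is a minimal sketch that marshals a trimmed-down version of the start request. The startRequest and properties types and the buildStartJSON helper are hypothetical stand-ins that keep only a few of the fields from AgoraStartRequest and Properties, so treat the output as an illustration of the JSON shape rather than a complete request body.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// startRequest is a hypothetical, trimmed-down stand-in for AgoraStartRequest.
type startRequest struct {
	Name       string     `json:"name"`
	Properties properties `json:"properties"`
}

// properties keeps only three of the fields from the full Properties type.
type properties struct {
	Channel       string   `json:"channel"`
	AgentRtcUID   string   `json:"agent_rtc_uid"`
	RemoteRtcUIDs []string `json:"remote_rtc_uids"`
}

// buildStartJSON maps the client-supplied channel and requester ID, plus the
// server-side agent UID, onto the request body and returns it as JSON.
func buildStartJSON(name, channel, agentUID, requesterID string) (string, error) {
	req := startRequest{
		Name: name,
		Properties: properties{
			Channel:       channel,
			AgentRtcUID:   agentUID,
			RemoteRtcUIDs: []string{requesterID},
		},
	}
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := buildStartJSON("agent-demo", "test-channel", "333", "1234")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

The struct tags decide the key names on the wire, which is why the Go field ChannelName on the client request ends up as channel inside properties.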
With our types defined, let's implement the agent routes for inviting and removing agents from conversations.
Create the convoai-service.go file:
touch convoai/convoai-service.go
Start by importing gin and our token_service package, because we'll need to generate tokens for the agent. Then we'll register and set up the agent routes. These route functions validate the request before passing it to their respective handlers.
package convoai

import (
	"net/http"

	"github.com/AgoraIO-Community/convo-ai-go-server/token_service"
	"github.com/gin-gonic/gin"
)

// ConvoAIService handles AI conversation functionality
type ConvoAIService struct {
	config       *ConvoAIConfig
	tokenService *token_service.TokenService
}

// NewConvoAIService creates a new ConvoAIService instance
func NewConvoAIService(config *ConvoAIConfig, tokenService *token_service.TokenService) *ConvoAIService {
	return &ConvoAIService{
		config:       config,
		tokenService: tokenService,
	}
}

// Register the ConvoAI service routes
func (s *ConvoAIService) RegisterRoutes(router *gin.Engine) {
	agent := router.Group("/agent")
	agent.POST("/invite", s.InviteAgent)
	agent.POST("/remove", s.RemoveAgent)
}

// InviteAgent handles the agent invitation request
func (s *ConvoAIService) InviteAgent(c *gin.Context) {
	var req InviteAgentRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Validate the request
	if err := s.validateInviteRequest(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Call the handler
	response, err := s.HandleInviteAgent(req)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

	c.JSON(http.StatusOK, response)
}

// RemoveAgent handles the agent removal request
func (s *ConvoAIService) RemoveAgent(c *gin.Context) {
	var req RemoveAgentRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Validate the request
	if err := s.validateRemoveRequest(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Call the handler
	response, err := s.HandleRemoveAgent(req)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

	c.JSON(http.StatusOK, response)
}
Next, we'll implement the invite handler, which needs to handle several key tasks:
Create the file convoai_handler_invite.go :
touch convoai/convoai_handler_invite.go
Add the following content:
package convoai

import (
	"bytes"
	"crypto/rand"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"

	"github.com/AgoraIO-Community/convo-ai-go-server/token_service"
)

// HandleInviteAgent processes the agent invitation request
func (s *ConvoAIService) HandleInviteAgent(req InviteAgentRequest) (*InviteAgentResponse, error) {
	// Generate token for the agent
	tokenReq := token_service.TokenRequest{
		TokenType: "rtc",
		Channel:   req.ChannelName,
		Uid:       "0",
		RtcRole:   "publisher",
	}
	token, err := s.tokenService.GenRtcToken(tokenReq)
	if err != nil {
		return nil, fmt.Errorf("failed to generate token: %v", err)
	}

	// Get TTS config based on vendor
	ttsConfig, err := s.getTTSConfig()
	if err != nil {
		return nil, fmt.Errorf("failed to get TTS config: %v", err)
	}

	// Set up system message for AI behavior
	systemMessage := SystemMessage{
		Role:    "system",
		Content: "You are a helpful assistant. Pretend that the text input is audio, and you are responding to it. Speak fast, clearly, and concisely.",
	}

	// Set default modalities if not provided
	inputModalities := req.InputModalities
	if len(inputModalities) == 0 {
		inputModalities = []string{"text"}
	}
	outputModalities := req.OutputModalities
	if len(outputModalities) == 0 {
		outputModalities = []string{"text", "audio"}
	}

	// Build the request body for Agora Conversation AI service
	agoraReq := AgoraStartRequest{
		Name: fmt.Sprintf("agent-%d-%s", time.Now().UnixNano(), randomString(6)),
		Properties: Properties{
			Channel:         req.ChannelName,
			Token:           token,
			AgentRtcUID:     s.config.AgentUID,
			RemoteRtcUIDs:   getRemoteRtcUIDs(req.RequesterID),
			EnableStringUID: isStringUID(req.RequesterID),
			IdleTimeout:     30,
			ASR: ASR{
				Language: "en-US",
				Task:     "conversation",
			},
			LLM: LLM{
				URL:             s.config.LLMURL,
				APIKey:          s.config.LLMToken,
				SystemMessages:  []SystemMessage{systemMessage},
				GreetingMessage: "Hello! How can I assist you today?",
				FailureMessage:  "Please wait a moment.",
				MaxHistory:      10,
				Params: LLMParams{
					Model:       s.config.LLMModel,
					MaxTokens:   1024,
					Temperature: 0.7,
					TopP:        0.95,
				},
				InputModalities:  inputModalities,
				OutputModalities: outputModalities,
			},
			TTS: *ttsConfig,
			VAD: VAD{
				SilenceDurationMS:   480,
				SpeechDurationMS:    15000,
				Threshold:           0.5,
				InterruptDurationMS: 160,
				PrefixPaddingMS:     300,
			},
			AdvancedFeatures: Features{
				EnableAIVAD: false,
				EnableBHVS:  false,
			},
		},
	}

	// Debug logging
	prettyJSON, _ := json.MarshalIndent(agoraReq, "", " ")
	fmt.Printf("Sending request to start agent: %s\n", string(prettyJSON))

	// Convert request to JSON
	jsonData, err := json.Marshal(agoraReq)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal request: %v", err)
	}

	// Create the HTTP request
	url := fmt.Sprintf("%s/%s/join", s.config.BaseURL, s.config.AppID)
	fmt.Printf("URL: %s\n", url)
	httpReq, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %v", err)
	}

	// Add headers
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Authorization", s.getBasicAuth())

	// Send the request using a client with a timeout
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(httpReq)
	if err != nil {
		return nil, fmt.Errorf("failed to send request: %v (URL: %s)", err, url)
	}
	defer resp.Body.Close()

	// Handle response. The request headers are left out of the error message
	// because they carry the Basic auth credentials and this error is
	// returned to the client.
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("failed to start conversation: status=%d, body=%s, url=%s",
			resp.StatusCode, string(body), url)
	}

	// Parse the response
	var agoraResp map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&agoraResp); err != nil {
		return nil, fmt.Errorf("failed to decode response: %v", err)
	}

	// Read the agent ID, guarding against a missing or non-string value
	// so a malformed response cannot panic the handler
	agentID, ok := agoraResp["agent_id"].(string)
	if !ok {
		return nil, fmt.Errorf("unexpected response: agent_id is missing or not a string")
	}

	// Create the response
	response := &InviteAgentResponse{
		AgentID:  agentID,
		CreateTS: time.Now().Unix(),
		Status:   "RUNNING",
	}
	return response, nil
}

// getRemoteRtcUIDs returns the appropriate RemoteRtcUIDs array based on the requesterID
func getRemoteRtcUIDs(requesterID string) []string {
	return []string{requesterID}
}

// randomString returns n random lowercase letters, used to make agent names unique
func randomString(n int) string {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	b := make([]byte, n)
	rand.Read(b)
	for i := range b {
		b[i] = letters[int(b[i])%len(letters)]
	}
	return string(b)
}
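The invite handler reads agent_id out of a generic map. One alternative is to decode the response into a typed struct, so a missing or mistyped field becomes an ordinary error. The sketch below assumes the response body carries agent_id, create_ts, and status, mirroring the JSON tags on InviteAgentResponse; joinResponse and decodeJoinResponse are hypothetical names, and the real payload should be checked against Agora's API reference.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// joinResponse is a hypothetical typed view of the join response.
// The field names are an assumption based on InviteAgentResponse's JSON tags.
type joinResponse struct {
	AgentID  string `json:"agent_id"`
	CreateTS int64  `json:"create_ts"`
	Status   string `json:"status"`
}

// decodeJoinResponse parses body and rejects responses without an agent ID.
func decodeJoinResponse(body string) (*joinResponse, error) {
	var resp joinResponse
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&resp); err != nil {
		return nil, fmt.Errorf("failed to decode response: %w", err)
	}
	if resp.AgentID == "" {
		return nil, errors.New("response did not include an agent_id")
	}
	return &resp, nil
}

func main() {
	resp, err := decodeJoinResponse(`{"agent_id":"abc123","create_ts":1700000000,"status":"RUNNING"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resp.AgentID, resp.Status)
}
```

With a typed struct the handler could also pass along the status and timestamp reported by the service instead of filling them in locally.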
After the agent joins the conversation, we need a way to remove it. This is where the remove handler comes in. It takes the agentID and sends a request to Agora's Conversational AI Engine to remove the agent from the channel.
Create the file convoai_handler_remove.go :
touch convoai/convoai_handler_remove.go
Add the following:
package convoai

import (
	"fmt"
	"net/http"
	"time"
)

// HandleRemoveAgent processes the agent removal request
func (s *ConvoAIService) HandleRemoveAgent(req RemoveAgentRequest) (*RemoveAgentResponse, error) {
	// Create the HTTP request
	url := fmt.Sprintf("%s/%s/agents/%s/leave", s.config.BaseURL, s.config.AppID, req.AgentID)
	httpReq, err := http.NewRequest("POST", url, nil)
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %v", err)
	}

	// Add headers
	auth := s.getBasicAuth()
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Authorization", auth)

	// Send the request using a client with a timeout
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(httpReq)
	if err != nil {
		return nil, fmt.Errorf("failed to send request: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("failed to remove agent: %d", resp.StatusCode)
	}

	// Return success response
	response := &RemoveAgentResponse{
		Success: true,
		AgentID: req.AgentID,
	}
	return response, nil
}
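If you want to exercise the leave call without touching Agora, one option is to point it at a local stand-in server. The sketch below uses only the standard library; leaveAgent and newFakeAgora are hypothetical helpers written for this example, and the fake server checks nothing beyond the method and path.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// leaveAgent posts to {baseURL}/{appID}/agents/{agentID}/leave, the same URL
// shape the remove handler builds, and treats any non-200 status as an error.
func leaveAgent(baseURL, appID, agentID, authHeader string) error {
	url := fmt.Sprintf("%s/%s/agents/%s/leave", baseURL, appID, agentID)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}
	req.Header.Set("Authorization", authHeader)

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to send request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("failed to remove agent: %d", resp.StatusCode)
	}
	return nil
}

// newFakeAgora starts a local test server that answers 200 only for a POST to
// wantPath and 404 for everything else. It is a stand-in, not Agora's API.
func newFakeAgora(wantPath string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != wantPath {
			http.NotFound(w, r)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newFakeAgora("/my-app-id/agents/agent-123/leave")
	defer srv.Close()

	err := leaveAgent(srv.URL, "my-app-id", "agent-123", "Basic placeholder")
	fmt.Println("known agent error:", err)

	err = leaveAgent(srv.URL, "my-app-id", "unknown", "Basic placeholder")
	fmt.Println("unknown agent error:", err)
}
```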
In both the invite and remove routes, we need to send a Basic Authorization header with our requests, so we'll set up a utility function to handle this.
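As a quick illustration of what that header contains, the sketch below builds it by hand and compares it with the standard library's SetBasicAuth. The function names basicAuthHeader and stdlibBasicAuth are hypothetical, and "user" and "pass" are placeholder credentials standing in for the customer ID and secret.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// basicAuthHeader joins the ID and secret with a colon, base64-encodes the
// result, and prefixes it with "Basic ".
func basicAuthHeader(customerID, customerSecret string) string {
	credentials := customerID + ":" + customerSecret
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(credentials))
}

// stdlibBasicAuth produces the same header through net/http's SetBasicAuth.
// The URL is a placeholder; no request is actually sent.
func stdlibBasicAuth(customerID, customerSecret string) string {
	req, err := http.NewRequest(http.MethodPost, "https://example.invalid/join", nil)
	if err != nil {
		return ""
	}
	req.SetBasicAuth(customerID, customerSecret)
	return req.Header.Get("Authorization")
}

func main() {
	manual := basicAuthHeader("user", "pass")
	fmt.Println(manual)
	fmt.Println(manual == stdlibBasicAuth("user", "pass"))
}
```

Either approach yields the same value, so calling SetBasicAuth on the request is a reasonable substitute for a hand-rolled helper.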
Another utility we need to build is getTTSConfig. I should call out that you would normally have a single TTS config. For demo purposes, I've built it this way to show how to implement the configs for all TTS vendors supported by Agora's Convo AI Engine.
Create the file convoai-utils.go:
touch convoai/convoai-utils.go
Add the following content:
package convoai

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
)

func (s *ConvoAIService) getBasicAuth() string {
	auth := fmt.Sprintf("%s:%s", s.config.CustomerID, s.config.CustomerSecret)
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(auth))
}

// Helper function to check if the string is purely numeric (false) or contains any non-digit characters (true)
func isStringUID(s string) bool {
	for _, r := range s {
		if r < '0' || r > '9' {
			return true // Contains non-digit character
		}
	}
	return false // Contains only digits
}

// getTTSConfig returns the appropriate TTS configuration based on the configured vendor
func (s *ConvoAIService) getTTSConfig() (*TTSConfig, error) {
	switch s.config.TTSVendor {
	case string(TTSVendorMicrosoft):
		if s.config.MicrosoftTTS == nil ||
			s.config.MicrosoftTTS.Key == "" ||
			s.config.MicrosoftTTS.Region == "" ||
			s.config.MicrosoftTTS.VoiceName == "" ||
			s.config.MicrosoftTTS.Rate == "" ||
			s.config.MicrosoftTTS.Volume == "" {
			return nil, fmt.Errorf("missing Microsoft TTS configuration")
		}

		// Convert rate and volume from string to float64
		rate, err := strconv.ParseFloat(s.config.MicrosoftTTS.Rate, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid rate value: %v", err)
		}
		volume, err := strconv.ParseFloat(s.config.MicrosoftTTS.Volume, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid volume value: %v", err)
		}

		return &TTSConfig{
			Vendor: TTSVendorMicrosoft,
			Params: map[string]interface{}{
				"key":        s.config.MicrosoftTTS.Key,
				"region":     s.config.MicrosoftTTS.Region,
				"voice_name": s.config.MicrosoftTTS.VoiceName,
				"rate":       rate,
				"volume":     volume,
			},
		}, nil

	case string(TTSVendorElevenLabs):
		if s.config.ElevenLabsTTS == nil ||
			s.config.ElevenLabsTTS.Key == "" ||
			s.config.ElevenLabsTTS.ModelID == "" ||
			s.config.ElevenLabsTTS.VoiceID == "" {
			return nil, fmt.Errorf("missing ElevenLabs TTS configuration")
		}

		return &TTSConfig{
			Vendor: TTSVendorElevenLabs,
			Params: map[string]interface{}{
				"api_key":  s.config.ElevenLabsTTS.Key,
				"model_id": s.config.ElevenLabsTTS.ModelID,
				"voice_id": s.config.ElevenLabsTTS.VoiceID,
			},
		}, nil

	default:
		return nil, fmt.Errorf("unsupported TTS vendor: %s", s.config.TTSVendor)
	}
}

// validateInviteRequest validates the invite agent request
func (s *ConvoAIService) validateInviteRequest(req *InviteAgentRequest) error {
	if req.RequesterID == "" {
		return errors.New("requester_id is required")
	}
	if req.ChannelName == "" {
		return errors.New("channel_name is required")
	}

	// Validate channel_name length
	if len(req.ChannelName) < 3 || len(req.ChannelName) > 64 {
		return errors.New("channel_name length must be between 3 and 64 characters")
	}
	return nil
}

// validateRemoveRequest validates the remove agent request
func (s *ConvoAIService) validateRemoveRequest(req *RemoveAgentRequest) error {
	if req.AgentID == "" {
		return errors.New("agent_id is required")
	}
	return nil
}
To handle all header-related logic, create the httpHeaders.go file:
touch http_headers/httpHeaders.go
Add the following content:
package http_headers

import (
	"net/http"
	"strings"
	"time"

	"github.com/gin-gonic/gin"
)

// HttpHeaders holds configurations for handling requests, such as CORS settings.
type HttpHeaders struct {
	AllowOrigin string // List of origins allowed to access the resources.
}

// NewHttpHeaders initializes and returns a new Middleware object with specified CORS settings.
func NewHttpHeaders(allowOrigin string) *HttpHeaders {
	return &HttpHeaders{AllowOrigin: allowOrigin}
}

// NoCache sets HTTP headers to prevent client-side caching of responses.
func (m *HttpHeaders) NoCache() gin.HandlerFunc {
	return func(c *gin.Context) {
		// Set multiple cache-related headers to ensure responses are not cached.
		c.Header("Cache-Control", "private, no-cache, no-store, must-revalidate")
		c.Header("Expires", "-1")
		c.Header("Pragma", "no-cache")
	}
}

// CORShttpHeaders adds CORS (Cross-Origin Resource Sharing) headers to responses and handles pre-flight requests.
// It allows web applications at different domains to interact more securely.
func (m *HttpHeaders) CORShttpHeaders() gin.HandlerFunc {
	return func(c *gin.Context) {
		origin := c.Request.Header.Get("Origin")

		// Check if the origin of the request is allowed to access the resource.
		if !m.isOriginAllowed(origin) {
			// If not allowed, return a JSON error and abort the request.
			c.Header("Content-Type", "application/json")
			c.JSON(http.StatusForbidden, gin.H{
				"error": "Origin not allowed",
			})
			c.Abort()
			return
		}

		// Set CORS headers to allow requests from the specified origin.
		c.Header("Access-Control-Allow-Origin", origin)
		c.Header("Access-Control-Allow-Methods", "GET, POST, DELETE, PATCH, OPTIONS")
		c.Header("Access-Control-Allow-Headers", "Origin, Content-Type")

		// Handle pre-flight OPTIONS requests.
		if c.Request.Method == "OPTIONS" {
			c.AbortWithStatus(http.StatusNoContent)
			return
		}

		c.Next()
	}
}

// isOriginAllowed checks whether the provided origin is in the list of allowed origins.
func (m *HttpHeaders) isOriginAllowed(origin string) bool {
	if m.AllowOrigin == "*" {
		// Allow any origin if the configured setting is "*".
		return true
	}
	allowedOrigins := strings.Split(m.AllowOrigin, ",")
	for _, allowed := range allowedOrigins {
		if origin == allowed {
			return true
		}
	}
	return false
}

// Timestamp adds a timestamp header to responses.
// This can be useful for debugging and logging purposes to track when a response was generated.
func (m *HttpHeaders) Timestamp() gin.HandlerFunc {
	return func(c *gin.Context) {
		c.Next() // Proceed to the next middleware/handler.

		// Add the current timestamp to the response header after handling the request.
		timestamp := time.Now().Format(time.RFC3339)
		c.Writer.Header().Set("X-Timestamp", timestamp)
	}
}
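One detail worth knowing about the origin check: it compares each comma-separated entry exactly, so a value like "http://localhost:3000, https://example.com" (with a space after the comma) would not match the second origin. Below is a minimal sketch of a more forgiving variant that trims whitespace. It is a standalone function written for this example, not the method from the file above.

```go
package main

import (
	"fmt"
	"strings"
)

// isOriginAllowed reports whether origin appears in the comma-separated
// allowOrigin list, ignoring whitespace around each entry. "*" allows all.
func isOriginAllowed(allowOrigin, origin string) bool {
	if strings.TrimSpace(allowOrigin) == "*" {
		return true
	}
	for _, allowed := range strings.Split(allowOrigin, ",") {
		if strings.TrimSpace(allowed) == origin {
			return true
		}
	}
	return false
}

func main() {
	list := "http://localhost:3000, https://example.com"
	fmt.Println(isOriginAllowed(list, "https://example.com"))
	fmt.Println(isOriginAllowed(list, "https://other.example"))
	fmt.Println(isOriginAllowed("*", "https://other.example"))
}
```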
Let's update our main.go file to add our headers and register the convoai-service.
Open main.go and add:
import (
	// Previous imports remain the same
	"github.com/AgoraIO-Community/convo-ai-go-server/convoai"
	"github.com/AgoraIO-Community/convo-ai-go-server/http_headers"
	"github.com/AgoraIO-Community/convo-ai-go-server/token_service"
)

// Previous code remains the same..

func setupServer() *http.Server {
	// Previous code remains the same..

	// Set up router with headers
	router := gin.Default()

	// Replace headers TODO:
	var httpHeaders = http_headers.NewHttpHeaders(corsAllowOrigin)
	router.Use(httpHeaders.NoCache())
	router.Use(httpHeaders.CORShttpHeaders())
	router.Use(httpHeaders.Timestamp())

	// Initialize services & register routes
	tokenService := token_service.NewTokenService(config.AppID, config.AppCertificate)
	tokenService.RegisterRoutes(router)

	convoAIService := convoai.NewConvoAIService(config, tokenService)
	convoAIService.RegisterRoutes(router)

	// Rest of the code remains the same...
By now, you've noticed that we added a token service that doesn't exist. Ignore the error for now because in the next step, we'll implement the token service, which will make it easier to test and integrate with front-end applications.
The convoai-service uses a token service. You could tie this to your auth service and have it generate the tokens, but for this guide we'll implement a token service that serves both the convoai-service and, if needed, our client apps.
Explaining this code is a bit outside the scope of this guide, but if you are new to tokens, I would recommend checking out my guide Building a Token Server for Agora Applications using Golang.
Create the token service and handler files:
touch token_service/token-service.go
touch token_service/token_handlers.go
First, add the token service definition in token-service.go:
package token_service

import (
	"encoding/json"
	"net/http"
	"os"

	"github.com/gin-gonic/gin"
)

// TokenService represents the main application token service.
type TokenService struct {
	Server         *http.Server   // The HTTP server for the application
	Sigint         chan os.Signal // Channel to handle OS signals, such as Ctrl+C
	appID          string         // The Agora app ID
	appCertificate string         // The Agora app certificate
}

// TokenRequest is a struct representing the JSON payload structure for token generation requests.
type TokenRequest struct {
	TokenType         string `json:"tokenType"`         // The token type: "rtc", "rtm", or "chat"
	Channel           string `json:"channel,omitempty"` // The channel name (used for RTC and RTM tokens)
	RtcRole           string `json:"role,omitempty"`    // The role of the user for RTC tokens (publisher or subscriber)
	Uid               string `json:"uid,omitempty"`     // The user ID or account (used for RTC, RTM, and some chat tokens)
	ExpirationSeconds int    `json:"expire,omitempty"`  // The token expiration time in seconds (used for all token types)
}

// NewTokenService initializes and returns a TokenService pointer with all configurations set.
func NewTokenService(appIDEnv string, appCertEnv string) *TokenService {
	return &TokenService{
		appID:          appIDEnv,
		appCertificate: appCertEnv,
	}
}

// RegisterRoutes registers the routes for the TokenService.
func (s *TokenService) RegisterRoutes(r *gin.Engine) {
	api := r.Group("/token")
	api.POST("/getNew", s.GetToken)
}

// GetToken handles the HTTP request to generate a token based on the provided TokenRequest.
func (s *TokenService) GetToken(c *gin.Context) {
	var req = c.Request
	var respWriter = c.Writer
	var tokenReq TokenRequest

	// Parse the request body into a TokenRequest struct
	err := json.NewDecoder(req.Body).Decode(&tokenReq)
	if err != nil {
		http.Error(respWriter, err.Error(), http.StatusBadRequest)
		return
	}

	s.HandleGetToken(tokenReq, respWriter)
}
Next, add the token handlers in token_handlers.go:
package token_service

import (
	"encoding/json"
	"errors"
	"net/http"
	"strconv"

	"github.com/AgoraIO-Community/go-tokenbuilder/chatTokenBuilder"
	rtctokenbuilder2 "github.com/AgoraIO-Community/go-tokenbuilder/rtctokenbuilder"
	rtmtokenbuilder2 "github.com/AgoraIO-Community/go-tokenbuilder/rtmtokenbuilder"
)

// HandleGetToken handles the HTTP request to generate a token based on the provided tokenType.
func (s *TokenService) HandleGetToken(tokenReq TokenRequest, w http.ResponseWriter) {
	var token string
	var tokenErr error

	switch tokenReq.TokenType {
	case "rtc":
		token, tokenErr = s.GenRtcToken(tokenReq)
	case "rtm":
		token, tokenErr = s.GenRtmToken(tokenReq)
	case "chat":
		token, tokenErr = s.GenChatToken(tokenReq)
	default:
		http.Error(w, "Unsupported tokenType", http.StatusBadRequest)
		return
	}

	if tokenErr != nil {
		http.Error(w, tokenErr.Error(), http.StatusBadRequest)
		return
	}

	response := struct {
		Token string `json:"token"`
	}{Token: token}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(response)
}

// GenRtcToken generates an RTC token based on the provided TokenRequest and returns it.
func (s *TokenService) GenRtcToken(tokenRequest TokenRequest) (string, error) {
	if tokenRequest.Channel == "" {
		return "", errors.New("invalid: missing channel name")
	}
	if tokenRequest.Uid == "" {
		return "", errors.New("invalid: missing user ID or account")
	}

	var userRole rtctokenbuilder2.Role
	if tokenRequest.RtcRole == "publisher" {
		userRole = rtctokenbuilder2.RolePublisher
	} else {
		userRole = rtctokenbuilder2.RoleSubscriber
	}

	if tokenRequest.ExpirationSeconds == 0 {
		tokenRequest.ExpirationSeconds = 3600
	}

	uid64, parseErr := strconv.ParseUint(tokenRequest.Uid, 10, 64)
	if parseErr != nil {
		return rtctokenbuilder2.BuildTokenWithAccount(
			s.appID, s.appCertificate, tokenRequest.Channel,
			tokenRequest.Uid, userRole, uint32(tokenRequest.ExpirationSeconds),
		)
	}

	return rtctokenbuilder2.BuildTokenWithUid(
		s.appID, s.appCertificate, tokenRequest.Channel,
		uint32(uid64), userRole, uint32(tokenRequest.ExpirationSeconds),
	)
}

// GenRtmToken generates an RTM (Real-Time Messaging) token based on the provided TokenRequest and returns it.
func (s *TokenService) GenRtmToken(tokenRequest TokenRequest) (string, error) {
	if tokenRequest.Uid == "" {
		return "", errors.New("invalid: missing user ID or account")
	}
	if tokenRequest.ExpirationSeconds == 0 {
		tokenRequest.ExpirationSeconds = 3600
	}

	return rtmtokenbuilder2.BuildToken(
		s.appID, s.appCertificate,
		tokenRequest.Uid,
		uint32(tokenRequest.ExpirationSeconds),
		tokenRequest.Channel,
	)
}

// GenChatToken generates a chat token based on the provided TokenRequest and returns it.
func (s *TokenService) GenChatToken(tokenRequest TokenRequest) (string, error) {
	if tokenRequest.ExpirationSeconds == 0 {
		tokenRequest.ExpirationSeconds = 3600
	}

	var chatToken string
	var tokenErr error

	if tokenRequest.Uid == "" {
		chatToken, tokenErr = chatTokenBuilder.BuildChatAppToken(
			s.appID, s.appCertificate, uint32(tokenRequest.ExpirationSeconds),
		)
	} else {
		chatToken, tokenErr = chatTokenBuilder.BuildChatUserToken(
			s.appID, s.appCertificate,
			tokenRequest.Uid,
			uint32(tokenRequest.ExpirationSeconds),
		)
	}

	return chatToken, tokenErr
}
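GenRtcToken decides between a numeric UID and a string account by trying to parse the uid field. Because it parses with a 64-bit size and then converts to uint32, a numeric value above 4294967295 would be silently truncated. The sketch below shows one way to make that boundary explicit by parsing with a 32-bit size. parseRtcUID is a hypothetical helper for illustration; it uses only strconv and does not call the token builder.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseRtcUID returns the numeric UID and true when uid is a base-10 number
// that fits in 32 bits. Anything else (letters, out-of-range values, empty
// strings) returns false, signalling that uid should be treated as an account.
func parseRtcUID(uid string) (uint32, bool) {
	n, err := strconv.ParseUint(uid, 10, 32)
	if err != nil {
		return 0, false
	}
	return uint32(n), true
}

func main() {
	for _, uid := range []string{"1234", "user-42", "4294967296"} {
		n, numeric := parseRtcUID(uid)
		fmt.Printf("%q -> uid=%d numeric=%t\n", uid, n, numeric)
	}
}
```

Whether an out-of-range number should fall back to an account token or be rejected outright is a design choice; the sketch only reports which case applies.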
With the token generation in place, let's add some validation middleware to ensure our API is robust and secure.
Create a validation utility to check that all required environment variables are set. Create the file validation/validation.go :
touch validation/validation.go
Add the following content:
package validation

import (
	"errors"
	"strings"

	"github.com/AgoraIO-Community/convo-ai-go-server/convoai"
)

// ValidateEnvironment checks if all required environment variables are set
func ValidateEnvironment(config *convoai.ConvoAIConfig) error {
	// Validate Agora Configuration
	if config.AppID == "" || config.AppCertificate == "" {
		return errors.New("config error: Agora credentials (APP_ID, APP_CERTIFICATE) are not set")
	}
	if config.CustomerID == "" || config.CustomerSecret == "" || config.BaseURL == "" {
		return errors.New("config error: Agora Conversation AI credentials (CUSTOMER_ID, CUSTOMER_SECRET, BASE_URL) are not set")
	}

	// Validate LLM Configuration
	if config.LLMURL == "" || config.LLMToken == "" {
		return errors.New("config error: LLM configuration (LLM_URL, LLM_TOKEN) is not set")
	}

	// Validate TTS Configuration
	if config.TTSVendor == "" {
		return errors.New("config error: TTS_VENDOR is not set")
	}
	if err := validateTTSConfig(config); err != nil {
		return err
	}

	// Validate Modalities (optional, using defaults if not set)
	if config.InputModalities != "" && !validateModalities(config.InputModalities) {
		return errors.New("config error: Invalid INPUT_MODALITIES format")
	}
	if config.OutputModalities != "" && !validateModalities(config.OutputModalities) {
		return errors.New("config error: Invalid OUTPUT_MODALITIES format")
	}

	return nil
}

// Validates the TTS configuration based on the vendor
func validateTTSConfig(config *convoai.ConvoAIConfig) error {
	switch config.TTSVendor {
	case "microsoft":
		if config.MicrosoftTTS == nil {
			return errors.New("config error: Microsoft TTS configuration is missing")
		}
		if config.MicrosoftTTS.Key == "" ||
			config.MicrosoftTTS.Region == "" ||
			config.MicrosoftTTS.VoiceName == "" {
			return errors.New("config error: Microsoft TTS configuration is incomplete")
		}
	case "elevenlabs":
		if config.ElevenLabsTTS == nil {
			return errors.New("config error: ElevenLabs TTS configuration is missing")
		}
		if config.ElevenLabsTTS.Key == "" ||
			config.ElevenLabsTTS.VoiceID == "" ||
			config.ElevenLabsTTS.ModelID == "" {
			return errors.New("config error: ElevenLabs TTS configuration is incomplete")
		}
	default:
		return errors.New("config error: Unsupported TTS vendor: " + config.TTSVendor)
	}
	return nil
}

// Checks if the modalities string is properly formatted
func validateModalities(modalities string) bool {
	// map of valid modalities
	validModalities := map[string]bool{
		"text":  true,
		"audio": true,
	}

	// split the modalities string and check if each modality is valid
	for _, modality := range strings.Split(modalities, ",") {
		if !validModalities[strings.TrimSpace(modality)] {
			return false
		}
	}
	return true
}
This validation utility ensures that all required environment variables are properly set before the server starts.
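To see how the modalities check behaves on a few inputs, here is a minimal, standalone sketch. It copies validateModalities into a throwaway main package purely for illustration; the sample inputs are my own and not from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// validateModalities is a copy of the helper from the validation package,
// repeated here only so this example can run on its own.
func validateModalities(modalities string) bool {
	valid := map[string]bool{"text": true, "audio": true}
	for _, m := range strings.Split(modalities, ",") {
		if !valid[strings.TrimSpace(m)] {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative inputs (assumptions, not values from the project).
	for _, in := range []string{"text", "text, audio", "text,video", "text,", ""} {
		fmt.Printf("%q -> %v\n", in, validateModalities(in))
	}
}
```

Note that the empty string is rejected, because splitting it yields a single empty entry. That is why ValidateEnvironment only calls the helper when the variable is non-empty, and why a trailing comma such as "text," also fails.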
Open main.go and update the setupServer function to use the validation utility:
// Add "github.com/AgoraIO-Community/convo-ai-go-server/validation" to the import block.

// Just below: Load configuration
// Replace the TODO: comment with the following:

// Validate environment configuration
if err := validation.ValidateEnvironment(config); err != nil {
	log.Fatal("FATAL ERROR: ", err)
}

// Rest of the code remains the same...
Now that we have all the components in place, let's run the server. First, make sure you have set up the .env file with all the necessary credentials. The server will automatically load these environment variables at startup.
Build and run the server:
go build -o server
./server
If you've set up everything correctly, you should see the server starting up and listening on the configured port (default is 8080).
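The port fallback can be pictured with a small sketch like the one below. The helper name resolvePort is hypothetical and the project's setupServer may read the PORT variable differently; this only illustrates the "use PORT if set, otherwise 8080" behavior described above.

```go
package main

import (
	"fmt"
	"os"
)

// resolvePort is a hypothetical helper (not part of the project): it returns
// the configured port, falling back to 8080 when the value is empty.
func resolvePort(value string) string {
	if value == "" {
		return "8080"
	}
	return value
}

func main() {
	// PORT comes from the environment (or the .env file once it is loaded).
	addr := ":" + resolvePort(os.Getenv("PORT"))
	fmt.Println("server would listen on", addr)
}
```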
Before testing the endpoints, make sure you have a client-side app running. You can use any application that implements Agora's video SDK (web, mobile, or desktop). If you don't have an app, you can use Agora's Voice Demo; just make sure to request a token before joining the channel.
Let's test our API endpoints using curl:
curl -X POST http://localhost:8080/token/getNew \
  -H "Content-Type: application/json" \
  -d '{
    "tokenType": "rtc",
    "channel": "test-channel",
    "uid": "1234",
    "role": "publisher"
  }'
Expected response:
{ "token": "007eJxTYBAxNdgrlvnEfm3o..." }
curl -X POST http://localhost:8080/agent/invite \
  -H "Content-Type: application/json" \
  -d '{
    "requester_id": "1234",
    "channel_name": "test-channel",
    "input_modalities": ["text"],
    "output_modalities": ["text", "audio"]
  }'
Expected response:
{ "agent_id": "agent-123abc", "create_ts": 1665481725000, "status": "RUNNING" }
curl -X POST http://localhost:8080/agent/remove \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent-123abc"
  }'
Expected response:
{ "success": true, "agent_id": "agent-123abc" }
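If you would rather script these checks in Go than curl, the invite call can be sketched as follows. The struct names, the inviteAgent helper, and the assumption that the server answers with HTTP 200 are mine; only the JSON field names and sample values are taken from the curl examples above. To keep the example runnable without the real server, it talks to a local stub that returns the sample response.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// InviteRequest and InviteResponse mirror the JSON in the curl examples.
// The Go type names are illustrative, not the project's own types.
type InviteRequest struct {
	RequesterID      string   `json:"requester_id"`
	ChannelName      string   `json:"channel_name"`
	InputModalities  []string `json:"input_modalities"`
	OutputModalities []string `json:"output_modalities"`
}

type InviteResponse struct {
	AgentID  string `json:"agent_id"`
	CreateTS int64  `json:"create_ts"`
	Status   string `json:"status"`
}

// inviteAgent posts the request to baseURL + "/agent/invite" and decodes the reply.
func inviteAgent(baseURL string, req InviteRequest) (InviteResponse, error) {
	var out InviteResponse
	body, err := json.Marshal(req)
	if err != nil {
		return out, err
	}
	resp, err := http.Post(baseURL+"/agent/invite", "application/json", bytes.NewReader(body))
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	// Assumption: a successful invite returns 200 OK.
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

// inviteAgainstStub runs inviteAgent against a local stub that replies with
// the sample response, so the example works without a running server.
func inviteAgainstStub() (InviteResponse, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"agent_id":"agent-123abc","create_ts":1665481725000,"status":"RUNNING"}`)
	}))
	defer stub.Close()

	return inviteAgent(stub.URL, InviteRequest{
		RequesterID:      "1234",
		ChannelName:      "test-channel",
		InputModalities:  []string{"text"},
		OutputModalities: []string{"text", "audio"},
	})
}

func main() {
	res, err := inviteAgainstStub()
	if err != nil {
		fmt.Println("invite failed:", err)
		return
	}
	fmt.Println(res.AgentID, res.Status)
}
```

To hit the real server instead of the stub, pass http://localhost:8080 as baseURL to inviteAgent.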
Agora Conversational AI Engine supports several customizations.
In the convoai_handler_invite.go file, you can modify the system message to customize the agent's behavior:
systemMessage := SystemMessage{
	Role:    "system",
	Content: "You are a technical support specialist named Alex. Your responses should be friendly but concise, focused on helping users solve their technical problems. Use simple language but don't oversimplify technical concepts.",
}
You can also update the greeting message to control the initial message the agent speaks when joining the channel:
LLM: LLM{
	// ... other configurations
	GreetingMessage: "Hello! I'm Alex, your technical support specialist. How can I assist you today?",
	FailureMessage:  "I'm processing your request. Please give me a moment.",
	// ... rest of the configuration
}
Choose the right voice for your application by exploring the voice library of your TTS vendor (Microsoft or ElevenLabs), then update the .env file with the appropriate voice settings.
Adjust VAD settings in convoai_handler_invite.go to optimize conversation flow:
VAD: VAD{
	SilenceDurationMS:   600,   // How long to wait after silence to end turn
	SpeechDurationMS:    10000, // Maximum duration for a single speech segment
	Threshold:           0.6,   // Speech detection sensitivity
	InterruptDurationMS: 200,   // How quickly interruptions are detected
	PrefixPaddingMS:     400,   // Audio padding at the beginning of speech
},
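When experimenting with these values, a quick sanity check can catch typos before a request is sent. The sketch below is hypothetical: the field types, the checkVAD helper, and the bounds it enforces (a threshold between 0 and 1, positive durations) are my assumptions, not limits confirmed by the project or by Agora's API.

```go
package main

import (
	"errors"
	"fmt"
)

// VAD mirrors the field names from the snippet above. The field types are
// assumed for this example and may differ from the project's definition.
type VAD struct {
	SilenceDurationMS   int
	SpeechDurationMS    int
	Threshold           float64
	InterruptDurationMS int
	PrefixPaddingMS     int
}

// checkVAD is a hypothetical sanity check; the bounds are assumptions.
func checkVAD(v VAD) error {
	if v.Threshold < 0 || v.Threshold > 1 {
		return errors.New("vad: threshold must be between 0 and 1")
	}
	if v.SilenceDurationMS <= 0 || v.SpeechDurationMS <= 0 {
		return errors.New("vad: silence and speech durations must be positive")
	}
	if v.InterruptDurationMS < 0 || v.PrefixPaddingMS < 0 {
		return errors.New("vad: interrupt duration and prefix padding cannot be negative")
	}
	return nil
}

func main() {
	v := VAD{
		SilenceDurationMS:   600,
		SpeechDurationMS:    10000,
		Threshold:           0.6,
		InterruptDurationMS: 200,
		PrefixPaddingMS:     400,
	}
	if err := checkVAD(v); err != nil {
		fmt.Println("invalid VAD settings:", err)
		return
	}
	fmt.Println("VAD settings look sane")
}
```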
Here's a complete list of environment variables for your .env file:
# Server Configuration
PORT=8080
CORS_ALLOW_ORIGIN=*

# Agora Configuration
AGORA_APP_ID=your_app_id
AGORA_APP_CERTIFICATE=your_app_certificate
AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects
AGORA_CUSTOMER_ID=your_customer_id
AGORA_CUSTOMER_SECRET=your_customer_secret
AGENT_UID=Agent

# LLM Configuration
LLM_URL=https://api.openai.com/v1/chat/completions
LLM_TOKEN=your_openai_api_key
LLM_MODEL=gpt-4o-mini

# Input/Output Modalities
INPUT_MODALITIES=text
OUTPUT_MODALITIES=text,audio

# TTS Configuration
TTS_VENDOR=microsoft # or elevenlabs

# Microsoft TTS Configuration
MICROSOFT_TTS_KEY=your_microsoft_tts_key
MICROSOFT_TTS_REGION=your_microsoft_tts_region
MICROSOFT_TTS_VOICE_NAME=en-US-GuyNeural
MICROSOFT_TTS_RATE=1.0
MICROSOFT_TTS_VOLUME=100.0

# ElevenLabs TTS Configuration
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
ELEVENLABS_MODEL_ID=eleven_monolingual_v1
Congratulations! You've built a Go server that integrates with Agora's Conversational AI Engine. Take this microservice and integrate it with your existing Agora backends.
For more information about Agora's Conversational AI Engine, check out the official documentation.
For the full source code, check out the GitHub repository.
Happy building!