
Build a Text-to-Speech Chat App with Amazon Polly and ChatEngine

Accessibility is very often an overlooked consideration when developing computer applications. A user interface that works for one person may be completely unusable for a disabled person. Therefore, as designers and developers, the onus is on us to empathize with everyone so that we can bring inclusivity to all our creations.

In this blog post, we dig into this very aspect: how to make apps more accessible. Since chat-based applications continue to grow in adoption, why not bring the accessibility angle to them? Here at PubNub, we already have an extensible chat framework called ChatEngine. Let’s see how we can quickly enable text-to-speech capabilities for chat apps built with the ChatEngine framework, to assist a blind user.

Technology That Assists Blind Users

Due to visual impairment, a blind user is severely constrained in reading incoming chat messages. To assist this user, the most obvious solution is to use a text-to-speech engine that synthesizes speech from the incoming chat messages.

The technology behind text-to-speech is already available. But instead of processing the text locally, it’s better to leverage the cloud so that the system can continuously learn and improve its speech rendering capabilities. Amazon Polly is one such service offered under the AWS Machine Learning umbrella.

One of Amazon Polly’s most valuable capabilities is the ability to stream natural speech, which is a big plus for a realtime application like chat. We’re going to leverage this feature to build a speech-enabled chat client on top of PubNub ChatEngine.

Introducing ChatEngine

If you are familiar with PubNub, you probably know how easy it is to launch your own chat app based on ChatEngine. Just follow the ChatEngine Quickstart Tutorial and you’ll get a default chat app with all the source code.

Speech-enabled Chat App

To enable speech prompts for incoming chat messages, we can add an icon that activates this feature. Something like this.

Notice the icon at the top right of the chat UI. With this icon, we can enable or disable the feature with a simple click. Here is how you would experience the app now.

Note: Clicking an icon is also a constraint for a blind user. As an enhancement, UX designers can consider a keyboard shortcut or some other way of making it more accessible.
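As one way to picture the keyboard-shortcut idea from the note, here is a minimal sketch. The Alt+S key combination, the `speechOn` flag, and the function names are all assumptions made for illustration; they are not part of the original app.

```javascript
// Hypothetical shortcut: Alt+S toggles speech. The key choice is an assumption.
// event.code identifies the physical key, so it stays 'KeyS' even when
// Alt changes the character that the key produces.
function isSpeechShortcut(event) {
    return Boolean(event.altKey) && event.code === 'KeyS';
}

// Illustrative flag standing in for the app's own speech on/off state.
let speechOn = false;

// Flips the flag when the shortcut is pressed and returns the current state.
function handleShortcut(event) {
    if (isSpeechShortcut(event)) {
        speechOn = !speechOn;
        if (typeof event.preventDefault === 'function') {
            event.preventDefault();
        }
    }
    return speechOn;
}

// In the browser this could be wired up with:
// document.addEventListener('keydown', handleShortcut);
```

A real implementation would probably also announce the new state to the user, since a silent toggle is hard to confirm without seeing the icon.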

Tutorial Assets and Technology Overview

The source code of this speech-enabled chat app is available on GitHub. However, most of the code is taken from the default chat app’s source code provided by the ChatEngine framework. With a few modifications, you can easily build the speech synthesis feature on top of the default chat app.

You can clone the repository and follow along to get a sense of the code changes that are required.

Let’s look at the building blocks of this app by exploring the various technology components that work behind the scenes to deliver a functional chat experience. The README file accompanying the repository contains the steps for setting up all the components.

PubNub Functions

ChatEngine leverages PubNub’s serverless infrastructure, called Functions, to spin up the backend for this chat app. And because it’s serverless, it can be brought to life within a few seconds.

When you create a chat app through the ChatEngine framework, PubNub Functions deploys the backend for you. It is instantly available and supports all the standard features required by a chat room application. It’s completely hidden from the user, and no explicit setup or coding is required.

ChatEngine Frontend

The default chat client code is already available from the ChatEngine quick start guide, and we can use that as a base for the modified, speech-enabled chat app. However, we need a few modifications to make this happen.

Chat UI

First, there is a small change in the header portion of the chat UI to accommodate the speech activation button icon.

<div class="chat-header clearfix">
    <img src="" alt="avatar" />
    <div class="chat-about">
        <div class="chat-with">ChatEngine Demo Chat</div>
        <div id="speechButton" class="speech-button"><img src="/speech-icon.png"></div>
    </div>
</div>

And here is the CSS class to style this icon.


.speech-button {
    float: right;
    margin-top: 6px;
    background-color: #d4e2dd;
}

Speech Activation for Chat App

HTML5 <audio> and <video> tags are the standard ways of embedding media controls in web apps. Since the app must be capable of playing the speech equivalent of chat messages, we have used the audio tag.

<audio id="player"></audio>


Now, let’s move to the JavaScript part of the chat app. We first need to check which of the supported audio media formats the browser can play.
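The chat code calls a helper named getSupportedAudioFormats for this check, but its body is not shown in this post. A minimal sketch of what such a helper could look like follows. It assumes a format-to-MIME-type map (the name AUDIO_FORMATS is an assumption) and relies on the standard canPlayType method of media elements, which returns "probably", "maybe", or an empty string.

```javascript
// Assumed map of Polly output formats to MIME types (the name is illustrative).
var AUDIO_FORMATS = {
    'ogg_vorbis': 'audio/ogg',
    'mp3': 'audio/mpeg',
    'pcm': 'audio/wave; codecs=1'
};

// Returns the format names that the given media element reports it can play.
// `player` is expected to expose canPlayType(), as an <audio> element does.
function getSupportedAudioFormats(player) {
    return Object.keys(AUDIO_FORMATS).filter(function (format) {
        var supported = player.canPlayType(AUDIO_FORMATS[format]);
        return supported === 'probably' || supported === 'maybe';
    });
}

// In the browser:
// var formats = getSupportedAudioFormats(document.getElementById('player'));
```

Because the result keeps the order of the map, the first entry can be used as the preferred format when requesting speech.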

The default ChatEngine initialization is now subject to another precondition, which initializes audio support for the browser. Finally, after the ChatEngine initialization, we can hook in the click event for the icon to activate or deactivate the speech feature.

// map of audio formats to MIME types (the variable name is assumed here)
var AUDIO_FORMATS = {
    'ogg_vorbis': 'audio/ogg',
    'mp3': 'audio/mpeg',
    'pcm': 'audio/wave; codecs=1'
};

var supportedFormats;
var player;
var speechEnabled = false;

// this is our main function that starts our chat app
const init = () => {

    // First things first, check the browser's audio capabilities
    player = document.getElementById('player');
    supportedFormats = getSupportedAudioFormats(player);

    if (supportedFormats.length === 0) {
        submit.disabled = true;
        alert('The web browser in use does not support any of the' +
              ' available audio formats. Please try with a different' +
              ' one.');
        // ChatEngine is only initialized when audio is supported
        return;
    }

    // connect to ChatEngine with our generated user
    ChatEngine.connect(newPerson.uuid, newPerson);

    // when ChatEngine is booted, it returns your new User as `data.me`
    ChatEngine.on('$.ready', function(data) {

        // store my new user as `me`
        me = data.me;

        // create a new ChatEngine Chat
        myChat = new ChatEngine.Chat('chatengine-demo-chat');

        // when we receive messages in this chat, render them
        myChat.on('message', (message) => {
            renderMessage(message);
        });

        // when a user comes online, render them in the online list
        myChat.on('$.online.*', (data) => {
            $('#people-list ul').append(peopleTemplate(data.user));
        });

        // when a user goes offline, remove them from the online list
        myChat.on('$.offline.*', (data) => {
            $('#people-list ul').find('#' + data.user.uuid).remove();
        });

        // wait for our chat to be connected to the internet
        myChat.on('$.connected', () => {

            // search for 50 old `message` events
            myChat.search({
                event: 'message',
                limit: 50
            }).on('message', (data) => {

                // when messages are returned, render them like normal messages
                renderMessage(data, true);

            });

        });

        // bind our "send" button and return key to send message
        $('#sendMessage').on('submit', sendMessage);

    });

    // toggle the speech feature when the icon is clicked
    $("#speechButton").click(function() {
        if (speechEnabled) {
            speechEnabled = false;
        } else {
            speechEnabled = true;
        }
    });

};



Speech Synthesis for Chat App

The HTML5 audio tag can stream audio from a URL that returns a chunked HTTP response with a media content type.

Before rendering each chat message, the app checks the speechEnabled flag. If it is enabled, the app makes a request to the streaming server and plays the speech received in response. Here is how the default renderMessage() function of the chat app looks after speech enablement.

// render messages in the list
const renderMessage = (message, isHistory = false) => {

    // use the generic user template by default
    let template = userTemplate;

    // if I happened to send the message, use the special template for myself
    if (message.sender.uuid == me.uuid) {
        template = meTemplate;
    }

    let el = template({
        messageOutput: message.data.text,
        time: getCurrentTime(),
        user: message.sender.state
    });

    // render the message
    if (isHistory) {
        $('.chat-history ul').prepend(el);
    } else {
        // speak only new messages that were sent by someone else
        if (speechEnabled && message.sender.uuid != me.uuid) {
            // the leading part of the URL is not shown in this listing
            player.src = "" +
                encodeURIComponent("Aditi") +
                '&text=' + encodeURIComponent(message.data.text) +
                '&outputFormat=' + supportedFormats[0];
            player.play();
        }
        $('.chat-history ul').append(el);
    }

    // scroll to the bottom of the chat
    scrollToBottom();
};

Streaming Server for Text-to-Speech Conversion

As mentioned earlier, we are using Amazon Polly to enable text-to-speech conversion. You can refer to the sample Python server, which demonstrates how the Polly service is called and how the binary audio stream is returned to the chat client.

The server code is derived from this sample server application. Here is a quick code walkthrough of the main functionality of this server app.

App Hosting

The server hosts the chat app, and URL routes are defined for all the resources used by this app.

PROTOCOL = "http"
ROUTE_INDEX = "/index.html"
ROUTE_VOICES = "/voices"
ROUTE_READ = "/read"
ROUTE_CSS = "/chat.css"
ROUTE_JS = "/chat.js"
ROUTE_IMG = "/speech-icon.png"

def do_GET(self):
    """Handles GET requests"""

    # Extract values from the query string
    path, _, query_string = self.path.partition('?')
    query = parse_qs(query_string)

    response = None

    print(u"[START]: Received GET for %s with query: %s" % (path, query))

    try:
        # Handle the possible request paths
        if path == ROUTE_INDEX or path == ROUTE_CSS or path == ROUTE_JS or path == ROUTE_IMG:
            response = self.route_index(path, query)
        elif path == ROUTE_VOICES:
            response = self.route_voices(path, query)
        elif path == ROUTE_READ:
            response = self.route_read(path, query)
        else:
            response = self.route_not_found(path, query)

        self.send_headers(response.status, response.content_type)
        # Send the response body back to the client
        self.stream_data(response.data_stream)

    except HTTPStatusError as err:
        # Respond with an error and log debug
        # information
        if sys.version_info >= (3, 0):
            self.send_error(err.code, err.message, err.explain)
        else:
            self.send_error(err.code, err.message)

        self.log_error(u"%s %s %s - [%d] %s", self.client_address[0],
                       self.command, self.path, err.code, err.explain)


Amazon Polly Initialization

When a chat client is served, the server also initializes the boto3 AWS Python library, through which we can access the Amazon Polly service.

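# The region is not shown here; it can come from the AWS configuration
# or be passed to the session as region_name.
polly = boto3.Session(
    aws_access_key_id="<AWS USER ACCESS KEY>",
    aws_secret_access_key="<AWS USER SECRET KEY>",
).client("polly")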

A few things need to happen behind the scenes to access the Polly service through the <AWS USER ACCESS KEY> and <AWS USER SECRET KEY> parameters. Refer to this README file section for setting up your AWS account and the prerequisites for accessing the Polly service.

Streaming Request

This is where the real magic happens. The chat app invokes a specific URL endpoint, "/read", to request text-to-speech conversion. This is an HTTP GET call, and the text to be converted is supplied as a parameter. That’s where Amazon Polly kicks in and returns the binary audio stream containing the synthesized speech.

def route_read(self, path, query):
    """Handles routing for reading text (speech synthesis)"""
    # Get the parameters from the query string
    text = self.query_get(query, "text")
    voiceId = self.query_get(query, "voiceId")
    outputFormat = self.query_get(query, "outputFormat")

    # Validate the parameters, set error flag in case of unexpected
    # values
    if len(text) == 0 or len(voiceId) == 0 or \
            outputFormat not in AUDIO_FORMATS:
        raise HTTPStatusError(HTTP_STATUS["BAD_REQUEST"],
                              "Wrong parameters")
    else:
        try:
            # Request speech synthesis
            response = polly.synthesize_speech(Text=text,
                                               VoiceId=voiceId,
                                               OutputFormat=outputFormat)
        except (BotoCoreError, ClientError) as err:
            # The service returned an error
            raise HTTPStatusError(HTTP_STATUS["INTERNAL_SERVER_ERROR"],
                                  str(err))

        return ResponseData(status=HTTP_STATUS["OK"],
                            content_type=AUDIO_FORMATS[outputFormat],
                            # Access the audio stream in the response
                            data_stream=response.get("AudioStream"))

Every time the chat app sends a request to the streaming server, this code is executed, and synthesized speech is generated on the fly and streamed back to the app. As impressive as it may seem, this is all we need to add speech synthesis capabilities to this chat app.

Speech-enablement Beyond Accessibility

Even beyond accessibility, this feature can also help in many use cases that require voice prompts for background apps and special events.

In other use cases, the Amazon Polly Block, which invokes the Amazon Polly service, could be used to generate speech samples for specific text. This can be ideal for applications that require the generation of voice commands from a predefined list of text messages.
