Create a React Teleprompter using the Web SpeechRecognition API
June 22, 2020
In this post, we are going to build a teleprompter web application using the Web Speech API. In particular, we'll use the SpeechRecognition interface to build this app. The idea is that we'll be able to recognize the user's voice, match the words to a predefined script, and then automatically scroll to the next unspoken position.
NOTE: The above 🆓 video goes through all the content of this blog post step by step. You can find the code for this project on GitHub, and you can play with it on CodeSandbox.
Preview of Teleprompter
The following is a short animated GIF showing the result of what we will be building using native web technologies and JavaScript libraries.
Basic Application
The Teleprompter component that we'll start with is just a shell of what we will be building. To begin with, we are looping over the words passed to the component and displaying them in <span> elements, but eventually, we will want to wire up the SpeechRecognition API and auto-scroll the contents as the user speaks.
import React from 'react'
import styled from 'styled-components'
const StyledTeleprompter = styled.div`
/* ... */
`
export default function Teleprompter({ words, progress, listening, onChange }) {
return (
<React.Fragment>
<StyledTeleprompter>
{words.map((word, i) => (
<span key={`${word}:${i}`}>{word} </span>
))}
</StyledTeleprompter>
</React.Fragment>
)
}
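The component expects words, progress, listening, and onChange props from whatever renders it. The parent component isn't the focus of this post, but as a rough sketch (the App name, the sample script, and the './Teleprompter' import path are my assumptions, not from the original code), a minimal consumer might look like this:
import React from 'react'
import Teleprompter from './Teleprompter' // hypothetical path to the component above

// Sample script used purely for illustration
const SCRIPT = 'the quick brown fox jumps over the lazy dog'

export default function App() {
  // progress is the index of the next unspoken word; listening toggles recognition
  const [listening, setListening] = React.useState(false)
  const [progress, setProgress] = React.useState(0)

  return (
    <div>
      <button onClick={() => setListening((value) => !value)}>
        {listening ? 'Stop' : 'Start'}
      </button>
      <Teleprompter
        words={SCRIPT.split(' ')}
        progress={progress}
        listening={listening}
        onChange={setProgress}
      />
    </div>
  )
}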
Creating a SpeechRecognition Instance
The first thing we'll do is create an instance of SpeechRecognition. To do this, we'll create a recog reference using the React.useRef hook, passing null.
Next, we'll use a React.useEffect hook that executes after the initial render of the component. Here, we will reference either the real window.SpeechRecognition constructor or the vendor-prefixed window.webkitSpeechRecognition version. Depending on which one exists, we'll create a new instance of it and assign it to the current property of our recog reference.
Setting continuous mode to true allows us to continuously capture results once we have started, instead of just getting one result.
Also, we'll want to set interimResults to true as well. This gives us access to quicker results. However, these results aren't final and may not be as accurate as those we'd get by waiting a bit longer.
const recog = React.useRef(null)
React.useEffect(() => {
const SpeechRecognition =
window.SpeechRecognition || window.webkitSpeechRecognition
recog.current = new SpeechRecognition()
recog.current.continuous = true
recog.current.interimResults = true
}, [])
SpeechRecognition Browser Compatibility
Before we get too far along, it's important to know that the SpeechRecognition feature that we are going to use only has minimal browser support at the moment.
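Because of that limited support, it can be worth feature-detecting the API before constructing an instance. Here's a minimal sketch of one way to do it; the getSpeechRecognition helper and the console warning are my additions, not part of the original code:
// Returns the available SpeechRecognition constructor, or null when the
// browser doesn't support the API at all.
function getSpeechRecognition() {
  if (typeof window === 'undefined') return null
  return window.SpeechRecognition || window.webkitSpeechRecognition || null
}

// Inside the effect, we could then bail out early (sketch):
// const SpeechRecognition = getSpeechRecognition()
// if (!SpeechRecognition) {
//   console.warn('SpeechRecognition is not supported in this browser')
//   return
// }
// recog.current = new SpeechRecognition()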
Toggling SpeechRecognition to Start and Stop
Now we'll add another React.useEffect hook so that we can toggle whether or not our Speech Recognition system should start or stop listening. So, if we are listening, then we will tell our recog ref to start; otherwise, we will stop the recognition instance.
React.useEffect(() => {
if (listening) {
recog.current.start()
} else {
recog.current.stop()
}
}, [listening])
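One caveat worth knowing: in some browsers, calling start() while recognition is already running throws an InvalidStateError. If that becomes an issue, a defensive variation like the one below can help; the try/catch is my addition rather than part of the original post:
React.useEffect(() => {
  if (listening) {
    try {
      recog.current.start()
    } catch (err) {
      // Some browsers throw if recognition has already started; safe to ignore here
      console.warn('SpeechRecognition failed to start:', err)
    }
  } else {
    recog.current.stop()
  }
}, [listening])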
Adding and Removing Event Listeners
At this point, starting or stopping does nothing yet, so let's wire that up. We'll have another React.useEffect hook, and this one will grab our recog reference and add an event listener for the result event, handled by the handleResult callback, which we haven't defined yet, but we will very soon.
Also, we'll want to clean up after ourselves, so we'll return a function that calls removeEventListener for the result event bound to the handleResult function.
React.useEffect(() => {
const handleResult = () => {
/* ... more code later ... */
}
recog.current.addEventListener('result', handleResult)
return () => {
recog.current.removeEventListener('result', handleResult)
}
}, [onChange, progress, words])
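One small aside on the cleanup: since recog.current is read again inside the returned function, the react-hooks/exhaustive-deps lint rule usually suggests capturing it in a local variable first. That tweak is optional and not from the original post, but it would look roughly like this:
React.useEffect(() => {
  // Capture the instance so the cleanup uses the same reference
  const recognition = recog.current
  const handleResult = () => {
    /* ... more code later ... */
  }

  recognition.addEventListener('result', handleResult)
  return () => {
    recognition.removeEventListener('result', handleResult)
  }
}, [onChange, progress, words])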
Handling the Recognition Results
Now, let's define the handleResult function that we've wired up to the recog reference. Here we will grab out the results portion of the argument passed to us. We'll create an interim variable and take the SpeechRecognitionResultList returned, convert it into an array using Array.from(), limit the results to only those that are not final using Array.prototype.filter, grab the first transcript from each of those using Array.prototype.map, and finally join all of those together into one big string using Array.prototype.join.
To leverage the results, let's create some new state using the React.useState hook and save off the results by calling setResults.
+ const [results, setResults] = React.useState('');
React.useEffect(() => {
+ const handleResult = ({ results }) => {
+ const interim = Array.from(results)
+ .filter((r) => !r.isFinal)
+ .map((r) => r[0].transcript)
+ .join(' ');
+
+ setResults(interim);
/* ... more code later ... */
};
recog.current.addEventListener('result', handleResult);
return () => {
recog.current.removeEventListener('result', handleResult);
};
}, [onChange, progress, words]);
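To make the shape of that data a little more concrete, here is a standalone sketch that runs the same filter/map/join pipeline against a hand-built mock; the mock objects only loosely approximate real SpeechRecognitionResultList entries:
// Rough mock of a results list: each entry has an isFinal flag and an
// indexed list of alternatives, each with a transcript
const mockResults = [
  { isFinal: true, 0: { transcript: 'the quick brown' } },
  { isFinal: false, 0: { transcript: 'fox jumps' } },
  { isFinal: false, 0: { transcript: 'over the' } },
]

const interim = Array.from(mockResults)
  .filter((r) => !r.isFinal)
  .map((r) => r[0].transcript)
  .join(' ')

console.log(interim) // "fox jumps over the"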
To see the interim results that we've captured in state, let's conditionally show them as a sibling to <StyledTeleprompter> if they exist.
return (
<React.Fragment>
<StyledTeleprompter>
{words.map((word, i) => (
<span key={`${word}:${i}`}>{word} </span>
))}
</StyledTeleprompter>
+ {results && <Interim>{results}</Interim>}
</React.Fragment>
);
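The Interim styled component used above isn't defined in these snippets. Its styles aren't important for the behavior; a minimal placeholder (purely my assumption) using the styled import from earlier could be:
// Hypothetical placeholder for the Interim component referenced above
const Interim = styled.div`
  color: #666;
  font-style: italic;
`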
Scrolling Based on Progress Value
Now, let's focus on scrolling. We'll first create a scrollRef variable using the React.useRef hook, set it to null, and then assign it to our <StyledTeleprompter> component. To our <span> we'll add an HTML5 data attribute of data-index and assign it to the index of the word.
This technically isn't necessary for the scrolling, but let's add a color style to indicate if the word has already been spoken or not. If the word index is less than the progress (meaning it has already been said), then it'll look gray; otherwise, it'll look black.
+ const scrollRef = React.useRef(null);
return (
<React.Fragment>
- <StyledTeleprompter>
+ <StyledTeleprompter ref={scrollRef}>
{words.map((word, i) => (
<span
key={`${word}:${i}`}
+ data-index={i}
+ style={{
+ color: i < progress ? '#ccc' : '#000',
+ }}
>
{word}{' '}
</span>
))}
</StyledTeleprompter>
{results && <Interim>{results}</Interim>}
</React.Fragment>
);
To scroll the teleprompter, we'll add another React.useEffect hook. We want this hook to be invoked whenever the progress prop changes, so we'll include it in the dependency array (the 2nd parameter). Next, we grab the scrollRef's current value and use querySelector to find the element whose data-index is 3 words past the current progress. We are targeting a few words ahead so that we hopefully scroll before we run out of words that are in view.
At this point we are using the optional chaining operator in case nothing was found, but if an element was found, we'll use the Element.scrollIntoView() method, passing behavior: 'smooth', block: 'nearest', and inline: 'start'.
React.useEffect(() => {
scrollRef.current
.querySelector(`[data-index='${progress + 3}']`)
?.scrollIntoView({
behavior: 'smooth',
block: 'nearest',
inline: 'start',
})
}, [progress])
Element.scrollIntoView Browser Compatibility
Support for Element.scrollIntoView() is surprisingly good, which is great because it's such a handy feature.
Updating to the Next Progress Value
The trickiest part of the app is trying to figure out where the current progress should be. To do this, we'll introduce a newIndex variable, break up the interim string back into an array, and compare each word with the next expected unspoken word in the teleprompter script.
To make comparison easier, we'll use two techniques. First, we'll create a cleanWord function to trim whitespace, lowercase the string, and replace any non-alpha characters with an empty string. Second, we'll leverage the string-similarity library from npm.
import stringSimilarity from 'string-similarity'
const cleanWord = (word) =>
word
.trim()
.toLocaleLowerCase()
.replace(/[^a-z]/gi, '')
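As a quick sanity check on how these two pieces behave, here is a small usage sketch. The exact scores depend on string-similarity's Dice-coefficient implementation, so the comments describe them loosely rather than promising precise values:
console.log(cleanWord('  Hello, ')) // "hello" (trimmed, lowercased, punctuation stripped)

// compareTwoStrings returns a score between 0 and 1
console.log(stringSimilarity.compareTwoStrings('hello', 'hello')) // 1 (identical strings)
console.log(stringSimilarity.compareTwoStrings('hello', 'xyz'))   // 0 (nothing in common)
console.log(stringSimilarity.compareTwoStrings('hello', 'hallo')) // somewhere in between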
If the similarity between our words is greater than 75%, then we'll increment our index by one; otherwise we'll keep it the same. Then, if our newIndex is greater than it was previously and is no greater than the total number of words, we'll let our consuming component know that something has changed by calling onChange.
React.useEffect(() => {
const handleResult = ({ results }) => {
const interim = Array.from(results)
.filter((r) => !r.isFinal)
.map((r) => r[0].transcript)
.join(' ');
setResults(interim);
+ const newIndex = interim.split(' ').reduce((memo, word) => {
+ if (memo >= words.length) {
+ return memo;
+ }
+ const similarity = stringSimilarity.compareTwoStrings(
+ cleanWord(word),
+ cleanWord(words[memo]),
+ );
+ memo += similarity > 0.75 ? 1 : 0;
+ return memo;
+ }, progress);
+ if (newIndex > progress && newIndex <= words.length) {
+ onChange(newIndex);
+ }
};
recog.current.addEventListener('result', handleResult);
return () => {
recog.current.removeEventListener('result', handleResult);
};
}, [onChange, progress, words]);
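To make the progress logic more concrete, here is a standalone trace of the same reduce using a made-up script and interim transcript. It reuses cleanWord and stringSimilarity from above; the sample data and the scenario are purely illustrative:
// Script: five words, and we've already matched "hello" (progress is 1)
const sampleWords = ['hello', 'world', 'how', 'are', 'you']
const sampleProgress = 1

// Interim transcript just received from the recognizer
const sampleInterim = 'world how umm'

const nextIndex = sampleInterim.split(' ').reduce((memo, word) => {
  if (memo >= sampleWords.length) return memo
  const similarity = stringSimilarity.compareTwoStrings(
    cleanWord(word),
    cleanWord(sampleWords[memo]),
  )
  // "world" matches "world" (1 -> 2), "how" matches "how" (2 -> 3),
  // "umm" doesn't match "are", so memo stays at 3
  return memo + (similarity > 0.75 ? 1 : 0)
}, sampleProgress)

console.log(nextIndex) // 3, so onChange(3) would be called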
Conclusion
It's pretty amazing some of the features that are available in many browsers. Although SpeechRecognition isn't available in all browsers, it's a pretty powerful feature and fun to play with. I hope you enjoyed seeing it in action and learned about fun and unique ways to leverage the feature.
NOTE: This is the beginning of an egghead playlist that I plan to grow with additional refactors and new features.