Create a React Teleprompter using the Web SpeechRecognition API
The above video is hosted on egghead.io.

In this post we are going to build a teleprompter web application using the Web Speech API. In particular, we'll use the SpeechRecognition interface to build this app. The idea is that we'll be able to recognize the user's voice, match the words to a predefined script, and then automatically scroll to the next unspoken position.

NOTE: The above 🆓 video goes through all the content of this blog post step by step. You can find the code for this project on GitHub and you can play with it on CodeSandbox.

Preview of Teleprompter

The following is a short animated GIF showing the end result of what we will be building using native web technologies and JavaScript libraries.

Basic Application

The Teleprompter component that we'll start with is just a shell of what we will be building. To begin with, we are looping over the words passed to the component and displaying them in <span> elements, but eventually we will want to wire up the SpeechRecognition API and auto-scroll the contents as the user speaks.

import React from 'react';
import styled from 'styled-components';

const StyledTeleprompter = styled.div`
  /* ... */
`;

export default function Teleprompter({
  words,
  progress,
  listening,
  onChange,
}) {
  return (
    <React.Fragment>
      <StyledTeleprompter>
        {words.map((word, i) => (
          <span key={`${word}:${i}`}>{word} </span>
        ))}
      </StyledTeleprompter>
    </React.Fragment>
  );
}
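The words prop is simply the script broken into individual tokens. A parent component might derive it from a raw script string like this (the script text and variable names below are placeholders, not part of the project's actual code):

```javascript
// Hypothetical parent-side setup: split a raw script string into the
// words array that would be passed to <Teleprompter words={words} ... />.
const scriptText = 'The quick brown fox jumps over the lazy dog';

// Trim first, then split on runs of whitespace so we never get empty tokens.
const words = scriptText.trim().split(/\s+/);
// words → ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```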

Creating a SpeechRecognition Instance

The first thing we'll do is create an instance of SpeechRecognition. To do this, we'll create a recog reference using the React.useRef hook, passing null.

Next, we'll use a React.useEffect that executes after the initial render of the component. In it, we will reference either the standard window.SpeechRecognition constructor or the vendor-prefixed window.webkitSpeechRecognition version, depending on which one exists. We'll create a new instance of it and assign it to the current property of our recog reference.

Setting continuous mode to true allows us to continuously capture results once we have started, instead of just getting one result.

Also, we'll want to set interimResults to true as well. This gives us access to quicker results. However, these results aren't final and may not be as accurate as those received after waiting a bit longer.

const recog = React.useRef(null);

React.useEffect(() => {
  const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  recog.current = new SpeechRecognition();
  recog.current.continuous = true;
  recog.current.interimResults = true;
}, []);

SpeechRecognition Browser Compatibility

Before we get too far along, it's important to know that the SpeechRecognition feature that we are going to use only has minimal browser support at the moment.
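Given the limited support, it's worth feature-detecting before constructing an instance. Here's a small sketch (the helper name and fallback behavior are assumptions, not part of the tutorial code); taking the global object as a parameter keeps the check easy to exercise outside a browser:

```javascript
// Feature-detect SpeechRecognition, including the webkit-prefixed version.
// Accepts the global object as a parameter so the check is testable in Node.
function getSpeechRecognition(
  globalObj = typeof window !== 'undefined' ? window : {},
) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

const SpeechRecognition = getSpeechRecognition();
if (!SpeechRecognition) {
  // Not supported: e.g. render a "your browser doesn't support
  // speech recognition" message instead of the teleprompter.
}
```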

Toggling SpeechRecognition to Start and Stop

Now we'll add another React.useEffect hook so that we can toggle whether or not our Speech Recognition system should start or stop listening. So, if we are listening, we'll tell our recog ref to start; otherwise, we will stop the recognition instance.

React.useEffect(() => {
  if (listening) {
    recog.current.start();
  } else {
    recog.current.stop();
  }
}, [listening]);

Adding and Removing Event Listeners

At this point, starting or stopping does nothing yet, so let's wire that up. We'll have another React.useEffect hook, and this one will grab our recog reference and add an event listener, listen to the result event, and handle that with the handleResult callback, which we haven't defined yet, but we will very soon.

Also, we'll want to clean up after ourselves, so we'll return a function that will removeEventListener for the result event bound to the handleResult function.

React.useEffect(() => {
  const handleResult = () => {
    /* ... more code later ... */
  };
  recog.current.addEventListener('result', handleResult);
  return () => {
    recog.current.removeEventListener('result', handleResult);
  };
}, [onChange, progress, words]);

Handling the Recognition Results

Now, let's define the handleResult function that we've wired up to the recog reference. In it we will destructure the results portion of the argument passed to us. We'll create an interim variable and take the SpeechRecognitionResultList returned, convert it into an array using Array.from(), limit the results to only those that are not final using Array.prototype.filter, grab the first transcript from each of those using Array.prototype.map, and finally join all of those together into one big string using Array.prototype.join.

In order to leverage the results, let's create some new state using the React.useState hook and save off the results calling setResults.

+ const [results, setResults] = React.useState('');

React.useEffect(() => {
+  const handleResult = ({ results }) => {
+    const interim = Array.from(results)
+      .filter((r) => !r.isFinal)
+      .map((r) => r[0].transcript)
+      .join(' ');
+
+    setResults(interim);

    /* ... more code later ... */
  };

  recog.current.addEventListener('result', handleResult);
  return () => {
    recog.current.removeEventListener('result', handleResult);
  };
}, [onChange, progress, words]);
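The transform inside handleResult is just array operations on the result list, so it can be exercised in isolation against a hand-built mock (the mock shape below is an assumption that mirrors the real SpeechRecognitionResultList, where each result is array-like and index 0 holds the best alternative):

```javascript
// Sketch: the interim-extraction logic from handleResult as a pure function.
const extractInterim = (results) =>
  Array.from(results)
    .filter((r) => !r.isFinal)
    .map((r) => r[0].transcript)
    .join(' ');

// Mocked result list: each entry is an array of alternatives
// with an isFinal flag attached, like the real API provides.
const mockResults = [
  Object.assign([{ transcript: 'hello world' }], { isFinal: true }),
  Object.assign([{ transcript: 'this is' }], { isFinal: false }),
  Object.assign([{ transcript: 'a test' }], { isFinal: false }),
];

extractInterim(mockResults); // → 'this is a test'
```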

To see the interim results that we've captured in state, let's conditionally show them as a sibling to <StyledTeleprompter> if they exist.

return (
  <React.Fragment>
    <StyledTeleprompter>
      {words.map((word, i) => (
        <span key={`${word}:${i}`}>{word} </span>
      ))}
    </StyledTeleprompter>
+    {results && <Interim>{results}</Interim>}
  </React.Fragment>
);

Scrolling based on Progress Value

Now, let's focus on scrolling. As setup, let's first create a scrollRef variable using the React.useRef hook, setting it to null, and assign it to our <StyledTeleprompter> component. To our <span> we'll add an HTML5 data attribute of data-index and assign it the index of the word.

This technically isn't necessary for the scrolling, but let's add a color style to indicate if the word has already been spoken or not. If the word index is less than the progress (meaning it has already been said), then it'll look gray, otherwise it'll look black.

+ const scrollRef = React.useRef(null);

return (
  <React.Fragment>
-    <StyledTeleprompter>
+    <StyledTeleprompter ref={scrollRef}>
      {words.map((word, i) => (
        <span
          key={`${word}:${i}`}
+          data-index={i}
+          style={{
+            color: i < progress ? '#ccc' : '#000',
+          }}
        >
          {word}{' '}
        </span>
      ))}
    </StyledTeleprompter>
    {results && <Interim>{results}</Interim>}
  </React.Fragment>
);

In order to actually scroll the teleprompter, we'll add another React.useEffect hook. We'll want this one to be invoked whenever the progress prop changes. We'll grab the scrollRef's current value and querySelector for the data-index that is 3 words past the current progress. That's to hopefully scroll before we run out of words that are in view.

Here we'll use the optional chaining operator in case nothing was found; if an element was found, we'll use the Element.scrollIntoView() method, passing behavior smooth, block nearest, and inline start.

React.useEffect(() => {
  scrollRef.current
    .querySelector(`[data-index='${progress + 3}']`)
    ?.scrollIntoView({
      behavior: 'smooth',
      block: 'nearest',
      inline: 'start',
    });
}, [progress]);

Element.scrollIntoView Browser Compatibility

Support for Element.scrollIntoView() is surprisingly good, which is great because it's such a handy feature.
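One caveat is the options object: some older browsers implement scrollIntoView() itself but ignore behavior: 'smooth'. If that matters for your audience, a common detection sketch checks whether the CSS scroll-behavior property exists on the root element's style (the function name here is an assumption; the document is a parameter so the check runs outside a browser):

```javascript
// Detect smooth-scrolling support by checking for the CSS
// scroll-behavior property on the root element's style object.
function supportsSmoothScroll(
  doc = typeof document !== 'undefined' ? document : null,
) {
  return !!doc && 'scrollBehavior' in doc.documentElement.style;
}
```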

Updating to the Next Progress Value

The trickiest part of the app is trying to figure out where the current progress should be. To do this, we'll introduce a newIndex variable, break up the interim string back into an array, and compare each word with the next expected unspoken word in the teleprompter script.

To make the comparison easier we'll use two techniques. First, we'll create a cleanWord function to trim whitespace, lowercase the string, and replace any non-alpha characters with an empty string. Next, we'll leverage the string-similarity library from npm.

import stringSimilarity from 'string-similarity';

const cleanWord = (word) =>
  word
    .trim()
    .toLocaleLowerCase()
    .replace(/[^a-z]/gi, '');

If the similarity between our words is greater than 75%, then we'll increment our index by one; otherwise we'll keep it the same. Then, if our newIndex is greater than it was previously and is less than or equal to the total number of words, we'll let our consuming component know that something has changed.

React.useEffect(() => {
  const handleResult = ({ results }) => {
    const interim = Array.from(results)
      .filter((r) => !r.isFinal)
      .map((r) => r[0].transcript)
      .join(' ');
    setResults(interim);

+    const newIndex = interim.split(' ').reduce((memo, word) => {
+      if (memo >= words.length) {
+        return memo;
+      }
+      const similarity = stringSimilarity.compareTwoStrings(
+        cleanWord(word),
+        cleanWord(words[memo]),
+      );
+      memo += similarity > 0.75 ? 1 : 0;
+      return memo;
+    }, progress);
+    if (newIndex > progress && newIndex <= words.length) {
+      onChange(newIndex);
+    }
  };

  recog.current.addEventListener('result', handleResult);
  return () => {
    recog.current.removeEventListener('result', handleResult);
  };
}, [onChange, progress, words]);
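Because the index-advance logic is a pure reduce, it's easy to pull out and sanity-check on its own. Here's a sketch where the similarity function is injected as a parameter (in the component it would be stringSimilarity.compareTwoStrings); the exact-match comparator below is only for illustration, not what the app ships with:

```javascript
// cleanWord, repeated here so the snippet runs standalone.
const cleanWord = (word) =>
  word
    .trim()
    .toLocaleLowerCase()
    .replace(/[^a-z]/gi, '');

// Sketch: the progress-advancing reduce as a standalone function.
// `compare` stands in for stringSimilarity.compareTwoStrings.
function computeNewIndex(interim, words, progress, compare) {
  return interim.split(' ').reduce((memo, word) => {
    if (memo >= words.length) {
      return memo;
    }
    const similarity = compare(cleanWord(word), cleanWord(words[memo]));
    return memo + (similarity > 0.75 ? 1 : 0);
  }, progress);
}

// Toy comparator: 1 when the cleaned words match exactly, 0 otherwise.
const exactMatch = (a, b) => (a === b ? 1 : 0);

computeNewIndex('the quick brown', ['The', 'quick,', 'brown', 'fox'], 0, exactMatch); // → 3
```

Note how cleanWord makes 'the' match 'The' and 'quick' match 'quick,' even with the strict comparator; the fuzzy similarity check just loosens this further for imperfect transcriptions.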

Conclusion

It's pretty amazing some of the features that are available in many browsers. Although SpeechRecognition isn't everywhere yet, it's a pretty powerful feature and was definitely fun to play with. I hope you enjoy using it as well and find fun and unique ways to leverage the feature.

NOTE: This is the beginning of an egghead playlist that I plan to grow with additional refactors and new features.