How to Run a Speech-to-Text Application in React Using Google Cloud
Hello everyone, today we are going to build a React application that converts speech to text using the Google Cloud Platform.
A React application is a web application or user interface built using the React JavaScript library. React is a popular and widely used open-source library developed by Facebook. It’s specifically designed for building user interfaces and is known for its component-based architecture, which allows developers to create modular, reusable UI elements.
To run the React application in your web browser, follow these steps:
- Make sure you have Node.js installed on your computer. If you don’t, download and install it from the official Node.js website. I suggest using Ubuntu 22.04 and installing Node.js there.
- Open a terminal or command prompt.
- Execute the following commands to create a new React application and navigate to the project folder:
npx create-react-app speech-to-text
cd speech-to-text
- Install Axios by running the following command:
npm install axios
- To get an API key for the Google Cloud Platform (GCP) Speech-to-Text service, you need a Google Cloud Platform account. If you don’t already have one, visit the Google Cloud Platform website and sign up for an account.
- Create a New Project: once logged in to the GCP console, create a new project (or use an existing one). The project will serve as the container for the resources and services you use.
- Enable the Speech-to-Text API: in the Google Cloud Console, navigate to the APIs & Services section. Find the “Library” option and search for “Cloud Speech-to-Text API” or simply “Speech-to-Text API”. Enable this API for your project.
- Create Credentials: after enabling the API, you’ll need to create credentials. Go to the “Credentials” section in the Google Cloud Console. Click “Create Credentials” and select “API key”. This will generate an API key that you can use to access the Speech-to-Text API.
- Using the API Key: once you’ve obtained the API key, you can use it in your applications to access the Speech-to-Text service (a quick way to verify the key is shown after these steps).
- To read the API key from a .env file, create a new file named .env in the root directory of your project (where package.json is located).
- Add your API key to the .env file in the following format:
REACT_APP_GOOGLE_API_KEY=your_api_key
Replace your_api_key with your actual Google Cloud Speech-to-Text API key.
- Replace the contents of the src/App.js file with the following code:
import axios from 'axios';
import React, { useState, useEffect } from 'react';

// Function to convert audio blob to base64 encoded string
const audioBlobToBase64 = (blob) => {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onloadend = () => {
      const arrayBuffer = reader.result;
      const base64Audio = btoa(
        new Uint8Array(arrayBuffer).reduce(
          (data, byte) => data + String.fromCharCode(byte),
          ''
        )
      );
      resolve(base64Audio);
    };
    reader.onerror = reject;
    reader.readAsArrayBuffer(blob);
  });
};

const App = () => {
  const [recording, setRecording] = useState(false);
  const [mediaRecorder, setMediaRecorder] = useState(null);
  const [transcription, setTranscription] = useState('');

  // Cleanup function to stop recording and release media resources
  useEffect(() => {
    return () => {
      if (mediaRecorder) {
        mediaRecorder.stream.getTracks().forEach(track => track.stop());
      }
    };
  }, [mediaRecorder]);

  if (!process.env.REACT_APP_GOOGLE_API_KEY) {
    throw new Error("REACT_APP_GOOGLE_API_KEY not found in the environment");
  }

  const apiKey = process.env.REACT_APP_GOOGLE_API_KEY;

  const startRecording = async () => {
    try {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      const recorder = new MediaRecorder(stream);
      recorder.start();
      console.log('Recording started');

      // Event listener to handle data availability
      recorder.addEventListener('dataavailable', async (event) => {
        console.log('Data available event triggered');
        const audioBlob = event.data;
        const base64Audio = await audioBlobToBase64(audioBlob);
        // console.log('Base64 audio:', base64Audio);

        try {
          const startTime = performance.now();
          const response = await axios.post(
            `https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`,
            {
              config: {
                encoding: 'WEBM_OPUS',
                sampleRateHertz: 48000,
                languageCode: 'en-US',
              },
              audio: {
                content: base64Audio,
              },
            }
          );
          const endTime = performance.now();
          const elapsedTime = endTime - startTime;
          // console.log('API response:', response);
          console.log('Time taken (ms):', elapsedTime);

          if (response.data.results && response.data.results.length > 0) {
            setTranscription(response.data.results[0].alternatives[0].transcript);
          } else {
            console.log('No transcription results in the API response:', response.data);
            setTranscription('No transcription available');
          }
        } catch (error) {
          console.error('Error with Google Speech-to-Text API:', error.response ? error.response.data : error.message);
        }
      });

      setRecording(true);
      setMediaRecorder(recorder);
    } catch (error) {
      console.error('Error getting user media:', error);
    }
  };

  const stopRecording = () => {
    if (mediaRecorder) {
      mediaRecorder.stop();
      console.log('Recording stopped');
      setRecording(false);
    }
  };

  // Theme (mode4): Google-inspired colors -- example style values, adjust as you like
  const mode4 = (
    <div style={{ textAlign: 'center', padding: '20px', fontFamily: 'Arial, sans-serif' }}>
      <h1 style={{ color: '#4285F4' }}>Speech to Text</h1>
      {!recording ? (
        <button onClick={startRecording} style={{ backgroundColor: '#34A853', color: 'white', padding: '10px 20px', border: 'none', borderRadius: '4px' }}>Start Recording</button>
      ) : (
        <button onClick={stopRecording} style={{ backgroundColor: '#EA4335', color: 'white', padding: '10px 20px', border: 'none', borderRadius: '4px' }}>Stop Recording</button>
      )}
      <p style={{ color: '#FBBC05' }}>Transcription: {transcription}</p>
    </div>
  );

  return (mode4);
};

export default App;
- Save the changes to src/App.js.
- Start the development server by running the following command in the terminal or command prompt:
npm start
The application should automatically open in your default web browser. If it doesn’t, open your browser and navigate to http://localhost:3000.
Now you should see the application running in your browser, with a button to start and stop recording and a section to display the transcribed text.
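If the transcription stays empty or the API calls fail, it is worth checking the key itself before debugging the React code. Below is a minimal sketch (assuming Node.js with axios installed and a short recording in a hypothetical file named sample.webm) that calls the same speech:recognize endpoint directly:
// verify-key.js -- quick sanity check of the Speech-to-Text API key (illustrative sketch)
const axios = require('axios');
const fs = require('fs');

const apiKey = process.env.REACT_APP_GOOGLE_API_KEY; // or paste the key here for a one-off test
const audioFile = 'sample.webm';                     // hypothetical short WEBM/Opus recording

async function verifyKey() {
  const base64Audio = fs.readFileSync(audioFile).toString('base64');
  try {
    const response = await axios.post(
      `https://speech.googleapis.com/v1/speech:recognize?key=${apiKey}`,
      {
        config: { encoding: 'WEBM_OPUS', sampleRateHertz: 48000, languageCode: 'en-US' },
        audio: { content: base64Audio },
      }
    );
    console.log('API reachable, response:', JSON.stringify(response.data, null, 2));
  } catch (error) {
    console.error('Request failed:', error.response ? error.response.data : error.message);
  }
}

verifyKey();
Run it with node verify-key.js; if the key is invalid or the API is not enabled for your project, the error body returned by Google typically says so.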
Description of the Code
The provided code snippet is part of a React application that allows users to start and stop voice recording. When the “Start Recording” button is clicked, the code initiates the recording process by calling the “startRecording” function.
It requests permission from the user to access the audio device using “navigator.mediaDevices.getUserMedia”. Inside the function, a MediaRecorder object is created, which captures the audio stream from the microphone.
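The capture flow, distilled outside React for illustration (the same browser APIs, without the state handling):
// Minimal capture flow using the same browser APIs (illustration only)
async function captureOnce() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true }); // asks for microphone permission
  const recorder = new MediaRecorder(stream);
  recorder.addEventListener('dataavailable', (event) => {
    // event.data is a Blob holding the recorded audio
    console.log('Captured audio blob, size:', event.data.size);
  });
  recorder.start();                        // begin recording
  setTimeout(() => recorder.stop(), 3000); // stop after 3 seconds; fires 'dataavailable'
}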
The recording starts, and an event listener is added to handle “dataavailable” events. These events are triggered whenever there is new audio data available from the microphone.
When a “dataavailable” event occurs, the code converts the captured audio data into a “base64” encoded string and sends it to the Google Speech-to-Text API using the “axios” library. The API request includes audio content, language configuration, and an API key.
The response from the API is received, and the transcription results are processed. If the API returns a transcription, it is displayed on the screen using a “p” element.
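For reference, a successful speech:recognize response looks roughly like the object below (the values are illustrative), which is why the code reads response.data.results[0].alternatives[0].transcript:
// Approximate shape of a successful Speech-to-Text response (illustrative values)
{
  "results": [
    {
      "alternatives": [
        { "transcript": "hello world this is a test", "confidence": 0.92 }
      ],
      "resultEndTime": "2.500s",
      "languageCode": "en-us"
    }
  ],
  "totalBilledTime": "3s"
}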
The recording stops when the “Stop Recording” button is clicked. In the code, the “setRecording” and “setMediaRecorder” setters track the recording state, and the resulting text is stored in the “transcription” state variable. Overall, this code provides a simple interface for voice recording, sends the audio data to the Google Speech-to-Text API, and displays the resulting transcription on the screen.
You can personalize the colors however you like; below are some example color schemes (the exact style values are only suggestions you can change).
The default theme uses Google-inspired colors and is called mode4:
const mode4 = (
  <div style={{ textAlign: 'center', padding: '20px', fontFamily: 'Arial, sans-serif' }}>
    <h1 style={{ color: '#4285F4' }}>Speech to Text</h1>
    {!recording ? (
      <button onClick={startRecording} style={{ backgroundColor: '#34A853', color: 'white', padding: '10px 20px', border: 'none', borderRadius: '4px' }}>Start Recording</button>
    ) : (
      <button onClick={stopRecording} style={{ backgroundColor: '#EA4335', color: 'white', padding: '10px 20px', border: 'none', borderRadius: '4px' }}>Stop Recording</button>
    )}
    <p style={{ color: '#FBBC05' }}>Transcription: {transcription}</p>
  </div>
);
The next theme uses Matrix-style colors (green on black) and is named mode3:
const mode3 = (
  <div style={{ backgroundColor: 'black', textAlign: 'center', padding: '20px' }}>
    <h1 style={{ color: '#00FF00' }}>Speech to Text</h1>
    {!recording ? (
      <button onClick={startRecording} style={{ backgroundColor: 'black', color: '#00FF00', border: '1px solid #00FF00', padding: '10px 20px' }}>Start Recording</button>
    ) : (
      <button onClick={stopRecording} style={{ backgroundColor: 'black', color: '#00FF00', border: '1px solid #00FF00', padding: '10px 20px' }}>Stop Recording</button>
    )}
    <p style={{ color: '#00FF00' }}>Transcription: {transcription}</p>
  </div>
);
Then comes a multicolor theme named mode2:
const mode2 = (
  <div style={{ backgroundColor: '#FFF8E1', textAlign: 'center', padding: '20px' }}>
    <h1 style={{ color: 'purple' }}>Speech to Text</h1>
    {!recording ? (
      <button onClick={startRecording} style={{ backgroundColor: 'orange', color: 'white', padding: '10px 20px', border: 'none' }}>Start Recording</button>
    ) : (
      <button onClick={stopRecording} style={{ backgroundColor: 'crimson', color: 'white', padding: '10px 20px', border: 'none' }}>Stop Recording</button>
    )}
    <p style={{ color: 'teal' }}>Transcription: {transcription}</p>
  </div>
);
And finally, the simplest theme, mode1, with no custom styling:
const mode1 = (
<div>
<h1>Speech to Text</h1>
{!recording ? (
<button onClick={startRecording}>Start Recording</button>
) : (
<button onClick={stopRecording}>Stop Recording</button>
)}
<p>Transcription: {transcription}</p>
</div>
);
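If you want to switch between these themes at runtime instead of hard-coding one in the return statement, one possible approach (a sketch, not part of the original code) is to keep the chosen mode in state inside the App component and render the matching element:
// Hypothetical theme switcher: keep the chosen theme name in state and render it
const [theme, setTheme] = useState('mode4');
const themes = { mode1, mode2, mode3, mode4 };

return (
  <div>
    <select value={theme} onChange={(e) => setTheme(e.target.value)}>
      {Object.keys(themes).map((name) => (
        <option key={name} value={name}>{name}</option>
      ))}
    </select>
    {themes[theme]}
  </div>
);
This assumes all four modeN elements are defined inside the App component, so they all have access to recording, startRecording, stopRecording, and transcription.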
Note: if your application has to handle other types of input audio, you can check the supported encodings in the official Google documentation.
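For example, if you were sending 16 kHz linear PCM (WAV) audio instead of the browser’s WEBM/Opus recording, the request body might look like this (a sketch; the values must match your actual audio):
// Example request body for 16 kHz linear PCM (WAV) audio -- adjust to your real input
const requestBody = {
  config: {
    encoding: 'LINEAR16',    // raw 16-bit PCM, e.g. extracted from a WAV file
    sampleRateHertz: 16000,  // must match the audio's actual sample rate
    languageCode: 'en-US',
  },
  audio: {
    content: base64Audio,    // base64-encoded audio, as in the app above
  },
};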
Congratulations! You have learned how to create a React application that converts speech to text using the Google Cloud Platform.