Python Project: Intelligent Voice Assistant in 30 mins using Speech Recognition

A voice assistant is a form of artificial intelligence that recognizes and responds to voice commands. You can find them on smartphones, desktops, smartwatches, and other devices. Alexa, Siri, Google Assistant, and Cortana are some examples of popular voice assistants.

In this blog, we will build our own voice assistant using Python’s SpeechRecognition library. Our voice assistant will perform basic tasks such as:

tell the day, date, and time
tell jokes
open applications and websites
play YouTube videos
search on Browser and Wikipedia
send WhatsApp message
fetch the latest news headlines
display calendar
open the camera and click selfies
predict weather
answer geographical and mathematical questions

So let’s begin!

Starting PyCharm

This project uses the PyCharm IDE, which can be downloaded from the JetBrains website. If you want to use any other IDE, please feel free to do so.

Required Modules

You will need to install these modules using the PyCharm terminal. To install, open the terminal and type the pip commands.

Speech Recognition- Speech Recognition is a speech-to-text converter module that allows computers to recognize spoken words and convert them into text.

pip install SpeechRecognition

PyAudio- With PyAudio, we can easily use Python to play and record audio on a variety of platforms.

pip install pyaudio

Pyttsx3- Text-to-speech conversion can be performed with the pyttsx3. It works offline and is compatible with both Python 2 and 3.

pip install pyttsx3

Pywhatkit- With pywhatkit, you can automate emails, play YouTube videos, and send WhatsApp messages with a single function call each.

pip install pywhatkit

Wikipedia- This is used to retrieve information from the Wikipedia website.

pip install wikipedia

Pyjokes- This module gives one-line jokes for programmers.

pip install pyjokes

Newsapi- A Python client library for integrating the News API into Python programs.

pip install newsapi-python

OpenCV- This is used to open the webcam and capture an image.

pip install opencv-python

Selenium- With selenium, web browser interaction can be automated from Python.

pip install selenium

WolframAlpha- WolframAlpha API will help to answer almost every question asked by the user. It uses algorithms, a knowledge base, and AI technology.

pip install wolframalpha

Webbrowser- In-built module that includes functions to open URLs in browsers.

Datetime- In-built module that is used to retrieve data about date and time.

Calendar- In-built module that handles operations related to the calendar.

os- In-built module that provides functions for interacting with the operating system.
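
Before moving on, it can help to confirm that every module is importable. The following is a minimal sketch that uses only the standard library; the helper name missing_modules and the REQUIRED list are our own, and the list holds import names, which can differ from the pip package names (for example, opencv-python is imported as cv2).

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used later in this post (not the pip package names).
REQUIRED = [
    "speech_recognition", "pyaudio", "pyttsx3", "pywhatkit", "wikipedia",
    "pyjokes", "newsapi", "cv2", "selenium", "wolframalpha",
]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed with the matching pip command from the list above.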

Code

Import the following libraries

import calendar
import os
import sys
import webbrowser
import speech_recognition as sr
import pyttsx3
import pywhatkit
import datetime
import wikipedia
import pyjokes
import wolframalpha
import cv2
from newsapi import NewsApiClient

Setting speech engine

The two main tasks of a voice assistant are:-

recognize speech and convert it to text
convert text to speech

We are going to use sr.Recognizer() to recognize speech and convert it to text. Next, we initialize our assistant variable with pyttsx3.init(), which is used for text-to-speech conversion. We can also choose a voice by index from the list of installed voices. On our machine, index 0 is a male voice and index 1 is a female voice; the available voices and their order depend on your operating system.

ear = sr.Recognizer()   
assistant = pyttsx3.init() 
voices = assistant.getProperty('voices')
assistant.setProperty('voice', voices[1].id)
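
Since the number of installed voices varies between machines, voices[1] may not exist everywhere. Here is a small sketch of a fallback, assuming only that each voice object has an id attribute, as pyttsx3 voices do. The helper choose_voice_id and the demo voices are hypothetical and are not part of pyttsx3.

```python
from types import SimpleNamespace


def choose_voice_id(voices, preferred_index=1):
    """Return the id of voices[preferred_index], or of the first voice if that index is missing."""
    if not voices:
        return None
    if 0 <= preferred_index < len(voices):
        return voices[preferred_index].id
    return voices[0].id


# Stand-in objects that mimic the `.id` attribute of pyttsx3 voices (made-up ids).
demo_voices = [SimpleNamespace(id="voice-a"), SimpleNamespace(id="voice-b")]
print(choose_voice_id(demo_voices))      # voice-b
print(choose_voice_id(demo_voices[:1]))  # voice-a (fallback to the first voice)
```

In the assistant, the result could be passed to assistant.setProperty('voice', ...) in place of voices[1].id.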

Functions

1)Speak Function

We define a function that will allow the assistant to convert text to speech and respond to the commands. The function takes text as an argument.

def speak(text):
    assistant.say(text)
    assistant.runAndWait()  # Without this command speech will not be audible to us

2)Input Command function

This function is used to give commands using the microphone as the source. We have used adjust_for_ambient_noise(source, duration=0.5) so that the program distinguishes between the speech and ambient noise. With the microphone as the source, we then try to listen to the audio using the listen() method in the Recognizer class. The recognize_google() performs speech recognition on the audio passed to it, using the Google Speech Recognition API.

def input_command():
    commands = ""  # returned unchanged if nothing could be recognized
    try:
        with sr.Microphone() as source:
            print(".....")
            ear.adjust_for_ambient_noise(source, duration=0.5)
            audio = ear.listen(source)
            commands = ear.recognize_google(audio)  # Google Web Speech API
            speak(commands)  # repeat the recognized text back to the user
    except Exception as e:
        print(e)
    return commands

3)Run assistant function

In this function, we first call the input_command( ) function that we defined before, to take the human commands and store them in the variable command. If there are trigger words in the command given by the user, it will invoke the virtual assistant to speak and perform tasks according to the command.

For example, if the user says ‘stop‘, our voice assistant will speak “Voice assistant shutting down” and then stop the program.

def run_assistant():
    command = input_command()

    if 'stop' in command or 'bye' in command:
        speak("Voice assistant shutting down")
        sys.exit()
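
One thing to keep in mind with checks like 'stop' in command is that they match substrings, so a trigger such as 'play' also matches "display". Below is a minimal sketch of a whole-word check, assuming we are fine with ignoring case. The helper has_trigger is our own name, not part of any library used here.

```python
import re


def has_trigger(command, trigger):
    """True if `trigger` occurs in `command` as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(trigger) + r"\b"
    return re.search(pattern, command, flags=re.IGNORECASE) is not None


print(has_trigger("play despacito", "play"))    # True
print(has_trigger("display calendar", "play"))  # False
```

A condition such as has_trigger(command, 'play') could replace 'play' in command wherever a trigger word is likely to appear inside longer words.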

Now let’s see how we can make our voice assistant perform various tasks according to the user’s commands.

Task 1- Play videos on YouTube

First, we will replace ‘play’ in the command with an empty string and store the rest in the variable song. Next, we will make our voice assistant speak using the speak( ) function that we defined earlier. We will also print the message and then use pywhatkit.playonyt( ) to open the YouTube video.

    elif 'play' in command:
        song = command.replace('play', '')
        speak('Playing' + song)
        print('Playing' + song)
        pywhatkit.playonyt(song)
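
Note that command.replace('play', '') removes every occurrence of 'play', including the one inside a word like "playlist". As a hedged alternative, the sketch below keeps only the text after the first occurrence of the trigger. The helper extract_query is hypothetical and could serve the ‘who is’ and ‘search’ tasks as well.

```python
def extract_query(command, trigger):
    """Return the text that follows the first occurrence of `trigger`, without surrounding spaces."""
    _, found, rest = command.partition(trigger)
    # If the trigger is absent, fall back to the whole command.
    return rest.strip() if found else command.strip()


print(extract_query("play despacito", "play"))            # despacito
print(extract_query("please play the playlist", "play"))  # the playlist
```

With this helper, song = extract_query(command, 'play') has no leading space, so the spoken message would need one added, as in 'Playing ' + song.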

Task 2- Display time, day, and date

Python has an inbuilt module called datetime. The datetime module offers classes for manipulating dates and times. The strftime() method takes one parameter that specifies the format of the returned string.

We are using ‘%I:%M %p‘ which corresponds to Hour(01-12) : Minute(00-59) AM/PM respectively.

    elif 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')  # Hour(01-12):Minute(00-59) AM/PM
        speak('Current time is ' + time)

    elif 'what day' in command:
        day = datetime.datetime.now()
        speak(day.strftime("%A")) #	Weekday, full version

    elif 'what date' in command:
        date = datetime.datetime.now()
        speak(str(date.date()))  # convert the date object to text before speaking
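
To see what these format codes produce, here is a small sketch that formats a fixed moment instead of the current time, so the result is repeatable. The function name describe_moment and the '%d %B %Y' date format are our own choices; the day and month names come out in English under Python's default locale.

```python
import datetime


def describe_moment(now):
    """Format a datetime the way the assistant would speak it."""
    return {
        "time": now.strftime("%I:%M %p"),   # 12-hour clock with AM/PM
        "day": now.strftime("%A"),          # full weekday name
        "date": now.strftime("%d %B %Y"),   # day, full month name, year
    }


moment = datetime.datetime(2022, 3, 5, 14, 7)  # 5 March 2022, 2:07 in the afternoon
print(describe_moment(moment))
```

A spoken date in the '%d %B %Y' style may sound more natural than the ISO form returned by date.date().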

Task 3- Display Calendar

Python has an inbuilt module calendar that handles operations related to the calendar. The calendar.calendar(2022) returns the whole calendar for the year 2022.

    elif 'calendar' in command:
        speak('Opening Calendar')  # speak() also calls runAndWait(), so the message is heard right away
        print(calendar.calendar(2022))
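
If a full-year calendar for a fixed year is more than we need, the calendar module can also return a single month. A minimal sketch, assuming we want the month of a given date and default to today; the helper month_calendar is our own name.

```python
import calendar
import datetime


def month_calendar(today=None):
    """Return a text calendar for the month that contains `today` (defaults to the current date)."""
    today = today or datetime.date.today()
    return calendar.month(today.year, today.month)


print(month_calendar(datetime.date(2022, 3, 5)))  # prints the calendar for March 2022
```

Calling print(month_calendar()) without an argument shows the current month, so the code does not need updating each year.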

Task 4- Fetching data from Wikipedia

To answer ‘who is’ questions, we will use wikipedia module. First, we will get the subject from the user by replacing ‘who is’ in the command and then storing it in the variable subject. Next, we will use wikipedia.summary() that takes two arguments, the subject given by the user, and the number of sentences that need to be extracted from the wikipedia page.

    elif 'who is' in command:
        subject = command.replace('who is', '')
        info = wikipedia.summary(subject, 1)
        print(info)
        speak(info)

Task 5- Getting jokes

We will use get_joke() from the pyjokes package to make our voice assistant tell jokes.

    elif 'joke' in command:
        speak(pyjokes.get_joke())

Task 6- Opening Websites

To open any website, we are using an in-built module called webbrowser. Its open_new_tab() function takes a URL as its parameter and opens the website in a new browser tab.

    elif 'open google' in command:
        speak('Opening google')
        webbrowser.open_new_tab("http://www.google.com")

    elif 'open youtube' in command:
        speak('Opening YouTube')
        webbrowser.open_new_tab("http://www.youtube.com")

    elif 'open gmail' in command:
        speak('Opening Gmail')
        webbrowser.open_new_tab("http://www.gmail.com")
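
As the list of websites grows, one elif per site becomes repetitive. The sketch below shows one possible alternative, a lookup table from site name to URL. The names SITES, url_for, and open_site are hypothetical, and the URLs are the same ones used above.

```python
import webbrowser

# Site names the assistant knows, mapped to the URL to open.
SITES = {
    "google": "http://www.google.com",
    "youtube": "http://www.youtube.com",
    "gmail": "http://www.gmail.com",
}


def url_for(command):
    """Return the URL for the first known site named in an 'open ...' command, else None."""
    text = command.lower()
    if "open" not in text:
        return None
    for name, url in SITES.items():
        if name in text:
            return url
    return None


def open_site(command):
    """Open the matching site in a new browser tab; return True if a site matched."""
    url = url_for(command)
    if url is not None:
        webbrowser.open_new_tab(url)
    return url is not None
```

Adding a new website then only needs one more entry in SITES.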

Task 7- Opening applications

The os module in Python provides functions for interacting with the operating system. Its startfile() function, which is available only on Windows, opens a file or application given its path. To get the path, right-click on the application, then click Properties and copy the Target field. Replace the path in the code below with the one on your own machine.

    elif 'open zoom' in command:
        speak('Opening zoom')
        os.startfile(r"C:\Users\anams\AppData\Roaming\Zoom\bin\Zoom.exe")
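
Because os.startfile() exists only on Windows, here is a hedged sketch of how the same step might look on other systems. It assumes the open command on macOS and the xdg-open command on Linux desktops; the helpers opener_argv and open_path are our own names.

```python
import os
import subprocess
import sys


def opener_argv(path, platform=sys.platform):
    """Return the command line that opens `path`, or None on Windows where os.startfile is used."""
    if platform.startswith("win"):
        return None
    if platform == "darwin":  # macOS
        return ["open", path]
    return ["xdg-open", path]  # assumption: a Linux desktop with xdg-open installed


def open_path(path):
    """Open a file or application with the platform's default handler."""
    argv = opener_argv(path)
    if argv is None:
        os.startfile(path)  # Windows only
    else:
        subprocess.Popen(argv)
```

Inside the assistant, open_path(...) would take the place of the direct os.startfile(...) call.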

Task 8-Searching data from the web

Suppose the user would like to search for images of cats. We will use search() from the pywhatkit module. This will open google search results for cats.

    elif 'search' in command:
        obj=command.replace("search", "")
        pywhatkit.search(obj)

Task 9- Clicking selfies

To click selfies we will use OpenCV‘s VideoCapture( ). This will allow us to work with the webcam. ‘0’ means the first webcam, ‘1’ means the second webcam, and so on. To exit the webcam the user has to press ‘e‘. To save the picture the user has to press ‘s’.

To save our selfies with different names we are using the count variable. It starts at 0 before the loop and increases by one after every picture. The count is formatted into the image name and stored in the img variable.

Next, we save the picture using the cv2.imwrite( ) function.

    elif 'camera' in command:
        speak('Opening camera to exit the camera press e')
        speak('to click press s')
        cam = cv2.VideoCapture(0)  # 0 selects the first webcam
        count = 0  # set once, before the loop, so every selfie gets a new name
        while True:
            ok, frame = cam.read()
            if not ok:  # stop if the webcam did not return a frame
                break
            cv2.imshow('frame', frame)
            k = cv2.waitKey(1)
            if k == ord('e'):  # to exit
                break
            elif k == ord('s'):  # to save
                speak('click')
                img = "img{}.png".format(count)  # to format img names
                cv2.imwrite(img, frame)  # to save
                speak('done')
                count += 1
        cam.release()
        cv2.destroyAllWindows()
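
A counter that starts at 0 on every run will overwrite selfies saved in an earlier session. As a possible refinement, the sketch below picks the first file name that is not yet taken. The helper next_image_name is hypothetical, and the demo uses a temporary folder so it does not touch real pictures.

```python
import os
import tempfile


def next_image_name(directory=".", prefix="img", ext=".png"):
    """Return the first path of the form <prefix><n><ext> that does not exist in `directory`."""
    count = 0
    while os.path.exists(os.path.join(directory, f"{prefix}{count}{ext}")):
        count += 1
    return os.path.join(directory, f"{prefix}{count}{ext}")


# Demo in a temporary folder: once img0.png exists, the next free name is img1.png.
with tempfile.TemporaryDirectory() as folder:
    first = next_image_name(folder)
    open(first, "wb").close()  # create an empty placeholder file
    second = next_image_name(folder)

print(os.path.basename(first), os.path.basename(second))  # img0.png img1.png
```

In the camera loop, img = next_image_name() could replace the count-based name.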

Task 10- Fetching the latest news headlines

To get the news we will use the News API from newsapi.org. We will create an account and get an API key by clicking on the Get API Key button. Enter your own API key in NewsApiClient( ). We then use get_top_headlines( ) to get the headlines for the desired category, language, and country, and read out the first five.

    elif 'headlines' in command:
        newsapi = NewsApiClient(api_key='YOUR_API_KEY')  # paste your own News API key here
        top_headlines = newsapi.get_top_headlines(category='technology',
                                                  language='en',
                                                  country='us')
        headlines = top_headlines['articles']
        news = []
        for headline in headlines:
            news.append(headline['title'])
        for i, title in enumerate(news[:5]):  # at most five headlines
            print(i + 1, title)
            speak(title)  # speak() already calls runAndWait()
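
It is safer not to keep an API key inside the script. The sketch below reads the key from an environment variable and picks the headline titles defensively. The variable name NEWS_API_KEY and the helper top_titles are our own assumptions, and the sample articles are made up to mimic the shape of the 'articles' list.

```python
import os


def top_titles(articles, limit=5):
    """Return up to `limit` non-empty titles from a list of article dictionaries."""
    titles = [article["title"] for article in articles if article.get("title")]
    return titles[:limit]


# Assumption: the key was stored beforehand in an environment variable named NEWS_API_KEY.
api_key = os.environ.get("NEWS_API_KEY", "")

# Made-up sample data in the same shape as top_headlines['articles'].
sample_articles = [{"title": "First headline"}, {"title": None}, {"title": "Second headline"}]
print(top_titles(sample_articles))  # ['First headline', 'Second headline']
```

The key could then be passed as NewsApiClient(api_key=api_key), and top_titles(top_headlines['articles']) would give the list to print and speak.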

Task 11- Fetching weather forecast

We are using the weather-forecast.com website to get our weather forecast, and the selenium package to automate the web browser interaction.

On the website, open the forecast page of your desired city and paste its URL into driver.get( ).

    elif 'weather' in command:
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        driver = webdriver.Chrome()
        driver.get("https://www.weather-forecast.com/locations/Muscat/forecasts/latest")
        desc = driver.find_element(By.CLASS_NAME, "phrase")
        print(desc.text)
        speak(desc.text)
        driver.quit()  # close the browser window once the forecast has been read

Task 12- Answer questions related to geography and maths

We are using Wolfram alpha API to answer computational and geographical questions. First, go to the WolframAlpha website and create an account. After creating an account click on My Apps(API) and then click on Get an AppID. Copy the AppID and paste it into wolframalpha.Client( ).

    elif 'ask' in command:
        speak("Please go ahead")
        ques = input_command()
        client = wolframalpha.Client('YOUR_APP_ID')  # paste your own AppID here
        eng = client.query(ques)  # process the question
        ans = next(eng.results).text  # get the first result in text form
        speak(ans)
        print(ans)
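
When WolframAlpha has no result for a question, next() on an empty iterator raises StopIteration. Here is a small sketch of a safer way to take the first answer, assuming only that each result has a text attribute as used above. The helper first_answer and the stand-in results are hypothetical.

```python
from types import SimpleNamespace


def first_answer(results, default="Sorry, I could not find an answer"):
    """Return the text of the first result, or `default` if there is none."""
    first = next(iter(results), None)
    text = getattr(first, "text", None)
    return text if text else default


# Stand-in results that mimic objects with a `.text` attribute.
print(first_answer([SimpleNamespace(text="4")]))  # 4
print(first_answer([]))                           # Sorry, I could not find an answer
```

In the assistant, ans = first_answer(eng.results) would replace the direct call to next().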

Task 13- Send a WhatsApp message

We are using sendwhatmsg_instantly() from pywhatkit. Its first two arguments are the receiver’s number, including the country code, and the message. We also set pause_threshold to 2 seconds before listening, so that a short pause in the middle of the phone number is not treated as the end of the phrase. The +968 prefix in the code is the country code for Oman; replace it with your own.

    elif 'send whatsapp message' in command.lower():
        ear.pause_threshold = 2  # allow pauses of up to 2 seconds within a phrase
        speak('Please tell the number')
        number = input_command().replace(' ', '')  # drop spaces between the spoken digits
        speak('Please tell the message')
        message = input_command()
        pywhatkit.sendwhatmsg_instantly(f"+968{number}", message)
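
Speech recognition may return the number with spaces, dashes, or stray words in it. The sketch below keeps only the digits before adding the country code. The helper to_phone_number is our own name, and the numbers in the demo are made up.

```python
def to_phone_number(spoken, country_code="+968"):
    """Keep only the digits of `spoken` and prefix the country code; None if no digits were heard."""
    digits = "".join(ch for ch in spoken if ch in "0123456789")
    return country_code + digits if digits else None


print(to_phone_number("9123 4567"))         # +96891234567
print(to_phone_number("nine one two"))      # None
```

A None result is a hint that the number was not understood, so the assistant could ask the user to repeat it instead of sending the message.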

Task 14- To exit the voice assistant

To stop our program from running we will use sys.exit( ).

    elif 'stop' in command or 'bye' in command:
        speak("Voice assistant shutting down")
        sys.exit()

Task 15- Speak when the command is not understood by the voice assistant

This is the final else part where the command given by the user cannot be understood, and we will make our voice assistant speak the following message.

    else:
        speak('Sorry, I could not understand the command')

The final line of code to run our assistant

while True:
    run_assistant()

Entire Code

import calendar
import os
import sys
import webbrowser
import speech_recognition as sr
import pyttsx3
import pywhatkit
import datetime
import wikipedia
import pyjokes
import wolframalpha
import cv2
from newsapi import NewsApiClient

ear = sr.Recognizer()  # recognizer object that converts speech to text
assistant = pyttsx3.init()
voices = assistant.getProperty('voices')
assistant.setProperty('voice', voices[1].id)

def speak(text):
    assistant.say(text)
    assistant.runAndWait()  # Without this command speech will not be audible to us

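def input_command():
    commands = ""  # returned unchanged if nothing could be recognized
    try:
        with sr.Microphone() as source:
            print(".....")
            ear.adjust_for_ambient_noise(source, duration=0.5)
            audio = ear.listen(source)
            commands = ear.recognize_google(audio)  # Google Web Speech API
            speak(commands)  # repeat the recognized text back to the user
    except Exception as e:
        print(e)
    return commands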

def run_assistant():
    command = input_command()

    if 'play' in command:
        song = command.replace('play', '')
        speak('Playing' + song)
        print('Playing' + song)
        pywhatkit.playonyt(song)

    elif 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')
        speak('Current time is ' + time)

    elif 'what day' in command:
        day = datetime.datetime.now()
        speak(day.strftime("%A"))

    elif 'what date' in command:
        date = datetime.datetime.now()
        speak(str(date.date()))  # convert the date object to text before speaking

    elif 'who is' in command:
        subject = command.replace('who is', '')
        info = wikipedia.summary(subject, 1)
        print(info)
        speak(info)

    elif 'joke' in command:
        speak(pyjokes.get_joke())

    elif 'open google' in command:
        speak('Opening google')
        webbrowser.open_new_tab("http://www.google.com")

    elif 'open youtube' in command:
        speak('Opening YouTube')
        webbrowser.open_new_tab("http://www.youtube.com")

    elif 'open gmail' in command:
        speak('Opening Gmail')
        webbrowser.open_new_tab("http://www.gmail.com")

    elif 'open zoom' in command:
        speak('Opening zoom')
        os.startfile(r"C:\Users\anams\AppData\Roaming\Zoom\bin\Zoom.exe")

    elif 'search' in command:
        obj=command.replace("search", "")
        pywhatkit.search(obj)

    elif 'calendar' in command:
        speak('Opening Calendar')
        print(calendar.calendar(2022))

    elif 'camera' in command:
        speak('Opening camera to exit the camera press e')
        speak('to click press s')
        cam = cv2.VideoCapture(0)
        count = 0  # set once, before the loop, so every selfie gets a new name
        while True:
            ok, frame = cam.read()
            if not ok:  # stop if the webcam did not return a frame
                break
            cv2.imshow('frame', frame)
            k = cv2.waitKey(1)
            if k == ord('e'):
                break
            elif k == ord('s'):
                speak('click')
                img = "img{}.png".format(count)
                cv2.imwrite(img, frame)
                speak('done')
                count += 1
        cam.release()
        cv2.destroyAllWindows()

    elif 'headlines' in command:
        newsapi = NewsApiClient(api_key='YOUR_API_KEY')  # paste your own News API key here
        top_headlines = newsapi.get_top_headlines(category='technology',
                                                  language='en',
                                                  country='us')
        headlines = top_headlines['articles']
        news = []
        for headline in headlines:
            news.append(headline['title'])
        for i, title in enumerate(news[:5]):  # at most five headlines
            print(i + 1, title)
            speak(title)  # speak() already calls runAndWait()

    elif 'weather' in command:
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        driver = webdriver.Chrome()
        driver.get("https://www.weather-forecast.com/locations/Muscat/forecasts/latest")
        desc = driver.find_element(By.CLASS_NAME, "phrase")
        print(desc.text)
        speak(desc.text)
        driver.quit()  # close the browser window once the forecast has been read

    elif 'ask' in command:
        speak("Please go ahead")
        ques = input_command()
        client = wolframalpha.Client('YOUR_APP_ID')  # paste your own AppID here
        eng = client.query(ques)
        ans = next(eng.results).text
        speak(ans)
        print(ans)

    elif 'send whatsapp message' in command.lower():
        ear.pause_threshold = 2  # allow pauses of up to 2 seconds within a phrase
        speak('Please tell the number')
        number = input_command().replace(' ', '')  # drop spaces between the spoken digits
        speak('Please tell the message')
        message = input_command()
        pywhatkit.sendwhatmsg_instantly(f"+968{number}", message)

    elif 'stop' in command or 'bye' in command:
        speak("Voice assistant shutting down")
        sys.exit()

    else:
        speak('Sorry, I could not understand the command')


while True:
    run_assistant()

Check out my GitHub for more!

Thanks and Happy Coding! 🙂
