A voice assistant is a form of artificial intelligence that recognizes and responds to voice commands. You can find them on smartphones, desktops, smartwatches, and other devices. Alexa, Siri, Google Assistant, and Cortana are some examples of popular voice assistants.
In this blog, we will build our own voice assistant using Python's SpeechRecognition library. It will perform basic tasks such as:
- tell the day, date, and time
- tell jokes
- open applications and websites
- play YouTube videos
- search the web and Wikipedia
- send WhatsApp messages
- fetch the latest news headlines
- display the calendar
- open the camera and take selfies
- fetch the weather forecast
- answer geographical and mathematical questions
So let’s begin!
Starting PyCharm
PyCharm IDE is used for this project; it can be downloaded from the JetBrains website. If you prefer another IDE, feel free to use it.
Required Modules
Install the following third-party modules with pip, for example from PyCharm's terminal: open the terminal and type the pip command shown under each module.
SpeechRecognition - A speech-to-text library that lets the program recognize spoken words and convert them into text.
pip install SpeechRecognition
PyAudio - Python bindings for PortAudio, used to play and record audio on a variety of platforms. SpeechRecognition needs it to read from the microphone (sr.Microphone).
pip install pyaudio
Pyttsx3 - Performs text-to-speech conversion. It works offline and is compatible with both Python 2 and 3.
pip install pyttsx3
Pywhatkit - Automates tasks such as sending emails and WhatsApp messages or playing YouTube videos with a single function call.
pip install pywhatkit
Wikipedia - Retrieves information from the Wikipedia website.
pip install wikipedia
Pyjokes - Provides one-line jokes for programmers.
pip install pyjokes
Newsapi - Python client library for integrating the News API into Python.
pip install newsapi-python
OpenCV - Used to open the webcam and capture an image.
pip install opencv-python
Selenium - Automates web browser interaction from Python.
pip install selenium
WolframAlpha - Client for the WolframAlpha API, which answers computational and factual questions (for example, maths and geography) using its curated knowledge base and algorithms.
pip install wolframalpha
The following modules are part of Python's standard library, so they do not need to be installed:
Webbrowser - Provides functions to open URLs in a browser.
Datetime - Used to work with dates and times.
Calendar - Handles calendar-related operations.
os - Provides functions for interacting with the operating system.
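Several import names differ from their pip package names (SpeechRecognition is imported as speech_recognition, opencv-python as cv2, newsapi-python as newsapi), so a quick check before running the assistant can save some confusion. The following is an optional sketch, not part of the original project, that uses importlib.util.find_spec() to list modules that cannot be found; the REQUIRED list and the missing_modules() helper are our own names.

```python
import importlib.util

# Import names (not pip names) of the third-party modules used in this project.
REQUIRED = [
    "speech_recognition", "pyaudio", "pyttsx3", "pywhatkit", "wikipedia",
    "pyjokes", "newsapi", "cv2", "selenium", "wolframalpha",
]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path.

    find_spec() locates a module without importing it, so nothing is executed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```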
Code
Import the following libraries:

```python
import calendar
import os
import sys
import webbrowser
import speech_recognition as sr
import pyttsx3
import pywhatkit
import datetime
import wikipedia
import pyjokes
import wolframalpha
import cv2
from newsapi import NewsApiClient
```
Setting speech engine
The two main tasks of a voice assistant are:
- recognize speech and convert it to text
- convert text to speech
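We use sr.Recognizer() to recognize speech and convert it to text, and pyttsx3.init() to create the text-to-speech engine, stored in the assistant variable. We can also pick a different voice by indexing into the list returned by getProperty('voices'). On Windows, index 0 is usually a male voice and index 1 a female voice, but the installed voices and their order vary from system to system, so index 1 may not exist.

```python
ear = sr.Recognizer()       # speech-to-text
assistant = pyttsx3.init()  # text-to-speech engine
voices = assistant.getProperty('voices')
# Index 1 is typically a female voice on Windows; fall back to the first voice if only one is installed
assistant.setProperty('voice', voices[1].id if len(voices) > 1 else voices[0].id)
```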
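Indexing into the voice list assumes you know which position holds the voice you want. A slightly more defensive option is to choose a voice by name. The sketch below is a hypothetical helper, pick_voice_id(), that relies only on each voice having .id and .name attributes (as pyttsx3 voice objects do); a namedtuple stands in for the real objects so it runs without an audio driver. Names such as "Zira" are examples of Windows voices and will differ on other systems.

```python
from collections import namedtuple

# Stand-in for pyttsx3 voice objects, which also expose .id and .name.
Voice = namedtuple("Voice", ["id", "name"])


def pick_voice_id(voices, preferred_name=None, fallback_index=0):
    """Return the id of the first voice whose name contains preferred_name
    (case-insensitive); otherwise the voice at fallback_index if it exists,
    else the first voice."""
    if not voices:
        raise ValueError("no text-to-speech voices are installed")
    if preferred_name:
        for voice in voices:
            if preferred_name.lower() in voice.name.lower():
                return voice.id
    if 0 <= fallback_index < len(voices):
        return voices[fallback_index].id
    return voices[0].id


# With pyttsx3 this could be used as:
# assistant.setProperty('voice', pick_voice_id(voices, preferred_name='zira', fallback_index=1))
```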
Functions
1) Speak function
We define a function that lets the assistant convert text to speech and respond to commands. The function takes the text to say as its argument.

```python
def speak(text):
    assistant.say(text)
    assistant.runAndWait()  # Without this command speech will not be audible to us
```
2) Input command function
This function listens for a command using the microphone as the source. adjust_for_ambient_noise(source, duration=0.5) samples half a second of background sound and calibrates the recognizer's energy threshold, so the program can better tell speech from ambient noise. We then capture the audio with the listen() method of the Recognizer class. Finally, recognize_google() performs speech recognition on the captured audio using the Google Web Speech API, which requires an internet connection. If anything fails, the function returns an empty string, so the caller always gets text back.

```python
def input_command():
    commands = ""  # returned if recognition fails (avoids an UnboundLocalError)
    try:
        with sr.Microphone() as source:
            print(".....")
            ear.adjust_for_ambient_noise(source, duration=0.5)
            audio = ear.listen(source)
        commands = ear.recognize_google(audio)  # Google Web Speech API
        speak(commands)  # repeat the recognized command back to the user
    except Exception as e:
        print(e)
    return commands
```
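The keyword checks in this project are case-sensitive substring tests, so whether a branch such as 'open youtube' matches depends on how the recognizer happens to capitalize "YouTube". One option, sketched here as an assumption rather than part of the original code, is to pass the recognized text through a small helper before the checks and write all keywords in lower case. normalize_command() is a hypothetical name; note that it also removes apostrophes, so "what's" becomes "whats".

```python
import string


def normalize_command(text):
    """Lower-case a recognized phrase, drop punctuation, and collapse
    whitespace so keyword checks do not depend on capitalization."""
    if not text:
        return ""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


# Possible use inside run_assistant():
# command = normalize_command(input_command())
```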
3) Run assistant function
In this function, we first call the input_command() function defined earlier to capture the user's command and store it in the variable command. If the command contains one of the trigger words, the assistant speaks a response and performs the corresponding task.
For example, if the user says 'stop', the voice assistant says "Voice assistant shutting down" and then stops the program.

```python
def run_assistant():
    command = input_command()
    if 'stop' in command or 'bye' in command:
        speak("Voice assistant shutting down")
        sys.exit()
```
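Each task is selected with a plain substring test such as 'play' in command, so a phrase like "display calendar" also matches 'play'. One way to avoid this, sketched here as an alternative rather than the approach this project uses, is to match keywords as whole words with a regular expression and keep the keyword-to-task mapping in a list. has_keyword(), route(), and ROUTES are hypothetical names; the route names are placeholders for the branches of run_assistant().

```python
import re


def has_keyword(command, keyword):
    """True if keyword appears in command as whole word(s), so 'play'
    matches 'play despacito' but not 'display calendar'."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, command, flags=re.IGNORECASE) is not None


def route(command, routes):
    """Return the name of the first route whose keywords match, else None."""
    for name, keywords in routes:
        if any(has_keyword(command, kw) for kw in keywords):
            return name
    return None


# Order matters: the first matching route wins, just like an if/elif chain.
ROUTES = [
    ("stop", ["stop", "bye"]),
    ("play", ["play"]),
    ("calendar", ["calendar"]),
    ("time", ["time"]),
]
```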
Now let’s see how we can make our voice assistant perform various tasks according to the user’s commands.
Task 1- Play videos on YouTube
First, we remove 'play' from the command by replacing it with an empty string and store the rest in the variable song. Next, the assistant announces the song using the speak() function defined earlier and prints the same message. Finally, pywhatkit.playonyt() opens the video on YouTube.

```python
    elif 'play' in command:
        song = command.replace('play', '')  # keeps a leading space, e.g. " despacito"
        speak('Playing' + song)
        print('Playing' + song)
        pywhatkit.playonyt(song)
```
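command.replace('play', '') removes every occurrence of "play", not just the keyword, so "play coldplay" becomes " cold". Assuming you only want to drop the first occurrence, str.partition() is a small alternative; text_after() below is a hypothetical helper name, and the same idea would fit the 'who is' and 'search' tasks.

```python
def text_after(command, keyword):
    """Return the text following the first occurrence of keyword, stripped.
    Unlike command.replace(keyword, ''), later occurrences are kept."""
    _, found, rest = command.partition(keyword)
    return rest.strip() if found else ""


# Possible use: song = text_after(command, 'play')
```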
Task 2- Display time, day, and date
Python has a built-in module called datetime, which offers classes for manipulating dates and times. Its strftime() method takes one parameter, a format string, and returns the date or time formatted accordingly.
We are using '%I:%M %p', which corresponds to Hour (01-12):Minute (00-59) AM/PM.

```python
    elif 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')  # Hour(01-12):Minute(00-59) AM/PM
        speak('Current time is ' + time)
    elif 'what day' in command:
        day = datetime.datetime.now()
        speak(day.strftime("%A"))  # weekday, full name
    elif 'what date' in command:
        date = datetime.datetime.now()
        speak(date.strftime("%B %d, %Y"))  # pass a string, not a date object, to the speech engine
```
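To see what these format strings produce without waiting for a particular day, the sketch below formats a fixed datetime instead of datetime.datetime.now(). describe() is a hypothetical helper; its f-string builds "March 5, 2022" without the leading zero that %d would add. Month and weekday names follow the current locale, so the output shown assumes Python's default (English) locale.

```python
import datetime


def describe(moment):
    """Return (time, weekday, date) strings that read naturally when spoken."""
    time_text = moment.strftime("%I:%M %p")                  # 12-hour clock
    weekday_text = moment.strftime("%A")                     # full weekday name
    date_text = f"{moment:%B} {moment.day}, {moment.year}"   # no zero-padded day
    return time_text, weekday_text, date_text


print(describe(datetime.datetime(2022, 3, 5, 14, 7)))
# ('02:07 PM', 'Saturday', 'March 5, 2022')
```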
Task 3- Display Calendar
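Python has a built-in module, calendar, that handles calendar-related operations. calendar.calendar(2022) returns the whole calendar for the year 2022 as a string; passing datetime.date.today().year instead always shows the current year.

```python
    elif 'calendar' in command:
        speak('Opening calendar')  # assistant.say() alone is only spoken at the next runAndWait()
        print(calendar.calendar(datetime.date.today().year))
```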
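If the full year is more than you need, the same module can print a single month. The sketch below uses hypothetical helper names; it prints the current month with calendar.month() and the current year with calendar.calendar(), taking the date from datetime.date.today() rather than a hard-coded year.

```python
import calendar
import datetime


def month_calendar(today=None):
    """Text calendar for the month containing `today` (default: the current date)."""
    today = today or datetime.date.today()
    return calendar.month(today.year, today.month)


def year_calendar(today=None):
    """Text calendar for the year containing `today` (default: the current date)."""
    today = today or datetime.date.today()
    return calendar.calendar(today.year)


print(month_calendar())
```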
Task 4- Fetching data from Wikipedia
To answer 'who is' questions, we use the wikipedia module. First, we extract the subject by removing 'who is' from the command and store it in the variable subject. Next, we call wikipedia.summary(), which takes two arguments: the subject given by the user and the number of sentences to extract from the Wikipedia page.

```python
    elif 'who is' in command:
        subject = command.replace('who is', '').strip()
        info = wikipedia.summary(subject, 1)  # first sentence of the article
        print(info)
        speak(info)
```
Task 5- Getting jokes
We will use get_joke() from the pyjokes package to make our voice assistant tell jokes.
```python
    elif 'joke' in command:
        speak(pyjokes.get_joke())
```
Task 6- Opening Websites
To open a website, we use the built-in webbrowser module. webbrowser.open_new_tab() takes a URL and opens it in a new browser tab.

```python
    elif 'open google' in command:
        speak('Opening google')
        webbrowser.open_new_tab("http://www.google.com")
    elif 'open youtube' in command:
        speak('Opening YouTube')
        webbrowser.open_new_tab("http://www.youtube.com")
    elif 'open gmail' in command:
        speak('Opening Gmail')
        webbrowser.open_new_tab("http://www.gmail.com")
```
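Every new site needs another elif branch. A data-driven alternative, offered as a sketch rather than the project's own code, keeps the site names and URLs in a dictionary; SITES, url_for(), and open_site() are hypothetical names, and the lower-casing makes "Open YouTube" match as well.

```python
import webbrowser

# Hypothetical mapping from spoken site names to URLs; extend as needed.
SITES = {
    "google": "https://www.google.com",
    "youtube": "https://www.youtube.com",
    "gmail": "https://www.gmail.com",
}


def url_for(command):
    """Return the URL for the first 'open <site>' phrase in command, else None."""
    command = command.lower()
    for name, url in SITES.items():
        if f"open {name}" in command:
            return url
    return None


def open_site(command):
    """Open the matching site in a new tab and return its URL (or None)."""
    url = url_for(command)
    if url:
        webbrowser.open_new_tab(url)
    return url
```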
Task 7- Opening applications
The os module provides functions for interacting with the operating system. os.startfile() opens a file with its associated application, given the file's path; note that it is available only on Windows. To get an application's path, right-click its shortcut, choose Properties, and copy the Target field.

```python
    elif 'open zoom' in command:
        speak('Opening zoom')
        os.startfile(r"C:\Users\anams\AppData\Roaming\Zoom\bin\Zoom.exe")  # Windows only
```
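Because os.startfile() exists only on Windows, the Zoom task fails on macOS and Linux. The sketch below is one possible cross-platform approach, assuming macOS's `open` command and the freedesktop `xdg-open` utility (which opens a file or URL with the desktop's default application) are available. launch_command() and launch() are hypothetical names, and the command is built separately so it can be checked without starting anything.

```python
import os
import subprocess
import sys


def launch_command(target, platform_name=sys.platform):
    """Return the command that opens `target` on macOS or Linux,
    or None on Windows, where os.startfile() is used instead."""
    if platform_name.startswith("win"):
        return None
    if platform_name == "darwin":
        return ["open", target]
    return ["xdg-open", target]  # most Linux desktops


def launch(target):
    """Open `target` with the platform's default handler."""
    cmd = launch_command(target)
    if cmd is None:
        os.startfile(target)  # Windows only
    else:
        subprocess.Popen(cmd)
```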
Task 8- Searching data from the web
Suppose the user wants to search for images of cats. We use search() from the pywhatkit module, which opens the Google search results for the query in the browser.

```python
    elif 'search' in command:
        obj = command.replace("search", "").strip()
        pywhatkit.search(obj)
```
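For comparison, the same kind of search can be opened with the standard library alone. This is a sketch of an alternative, not a description of how pywhatkit works internally; google_search_url() and search() are hypothetical names, and urllib.parse.quote_plus() encodes spaces and symbols so queries like "c++ & python" survive in the URL.

```python
import webbrowser
from urllib.parse import quote_plus


def google_search_url(query):
    """Build a Google search URL with the query URL-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query.strip())


def search(query):
    """Open the search results in a new browser tab."""
    webbrowser.open_new_tab(google_search_url(query))
```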
Task 9- Taking selfies
To take selfies, we use OpenCV's VideoCapture(), which lets us work with the webcam: 0 selects the first webcam, 1 the second, and so on. The user presses 'e' to exit the camera and 's' to save a picture.
To save each selfie under a different name, we use the count variable, which is formatted into the image file name stored in img. The counter must be initialized before the loop; otherwise it is reset on every key press and each picture overwrites img0.png.
Next, we save the picture with the cv2.imwrite() function.

```python
    elif 'camera' in command:
        speak('Opening camera. To exit the camera, press e')
        speak('To take a picture, press s')
        cam = cv2.VideoCapture(0)  # 0 = first webcam
        count = 0  # initialize once, before the loop, so file names keep increasing
        while True:
            ok, frame = cam.read()
            if not ok:  # no frame: camera unavailable or disconnected
                break
            cv2.imshow('frame', frame)
            k = cv2.waitKey(1)
            if k == ord('e'):  # exit
                break
            elif k == ord('s'):  # save
                speak('click')
                img = "img{}.png".format(count)  # img0.png, img1.png, ...
                cv2.imwrite(img, frame)
                speak('done')
                count += 1
        cam.release()
        cv2.destroyAllWindows()
```
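A counter starts again at img0.png every time the assistant restarts, so pictures from an earlier session can be overwritten. One alternative, assuming timestamped names are acceptable, is to build the file name from the current time; selfie_path() is a hypothetical helper, and the microseconds (%f) keep two quick presses of 's' from colliding.

```python
import datetime
from pathlib import Path


def selfie_path(folder=".", now=None):
    """Return a timestamped PNG path such as selfie_20220305_140700_000000.png."""
    now = now or datetime.datetime.now()
    return Path(folder) / now.strftime("selfie_%Y%m%d_%H%M%S_%f.png")


# Possible use in the camera loop: cv2.imwrite(str(selfie_path()), frame)
```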
Task 10- Fetching the latest news headlines
To get news headlines, we use the News API from newsapi.org. Create an account there to obtain an API key, and pass the key to NewsApiClient(). We then call get_top_headlines() to fetch the headlines for the desired category, language, and country.

```python
    elif 'headlines' in command:
        newsapi = NewsApiClient(api_key='YOUR_NEWS_API_KEY')  # replace with your own key
        top_headlines = newsapi.get_top_headlines(category='technology', language='en', country='us')
        headlines = top_headlines['articles']
        news = [headline['title'] for headline in headlines]
        for i, title in enumerate(news[:5], start=1):  # at most five, even if fewer are returned
            print(i, title)
            speak(title)  # speak() already calls runAndWait()
```
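Writing the API key directly into the code makes it easy to leak when the code is shared. The sketch below, with hypothetical helper names, reads the key from an environment variable (NEWS_API_KEY is our own choice of name) and pulls at most five titles from the response, skipping articles without a title. It assumes the response is a dictionary with an 'articles' list, as used above.

```python
import os


def news_api_key():
    """Read the News API key from the NEWS_API_KEY environment variable."""
    key = os.environ.get("NEWS_API_KEY")
    if not key:
        raise RuntimeError("Set the NEWS_API_KEY environment variable first")
    return key


def top_titles(response, limit=5):
    """Return up to `limit` article titles from a News API response dict."""
    articles = response.get("articles") or []
    titles = [article.get("title") for article in articles if article.get("title")]
    return titles[:limit]


# Possible use:
# newsapi = NewsApiClient(api_key=news_api_key())
# for title in top_titles(newsapi.get_top_headlines(category='technology', language='en', country='us')):
#     speak(title)
```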
Task 11- Fetching the weather forecast
We get the weather forecast from weather-forecast.com and use the selenium package to automate the browser interaction.
Open the page for your desired city and paste its URL into driver.get().

```python
    elif 'weather' in command:
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        driver = webdriver.Chrome()
        driver.get("https://www.weather-forecast.com/locations/Muscat/forecasts/latest")
        desc = driver.find_element(By.CLASS_NAME, "phrase")  # forecast summary text
        forecast = desc.text  # read the text before closing the browser
        driver.quit()
        print(forecast)
        speak(forecast)
```
Task 12- Answer questions related to geography and maths
We use the WolframAlpha API to answer computational and geographical questions. First, go to the WolframAlpha website and create an account. Then click My Apps (API) and Get an AppID. Copy the AppID and paste it into wolframalpha.Client().

```python
    elif 'ask' in command:
        speak("Please go ahead")
        ques = input_command()
        client = wolframalpha.Client('YOUR_APP_ID')  # replace with your own AppID
        eng = client.query(ques)  # send the question
        ans = next(eng.results).text  # text of the first result
        speak(ans)
        print(ans)
```
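next(eng.results) raises StopIteration when WolframAlpha returns no result, which would crash the assistant. Passing a default to next() avoids that. first_answer() below is a hypothetical helper that works with any iterable of objects with a .text attribute; SimpleNamespace stands in for a real result so the sketch runs without an AppID.

```python
from types import SimpleNamespace


def first_answer(results, default="Sorry, I could not find an answer"):
    """Return the .text of the first item in `results`, or `default` when
    there is none; next() with a default avoids StopIteration."""
    first = next(iter(results), None)
    if first is None or not getattr(first, "text", None):
        return default
    return first.text


# Possible use in the 'ask' branch: ans = first_answer(eng.results)
print(first_answer([SimpleNamespace(text="Muscat")]))  # prints Muscat
```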
Task 13- Send a WhatsApp message
We use sendwhatmsg_instantly() from pywhatkit, which takes the receiver's phone number (including the country code) and the message. Before asking for the number, we set pause_threshold to 2 seconds: this is how long the recognizer waits in silence before treating a phrase as complete, so the user can say the whole number without being cut off. Afterwards it is restored to the SpeechRecognition default of 0.8 seconds. Because recognize_google() may capitalize "WhatsApp" differently, the check lower-cases the command first.

```python
    elif 'send whatsapp message' in command.lower():
        ear.pause_threshold = 2  # allow longer pauses while the number is read out
        speak('Please tell the number')
        number = input_command().replace(' ', '')  # drop spaces between spoken digits
        ear.pause_threshold = 0.8  # restore the default
        speak('Please tell the message')
        message = input_command()
        pywhatkit.sendwhatmsg_instantly(f"+968{number}", message)  # +968 is Oman's country code
```

Task 14- To exit the voice assistant
To stop the program, we use sys.exit().

```python
    elif 'stop' in command or 'bye' in command:
        speak("Voice assistant shutting down")
        sys.exit()
```

Task 15- Speak when the command is not understood by the voice assistant
This is the final else branch, reached when none of the trigger words match the user's command; the assistant then says the following message.

```python
    else:
        speak('Sorry, I could not understand the command')
```
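Back in the WhatsApp task, the recognizer may return a spoken phone number with spaces or dashes, which would end up inside f"+968{number}". The sketch below is a hypothetical helper, spoken_number_to_phone(), that keeps only the digits and adds the country code; the default 968 matches the Oman prefix used in this project, and you would pass your own code otherwise.

```python
def spoken_number_to_phone(spoken, country_code="968"):
    """Keep only the digits of a recognized phone number and prefix the
    country code, e.g. '9123 4567' -> '+96891234567'."""
    digits = "".join(ch for ch in spoken if ch in "0123456789")
    if not digits:
        raise ValueError(f"no digits found in {spoken!r}")
    return f"+{country_code}{digits}"


# Possible use:
# pywhatkit.sendwhatmsg_instantly(spoken_number_to_phone(input_command()), message)
```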
The final lines of code run our assistant in a loop:

```python
while True:
    run_assistant()
```
Entire Code
```python
import calendar
import os
import sys
import webbrowser
import speech_recognition as sr
import pyttsx3
import pywhatkit
import datetime
import wikipedia
import pyjokes
import wolframalpha
import cv2
from newsapi import NewsApiClient

ear = sr.Recognizer()  # function to recognize the speech
assistant = pyttsx3.init()
voices = assistant.getProperty('voices')
assistant.setProperty('voice', voices[1].id if len(voices) > 1 else voices[0].id)


def speak(text):
    assistant.say(text)
    assistant.runAndWait()  # Without this command speech will not be audible to us


def input_command():
    commands = ""  # returned if recognition fails
    try:
        with sr.Microphone() as source:
            print(".....")
            ear.adjust_for_ambient_noise(source, duration=0.5)
            audio = ear.listen(source)
        commands = ear.recognize_google(audio)  # Google Web Speech API
        speak(commands)
    except Exception as e:
        print(e)
    return commands


def run_assistant():
    command = input_command()
    if 'play' in command:
        song = command.replace('play', '')
        speak('Playing' + song)
        print('Playing' + song)
        pywhatkit.playonyt(song)
    elif 'time' in command:
        time = datetime.datetime.now().strftime('%I:%M %p')
        speak('Current time is ' + time)
    elif 'what day' in command:
        day = datetime.datetime.now()
        speak(day.strftime("%A"))
    elif 'what date' in command:
        date = datetime.datetime.now()
        speak(date.strftime("%B %d, %Y"))
    elif 'who is' in command:
        subject = command.replace('who is', '').strip()
        info = wikipedia.summary(subject, 1)
        print(info)
        speak(info)
    elif 'joke' in command:
        speak(pyjokes.get_joke())
    elif 'open google' in command:
        speak('Opening google')
        webbrowser.open_new_tab("http://www.google.com")
    elif 'open youtube' in command:
        speak('Opening YouTube')
        webbrowser.open_new_tab("http://www.youtube.com")
    elif 'open gmail' in command:
        speak('Opening Gmail')
        webbrowser.open_new_tab("http://www.gmail.com")
    elif 'open zoom' in command:
        speak('Opening zoom')
        os.startfile(r"C:\Users\anams\AppData\Roaming\Zoom\bin\Zoom.exe")  # Windows only
    elif 'search' in command:
        obj = command.replace("search", "").strip()
        pywhatkit.search(obj)
    elif 'calendar' in command:
        speak('Opening calendar')
        print(calendar.calendar(datetime.date.today().year))
    elif 'camera' in command:
        speak('Opening camera. To exit the camera, press e')
        speak('To take a picture, press s')
        cam = cv2.VideoCapture(0)
        count = 0  # initialized once so file names keep increasing
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            cv2.imshow('frame', frame)
            k = cv2.waitKey(1)
            if k == ord('e'):
                break
            elif k == ord('s'):
                speak('click')
                img = "img{}.png".format(count)
                cv2.imwrite(img, frame)
                speak('done')
                count += 1
        cam.release()
        cv2.destroyAllWindows()
    elif 'headlines' in command:
        newsapi = NewsApiClient(api_key='YOUR_NEWS_API_KEY')  # replace with your own key
        top_headlines = newsapi.get_top_headlines(category='technology', language='en', country='us')
        news = [headline['title'] for headline in top_headlines['articles']]
        for i, title in enumerate(news[:5], start=1):
            print(i, title)
            speak(title)
    elif 'weather' in command:
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        driver = webdriver.Chrome()
        driver.get("https://www.weather-forecast.com/locations/Muscat/forecasts/latest")
        forecast = driver.find_element(By.CLASS_NAME, "phrase").text
        driver.quit()
        print(forecast)
        speak(forecast)
    elif 'ask' in command:
        speak("Please go ahead")
        ques = input_command()
        client = wolframalpha.Client('YOUR_APP_ID')  # replace with your own AppID
        eng = client.query(ques)
        ans = next(eng.results).text
        speak(ans)
        print(ans)
    elif 'send whatsapp message' in command.lower():
        ear.pause_threshold = 2
        speak('Please tell the number')
        number = input_command().replace(' ', '')
        ear.pause_threshold = 0.8
        speak('Please tell the message')
        message = input_command()
        pywhatkit.sendwhatmsg_instantly(f"+968{number}", message)
    elif 'stop' in command or 'bye' in command:
        speak("Voice assistant shutting down")
        sys.exit()
    else:
        speak('Sorry, I could not understand the command')


while True:
    run_assistant()
```
Check out my GitHub for more!
Thanks and Happy Coding! 🙂