Implementing a local, offline intelligent voice control terminal (with chat and home control functions) using Python and deep learning (speech recognition, NLP)

First, let’s look at what a good intelligent control terminal needs to do:

1. Be available at any time and sleep when not needed, saving power and hassle.

2. Be able to listen and understand what people need.

3. Be able to control smart home devices to meet those needs.

4. Be able to speak, so it can communicate and interact with people better.

5. Be able to chat with people.

6. Work normally whether or not a network is available, with high fault tolerance.

7. An additional requirement: be free if at all possible, ideally without spending any money.

Based on the above requirements, I arrived at the following design:

Here I mainly implement the offline version, using models I trained myself. I do not cover the online version that relies on the APIs of the major Internet companies; there are many tutorials for that online, so I will not go into detail here.

Let’s look at the effect first:

Intelligent voice assistant

I implemented each part of this project separately and then connected the parts through a main function.

The workmanship may be rough in places, but the main flow runs without problems.
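As a rough sketch of how such a main function might chain the stages together: the name `run_once`, the idea of passing each stage in as a callable, and the fixed "OK" reply are illustrative assumptions, not the project's actual main function. The stage names mirror the functions built in the sections that follow.

```python
from typing import Callable


def run_once(
    wake: Callable[[], None],
    record: Callable[[], None],
    listen: Callable[[], str],
    control: Callable[[str], bool],
    chat: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one wake -> record -> recognize -> act -> reply cycle."""
    wake()                 # block until the wake word is heard
    record()               # save the user's request to a wav file
    text = listen()        # convert the wav file into text
    if control(text):      # try home control first
        reply = "OK"       # hypothetical confirmation phrase
    else:                  # otherwise fall back to chatting
        reply = str(chat(text))
    speak(reply)           # say the reply out loud
    return reply
```

Passing the stages in as parameters keeps each one replaceable, so a stage can be swapped for a stub while the others are still being built.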

Let’s start by implementing the functions one by one.

1. Learn to sleep and wake up by voice

Here I use pocketsphinx to implement voice wake-up:

1. Environment configuration:

pip install pocketsphinx 
pip install pyaudio     

If installation fails this way, download the corresponding whl file and install it offline.

Go to the website above to find the whl files for pocketsphinx and pyaudio. Be careful to select the ones that match your operating system and Python version. Download them, put them in the project folder, and enter the following in the PyCharm terminal:

pip install <full file name of the package>

This performs a local offline installation.

For example, I am installing pocketsphinx offline for Python 3.7 on Windows. Pay attention to the file path here; otherwise pip reports that the specified file cannot be found.

pip install pocketsphinx-0.1.15-cp37-cp37m-win_amd64.whl
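To choose the right whl file, it helps to know which tags describe your interpreter. The following is a small sketch using only the standard library; the function name `wheel_hints` is made up for illustration, and the `cp` prefix assumes the standard CPython interpreter.

```python
import platform
import struct
import sys


def wheel_hints() -> dict:
    """Collect the values that show up in a wheel file name."""
    return {
        # e.g. "cp37" for CPython 3.7 (assumes CPython)
        "python_tag": f"cp{sys.version_info.major}{sys.version_info.minor}",
        # "Windows", "Linux" or "Darwin"
        "system": platform.system(),
        # 32 or 64: the bitness of the running interpreter
        "bits": struct.calcsize("P") * 8,
    }


if __name__ == "__main__":
    print(wheel_hints())
```

For the file name in the example above, the matching values would be a `python_tag` of `cp37`, a `system` of `Windows`, and 64 bits (`win_amd64`).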

2. How to customize and train your own wake-up words:

Create a file named keyword.txt and enter the wake-up word you want, along with words that sound similar (the more similar-sounding words you add, the higher the sensitivity). For example, the terminal I want to train is named COCO, so the content of my keyword.txt is:


Open the website: Sphinx Knowledge Base Tool VERSION 3


Select keyword.txt, upload it, and download the resulting compressed package. Place it in the project folder and unzip it:

The numbers in the file names are randomly generated by the website, so it is normal for yours to be different.

Test code:

import os
from pocketsphinx import LiveSpeech, get_model_path

def wakeup_co():
    model_path = get_model_path()
    speech = LiveSpeech(
        hmm=os.path.join(model_path, 'en-us'),
        lm=os.path.join('.\\Sphinx_keyword\\keyword_COCO\\', '5995.lm'),
        dic=os.path.join('.\\Sphinx_keyword\\keyword_COCO\\', '5995.dic')
    )
    # Listen continuously and compare each recognized phrase with the wake words
    for phrase in speech:
        #print("phrase:", phrase)
        if str(phrase) in ["GOGO", "COCO", "YOYO",
                           "BOBO", "LOLO", "MOMO",
                           "NONO", "HOHO"]:
            print('I am COCO')
            return  # Wake word heard: stop listening so the caller can continue

Here ‘.\\Sphinx_keyword\\keyword_COCO\\’ is the folder where I store the files, and 5995 is the number in my file names; change both to match your own path and files.
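Since the number in the file names differs for every download, one option is to look the files up instead of hard-coding them. This is a minimal sketch; the helper name `find_model_files` is an assumption for illustration, and it expects the folder to hold exactly one unzipped package.

```python
import glob
import os


def find_model_files(folder: str) -> tuple:
    """Return (lm_path, dic_path) for the model files found in `folder`."""
    lm_files = sorted(glob.glob(os.path.join(folder, "*.lm")))
    dic_files = sorted(glob.glob(os.path.join(folder, "*.dic")))
    if not lm_files or not dic_files:
        raise FileNotFoundError(f"No .lm/.dic pair found in {folder!r}")
    # With one package per folder there is exactly one match of each kind.
    return lm_files[0], dic_files[0]
```

The two returned paths can then be passed as the `lm` and `dic` arguments of `LiveSpeech`.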

Test Results:

Allocating 32 buffers of 2500 samples each

Process ended with exit code 0

If you want to use Chinese words as wake-up words, you need to download the relevant Chinese files:

Download CMU Sphinx from

After downloading, put it in the project folder and unzip it; you will get the cmusphinx-zh-cn-5.2 folder.

Just as with English wake-up words, you need to create a keyword.txt file. I trained a wake word named “Pepe”:

A surname

Open the website: Sphinx Knowledge Base Tool VERSION 3

Upload keyword.txt and you will get a compressed package. Download it, put it in the project path, and unzip it.

Here you need to edit the file with the .dic suffix: after each Chinese word, add its pinyin and tones in the required format, with every field separated by a single space. After modification it looks like this, for example:

Pepe p ei4 p ei3
Nene n ei4 n ei3
Hey hey h ei4 h ei3
kk k ei4 k ei3
got d ei4 d ei3
tete t ei4 t ei3
Leilei l ei4 l ei3
Beibei b ei4 b ei3
Thief z ei4 z ei3
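The line format above (word, then the initial and the final with its tone number for each syllable, all separated by single spaces) can be generated rather than typed by hand. This is only a sketch of that format; the helper name `dic_entry` and its input shape are assumptions made for illustration.

```python
def dic_entry(word: str, syllables: list) -> str:
    """Build one .dic line from a word and its syllables.

    `syllables` is a list of (initial, final_with_tone) pairs,
    e.g. [("p", "ei4"), ("p", "ei3")].
    """
    phones = []
    for initial, final in syllables:
        if not final[-1].isdigit():
            raise ValueError(f"missing tone number in {final!r}")
        phones.extend([initial, final])
    # Every field is separated by exactly one space.
    return " ".join([word] + phones)


print(dic_entry("Pepe", [("p", "ei4"), ("p", "ei3")]))
# prints: Pepe p ei4 p ei3
```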

Test code:

import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = '.\\Sphinx_keyword\\cmusphinx-zh-cn-5.2\\'

speech = LiveSpeech(
    hmm=os.path.join(model_path, 'zh_cn.cd_cont_5000'),
    lm=os.path.join('.\\Sphinx_keyword\\keyword_PeiPei\\', '0738.lm'),
    dic=os.path.join('.\\Sphinx_keyword\\keyword_PeiPei\\', '0738.dic')
)
# Print every recognized phrase and react when it is one of the wake words
for phrase in speech:
    print("phrase:", phrase)
    if str(phrase) in ["Beibei", "Pepe", "Leilei",
                       "Nene", "Hey", "Te Te",
                       "Dede", "thief", "keke",]:
        print("I am Pepe")

Here model_path needs to point to the unzipped Chinese folder cmusphinx-zh-cn-5.2, which contains the zh_cn.cd_cont_5000 folder.

‘.\\Sphinx_keyword\\keyword_PeiPei\\’ and 0738 need to be changed to the path and file name of the files you downloaded and unzipped.

Test Results:

Allocating 32 buffers of 2500 samples each
phrase: Indicates that the phrase may be
[('<s>', 0, 4359242, 4359325), ('<sil>', -1331, 4359326, 4359479), ('<sil>', -1331, 4359480, 4359801), ('<sil>', -1331, 4359802, 4359940), (' Ki ', 0, 4359941, 4360080)]
I'm Pepe
phrase: gotde
[(' <s>', 0, 8516377, 8516528), ('</s><sil> <s>', -5375, 8516529, 8516683), ('</s><sil> <s>', -2035, 8516684, 8516764), ('Dede', 0, 8516765, 8516968), (' '</s> , 0, 8516969, 8516980)]
I'm Pepe
phrase: hehe
[(' <s>', 0, 10674834, 10675304), ('Hey hey', -3628, 10675305, 10675382), ('</s> ', 0, 10675383, 10675385)]
I'm Pepe

Use either Chinese or English wake-up words; you only need one. The English version is more sensitive and the Chinese version may respond more slowly, so I recommend training English wake-up words, which are fast and sensitive.

2. Learn to listen

Here I start from a pre-trained model of speechbrain, which is based on PyTorch, and train it to obtain a Chinese speech recognition system.

It converts Chinese speech into text output.

1. Basic environment configuration. If anything turns out to be missing later, just install it with pip.

pip install speechbrain
pip install SoundFile
pip install sox
pip install SpeechRecognition

2. Receive and save voice as wav file

This is implemented with the speech_recognition package.

It listens to the microphone and, once there is no more voice input, automatically stops and saves the recording.

Test code:

import speech_recognition as sr  # SpeechRecognition module; microphone access needs pyaudio

from myself_word_to_voice import speakout  # My own text-to-speech module

def rec(rate=16000):  # Pick up audio data from the system microphone at a 16000 Hz sampling rate
    r = sr.Recognizer()
    with sr.Microphone(sample_rate=rate) as source:
        sayword = 'coco is listening'
        print(sayword)  # Printed as a prompt that you can start speaking
        audio = r.listen(source)  # Returns once the speaker falls silent

    with open("recording.wav", "wb") as f:  # Save the collected audio data in wav format to recording.wav in the current directory
        f.write(audio.get_wav_data())
        print('I have received what you said')
    return 1


Test results: open the project folder and double-click recording.wav; you can hear what you said.
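Besides listening to the file, its properties can be checked in code, which is a quick way to confirm that the 16000 Hz sampling rate really ended up in the recording. This sketch uses only the standard library `wave` module; the helper name `wav_info` is an assumption for illustration.

```python
import wave


def wav_info(path: str) -> dict:
    """Read basic properties of a PCM wav file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / float(rate),  # duration of the recording
        }
```

Calling `wav_info("recording.wav")` after `rec()` returns the channel count, sampling rate, and duration of the saved file.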

3. Read the wav file of Chinese speech and convert it into text output

Here are the official website and GitHub addresses, so you can try it yourself:

SpeechBrain: A PyTorch Speech Toolkit

GitHub – speechbrain/speechbrain: A PyTorch-based Speech Toolkit

Here I choose Mandarin. The website also has tutorials, so after downloading the pre-trained model you can try training and using it yourself.

Here I directly provide the source code and model of my implementation, which can be used as soon as the environment is configured.

SpeechBrain (Chinese speech recognition).zip-deep learning document resources-download

Test Results:

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
If you have milk, there will be coverage, including everything, and you can adjust mileage and voice to achieve better results.

Process ended with exit code 0
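The recognition code itself is part of the download above and is not reproduced in this article. As a rough idea of what a `listen()` function of this kind might look like, here is a sketch built on SpeechBrain's pre-trained interface. It is an assumption, not the code from the download: the model name `speechbrain/asr-transformer-aishell`, the `savedir` path, and the optional `model` parameter are all illustrative choices, and in newer SpeechBrain releases the class may need to be imported from `speechbrain.inference` instead.

```python
def load_asr_model(savedir: str = "pretrained_models/asr-transformer-aishell"):
    """Load a pre-trained Mandarin model (fetched on first use, then cached)."""
    # Imported lazily so the rest of the program can start without SpeechBrain.
    from speechbrain.pretrained import EncoderDecoderASR
    return EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-transformer-aishell",  # assumed model choice
        savedir=savedir,                               # assumed cache folder
    )


def listen(wav_path: str = "recording.wav", model=None) -> str:
    """Transcribe a wav file into text."""
    if model is None:
        model = load_asr_model()
    return str(model.transcribe_file(wav_path)).strip()
```

Loading the model once and passing it in through `model` avoids reloading it on every call, which matters when `listen()` runs inside a loop.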

3. Learn to chat

Here I use chatterbot and third-party semantic libraries to build a highly customized chatbot dialogue system.

1. Environment configuration:

pip install chatterbot
pip install chatterbot_corpus

Possible error:

OSError: [E053] Could not read config.cfg from C:\Users\pc\AppData\Local\Programs\Python\Python38\Lib\site-packages\en_core_web_md\en_core_web_md-2.2.5\config.cfg.

Solution: uninstall spacy and install version 2.2.2 instead:

pip uninstall spacy
pip install spacy==2.2.2

2. Try to train the official Chinese data set and use

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer
import logging

"""
This is an example showing how to train a chat bot using the
ChatterBot Corpus of conversation dialog.
"""

# Enable info level logging
# logging.basicConfig(level=logging.INFO)
chatbot = ChatBot('Example Bot')

# Start by training our bot with the ChatterBot corpus data
trainer = ChatterBotCorpusTrainer(chatbot)

def train():
    # Train on the official Chinese corpus from chatterbot_corpus
    trainer.train('chatterbot.corpus.chinese')

def chat(word = ''):
    word = chatbot.get_response(word)
    return word

def test1():
    # Console loop: type a sentence and print the bot's reply
    while 1:
        print(chat(input('Me: ')))


If no errors are reported, you can continue to the next step to train your own data set to achieve a high degree of customization.

3. I provide a data set:

corpus.txt is used to train your own chatbot-deep learning document resources-download

It probably looks like this:

After downloading, create a folder named corpus in your project folder and put the downloaded corpus.txt into it.

For training, code:

from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
from chatterbot.trainers import ChatterBotCorpusTrainer

# Build ChatBot and specify Adapter
my_bot = ChatBot(
    'coco',
    logic_adapters=[
        {
            'import_path': 'chatterbot.logic.BestMatch',
            'threshold': 0.65, #If the confidence is lower than this, the default answer is returned
            'default_response': "coco didn't understand"
        }
    ]
)

def train_myword():
    file = open("./corpus/corpus.txt", 'r', encoding='utf-8')
    corpus = []
    print('Start loading corpus!')
    #Import corpus
    while 1:
        line = file.readline()
        if not line:
            break  # End of file
        if line == '===\n':
            continue  # Skip the separator between conversations
        temp = line.strip('\n')
        # print(temp)
        corpus.append(temp)
    file.close()
    print('Corpus loading completed')

    #my_bot = ChatBot("coco")
    trainer = ListTrainer(my_bot)
    print('Start training!')
    trainer.train(corpus[:10000])  # Train only on the first 10,000 entries
    print('Training completed!')

def chat1():
    # Console loop for chatting with the trained bot
    while True:
        print(chat_my(input('Me: ')))

def chat_my(word = ''):
    word = my_bot.get_response(word)
    return word

def test1():
    train_myword()
    chat1()


Training completed:

Start loading corpus!
Corpus loading completed
Start training!
List Trainer: [####################] 100%
Training completed!

Here I train only the first 10,000 conversations in the corpus. I recommend not training on too many: with an oversized corpus, replies are slow even after training finishes, and the bot may even fail to reply at all, which badly hurts the user experience.
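If you want the limit to count whole conversations rather than individual lines, the corpus can first be grouped at its `===` separator lines. This is a minimal sketch of that idea; the function name `split_conversations` and the `limit` parameter are assumptions for illustration, and the only thing taken from the corpus format is the `===` separator.

```python
def split_conversations(lines, separator="===", limit=None):
    """Group corpus lines into conversations delimited by `separator` lines."""
    conversations, current = [], []
    for raw in lines:
        line = raw.strip()
        if line == separator:
            if current:                  # close the conversation in progress
                conversations.append(current)
                current = []
        elif line:                       # ignore blank lines
            current.append(line)
    if current:                          # the last conversation has no trailing separator
        conversations.append(current)
    return conversations if limit is None else conversations[:limit]
```

Each returned list could then be handed to `ListTrainer.train()` on its own, so that the last sentence of one dialogue is not learned as the prompt for the first sentence of the next.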

4. Perform mathematical operations and time queries:

# -*- coding: utf-8 -*-
from chatterbot import ChatBot

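bot = ChatBot(
    "Math & Time Bot",
    logic_adapters=[
        "chatterbot.logic.MathematicalEvaluation",  # Answers arithmetic questions
        "chatterbot.logic.TimeLogicAdapter"  # Answers questions about the current time
    ]
)
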
def chot_math_time(text=''):
    response = bot.get_response(text)
    return response

print(chot_math_time('what is 1 + 1'))
print(chot_math_time('What time is it now'))


1 + 1 = 2
The current time is 05:32 PM

Process ended with exit code 0

Since this module only supports English, when the input is Chinese speech we need to extract the numbers from the recognized text. The code is as follows:

def Split_num_letters(astr):
    nums = []
    astr = astr + 'none'  # Sentinel so that a number at the very end is still closed off
    num1 = ''
    for i in range(len(astr)-1):
        if astr[i].isdigit() and not astr[i+1].isdigit():
            nums.append(num1)  # A run of digits just ended: store it
            num1 = ''
        elif not astr[i].isdigit() and astr[i+1].isdigit():
            num1 = num1 + astr[i+1]  # A run of digits starts at i+1
        elif astr[i].isdigit() and astr[i+1].isdigit():
            num1 = num1 + astr[i+1]  # Continue the current run
    if astr[0].isdigit():
        nums[0] = astr[0] + nums[0]  # The loop never adds astr[0], so prepend it
    return nums

print(Split_num_letters('Do you know what 120 times 20 equals'))
print(Split_num_letters('What is 120 times 20?'))

Result: this function can be combined with the other functions to carry out simple arithmetic on recognized speech.

['120', '20']
['120', '20']

Process ended with exit code 0
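One way to combine the extracted numbers with the math module is to rebuild an English question from them. The following is only a sketch under stated assumptions: the `OPERATORS` table and the function name `build_math_query` are hypothetical, the keywords are English stand-ins (for Chinese recognition results the keys would be the corresponding Chinese words), and it uses `re.findall` instead of the function above to stay self-contained.

```python
import re

# Hypothetical keyword-to-operator table; extend it for your own phrasing.
OPERATORS = {"plus": "+", "minus": "-", "times": "*", "divided by": "/"}


def build_math_query(sentence: str):
    """Turn a sentence into 'what is A op B', or None if it is not arithmetic."""
    nums = re.findall(r"\d+", sentence)      # all runs of digits, in order
    if len(nums) < 2:
        return None
    for word, symbol in OPERATORS.items():
        if word in sentence:
            return f"what is {nums[0]} {symbol} {nums[1]}"
    return None


print(build_math_query("Do you know what 120 times 20 equals"))
# prints: what is 120 * 20
```

The resulting string has the same shape as the `'what is 1 + 1'` question used earlier, so it can be passed on to `chot_math_time()`.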

4. Learn to control smart home appliances

Basic principle: load the control code onto the Arduino, wire up the circuit, and have Python send a specific signal to the Arduino under certain conditions. When the Arduino receives the specified signal, it performs the corresponding action.

1. Environment configuration:

pip install pyserial

2. Control the Arduino from Python

You can refer to my other article here:

python and arduino communication (windows and linux)_Leonard2021’s blog-blog_Raspberry Pi and arduino communication

This realizes the interaction between Python and Arduino. An Arduino can control many electrical devices, such as lights, servos, and fans. By rotating a servo, it can open and close doors or flip the switches of various large appliances.

The many sensor modules available for Arduino can also provide the intelligent voice system with data such as air humidity and temperature, so that it can control the relevant appliances better and achieve integrated smart home control.

There is a lot of room for imagination and development. Here I only use voice to switch the Arduino's built-in LED on and off; other devices can be controlled on the same principle.

a. Python code:

import serial # Import serial communication library
import time

def try2():
    ser = serial.Serial("COM3", 9600, timeout=1)
    c = ''
    while 1:

        wakeup_co() #Voice wake up
        rec() #Save speech to a wav file
        listenword = listen() #Convert the speech in the wav file into Chinese text
        #These three are all given above. You need to put them in your own modules and import them.

        if 'light' in listenword and 'on' in listenword:
            c = '1'
        elif 'light' in listenword and 'off' in listenword:
            c = '0'
        if (c == '0'):
            ser.write('0'.encode()) #Tell the Arduino to turn the LED off
        if (c == '1'):
            ser.write('1'.encode()) #Tell the Arduino to turn the LED on


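b. Arduino code:

void setup(){
  Serial.begin(9600);//Same baud rate as on the Python side
  pinMode(13,OUTPUT);//Set port 13 as the output port
}
char var;
void loop(){
  if(Serial.available() > 0){
    var = Serial.read();//Read the byte sent from Python
    if(var == '0'){ digitalWrite(13,LOW); }//Turn the LED off
    if(var == '1'){ digitalWrite(13,HIGH); }//Turn the LED on
  }
}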
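The keyword matching on the Python side can also be isolated into a small function that is testable without any hardware attached. This is a sketch, assuming the same `'1'` for on and `'0'` for off convention as the code above; the name `command_for` is made up for illustration.

```python
from typing import Optional


def command_for(text: str) -> Optional[bytes]:
    """Map recognized text to the one-byte command the Arduino expects."""
    if "light" not in text:
        return None          # not a light command at all
    if "off" in text:
        return b"0"          # LED off
    if "on" in text:
        return b"1"          # LED on
    return None              # mentions the light but gives no action
```

In the main loop this would be used as `cmd = command_for(listenword)`, followed by `ser.write(cmd)` only when `cmd` is not `None`, so an unrecognized sentence sends nothing.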

5. Learn to speak

Here I use pyttsx3 to implement text-to-speech, allowing the smart terminal to learn to “speak”.

1. Environment configuration:

pip install pyttsx3

2. Code implementation

import pyttsx3

def speakout(workText):
    # Initialize the speech engine
    engine = pyttsx3.init()
    # Set speech rate
    rate = engine.getProperty('rate')
    engine.setProperty('rate', rate - 50)
    # Output speech
    engine.say(workText)  # Queue the text for synthesis
    engine.runAndWait()  # Block until the queued speech has been spoken

Test result: I heard a slightly stiff female voice say “Hello”. You can also adjust the parameters here to make the voice sound more pleasant.
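One of the parameters that can be adjusted is the voice itself. pyttsx3 exposes the installed voices through `engine.getProperty('voices')`, and each voice has an `id` and a `name`. The sketch below selects a voice by a keyword in its name; the helper names `pick_voice` and `speak_with_voice` are assumptions for illustration, and which voices exist depends entirely on your operating system.

```python
def pick_voice(voices, keyword: str):
    """Return the id of the first voice whose name contains `keyword`, else None."""
    for voice in voices:
        if keyword.lower() in (voice.name or "").lower():
            return voice.id
    return None


def speak_with_voice(text: str, keyword: str) -> None:
    """Speak `text`, preferring a voice whose name matches `keyword`."""
    import pyttsx3  # imported lazily; requires `pip install pyttsx3`
    engine = pyttsx3.init()
    voice_id = pick_voice(engine.getProperty("voices"), keyword)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)  # otherwise keep the default voice
    engine.say(text)
    engine.runAndWait()
```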

Here is my complete source code. If you need it, you can download it and use it once you understand it:

A friendly reminder: connect the Arduino board and configure the port before running, otherwise an error will be reported. You can also comment out the code for communication between Arduino and Python and try the other functions first.

Homemade smart voice learning documentation resources-download

If this article is helpful to you, please like, bookmark, and comment!
