e-NT – Sticking GPT-3 into an ENT

The General Gist

So this is a project for Dublin Maker 2023. The idea that a group of us have come up with is basically this (so far):

  • Make an animatronic ENT face (yes, the giant tree people from The Lord of the Rings)
  • In the eye of this ENT place a camera that can recognise people and look at them
  • The face will also have a microphone and speaker so people can speak to it and get a response
  • The system answering the questions is ChatGPT
Treebeard from LOTR

The GPT-3 Davinci Language Model API

The bit of the project I’ve been working on for the past few days is the voice recognition and sending the recognised text to a GPT-3 language model. At the time of writing there is no API for ChatGPT, but according to the OpenAI website one is coming soon, so when it comes out we can easily switch over.

Thankfully, using the Davinci language model in Python is extremely easy. All I had to do was make an account on https://openai.com, request an API key, download one of their Python examples and modify it a bit.


import os
import openai

# Written for the pre-1.0 openai package (openai.Completion.create).
# The key is read from an environment variable instead of being hard-coded.
openai.api_key = os.getenv("OPENAI_API_KEY")

question = input("Question: ")

# Wrap the question in a prompt asking for an Ent-style answer
prompt = f"Respond to the question '{question}' in the style of an ent from lord of the rings"

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    temperature=0,
    max_tokens=200,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

# The response can be indexed like a dict, so no JSON round-trip is needed
answer = response["choices"][0]["text"].strip()
print("\nAnswer: " + answer)

Here are some examples:

It’s kind of cool that it even knows what the Dublin Maker Faire is, but the output does not really sound like something an ENT would say. To be fair, an ENT’s answer about an event that only takes place in a technologically advanced civilisation probably won’t sound right anyway.


Asking it for directions sounds a bit more ENT-like:


It’s also pretty good at giving life advice:

Asking it about climate change does not feel very ENT-like, apart from the Middle-earth reference:

The model’s settings can be adjusted, in particular the parameters described in the OpenAI API guide (temperature, top_p, frequency_penalty and presence_penalty).

So I’ll have a play around with those and see if we can get things sounding more ENT-like, but for now at least we have a basic working script that can be adjusted later.
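One way to do that playing around is to build a small grid of settings and run the same question through each of them. This is a minimal sketch, assuming the same pre-1.0 `openai.Completion.create` call as the script above; `make_settings_grid` is a hypothetical helper and the values in it are arbitrary starting points, not tuned settings.

```python
from itertools import product


def make_settings_grid(temperatures=(0.0, 0.5, 0.9), presence_penalties=(0.0, 0.6)):
    """Build keyword-argument dicts for openai.Completion.create (hypothetical helper).

    Every combination of temperature and presence_penalty is returned so the
    same question can be tried under each setting and the replies compared.
    """
    grid = []
    for temperature, presence_penalty in product(temperatures, presence_penalties):
        grid.append(
            {
                "model": "text-davinci-003",
                "temperature": temperature,
                "max_tokens": 200,
                "top_p": 1,  # left at 1 so only temperature controls randomness
                "frequency_penalty": 0,
                "presence_penalty": presence_penalty,
            }
        )
    return grid


# Usage (needs the pre-1.0 openai package and an API key, so it is not run here):
# for settings in make_settings_grid():
#     reply = openai.Completion.create(prompt=prompt, **settings)
#     print(settings["temperature"], settings["presence_penalty"],
#           reply["choices"][0]["text"].strip())
```

The sketch keeps `top_p` fixed and varies `temperature`, since OpenAI's API reference recommends changing one or the other rather than both.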

Python Voice to Text Script

Again, this script was found within a few minutes of googling and modified a bit. The really cool thing is that it uses the Google speech recognition API, and it seems far better than any offline Python voice recognition library I’ve used in the past.


# Python program to translate speech to text and text to speech
import speech_recognition as sr
import pyttsx3

# Initialize the recognizer
r = sr.Recognizer()

# List the available microphones; each name's position in the list is the
# device_index value that sr.Microphone expects
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(index, name)

# Initialize the text-to-speech engine once and reuse it
engine = pyttsx3.init()


# Function to convert text to speech
def SpeakText(command):
    engine.say(command)
    engine.runAndWait()


# Loop forever so the user can keep speaking
while True:
    try:
        # Use the microphone as the source for input
        with sr.Microphone(device_index=1) as source2:
            # Listen for 0.2 s to let the recognizer adjust its energy
            # threshold to the surrounding noise level
            r.adjust_for_ambient_noise(source2, duration=0.2)

            # Listen for the user's input
            audio2 = r.listen(source2)

            # Use Google's speech recognition API to transcribe the audio
            MyText = r.recognize_google(audio2)
            MyText = MyText.lower()

            print("Did you say:", MyText)
            SpeakText(MyText)

    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))

    except sr.UnknownValueError:
        print("Could not understand the audio")

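The script above hard-codes `device_index=1`, which only works while the microphone stays at that position in the device list. A minimal sketch of picking the index by name instead; `find_microphone_index` is a hypothetical helper, and the `"usb"` keyword in the usage comment is just an example of part of a device name.

```python
def find_microphone_index(names, keyword):
    """Return the index of the first microphone whose name contains keyword.

    names is the list returned by sr.Microphone.list_microphone_names();
    positions in that list are the device_index values sr.Microphone accepts.
    Matching is case-insensitive. Returns None if nothing matches.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Skip entries with a missing name instead of crashing on .lower()
        if name and keyword in name.lower():
            return index
    return None


# Usage with speech_recognition (not run here, it needs a real microphone):
# import speech_recognition as sr
# index = find_microphone_index(sr.Microphone.list_microphone_names(), "usb")
# with sr.Microphone(device_index=index) as source2:
#     ...
```

If nothing matches, the helper returns `None`, and `sr.Microphone(device_index=None)` falls back to the system's default input device.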
Here is a video showing a test of this script. I just read out a recipe from a cookbook and moved further and further away from the microphone to see how well the voice recognition worked at a distance. The furthest point I spoke from in the video is 3.5 meters.

So the results here are pretty good. It looks like this Python library will do what we want in letting people ask the ENT face some questions. I also did some tests while playing background noise; it decreased the working range a bit, but the noise rejection is also quite good. In real life, anyone asking a question is probably going to stand in front of the mask anyway.
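If the noise on the day turns out to be worse than in these tests, the speech_recognition `Recognizer` has a few attributes worth experimenting with. This is a minimal sketch: `tune_for_noisy_room` is a hypothetical helper and its default values are untested guesses, while `dynamic_energy_ratio`, `dynamic_energy_threshold` and `pause_threshold` are real `Recognizer` attributes.

```python
def tune_for_noisy_room(recognizer, ratio=2.0, pause=1.0):
    """Adjust a speech_recognition Recognizer for a noisy venue.

    Hypothetical helper; the default values are guesses, not measured settings.
    """
    # adjust_for_ambient_noise() moves energy_threshold towards
    # (ambient level * dynamic_energy_ratio); the library default ratio is 1.5,
    # so a higher value keeps the threshold further above background noise
    recognizer.dynamic_energy_ratio = ratio
    # Keep adapting the threshold while listening (the library default)
    recognizer.dynamic_energy_threshold = True
    # Seconds of silence that end a phrase (library default 0.8); slightly
    # longer lets people pause mid-question without being cut off
    recognizer.pause_threshold = pause
    return recognizer


# Usage (not run here):
# r = tune_for_noisy_room(sr.Recognizer())
# audio2 = r.listen(source2, phrase_time_limit=10)
```

The `phrase_time_limit` argument to `listen()` caps how long a single recording can run, which may help if constant crowd noise keeps the recognizer listening.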

Getting the voice to sound like an ENT will require a bit of fine-tuning. Someone suggested using a British accent and slowing it down, so I can try that and see how it sounds.
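A minimal sketch of that idea with pyttsx3, assuming a British English voice is installed (which voices exist depends on the OS and speech driver). `pick_british_voice` and `configure_ent_voice` are hypothetical helpers, and the 0.7 slowdown factor is a guess; the `"voices"`, `"voice"` and `"rate"` properties are pyttsx3's own engine properties.

```python
def _voice_text(voice):
    """Collect a voice's id, name and languages into one lowercase string."""
    parts = [str(voice.id), str(voice.name)]
    for lang in getattr(voice, "languages", None) or []:
        # Decode in case a driver reports language codes as bytes
        parts.append(lang.decode("utf-8", "ignore") if isinstance(lang, bytes) else str(lang))
    return " ".join(parts).lower().replace("_", "-")


def pick_british_voice(voices):
    """Return the first voice that looks like British English, or None."""
    for voice in voices:
        text = _voice_text(voice)
        if "en-gb" in text or "great britain" in text or "united kingdom" in text:
            return voice
    return None


def configure_ent_voice(engine, slowdown=0.7):
    """Pick a British voice (if one is found) and slow the speaking rate.

    engine is a pyttsx3 engine from pyttsx3.init(); returns the chosen voice.
    """
    voice = pick_british_voice(engine.getProperty("voices"))
    if voice is not None:
        engine.setProperty("voice", voice.id)
    # rate is in words per minute (pyttsx3 defaults to 200)
    engine.setProperty("rate", round(engine.getProperty("rate") * slowdown))
    return voice


# Usage (not run here):
# engine = pyttsx3.init()
# configure_ent_voice(engine)
# engine.say("Hoom, hom, do not be hasty")
# engine.runAndWait()
```

Matching on the id, name and language fields together is a heuristic, since drivers label voices differently (for example `en_GB` in the languages list on some systems, "English (Great Britain)" in the name on others).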

Speaking with GPT-3

Combining the two scripts above lets us have a back and forth with GPT-3 using voice.


import os
import openai
import speech_recognition as sr
import pyttsx3

# Written for the pre-1.0 openai package (openai.Completion.create).
# The key is read from an environment variable instead of being hard-coded.
openai.api_key = os.getenv("OPENAI_API_KEY")


def ask_gpt3(question):
    prompt = f"Respond to the question '{question}' in the style of an ent from lord of the rings"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0,
        max_tokens=200,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    # The response can be indexed like a dict, so no JSON round-trip is needed
    return response["choices"][0]["text"].strip()


# Initialize the recognizer and list the available microphones
r = sr.Recognizer()
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print(index, name)

# Initialize the text-to-speech engine once and reuse it
engine = pyttsx3.init()


# Function to convert text to speech
def SpeakText(command):
    engine.say(command)
    engine.runAndWait()


# Loop forever so the user can keep asking questions
while True:
    try:
        # Use the microphone as the source for input
        with sr.Microphone(device_index=1) as source2:
            # Listen for 0.2 s to adjust the energy threshold
            # to the surrounding noise level
            r.adjust_for_ambient_noise(source2, duration=0.2)

            # Listen for the user's question
            audio2 = r.listen(source2)

            # Use Google's speech recognition API to transcribe it
            MyText = r.recognize_google(audio2).lower()

            # Send the question to GPT-3, then print and speak the reply
            GPT_reply = ask_gpt3(MyText)
            print("GPT-3 says:", GPT_reply)
            SpeakText(GPT_reply)

    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))

    except sr.UnknownValueError:
        print("Could not understand the audio")

    except Exception as e:
        # Catch anything else (e.g. OpenAI API errors) so the loop keeps running
        print(e)

The shorter and simpler the question, the faster it responds. For most questions of normal complexity it takes a few seconds to reply.
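To put numbers on those response times, the GPT-3 call can be wrapped in a timer. A minimal sketch using only the standard library; `timed_call` is a hypothetical helper, and the usage comment assumes the `ask_gpt3` function from the combined script above.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock suited to measuring intervals
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage inside the voice loop (not run here, it needs an API key):
# GPT_reply, seconds = timed_call(ask_gpt3, MyText)
# print(f"GPT-3 took {seconds:.1f} s")
```

Logging these times for a handful of questions would show whether a lower `max_tokens` (which caps the length of the reply) noticeably speeds things up.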

