Lesson Modules
Teaching Tips:
Engage: Start with a real-world question: “Can Alexa see you?” Let students discuss. Then show NAO’s camera feed to visualize its “eyes.”
Demonstration: Run a quick scripted demo (if possible) where NAO identifies an object and speaks. Example: “What do you see?” → “I see a book.” Explain this is what they’ll build.
Connection: Review earlier lessons — vision in Lesson 4, speech in Lesson 6. Emphasize integration: we’re combining both today.
Setup Tip: Ensure NAO’s cameras and microphones are functional. Have a few objects ready that NAO can recognize from its database or Deep NAO model.
Virtual assistants like Siri and Alexa can listen and respond to your voice, but imagine if they could also see what’s around them! That’s what we’ll do today with NAO. We’ll give our robot the ability to use both its camera and microphones to understand the world.
NAO’s eyes are cameras that capture what’s in front of it, and its ears are microphones that pick up sounds. Today, we’ll connect those two senses so that when you ask, “What do you see?”, NAO will look through its camera and tell you what object it sees!
Watch a quick demo from your teacher (or video example) of NAO acting as a “mini AI assistant.” Think about how the robot might be combining vision and hearing to make that happen.
Teaching Tips:
Key Concepts: Reinforce how input (voice or image) leads to output (speech). Draw a simple data flow: Microphone → Speech Recognition → Camera → Object Recognition → Text-to-Speech.
Class Discussion: Ask students why a robot might need both vision and speech. Guide toward real-world examples (e.g., a helper robot in a store, self-driving car listening and seeing).
Vocabulary Focus: Use flashcards or a quick matching activity for: Vision, Speech Recognition, Object Detection, AI Assistant.
NAO can recognize both sounds and images. To understand how that works, let’s review:
- Speech Recognition: NAO listens for specific words or phrases that we predefine. It can’t understand everything we say, but if we program a phrase like “what do you see,” it can react when it hears that exact phrase.
- Vision Recognition: NAO’s camera captures an image and compares it to stored examples or a trained model to recognize what’s there — for example, a book or cup.
- Integration: We combine both processes: when NAO hears the trigger phrase, it activates its camera, checks what it sees, and speaks a response like “I see a book.”
This combination of hearing and seeing is the basis of modern AI systems. Robots use multiple types of data to make sense of their environment — just like humans!
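The hear → see → speak cycle described above can be sketched in plain Python, with no robot needed. The names here (assistant_step, recognize_object) are illustrative stand-ins, not part of the NAOqi API, and the "camera image" is just a dictionary standing in for a real frame:

```python
VOCABULARY = ["what do you see"]  # trigger phrases we predefine

def recognize_object(image):
    # Stand-in for real vision recognition; a trained model or
    # NAO's object database would do this step on the robot
    return image.get("label")

def assistant_step(heard_phrase, camera_image):
    """One hear -> see -> speak cycle of our mini AI assistant."""
    if heard_phrase.strip().lower() not in VOCABULARY:
        return None  # not a trigger phrase, so NAO stays quiet
    label = recognize_object(camera_image)
    if label:
        return "I see a " + label + "!"
    return "I'm not sure what that is."

print(assistant_step("What do you see", {"label": "book"}))  # → I see a book!
```

Notice that the camera is only "activated" (the image is only examined) after the trigger phrase is matched — the same order of events the real project will follow.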
Teaching Tips:
Guided Build: Walk students through connecting boxes and understanding the flow: Speech → Vision → Speak.
Simplify for Beginners: Offer a prebuilt project with the Speech box connected and let them focus on editing the Python logic.
Troubleshoot Common Issues:
- NAO not responding: verify vocabulary matches exactly.
- NAO says “I’m not sure”: ensure the object is visible and lighting is good.
- NAO lagging: lower camera resolution (use QVGA).
Safety: Keep NAO stationary and clear of objects during testing.
Now it’s time to build your own AI assistant! Follow these steps carefully.
Step 1 – Set Up the Project
- Open Choregraphe and connect to your NAO robot using its IP address.
- Check that NAO’s language is set to English.
- Open the Video Monitor to confirm the camera feed is working.
- Choose one classroom object for NAO to identify (e.g., book, pen, bottle).
Step 2 – Add Speech Recognition
- Drag the Speech Recognition box into your workspace.
- Double-click and set the vocabulary to: ["what do you see"].
- Link its output to a Python Script box that will handle the vision part.
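The Speech Recognition box only fires when the heard phrase matches an entry in its vocabulary list. A rough desk-check of that idea (illustrative only — the real box does this matching for you on the robot):

```python
vocabulary = ["what do you see"]

def triggers(heard):
    # The box matches whole phrases from its vocabulary list,
    # so a slightly different wording will not fire the output
    return heard.lower().strip("?!. ") in vocabulary

print(triggers("What do you see?"))   # the programmed phrase → True
print(triggers("What can you see?"))  # a different phrase → False
```

This is why the troubleshooting tip says to verify the vocabulary matches exactly: "What can you see?" is a different phrase from "What do you see?" as far as the recognizer is concerned.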
Step 3 – Program NAO’s Vision
In your Python box, paste this code (your teacher will help explain each part):
def onInput_onStart(self):
    from naoqi import ALProxy
    import vision_definitions

    # Proxies to NAO's speech and camera modules (running on the robot itself)
    tts = ALProxy("ALTextToSpeech", "127.0.0.1", 9559)
    cam = ALProxy("ALVideoDevice", "127.0.0.1", 9559)

    # Subscribe to the top camera (index 0) at QVGA resolution, BGR color, 10 fps
    videoClient = cam.subscribeCamera("ai_assistant", 0, vision_definitions.kQVGA, vision_definitions.kBGRColorSpace, 10)

    # Grab one frame, then release the camera
    frame = cam.getImageRemote(videoClient)
    cam.unsubscribe(videoClient)

    # Replace this with your recognition function or module
    object_label = recognize_object_from_image(frame)

    if object_label:
        tts.say("I see a " + object_label + "!")
    else:
        tts.say("I'm not sure what that is.")
Save and run your project. Then say clearly: “What do you see?”
Teaching Tips:
Differentiation: Let advanced groups try multiple objects or add LED reactions (ears glow when listening).
Encourage Creativity: Students can make NAO tell jokes, dance, or play sounds based on detected objects.
Assessment Tip: Have each group demo their assistant to the class and explain how they linked vision and speech.
Let’s make NAO even smarter! Add a new phrase and response to your project.
- Choose a second phrase, like “Is that a book?”
- In your code, compare what NAO hears with what it sees.
- If they match, make NAO say “Yes, it is.” If not, have it say “No, that’s a [object].”
- Test your new assistant by holding different objects and asking it questions!
Example code addition:
if heard_word == "book" and object_label == "book":
    tts.say("Yes, I see a book.")
else:
    tts.say("No, that's a " + object_label)
Teaching Tips:
Debrief: Discuss real-world parallels — how assistants like Alexa or Pepper use similar technologies.
Data Literacy: Talk about why NAO might need training data or why recognition errors happen.
Ethics Check: Ask: “What if a robot like NAO was always watching and listening — is that okay?” Encourage responsible AI thinking.
Wrap-Up Message: “You’ve just built an AI system that listens and sees — the foundation of many real robots today!”
Let’s test and think about what you built today.
- What did NAO do when you said “What do you see?”
- How did NAO know which object it was looking at?
- What could make NAO get confused or say the wrong thing?
- How could we improve its accuracy next time?
Mini Quiz
- 1. Which NAO module lets it recognize words? (ALSpeechRecognition)
- 2. Which module helps NAO recognize objects? (ALVisionRecognition)
- 3. True or False: NAO can recognize any word you say. (False)
- 4. What is the benefit of combining vision and speech? (More interactive and intelligent behavior)