Siri can be used to query and control accessories, and to activate scenes. Minimal information about the configuration of the home is provided anonymously to Siri, limited to the names of rooms, accessories, and scenes that are necessary for command recognition. Audio sent to Siri may denote specific accessories or commands, but such Siri data isn’t associated with other Apple features such as HomeKit.
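As a rough illustration of the data minimization described above, the sketch below projects a home configuration down to just the names needed for command recognition. The dictionary layout, the field names (`home_id`, `serial`, and so on), and the function `recognition_names` are hypothetical and are not taken from any Apple data format.

```python
def recognition_names(home):
    """Keep only room, accessory, and scene names; drop every other field."""
    return {
        kind: sorted({item["name"] for item in home.get(kind, [])})
        for kind in ("rooms", "accessories", "scenes")
    }


# Hypothetical home configuration, used only to exercise the sketch.
home = {
    "home_id": "example-id",
    "rooms": [{"name": "Kitchen", "id": 1}],
    "accessories": [{"name": "Lamp", "id": 7, "serial": "X"}],
    "scenes": [{"name": "Good Night", "id": 3}],
}

names = recognition_names(home)
```

In this sketch, identifiers and serial numbers never appear in the result, which mirrors the idea that only names are shared.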
Using the Home app, users can enable Siri and other HomePod features, such as timers, alarms, intercom, and doorbell, on Siri-enabled accessories. When these features are enabled, the accessory coordinates with a paired HomePod on the local network that hosts these Apple features. Audio is exchanged between the devices over encrypted channels using both HomeKit and AirPlay protocols.
When Listen for Hey Siri is turned on, the accessory listens for the “Hey Siri” phrase using a locally running trigger-phrase detection engine. If this engine detects the phrase, it sends the audio frames directly to a paired HomePod using HomeKit. The HomePod does a second check on the audio and may cancel the audio session if the audio doesn’t appear to contain the trigger phrase.
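The two-stage check described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `AudioSession` class, the scoring callbacks, and both thresholds are hypothetical, and the real detection engines are not described in this text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Scorer = Callable[[List[bytes]], float]  # hypothetical detector: frames -> confidence


@dataclass
class AudioSession:
    frames: List[bytes] = field(default_factory=list)
    cancelled: bool = False


def accessory_stage(frames: List[bytes], local_score: Scorer,
                    threshold: float = 0.5) -> Optional[AudioSession]:
    """First pass on the accessory: open a session only if the local detector fires."""
    if local_score(frames) < threshold:
        return None  # no session is opened in this sketch
    return AudioSession(frames=list(frames))


def homepod_stage(session: AudioSession, second_score: Scorer,
                  threshold: float = 0.8) -> AudioSession:
    """Second pass on the HomePod: cancel the session if the phrase is not confirmed."""
    if second_score(session.frames) < threshold:
        session.cancelled = True
    return session
```

The second threshold is stricter than the first only as a design choice in this sketch, so that the HomePod stage can reject detections the accessory stage accepted.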
When Touch for Siri is turned on, the user can press a dedicated button on the accessory to start a conversation with Siri. The audio frames are sent directly to the paired HomePod.
After a successful invocation of Siri is detected, the HomePod sends the audio to Siri servers and fulfills the user’s intent using the same security, privacy, and encryption safeguards that the HomePod applies to user invocations made to the HomePod itself. If Siri has an audio reply, then Siri’s response is sent over an AirPlay audio channel to the accessory. Some Siri requests require additional information from the user (for example, asking if the user wants to hear more options). In that case, the accessory receives an indication that the user should be prompted, and the additional audio is streamed to the HomePod.
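One way to picture the request, reply, and follow-up loop above is as a small state machine on the accessory side. This is a sketch under stated assumptions: the state names, event names, and transition table are hypothetical and are not part of any HomeKit or AirPlay API.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    STREAMING = auto()       # accessory streams the user's audio to the HomePod
    PLAYING_REPLY = auto()   # Siri's audio reply plays on the accessory


class Event(Enum):
    INVOKED = auto()         # invocation detected (trigger phrase or button)
    REPLY = auto()           # audio reply arrives over the AirPlay audio channel
    PROMPT_USER = auto()     # indication that the user should be prompted
    FINISHED = auto()        # request fulfilled, or session cancelled


# Hypothetical transition table; the names are illustrative only.
TRANSITIONS = {
    (State.IDLE, Event.INVOKED): State.STREAMING,
    (State.STREAMING, Event.REPLY): State.PLAYING_REPLY,
    (State.STREAMING, Event.FINISHED): State.IDLE,
    (State.PLAYING_REPLY, Event.PROMPT_USER): State.STREAMING,
    (State.PLAYING_REPLY, Event.FINISHED): State.IDLE,
}


def step(state, event):
    """Return the next state, rejecting transitions the table does not list."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event.name} in state {state.name}") from None


def run(events, state=State.IDLE):
    """Fold a sequence of events into a final state."""
    for event in events:
        state = step(state, event)
    return state
```

A follow-up question corresponds to the `PROMPT_USER` transition, which returns the accessory to streaming additional audio to the HomePod.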
The accessory is required to have a visual indicator to signal to a user when it’s actively listening (for example, an LED indicator). The accessory has no knowledge of the intent of the Siri request, except for access to the audio streams, and no user data is stored on the accessory.