This is accomplished through what is known as WiFi sensing, or the use of WiFi signals to infer information about a physical environment. When radio signals like WiFi travel through a space, they interact with the objects and people around them. Those signals can be reflected, scattered, or absorbed. By analyzing how the signal is expected to behave compared with how it is actually received, researchers can infer details about the surrounding environment.
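To make the "expected versus received" comparison concrete, here's a toy sketch. It is not from the paper or any real toolkit. It assumes per-subcarrier CSI amplitudes are already available as plain lists, compares each frame against a hypothetical empty-room baseline, and flags motion when that deviation fluctuates across a short window. The window size and threshold are made-up parameters.

```python
import statistics


def amplitude_deviation(baseline, frame):
    """Mean relative deviation of one CSI amplitude frame from the baseline."""
    return statistics.fmean(abs(a / b - 1.0) for a, b in zip(frame, baseline))


def detect_motion(baseline, frames, window=4, threshold=0.05):
    """Return one flag per sliding window: True if the deviation fluctuates.

    A static scene (empty, or someone standing still) gives a roughly
    constant deviation; movement makes it change from frame to frame.
    `window` and `threshold` are hypothetical tuning parameters.
    """
    devs = [amplitude_deviation(baseline, f) for f in frames]
    return [
        statistics.pstdev(devs[i:i + window]) > threshold
        for i in range(len(devs) - window + 1)
    ]


if __name__ == "__main__":
    baseline = [1.0, 2.0, 1.0, 2.0]              # "expected" amplitudes, empty room
    still = [[1.0, 2.0, 1.0, 2.0]] * 4           # nothing changes
    moving = [[1.0, 2.0, 1.0, 2.0], [1.5, 3.0, 1.5, 3.0]] * 2  # amplitude swings
    print(detect_motion(baseline, still))   # [False]
    print(detect_motion(baseline, moving))  # [True]
```

Real systems work with phase as well as amplitude and learn far richer features than one variance, but the core move is the same: model what the channel should look like, then read meaning into how it differs.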
I kind of get this idea. It's just sensing the strength of radio waves with some kind of antenna array. Like pixels on an optical sensor. It's not really a new idea, I sort of recall seeing video of this kind of thing from like 10 years ago.
But the paper suggests this relies on higher-level protocol information.
Beamforming, as introduced in WiFi 5, requires clients to broadcast observations of their channel characteristics. This introduces a new information source for WiFi sensing with privacy threats that have not been explored so far. With WiFi networks being ubiquitous in our everyday lives, the impact of unknown privacy threats is likely severe. To investigate this concern, we introduce BFId, the first identity inference attack using BFI-based sensing and evaluate its efficacy on a novel dataset containing WiFi recordings of 197 individuals. We show that we can infer the identity of individuals with very high accuracy, across different walking styles and perspectives, even with large sample sizes.
[...]
Identity inference based on WiFi can be done by analyzing different sources, but most prominent in recent years has been the analysis of Channel State Information (CSI), a built-in part of the physical layer of WiFi. [...] To enable higher bandwidths, WiFi 5 (802.11ac) introduced beamforming. Beamforming uses the same kind of information about the physical environment as CSI, but on the sender instead of the receiver side. In a typical WiFi scenario, clients send Beamforming Feedback Information (BFI), a compressed representation of the current signal characteristics, back to the access point.
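The "compressed" part happens in two stages: the feedback matrix V is turned into angles (phi and psi), and each angle is then quantized to a few bits. Here's a rough sketch of the second stage, assuming the uniform mid-bin quantizer I understand 802.11ac to use (phi over [0, 2π) with b_phi bits, psi over [0, π/2] with b_psi bits) and a hypothetical codebook of b_phi = 6, b_psi = 4. The helper names are made up, and this is not the paper's code.

```python
import math


def angle_count(nt, nss):
    """Number of phi angles (and, equally, psi angles) for an nt x nss V matrix."""
    return sum(nt - 1 - i for i in range(min(nss, nt - 1)))


def quantize_phi(phi, b_phi):
    """Quantize phi in [0, 2*pi) to b_phi bits; return (index, reconstructed angle)."""
    step = math.pi / 2 ** (b_phi - 1)          # 2*pi / 2**b_phi
    k = min(int((phi % (2 * math.pi)) / step), 2 ** b_phi - 1)
    return k, k * step + math.pi / 2 ** b_phi  # centre of the k-th bin


def quantize_psi(psi, b_psi):
    """Quantize psi in [0, pi/2] to b_psi bits; return (index, reconstructed angle)."""
    step = math.pi / 2 ** (b_psi + 1)          # (pi/2) / 2**b_psi
    k = min(int(max(psi, 0.0) / step), 2 ** b_psi - 1)
    return k, k * step + math.pi / 2 ** (b_psi + 2)


def feedback_bits(nt, nss, b_phi, b_psi):
    """Angle bits per subcarrier (ignores SNR fields and other report overhead)."""
    return angle_count(nt, nss) * (b_phi + b_psi)


if __name__ == "__main__":
    # 4 AP antennas, 2 spatial streams, assumed codebook b_phi=6, b_psi=4
    print(angle_count(4, 2))            # 5
    print(feedback_bits(4, 2, 6, 4))    # 50
    print(quantize_psi(math.pi / 2, 2)) # index 3, angle 7*pi/16
```

So even after compression, a client is broadcasting tens of bits of channel description per subcarrier, and it does so in the clear as part of normal operation.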
This makes it sound kind of like a way to fingerprint signals, but they also mention:
We show that individuals can be recognized with very high accuracy (99.5% ± 0.38) with our BFI-based attack. Furthermore, BFId is not only able to infer the identity of individuals, our experiments also demonstrate that in a direct comparison it is able to do so better than CSI-based attacks for large populations. This also holds for identifying individuals across walking styles, from multiple different perspectives, and at reduced sample rates.
But if we're talking about walking styles, it sounds more like "regular" WiFi sensing again.
For this, the standard defines a channel sounding procedure (shown in Figure 1), which the access point (beamformer) initiates regularly with a null data packet (NDP) announcement frame. Beamformees will reply to this announcement. The access point then sends the actual NDP, which contains one VHT-LTF (very high throughput long training field) per spatial stream used in the transmission. Beamformees then use the CSI of these VHT-LTFs to calculate so-called feedback matrices for each subcarrier. The feedback matrix is compressed into beamforming angles, which are sent back to the beamformer. The beamformer can then calculate a steering matrix, which can be used to direct the transmission towards the beamformee [6, 16].
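To make the "compressed into beamforming angles" step concrete, here's a pure-Python sketch of the Givens-rotation decomposition that, as far as I understand, 802.11ac uses for the feedback matrix. It assumes the beamformee has already derived V (an nt x nss matrix with orthonormal columns, normally the right singular vectors of the measured CSI), skips angle quantization and the SNR fields, and uses function names of my own. It is not the paper's code and not a reference implementation.

```python
import cmath
import math


def compress(V):
    """Decompose an nt x nss matrix with orthonormal columns into (phis, psis).

    Per-column phase is discarded, then each column is reduced to a unit
    vector by removing row phases (phi) and rotating energy onto the
    diagonal with Givens rotations (psi).
    """
    nt, nss = len(V), len(V[0])
    V = [row[:] for row in V]  # work on a copy
    # Column phase is not fed back: make the last row real and non-negative.
    for j in range(nss):
        rot = cmath.exp(-1j * cmath.phase(V[nt - 1][j]))
        for r in range(nt):
            V[r][j] *= rot
    phis, psis = [], []
    for i in range(min(nss, nt - 1)):
        # phi: strip the phase of rows i .. nt-2 in column i (whole rows rotate).
        for r in range(i, nt - 1):
            phi = cmath.phase(V[r][i]) % (2 * math.pi)
            phis.append(phi)
            rot = cmath.exp(-1j * phi)
            for col in range(nss):
                V[r][col] *= rot
        # psi: Givens rotations that zero rows i+1 .. nt-1 of column i.
        for l in range(i + 1, nt):
            psi = math.atan2(V[l][i].real, V[i][i].real)
            psis.append(psi)
            c, s = math.cos(psi), math.sin(psi)
            for col in range(nss):
                vi, vl = V[i][col], V[l][col]
                V[i][col] = c * vi + s * vl
                V[l][col] = -s * vi + c * vl
    return phis, psis


def reconstruct(phis, psis, nt, nss):
    """Rebuild V from the angles (roughly what the beamformer does)."""
    V = [[1.0 + 0j if r == col else 0j for col in range(nss)] for r in range(nt)]
    groups, off = [], 0
    for i in range(min(nss, nt - 1)):
        n = nt - 1 - i  # column i contributes n phis and n psis
        groups.append((i, phis[off:off + n], psis[off:off + n]))
        off += n
    # Undo the compression steps in reverse order.
    for i, ph, ps in reversed(groups):
        for l, psi in reversed(list(zip(range(i + 1, nt), ps))):
            c, s = math.cos(psi), math.sin(psi)
            for col in range(nss):
                vi, vl = V[i][col], V[l][col]
                V[i][col] = c * vi - s * vl
                V[l][col] = s * vi + c * vl
        for r, phi in zip(range(i, nt - 1), ph):
            rot = cmath.exp(1j * phi)
            for col in range(nss):
                V[r][col] *= rot
    return V


if __name__ == "__main__":
    phis, psis = [0.7, 2.1, 4.0], [0.3, 1.1, 0.5]  # arbitrary 3x2 example
    V = reconstruct(phis, psis, 3, 2)
    # Arbitrary per-column phases should not change the angles.
    V_rot = [[V[r][0] * cmath.exp(0.9j), V[r][1] * cmath.exp(-2.2j)] for r in range(3)]
    p2, s2 = compress(V_rot)
    print([round(x, 6) for x in p2], [round(x, 6) for x in s2])
    # [0.7, 2.1, 4.0] [0.3, 1.1, 0.5]
```

The round trip also shows why per-column phase is dropped: steering only needs V up to one phase per stream, so the angles alone are enough for the access point, and they are also exactly what a passive observer gets to read.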
So I guess it's kind of like sensing from multiple observers?
The vast majority of approaches use recordings of gait sequences for identification, but there are exceptions, e.g. using lip motions [42], keystrokes [18], or no movement at all, with the individual just standing [53] or sitting [51]. When using gait, most approaches record individuals while walking orthogonally to the line-of-sight (LOS) between sender and receiver. Two early approaches had participants walk parallel to the LOS [26, 71], and some approaches have opted to have participants walk freely, but only one approach considered multiple perspectives [76].
Yeah, I guess the novel thing in this paper is using information from this protocol to essentially get a bunch more sensors. Though I'm still shocked they can put together a coherent picture to get fucking lip movements with all this.
Also, WiFi has gotten so much more complicated since 802.11b. Feels like I forgot to pay attention.