Why aren’t my IoT devices smarter, or where is my robot helper at home?

We are surrounded by Internet of Things (IoT) devices, roughly 2.5 per human being on our planet. These do smart things like tell us when a package has been delivered to our doorstep (23% of us in the US own video doorbells) or tell us while cooking, for the n-th time, how many ounces are in one cup (more than half of US households have a smart speaker or display). Robots do sci-fi acts, though mostly across the barrier of a computer screen. For many technology enthusiasts, and some normal people, the wait for the things in our homes and workplaces to become smart enough to do our bidding, and for robots to arrive and take over the tedium of household chores, has rivaled the wait for Godot. Intel said in 2015 that the number of smart devices would grow to 200 billion by 2020, or about 26 smart objects for every human on Earth, missing the mark by more than an order of magnitude. For years, ads have been promising us emotionally attuned robots working alongside us in unscripted real-world settings, like this Super Bowl ad from GM in 2007 showing a sensitive robot trying to fit in with humans.
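
For readers who like to check the arithmetic, here is a back-of-the-envelope sketch in Python that uses only the figures quoted above. The variable names are mine, and the implied population and device count are rough derived values, not measured ones.

```python
# Figures quoted in the text above.
forecast_devices = 200e9      # Intel's 2015 forecast for 2020
forecast_per_person = 26      # the same forecast, per human
actual_per_person = 2.5       # rough current figure, per human

# World population implied by the forecast (derived, not measured).
implied_population = forecast_devices / forecast_per_person

# Device count implied by the current per-person figure (derived).
implied_actual_devices = actual_per_person * implied_population

# How far off the forecast was, as a ratio.
overshoot = forecast_per_person / actual_per_person

print(f"Implied population: {implied_population / 1e9:.1f} billion")
print(f"Implied devices today: {implied_actual_devices / 1e9:.1f} billion")
print(f"Forecast overshoot: {overshoot:.1f}x")
```

A ratio just above 10 is what "more than an order of magnitude" means here.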

But the fact of the matter is that we are very far from being there, as I point out in this recent article in the Indianapolis Business Journal. Today we have roughly 20 billion “smart” IoT devices, or about 2.5 per human being on our planet, but it’s a fair bet that the farmer in rural Afghanistan is not frantically trying to configure his voice assistant. And robots that fetch me a fizzy drink from the fridge on command while I do something useful (like watch LOTR yet again) are still only the material of publicity stunts, like the Neo in the picture below. So why are we not there yet?

The helpful humanoid robot Neo by 1X Technologies does the boring stuff around the house while you do the essentials like yoga. Caveat: The beautiful greenspace around the house does not come with the robot. (Photo courtesy 1X Technologies) 

First, there is a legitimate question of whether society wants to get there. As superficially alluring as a robotic butler may seem, are we willing to sacrifice that much human agency and accept all the attendant privacy risks that come with such a development? As thrilling as it may sound from a sci-fi angle to have my fridge talk to my TV to order milk because a sensor has detected that the milk has gone bad, do we really need that level of automation? If the widespread public antipathy to AI is anything to go by, society broadly is not willing to hand over the reins of such personal autonomy. There will of course be technology enthusiasts, the early adopters who will run out to get the latest gizmo just because it is there. But my gut feeling, borne out by opinion polls, is that there isn’t a wide enough swathe of society clamoring for this to make such developments commercially viable. One such survey of 1,000 US consumers in November 2025 found that 69% of respondents are unwilling to pay more than $5,000 for a home robot, a figure far below any price point that is foreseeable even in the near future, and half of the respondents were concerned about the safety of having robots co-exist in the same physical spaces.

The second reason is the lack of interoperability. This means that your fridge made by Company X cannot talk to your TV made by a different Company Y. This is because, unlike in many other technology sectors, universal standards have not been widely adopted. Take the example of Wi-Fi, a game-changing technology built on the 802.11 standards initiated and maintained by the IEEE. Wherever you go on the globe, you can be certain that if Wi-Fi is available there, your Wi-Fi equipped device can connect to the network, irrespective of the manufacturer of your device. The situation with IoT devices is not nearly as smooth. There are pockets of standardization, for example in communication protocols among IoT devices (such as LoRa and NB-IoT), but true interoperability needs standards across the whole stack, and we are far from there. Even where standards exist, as in the examples above, there is a good deal of fragmentation.

The third reason is the lack of security and safety guarantees with these devices and robots. Since I work in this area, I could bore you to tears with security attacks, some already realized and many more hypothesized in the research literature, and with what we have been doing to harden systems against these attacks. A reasonable approximation of the current state of security is to assume that all the code on such a device runs with something akin to root-level privilege all the time. As anyone with the slightest exposure to cybersecurity knows, that is a colossally bad idea. Further, much of the code is written in C, which lacks “memory safety”. In plain speak, security attacks are possible simply by sending large requests from the outside world, and the outside world has many bad guys. For these applications, timing is important: it matters not just that the function is completed, but that it is completed on time. If your robotic butler stops and puts the laundry down, but only a few seconds late and after colliding with its human master, you would fire the butler. This makes an attacker’s job easier, since a successful attack can be launched simply by delaying an operation.
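
To make the timing point concrete, here is a minimal Python sketch of the idea that a correct result delivered late still counts as a failure. Everything in it is hypothetical and for illustration only: the `run_with_deadline` helper, the `stop_arm` functions, and the 20 ms deadline are my assumptions, not taken from any real robot stack. Note that it only measures lateness after the fact; a real system would need to enforce the deadline, typically with a real-time operating system.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TimedResult:
    value: object        # what the task returned
    elapsed_s: float     # how long it took, in seconds
    met_deadline: bool   # True only if it finished in time


def run_with_deadline(task: Callable[[], object], deadline_s: float) -> TimedResult:
    """Run a task and record whether it finished within its deadline."""
    start = time.monotonic()  # monotonic clock: immune to wall-clock changes
    value = task()
    elapsed = time.monotonic() - start
    return TimedResult(value, elapsed, elapsed <= deadline_s)


def stop_arm() -> str:
    # Hypothetical safety action that completes promptly.
    return "stopped"


def delayed_stop_arm() -> str:
    # Same action, but slowed down, e.g. by an attacker hogging the CPU.
    time.sleep(0.05)
    return "stopped"


DEADLINE_S = 0.02  # assumed 20 ms budget, purely illustrative

on_time = run_with_deadline(stop_arm, DEADLINE_S)
late = run_with_deadline(delayed_stop_arm, DEADLINE_S)

for name, result in (("normal", on_time), ("delayed", late)):
    verdict = "OK" if result.met_deadline else "MISSED DEADLINE"
    print(f"{name}: value={result.value!r}, {verdict}")
```

Both calls return the same functionally correct value; only the delayed one misses its deadline, which is exactly the lever a delay-based attack pulls.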

Credit: “A Director With Film Crew Directs A Robot” by Farley Katz, New Yorker Cartoons, April 6, 2015.

The final reason is the difficulty of training robots to operate in the “open world”, i.e., the physical world where all eventualities cannot be predicted and trained for. This is an active area of research in the AI and robotics communities. The bottleneck is that, unlike with LLMs, where vast text data is available for training the models, the available video data is sparse compared to the limitless variety of physical-world settings where robots need to operate. The hope is that robots can be trained from the vast amounts of unstructured human video data available on the internet. (On YouTube alone, over 500 hours of video are uploaded every minute, much of it doubtless with humans in it.) In the research literature, this approach is known as Learning from Demonstration and Visual Imitation Learning. There are still miles to go to achieve this promise, though much work is ongoing and gradual strides are being made. The challenges are the anatomical gap (robots and humans have different physical structures) and the visual domain gap (environments, camera angles, lighting conditions, etc. differ between the videos and the places where the robot has to function).

To sum up, we have IoT devices proliferating around us, at home and at work. Robots have seen adoption in controlled industrial settings, and halting use in home settings. They will continue to proliferate and may even see a sharp step-function increase. That will depend on overcoming the three technical challenges above and, crucially, on laying out a convincing use case for society at large.
