Thinking Machines Lab has unveiled a research preview of its new interaction models, a class of multimodal AI systems designed to respond in real time while continuously processing text, audio, and video inputs.
TL;DR
- Thinking Machines introduced interaction models, a new AI architecture built for real-time collaboration.
- The system is designed to listen, talk, watch, show, and think at the same time.
- Its TML-Interaction-Small model reportedly responds in 0.40 seconds, close to natural conversation speed.
- A limited research preview is expected in the coming months, followed by a wider release later this year.
Thinking Machines Moves Beyond Turn-Based AI
Thinking Machines Lab, the artificial intelligence startup founded by former OpenAI CTO Mira Murati, has revealed a new approach to human-AI interaction that aims to make conversations with AI feel less like messaging and more like natural collaboration.
The company announced a research preview of interaction models, which are native multimodal systems designed to continuously take in audio, video, and text while thinking, responding, and acting in real time. Thinking Machines said current AI systems usually operate in turns, where the user finishes speaking or typing before the model begins responding.
The company argues that this limits collaboration, since today’s models often stop perceiving new input while generating an output. Thinking Machines described this as a narrow channel for human-AI collaboration, comparing it to trying to resolve a serious disagreement over email instead of in person.
How Do Interaction Models Work?
Unlike conventional models that rely on external software systems to handle interruptions, turn-taking, or voice detection, Thinking Machines wants interactivity to be part of the model architecture itself.
The company said most existing commercial real-time speech systems use voice activity detection to determine when a user has stopped speaking. However, Thinking Machines believes such hand-crafted systems will be outpaced as general AI capabilities improve, which is why it is building interactivity directly into the model.
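To make the hand-crafted approach concrete, here is a minimal Python sketch of energy-based end-of-turn detection. It is an illustration only and not the implementation of any system mentioned in this article. The function name, the energy threshold, and the "hangover" frame count are all assumptions chosen for the example.

```python
def end_of_turn_index(frame_energies, threshold=0.1, hangover_frames=3):
    """Return the frame index at which the turn is declared over, or None.

    Hypothetical rule: once speech has been heard, the turn ends after
    `hangover_frames` consecutive frames below `threshold`.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            # Speech frame: remember it and reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Silence after speech: count toward the end-of-turn decision.
            silent_run += 1
            if silent_run >= hangover_frames:
                return i
    return None


# Made-up per-frame energies: speech, a short pause, more speech, then silence.
energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0, 0.0, 0.0]
```

With these made-up energies, a hangover of 3 frames ends the turn at frame 8, while a hangover of 2 ends it at frame 4, cutting the speaker off during a short pause. That sensitivity to a hand-tuned constant is the kind of brittleness the company is pointing to.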
SiliconANGLE reported that the new architecture uses a multistream, micro-turn-based design instead of a standard alternating token sequence. The model processes inputs and outputs in small 200-millisecond chunks, allowing it to react to visual or auditory cues even while it is already speaking.
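The micro-turn idea can be pictured with a toy Python loop. Only the 200-millisecond chunk size comes from the reporting above. Everything else, including the function names, the "stop" cue, and the one-word-per-chunk output, is a hypothetical simplification and not Thinking Machines' architecture.

```python
from collections import defaultdict

CHUNK_MS = 200  # chunk length reported for the model


def bucket(events, chunk_ms=CHUNK_MS):
    """Group (timestamp_ms, payload) pairs by chunk index."""
    buckets = defaultdict(list)
    for t_ms, payload in events:
        buckets[t_ms // chunk_ms].append(payload)
    return buckets


def run_micro_turns(input_events, planned_output, n_chunks, chunk_ms=CHUNK_MS):
    """Interleave perception and output one chunk at a time.

    Each chunk's output is decided from input heard in earlier chunks, so
    a "stop" cue silences the toy model from the following chunk onward.
    Returns a list of (chunk_index, heard, said) tuples.
    """
    heard_by_chunk = bucket(input_events, chunk_ms)
    pending = list(planned_output)
    interrupted = False
    timeline = []
    for i in range(n_chunks):
        heard = heard_by_chunk.get(i, [])
        said = []
        if not interrupted and pending:
            said.append(pending.pop(0))  # emit one word per chunk
        if "stop" in heard:
            interrupted = True  # takes effect in the next chunk
        timeline.append((i, heard, said))
    return timeline


timeline = run_micro_turns(
    input_events=[(450, "stop")],  # cue arrives 450 ms in, i.e. chunk 2
    planned_output=["the", "quick", "brown", "fox", "jumps"],
    n_chunks=5,
)
```

In this run the cue lands in chunk 2, the toy model finishes the word already under way in that chunk, and it says nothing in chunks 3 and 4. A strictly turn-based loop would instead finish all five words before looking at the input again.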
The result is a system built for full-duplex communication, meaning it can listen, see, and talk at the same time. TechCrunch reported that Thinking Machines’ TML-Interaction-Small model responds in 0.40 seconds, a latency described as roughly the pace of natural human conversation and faster than comparable models from OpenAI and Google.
What Can The Model Do?
Thinking Machines said interaction models unlock capabilities that would otherwise need to be handled by a separate software layer. These include seamless dialog management, along with verbal and visual interjections that happen when the context calls for them, not only when a user finishes speaking.
The Verge noted that Thinking Machines demonstrated examples such as listening for animal mentions in a story, translating speech in real time, and warning a person when they are slouching. These use cases show how the system could support real-time tutoring, accessibility, translation, collaboration, and workplace assistance.
VentureBeat also described the preview as an attempt to move AI beyond turn-based chat, where a human sends input and waits for the model to produce output. The publication noted that the model is meant to respond more fluidly while still processing the next human input.
When Will It Be Available?
For now, this is not a public product release. TechCrunch reported that Thinking Machines is preparing a limited research preview in the next few months, with a wider release expected later this year.
The Verge also confirmed that users cannot try the interaction models yet, though Thinking Machines plans to open access through a limited preview before expanding availability.
Why Does This Matter?
Thinking Machines’ announcement comes as AI companies race to make assistants more useful in live, complex, and multimodal environments. Today’s AI systems are powerful, but the interaction pattern often feels rigid, especially when users need to interrupt, clarify, point, gesture, or switch between voice and visuals.
By making real-time interactivity part of the model itself, Thinking Machines is betting that AI assistants will become more helpful in situations that require continuous awareness. That could include learning, coding, healthcare support, design, robotics, and enterprise workflows.
Still, the company’s claims will need to be tested outside demos. As TechCrunch noted, the benchmarks and the underlying idea are interesting, but real-world performance cannot be judged until users can actually try the system.

