Editor's note
Air-ground integration combines airborne communication and terrestrial networks to extend coverage and enable efficient data transport. The rise of commercial space, advances in onboard satellite technology, and reductions in satellite manufacturing and launch costs have accelerated interest in air-ground integrated systems. With evolving 5G/6G visions and ITU frequency planning for direct-to-device satellite services, satellite-enabled public handheld voice services have gained attention for emergency and rescue scenarios, where reliable voice is essential.
1. Requirements and current status
Requirements overview
Satellite communication offers wide-area coverage, strong anti-interference capability, and the ability to provide independent emergency communication when terrestrial infrastructure is absent or damaged. In rescue and emergency scenarios, both parties need to establish low-latency, reliable real-time interaction quickly, making satellite voice capability essential. In disaster response, satellite voice enables flexible command and dispatch systems. For outdoor emergencies, the public can use mobile devices with satellite voice to request assistance.
Compared with legacy satellite mobile systems such as GMR (GEO-Mobile Radio Interface), 5G NTN has advantages in standards evolution, industry momentum, and satellite-terrestrial integration. However, the Chinese market currently has no commercially available mobile communication satellite resources whose bandwidth and link budget meet NR NTN requirements under current standards. This has focused industry attention on implementing satellite voice using IoT NTN (non-terrestrial network access for IoT devices).
Current status and feasibility
In recent years, IoT NTN applications have progressed rapidly in China and internationally. Internationally, maritime satellite operators and chipset vendors have conducted two-way satellite trials based on IoT NTN and announced plans to enable two-way satellite communication for smartphones, IoT devices, and vehicles. In China, research institutes, equipment manufacturers, and satellite operators have jointly verified satellite-terrestrial convergence using 3GPP Release 17, achieving end-to-end IoT NTN technical validation across chips, terminal modules, and network equipment. Current IoT NTN deployments primarily serve short-message and IoT use cases; once satellite voice is supported, a consumer-facing emergency communication network based on NTN could emerge.
To support user voice needs during 5G NTN evolution, the industry has researched IoT NTN voice enhancement and proposed three solution approaches. Solution A adopts a multi-domain integration concept with an added signaling gateway to reduce terminal-satellite signaling exchanges and conserve satellite resources. Solution B uses a WebRTC-based architecture with custom interfaces to tailor voice protocols and improve signaling efficiency. Solution C optimizes IMS signaling by trimming SIP/SDP procedures and fields to shorten terminal-IMS interaction latency and increase efficiency.
Current research indicates that optimizing existing terrestrial mechanisms and reducing satellite-terrestrial voice signaling overhead are fundamental to enabling voice on IoT NTN. Since terrestrial 4G/5G voice uses an IMS architecture, compatibility with terrestrial voice services motivates industry efforts to develop IMS-optimized voice solutions, including network and chip customization, streamlined signaling flows, and optimized low-rate voice codecs to better support IoT NTN voice.
2. Key technologies for satellite voice over NTN
Network architecture optimization
Given the radio characteristics of IoT NTN systems, voice service requirements during satellite network evolution can be met by reusing proven terrestrial cellular architectures and deployment practices while minimizing satellite-terrestrial signaling overhead. Optimizing the IMS architecture within the satellite network makes interoperability between the satellite and terrestrial voice domains achievable. The IMS core nodes, satellite gateways, and NTN base stations are typically implemented on high-reliability telecommunication PCBs, ensuring stable operation under high throughput, long latency, and harsh environmental conditions.
For signaling interaction, lightweight IMS SIP signaling may be used between the UE and the IMS, with an IMS enhancement that converts between simplified SIP and standard SIP. Standard SIP signaling is maintained between the IMS network and terrestrial network elements.
For voice media transport, low-rate voice codecs should be deployed in the UE and IMS network gateway to support low-bit-rate voice streams. Between the UE and the IMS gateway, voice uses the low-rate codec; at the IMS gateway, conversions between low-rate and standard voice codecs are performed.
User plane voice bearer solution
IoT NTN radio supports control plane (CP) and user plane (UP) transmission modes. CP mode carries data via NAS signaling and suits sporadic small-packet IoT traffic; UP mode uses normal data radio bearers (DRBs) for sustained data. In current IoT NTN, UP mode supports only two DRBs and cannot carry data and voice simultaneously. Since voice is important for mobile satellite services, this article considers implementing three bearer types in UP mode (a data bearer, a voice signaling bearer, and a voice media bearer), with automatic switching among them according to the active services.
This requires enhanced base station behavior: under network-triggered conditions, the base station releases a data bearer and establishes a voice bearer. Because a UE that receives a network-triggered bearer release may initiate a TAU (tracking area update), partial UE enhancements are also required to avoid excessive signaling. When the user ends a call, the network should be able to revert to the data bearer.
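The bearer switching described above can be pictured with a small state model. This is only a sketch under stated assumptions: the article does not say which two bearers coexist, so the code assumes the voice signaling bearer stays up and the data bearer is swapped for the voice media bearer during a call. The class and method names (BearerManager, start_call, end_call) are hypothetical and do not correspond to any 3GPP procedure.

```python
from enum import Enum


class Bearer(Enum):
    DATA = "data"
    VOICE_SIGNALING = "voice_signaling"
    VOICE_MEDIA = "voice_media"


MAX_DRBS = 2  # UP-mode DRB limit stated in the article


class BearerManager:
    """Toy model of network-side bearer switching (hypothetical)."""

    def __init__(self):
        # Assumption: idle state keeps data plus voice signaling.
        self.active = {Bearer.DATA, Bearer.VOICE_SIGNALING}

    def start_call(self):
        # Release the data bearer first, then establish voice media,
        # so the DRB limit is never exceeded during the switch.
        self.active.discard(Bearer.DATA)
        self.active.add(Bearer.VOICE_MEDIA)
        self._check_limit()

    def end_call(self):
        # Release voice media and revert to the data bearer.
        self.active.discard(Bearer.VOICE_MEDIA)
        self.active.add(Bearer.DATA)
        self._check_limit()

    def _check_limit(self):
        if len(self.active) > MAX_DRBS:
            raise RuntimeError("DRB limit exceeded")
```

Releasing before establishing keeps the model within the two-DRB limit at every step; a real implementation would also need the UE-side handling that suppresses the unnecessary TAU.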
IMS SIP simplification
1. Principles for IMS SIP simplification
NTN air-interface resources are constrained and standard IMS SIP messages are verbose, which increases call setup time and reduces call success rate. Optimizing call setup by shortening signaling messages and reducing the number of message exchanges is a primary approach to enabling voice over IoT NTN. Because terrestrial networks primarily use VoNR and VoLTE, an IMS-optimized voice architecture should be prioritized to support interworking with 5G NTN and smooth migration.
Signaling optimizations can include, but are not limited to, the following principles: first, encode SIP header names using abbreviated forms where defined, without inventing custom abbreviations for undefined headers; second, allow the UE to omit certain parameters in SIP messages to the IMS, with the IMS functions inserting those parameters later; third, use simplified IPv6 address encodings in SIP/SDP messages sent by the UE; fourth, omit some signaling steps for selected voice sessions based on priority or real-time resource status.
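The first and third principles can be illustrated with a minimal sketch. It uses only the compact header forms defined in RFC 3261 and Python's standard ipaddress module for the shortest textual IPv6 form. The function names and the sample message are illustrative assumptions, not part of any proposed solution, and the sketch covers neither parameter omission nor skipped signaling steps.

```python
import ipaddress

# Compact header forms defined in RFC 3261; headers without a
# defined compact form (e.g. CSeq) are deliberately left unchanged.
COMPACT = {
    "via": "v",
    "from": "f",
    "to": "t",
    "call-id": "i",
    "contact": "m",
    "content-type": "c",
    "content-length": "l",
    "content-encoding": "e",
    "supported": "k",
    "subject": "s",
}


def compact_headers(message: str) -> str:
    """Rewrite long header names to compact forms; the body is untouched."""
    head, sep, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    out = [lines[0]]  # request or status line stays as-is
    for line in lines[1:]:
        if line[:1] in (" ", "\t"):  # folded continuation line
            out.append(line)
            continue
        name, colon, value = line.partition(":")
        short = COMPACT.get(name.strip().lower())
        out.append(f"{short}:{value}" if short and colon else line)
    return "\r\n".join(out) + sep + body


def shorten_ipv6(addr: str) -> str:
    """Return the shortest standard textual form of an IPv6 address."""
    return ipaddress.ip_address(addr).compressed
```

Because only header names change, the Content-Length value remains valid. The network side would apply the reverse mapping before forwarding standard SIP toward terrestrial network elements.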
In addition to SIP simplification, introduce an enhanced voice signaling gateway in the terrestrial network to support negotiation and conversion of satellite voice codecs, hiding media differences between the UE and IMS network under the simplified SIP scheme.
Because NTN bandwidth is limited and latency is higher, SIP signaling must be adapted to these characteristics to reduce interaction complexity and delay. To implement SIP message simplification over IMS, both the UE and certain IMS network functions require enhancements that mask signaling differences while preserving the user call experience and improving network interaction efficiency.
2. Latency analysis
Transmission latency for IoT NTN voice includes codec processing delay at the UE, baseband delay, satellite air-interface latency, base station protocol stack delay, core-to-IMS user plane delay, IMS user-plane codec conversion delay, and processing and terminal handling delays in the wider network. Because air-interface bandwidth is constrained and links to high-orbit satellites are long, uplink latency is concentrated in the codec and the satellite air link, which together account for roughly 80% of the total; downlink latency is concentrated in the satellite air link and the IMS voice conversion process, which also account for about 80%.
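To give a feel for why the satellite air link weighs so heavily, the sketch below computes two of its components: propagation delay over a geostationary path and the time to serialize a packet onto a narrow link. The geostationary altitude and the speed of light are physical constants. The packet size and link rate in the usage note are illustrative assumptions, not measurements from this article.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786   # nominal geostationary altitude


def propagation_ms(path_km: float) -> float:
    """One-way free-space propagation delay in milliseconds."""
    return path_km / C_KM_PER_S * 1000


def serialization_ms(packet_bytes: int, link_bps: float) -> float:
    """Time to clock a packet onto a link of the given bit rate."""
    return packet_bytes * 8 * 1000 / link_bps


# Lower bound: terminal directly beneath the satellite.
ue_to_satellite_ms = propagation_ms(GEO_ALTITUDE_KM)
# Transparent payload: the signal also travels down to the gateway.
ue_to_gateway_ms = propagation_ms(2 * GEO_ALTITUDE_KM)
```

Under these inputs the terminal-to-satellite hop alone is about 119 ms and the terminal-to-gateway path about 239 ms, before any processing. As a hypothetical example, a 100-byte packet on a 2 kbit/s link would add a further 400 ms of serialization time, which is why packet size matters as much as distance.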
Analysis shows that packet compression, codec optimization, and coding improvements can effectively reduce transport latency for voice packets in satellite access scenarios and improve the user experience for NTN voice calls.
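The effect of packet compression can be estimated with simple arithmetic. The sketch below compares the on-air bit rate of a voice stream with full RTP/UDP/IPv6 headers against the same stream with compressed headers. The header sizes (40, 8, and 12 bytes) are the fixed sizes of those protocols. The 2.4 kbit/s codec rate, the 20 ms frame, and the 3-byte compressed header are illustrative assumptions, not figures from this article.

```python
IPV6_HDR, UDP_HDR, RTP_HDR = 40, 8, 12  # fixed header sizes in bytes
FULL_HEADERS = IPV6_HDR + UDP_HDR + RTP_HDR


def wire_rate_bps(codec_bps: float, frame_ms: float,
                  header_bytes: int, frames_per_packet: int = 1) -> float:
    """Bit rate on the link, including per-packet header overhead."""
    packet_ms = frame_ms * frames_per_packet
    payload_bytes = codec_bps * packet_ms / 8000
    packets_per_s = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_s


# Hypothetical low-rate codec: 2400 bit/s with 20 ms frames.
uncompressed = wire_rate_bps(2400, 20, FULL_HEADERS)
compressed = wire_rate_bps(2400, 20, header_bytes=3)
bundled = wire_rate_bps(2400, 20, header_bytes=3, frames_per_packet=4)
```

With these assumed inputs, full headers inflate a 2.4 kbit/s stream to 26.4 kbit/s, a 3-byte compressed header brings it down to 3.6 kbit/s, and bundling four frames per packet lowers it to 2.7 kbit/s. Bundling adds packetization delay, so it trades bandwidth against latency.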
3. Conclusion and outlook
Based on 3GPP research and standardization, this article proposes an IoT NTN converged networking architecture and optimization directions to support voice, taking into account constrained satellite resources and higher latency. It presents signaling simplification and optimization measures to reduce satellite-terrestrial signaling consumption and maintain voice quality in satellite access scenarios. Remaining challenges include adapting and optimizing protocols to high air-interface latency, enhancing network element functions and performance, and mitigating Doppler-induced effects on voice signal quality and stability through algorithmic and codec improvements.
Research on next-generation real-time voice is underway, driven by technological innovation, user demand, and network evolution. NTN will be a key technology for integrated space-terrestrial connectivity and pervasive links, forming an important foundation for ubiquitous, high-quality real-time communication services.