WebRTC "polyfill" over realtime channels (webxdc-x-WebRTC)

TL;DR: let’s make a library that lets apps that rely on WebRTC work inside webxdc.

@adbenitez and I had this idea.
The apps themselves wouldn’t have to use any webxdc APIs; those would be hidden behind the WebRTC polyfill we’d write.

Below are mostly my implementation thoughts and ideas.

webxdc realtime channels are broadcast channels: all peers receive every message you send, and vice versa. Also, there is no metadata associated with each webxdc message.
So, the problem can be more broadly described as “WebRTC polyfill over a broadcast channel”. (So perhaps we could look for existing polyfills?)
This means that we’ll have to:

  • implement an IP (Internet Protocol)-like network layer with an address for each peer.
  • implement a reliability layer for RTCDataChannels, which the WebRTC API user can require to be ordered and/or reliable.
  • implement the WebRTC things on top of it.
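
To make the addressing layer more concrete, here is a minimal sketch of how frames could carry addresses over a broadcast channel. The frame layout (4-byte source address, 4-byte destination address, then payload) and all names in it are assumptions made for illustration, not part of webxdc or WebRTC.

```typescript
// Hypothetical frame layout (an assumption, not defined by webxdc):
// bytes 0-3: source address, bytes 4-7: destination address, rest: payload.
const HEADER_SIZE = 8;

interface Frame {
  src: number;
  dst: number;
  payload: Uint8Array;
}

// Prepend the address header to the payload.
function encodeFrame(frame: Frame): Uint8Array {
  const out = new Uint8Array(HEADER_SIZE + frame.payload.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, frame.src);
  view.setUint32(4, frame.dst);
  out.set(frame.payload, HEADER_SIZE);
  return out;
}

// Parse a received message; returns null if it is too short to hold a header.
function decodeFrame(data: Uint8Array): Frame | null {
  if (data.length < HEADER_SIZE) return null;
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  return {
    src: view.getUint32(0),
    dst: view.getUint32(4),
    payload: data.subarray(HEADER_SIZE),
  };
}

// Every peer sees every broadcast, so each peer drops frames addressed to others.
function receiveFor(localAddress: number, data: Uint8Array): Frame | null {
  const frame = decodeFrame(data);
  return frame !== null && frame.dst === localAddress ? frame : null;
}
```

In a webxdc app, the output of `encodeFrame()` would presumably be passed to `send()` on the object returned by `webxdc.joinRealtimeChannel()`, and the listener registered with `setListener()` would call `receiveFor()` with the peer’s own address.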

WebRTC, for those who don’t know, only exposes an API (RTCPeerConnection) that lets you make a P2P connection with one other peer. You can make as many of those as you want, but building “networks of peers” is entirely up to the developer.

WebRTC, in short, includes:

  • Signaling. Basically, when two peers want to connect, they need to exchange two messages that contain their IP addresses and other info, over an out-of-band channel.
    Looks like we’ll need to use these messages to figure out which RTCPeerConnection to pair with which RTCPeerConnection of another peer.
  • RTCDataChannels. There can be thousands of them per RTCPeerConnection. Each data channel has an id and a label, and, as mentioned, can be specified to be reliable and/or ordered.
  • addTrack() to transfer MediaStreamTracks, which are usually video from the user’s camera. We could try to polyfill this with MediaRecorder, like I did in “I made a video call app (WITHOUT 15-second ping this time)”. However, WebRTC really does a lot here, such as adaptive media quality, jitter buffering, and exposing various stats, so it might be hard to polyfill video streams well.
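
As an illustration of what an “ordered” data channel would need from the layer underneath, here is a minimal reorder-buffer sketch. It assumes the sender attaches an incrementing sequence number to each message; the class and its names are hypothetical. It only covers ordering and duplicate suppression. A reliable channel would additionally need acknowledgements and retransmission, and sequence-number wraparound is ignored here.

```typescript
// Delivers messages to `deliver` strictly in sequence-number order,
// holding back any message that arrives ahead of a missing one.
class ReorderBuffer<T> {
  private nextSeq = 0;
  private readonly pending = new Map<number, T>();
  private readonly deliver: (message: T) => void;

  constructor(deliver: (message: T) => void) {
    this.deliver = deliver;
  }

  push(seq: number, message: T): void {
    // Ignore duplicates: already delivered, or already waiting in the buffer.
    if (seq < this.nextSeq || this.pending.has(seq)) return;
    this.pending.set(seq, message);
    // Flush every message that is now contiguous with what was delivered.
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq) as T;
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      this.deliver(next);
    }
  }
}
```

For example, pushing sequence numbers 1, 0, 0, 2 results in the messages for 0, 1, 2 being delivered once each, in that order.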

It is important to remember that WebRTC leaves the signaling up to the WebRTC API user, i.e. it is the developer’s responsibility to transmit the signaling messages, and as such, each app implements its own means of signaling. Usually signaling is facilitated by the service’s HTTP / WS server, such as Jitsi’s server.

This means that apparently apps won’t “just work” if we simply import this polyfill and pack them into webxdc.
But there are universal signaling servers, such as PeerJS, which I also used in the webxdc-x-web project, so maybe we could start with making a PeerJS polyfill instead of a WebRTC polyfill.

However, about signaling again, I suspect there might be a way to pair RTCPeerConnections without even using a signaling server. We could do this by spying on RTCPeerConnection.setRemoteDescription() and RTCPeerConnection.setLocalDescription() calls… Might need to think about this some more.
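
One way the “spying” idea could work, sketched under several assumptions: whenever the app sets a local description, the polyfill announces its text together with the connection’s address over the broadcast channel; whenever the app sets a remote description, the polyfill looks up which address announced that exact text. All names below are hypothetical, the real methods are asynchronous, and this would break for apps that modify the description before passing it on, or if the announcement arrives after the lookup.

```typescript
// Hypothetical names throughout; none of this is webxdc or WebRTC API.
interface Announcement {
  sdp: string;     // description text passed to setLocalDescription()
  address: number; // address of the connection that owns it
}

class DescriptionMatcher {
  private readonly ownerBySdp = new Map<string, number>();

  // Record an announcement received over the broadcast channel.
  onAnnouncement(a: Announcement): void {
    this.ownerBySdp.set(a.sdp, a.address);
  }

  // Find which remote connection produced a description; undefined if unknown.
  findOwner(sdp: string): number | undefined {
    return this.ownerBySdp.get(sdp);
  }
}

// Simplified, synchronous stand-in for the polyfilled connection class,
// showing only where the two hooks would go.
class SketchPeerConnection {
  remoteAddress: number | undefined = undefined;
  private readonly address: number;
  private readonly matcher: DescriptionMatcher;
  private readonly broadcast: (a: Announcement) => void;

  constructor(
    address: number,
    matcher: DescriptionMatcher,
    broadcast: (a: Announcement) => void,
  ) {
    this.address = address;
    this.matcher = matcher;
    this.broadcast = broadcast;
  }

  setLocalDescription(desc: { sdp: string }): void {
    this.broadcast({ sdp: desc.sdp, address: this.address });
  }

  setRemoteDescription(desc: { sdp: string }): void {
    this.remoteAddress = this.matcher.findOwner(desc.sdp);
  }
}
```

The app would still carry the descriptions between peers through whatever signaling it already uses; the polyfill would only observe them to learn which two connections belong together.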


Of course, this can be implemented gradually. For example, we could skip the reliability layer and just pretend that we support reliable RTCDataChannels.

What can benefit from this (is this enough to justify the effort? perhaps we need to look for concrete examples where this could help):

  • voice call apps
  • various CRDTs that sync over WebRTC, such as y-webrtc


Well, how is the development going here? It’s time to introduce this technology and finally get rid of the third-party video chat Jitsi Meet. You send the webxdc application and there you have group calls, at least voice calls to begin with.

I personally haven’t worked on this.

What the original post proposes is not required per se to make a video call webxdc app. It would only simplify porting existing WebRTC apps.

The biggest obstacle for a working video call app IMO is the lack of permissions for webxdc apps: Allow access to camera, geolocation, other Web APIs.


There is a newer browser standard to replace WebRTC: MoQ (Media over QUIC). If you are just using data channels, then life will be much better with WebTransport.

Media over QUIC is not a replacement for WebRTC. From “Replacing WebRTC - Media over QUIC” itself:

If you primarily use WebRTC for…
peer-to-peer: you’re stuck with WebRTC for the foreseeable future.

News: there is now a project that took this approach, i.e. partially polyfilling WebRTC API using webxdc transport. That is the new Quake III Arena app (source code). (See also Mastodon posts 1, 2).

As I anticipated, the more painful part was actually mocking signaling, and not WebRTC itself. I haven’t fully mocked a signaling server, and just hard-coded its expected responses. You can find the implementation here.

The WebRTC mock itself can be found here. It uses webxdc.joinRealtimeChannel(), and implements UDP-like transport, with source / destination address, where each peer (webxdc app instance) has one and only one randomly generated 4-byte address.

Note that this implementation is not a full WebRTC polyfill, and it only works in this particular app’s case. This is because Quake III is central-server-based i.e. each client only communicates with the central server and not other peers (the “star” network topology). So, all the “signaling” we have to do is just figuring out the address of the server (which is done here).
By contrast, in a full, proper WebRTC polyfill, each data channel would need a unique address, because one webxdc app instance can have several data channels. So, for a data channel to know the address of its other end, we would have to take the approach that I mentioned here:


Starting to build IP-like addressing and routing and p2p channels on top of the realtime channels is a little bit odd. The realtime channels are broadcast channels, which are built using an iroh-gossip swarm, which is built on top of iroh p2p QUIC connections (which are again on top of IP).

If webxdc apps would like to access direct p2p channels, it would make much more sense to extend the webxdc API to expose direct p2p channels, rather than only a broadcast channel. Delta Chat could do this using iroh p2p QUIC connections directly. That would be a much more sensible point to start thinking about a polyfill for WebRTC DataChannels.

That’s only about message transport though. I think the media stuff from addTrack() is probably an entirely different effort.


You have a point. As long as Delta Chat members are open to the idea of introducing a new API, it would probably make sense to introduce one-on-one (non-broadcast) channels first, and only then think about polyfilling WebRTC.