WebRTC "polyfill" over realtime channels (webxdc-x-WebRTC)

TL;DR: let’s make a library that lets apps that rely on WebRTC work inside webxdc.

@adbenitez and I had this idea.
The apps themselves wouldn’t have to use any webxdc APIs; those would be hidden behind the WebRTC polyfill we’d write.

Below are mostly my implementation thoughts and ideas.

webxdc realtime channels are broadcast channels: all peers receive every message you send, and vice versa. On top of that, there is no metadata (such as a sender) associated with a webxdc message.
So the problem can be described more broadly as “WebRTC polyfill over a broadcast channel”. (So perhaps we could look for existing polyfills?)
This means that we’ll have to:

  • implement an IP-like (Internet Protocol) network layer with an address for each peer.
  • implement a reliability layer for RTCDataChannels, which the WebRTC API user can require to be ordered and/or reliable.
  • implement the WebRTC things on top of it.
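
To make the first bullet a bit more concrete, here is a minimal sketch of addressing on top of a broadcast channel. Everything in it is an assumption for illustration: `BroadcastLike` is a hypothetical interface loosely modeled on a realtime channel (send bytes, get a listener callback), addresses are plain 32-bit numbers, and `encodePacket`, `decodePacket` and `AddressedEndpoint` are made-up names, not part of any existing API.

```typescript
// Hypothetical shape: every peer receives every message, and messages
// carry no sender metadata, so we have to add our own header.
interface BroadcastLike {
  send(data: Uint8Array): void;
  setListener(listener: (data: Uint8Array) => void): void;
}

const HEADER_BYTES = 8; // 4 bytes destination address + 4 bytes source address

// Prefix the payload with the destination and source addresses.
function encodePacket(dst: number, src: number, payload: Uint8Array): Uint8Array {
  const packet = new Uint8Array(HEADER_BYTES + payload.length);
  const view = new DataView(packet.buffer);
  view.setUint32(0, dst);
  view.setUint32(4, src);
  packet.set(payload, HEADER_BYTES);
  return packet;
}

// Split a packet back into header fields and payload; null if it is too short.
function decodePacket(
  packet: Uint8Array,
): { dst: number; src: number; payload: Uint8Array } | null {
  if (packet.length < HEADER_BYTES) return null;
  const view = new DataView(packet.buffer, packet.byteOffset, packet.byteLength);
  return {
    dst: view.getUint32(0),
    src: view.getUint32(4),
    payload: packet.subarray(HEADER_BYTES),
  };
}

// One addressable peer: drops every broadcast packet not addressed to it.
class AddressedEndpoint {
  readonly address: number;
  private channel: BroadcastLike;

  constructor(
    channel: BroadcastLike,
    address: number,
    onMessage: (src: number, payload: Uint8Array) => void,
  ) {
    this.channel = channel;
    this.address = address;
    channel.setListener((data) => {
      const packet = decodePacket(data);
      if (packet !== null && packet.dst === address) {
        onMessage(packet.src, packet.payload);
      }
    });
  }

  sendTo(dst: number, payload: Uint8Array): void {
    this.channel.send(encodePacket(dst, this.address, payload));
  }
}
```

How peers would pick unique addresses (random, derived from something else, negotiated) is left open here; the sketch only shows that filtering on a small header is enough to get unicast semantics out of a broadcast channel.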

WebRTC, for those who don’t know, only exposes an API (RTCPeerConnection) that lets you make a P2P connection with one other peer. You can make as many of those connections as you want, but building “networks of peers” out of them is totally up to the developer.

WebRTC, in short, includes:

  • Signaling. Basically, when two peers want to connect, they need to exchange two messages (an offer and an answer) that contain their IP addresses and other info, over an out-of-band channel.
    It looks like we’ll need to use these messages to figure out which RTCPeerConnection to pair with which RTCPeerConnection of another peer.
  • RTCDataChannels. There can be thousands of them per RTCPeerConnection. Each data channel has an id and a label, and, as mentioned, can be specified to be reliable and/or ordered.
  • addTrack() to transfer MediaStreamTracks, which are usually video from the user’s camera. We could try to polyfill this with MediaRecorder, like I did in “I made a video call app (WITHOUT 15-second ping this time)”. But WebRTC really has a lot of stuff here, such as adaptive media quality, jitter buffering, and various exposed stats, so a good polyfill for video streams might be hard to achieve.
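
For the “ordered” part of RTCDataChannels, a reorder buffer keyed on sequence numbers is one common way to do it. The sketch below assumes the sender numbers each message on a channel starting from 0; `ReorderBuffer` is a hypothetical name, and retransmission of lost messages (the “reliable” part) is deliberately not shown.

```typescript
// Delivers payloads in sequence-number order, buffering any that arrive early
// and ignoring duplicates.
class ReorderBuffer<T> {
  private nextSeq = 0; // the sequence number we are waiting for
  private pending = new Map<number, T>(); // early arrivals, keyed by sequence number
  private deliver: (payload: T) => void;

  constructor(deliver: (payload: T) => void) {
    this.deliver = deliver;
  }

  // Returns the next sequence number still missing, which a receiver could
  // send back as a cumulative acknowledgement.
  receive(seq: number, payload: T): number {
    if (seq >= this.nextSeq && !this.pending.has(seq)) {
      this.pending.set(seq, payload);
    }
    // Flush every message that is now contiguous with what was delivered.
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq) as T;
      this.pending.delete(this.nextSeq);
      this.deliver(next);
      this.nextSeq += 1;
    }
    return this.nextSeq;
  }
}
```

A sender that keeps unacknowledged messages around and resends them after a timeout, using the returned value as the acknowledgement, would turn this into a (very basic) reliable channel.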

It is important to remember that WebRTC leaves signaling up to the WebRTC API user, i.e. it is the developer’s responsibility to transmit the signaling messages, and so each app implements its own signaling mechanism. Usually signaling is handled by the service’s HTTP / WS server, such as Jitsi’s server.

This means that apps apparently won’t “just work” if we simply import this polyfill and pack them into webxdc, because their signaling still expects a server.
But there are universal signaling servers, such as PeerJS, which I also used in the webxdc-x-web project, so maybe we could start with a PeerJS polyfill instead of a WebRTC polyfill.

However, about signaling again: I suspect there might be a way to pair RTCPeerConnections without even using a signaling server. We could do this by spying on RTCPeerConnection.setRemoteDescription() and RTCPeerConnection.setLocalDescription() calls… Might need to think about this some more.
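
One way this pairing could look, as a rough sketch and not a worked-out design: when the app calls setLocalDescription() on a polyfilled connection, the polyfill announces “this SDP belongs to connection X on peer P” over the broadcast channel; when an app later calls setRemoteDescription() with the same SDP, the polyfill looks up who produced it. `PairingTable` and `ConnectionRef` are hypothetical names, and the sketch assumes the polyfill generates SDP strings that are unique per connection and that the app relays them unmodified.

```typescript
// Identifies one polyfilled RTCPeerConnection on one peer (hypothetical).
interface ConnectionRef {
  peer: number; // address of the peer on our network layer
  connectionId: number; // per-peer id of the RTCPeerConnection
}

class PairingTable {
  // SDP text -> the connection that announced it as its local description.
  private announced = new Map<string, ConnectionRef>();

  // Call when a peer announces that setLocalDescription() was invoked with
  // this SDP on one of its connections.
  recordAnnouncement(sdp: string, ref: ConnectionRef): void {
    this.announced.set(sdp, ref);
  }

  // Call when the local app invokes setRemoteDescription(). Returns the
  // remote connection that produced this SDP, or undefined if no matching
  // announcement has been seen (yet).
  resolveRemote(sdp: string): ConnectionRef | undefined {
    return this.announced.get(sdp);
  }
}
```

Note the race: the announcement has to arrive before the app’s own signaling delivers the description, otherwise `resolveRemote()` comes back empty and the polyfill would have to wait and retry. Apps that rewrite the SDP before passing it on would also break this.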


Of course, this can be implemented gradually. For example, we could skip the reliability layer and just pretend that we support reliable RTCDataChannels.

What can benefit from this (is this enough to justify effort? perhaps we need to look for concrete examples where this could help):

  • voice call apps
  • various CRDTs that sync over WebRTC, such as y-webrtc

Well, how is development going here? It’s time to introduce this technology and finally get rid of the third-party video chat, Jitsi Meet. You send the webxdc app and there are your group calls, at least voice calls to begin with.

I personally haven’t worked on this.

What the original post proposes is not required per se to make a video call webxdc app. It would only simplify porting existing WebRTC apps.

The biggest obstacle for a working video call app IMO is the lack of permissions for webxdc apps; see “Allow access to camera, geolocation, other Web APIs”.
