Introduction
WhatsApp’s Business API now supports calling, opening up new possibilities for integrating real-time voice communication directly into web applications. Imagine receiving a WhatsApp call not on your phone, but right in your browser. That’s what I set out to achieve, and this article walks you through exactly how I did it, step by step.
We’ll explore the full architecture, backend and frontend code, and most importantly, the logic behind every part of the setup so you understand not just what to do, but why each step matters.
What is WebRTC?
Before jumping into the implementation, let’s understand WebRTC, especially if this is your first time working with it.
WebRTC (Web Real-Time Communication) is a free, open-source technology that allows direct peer-to-peer communication between browsers (or between a browser and another WebRTC-compatible client).
Key Features of WebRTC:
- Real-time audio and video communication
- Peer-to-peer connection (without needing a media server)
- Encrypted by default (uses DTLS and SRTP)
- Used in video calls, screen sharing and online games
WebRTC Components You’ll Work With:
- RTCPeerConnection: the heart of WebRTC. It connects two endpoints and handles negotiation, media flow, and ICE candidates.
- MediaStream: represents media captured from the user’s microphone or camera.
- SDP (Session Description Protocol): a text format describing how audio/video should be transmitted (used in negotiation).
- ICE (Interactive Connectivity Establishment): used to discover and establish the best network path between peers (even behind NATs).
Think of WebRTC like this:
It’s a way for two people (or apps) to talk directly without going through a middleman, but they still need to agree on how they’ll talk (SDP) and where to find each other (ICE).
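To make SDP less abstract, here is a heavily trimmed, illustrative fragment of an SDP audio section (real offers contain many more lines), along with a tiny helper that checks whether an SDP advertises audio — the kind of sanity check that is handy when debugging negotiation:

```javascript
// Illustrative SDP fragment (trimmed, not a complete session description).
const sampleSdp = [
  "v=0",                              // protocol version
  "o=- 46117317 2 IN IP4 127.0.0.1",  // session origin (placeholder values)
  "s=-",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",  // audio media line
  "a=rtpmap:111 opus/48000/2",        // payload type 111 mapped to Opus
  "a=setup:actpass",                  // DTLS role negotiation attribute
].join("\r\n");

// Returns true if the SDP contains an audio media section ("m=audio" line).
function offersAudio(sdp) {
  return sdp.split(/\r?\n/).some((line) => line.startsWith("m=audio"));
}
```

A real offer would be parsed by the browser itself; `offersAudio` is just a quick inspection helper for logging and debugging.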
In our setup, we’re using WebRTC to:
- Capture microphone audio from the browser
- Send it to WhatsApp (via a Node.js bridge)
- Receive WhatsApp audio and play it in the browser
Let’s now see how this fits into our WhatsApp Calling integration.
System Overview
When a WhatsApp user calls your registered number, Meta sends a webhook event to your server. This webhook includes an SDP (Session Description Protocol) offer which contains information about how WhatsApp wants to establish a media connection.
To make the call accessible in the browser, here’s how the complete frontend-backend architecture with two-way communication works:
WhatsApp User
│
▼
Meta Webhook (call event with SDP offer)
│
▼
Backend Server
├── Receives webhook at /call-events in our case
├── Parses SDP offer
├── Emits 'call-is-coming' event via Socket.IO
│
▼
Browser (Frontend)
├── Displays incoming call UI
├── On Accept:
│ └── Starts WebRTC → sends SDP offer to server
▼
Backend Server
├── Stores browser SDP
├── Triggers bridge setup (startWebRTCCall)
├── Creates PeerConnections (wrtc)
├── Exchanges SDP answers with both browser & WhatsApp
├── Sends 'pre_accept' and then 'accept' to Meta API
▼
Media Exchange (2-way)
├── Browser microphone audio → WhatsApp user
└── WhatsApp voice audio → Played in browser via WebRTC
This architectural flow outlines how:
- WhatsApp communicates via webhooks
- The backend acts as a signalling and media bridge
- The browser becomes a live communication endpoint using WebRTC
WhatsApp Calling API Essentials
Meta’s API doesn’t give you a direct voice stream; instead, it follows a signalling-based approach:
- You receive events like `connect` or `terminate` from WhatsApp.
- These events contain metadata including the `call_id` and `session.sdp`.
- To continue the call, you must respond using WhatsApp’s `/calls` endpoint with:
  - `pre_accept` → signals preparation to accept the call
  - `accept` → starts the audio stream
  - `reject` → ends the call
This process mimics traditional SIP or VoIP call flow, where both parties exchange SDPs to agree on media parameters.
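As a concrete example, here is a hypothetical `connect` event body reduced to only the fields this integration reads (the exact shape of Meta’s payload may differ, so treat the values and nesting as a sketch), together with the optional-chaining extraction the webhook handler relies on:

```javascript
// Hypothetical "connect" webhook body, showing only the fields we read:
// entry → changes → value → calls / contacts. All values are placeholders.
const webhookBody = {
  entry: [
    {
      changes: [
        {
          value: {
            calls: [
              {
                id: "wacid.ABC123",          // placeholder call_id
                event: "connect",
                session: { sdp: "v=0..." },  // WhatsApp's SDP offer (truncated)
              },
            ],
            contacts: [
              { wa_id: "15551234567", profile: { name: "Alice" } },
            ],
          },
        },
      ],
    },
  ],
};

// The same extraction pattern the webhook handler uses.
const call = webhookBody.entry?.[0]?.changes?.[0]?.value?.calls?.[0];
const contact = webhookBody.entry?.[0]?.changes?.[0]?.value?.contacts?.[0];
```

Optional chaining keeps the handler from throwing when Meta sends a non-call event (for example, a message webhook) that lacks the `calls` array.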
Backend Walkthrough
Let’s now look at how the server handles everything.
1. Receiving WhatsApp Webhook
Meta sends a POST request to your configured webhook endpoint whenever a call is initiated or ends. In my case, I’ve used /call-events.
```javascript
app.post("/call-events", async (req, res) => {
  const entry = req.body?.entry?.[0];
  const call = entry?.changes?.[0]?.value?.calls?.[0];
  const contact = entry?.changes?.[0]?.value?.contacts?.[0];

  if (!call?.id || !call.event) return res.sendStatus(200);

  currentCallId = call.id;

  if (call.event === "connect") {
    whatsappOfferSdp = call.session?.sdp;
    io.emit("call-is-coming", {
      callId: call.id,
      callerName: contact?.profile?.name || "Unknown",
      callerNumber: contact?.wa_id || "Unknown",
    });
  }

  if (call.event === "terminate") {
    io.emit("call-ended");
  }

  res.sendStatus(200);
});
```

We extract the incoming SDP from WhatsApp’s webhook, save the call ID, and emit a `call-is-coming` event to the browser.
2. Waiting for Browser SDP Offer
Once the browser accepts the call, it creates a WebRTC offer and sends it via Socket.IO:
```javascript
socket.on("browser-offer", async (sdp) => {
  browserOfferSdp = sdp;
  browserSocket = socket;
  await maybeStartWebRTC();
});
```

We store the browser’s SDP and wait until both the browser and WhatsApp offers are available.
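That waiting logic can be sketched as a simple guard over module-level signalling state. The names below mirror the handlers in this article, but `startBridge` is a hypothetical injected callback (standing in for the bridge setup) so the sketch stays self-contained:

```javascript
// Module-level signalling state, as in the handlers above.
let browserOfferSdp = null;
let whatsappOfferSdp = null;
let bridgeStarted = false;

// Start the bridge only once BOTH offers have arrived, and only once.
async function maybeStartWebRTC(startBridge) {
  if (!browserOfferSdp || !whatsappOfferSdp || bridgeStarted) return false;
  bridgeStarted = true; // guard against duplicate webhook/socket events
  await startBridge(browserOfferSdp, whatsappOfferSdp);
  return true;
}
```

Whichever side arrives second (webhook or browser offer) triggers the bridge; the `bridgeStarted` flag prevents a retransmitted event from wiring up a second bridge mid-call.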
3. Connecting the Audio Streams
The `initiateWebRTCBridge` function is responsible for wiring everything together and acts as the core logic for media bridging.
What does `initiateWebRTCBridge` do?
- Checks if both browser and WhatsApp SDP offers are available.
- Creates two separate RTCPeerConnection instances: one for the browser and one for WhatsApp.
- Sets the WhatsApp SDP as the remote description for the WhatsApp peer.
- Sets the browser SDP as the remote description for the browser peer.
- Adds the audio tracks from one peer to the other.
- Creates SDP answers for both sides.
- Sends `pre_accept` followed by `accept` to WhatsApp using the generated answer.
- Establishes the audio flow between browser and WhatsApp user.
This function acts as the bridge coordinator, ensuring media negotiation and audio routing between two isolated WebRTC peers.
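The bridge code itself isn’t listed in this article, so here is a sketch of how the steps above could fit together. The dependencies `createPeer`, `answerWhatsApp`, and `sendAnswerToBrowser` are hypothetical injections — in the real server they would be `new wrtc.RTCPeerConnection(...)`, the `answerCallToWhatsApp` helper, and a Socket.IO emit respectively:

```javascript
// Sketch of the bridge coordinator. Dependencies are injected so the
// sequencing can be exercised without real media:
//   createPeer()            → a new RTCPeerConnection (wrtc on the server)
//   answerWhatsApp(id, sdp, action) → POST to Meta's /calls endpoint
//   sendAnswerToBrowser(sdp)        → Socket.IO emit back to the browser
async function initiateWebRTCBridge(deps, state) {
  const { createPeer, answerWhatsApp, sendAnswerToBrowser } = deps;
  const { browserOfferSdp, whatsappOfferSdp, callId } = state;

  // 1. Both offers must be present before bridging.
  if (!browserOfferSdp || !whatsappOfferSdp) return null;

  // 2. One peer connection per side.
  const browserPeer = createPeer();
  const whatsappPeer = createPeer();

  // 3./4. Each side's offer becomes the other peer's remote description.
  await whatsappPeer.setRemoteDescription({ type: "offer", sdp: whatsappOfferSdp });
  await browserPeer.setRemoteDescription({ type: "offer", sdp: browserOfferSdp });

  // 5. Forward incoming audio tracks across the two peers.
  browserPeer.ontrack = (e) => whatsappPeer.addTrack(e.track, e.streams[0]);
  whatsappPeer.ontrack = (e) => browserPeer.addTrack(e.track, e.streams[0]);

  // 6. Create and apply SDP answers for both sides.
  const waAnswer = await whatsappPeer.createAnswer();
  await whatsappPeer.setLocalDescription(waAnswer);
  const brAnswer = await browserPeer.createAnswer();
  await browserPeer.setLocalDescription(brAnswer);
  sendAnswerToBrowser(brAnswer.sdp);

  // 7. pre_accept must be sent before accept, or the API rejects the call.
  await answerWhatsApp(callId, waAnswer.sdp, "pre_accept");
  await answerWhatsApp(callId, waAnswer.sdp, "accept");

  return { browserPeer, whatsappPeer };
}
```

Treat this as a map of the control flow, not a drop-in implementation: the real bridge also has to munge the WhatsApp answer’s `a=setup:` attribute and relay ICE candidates.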
4. Answering WhatsApp Call via API
Once everything is ready, we respond to WhatsApp with this helper:
```javascript
function answerCallToWhatsApp(callId, sdp, action) {
  return axios.post(
    WHATSAPP_API_URL,
    {
      messaging_product: "whatsapp",
      call_id: callId,
      action,
      session: { sdp_type: "answer", sdp },
    },
    {
      headers: { Authorization: ACCESS_TOKEN },
    },
  );
}
```

This sends either the `pre_accept` or `accept` command to Meta.
Frontend Walkthrough (Browser)
The browser connects via Socket.IO, listens for incoming calls, and handles WebRTC setup.
1. Showing the Incoming Call UI
```javascript
socket.on("call-is-coming", ({ callId, callerName }) => {
  incomingCallId = callId;
  document.getElementById("caller-name").textContent = callerName;
  showModal();
});
```

A modal UI appears when a new WhatsApp call comes in.
2. Starting WebRTC After Accepting
Once the user accepts the call:
```javascript
async function startWebRTC() {
  pc = new RTCPeerConnection({ iceServers: [STUN] });

  pc.ontrack = (e) => {
    const audio = new Audio();
    audio.srcObject = e.streams[0];
    audio.autoplay = true;
    document.body.appendChild(audio);
  };

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  socket.emit("browser-offer", offer.sdp);
}
```

What this does: captures microphone input, prepares the WebRTC connection, and sends the browser’s SDP to the server.
Common Pitfalls & Debugging Tips
- `a=setup:actpass` must be changed to `a=setup:active` in WhatsApp SDP answers.
- If audio doesn’t flow, check ICE candidate exchange and TURN/STUN setup.
- If `accept` is sent before `pre_accept`, the API will reject the call.
- You must ensure the SDP answers are fully formed and accepted on both ends.
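The first tip can be implemented as a one-line SDP transform, applied to the WhatsApp-side answer before sending it to Meta (a sketch; `forceActiveSetup` is a hypothetical helper name, and where exactly you apply it depends on how your bridge finalizes the answer):

```javascript
// Rewrite the DTLS role attribute: replace the offer-style "actpass"
// with "active" in the answer SDP, per the pitfall noted above.
function forceActiveSetup(sdp) {
  return sdp.replace(/a=setup:actpass/g, "a=setup:active");
}
```

An SDP that already carries `a=setup:active` passes through unchanged, so the helper is safe to apply unconditionally.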
Final Thoughts
This project currently supports only user-initiated WhatsApp calls; business-initiated calls will be added to the GitHub repository at a later stage. That said, it serves as a simplified but complete demonstration of how to receive real-time WhatsApp voice calls in a browser using WebRTC and Node.js, and is meant to act as a foundational template for building more production-ready systems.
By combining WebRTC with the WhatsApp Calling API, we can build entirely new browser-based telephony experiences. This setup allows any user on your platform to receive real WhatsApp calls without ever touching their phone.
