PeerConnection::Close() is, per-spec, a blocking operation.
Unfortunately, PeerConnection is implemented to own resources used by
the network thread, and Close() - on the signaling thread - destroys
these resources. As such, tasks run in parallel like getStats() get into
race conditions with Close() unless synchronized. The mechanism in-place
is RTCStatsCollector::WaitForPendingRequest(), it waits until the
network thread is done with the in-parallel stats request.
Prior to this CL, this was implemented by performing
rtc::Thread::ProcessMessages() in a loop until the network thread had
posted a task on the signaling thread to say that it was done which
would then get processed by ProcessMessages(). In WebRTC this works, and
the test is RTCStatsIntegrationTest.GetsStatsWhileClosingPeerConnection.
But because Chromium's thread wrapper does no support
ProcessMessages(), calling getStats() followed by close() in Chrome
resulted in waiting forever (https://crbug.com/850907).
In this CL, the process messages loop is removed. Instead, the shared
resources are guarded by an rtc::Event. WaitForPendingRequest() still
blocks the signaling thread, but only while shared resources are in use
by the network thread. After this CL, calling WaitForPendingRequest() no
longer has any unexpected side-effects since it no longer processes
other messages that might have been posted on the thread.
The resource ownership and threading model of WebRTC deserves to be
revisited, but this fixes a common Chromium crash without redesigning
PeerConnection, in a way that does not cause more blocking than what
the other PeerConnection methods are already doing.
Note: An alternative to using rtc::Event is to use resource locks and
to not perform the stats collection on the network thread if the
request was cancelled before the start of processing, but this has very
little benefit in terms of performance: once the network thread starts
collecting the stats, it would use the lock until collection is
completed, blocking the signaling thread trying to acquire that lock
anyway. This defeats the purpose and is a riskier change, since
cancelling partial collection in this inherently racy edge-case would
have observable differences from the returned stats, which may cause
more regressions.
Bug: chromium:850907
Change-Id: Idceeee0bddc0c9d5518b58a2b263abb2bbf47cff
Reviewed-on: https://webrtc-review.googlesource.com/c/121567
Commit-Queue: Henrik Boström <hbos@webrtc.org>
Reviewed-by: Steve Anton <steveanton@webrtc.org>
Cr-Commit-Position: refs/heads/master@{#26707}