Flutter 1.17 shared engine

Preface

After upgrading Flutter to 1.17, our iOS app hit a crash in production. Symbolicating it with the official symbol file, flutter.dsym, gave the following stack trace:

0 auto fml::internal::CopyableLambda<flutter::Shell::OnPlatformViewCreated(std::__1::unique_ptr<flutter::Surface, std::__1::default_delete<flutter::Surface> >)::$_8>::operator()<>() const (in Flutter) (make_copyable.h:24)
1 auto fml::internal::CopyableLambda<flutter::Shell::OnPlatformViewCreated(std::__1::unique_ptr<flutter::Surface, std::__1::default_delete<flutter::Surface> >)::$_8>::operator()<>() const (in Flutter) (make_copyable.h:24)
2 fml::MessageLoopImpl::FlushTasks(fml::FlushType) (in Flutter) (message_loop_impl.cc:129)
3 fml::MessageLoopDarwin::OnTimerFire(__CFRunLoopTimer*, fml::MessageLoopDarwin*) (in Flutter) (message_loop_darwin.mm:76)
9 fml::MessageLoopDarwin::Run() (in Flutter) (message_loop_darwin.mm:47)
10 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, fml::Thread::Thread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::$_0> >(void*) (in Flutter) (thread:352)

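At this point all we can tell is that the crash is in the engine's C++ code; the cause is still unknown.

Locating the crash

From the event-tracking logs of the affected users we reconstructed what they did just before the crash: almost all of them opened a push notification that landed on a Flutter page.
The app's first tab is also a Flutter page, so a push launch opens two Flutter pages back to back.
Opening the app by hand and tapping into a Flutter page does not crash (if a path that simple crashed, we would have much bigger problems).
We were soon able to reproduce the crash along this path, and once a crash is reproducible the rest is manageable.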

Image

Debugging with the engine source narrows the crash down to a more specific location:

Image

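surface_ is null, and calling through it raises EXC_BAD_ACCESS (a null pointer dereference).

Analysis

With the exact code location known, the next step is to work out why the pointer is invalid.
When the crash hits in Xcode, the main thread's stack shows that the call chain starts from the application's didBecomeActive notification.
Since the app was launched by a push, that notification is expected. The crash itself happens on the shared engine's raster thread.

Here is the code: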

#pragma mark - Application lifecycle notifications

// The Flutter page on the app's home tab runs surfaceUpdated.

- (void)applicationBecameActive:(NSNotification*)notification {
  TRACE_EVENT0("flutter", "applicationBecameActive");
  if (_viewportMetrics.physical_width)
    [self surfaceUpdated:YES];
  [self goToApplicationLifecycle:@"AppLifecycleState.resumed"];
}

#pragma mark - Surface creation and teardown updates

- (void)surfaceUpdated:(BOOL)appeared {
  // NotifyCreated/NotifyDestroyed are synchronous and require hops between the UI and raster
  // thread.
  if (appeared) {
    [self installFirstFrameCallback];
    [_engine.get() platformViewsController] -> SetFlutterView(_flutterView.get());
    [_engine.get() platformViewsController] -> SetFlutterViewController(self);
    // Note this call.
    [_engine.get() platformView] -> NotifyCreated();
  } else {
    self.displayingFlutterUI = NO;
    [_engine.get() platformView] -> NotifyDestroyed();
    [_engine.get() platformViewsController] -> SetFlutterView(nullptr);
    [_engine.get() platformViewsController] -> SetFlutterViewController(nullptr);
  }
}


void PlatformView::NotifyCreated() {
  std::unique_ptr<Surface> surface;

  // Threading: We want to use the platform view on the non-platform thread.
  // Using the weak pointer is illegal. But, we are going to introduce a latch
  // so that the platform view is not collected till the surface is obtained.
  auto* platform_view = this;
  fml::ManualResetWaitableEvent latch;
  fml::TaskRunner::RunNowOrPostTask(
      task_runners_.GetRasterTaskRunner(), [platform_view, &surface, &latch]() {
        surface = platform_view->CreateRenderingSurface();
        latch.Signal();
      });
  latch.Wait();
  // Note this call.
  delegate_.OnPlatformViewCreated(std::move(surface));
}


// |PlatformView::Delegate|
void Shell::OnPlatformViewCreated(std::unique_ptr<Surface> surface) {
  TRACE_EVENT0("flutter", "Shell::OnPlatformViewCreated");
  FML_DCHECK(is_setup_);
  FML_DCHECK(task_runners_.GetPlatformTaskRunner()->RunsTasksOnCurrentThread());

  // Note:
  // This is a synchronous operation because certain platforms depend on
  // setup/suspension of all activities that may be interacting with the GPU in
  // a synchronous fashion.
  fml::AutoResetWaitableEvent latch;
  auto raster_task =
      fml::MakeCopyable([& waiting_for_first_frame = waiting_for_first_frame_,
                         rasterizer = rasterizer_->GetWeakPtr(),  //
                         surface = std::move(surface),            //
                         &latch]() mutable {
        if (rasterizer) {
          // Note this call.
          rasterizer->Setup(std::move(surface));
        }

        waiting_for_first_frame.store(true);

        // Step 3: All done. Signal the latch that the platform thread is
        // waiting on.
        latch.Signal();
      });

  ...
}


void Rasterizer::Setup(std::unique_ptr<Surface> surface) {
  surface_ = std::move(surface);
  if (max_cache_bytes_.has_value()) {
    SetResourceCacheMaxBytes(max_cache_bytes_.value(),
                             user_override_resource_cache_bytes_);
  }
  compositor_context_->OnGrContextCreated();
  // surface_ is null here: BAD_ACCESS
  if (surface_->GetExternalViewEmbedder()) {
    const auto platform_id =
        task_runners_.GetPlatformTaskRunner()->GetTaskQueueId();
    const auto gpu_id = task_runners_.GetRasterTaskRunner()->GetTaskQueueId();
    raster_thread_merger_ =
        fml::MakeRefCounted<fml::RasterThreadMerger>(platform_id, gpu_id);
  }
}
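
The synchronous hop that PlatformView::NotifyCreated performs can be illustrated outside the engine. The following is a minimal sketch in standard C++, not engine code: Latch, FakeSurface, CreateSurfaceOnWorker, and the backing_available flag are hypothetical names standing in for the engine's waitable event, Surface, and raster task. It only shows that the caller blocks until the worker has filled in the pointer, and that the pointer passed on afterwards can still be null if the worker had nothing to create a surface from.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the engine's Surface type.
struct FakeSurface {};

// Hypothetical one-shot latch, similar in spirit to a manual-reset event.
class Latch {
 public:
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_ = true;
    }
    cv_.notify_all();
  }

  void Wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Creates a surface on a worker thread and blocks the caller until done.
// `backing_available` models whether the platform side still has something
// to build a surface from; when it is false the result is a null pointer.
std::unique_ptr<FakeSurface> CreateSurfaceOnWorker(bool backing_available) {
  std::unique_ptr<FakeSurface> surface;
  Latch latch;
  std::thread worker([&surface, &latch, backing_available] {
    if (backing_available) {
      surface = std::make_unique<FakeSurface>();
    }
    latch.Signal();
  });
  latch.Wait();   // The caller resumes only after the worker has signaled.
  worker.join();  // Keep the latch alive until the worker has fully exited.
  return surface;  // May be null; the caller has to check.
}

// Example usage: both outcomes are possible, so callers must handle null.
inline void CheckBothOutcomes() {
  assert(CreateSurfaceOnWorker(true) != nullptr);
  assert(CreateSurfaceOnWorker(false) == nullptr);
}
```

The latch guarantees ordering between the two threads, but it says nothing about whether the value produced is usable, which is why a null check on the receiving side still matters.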

Following this path, how can surface_ end up null? Normally the surface is destroyed when [self surfaceUpdated:NO] runs, but a breakpoint there was never hit.
Let's keep reading the code:


// When the push landing page (a Flutter page) is initialized, it re-attaches
// to the engine, which runs setViewController.

- (void)setViewController:(FlutterViewController*)viewController {
  FML_DCHECK(self.iosPlatformView);
  _viewController =
      viewController ? [viewController getWeakPtr] : fml::WeakPtr<FlutterViewController>();
  // Note this call.
  self.iosPlatformView->SetOwnerViewController(_viewController);
  [self maybeSetupPlatformViewChannels];

  if (viewController) {
    __block FlutterEngine* blockSelf = self;
    self.flutterViewControllerWillDeallocObserver =
        [[NSNotificationCenter defaultCenter] addObserverForName:FlutterViewControllerWillDealloc
                                                          object:viewController
                                                           queue:[NSOperationQueue mainQueue]
                                                      usingBlock:^(NSNotification* note) {
                                                        [blockSelf notifyViewControllerDeallocated];
                                                      }];
  } else {
    self.flutterViewControllerWillDeallocObserver = nil;
  }
}


void PlatformViewIOS::SetOwnerViewController(fml::WeakPtr<FlutterViewController> owner_controller) {
  FML_DCHECK(task_runners_.GetPlatformTaskRunner()->RunsTasksOnCurrentThread());
  std::lock_guard<std::mutex> guard(ios_surface_mutex_);
  // This condition is the key.
  if (ios_surface_ || !owner_controller) {
    // This destroys the surface.
    NotifyDestroyed();
    ios_surface_.reset();
    accessibility_bridge_.reset();
  }
  owner_controller_ = owner_controller;

  // Add an observer that will clear out the owner_controller_ ivar and
  // the accessibility_bridge_ in case the view controller is deleted.
  dealloc_view_controller_observer_.reset(
      [[[NSNotificationCenter defaultCenter] addObserverForName:FlutterViewControllerWillDealloc
                                                         object:owner_controller_.get()
                                                          queue:[NSOperationQueue mainQueue]
                                                     usingBlock:^(NSNotification* note) {
                                                       // Implicit copy of 'this' is fine.
                                                       accessibility_bridge_.reset();
                                                       owner_controller_.reset();
                                                     }] retain]);

  if (owner_controller_ && [owner_controller_.get() isViewLoaded]) {
    this->attachView();
  }
  // Do not call `NotifyCreated()` here - let FlutterViewController take care
  // of that when its Viewport is sized. If `NotifyCreated()` is called here,
  // it can occasionally get invoked before the viewport is sized resulting in
  // a framebuffer that will not be able to completely attach.
}

void PlatformView::NotifyDestroyed() {
  delegate_.OnPlatformViewDestroyed();
}

// |PlatformView::Delegate|
void Shell::OnPlatformViewDestroyed() {
  TRACE_EVENT0("flutter", "Shell::OnPlatformViewDestroyed");
  FML_DCHECK(is_setup_);
  FML_DCHECK(task_runners_.GetPlatformTaskRunner()->RunsTasksOnCurrentThread());

  // Note:
  // This is a synchronous operation because certain platforms depend on
  // setup/suspension of all activities that may be interacting with the GPU in
  // a synchronous fashion.

  fml::AutoResetWaitableEvent latch;

  auto io_task = [io_manager = io_manager_.get(), &latch]() {
    // Execute any pending Skia object deletions while GPU access is still
    // allowed.
    io_manager->GetIsGpuDisabledSyncSwitch()->Execute(
        fml::SyncSwitch::Handlers().SetIfFalse(
            [&] { io_manager->GetSkiaUnrefQueue()->Drain(); }));
    // Step 3: All done. Signal the latch that the platform thread is waiting
    // on.
    latch.Signal();
  };

  auto raster_task = [rasterizer = rasterizer_->GetWeakPtr(),
                      io_task_runner = task_runners_.GetIOTaskRunner(),
                      io_task]() {
    if (rasterizer) {
      // Note this call.
      rasterizer->Teardown();
    }
    // Step 2: Next, tell the IO thread to complete its remaining work.
    fml::TaskRunner::RunNowOrPostTask(io_task_runner, io_task);
  };

  ...
}


void Rasterizer::Teardown() {
  compositor_context_->OnGrContextDestroyed();
  // surface_ is reset here.
  surface_.reset();
  last_layer_tree_.reset();
}

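So the cause is this: when the landing page is initialized it re-attaches to the engine, and NotifyDestroyed ends up destroying the surface, just as the raster thread makes a method call through surface_.

Fix

With the cause identified, the fix is simple: add a null check and return immediately if surface_ is null.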
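
The ordering problem can be modelled in a few lines of standard C++. This is a simplified, single-threaded sketch rather than engine code: ToyRasterizer, ToySurface, and ReplayPushLaunch are hypothetical names, and the order of calls is an assumption based on the reproduction path described earlier (the home page sets up a surface, then the landing page re-attaches and tears it down).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for flutter::Surface.
struct ToySurface {};

// Hypothetical model of how the rasterizer owns the surface.
class ToyRasterizer {
 public:
  void Setup(std::unique_ptr<ToySurface> surface) {
    surface_ = std::move(surface);
  }
  void Teardown() { surface_.reset(); }
  bool HasSurface() const { return surface_ != nullptr; }

 private:
  std::unique_ptr<ToySurface> surface_;
};

// Replays the assumed order of events and records, after each step,
// whether the rasterizer still holds a surface.
std::vector<std::string> ReplayPushLaunch() {
  std::vector<std::string> log;
  ToyRasterizer rasterizer;
  auto record = [&](const std::string& step) {
    log.push_back(step + (rasterizer.HasSurface() ? ": surface" : ": null"));
  };

  rasterizer.Setup(std::make_unique<ToySurface>());
  record("home page created");

  rasterizer.Teardown();  // Triggered by the landing page re-attaching.
  record("landing page attached");

  // Any code that dereferences the surface at this point would crash.
  return log;
}

// Example usage: the surface is gone after the second step.
inline void CheckReplay() {
  const std::vector<std::string> log = ReplayPushLaunch();
  assert(log.size() == 2);
  assert(log[1] == "landing page attached: null");
}
```

The real engine spreads these steps across the platform and raster threads, so the sketch captures only the order of ownership changes and not the timing.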

void Rasterizer::Setup(std::unique_ptr<Surface> surface) {
  surface_ = std::move(surface);

  if (!surface_) {
    FML_DLOG(INFO) << "Rasterizer::Setup called with no surface.";
    return;
  }

  if (max_cache_bytes_.has_value()) {
    SetResourceCacheMaxBytes(max_cache_bytes_.value(),
                             user_override_resource_cache_bytes_);
  }
  compositor_context_->OnGrContextCreated();
  if (surface_->GetExternalViewEmbedder()) {
    const auto platform_id =
        task_runners_.GetPlatformTaskRunner()->GetTaskQueueId();
    const auto gpu_id = task_runners_.GetRasterTaskRunner()->GetTaskQueueId();
    raster_thread_merger_ =
        fml::MakeRefCounted<fml::RasterThreadMerger>(platform_id, gpu_id);
  }
}
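
The effect of the guard can be checked in isolation. The sketch below is a toy model in standard C++, not the engine's Rasterizer: GuardedRasterizer, ToySurface, and HasExternalViewEmbedder are hypothetical, and Setup returns a bool here purely so that the outcome can be observed from outside.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for flutter::Surface.
struct ToySurface {
  bool HasExternalViewEmbedder() const { return false; }
};

class GuardedRasterizer {
 public:
  // Returns true when setup ran to completion, false when it bailed out
  // because no surface was supplied.
  bool Setup(std::unique_ptr<ToySurface> surface) {
    surface_ = std::move(surface);
    if (!surface_) {
      return false;  // Nothing to set up; skip every later dereference.
    }
    uses_embedder_ = surface_->HasExternalViewEmbedder();
    return true;
  }

 private:
  std::unique_ptr<ToySurface> surface_;
  bool uses_embedder_ = false;
};

// Example usage: a null surface is rejected instead of crashing.
inline void CheckGuard() {
  GuardedRasterizer rasterizer;
  assert(!rasterizer.Setup(nullptr));
  assert(rasterizer.Setup(std::make_unique<ToySurface>()));
}
```

In this model the early return avoids the crash but leaves the rasterizer without a surface until a later Setup call supplies one, so the guard treats the symptom rather than the ordering that produced the null pointer.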

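This changes the engine source directly, so the engine artifacts have to be rebuilt and swapped in for the stock ones.

Other notes

We have not verified whether the engine already had this problem in 1.12.13, the version we used before 1.17; we will check when time allows.
We plan to file an issue and a PR on GitHub to see how the Flutter team views the problem. There are probably other ways to fix it.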