Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Atomic KMS platform #3525

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open

Initial Atomic KMS platform #3525

wants to merge 24 commits into from

Conversation

RAOF
Copy link
Contributor

@RAOF RAOF commented Aug 5, 2024

This is a minimal-changes copy of the display half of the gbm-kms platform, switched over to all Atomic KMS API.

It's parallel installable with the existing gbm-kms platform, and will be preferentially used on devices with atomic support.

Currently, over the existing gbm-kms display platform it only has improved gamma support.
Immediate missing features:

  • Hardware cursor support
  • Querying list of supported pixel formats

@RAOF RAOF requested a review from a team as a code owner August 5, 2024 07:16
@RAOF RAOF force-pushed the initial-atomic-kms-platform branch from 34f45ee to 8bf1aca Compare August 8, 2024 03:37
@RAOF RAOF force-pushed the initial-atomic-kms-platform branch from 8bf1aca to 280f6ed Compare September 9, 2024 02:05
@RAOF RAOF changed the base branch from main to MIRENG-653/platform-specific-options September 9, 2024 02:06
@RAOF RAOF force-pushed the MIRENG-653/platform-specific-options branch from 415a9c3 to 9367dc7 Compare September 9, 2024 05:42
@RAOF RAOF force-pushed the initial-atomic-kms-platform branch 2 times, most recently from a20a6cf to ae1a6ac Compare September 10, 2024 04:45
@RAOF RAOF force-pushed the MIRENG-653/platform-specific-options branch from 5297983 to 7fd62d0 Compare September 10, 2024 08:04
@RAOF RAOF force-pushed the initial-atomic-kms-platform branch 2 times, most recently from d172bfa to 6e64d62 Compare September 12, 2024 07:48
@Saviq
Copy link
Collaborator

Saviq commented Sep 24, 2024

Hmm with dual outputs I get:

< - ERROR - > atomic-kms: Failed to set CRTC: No space left on device (28)

And an occasional SIGABRT after.

Yes. I. Have. Disk. Space.

@Saviq
Copy link
Collaborator

Saviq commented Sep 24, 2024

@tarek-y-ismail's ploughed through drm debug logs and found:

[drm:drm_atomic_plane_check] [PLANE:32:plane 1A] invalid source coordinates 2256.000000x1504.000000+0.000000+0.000000 (fb 1920x1200)
[drm:drm_atomic_check_only] [PLANE:32:plane 1A] atomic core check failed

This suggests there's a problem with cloning, as it tries to map the smaller buffer onto a higher resolution display.

@tarek-y-ismail
Copy link
Contributor

tarek-y-ismail commented Sep 30, 2024

Hmm with dual outputs I get:

By "dual outputs", do you mean builtin-monitor + external monitor, or two external monitors?
I don't get this error, but I also don't get external output (no parameters passed except --platform-display-libs=mir:atomic-kms --platform-rendering-libs=mir:gbm-kms)

@Saviq
Copy link
Collaborator

Saviq commented Sep 30, 2024

By "dual outputs", do you mean builtin-monitor + external monitor, or two external monitors?

Built-in + external.

I don't get this error, but I also don't get external output.

As discussed, to get output through your Nvidia GPU, you'll need ,mir:eglstream-kms added to your display plugins.

@tarek-y-ismail
Copy link
Contributor

Just tried it, both monitors work fine. I assume this is because the second output is not using atomic-kms and thus not triggering the bug

@Saviq
Copy link
Collaborator

Saviq commented Sep 30, 2024

Just tried it, both monitors work fine. I assume this is because the second output is not using atomic-kms and thus not triggering the bug

Yup, that could well be it - they're not sharing the platform like in my case.

@Saviq Saviq added the triaged Triage into JIRA to plan it in label Oct 7, 2024
Base automatically changed from MIRENG-653/platform-specific-options to main October 8, 2024 11:49
@Saviq
Copy link
Collaborator

Saviq commented Oct 9, 2024

Trying heterogeneous refresh rates I got:

[2024-10-09 18:19:28.744101] <information> mirserver: Initial display configuration:
[2024-10-09 18:19:28.744134] <information> mirserver: * Output 1: eDP connected, used
[2024-10-09 18:19:28.744156] <information> mirserver: . |_ EDID manufacturer: BOE
[2024-10-09 18:19:28.744172] <information> mirserver: . |_ EDID product code: 2399
[2024-10-09 18:19:28.744190] <information> mirserver: . |_ Physical size 13.3" 280x190mm
[2024-10-09 18:19:28.744203] <information> mirserver: . |_ Power is on
[2024-10-09 18:19:28.744216] <information> mirserver: . |_ Current mode 2256x1504 59.99Hz
[2024-10-09 18:19:28.744228] <information> mirserver: . |_ Preferred mode 2256x1504 59.99Hz
[2024-10-09 18:19:28.744240] <information> mirserver: . |_ Orientation normal
[2024-10-09 18:19:28.744255] <information> mirserver: . |_ Logical size 2256x1504
[2024-10-09 18:19:28.744268] <information> mirserver: . |_ Logical position +0+0
[2024-10-09 18:19:28.744280] <information> mirserver: . |_ Scaling factor: 1.00
[2024-10-09 18:19:28.744292] <information> mirserver: * Output 2: DisplayPort connected, used
[2024-10-09 18:19:28.744305] <information> mirserver: . |_ EDID monitor name: DELL U2413
[2024-10-09 18:19:28.744326] <information> mirserver: . |_ EDID manufacturer: DEL
[2024-10-09 18:19:28.744338] <information> mirserver: . |_ EDID product code: 61512
[2024-10-09 18:19:28.744351] <information> mirserver: . |_ Physical size 24.0" 520x320mm
[2024-10-09 18:19:28.744364] <information> mirserver: . |_ Power is on
[2024-10-09 18:19:28.744377] <information> mirserver: . |_ Current mode 1920x1080 23.97Hz
[2024-10-09 18:19:28.744403] <information> mirserver: . |_ Preferred mode 1920x1200 59.95Hz
[2024-10-09 18:19:28.744416] <information> mirserver: . |_ Orientation normal
[2024-10-09 18:19:28.744433] <information> mirserver: . |_ Logical size 1920x1080
[2024-10-09 18:19:28.744445] <information> mirserver: . |_ Logical position +2256+0
[2024-10-09 18:19:28.744457] <information> mirserver: . |_ Scaling factor: 1.00
[2024-10-09 18:19:28.744474] <information> mirserver: * Output 3: DisplayPort disconnected
[2024-10-09 18:19:28.744487] <information> mirserver: * Output 4: DisplayPort disconnected
[2024-10-09 18:19:28.744503] <information> mirserver: * Output 5: DisplayPort disconnected
[2024-10-09 18:19:28.744747] < - debug - > miral: Configuring pointer: 'basic-window-manager'
[2024-10-09 18:19:28.744814] <information> input-hub: Device configuration: basic-window-manager, capabilities={pointer}
[2024-10-09 18:19:28.761780] < - ERROR - > atomic-kms: Failed to set CRTC: Invalid argument (22)
!!! Fatal signal received. Attempting cleanup, but deadlock may occur
Mir fatal error: Unsupported attempt to continue after a fatal signal: SIGSEGV
!!! Fatal signal received. Attempting cleanup, but deadlock may occur
Mir fatal error: Unsupported attempt to continue after a fatal signal: SIGABRT

drm.log

RAOF and others added 5 commits October 15, 2024 17:55
This is initially a copy of the display half of `gbm-kms`, quickly ported to use
only the atomic KMS APIs.

It shall be further developed to usefully use the atomic APIs to fix various TODOs,
and provide support for extra performance features
`DisplayConfigurationOutput.pixel_formats` now contains the list of
accepted pixel formats (or, at least, those pixel formats that are
representable in the `MirPixelFormats` enum; many aren't).

---------

Co-authored-by: tarek-y-ismail <tarek.ismail@canonical.com>
Co-authored-by: Alan Griffiths <alan@octopull.co.uk>
@RAOF
Copy link
Contributor Author

RAOF commented Oct 17, 2024

Oh! It's not heterogeneous refresh rates, it's heterogeneous resolutions!

The dance required to actually disable an output (and free its
resources) is surprisingly involved.
We might not actually have a `current_crtc` when `set_power_mode` is called
(notably, when disabling an output we set `mir_power_mode_off`).

Rather than crashing, log an error if we try and turn on an unconfigured
output. Silently ignore trying to turn *off* an unconfigured output,
as it's already off.
The way changing the display mode works is that the
`configure` sets the desired `mode_index`, and then
the *next* `set_crtc` is meant to actually set the mode.

This means that we can wait for correctly-sized content
to present at the new mode, rather than showing a single
black frame before the correct content.

But it also means that `set_crtc` needs to look at
the requested mode, rather than the current mode!
We were trying to pageflip, so let's say that.
Currently we expect to only flip framebuffers that are
the same size, in pixels, as the output mode.

Check in `schedule_page_flip` that this holds. Otherwise,
we must be expecting a modeset and need to go through
the `set_crtc` codepath.

We might, later, update the system to handle using the
display scalers to present framebuffers that aren't the
same size as the output mode.
…SOutput.

This is a different behaviour to `gbm-kms`. On `gbm-kms`, when outputs
have an overlapping view of the logical space they are grouped
together so that they share a single *physical* framebuffer.

When the overlap is great (such as a complete clone), this
should result in lower GPU memory usage (as we need only a single
set of framebuffers for all clones) at the cost of tying
the refresh rates of each clone together. (And some bugs, like
#3641).

When the overlap is *not* great, this potential memory saving
goes away (and may indeed result in *higher* GPU memory usage -
if outputs have different sizes, we may now need to have a
bunch of unused pixels, as the FB can only be rectangular).

For Atomic KMS, instead, we give each output its own physical
framebuffer, regardless of whether it overlaps.
@Saviq
Copy link
Collaborator

Saviq commented Oct 18, 2024

So we're clear on the issue I'm seeing still:

1000010352.mp4

@RAOF
Copy link
Contributor Author

RAOF commented Oct 22, 2024

AHA! It took writing a script to try and reproduce this, but I now can. (The relevant magic is that you need to trigger the modeset from the terminal that will be hung)

What's happening here is that the final frame submitted by the terminal is racing the modeset sequence.

When it stops updating, what's happening is the terminal is waiting on a frame event from its last-buffer-before-modeset before submitting any new frames. Mir is not sending a frame event until something else triggers a composition.

So this is a problem somewhere in our buffer queuing, I think.

Copy link
Contributor

@AlanGriffiths AlanGriffiths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of surface cruft that needs cleaning away. At least one more review pass needed

src/platforms/atomic-kms/include/gbm_format_conversions.h Outdated Show resolved Hide resolved
src/platforms/atomic-kms/include/gbm_format_conversions.h Outdated Show resolved Hide resolved
src/platforms/atomic-kms/include/gbm_format_conversions.h Outdated Show resolved Hide resolved
src/platforms/atomic-kms/server/display_helpers.cpp Outdated Show resolved Hide resolved
src/platforms/atomic-kms/server/kms/kms_output_container.h Outdated Show resolved Hide resolved
src/platforms/atomic-kms/server/kms/kms_page_flipper.h Outdated Show resolved Hide resolved
src/platforms/atomic-kms/server/kms/quirks.h Outdated Show resolved Hide resolved
Copy link
Contributor

@AlanGriffiths AlanGriffiths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to work with one or two monitors (I'm using Miriway/mir-pr3525)

However, when I plug or unplug an external monitor I get:

<- ERROR - > atomic-kms: Failed to schedule page flip: Invalid arguement (22)
Fatal signal received
...

[update 0]

Built Mir locally, with the "cleanup" code disabled, it didn't crash as consistently, but did after a couple of cycles.

The core wasn't as useful as I hoped: it died in an "out of order" shutdown (which is secondary to the problem causing the shutdown, but also worthy of investigation).

[update 1]

Got ssh working (with Saviq's kit), and reproduced under the debugger...

But the "page flip" message wasn't hit. Just the "out of order" shotdown

Mir fatal error: XWaylandConnector was not stopped before being destroyed (is_started: yes, spawner: exists, server: null, wm: exists, wm_event_thread: null)

But still not clear why we're shutting down.

[update 2]

There's an exception thrown...

(gdb) bt
#0  0x00007475508bb35a in __cxxabiv1::__cxa_throw (obj=0x5717b6dcfc70, tinfo=0x74754f10d508 <typeinfo for boost::wrapexcept<std::system_error>>, dest=0x74754f0383ae <boost::wrapexcept<std::system_error>::~wrapexcept()>)
    at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:81
#1  0x000074754f0384e2 in boost::throw_exception<std::system_error> (e=..., loc=...) at /usr/include/boost/throw_exception.hpp:171
#2  0x000074754f02f798 in (anonymous namespace)::PropertyBlobData::PropertyBlobData (this=0x7fff9b0ef078, drm_fd=..., handle=105) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp:76
#3  0x000074754f032882 in mir::graphics::atomic::AtomicKMSOutput::update_from_hardware_state (this=0x5717b6682df0, output=...) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp:761
#4  0x000074754f0738a5 in operator()<std::shared_ptr<mir::graphics::atomic::KMSOutput> > (__closure=0x7fff9b0ef4c0, output=std::shared_ptr<mir::graphics::atomic::KMSOutput> (use count 4, weak count 0) = {...})
    at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/real_kms_display_configuration.cpp:183
#5  0x000074754f07442d in std::__invoke_impl<void, mir::graphics::atomic::RealKMSDisplayConfiguration::update()::<lambda(const auto:42&)>&, const std::shared_ptr<mir::graphics::atomic::KMSOutput>&>(std::__invoke_other, struct {...} &) (__f=...)
    at /usr/include/c++/14/bits/invoke.h:61
#6  0x000074754f074296 in std::__invoke_r<void, mir::graphics::atomic::RealKMSDisplayConfiguration::update()::<lambda(const auto:42&)>&, const std::shared_ptr<mir::graphics::atomic::KMSOutput>&>(struct {...} &) (__fn=...)
    at /usr/include/c++/14/bits/invoke.h:111
#7  0x000074754f073f83 in std::_Function_handler<void(const std::shared_ptr<mir::graphics::atomic::KMSOutput>&), mir::graphics::atomic::RealKMSDisplayConfiguration::update()::<lambda(const auto:42&)> >::_M_invoke(const std::_Any_data &, const std::shared_ptr<mir::graphics::atomic::KMSOutput> &) (__functor=..., __args#0=std::shared_ptr<mir::graphics::atomic::KMSOutput> (use count 4, weak count 0) = {...}) at /usr/include/c++/14/bits/std_function.h:290
#8  0x000074754f07faaf in std::function<void(std::shared_ptr<mir::graphics::atomic::KMSOutput> const&)>::operator() (this=0x7fff9b0ef4c0, __args#0=std::shared_ptr<mir::graphics::atomic::KMSOutput> (use count 4, weak count 0) = {...})
    at /usr/include/c++/14/bits/std_function.h:591
#9  0x000074754f07efb0 in mir::graphics::atomic::RealKMSOutputContainer::for_each_output (this=0x5717b679c7a0, functor=...) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/real_kms_output_container.cpp:36
#10 0x000074754f073a01 in mir::graphics::atomic::RealKMSDisplayConfiguration::update (this=0x5717b665c0f8) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/real_kms_display_configuration.cpp:161
#11 0x000074754f05e767 in mir::graphics::atomic::Display::configuration (this=0x5717b665c050) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/display.cpp:193
#12 0x000074754fcf5a81 in mir::graphics::MultiplexingDisplay::configuration (this=0x5717b66f3a60) at /home/alan/CLionProjects/mir/src/server/graphics/multiplexing_display.cpp:248
#13 0x000074754fab63fc in mir::DisplayServer::Private::configure_display (this=0x5717b65bd880) at /home/alan/CLionProjects/mir/src/server/display_server.cpp:191
#14 0x000074754fab5347 in mir::DisplayServer::Private::Private(mir::ServerConfiguration&)::{lambda()#1}::operator()() const (__closure=0x7474f817b670) at /home/alan/CLionProjects/mir/src/server/display_server.cpp:91
#15 0x000074754fab952c in std::__invoke_impl<void, mir::DisplayServer::Private::Private(mir::ServerConfiguration&)::{lambda()#1}&>(std::__invoke_other, mir::DisplayServer::Private::Private(mir::ServerConfiguration&)::{lambda()#1}&) (__f=...)
    at /usr/include/c++/14/bits/invoke.h:61
#16 0x000074754fab8bbb in std::__invoke_r<void, mir::DisplayServer::Private::Private(mir::ServerConfiguration&)::{lambda()#1}&>(mir::DisplayServer::Private::Private(mir::ServerConfiguration&)::{lambda()#1}&) (__fn=...)
    at /usr/include/c++/14/bits/invoke.h:111
#17 0x000074754fab7996 in std::_Function_handler<void (), mir::DisplayServer::Private::Private(mir::ServerConfiguration&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/14/bits/std_function.h:290
#18 0x000074754f063bce in std::function<void()>::operator() (this=0x7474f817b670) at /usr/include/c++/14/bits/std_function.h:591
#19 0x000074754f05ea16 in operator() (__closure=0x7474f817b670, type=mir::udev::Monitor::CHANGED, device=...) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/display.cpp:230
#20 0x000074754f0612e6 in std::__invoke_impl<void, mir::graphics::atomic::Display::register_configuration_change_handler(mir::graphics::EventHandlerRegister&, const mir::graphics::DisplayConfigurationChangeHandler&)::<lambda(int)>::<lambda(mir::udev::Monitor::EventType, const mir::udev::Device&)>&, mir::udev::Monitor::EventType, const mir::udev::Device&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/14/bits/invoke.h:61
#21 0x000074754f060b59 in std::__invoke_r<void, mir::graphics::atomic::Display::register_configuration_change_handler(mir::graphics::EventHandlerRegister&, const mir::graphics::DisplayConfigurationChangeHandler&)::<lambda(int)>::<lambda(mir::udev::Monitor::EventType, const mir::udev::Device&)>&, mir::udev::Monitor::EventType, const mir::udev::Device&>(struct {...} &) (__fn=...) at /usr/include/c++/14/bits/invoke.h:111
#22 0x000074754f060483 in std::_Function_handler<void(mir::udev::Monitor::EventType, const mir::udev::Device&), mir::graphics::atomic::Display::register_configuration_change_handler(mir::graphics::EventHandlerRegister&, const mir::graphics::DisplayConfigurationChangeHandler&)::<lambda(int)>::<lambda(mir::udev::Monitor::EventType, const mir::udev::Device&)> >::_M_invoke(const std::_Any_data &, mir::udev::Monitor::EventType &&, const mir::udev::Device &)
    (__functor=..., __args#0=@0x7fff9b0ef7f4: mir::udev::Monitor::CHANGED, __args#1=...) at /usr/include/c++/14/bits/std_function.h:290
#23 0x00007475503aa2ec in std::function<void(mir::udev::Monitor::EventType, mir::udev::Device const&)>::operator() (this=0x7fff9b0ef890, __args#0=mir::udev::Monitor::CHANGED, __args#1=...) at /usr/include/c++/14/bits/std_function.h:591
#24 0x00007475503a95bd in mir::udev::Monitor::process_events (this=0x5717b665c0c0, handler=...) at /home/alan/CLionProjects/mir/src/platform/udev/udev_wrapper.cpp:355
#25 0x000074754f05eac2 in operator() (__closure=0x5717b6c179a0) at /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/display.cpp:225
#26 0x000074754f061a80 in std::__invoke_impl<void, mir::graphics::atomic::Display::register_configuration_change_handler(mir::graphics::EventHandlerRegister&, const mir::graphics::DisplayConfigurationChangeHandler&)::<lambda(int)>&, int>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/14/bits/invoke.h:61
#27 0x000074754f061467 in std::__invoke_r<void, mir::graphics::atomic::Display::register_configuration_change_handler(mir::graphics::EventHandlerRegister&, const mir::graphics::DisplayConfigurationChangeHandler&)::<lambda(int)>&, int>(struct {...} &)
    (__fn=...) at /usr/include/c++/14/bits/invoke.h:111
#28 0x000074754f060c9f in std::_Function_handler<void(int), mir::graphics::atomic::Display::register_configuration_change_handler(mir::graphics::EventHandlerRegister&, const mir::graphics::DisplayConfigurationChangeHandler&)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&) (__functor=..., __args#0=@0x7fff9b0ef994: 117) at /usr/include/c++/14/bits/std_function.h:290
#29 0x000074754fad493a in std::function<void(int)>::operator() (this=0x5717b6c55c60, __args#0=117) at /usr/include/c++/14/bits/std_function.h:591
#30 0x000074754faccef7 in operator() (__closure=0x5717b6dfabf0, fd=117) at /home/alan/CLionProjects/mir/src/server/glib_main_loop.cpp:232
#31 0x000074754fad22ee in std::__invoke_impl<void, mir::GLibMainLoop::register_fd_handler(std::initializer_list<int>, void const*, mir::UniqueModulePtr<std::function<void(int)> >)::<lambda(int)>&, int>(std::__invoke_other, struct {...} &) (__f=...)
    at /usr/include/c++/14/bits/invoke.h:61
#32 0x000074754fad0e41 in std::__invoke_r<void, mir::GLibMainLoop::register_fd_handler(std::initializer_list<int>, void const*, mir::UniqueModulePtr<std::function<void(int)> >)::<lambda(int)>&, int>(struct {...} &) (__fn=...)
    at /usr/include/c++/14/bits/invoke.h:111
#33 0x000074754facf8e9 in std::_Function_handler<void(int), mir::GLibMainLoop::register_fd_handler(std::initializer_list<int>, void const*, mir::UniqueModulePtr<std::function<void(int)> >)::<lambda(int)> >::_M_invoke(const std::_Any_data &, int &&)
    (__functor=..., __args#0=@0x7fff9b0efa94: 117) at /usr/include/c++/14/bits/std_function.h:290
#34 0x000074754fad493a in std::function<void(int)>::operator() (this=0x5717b6e14a80, __args#0=117) at /usr/include/c++/14/bits/std_function.h:591
#35 0x000074754fadf106 in mir::detail::FdSources::FdContext::static_call (fd=117, ctx=0x5717b6e14a80) at /home/alan/CLionProjects/mir/src/server/glib_main_loop_sources.cpp:390
#36 0x0000747550c27397 in g_main_dispatch (context=0x5717b65ad090) at ../../../glib/gmain.c:3357
#37 0x0000747550c87dc7 in g_main_context_dispatch_unlocked (context=0x5717b65ad090) at ../../../glib/gmain.c:4208
#38 g_main_context_iterate_unlocked.isra.0 (context=context@entry=0x5717b65ad090, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../../../glib/gmain.c:4273
#39 0x0000747550c268b3 in g_main_context_iteration (context=0x5717b65ad090, may_block=1) at ../../../glib/gmain.c:4338
#40 0x000074754facc407 in mir::GLibMainLoop::run (this=0x5717b65aa7b0) at /home/alan/CLionProjects/mir/src/server/glib_main_loop.cpp:142
#41 0x000074754fab49ac in mir::DisplayServer::run (this=0x7fff9b0efd20) at /home/alan/CLionProjects/mir/src/server/display_server.cpp:241
#42 0x000074754faaafdc in mir::run_mir (config=..., init=..., terminator_=...) at /home/alan/CLionProjects/mir/src/server/run_mir.cpp:281
#43 0x000074754fae8cbc in mir::Server::run (this=0x5717b6516770) at /home/alan/CLionProjects/mir/src/server/server.cpp:410
#44 0x0000747550f183c3 in miral::MirRunner::Self::run_with (this=0x5717b6514d30, options=std::initializer_list of length 14 = {...}) at /home/alan/CLionProjects/mir/src/miral/runner.cpp:183
#45 0x0000747550f18c0c in miral::MirRunner::run_with (this=0x7fff9b0f0010, options=std::initializer_list of length 14 = {...}) at /home/alan/CLionProjects/mir/src/miral/runner.cpp:254
#46 0x00005717ad34c251 in main (argc=2, argv=0x7fff9b0f0658) at /home/alan/CLionProjects/mir/examples/miral-shell/shell_main.cpp:136

[update 3]

This patch avoids the crash, just need to understand the context to decide how sane it is...

diff --git a/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp b/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp
index 275aa7d320..e35eecf025 100644
--- a/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp
+++ b/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp
@@ -758,16 +758,23 @@ void mga::AtomicKMSOutput::update_from_hardware_state(
     GammaCurves gamma;
     if (current_crtc && crtc_props->has_property("GAMMA_LUT") && crtc_props->has_property("GAMMA_LUT_SIZE"))
     {
-        PropertyBlobData gamma_lut{drm_fd_, static_cast<uint32_t>((*crtc_props)["GAMMA_LUT"])};
-        auto const gamma_size = gamma_lut.data<struct drm_color_lut>().size();
-        gamma.red.reserve(gamma_size);
-        gamma.green.reserve(gamma_size);
-        gamma.blue.reserve(gamma_size);
-        for (auto const& entry : gamma_lut.data<struct drm_color_lut>())
+        try
+        {
+            PropertyBlobData gamma_lut{drm_fd_, static_cast<uint32_t>((*crtc_props)["GAMMA_LUT"])};
+            auto const gamma_size = gamma_lut.data<struct drm_color_lut>().size();
+            gamma.red.reserve(gamma_size);
+            gamma.green.reserve(gamma_size);
+            gamma.blue.reserve(gamma_size);
+            for (auto const& entry : gamma_lut.data<struct drm_color_lut>())
+            {
+                gamma.red.push_back(entry.red);
+                gamma.green.push_back(entry.green);
+                gamma.blue.push_back(entry.blue);
+            }
+        }
+        catch (...)
         {
-            gamma.red.push_back(entry.red);
-            gamma.green.push_back(entry.green);
-            gamma.blue.push_back(entry.blue);
+            log(logging::Severity::warning, MIR_LOG_COMPONENT, std::current_exception(), "Failed to set gamma curves");
         }
     }

The error is:

[2024-10-24 09:38:50.657726] < -warning- > atomic-kms: Failed to set gamma curves: /home/alan/CLionProjects/mir/src/platforms/atomic-kms/server/kms/atomic_kms_output.cpp(76): Throw in function {anonymous}::PropertyBlobData::PropertyBlobData(const mir::Fd&, uint32_t)
Dynamic exception type: boost::wrapexcept<std::system_error>
std::exception::what: Failed to read DRM property blob: No such file or directory

@AlanGriffiths
Copy link
Contributor

Note: I am not sure 1b0062f the best place to consume the exception, maybe we want update_from_hardware_state() to fail and handle the problem elsewhere?

@RAOF WDYT?

Copy link
Contributor

@AlanGriffiths AlanGriffiths left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the last fix, I'm content. Someone else should cast a vote too (especially about my last commit)

@Saviq
Copy link
Collaborator

Saviq commented Oct 24, 2024

Someone else should cast a vote too (especially about my last commit)

Looks legit to me, @RAOF feel free to merge if you agree with where the exception is caught.

*Almost* all of this is only accessed on a single thread, the compositor's
submission thread.

*Almost*.

Unfortunately, `configure` is generally called from the ServerAction loop,
which is a different thread to the composition thread. `ensure_crtc` can
also (transitively) be called from `configuration`, which also happens
off-composition-thread.

Wrap this ball of wax up in a `mir::Synchronised<>`, to ensure we're
not racy.

(This also affects `gbm-kms`, but to a much lesser extent)
@RAOF
Copy link
Contributor Author

RAOF commented Oct 25, 2024

AHA! It took writing a script to try and reproduce this, but I now can. (The relevant magic is that you need to trigger the modeset from the terminal that will be hung)

What's happening here is that the final frame submitted by the terminal is racing the modeset sequence.

When it stops updating, what's happening is the terminal is waiting on a frame event from its last-buffer-before-modeset before submitting any new frames. Mir is not sending a frame event until something else triggers a composition.

So this is a problem somewhere in our buffer queuing, I think.

Just to loop back to this; I can trivially reproduce this on gbm-kms, too.

@RAOF
Copy link
Contributor Author

RAOF commented Oct 25, 2024

Here are a couple of extra fixes - one trivial, and one thread race use-after-free.

Finally, I've identified a problem when disconnecting and re-connecting an external display to a different port on my laptop - we're not disassociating the CRTC from the old connector, so we run into resource conflicts. The fix there will be to update clear_connected_unused_outputs to also clear outputs which are not connected, but which have resources set up.

@RAOF
Copy link
Contributor Author

RAOF commented Oct 25, 2024

Bah, that doesn't fix it for some reason; I still see a CRTC connected to my disconnected output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triaged Triage into JIRA to plan it in
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants