Re: Collaboration on standard Wayland protocol extensions

From: Carsten Haitzler (The Rasterman) <raster rasterman com>
To: Drew DeVault <sir cmpwn com>
Cc: kwin kde org, desktop-devel-list gnome org, wayland-devel lists freedesktop org
Subject: Re: Collaboration on standard Wayland protocol extensions
Date: Mon, 28 Mar 2016 14:13:21 +0900

On Sun, 27 Mar 2016 22:29:57 -0400 Drew DeVault <sir cmpwn com> said:

On 2016-03-28  8:55 AM, Carsten Haitzler wrote:

i can tell you that screen capture is a security sensitive thing and likely
won't get a regular wayland protocol. it definitely won't from e. if you can
capture screen, you can screenscrape. some untrusted game you downloaded for
free can start watching your internet banking and see how much money you
have in which accounts where...


Right, but there are legitimate use cases for this feature as well. It's
also true that if you have access to /dev/sda you can read all of the
user's files, but we still have tools like mkfs. We just put them behind
some extra security, i.e. you have to be root to use mkfs.


yes but you need permission and that is handled at kernel level on a specific
file. not so here. compositor runs as a specific user and so you cant do that.
you'd have to do in-compositor security client-by-client.

the simple solution is to build it into the wm/desktop itself as an explicit
user action (keypress, menu option etc.) and now it can't be exploited as
it's not programmatically available. :)

i would imagine the desktops themselves would in the end provide video
capture like they would stills.


I'd argue that this solution is far from simple. Instead, it moves *all*
of the responsibilities of your entire desktop into one place, and one
codebase. And consider the staggering amount of work that went into
making ffmpeg, which has well over 4x the git commits as enlightenment.


you wouldn't recreate ffmpeg. ffmpeg produces libraries like avcodec. like a
reasonable developer we'd just use their libraries to do the encoding - we'd
capture frames and then hand off to avcodec (ffmpeg) library routines to do the
rest. ffmpeg doesnt need to know how to capture - just to do what 99% of its
code is devoted to doing - encode/decode. :) that's rather simple. already we
have decoding wrapped - we sit on top of either gstreamer, vlc or xine as the
codec engine and just glue in output and control api's and events. encoding is
just the same but in reverse. :) the encapsulation is simple.

- Output configuration


why? currently pretty much every desktop provides its OWN output
configuration tool that is part of the desktop environment. why do you want
to re-invent randr here allowing any client to mess with screen config.
after YEARS of games using xvidtune and what not to mess up screen setups
this would be a horrible idea. if you want to make a presentation tool that
uses 1 screen for output and another for "controls" then that's a matter of
providing info that multiple displays exist and what type they may be
(internal, external) and clients can tag surfaces with "intents" eg - this
is a control surface, this is an output/display surface. compositor will
then assign them appropriately.


There's more than desktop environments alone out there. Not everyone
wants to go entirely GTK or Qt or EFL. I bet everyone on this ML has
software on their computer that uses something other than the toolkit of
their choice. Some people like piecing their system together and keeping
things lightweight, and choosing the best tool for the job. Some people
might want to use the KDE screengrab tool on e, or perhaps some other
tool that's more focused on doing just that job and doing it well. Or
perhaps there's existing tools like ImageMagick that are already written
into scripts and provide a TON of options to the user, which could be
much more easily patched with support for some standard screengrab
protocol than to implement all of its features in 5 different desktops.


the expectation is there won't be generic tools but desktop specific ones. the
CURRENT ecosystem of tools exist because that is the way x was designed to
work. thus the state of software matches its design. wayland is different. thus
tools and ecosystem will adapt.

as for output config - why would the desktops that already have their own tools
then want to support OTHER tools too? their tools integrate with their settings
panels and look and feel right and support THEIR policies.

let me give you an example:

http://devs.enlightenment.org/~raster/ssetup.png

bottom-right - i can assign special scale factors and different toolkit
profiles per screen. eg one screen can be a desktop, one a media center style,
one a mobile "touch centric ui" etc. etc. - this is part of the screen setup
tool. a generic tool will miss features that make the desktop nice and
functional for its purposes. do you want to go create some kind of uber
protocol that every de has to support with every other de's feature set in it
and limit de's to modifying the protocol because they now have to go through a
shared protocol in libwayland that they cant just add features to as they
please? ok - so these features will be added adhoc in extra protocols so now
you have a bit of a messy protocol with 1 protocol referring to another... and
the "kde tool" messes up on e or the e tool messes up in gnome because all
these extra features are either not even supported by the tool or existing
features don't work because the de doesn't support those extensions?

just "i want to use the kde screen config tool" is not reason enough for there
to be a public/shared/common protocol. it will fall apart quickly like above
and simply mean work for most people to go support it rather than actual value.

We all have to implement output configuration, so why not do it the same
way and share our API? I don't think we need to let any client


no - we don't have to implement it as a protocol. enlightenment needs zero
protocol. it's done by the compositor. the compositors own tool is simply a
settings dialog inside the compositor itself. no protocol. not even a tool.
it's the same as edit/tools -> preferences in most gui apps. its just a dialog
the app shows to configure itself.

chances are gnome likely will do this via dbus (they love dbus :)). kde - i
don't know. but not everyone is implementing a wayland protocol at all so
assuming they are and saying "do it the same way" is not necessarily saving any
work.

manipulate the output configuration. We need to implement a security
model for this like all other elevated permissions.


like above. if gnome uses dbus - they will use polkit etc. etc. to decide that.
enlightenment doesn't even need to because there isn't even a protocol nor an
external tool - it's built directly in.

Using some kind of intents system to communicate things like Impress
wanting to use one output for presentation and another for notes is
going to get out of hand quickly. There are just so many different
"intents" that are solved by just letting applications configure outputs


even impress doesnt configure outputs. thank god for that.

when it makes sense for them to. The code to handle this in the
compositor is going to become an incredibly complicated mess that rivals
even xorg in complexity. We need to avoid making the same mistakes
again. If we don't focus on making it simple, then in 15 years we're
going to be writing a new protocol and making a new set of mistakes. X
does a lot of things wrong, but the tools around it have a respect for
the Unix philosophy that we'd be wise to consider.


how would it be complex. a compositor is already, if decent, going to handle
multiple outputs. it's either going to auto-configure new ones to extend/clone
or maybe pop up a settings dialog. e already does this for example and
remembers config for that screen (edid+output) so plug it in a 2nd time and it
automatically uses the last stored config for that. so the screen will "work"
as basically a by-product of making a compositor that can do multiple outputs.
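
A minimal sketch of the "remember config per screen (edid+output)" idea described above. This is not enlightenment's actual code: `OutputConfig`, `OutputConfigStore` and the extend-to-the-right default are hypothetical names and choices made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputConfig:
    # Hypothetical per-screen settings; a real compositor stores far more.
    width: int
    height: int
    x: int
    y: int
    scale: float = 1.0


class OutputConfigStore:
    """Remembers the last config used for each (edid, connector) pair."""

    def __init__(self):
        self._saved = {}

    def remember(self, edid, connector, config):
        # Called whenever the user changes the setup for a plugged-in screen.
        self._saved[(edid, connector)] = config

    def config_for(self, edid, connector, preferred_mode, desktop_width):
        # Screen seen before on this connector: reuse what was stored.
        saved = self._saved.get((edid, connector))
        if saved is not None:
            return saved
        # New screen: auto-configure by extending the desktop to the right
        # (an assumed default; cloning or a settings dialog are alternatives).
        width, height = preferred_mode
        return OutputConfig(width, height, x=desktop_width, y=0)
```

On the first hotplug `config_for` falls back to the auto-configured layout; once `remember` has been called for that screen, plugging it in again returns the stored config without any client being involved.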

then intents are only a way of deciding where a surface is to be displayed -
rather than on the current desktop/screen.

so simply mark a surface as "for presentation" and the compositor will put it
on the non-internal display (chosen maybe by physical size reported in edid as
the larger one, or by elimination - its on the screen OTHER than the
internal... maybe user simply marks/checkboxes that screen as "use this
screen for presenting" and all apps that want to present get their content
there etc.)
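
The placement policy sketched in the paragraph above could look roughly like this. It is an illustration under stated assumptions, not an existing protocol or compositor API: the `Output` fields, the `"presentation"` intent string and `pick_output` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Output:
    # Hypothetical view of what a compositor knows about a screen.
    name: str
    internal: bool
    width_mm: int   # physical size as reported in edid
    height_mm: int
    user_marked_for_presenting: bool = False


def pick_output(outputs, intent, current):
    """Decide where a surface tagged with `intent` should be displayed."""
    if intent != "presentation":
        # No special intent: stay on the current desktop/screen.
        return current
    # 1. An explicit user choice ("use this screen for presenting") wins.
    marked = [o for o in outputs if o.user_marked_for_presenting]
    if marked:
        return marked[0]
    # 2. Otherwise by elimination: any screen other than the internal one,
    #    preferring the physically largest.
    external = [o for o in outputs if not o.internal]
    if not external:
        return current
    return max(external, key=lambda o: o.width_mm * o.height_mm)
```

The client only supplies the tag; all knowledge of which screens exist and how they are configured stays inside the compositor.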

so what you are saying is it's better to duplicate all this logic of screen
configuration inside every app that wants to present things (media players -
play movie on presentation screen, ppt/impress/whatever show presentation there,
etc. etc.) and how to configure the screen etc. etc., rather than have a simple
tag/intent and let your de/wm/compositor "deal with it" universally for all
such apps in a consistent way?

- More detailed surface roles (should it be floating, is it a modal,
  does it want to draw its own decorations, etc)


that seems sensible and over time i can imagine this will expand.


Cool. Suggestions for what sort of capability this protocol should
have, what kind of surface roles we will be looking at? We should
consider a few things. Normal windows, of course, which on compositors
like Sway would be tiled. Then there's floating windows, like


ummm whats the difference between floating and normal? apps like gnome
calculator just open ... normal windows.

gnome-calculator, that are better off being tiled. Modals would be
something that pops up and prevents the parent window from being
interacted with, like some sort of alert (though preventing this
interactivity might not be the compositor's job). Then we have some


yeah - good old "transient for" :)

roles like dmenu would use, where the tool would like to arrange itself
(perhaps this would demand another permission?) Surfaces that want to be
fullscreen could be another. We should also consider additional settings
a surface might want, like negotiating for who draws the decorations or
whether or not it should appear in a taskbar sort of interface.


xdg shell should be handling these already - except dmenu. dmenu is almost a
special desktop component. like a shelf/panel/bar thing.

- Input device configuration


as above. i see no reason clients should be doing this. surface
intents/roles/whatever can deal with this. compositor may alter how an input
device works for that surface based on this.


I don't feel very strongly about input device configuration as a
protocol here, but it's something that many of Sway's users are asking
for. People are trying out various compositors and may switch back and
forth depending on their needs and they want to configure all of their
input devices the same way.


they are going to have to deal with this then. already gnome and kde and e will
all configure mouse accel/left/right mouse on their own based on settings. yes
- i can RUN xset and set it back later but its FIGHTING with your DE. wayland
is the same. use the desktop tools for this :) yes - it'll change between
compositors.  :) at least in wayland you cant fight with the compositor here.
for sway - you are going to have to write this yourself. eg - write tools that
talk to sway or sway reads a cfg file you edit or whatever. :)

However, beyond detailed input device configuration, there are some
other things that we should consider. Some applications (games, vnc,
etc) will want to capture the mouse and there should be a protocol for
them to indicate this with (perhaps again associated with special
permissions). Some applications (like Krita) may want to do things like
take control of your entire drawing tablet.


as i said. can of worms. :)

[snip] screen capture is a nasty one and for now - no. no access [snip]


Wayland has been in the making for 4 years. Fedora is thinking about
shipping it by default. We need to quit with this "not for now" stuff
and start thinking about legitimate use-cases that we're killing off
here. The problems are not insurmountable and they are going to kill
Wayland adoption. We should not force Wayland upon our users, we should
make it something that they *want* to switch to. I personally have
gathered a lot of interest in Sway and Wayland in general by
livestreaming development of it from time to time, which has led to more
contributors getting in on the code and more people advocating for us to
get Wayland out there.


you have no idea how many non-security-sensitive things need fixing first
before addressing the can-of-worms problems. hell nvidia just released drivers
that require compositors to re-do how they talk to egl/kms/drm to work that's
not compatible with existing drm dmabuf buffers etc. etc.

there's lots of things to solve like window "intents/tags/etc." that are not
security sensitive.

even clients and decorations. tiled wm's will not want clients to add
decorations with shadows etc. - currently clients will do csd because csd is
what weston chose and gnome has followed and enlightenment too. kde do not want
to do csd. i think that's wrong. it adds complexity to wayland just to "not
follow the convention". but for tiling i see the point of at least removing the
shadows. clients may choose to slap a title bar there still because it's useful
displaying state. but advertising this info from the compositor is not
standardized. what do you advertise to clients? where/when? at connect time? at
surface creation time? what negotiation is it? it easily could be that 1
screen or desktop is tiled and another is not and you dont know what to tell
the client until it has created a surface and you know where that surface would
go. perhaps this might be part of a larger set of negotiation like "i am a
mobile app so please stick me on the mobile screen" or "i'm a desktop app -
desktop please" then with the compositor saying where it decided to allocate
you (no mobile screen available - you are on desktop) and app is expected to
adapt...

these are not security can-of-worms things. most de's are still getting to the
point of "usable" atm without worrying about all of these extras yet.

there's SIMPLE stuff like - what happens when compositor crashes? how do we
handle this? do you really want to lose all your apps when compositors crash?
what should clients do? how do we ensure clients are restored to the same place
and state? crash recovery is important because it is also what allows
updates/upgrades without losing everything. THIS stuff is still "un solved".
i'm totally not concerned about screen casting or vnc etc. etc. until all of
these other nigglies are well solved first.

for the common case the DE can do it. for screen sharing kind of
things... you also need input control (take over mouse and be able to
control from app - or create a 2nd mouse pointer and control that...
keyboard - same, etc. etc. etc.). [snip]


Screen sharing for VOIP applications is only one of many, many use-cases
for being able to get the pixels from your screen. VNC servers,
recording video to provide better bug reports or to demonstrate
something, and so on. We aren't opening pandora's box here, just
supporting video capture doesn't mean you need to support all of these
complicated and dangerous things as well.


apps can show their own content for their own bug reporting. for system-wide
reporting this will be DE integrated anyway. supporting video capture is a
can of worms. as i said - single buffer? multiple with metadata? who does
conversion/scaling/transforms? what is the security model? and as i said - this
has major implications to the rendering back-end of a compositor.

nasty little thing and in implementing something like this you are also
forcing compositors to work in specific ways - eg screen capture will
likely FORCE the compositor to merge it all into a single ARGB buffer for
you rather than just assign it to hw layers. or perhaps it would require
just exposing all the layers, their config and have the client "deal with
it" ? but that means the compositor needs to expose its screen layout. do
you include pointer or not? compositor may draw ptr into the framebuffer.
it may use a special hw layer. what about if the compositor defers
rendering - does a screen capture api force the compositor to render when
the client wants? this can have all kinds of nasty effects in the rendering
pipeline - for us our rendering pipeline is not in the compositor but via
the same libraries clients use so altering this pipeline affects regular
apps as well as compositor. ... can of worms :)


All of this would still be a problem if you want to support video
capture at all. You have to get the pixels into your encoder somehow.
There might be performance costs, but we aren't recording video all the
time.


there's a difference. when its an internal detail it can be changed and
adapted to how the compositor and its rendering subsystem work. when its a
protocol you HAVE to support THAT protocol and the way THAT protocol defines
things to work or apps break.

keep it internal - you can break at will and adapt as needed, make it public
and you are boxed in by what the public api allows.

We can make Wayland support use-cases that are important to our users or
we can watch them stay on xorg perpetually and end up maintaining two
graphical stacks forever.


priorities. there are other issues that should be solved first before worrying
about the pandoras box ones.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster rasterman com
