Re: Collaboration on standard Wayland protocol extensions

From: Carsten Haitzler (The Rasterman) <raster rasterman com>
To: Drew DeVault <sir cmpwn com>
Cc: "Kwin, NET API, kwin styles API, kwin modules API" <kwin kde org>, desktop-devel-list gnome org, wayland-devel lists freedesktop org
Subject: Re: Collaboration on standard Wayland protocol extensions
Date: Tue, 29 Mar 2016 11:31:01 +0900

On Mon, 28 Mar 2016 10:55:05 -0400 Drew DeVault <sir cmpwn com> said:

On 2016-03-28 11:03 PM, Carsten Haitzler wrote:

should we? is it right to create yet another security model in userspace
"quickly" just to solve things that don't NEED solving, at least at this
point?


I don't think that the protocol proposed in other branches of this
thread is complex or short sighted. Can you hop on that branch and
provide feedback?


my take on it is that it's premature and not needed at this point. in fact i
wouldn't implement a protocol at all. *IF* i were to allow special access, i'd
simply require that the process be forked directly by the compositor, which
hands it one end of a socketpair, and THAT fd could have extra capabilities
attached to the wl protocol. i would do nothing else, because as a compositor i
cannot be sure what i am executing. i'd hand the decision to execute this tool
over to the user to say ok to, rather than just blindly executing anything i
like.
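A minimal sketch of the "fork it from the compositor" idea, assuming a POSIX system: the compositor creates a socketpair, forks, and the child execs the tool with its end of the socket advertised in WAYLAND_SOCKET, which libwayland-client's wl_display_connect() checks before falling back to WAYLAND_DISPLAY. The function name spawn_privileged_client() and the trusted-client bookkeeping mentioned in the comments are hypothetical, and the "ask the user first" step is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Spawn a helper tool with a private, pre-connected socket to the compositor.
 * Returns the child's pid and stores the compositor's end in *server_fd,
 * or returns -1 on failure. (Hypothetical helper, not a libwayland API.)
 */
static pid_t spawn_privileged_client(const char *path, char *const argv[],
                                     int *server_fd)
{
    int fds[2];
    assert(path != NULL && server_fd != NULL);

    /* No SOCK_CLOEXEC, so the client end survives execv(). */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the client end and advertise its fd number. */
        char buf[16];
        close(fds[0]);
        snprintf(buf, sizeof buf, "%d", fds[1]);
        setenv("WAYLAND_SOCKET", buf, 1);
        execv(path, argv);
        _exit(127); /* exec failed */
    }

    /* Parent (compositor): keep only the server end. */
    close(fds[1]);
    *server_fd = fds[0];
    /*
     * A real compositor would now call wl_client_create(display, *server_fd)
     * and mark that client as trusted (hypothetical allow-list), exposing
     * privileged globals only to clients created this way.
     */
    return pid;
}
```

Because the extra capability is tied to an fd the compositor created itself, no other client can claim it by sending a protocol request.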

adding watermarks can be done after encoding as another pass (encode in high
quality). hell, watermarks can just be a WINDOW (surface) on the screen. you
don't need options. as for audio - not too hard to do along with it. just
offer to record an input device - and choose (input can be current mixed
output or a mic ... or both).


You're still not grasping the scope of this. I want you to run this
command right now:

man ffmpeg-all

Just read it for a while. You're delusional if you think you can
feasibly implement all of these features in the compositor. Do you


all a compositor has to do is be able to capture a video stream to a file. you
can ADD watermarking, sepia, and other effects later on in a video editor. next
you'll tell me gimp is incapable of editing image files so we need programmatic
access to a digital camera's CCD to implement effects/watermarking etc. on
photos...

honestly want your screen capture tool to be able to add a watermark?


no - this can be done in a video editing tool later on. just record video at
high quality so degradation is not an issue.

How about live streaming? Some people add a sort of extra UI to read off
donations and such. The scope of your screen capture tool is increasing
at an alarming rate if you intend to support all of the features


no. i actually did not increase the scope. i kept it simple to "compositor can
write a file". everything else can be done in a post-processing task. that file
may include captured audio at the same time from a specific audio input.

currently possible with ffmpeg. How about instead we make a simple
wayland protocol extension that we can integrate with ffmpeg and OBS and
imagemagick and so on in a single C file.


i'm repeating myself. there are bigger fish to fry.

exactly what you describe is how e works out of the box. no scripts needed.
requiring people to write scripts to do their screen configuration is just
wrong. taking the position of "well i give up and won't bother and will
just make my users write scripts instead" is sticking your head in the
sand and not solving the problem. you are now asking everyone ELSE who
writes a compositor to implement a protocol because YOU won't solve a
problem that others have solved in a user-friendly manner.


What if I want my laptop display to remain usable? Right now I'm docked


eh? ummm that is what happens - unless you close the lid, then internal display
is "disconnected".

somewhere else and I actually do have this scenario - my laptop is one
of my working displays. How would I configure the difference between
these situations in your tool? What if I'm on a laptop with poorly
supported hardware (I've seen this before) where there's a limit on how
many outputs I can use at once? What if I want to write a script where I
put on a movie and it disables every output but my TV automatically? The
user is losing a lot of power here and there's no way you can satisfy
everyone's needs unless you make it programmable.


not true. this can be encapsulated without it being programmable. i have yet to
find a laptop that cannot run all its outputs, but the general limitation can
be accounted for - eg via prioritization. if you have 4 outputs and only 3 can
work at a time - then choose the 3 with the highest priority - and adjust the
priority of screens to get what you want.
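The prioritization rule can be sketched in a few lines of C. The struct output fields and select_outputs() are hypothetical compositor internals, not part of any Wayland protocol; max_active stands in for whatever output limit the hardware imposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-output metadata kept by the compositor. */
struct output {
    const char *name;
    int priority;    /* higher value = more important to the user */
    bool connected;
    bool enabled;    /* result: should this output be driven? */
};

/*
 * Enable at most max_active connected outputs, highest priority first;
 * everything else is disabled. On equal priority the output listed first
 * wins. Returns the number of outputs enabled.
 */
static size_t select_outputs(struct output *outs, size_t n, size_t max_active)
{
    size_t active = 0;
    assert(outs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        outs[i].enabled = false;

    while (active < max_active) {
        struct output *best = NULL;
        for (size_t i = 0; i < n; i++) {
            if (!outs[i].connected || outs[i].enabled)
                continue;
            if (best == NULL || outs[i].priority > best->priority)
                best = &outs[i];
        }
        if (best == NULL)
            break; /* fewer connected outputs than the hardware limit */
        best->enabled = true;
        active++;
    }
    return active;
}
```

With four connected outputs and a limit of three, the lowest-priority one is dropped; raising that output's priority above another is all the user has to do to change which screen goes dark.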

Base your desktop's tools on the common protocol, of course. Gnome
settings, KDE settings, arandr, xrandr, nvidia-settings, and so on, all
seem to work fine configuring your outputs with the same protocol today.
Yes, the protocol is meh and the implementation is a mess, but the
clients of that protocol aren't bad by any stretch of the imagination.


no tools. why do it? it's built in. in order for screen config "magic" to
work, a set of metadata is attached to screens. you can set priority (screens
get numbers from highest to lowest priority at any given time, allowing
behaviour like your "primary" screen migrating to an external one when it is
attached, then migrating back when the external monitor is detached etc.).
sure, we could keep that metadata separate, but then ALTERNATE TOOLS won't be
able to configure it, breaking the desktop environment by not providing the
metadata and other settings associated with a display. this breaks
functionality for users who then complain about things not working right AND
then the compositor has to deal with these "error cases" too, because a
foreign tool will be messing with its data/setup.


Your example has a pretty straightforward baseline - the "default"
profile. Even so, we can design the protocol to make the custom metadata
options visible to the tools, and the tools can then provide the user
with options to configure that as well.


a protocol with undefined metadata is not a good protocol. it now carries blobs
of data that are opaque except to specific implementations. this will mean
that other implementations eventually will do things like strip it out or damage
it, as they don't know what it is nor do they care.

as above. i have seen screen configuration used and abused over the years
where i just do not want to have a protocol for messing around with it for
any client. give them an inch and they'll take a mile.


Let them take a mile. _I_ want a mile. Here's an old quote that I think
is always relevant:

UNIX was not designed to stop its users from doing stupid things, as
that would also stop them from doing clever things.


but it isn't the user - it's some game you download that you cannot alter the
code or behaviour of that then messes everything up because its creator only
ever had a single monitor and didn't account for those with 2 or 3.

and that's perfectly fine - that is your choice. do not force your choice on
other compositors. you can implement all the protocol you want in any way
you want for your wm's tools.


Why do we have to be disjointed? We have a common set of problems and we
should strive for a common set of solutions.


because things like output configuration i do not see as needing a common
protocol. in fact it's desirable to not have one at all so it cannot be abused
or cause trouble.

gnome does almost everything with dbus. they love dbus. a lot of gnome is
centred around dbus. they likely will choose dbus to do this. likely. i
personally wouldn't choose to use dbus.


Let's not speak for Gnome. They're copied on this thread, they'll speak
for themselves.


my point is that not everyone chooses the same solution as you. not everyone
has the same problem and needs to solve it or WANTS to solve it the same way.

primary display? What about applications that use the entire output for


the app can simply not request to present on their "presentation" screen...
or the user would mark their primary screen (internal on laptop maybe) AS
their presentation screen - more metadata to be held by compositor.


Then we're back to the very thing you were criticising before - making
the applications implement some sort of switch between using a
"presentation" output and using some other kind of output. It would be a
lot less complicated if the application asked to go full screen and the
compositor said "hey, this app wants to be full screen, which output
would you like to put it on?"


that needs ZERO protocol extending. there already is a fullscreen request in
xdg shell. if all you want to do is ask the user where to place the fullscreen
window, this is a compositor implementation detail. if you want to open multiple
windows and have them on the most appropriate screen by default without asking
the user, then you need a little metadata. asking the app to explicitly define
the output simply means you now have N possible ways this could work depending
on each and every app. leave it to the compositor to decide, along with hints
that tell the compositor the likely usage purpose of the window. a user can
always move it somewhere else via the compositor (hotkey, alt+left mouse drag
to somewhere else or some other mechanism).
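A sketch of "hints, not explicit placement", assuming hypothetical purpose values a client might attach to a window and per-output metadata the user sets in the compositor (such as marking an output as the presentation screen). None of these names exist in xdg shell; they only illustrate that the decision stays inside the compositor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical purpose hints a client could attach to a toplevel. */
enum window_purpose {
    PURPOSE_NORMAL,
    PURPOSE_PRESENTATION, /* e.g. the slides themselves */
    PURPOSE_PRESENTER     /* e.g. speaker notes / next-slide preview */
};

/* Hypothetical user-set metadata the compositor keeps per output. */
struct output_meta {
    const char *name;
    int priority;             /* higher = more "primary" */
    enum window_purpose role; /* what the user marked this output for */
};

/*
 * Pick an output for a new fullscreen window from its purpose hint.
 * Prefer an output the user marked for that purpose; otherwise fall back
 * to the highest-priority output. Returns an index, or -1 if n == 0.
 */
static int place_by_purpose(const struct output_meta *outs, size_t n,
                            enum window_purpose purpose)
{
    int fallback = -1;
    assert(outs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (purpose != PURPOSE_NORMAL && outs[i].role == purpose)
            return (int)i;
        if (fallback < 0 || outs[i].priority > outs[fallback].priority)
            fallback = (int)i;
    }
    return fallback;
}
```

On a laptop with no external screen, a "presentation" window simply lands on the highest-priority output, so the app never needs to know how many screens exist.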

but we are talking things like output control/configuration - why does a
presentation app need this control? control the actual setup of the output or
even explicitly define exactly what output (by name, id, number, etc.) to go
for? why does an app need to be able to target a specific output
programatically rather than simply give the intent/purpose of the
surface/window?

now ALL presentation tools behave the same - you don't have to reconfigure
each one separately and deal with their differing features or lack of them.
it's done in 1 place - the compositor, and then all apps that want to
do a similar thing follow and work "as expected". far better than just
ignoring the issue. you yourself already talked about extra
tags/hints/whatever - this is one of those.


I think I'm getting at something here. Does the workflow I just
described satisfy everyone's needs for this?

because this requires clients DEFINING screen layout. wayland was
specifically designed to HIDE THIS. if the compositor displayed a screen
wrapped around a sphere in real life in a room - then it doesn't have
rectangles... how will an app deal with that? what if the compositor is
literally a VR world with surfaces wrapped around spheres and cubes - the
point of wayland's design was to hide this info from clients completely so
the compositor decides based on environment, not each and every client.
this was a basic premise/design in wayland from the get go and it was a
good one. letting apps break this abstraction breaks this design.


In practice the VAST majority of our users are going to be using one or
more rectangular displays. We shouldn't cripple what they can do for the
sake of the niche. We can support both - why do we have to hide
information about the type of outputs in use from the clients? It
doesn't make sense for an app to get fullscreened in a virtual reality
compositor, yet we still support that. Rather than shoehorning every
design to meet the least common denominator, we should be flexible.


they are not crippled. that's the point. in virtual reality fullscreen makes
sense as "take over the world", not take over the output to one eye. for
monitors on a desktop it makes sense to take over that monitor but not others.
so it depends on context and the compositors job is to interpret/manage/deal
with that context.

No. Applications want to be full screen or they don't want to be. If
they want to pick a particular output, we can easily let them do so.


i don't know about you.. but fullscreen to enlightenment means you use up
ONE SCREEN. [snip]


I never said that fullscreen means multiple screens. No clue where
that's coming from.


then why does this presentation tool need to be able to configure outputs - eg
define which screen views which part of their window spanning all outputs? i
see no other purpose of having configuration control of outputs for a
presentation tool.

what makes sense is an app hints at the purpose of its window and opens n
windows (surfaces). it can ask for fullscreen for each. the hints would
allow the compositor to choose which screen the window/surface is assigned
to.


Hinting doesn't and cannot capture all of the use cases. Just letting
the client say what it wants does.


clients explicitly saying what they want leads to broken scenarios. the game
dev who has never had > 1 screen and thus messes up users' multi-screen setups
because they never knew of nor cared about this situation. a HINT allows
interpretation to adapt the scenario nicely and make things work "properly".

the "i'd like to be fullscreen" hint from xdg has been a godsend - it doesn't
allow clients to go "well i want to be at 50,80 and at 1278x968" (though
other bits of x do). apps used to do things like query the root window size,
create an override-redirect window, grab kbd and mouse and then display ...
even though the root window may span many monitors and some parts of the root
window geometry may not be visible, as no screen views them, because the guy
didn't know about randr and such. worse, they would play with xvidtune, which
only did 1 screen, and thus mess up all your screen config... because a
protocol was invented that allows EXPLICIT control and x HAD to implement
explicit control. the fullscreen netwm hint has drastically improved things as
a high-level hint, allowing the wm to interpret fullscreen in a way that makes
sense given the scenario.

by the same token anything we do in wayland should be done at this higher level
hinting level. anything else is a recipe for disaster. it's not learning the
lessons of the past.

Gnome calculator doesn't like being tiled: https://sr.ht/Ai5N.png


i think the problem is you are not handling min/max sizing of clients
properly. :) you need to fix sway. gnome calculator is not sizing up its
buffer on surface size. that is a message "i can't be bigger than this -
this is my biggest size. deal with it". you need to deal with it. eg - pad
it and make it sized AT the buffer size :)


This is harmful to tiling window managers in general. The window manager
arranges the windows, not the other way around. You can't have tiling


sorry. neither in x11 nor in wayland does a wm/compositor just have the freedom
to resize a window to any size it likes WITHOUT CONSEQUENCES. in x11 min/max
size hints tell the wm the range of sizes a window can be sensibly drawn/laid
out with. in wayland it's communicated by buffer size. if you choose to ignore
this then you get to deal with the consequences as in your screenshot.

i would not just blindly ignore such info. i'd either pad with black/background
and keep to the buffer size or at least scale while retaining aspect ratio (and
pad as needed but likely less).
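One way a tiling compositor might honour the buffer size instead of stretching the surface, following the two options above: keep the buffer 1:1 and pad around it, or scale it uniformly to fit the tile and pad the remainder. fit_buffer() and struct rect are hypothetical names, and integer math is used for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

struct rect { int x, y, w, h; };

/*
 * Place a client's buffer inside the tile the compositor assigned to it.
 * scale == false: draw 1:1 (cropped if larger than the tile), centred,
 *                 with background padding around it.
 * scale == true:  scale to fit the tile keeping the aspect ratio, centred,
 *                 padding whatever is left over.
 */
static struct rect fit_buffer(int tile_w, int tile_h, int buf_w, int buf_h,
                              bool scale)
{
    struct rect r;
    assert(tile_w > 0 && tile_h > 0 && buf_w > 0 && buf_h > 0);

    if (!scale) {
        r.w = buf_w < tile_w ? buf_w : tile_w;
        r.h = buf_h < tile_h ? buf_h : tile_h;
    } else if ((long long)buf_w * tile_h <= (long long)tile_w * buf_h) {
        /* buffer is relatively taller than the tile: fill the height */
        r.h = tile_h;
        r.w = (int)((long long)buf_w * tile_h / buf_h);
    } else {
        /* buffer is relatively wider than the tile: fill the width */
        r.w = tile_w;
        r.h = (int)((long long)buf_h * tile_w / buf_w);
    }
    r.x = (tile_w - r.w) / 2;
    r.y = (tile_h - r.h) / 2;
    return r;
}
```

For example, a 300x400 buffer in an 800x600 tile is drawn at (250,100) unscaled, or as 450x600 at (175,0) when scaled; the padding is smaller in the scaled case, as noted above.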

interestingly now you complain about clients having EXPLICIT control and you
say "oh well no ... this is bad for tiling wm's" ... yet when i explain that
having output configuration control etc. etc. is harmful it's something that
SHOULD be allowed for clients... (and where the output isn't even a client
resource unlike the buffers that they render which is one).

window management if you can't have the compositor tell the clients what
size to be. There's currently no metadata to tell the compositor that a
surface is strict about its geometry. Most applications handle being
given a size quite well and will rearrange/rerender themselves to
compensate. Things like gnome-calculator are the exception, not the
rule.


yes there is - the buffer size of the next frame. your surface size is a
"request" to the client for that size. the response will be a new buffer of some
given size (or maybe no new buffer at all). you THEN deal with this new size. :)

xdg shell should be handling these already - except dmenu. dmenu is
almost a special desktop component. like a shelf/panel/bar thing.


dmenu isn't the only one, though, that may want to arrange itself in
special ways. Lemonbar and rofi also come to mind.


all of these basically are "desktop components" ala
taskbars/shelves/panels/whatever - i know that for e we don't want to
support such apps. these are built in. i don't know what gnome or kde think
but these go against their design as an integrated desktop environment. YOU
need these because your compositor has no such feature itself. the bigger
desktops don't need it. they MAY support it - may not. i know i don't want
to. :)


Users should be free to choose the tools they want. dmenu is much more
flexible and scriptable than anything any of the DEs offer in its place


that is your wm's design. that is not the design of others. they want something
integrated and don't want external tools.

- you just pipe in a list of things and the user picks one. Don't be
fooled into thinking that whatever your DE does for a given feature is
the mecca of that feature. Like you were saying to make other points -


no - but i'm saying that this is not a COMMON feature among all DEs. different
ones will work differently. gnome 3's chosen design these days is to put it
into gnome shell via js extensions, not the gnome 2 way with a separate panel
process (ala dmenu). enlightenment does it internally too and extends it
differently. my point is that what you want here is not universal.

there are fewer contributors to each DE than you might imagine. DEs are


that is exactly what i said in response to you saying that "we have all the
resources to do all of this" when i said we don't... :/ we don't - resources
are already expended elsewhere.

spread too thin to make the perfect _everything_. But some projects like
dmenu are small and singular in their focus, and maintained by one or
two people who put in a much larger amount of effort than is put in by
DE contributors on the corresponding features of that DE.

Be flexible enough for users to pick the tools they want.


a lifetime of doing wm's has taught me that this approach is not the best. you
end up with a limiting and complex protocol to then allow taskbars, pagers and
so on to be in "dmenus" of this world. this is how gnome 1.x and 2.x worked. i
added the support in e long ago. i learned that it was a limiter in adding
features as you had to conform to someone elses idea of what virtual desktops
are etc.

these panels/taskbars/shelves/whatever are best being closely integrated into
the wm.

YOU choose not to integrate. the other major DEs come already integrated with
these. this is not a universal solution everyone should support. you can come
up with your own extension and encourage people to support it in their dmenus
etc. - if another DE wants to support this then they can implement the same
extension.

i don't know osu - but i see no reason krita needs to configure a tablet. it
can just deal with input from it. :)

input is very sensitive. having done this for years and watched how games
like to turn off key repeat then leave it off when they crash... or change
mouse accel then you find its changed everywhere and have to "fix it" etc.
etc. - i'd be loath to do this. give them TOO much config ability and it
can become a security issue.


Let's change the tone of the input configuration discussion. I've come
around to your points about providing input configuration in general to
clients; let's not do that. I think the only issue we should worry about
for input at this point is fixing the pointer-constraints protocol to
use our new permissions model.


that's very reasonable. :)

Why do those things need to be dealt with first? Sway is at a good spot
where I can start thinking about these sorts of things. There are
enough people involved to work on multiple things at once. Plus,
everyone thinks nvidia's design is bad and we're hopefully going to see
something from them that avoids vendor-specific code.


because these imho are far more important. you might be surprised at how few
people are involved.


These features have to get done at some point. Backlog your
implementation of these protocols if you can't work on it now.


that's what i'm saying. :)

not so simple. with more of the ui of an app being moved INTO the border
(titlebar etc.) this is not a simple thing to just turn it off. you then
turn OFF necessary parts of the ui or have to push the problem out to the
app to "fallback".


You misunderstand me. I'm not suggesting that these apps be crippled.
I'm suggesting that, during the negotiation, they _object_ to having the
server draw their decorations. Then other apps that don't care can say
so.


aaah ok. so compositor adapts. then likely i would express this as a "minimize
your decorations" protocol from compositor to client, client to compositor then
responds similarly like "minimize your decorations" and compositor MAY choose
to not draw a shadow/titlebar etc. (or client responds with "ok" and then
compositor can draw all it likes around the app).
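The "minimize your decorations" exchange can be modelled as a tiny resolution step. The enum values and negotiate_decorations() are hypothetical and not an existing Wayland protocol; honor_client_request models the fact that the compositor MAY, but need not, stop drawing its own frame.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reply a client sends after the compositor asks it to
 * "minimize your decorations". */
enum deco_reply {
    DECO_REPLY_OK,            /* client drops its own titlebar/shadow */
    DECO_REPLY_MINIMIZE_YOURS /* client objects: it keeps its CSD and asks
                                 the compositor to minimize instead */
};

struct deco_state {
    bool client_full_decorations; /* client draws titlebar, shadow, ... */
    bool compositor_draws_frame;  /* compositor draws around the surface */
};

/* Resolve one round of the (hypothetical) negotiation. */
static struct deco_state negotiate_decorations(enum deco_reply reply,
                                               bool honor_client_request)
{
    struct deco_state s;
    assert(reply == DECO_REPLY_OK || reply == DECO_REPLY_MINIMIZE_YOURS);

    if (reply == DECO_REPLY_OK) {
        /* client complied, compositor is free to draw all it likes */
        s.client_full_decorations = false;
        s.compositor_draws_frame = true;
    } else {
        /* client keeps its decorations; compositor backs off if it wants */
        s.client_full_decorations = true;
        s.compositor_draws_frame = !honor_client_request;
    }
    return s;
}
```

Since the outcome does not have to be final, a compositor could simply rerun this step whenever a window moves between a tiled and a floating output.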

only having CSD solves all that complexity and is more efficient
than SSD when it comes to things like assigning hw layers or avoiding
copies of vast amounts of pixels. i was against CSD to start with too but i
see their major benefits.


I don't want to rehash this old argument here. There are two sides to this
coin. I think everyone fully understands the other position. It's not
hard to reach a compromise on this.


it's sad that we have to have this disagreement at all. :) go on. join the dark
side! :) we have cookies!

In Wayland you create a surface, then assign it a role. Extra details
can go in between, or go in the call that gives it a role. Right now
most applications are creating their surface and then making it a shell
surface. The compositor can negotiate based on its own internal state
over whether a given output is tiled or not, or in cases like AwesomeWM,
whether a given workspace is tiled or not. And I don't think the
decision has to be final. If the window is moved to another output or
really if any of the circumstances change, they can renegotiate and the
surface can start drawing its own decorations.


yup. but this signalling/negotiation has to exist. currently it doesn't. :)


We'll make this part of the protocols we're working on here :)


this i can agree on. :)

you aren't going to talk me into implementing something that is important
for you and not a priority for e until such a time as i'm satisfied that
the other issues are solved. you are free to do what you want, but
standardizing things takes a looong time and a lot of experimentation,
discussion, and repeating this. we have resources on wayland and nothing
you described is a priority for them. there are far more important things
to do that are actual business requirements and so the people working need
to prioritize what is such a requirement as opposed to what is not.
resources are not infinite and free.


Like I said before, put it on your backlog. I'm doing it now, and I want
your input on it. Provide feedback now and implement later if you need
to, but if you don't then the protocols won't meet your needs.

let me complicate it for you. let's say i'm playing a video fullscreen. you
now have to convert argb to yuv then encode when it would have been far more
efficient to get access directly to the yuv buffer before it was even
scaled to screen size... :) so you have just specified a protocol that is
by design inefficient when it could be more efficient.


What, do you expect to tell libavcodec to switch pixel formats
mid-recording? No one is recording their screen all the time. Yeah, you
might hit performance issues. So be it. It may not be ideal but it'll
likely be well within the limits of reason.


you'll appreciate what i'm getting at next time you have to do 4k ... or 8k
video and screencast/capture that. :) and have to do miracast... on a 1.3ghz
arm device :)

yes - but when, how often and via what mechanisms pixels get there is a very
delicate thing.


And yet you still need to convert the entire screen to a frame and feed
it into an encoder, no matter what. Feed the frame to a client instead.


is the screen a single frame or multiple pieced together by scanout hw
layers? :) what is your protocol/interface to the "screen stream"? if you have
it be a simple "single buffer" then you are going to soon enough run into
issues. :)
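To illustrate why a plain "single buffer" screen stream can be limiting, here is a hypothetical description of a frame as a stack of scanout planes. If the top plane is a YUV video buffer covering the whole output, a capture client could in principle encode it directly at its source resolution, skipping both the scaling and the ARGB-to-YUV conversion; otherwise the planes have to be composited first. All types and names are assumptions, and FMT_NV12 planes are treated as opaque.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of what scanout is showing right now. */
enum pix_fmt { FMT_ARGB8888, FMT_NV12 };

struct plane {
    enum pix_fmt fmt;
    int x, y, w, h;   /* destination rectangle on the output */
    int src_w, src_h; /* buffer size before scanout scaling */
};

struct screen_frame {
    int out_w, out_h;
    const struct plane *planes; /* ordered bottom to top */
    size_t n_planes;
};

/*
 * Return the top plane if it is a YUV buffer covering the whole output,
 * so it could be handed to an encoder as-is; otherwise return NULL.
 */
static const struct plane *direct_capture_plane(const struct screen_frame *f)
{
    assert(f != NULL);
    if (f->n_planes == 0)
        return NULL;

    const struct plane *top = &f->planes[f->n_planes - 1];
    bool covers = top->x <= 0 && top->y <= 0 &&
                  top->x + top->w >= f->out_w &&
                  top->y + top->h >= f->out_h;
    return (top->fmt == FMT_NV12 && covers) ? top : NULL;
}
```

A fullscreen 1080p video scanned out on a 4k output would be captured at 1920x1080 in NV12, while any UI overlay on top forces the slower composited path.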

so far we don't exactly have a lot of inter-desktop co-operation happening.
it's pretty much everyone for themselves except for a smallish core
protocol.


Which is ridiculous.

do NOT try and solve security sensitive AND performance sensitive AND design
limiting/dictating things first and definitely don't do it without everyone
on the same page.


I'm here to get everyone on the same page. Get on it.


let's work on the things we do have in common first. :)


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster rasterman com
