CG-05-23.md
Agenda for the May 23rd video call of WebAssembly's Community Group

  • Where: zoom.us
  • When: May 23rd, 4pm-5pm UTC (May 23rd, 9am-10am Pacific Time)
  • Location: link on calendar invite

Registration

None required if you've attended before. Send an email to the acting WebAssembly CG chair to sign up if it's your first time. The meeting is open to CG members only.

Logistics

The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.

Agenda items

  1. Opening, welcome and roll call
    1. Opening of the meeting
    2. Introduction of attendees
  2. Find volunteers for note taking (acting chair to volunteer)
  3. Proposals and discussions
    1. GC & strings (presentation + discussion) [Adam Klein, 45min]
  4. Closure

Agenda items for future meetings

Followup discussion scheduled for June 6

Meeting Notes

Attendees

  • Conrad Watt
  • Anne van Kesteren
  • Adam Klein
  • Justin Michaud
  • Alon Zakai
  • Slava Kuzmich
  • Jeff Charles
  • Francis McCabe
  • Yuri Iozzelli
  • Yury Delendik
  • Paolo Severini
  • Andrew Brown
  • Ilya Rezvov
  • Calvin Prewitt
  • Ryan Hunt
  • Luke Wagner
  • Asumu Takikawa
  • Nick Ruff
  • Sam Clegg
  • Andreas Rossberg
  • Zalim Bashrorov
  • Jakob Kummerow
  • Matthias Liedtke
  • Ashley Nelson
  • Alex Crichton
  • Johnnie Birch
  • Emanuel Ziegler
  • Derek Schuff
  • Deepti Gandluri
  • Bruce He
  • Heejin Ahn
  • dbezhetskov
  • Sergei Rubanov
  • Andy Wingo
  • Brendan Dahl
  • Nick Fitzgerald
  • Petr Penzin

Discussions

GC & strings

AK: presenting slides

AVK: In principle stringref allows for zero cost, but that assumes the host has a particular representation for what you invoke, and this could change over time. E.g. in Firefox the CSS parser changed from WTF16 to UTF8. If there were existing callers, it would be a perf regression.

AK: Yes that’s true, platform owners would need to take this into consideration when deciding which APIs they choose. There’s always a tradeoff when changes are made: it allows zero copy, it doesn’t guarantee zero copy.

AVK: Yeah, I guess there are more engines that want to use UTF8 internally because it uses less memory. Would this prevent that from happening?

JM: If I was going to describe it, it's more about what the purpose of Wasm is. We don’t want abstractions that hide their cost. We don’t want an abstraction whose cost changes dramatically. No strong personal opinion yet, but definitely interesting.

AK: yeah it does echo that I listed this among the cons for stringref

RH: Question about the Sheets perf result. User space vs. string optimized? What does that mean?

AK: this is character by char copying from i16 array to a string, via a buffer. Vs string.new_from_wtf16. I.e. the handwritten approach vs stringref conversion
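
The two approaches AK contrasts can be loosely illustrated from the JS side. The following is a minimal sketch, not the code that was measured: `copyCharByChar` and `copyBulk` are hypothetical names, and `String.fromCharCode` only stands in for an engine-side bulk conversion such as `string.new_from_wtf16`.

```typescript
// Hypothetical illustration; neither function comes from the stringref proposal.

// "Handwritten" approach: build the string one 16-bit code unit at a time.
function copyCharByChar(units: Uint16Array): string {
  let out = "";
  for (let i = 0; i < units.length; i++) {
    out += String.fromCharCode(units[i]);
  }
  return out;
}

// Bulk approach: hand whole chunks of code units to the host in one call.
// Chunking keeps the argument count per call bounded.
function copyBulk(units: Uint16Array, chunkSize = 8192): string {
  const parts: string[] = [];
  for (let start = 0; start < units.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, units.length);
    const chunk: number[] = [];
    for (let i = start; i < end; i++) {
      chunk.push(units[i]);
    }
    parts.push(String.fromCharCode.apply(null, chunk));
  }
  return parts.join("");
}
```

Both functions preserve lone surrogates, so they produce the same string for any WTF-16 input; only the number of host calls differs.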

RH: were they using stringref across their runtime? Or only when they’re boundary crossing?

AK: For this we’re only using stringref on the boundary, an experiment we ran to isolate different aspects of the performance here. They also have a configuration where they use stringref everywhere but it’s less clear that it makes a big difference.

ZB: In the case of Kotlin we use stringref for the internal representation, and we use the same references for the browser APIs

AK: Pulled this from the JetBrains slides at I/O. The DOM benchmarks are not microbenchmarks but reflect a realistic environment, and there stringref made a big difference.

JM: For engines that would have to do the copy between UTF16 and UTF8 anyway, do you know what the numbers look like? Neutral or basically an improvement?

AK: I don’t know for sure. V8 and Chrome don’t use UTF8 internally in most places. We could guess but have no hard numbers. It would be interesting to see how this plays out in practice in engines that do use UTF8 in some places.

AVK: One thing you could measure is cost for UTF8 languages, that might also be tricky

AK: there’s discussion later about the difference between copying in userspace wasm vs doing it in the engine, I think it would be cheaper in the engine, will address that.

ZB(via chat): Kotlin/Wasm demo with strings https://zal.im/wasm/dbmonster/ https://zal.im/wasm/dbmonster-stringref/ Same compiled with Kotlin/JS https://zal.im/wasm/dbmonster-kjs/ Pure JS implementation https://zal.im/dom-monster/

AVK: There’s no support for latin-1 string?

AK: Stringref doesn’t currently expose latin-1. You can get a view over a latin-1 string but it’s not currently mentioned in the proposal. I mentioned stringref in this presentation as a solution to a particular problem, e.g. for regular expressions, where you run into this very early if you want to compete with the existing JS engine. Obviously stringref provides a lot more than what you need for that particular use case.

JK(via chat): Anne: the key principle of the stringref proposal is to make it possible to avoid avoidable encoding changes. That includes allowing engines to use whatever encoding (including multiple encodings, including changing their preferred encoding) under the hood. In your example, a Wasm module creating a string from UTF-8 data can reasonably expect that no re-encodings will happen when sending such strings to Firefox's UTF-8 based CSS parser. (At least in the long term, assuming engines have had enough time to optimize.)

AVK(via chat): I understand. I'm worried about the ossification risks.

FM: can I ask about Java as an example: any language has its own semantics for strings. J2wasm doesn’t need to honor Java semantics, it has its own. When you actually try to honor java semantics, that will introduce additional complexity to any support for strings. Do you know whether there are issues with “true” java semantics that would not be supported by stringref?

AK: Do you have an example of something that might not be supported?

FM: Java has a very rich locale/i18n library. It’s not fair to push the implementation burden of that into the platform.

ZB: It is not about semantics, but more about the API that you have in Java, isn’t it?

FM: you’ll replace the existing APIs in Java with those provided by stringref

ZB: You can use basic things and implement other APIs on top.

AK: For example locale toLower, that’s not in stringref, it’s in the Intl library in JS. I would expect stringref to be low-level enough that issues about library support would be out of scope.

CW: Even in linear memory, you would have the same overhead of copying it in and copying out back?

AR: I don’t buy that it’s as simple as that. There's a very strong slippery slope. This is already pulling some amount of Unicode logic into the engine/platform and people seem to be already requesting more. So this is a very real risk that we do end up putting things from Java into the platform. These arguments are a fallacy because you could use them to argue for integrating any feature from JS or another language. So if you want to integrate with that then of course you'd benefit from adding those elements. But this is the opposite of what we want to do for Wasm, where we want to add as little on top of the hardware as we can get away with. It has big implications beyond looking at this feature.

CW: Even with some existing features, we’ve sometimes understood the power of exposing engine APIs, e.g. for exceptions and GC, where the engine can operate much faster than user space.

AR: True, but even there we try to not hide costs. This proposal hides a lot of cost, including allocations; it assumes rope implementations of strings, hidden branches, etc. That is not true of other proposals (other than GC). The GC instructions have been carefully designed not to hide costs.

AK: First of all, I want to object to the claim that these are fallacious benefits. These are actual benefits of strings; the question is whether to add strings or not.

AR: The fallacy is to use these arguments to argue for certain features.

CW: In abstract, but we’re always going to look at quantitative benefits, which may outweigh the aesthetics.

AR: I want to push back on the narrative that this is about aesthetics only. It’s about the philosophical direction of the whole language, which we took for a reason, because that’s mainly the value proposition of Wasm: that it not try to do what VMs before did, and not build in high-level things when it can get away without them.

CW: What quantitative benefit is necessary to compromise on that philosophy?

JM: i just wanted to express my support for Adam, nothing in particular.

AK: More philosophy to be had. Happy to continue that discussion maybe further down in the presentation, AR does that work for you?

NR(via chat): Is there an impact on memory usage? (via voice): Wondering, is there any kind of hidden implication in memory in either direction.

AK: In the zero-copy case, you’re avoiding duplicating memory. Depending on what is in memory, the savings may or may not be significant, but it does offer the opportunity for memory savings.

AW(via chat): there is also the 1-byte utf-16 thing that most js engines do. but that’s marginal i think

AVK (via chat): Yeah, that's the "Latin1" strings I asked about.
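
The one-byte ("Latin1") representation AW and AVK refer to can be sketched as follows. This is a simplified model of the idea, not any engine's actual implementation, and `packString` is a hypothetical name: a string whose code units all fit in 8 bits is stored with one byte per unit, otherwise with two.

```typescript
// Hypothetical model of a one-byte vs. two-byte string representation.
function packString(s: string): Uint8Array | Uint16Array {
  // A string qualifies for one-byte storage only if every code unit is <= 0xFF.
  let fitsInOneByte = true;
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) {
      fitsInOneByte = false;
      break;
    }
  }
  const out = fitsInOneByte ? new Uint8Array(s.length) : new Uint16Array(s.length);
  for (let i = 0; i < s.length; i++) {
    out[i] = s.charCodeAt(i);
  }
  return out;
}
```

In this model a string of n Latin-1 characters takes n bytes instead of 2n, which is the memory effect AW describes as marginal for this discussion.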

AK presenting import-based strings

AR: it’s worth pointing out that type imports are not strictly necessary. You can get pretty far with just externref. I would be interested in seeing how far you get, especially given all the other stuff JS engines do.

CW: Would this still be externref with pre-imports though?

AR: If you don’t use pre-imports, I would expect externref to be more expensive with all the deopts. Of all the bullets on your previous slide, I think only the first 2 would remain with pre-imports.

AK: Jakob did do measurements with externref without imports and for some things that is okay but for things like accessing individual code units, the overhead of the type check is big enough to be noticeable

AR: as a thing to get off the ground… as you said we want type imports eventually. But this might be good enough to make progress as an intermediate solution, with a path toward getting the best solution later on.

AK: presenting stringref vs import-based strings

AVK(in chat): Aside: I'd kinda dislike if we further double downed on charCodeAt(). JavaScript itself is slowly moving away from it. Nobody really wants code units.

AR: Assuming we have pre-imports, how advanced would these optimizations be? Afaics you only have to specialize certain calls, but then you’d do exactly the same thing as with instructions? All the other advanced things go away with pre-imports.

AK: We don’t know what pre-imports look like right now

AR: The baseline thing would be very simple, the only thing we need to do is extend the JS API to get a JS object and then you have that available when you compile instead of just when you link. As simple as that and that would be an easy change in terms of spec work and I would hope even implementation.

AK: I think there’s a bunch of implementation details that depend on separate compilation, so I’m not sure it would be that simple. But we don’t have practical experience with that yet. Also, we’ve heard folks say they don’t want engines to be required to do hidden advanced optimizations that aren’t specified.

FM: Don’t you have the same kind of problem if you are accessing a JS String and you do an index that is represented as a rope? Then you have an under the covers conversion. So you get this unexpected O(n) operation.

AK: that’s this con ”user code doesn’t control underlying representation”

FM: Have the same surprise in both ways of doing it.
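
FM's point about ropes can be made concrete with a toy model. This is a minimal sketch under assumed names (`Rope`, `LazyString`, `flattenCount` are all hypothetical) and does not describe how any particular engine implements strings: concatenation is cheap and lazy, but the first indexed access flattens the rope, which is the hidden O(n) step.

```typescript
// Hypothetical rope: either a flat leaf or a lazy concatenation of two ropes.
type Rope =
  | { kind: "leaf"; text: string }
  | { kind: "concat"; left: Rope; right: Rope };

function leaf(text: string): Rope {
  return { kind: "leaf", text };
}

function concat(left: Rope, right: Rope): Rope {
  return { kind: "concat", left, right }; // O(1): no characters are copied
}

// O(n) walk that visits every leaf, left to right, and joins them.
function flatten(r: Rope): string {
  const parts: string[] = [];
  const stack: Rope[] = [r];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.kind === "leaf") {
      parts.push(node.text);
    } else {
      stack.push(node.right);
      stack.push(node.left);
    }
  }
  return parts.join("");
}

class LazyString {
  flattenCount = 0; // how many times the hidden O(n) step ran

  constructor(private rope: Rope) {}

  codeUnitAt(i: number): number {
    let text: string;
    if (this.rope.kind === "leaf") {
      text = this.rope.text;
    } else {
      // Under-the-covers conversion: the caller only asked for one code unit.
      text = flatten(this.rope);
      this.rope = leaf(text);
      this.flattenCount++;
    }
    return text.charCodeAt(i);
  }
}
```

In this model the caller sees the same `codeUnitAt` call whether it costs O(1) or O(n), regardless of whether it is reached through an instruction or an imported function.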

AR: The whole brittle performance thing seems to be orthogonal.

AK: stringref performance isn’t nearly as brittle as this.

AR: Why not? In principle you have the exact same knowledge in the engine and the user has the same logic in both use cases.

AK: There’s no need for the toolchain to import a bunch of random operations from the host, they are just opcodes that exist in the spec.

AR: Sure but you will import these things and you know what they are, so the end result is the same. Yes the tool has to do an extra step here but other than that, I don’t see that it’s brittle in principle.

CW: i’m a little surprised you brought it up. My understanding is that even with stringref, you want fast regex which you have to get from the host.

AK: We’re not inlining the JIT into wasm, the fast-path thing is the strings being passed to and from that function. I don’t disagree there are similarities but it’s not totally orthogonal, it’s not as brittle in the import case because you can’t make that mistake if you have an opcode you call.

RH: That was my understanding of the brittleness: when you have an import-based scheme you have to specify how that works. If there’s a WebAssembly.string type, you have to specify what it does, e.g. if someone monkeypatches the methods then everything is different.

AR: You are talking about monkeypatching on the JS side? Yeah okay, I don’t know if that’s something we need to worry about and if we did, we could make these objects frozen, right?

RH: If we’re adding new objects for every operation specialized but if you’re pattern matching things into the existing JS environment.

AR: you have to pass in an import object. Then the JS API could provide a WebAssembly.string object, and if it’s frozen then you can’t mess with it. So i don’t think it's a serious concern either way.
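
AR's suggestion of a frozen object can be illustrated in plain JS terms. This is a minimal sketch: `stringOps` and its members are hypothetical names, and no `WebAssembly.string` object exists in the current JS API. It only shows that `Object.freeze` prevents the monkeypatching RH describes.

```typescript
// Hypothetical host-provided string operations, frozen so they cannot be replaced.
const stringOps = Object.freeze({
  length: (s: string): number => s.length,
  codeUnitAt: (s: string, i: number): number => s.charCodeAt(i),
  concat: (a: string, b: string): string => a + b,
});

// Attempts to replace `length`; returns true only if the patch took effect.
function tryMonkeypatch(): boolean {
  try {
    // Cast away the readonly type to simulate untyped JS code doing the patch.
    (stringOps as { length: (s: string) => number }).length = () => 0;
  } catch (e) {
    // In strict mode, assigning to a frozen property throws a TypeError.
  }
  return stringOps.length("abc") !== 3;
}
```

In sloppy mode the assignment is silently ignored instead of throwing, so `tryMonkeypatch()` reports no change in either mode.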

RH: Don’t want to focus on this too much.

JK(chat): JS charCodeAt isn't what matters. Operations like java.lang.String.hashCode() depend on iterating over characters. Implementing library functions that (for good reason, because they're source language specific) won't be in Wasm often requires iterating over characters. Iterating over codepoints (instead of code units) doesn't really affect the performance situation in these tight loops over strings.
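
The kind of tight loop JK mentions can be sketched with the documented `java.lang.String.hashCode()` formula, s[0]*31^(n-1) + ... + s[n-1] over 16-bit code units with 32-bit wraparound. The function name `javaStringHashCode` is hypothetical, and a JS string stands in for whatever string type a toolchain would actually use.

```typescript
// Sketch of Java's String.hashCode(): one read per code unit in a tight loop.
function javaStringHashCode(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    // h stays within int32, so h * 31 is exact in a double; `| 0` wraps to int32.
    h = (h * 31 + s.charCodeAt(i)) | 0;
  }
  return h;
}
```

For example, `javaStringHashCode("hello")` evaluates to 99162322. Every iteration performs one code unit access, which is why per-access overhead such as a type check shows up in this kind of function.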

AZ(chat): As an example of a potential advanced optimization (beyond just replacing calls with code), if after runtime inlining some code ends up as "a" + "b" + c (two added string constants + an added unknown string) then that can be optimized to "ab" + c. (I don't know how important that specifically is, but just as an example of the space.)
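
AZ's example can be sketched as a small folding pass over a concatenation expression. This is an illustration of the idea only, with a hypothetical `Expr` type and `foldConcat` function; it does not describe any existing engine or toolchain optimization. It relies on string concatenation being associative, so adjacent constants can be merged wherever they end up next to each other.

```typescript
// Hypothetical expression tree for string concatenation.
type Expr =
  | { kind: "const"; value: string }
  | { kind: "var"; name: string }
  | { kind: "concat"; left: Expr; right: Expr };

// Collect the operands of nested concatenations in left-to-right order.
function operands(e: Expr, out: Expr[] = []): Expr[] {
  if (e.kind === "concat") {
    operands(e.left, out);
    operands(e.right, out);
  } else {
    out.push(e);
  }
  return out;
}

// Merge adjacent constants, then rebuild a left-associated concatenation.
function foldConcat(e: Expr): Expr {
  const merged: Expr[] = [];
  for (const op of operands(e)) {
    const last = merged[merged.length - 1];
    if (op.kind === "const" && last !== undefined && last.kind === "const") {
      merged[merged.length - 1] = { kind: "const", value: last.value + op.value };
    } else {
      merged.push(op);
    }
  }
  return merged.reduce((left, right) => ({ kind: "concat", left, right }));
}
```

Applied to `("a" + "b") + c`, the pass yields `"ab" + c`: one concatenation at runtime instead of two.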

AK presenting “lack of polyfill => unintended consequences”, and “optimized copy-at-the-border API”

AR: You’re making one assumption, which is that all languages can equally benefit. But you’re measuring on just the languages that happen to have the same string semantics as JS. This is true for certain major languages but not true for the majority if you just count them. So most will end up doing the third thing anyway. Is it fair to privilege some of these languages, at the risk of having this in Wasm in the future when this UTF16 legacy goes away? You're supporting UTF8 but that’s also not necessarily reflective of what languages do.

AK: Yes UTF8 is supported and that covers a lot today but there is certainly a tradeoff between supporting the languages targeting us and the unknown future encodings we haven’t thought of yet. Of course stringref can be modified to support additional views and the implementations can tweak to support the demands of their users.

AR: Even in the case where I’m a UTF8 language, don't I essentially also pay the copying cost on the boundary as long as JS is optimized for UTF16, depending on what it does in the future?

AK: There are a lot of ‘ifs’ there. If your language discourages users from indexing into strings, you might be able to get away with not exposing the UTF16-ness, or engines may decide to implement a UTF8 encoding alongside a WTF16 encoding. I think there is a lot of freedom in the stringref approach. I would love for someone who doesn’t use WTF16 to take a look and generate some numbers, that would be awesome.

RH: that was very helpful. We should probably extend and have another meeting in the future, since there seems to be more to discuss, would be helpful to have it live

ZB(in chat): do we need one more meeting for discussion?

CW: The agenda for the next few meetings feels light.

DG: We don’t have anything, so we can cover some time in the next couple of weeks for this discussion.

CW: At time, declaring this meeting over. We’ll get follow ups scheduled.