chrome://tracing view of a 2-second jank in AutocompleteController::UpdateResult() on an otherwise healthy machine


We have a culprit! Let’s optimize AutocompleteController? No! We don’t know why yet: keep assuming ignorance!

By augmenting BackgroundTracing with stack sampling, we were able to find a recurring stack under stalled AutoComplete events:

    RegEnumValueW
    RegEnumValueWStub
    base::win::RegistryValueIterator::Read()
    gfx::`anonymous namespace'::CachedFontLinkSettings::GetLinkedFonts
    gfx::internal::LinkedFontsIterator::GetLinkedFonts()
    gfx::internal::LinkedFontsIterator::NextFont(gfx::Font *)
    gfx::GetFallbackFonts(gfx::Font const &)
    gfx::RenderTextHarfBuzz::ShapeRuns(...)
    gfx::RenderTextHarfBuzz::ItemizeAndShapeText(...)
    gfx::RenderTextHarfBuzz::EnsureLayoutRunList()
    gfx::RenderTextHarfBuzz::EnsureLayout()
    gfx::RenderTextHarfBuzz::GetStringSizeF()
    gfx::RenderTextHarfBuzz::GetStringSize()
    OmniboxTextView::CalculatePreferredSize()
    OmniboxTextView::ReapplyStyling()
    OmniboxTextView::SetText(...)
    OmniboxResultView::Invalidate()
    OmniboxResultView::SetMatch(AutocompleteMatch const &)
    OmniboxPopupContentsView::UpdatePopupAppearance()
    OmniboxPopupModel::OnResultChanged()
    OmniboxEditModel::OnCurrentMatchChanged()
    OmniboxController::OnResultChanged(bool)
    AutocompleteController::UpdateResult(bool,bool)
    AutocompleteController::Start(AutocompleteInput const &)
    (...)
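For context, the top frames are enumerating the Windows font-link configuration from the registry. Here is a minimal sketch of that kind of enumeration in plain Win32, not Chrome's actual code; the FontLink\SystemLink key is the conventional location for this data and is an assumption of the sketch, not something read from the trace:

    // Sketch only: enumerate the system font-link table, conceptually what the
    // base::win::RegistryValueIterator frames above are doing.
    #include <windows.h>
    #include <cwchar>

    void DumpSystemFontLinkNames() {
      HKEY key = nullptr;
      const wchar_t kFontLinkKey[] =
          L"SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\FontLink\\SystemLink";
      if (RegOpenKeyExW(HKEY_LOCAL_MACHINE, kFontLinkKey, 0, KEY_READ, &key) !=
          ERROR_SUCCESS) {
        return;
      }
      for (DWORD index = 0;; ++index) {
        wchar_t name[256];
        DWORD name_size = 256;  // in characters
        // Each RegEnumValueW call is a user/kernel round trip; doing this on the
        // UI thread for every omnibox update is what the recurring stack shows.
        if (RegEnumValueW(key, index, name, &name_size, nullptr, nullptr, nullptr,
                          nullptr) != ERROR_SUCCESS) {
          break;  // ERROR_NO_MORE_ITEMS ends the enumeration.
        }
        wprintf(L"font with linked fallbacks: %ls\n", name);
      }
      RegCloseKey(key);
    }

Each iteration is a registry read, so running this on the UI thread while the omnibox is repainting adds up quickly.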


Ah ha! Autocomplete is not at fault. Time to optimize GetFallbackFonts()?! But wait… Why is GetFallbackFonts() even called in the first place?

And before we figure that out, how do we know this is the #1 root cause of our overall long-tail performance issue? We’ve only looked at one trace so far after all...



The Measurement Conundrum

The metrics tell us how many users are affected and how bad it is, but they do not highlight the root cause.

Slow Reports tell us what the problem is for a specific user but not how many users are affected. And while we can query our corpus of Slow Report traces, it comes with inherent biases that make it impossible to correlate 1:1 with metrics. For instance, because Chrome only reports the first instance of bad performance per-session and only for users of the Canary/Dev channel, there’s both a startup and a population bias.

This is the measurement conundrum. The more actionability (data) a tool provides, the fewer scenarios it captures and the more bias it incurs. Depth vs. breadth.

Tools that attempt to do both sit somewhere in the middle, where they use aggregation over a large dataset and risk showing aggregate results based on flawed input (e.g. circular buffer tracing having dropped the interesting portion and contributing to a biased aggregate).

The proper API for font fallback is DirectWrite; however, DirectWrite was only added in Windows 7, back when Chrome still supported Windows XP. The GetFallbackFont() logic was therefore forced to stick to a less reliable, registry-based font-link lookup.
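For reference, this is roughly how the DirectWrite route looks; a sketch under the assumption that IDWriteFactory2::GetSystemFontFallback() is available (Windows 8.1 and later), and not necessarily the exact path Chrome ended up taking:

    // Sketch: obtaining DirectWrite's system font-fallback object. The subsequent
    // MapCharacters() call (omitted) also needs an IDWriteTextAnalysisSource
    // implementation describing the text run, so this is only the first step.
    // Link against dwrite.lib.
    #include <dwrite_2.h>
    #include <wrl/client.h>

    Microsoft::WRL::ComPtr<IDWriteFontFallback> GetSystemFontFallback() {
      Microsoft::WRL::ComPtr<IDWriteFactory2> factory;
      HRESULT hr = DWriteCreateFactory(
          DWRITE_FACTORY_TYPE_SHARED, __uuidof(IDWriteFactory2),
          reinterpret_cast<IUnknown**>(factory.GetAddressOf()));
      Microsoft::WRL::ComPtr<IDWriteFontFallback> fallback;
      if (SUCCEEDED(hr))
        factory->GetSystemFontFallback(&fallback);
      return fallback;
    }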



Still not zero though, and we were still seeing instances of the aforementioned AutoComplete issue in our Slow Reports. Keep digging. DirectWrite’s GetFallbackFont() failing was unexpected, but since Slow Reports are anonymized, no user-generated strings can be uploaded -- and therefore, finding which codepoints were problematic was tricky. We teamed up with our privacy experts to instrument the Unicode Block and Script of text blocks going through the segmentation logic, which incorrectly broke down these two
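To give a flavour of what such privacy-preserving instrumentation can look like, here is a sketch that records only the coarse Unicode Block and Script enums of each codepoint (via ICU) and never the text itself; the Report* sinks are hypothetical stand-ins for the real histogram plumbing:

    // Illustrative only: per-codepoint Block/Script recording, no user text kept.
    #include <unicode/uchar.h>    // ublock_getCode()
    #include <unicode/uscript.h>  // uscript_getScript()
    #include <unicode/utf8.h>     // U8_NEXT()
    #include <cstdio>
    #include <string>

    // Hypothetical sinks standing in for the real (privacy-reviewed) histograms.
    void ReportBlock(UBlockCode block) { std::printf("block=%d\n", static_cast<int>(block)); }
    void ReportScript(UScriptCode script) { std::printf("script=%d\n", static_cast<int>(script)); }

    void InstrumentText(const std::string& utf8_text) {
      int32_t i = 0;
      const int32_t length = static_cast<int32_t>(utf8_text.size());
      while (i < length) {
        UChar32 codepoint;
        U8_NEXT(utf8_text.data(), i, length, codepoint);
        if (codepoint < 0)
          continue;  // skip ill-formed byte sequences
        ReportBlock(ublock_getCode(codepoint));
        UErrorCode status = U_ZERO_ERROR;
        UScriptCode script = uscript_getScript(codepoint, &status);
        if (U_SUCCESS(status))
          ReportScript(script);
      }
    }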
99th percentile of the number of unresponsive 100 ms intervals over a 30-second sample


Posted by Gabriel Charette 🤸🏼 and Etienne Bergeron 🕵🏻, Chrome Software Engineers

Data source for all statistics: Real-world data anonymously aggregated from Chrome clients.

*The core metric measures jank -- delays in handling user input -- every 30 seconds.







Background

Chrome is a multi-platform, multi-process, multi-threaded application, serving a wide range of needs, from small embedded devices to powerful workstations. Its allocator, PartitionAlloc, which replaced the platform allocators Chrome previously relied on (e.g. Scudo on Android), is a slab-based allocator designed to conserve memory, with a minimal per-thread cache in front for scaling to multi-threaded workloads. This simplicity also pays performance dividends: we’ve extensively profiled and aggressively trimmed the allocator’s fast path, improving thread-local storage access and locks, reducing cache-line fetches, and removing branches.

PartitionAlloc pre-reserves slabs of virtual address space. They are gradually backed by physical memory as allocation requests arrive. Small and medium-sized allocations are grouped into geometrically-spaced, size-segregated buckets, e.g. [241; 256], [257; 288]. Each slab is split into regions (called “slot spans”) that satisfy allocations (“slots”) from only one particular bucket, thereby increasing cache locality while lowering fragmentation. Conversely, larger allocations don’t go through the bucket logic and are fulfilled using the operating system’s primitives directly (mmap() on POSIX systems, VirtualAlloc() on Windows).

To keep the common case fast, allocation is organized in three layers. The first layer (Per-thread cache) holds a small number of free slots per bucket that belong exclusively to one thread, so taking one requires no lock, only a thread-local storage lookup, improving cache locality in the process. The per-thread cache has been tailored to satisfy the majority of requests by allocating from and releasing memory to the second layer in batches, amortizing lock acquisition, and further improving locality while not trapping excess memory.
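Before moving on to the second layer, here is a toy illustration of the geometric bucket spacing mentioned above; it reproduces the [241; 256] and [257; 288] ranges, but the "8 sub-buckets per power-of-two order" scheme is an assumption of this sketch, not PartitionAlloc's actual bucket table or code:

    // Toy bucket rounding: 8 geometric sub-buckets per power-of-two "order".
    #include <cstddef>
    #include <cstdio>

    size_t RoundUpToBucket(size_t size) {
      if (size <= 16)
        return 16;
      // Find the order containing `size`, i.e. the range (order_min, 2 * order_min].
      size_t order_min = 16;
      while (order_min * 2 < size)
        order_min *= 2;
      // Sub-buckets are spaced by order_min / 8: ..., 256, 288, 320, ..., 512, ...
      const size_t step = order_min / 8;
      size_t bucket = order_min;
      while (bucket < size)
        bucket += step;
      return bucket;
    }

    int main() {
      const size_t sizes[] = {250, 257, 300, 1000};
      for (size_t s : sizes)
        std::printf("request of %zu bytes -> %zu-byte bucket\n", s, RoundUpToBucket(s));
      // Prints 250 -> 256 and 257 -> 288, matching the bucket ranges quoted above.
    }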

The second layer (Slot span free-lists) is invoked upon a per-thread cache miss. For each bucket size, PartitionAlloc knows a slot span with free slots associated with that size, and captures a slot from the free-list of that span. This is still a fast path, but slower than per-thread cache as it requires taking a lock. However, this section is only hit for larger allocations not supported by per-thread cache, or as a batch to fill the per-thread cache.

Finally, if there are no free slots in the bucket, the third layer (Slot span management) either carves out space from a slab for a new slot span, or allocates an entirely new slab from the operating system, which is a slow but very infrequent operation.
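Putting the three layers together, a heavily simplified sketch of the allocation path might look like the following; every type and function here is a stand-in chosen for illustration, not PartitionAlloc's real data structures, locking, or slab management:

    // Illustrative control flow only: layer 1 (per-thread cache, lock-free),
    // layer 2 (central slot-span free-list, takes a lock), layer 3 (slot-span
    // management / new slab, slow and infrequent).
    #include <mutex>
    #include <vector>

    struct Bucket {
      std::vector<void*> central_free_slots;  // stand-in for slot-span free-lists
      std::mutex lock;
    };

    // Stand-in for layer 3: the real allocator carves a new slot span out of a
    // pre-reserved slab, or reserves a new slab from the OS.
    void* ProvisionNewSlotSpan(Bucket&) {
      return ::operator new(256);
    }

    // In reality there is one per-thread cache per bucket; a single one keeps this short.
    thread_local std::vector<void*> g_thread_cache_slots;

    void* Alloc(Bucket& bucket) {
      // Layer 1: per-thread cache -- the common case, a TLS lookup and no lock.
      if (!g_thread_cache_slots.empty()) {
        void* slot = g_thread_cache_slots.back();
        g_thread_cache_slots.pop_back();
        return slot;
      }
      // Layer 2: slot-span free-lists -- still fast, but requires the bucket lock.
      std::lock_guard<std::mutex> guard(bucket.lock);
      if (bucket.central_free_slots.empty()) {
        // Layer 3: no free slot anywhere for this bucket.
        return ProvisionNewSlotSpan(bucket);
      }
      // The real allocator would move a whole batch of slots into the per-thread
      // cache here, amortizing the lock acquisition over many future allocations.
      void* slot = bucket.central_free_slots.back();
      bucket.central_free_slots.pop_back();
      return slot;
    }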

The overall performance and space-efficiency of the allocator hinge on the many tradeoffs across its layers, such as how much to cache, how many buckets to use, and when to reclaim memory. Please refer to PartitionAlloc’s design documentation for more details.


It doesn’t look like much, but I recognized immediately that the 124 samples in KiPageFault were worth investigating. Most of the CPU-intensive work in this trace was important and unavoidable, but I had a hunch that these samples represented avoidable work - something that I could fix. And even though they represented just 0.75% of the samples, I suspected that they indicated a somewhat greater cost.

I recognized their importance immediately because this is something that I have seen before. I wrote about the hidden costs of memory allocation in a 2014 blog post. The basic memory architecture of Windows hasn’t changed since then, so the hidden costs remain about the same.
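To make those hidden costs concrete, here is a small Windows sketch (my own illustration, not from the original investigation) contrasting a freshly committed buffer, whose pages must be demand-zero faulted in on first touch, with a single reused buffer that stays resident:

    // Sketch: repeatedly VirtualAlloc-ing a fresh ~1.32 MB buffer forces a soft
    // page fault (KiPageFault) per 4 KB page on first touch, and the OS must keep
    // producing zeroed pages (work that shows up in MiZeroPageThread). Reusing one
    // buffer pays that cost only once. Timings will vary by machine.
    #include <windows.h>
    #include <cstdio>
    #include <cstring>

    constexpr size_t kSize = 1320 * 1024;  // roughly one 1.32 MB frame buffer

    void FreshBufferEachTime(int iterations) {
      for (int i = 0; i < iterations; ++i) {
        void* p = VirtualAlloc(nullptr, kSize, MEM_COMMIT | MEM_RESERVE,
                               PAGE_READWRITE);
        if (!p)
          return;
        std::memset(p, 0xAB, kSize);  // first touch -> one fault per page
        VirtualFree(p, 0, MEM_RELEASE);
      }
    }

    void ReuseOneBuffer(int iterations) {
      void* p = VirtualAlloc(nullptr, kSize, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
      if (!p)
        return;
      for (int i = 0; i < iterations; ++i)
        std::memset(p, 0xAB, kSize);  // pages stay resident -> no further faults
      VirtualFree(p, 0, MEM_RELEASE);
    }

    int main() {
      LARGE_INTEGER freq, t0, t1, t2;
      QueryPerformanceFrequency(&freq);
      QueryPerformanceCounter(&t0);
      FreshBufferEachTime(300);
      QueryPerformanceCounter(&t1);
      ReuseOneBuffer(300);
      QueryPerformanceCounter(&t2);
      std::printf("fresh buffers: %.1f ms, reused buffer: %.1f ms\n",
                  1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart,
                  1000.0 * (t2.QuadPart - t1.QuadPart) / freq.QuadPart);
    }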

In addition to CPU samples, my ETW trace contained call stacks for every call to VirtualAlloc. This WPA screenshot shows a 10-second period where the OnSample function does 298 allocations of 1.320 MB each, roughly 30 per second:
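Those numbers are easy to sanity-check, and they line up with the verification figures below; the frame-size interpretation at the end is my assumption, not something stated in the trace:

    // Back-of-the-envelope check of the numbers quoted above.
    constexpr double kPerAllocMB = 1.32;   // MB per OnSample allocation
    constexpr int kAllocsPerSecond = 30;   // one allocation per frame at ~30 fps
    constexpr int kWindowSeconds = 10;
    constexpr double kTotalMB =
        kPerAllocMB * kAllocsPerSecond * kWindowSeconds;  // ~396 MB per 10 s window
    static_assert(kTotalMB > 395.0 && kTotalMB < 397.0,
                  "matches the ~396 MB total reported in the verification section");
    // Assumption, for scale: a 1280x720 NV12 video frame is 1280 * 720 * 1.5 bytes
    // = 1,382,400 bytes (~1.32 MiB), consistent with one buffer per webcam frame.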

The fix was backported to M84.

You’d have to be paying very close attention to see the difference - spread across a Chrome process and the system process - but I hope that this helped some computers run a bit cooler and last longer on their batteries. And, while this inefficiency was found by profiling Google Meet, the improvement actually benefits any product that uses the webcam inside Chrome (and other Chromium-based browsers).

Verification

After the fix landed I compared two ETW traces from Chrome Canary, one taken before and one after the change, each with no other programs running except a single Chrome tab on the Google Meet pre-meeting page. In both cases I looked at a 10-second period of time in the profiler. This showed:

CPU time in OnSample:
Before: 458 ms (432 ms of which were in Lock/Unlock/KiPageFault)
After: 27 ms


Allocations:
Before: 30 allocations per second of 1.32 MB (one per frame, running at 30 fps - a higher framerate would mean more allocations), totalling 396 MB over 10 seconds
After: 0 allocations


CPU time in the System process's MiZeroPageThread:
Before: 36 ms
After: 0 ms

These measurements showed - in three different ways - that the performance problem was fixed. The memory copying in OnSample was gone, the repeated allocations were gone, and the system process was doing less work. Mission accomplished, bug closed.
