Update, February 2019: Two years later the VC++ bug I reported here still exists, even in VS 2019 preview. However Chromium no longer builds with VC++ so I’m reverting the changes where I deleted const in order to make some global arrays const. Read on for details if that doesn’t seem to make sense.
I just completed a series of changes that shrunk the Chrome browser’s on-disk size on Windows by over a megabyte, moved about 500 KB of data from its read/write data segments to its read-only data segments, and reduced its private working set by about 200 KB per-process. The amusing thing about this series of changes is that they consisted entirely of removing const from some places, and adding const to others. Compilers be weird.
I filed several bugs while writing this post. One of them asked VC++ to support extern of constexpr objects, and that was just (on November 7, 2017) closed as fixed.
This task started when I was looking at a tool’s output and thought “that’s funny…” I’d noticed that several of the large globals, which looked like they should be constant data, were in the read/write data segment. An abbreviated version of the tool’s output is shown below:
Most executable formats have at least two data segments – one for read/write globals and one for read-only globals. If you have constant data, such as kBrotliDictionary, then it is best to have it in the read-only data segment, which is segment ‘2’ in Chrome’s Windows binaries. However some logically constant data, such as unigram_table, device::UsbIds::vendors_, and blink::serializedCharacterData were in section ‘3’, the read/write data segment.
Putting data in the read-only data segment has a few advantages. It makes it impossible to corrupt the data through stray writes, and it also ensures greater efficiency. The read-only pages are guaranteed to be shared between all processes that load that DLL, which in the case of Chrome can be a lot of processes. And, in some cases (although probably not these) the compiler can use those constant values directly.
Pages in the read/write data segment might end up staying shared, but they might not. These pages are all created as copy-on-write which means that they are shared until written to, at which point they are copied into per-process private pages. So, if a global variable is initialized at run-time it becomes private data. Or, if a global variable is on the same page as another global that is written to then it becomes private data – everything happens at the page granularity (4 KiB).
Private data uses more memory because there is a separate copy for each process, but private pages are also more expensive because they are backed by the pagefile instead of by the image file. This means that when memory gets low they have to be written to the pagefile instead of just being discarded, and paging them back in is more expensive because they tend to get written randomly and therefore get read randomly. For all of these reasons shared pages are strictly better than potentially-private pages.
Adding const is good
So, when my tool showed one such array that was never written to, I landed a change to add a const modifier, and it dutifully moved to the read-only data segment. Simple enough. A change like this is never a bad idea, but it’s hard to know how much it helps. Because this array is never written to it might end up staying as shared data forever. Or, more likely, if the ends of the array end up sharing pages with global variables that are written to then part of the array will end up turning into per-process private data, wasting 7,748 or 3,652 bytes (the size of the array minus one or two pages in the middle which are guaranteed to stay shared) in each process. Changes like this should help on all platforms, with all toolchains.
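The 7,748 and 3,652 figures can be reproduced with a little page arithmetic. The sketch below assumes an array size of 11,844 bytes, which is inferred from those two figures plus one or two 4 KiB pages rather than stated directly; the function names are made up for illustration.

```cpp
#include <cstddef>

// Page size used in the text (4 KiB).
constexpr std::size_t kPageSize = 4096;

// Number of pages that lie entirely inside an object of `size` bytes that
// starts `offset` bytes into a page. Only these pages cannot be shared with
// neighboring globals. Returns 0 if the object covers no complete page.
constexpr std::size_t FullPagesInside(std::size_t offset, std::size_t size) {
  return (offset + size) / kPageSize > (offset + kPageSize - 1) / kPageSize
             ? (offset + size) / kPageSize - (offset + kPageSize - 1) / kPageSize
             : 0;
}

// Bytes of the object that sit on pages it may share with other globals.
constexpr std::size_t BytesOnEdgePages(std::size_t offset, std::size_t size) {
  return size - FullPagesInside(offset, size) * kPageSize;
}

// Assumed array size, inferred from the figures in the text.
constexpr std::size_t kArraySize = 11844;

// Page-aligned start: two full pages inside, 3,652 bytes on an edge page.
static_assert(BytesOnEdgePages(0, kArraySize) == 3652, "two interior pages");
// Start 100 bytes into a page: one full page inside, 7,748 bytes on edges.
static_assert(BytesOnEdgePages(100, kArraySize) == 7748, "one interior page");
```

Which of the two cases applies depends only on where the linker happens to place the array within a page.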
Marking your const arrays as const is a good idea, you should do it, and I’m sure some developers don’t know this, but I’m not sure this information alone would have been enough to inspire this blog post. Here is where we veer into peculiar new territory…
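As a minimal sketch of the plain “add const” case (the table names and contents are hypothetical, not taken from Chrome), the difference is a single keyword on the global. The segment each one lands in is decided by the toolchain, so the comments describe the typical outcome rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// No const on the global: typically placed in the read/write data segment,
// even though nothing ever writes to it.
int g_mutable_table[] = {2, 3, 5, 7, 11};

// const on the global: eligible for the read-only data segment.
const int kConstTable[] = {2, 3, 5, 7, 11};

// A const global has internal linkage by default in C++, so a table that is
// shared across translation units also needs extern.
extern const int kSharedTable[] = {2, 3, 5, 7, 11};

constexpr std::size_t kTableSize = sizeof(kConstTable) / sizeof(kConstTable[0]);

// Read-only access works the same way for all three tables.
int LookupPrime(std::size_t index) {
  assert(index < kTableSize);
  return kConstTable[index];
}
```

A tool such as ShowGlobals, run on the resulting PDB, is what would confirm where each table actually ended up.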
Sometimes removing const is even better
The next array that I investigated was unigram_table. This one was peculiar because it was initialized entirely with constant data using struct/array initializer syntax and it was marked as const but it was still being placed in the read/write data segment. This looked like an interesting VC++ compiler quirk so I reduced it to a repro small enough to fit in a tweet, or one line of a blog post:
const struct T {const int b[999]; } a[] = {{{}}}; int main() {return(size_t)a;}
If you compile this and run ShowGlobals on the PDB it will show that ‘a’ is in section ‘3’, despite being tagged as const. Here are the steps for building and testing the code:
> "%VS140COMNTOOLS%..\..\VC\vcvarsall.bat"
> cl /Zi constbug.cpp
/out:constbug.exe
> ShowGlobals.exe constbug.pdb
Size Section Symbol name
3996 3 a
Having reduced my example to fewer than 140 characters it was easy to find the trigger. With VC++ (2010, 2015, 2017 RC) if you have a class/struct with a const member variable then any global objects of that type end up in the read/write data segment. Jonathan Caves explains in a comment on my bug report that this happens because the type has a “compiler generated deleted default constructor” (makes sense), which confuses VC++ into incorrectly treating this as a class that needs dynamic initialization.
So, the problem in this case is the const modifier on the ‘b’ member variable. Once that is deleted the array, ironically enough, gets put in read-only memory. Since the object itself is const the removal of const from the member variable doesn’t reduce safety at all, and actually improves it for VC++ builds.
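To make the trigger concrete, here is a small sketch with two hypothetical struct names that differ only in the const on the member. The type traits show the deleted default constructor that the bug report mentions; the comments about segment placement restate the behavior described above for VC++ and are not something this code can check.

```cpp
#include <type_traits>

// Same shape as the one-line repro: the const member causes the implicit
// default constructor to be defined as deleted.
struct WithConstMember { const int b[3]; };

// Without the member-level const the type is an ordinary aggregate.
struct WithoutConstMember { int b[3]; };

static_assert(!std::is_default_constructible<WithConstMember>::value,
              "const member deletes the default constructor");
static_assert(std::is_default_constructible<WithoutConstMember>::value,
              "plain aggregate is default constructible");

// Both globals are const and initialized from constants. Per the article,
// affected VC++ versions put kA in the read/write segment and kB in the
// read-only segment.
const WithConstMember kA[] = {{{1, 2, 3}}};
const WithoutConstMember kB[] = {{{1, 2, 3}}};

// The const on the object still makes every element of kB read-only.
static_assert(
    std::is_const<std::remove_reference<decltype(kB[0].b[0])>::type>::value,
    "elements of a const object are const");
```

The last assertion is the reason deleting the member-level const does not reduce safety: the elements are const either way.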
I suspect that the VC++ team will fix this for VS 2017 – it seems like an easy win to me – but I didn’t want to wait that long. So, I started landing changes to remove const on member variables in Chrome where it was causing problems. This was simply a matter of continuing to look through the list of large global variables in the read/write data segment and categorizing them:
- Actually written to – leave alone
- Not written and missing const on the global variable – add it
- Not written and problematic const on a member variable – remove it
Okay, this looks really funny…
So, I worked my way through Chrome’s source code, adding and removing const in a few useful places. For most of my changes the effect was just to move some data from the read/write data segment to the read-only data segment, as expected, but two of the changes did much more. Two of the changes shrunk the .text and .reloc sections. This was great, but it seemed too good to be true. This suggested that VC++ was generating code to initialize some of these arrays – a lot of code.
The most interesting change was removing three const keywords from the declaration of the UnigramEntry struct. Doing this moved 53,064 bytes to the read-only data segment, but it also saved over 364,500 bytes of code, in both chrome.dll and chrome_child.dll. That suggests that VC++ had been quietly creating an initializer function that used an average of almost seven bytes of code to initialize each byte of unigram_table. That couldn’t be. It was too much to hope for that my change would help that much!
So, I launched Chrome under the VS debugger and set a breakpoint at the end of the unigram_table array and VS dutifully stopped there at the beginning of the initializer. I’ve put a cleaned up and simplified excerpt of the assembly language below (I replaced ‘unigram_table’ with ‘u’ to make it fit better):
55 push ebp
8B EC mov ebp,esp
83 25 78 91 43 12 00 and dword [u],0
83 25 7C 91 43 12 00 and dword [u+4],0
83 25 80 91 43 12 00 and dword [u+8],0
83 25 84 91 43 12 00 and dword [u+0Ch],0
C6 05 88 91 43 12 4D mov byte [u+10h],4Dh
C6 05 89 91 43 12 CF mov byte [u+11h],0CFh
C6 05 8A 91 43 12 1D mov byte [u+12h],1Dh
C6 05 8B 91 43 12 1B mov byte [u+13h],1Bh
C7 05 8C 91 43 12 FF 00 00 00 mov dword [u+14h],0FFh
C6 05 90 91 43 12 00 mov byte [u+18h],0
C6 05 91 91 43 12 00 mov byte [u+19h],0
C6 05 92 91 43 12 00 mov byte [u+1Ah],0
C6 05 93 91 43 12 00 mov byte [u+1Bh],0
… 52,040 lines deleted…
c6 05 02 6e 0b 12 6c mov byte [u+cf42h],6Ch
c6 05 03 6e 0b 12 6e mov byte [u+cf43h],6Eh
c6 05 04 6e 0b 12 a2 mov byte [u+cf44h],0A2h
c6 05 05 6e 0b 12 c2 mov byte [u+cf45h],0C2h
c6 05 06 6e 0b 12 80 mov byte [u+cf46h],80h
c6 05 07 6e 0b 12 c4 mov byte [u+cf47h],0C4h
5d pop ebp
c3 ret
The hex numbers along the left are the machine-code bytes and the text to the right of that is the assembly language. After some prologue code the code gets into a routine of writing to the array… one byte at a time… using seven-byte instructions. Well, that explains the size of the initializer.
It is well known that optimizing compilers can write code that is as good as or better than what human beings write in most cases. And yet. Sometimes they don’t. There are so many things that could be better about this function:
- It could not exist. The array is initialized using C array syntax and if it wasn’t for this VC++ code-quality bug with const struct members then there would be no initializer – on other platforms there isn’t
- The writes of zero could be omitted. The array is a global variable being initialized at startup and the contents of memory are already guaranteed to be zero, so every write of zero can be removed
- Data could be written four bytes at a time (ten-byte instruction) instead of one byte at a time (seven-byte instruction)
- The address of the array could be loaded into a register and then dereferenced instead of specifying its address in every. single. instruction. This would make the instructions smaller and would also save two bytes per instruction of relocation data, found in the .reloc segment
You get the idea. The function could be a quarter the size, or it could be omitted entirely. Anyway, with the three const keywords removed the initializer disappeared entirely. The other of the two changes gave similar but slightly smaller improvements.
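The size of the initializer follows from simple arithmetic on the figures quoted above. This is a rough estimate only: it uses the 53,064-byte table size, the “over 364,500” bytes of code saved, and the seven-byte and ten-byte instruction lengths from the listing, and it ignores the prologue, the epilogue, and the mix of store widths.

```cpp
// Figures quoted in the article.
constexpr long long kTableBytes = 53064;       // size of unigram_table
constexpr long long kCodeBytesSaved = 364500;  // lower bound on code removed
constexpr long long kByteStoreLength = 7;      // mov byte [addr], imm8
constexpr long long kDwordStoreLength = 10;    // mov dword [addr], imm32

// Observed code bytes per data byte, scaled by 100 to stay in integers.
constexpr long long kObservedRatioX100 = kCodeBytesSaved * 100 / kTableBytes;
static_assert(kObservedRatioX100 == 686, "about 6.9 code bytes per data byte");

// Estimated size if every byte were written with a seven-byte instruction.
constexpr long long kAllByteStores = kTableBytes * kByteStoreLength;
static_assert(kAllByteStores == 371448, "upper estimate with byte stores");

// Estimated size if every four bytes were written with a ten-byte instruction.
constexpr long long kAllDwordStores = kTableBytes / 4 * kDwordStoreLength;
static_assert(kAllDwordStores == 132660, "2.5 code bytes per data byte");
```

Four-byte stores alone would cut the function to roughly a third of its size; skipping the writes of zero and addressing through a register would shrink it further.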
Most importantly, the various globals involved in these two changes go from being mostly or completely per-process private data (due to being initialized at run-time) to being shared data, saving an estimated 200 KB of data per process.
Example changes
I started with the largest and most common objects and types in order to get the biggest wins first and I quickly reduced the size of the read/write data segment by about 250 KB, moving over 1,500 global variables to the read-only data segment. It’s easy to get caught up in the game (who me? OCD? I don’t know what you’re talking about) but I have managed to stop even though I know that there are still hundreds more small global variables that I could fix. After a while I hit diminishing returns and it was time to move onto something else. To put it another way, if you’ve always wanted to land a Chromium change I left a few for you. Here are some of the changes landed as part of this project:
Some changes that deleted const:
- long-standing bug)
- Delete const three times to move 166 KB to read-only data segment and save 224 KB of code (1,500 separate global variables!)
- Delete const five times to move ~12,500 bytes to read-only data segment
- Delete const once to move ~6,800 bytes to read-only data segment
- Delete const four times to move ~2,500 bytes to read-only data segment
- Delete const six times to move 960 bytes to read-only data segment
- Delete const five times to move ~256 bytes to read-only data segment (this one was done because of fears that the data was getting corrupted)
Some changes that added const:
- Add const once to move 12,864 bytes to read-only data segment
- Add const three times to move 11,844 bytes to read-only data segment (this change also added extern twice)
- Add const twice to move 3,000 bytes to read-only data segment
- Add const once to move 396 bytes to read-only data segment
I was really hoping for conservation-of-const, and there are enough globals missing const that I could yet attain it.
See for yourself
If you want to debug Chrome and see the unigram_table initializer before it goes away then it’s fairly straightforward – you don’t need to be a Chrome developer to do it. Start by executing these two commands:
> "%VS140COMNTOOLS%..\..\VC\vcvarsall.bat"
> devenv /debugexe chrome.exe
Make sure you have Chrome’s symbol server set up in VS per these instructions and then set a breakpoint on this symbol:
`dynamic initializer for 'unigram_table''
In case WordPress mangles it, that’s a back-tick, some magic text, a single quote, the symbol name, and then two single quotes. Make sure you’ve shut down Chrome completely, and then launch Chrome from VS. VS will download Chrome’s symbols. You can also enable the source server by checking the box in the Visual Studio debug options.
And that’s how I spent my holidays – for less technical holiday shenanigans read about warm weather snowmen, and the importance of winning Christmas.
Reddit discussion is here.
You should prefer constexpr instead of const whenever you want to ensure static initialization. It does the right thing in your “fit-in-a-tweet” example even if you keep the inner const. Bugs aside, it’s safer in general: since constexpr makes the code ill-formed when static initialization is not feasible, you can spot the potential inefficiency without using post-build tools.
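A minimal sketch of that suggestion, applied to the one-line example from the post with the inner const kept. ReadAtRuntime is a hypothetical function that appears only in a comment, to show the kind of initializer the compiler would refuse.

```cpp
// The struct from the one-line repro, inner const included.
struct T { const int b[999]; };

// constexpr requires constant initialization, so no run-time initializer
// function can be generated for this object.
constexpr T a[] = {{{}}};

static_assert(sizeof(a) == 999 * sizeof(int), "one element holding 999 ints");
static_assert(a[0].b[998] == 0, "empty braces zero the array");

// By contrast, a definition such as
//   constexpr T bad[] = {{{ReadAtRuntime()}}};  // hypothetical function
// is ill-formed if ReadAtRuntime() is not a constant expression, so the
// problem shows up as a compile error rather than in a post-build tool.
```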
What about an extern const ref / pointer that is initialized in the .cpp with a constexpr data?
Wow, that’s a cool feature, thanks for sharing! That’s what I had in mind but in a cleaner and more direct way. Note that in the case of an array you don’t get an extra indirection by declaring another pointer to the same data (but you do with a ref). But I know you have mastered these details better than me 🙂
Your blog is cool and you are doing a great job (in my previous job I was also in charge of spotting the VC++ bugs and finding a workaround for them and it made me measure how much patience and effort it requires when you are dealing with millions of LOC).
This is what I meant:
https://godbolt.org/g/M9XOvI
VC++ 2015 does the same optimization too.
In case the link dies, here’s the code:
const char str1[] = "hello1";
// double const required
const char * const str2 = str1;
const int array1[] = { 0, 1 };
// static is a game changer here
static const int *array2 = array1;
I understand. I realized afterwards that it works because of the static symbol, while you need an extern one to work across TU. I missed that point. But that was cool to learn a few things. Thanks for taking the time to explain 🙂
I believe the site has some malware injected in it. At least the mobile version of it. After a few seconds it redirected me to some site with a fake Facebook skin.
Anonymous, try reporting that at https://en.support.wordpress.com/about-these-ads/ . Include every detail you can…
instead of removing the consts you should have replaced them with some macro that evaluates to empty string on the problematic compilers. this way the locations are easily greppable and can be reverted when the compilers are fixed.
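The macro approach suggested above might look like the following sketch. The macro name and the Entry struct are hypothetical, and the condition simply selects every non-Clang VC++ build, since no fixed compiler version is known from the post.

```cpp
// Hypothetical macro: expands to nothing on VC++ (where a const member pushes
// const globals into the read/write segment) and to const everywhere else.
// clang-cl also defines _MSC_VER, so it is excluded explicitly.
#if defined(_MSC_VER) && !defined(__clang__)
#define CONST_MEMBER_IF_SAFE
#else
#define CONST_MEMBER_IF_SAFE const
#endif

// Illustrative struct, not one of Chrome's.
struct Entry {
  CONST_MEMBER_IF_SAFE int key;
  CONST_MEMBER_IF_SAFE int value;
};

// The global stays const on every compiler.
const Entry kEntries[] = {{1, 10}, {2, 20}};

static_assert(sizeof(kEntries) / sizeof(kEntries[0]) == 2, "two entries");
```

Searching for the macro name then finds every workaround site, and a version check could be added to the condition once a fixed compiler ships.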
Interesting that saving even a byte or a single instruction matters, even in a code base as complex as a layout engine. A pair of age-old articles by Dan Saks may still be helpful in the era of modern C++ (C++11).
Placing const in Declarations (1998-06)
What const Really Means (1998-08)
http://www.dansaks.com/articles.htm
RE: The removal of const because of VS bug
At what point is it okay to go against the language semantics to optimize in the short term? And how do you plan to track the bug and reverse this change?
Is it feasible to use gcc or Clang to build chrome in Windows?
btw, http://gcc.godbolt.org/ finally added the MS compiler! Just switch to ‘x86-64 CL’
Nice for bug reports, quickly checking codegen etc.
Nice tool, we found some places where people had declared const std::strings in includes and also a static const std::mutex in a header outside of a class – oops! We also noticed some duplicated ::’vftable’ entries, but we believe these are normal consequences of multiple inheritance, and I will probably modify your source code to filter out these ::’vftable’ symbols. We don’t care about vtables > 500 bytes either, Qt has a few of these. Thanks!