WebAssembly Specification
1. Introduction
1.1. Introduction
WebAssembly (abbreviated Wasm [1]) is a safe, portable, low-level code format designed for efficient execution and compact representation. Its main goal is to enable high performance applications on the Web, but it does not make any Web-specific assumptions or provide Web-specific features, so it can be employed in other environments as well.
WebAssembly is an open standard developed by a W3C Community Group.
This document describes version 1.0 of the core WebAssembly standard. It is intended that it will be superseded by new incremental releases with additional features in the future.
1.1.1. Design Goals
The design goals of WebAssembly are the following:
- Fast, safe, and portable semantics:
  - Fast: executes with near native code performance, taking advantage of capabilities common to all contemporary hardware.
  - Safe: code is validated and executes in a memory-safe [2], sandboxed environment preventing data corruption or security breaches.
  - Well-defined: fully and precisely defines valid programs and their behavior in a way that is easy to reason about informally and formally.
  - Hardware-independent: can be compiled on all modern architectures, desktop or mobile devices and embedded systems alike.
  - Language-independent: does not privilege any particular language, programming model, or object model.
  - Platform-independent: can be embedded in browsers, run as a stand-alone VM, or integrated in other environments.
  - Open: programs can interoperate with their environment in a simple and universal manner.
- Efficient and portable representation:
  - Compact: has a binary format that is fast to transmit by being smaller than typical text or native code formats.
  - Modular: programs can be split up in smaller parts that can be transmitted, cached, and consumed separately.
  - Efficient: can be decoded, validated, and compiled in a fast single pass, equally with either just-in-time (JIT) or ahead-of-time (AOT) compilation.
  - Streamable: allows decoding, validation, and compilation to begin as soon as possible, before all data has been seen.
  - Parallelizable: allows decoding, validation, and compilation to be split into many independent parallel tasks.
  - Portable: makes no architectural assumptions that are not broadly supported across modern hardware.
WebAssembly code is also intended to be easy to inspect and debug, especially in environments like web browsers, but such features are beyond the scope of this specification.
[1] A contraction of “WebAssembly”, not an acronym, hence not using all-caps.
[2] No program can break WebAssembly’s memory model. Of course, it cannot guarantee that an unsafe language compiling to WebAssembly does not corrupt its own memory layout, e.g. inside WebAssembly’s linear memory.
1.1.2. Scope
At its core, WebAssembly is a virtual instruction set architecture (virtual ISA). As such, it has many use cases and can be embedded in many different environments. To encompass their variety and enable maximum reuse, the WebAssembly specification is split and layered into several documents.
This document is concerned with the core ISA layer of WebAssembly. It defines the instruction set, binary encoding, validation, and execution semantics, as well as a textual representation. It does not, however, define how WebAssembly programs can interact with a specific environment they execute in, nor how they are invoked from such an environment.
Instead, this specification is complemented by additional documents defining interfaces to specific embedding environments such as the Web. These will each define a WebAssembly application programming interface (API) suitable for a given environment.
1.2. Security Considerations
WebAssembly provides no ambient access to the computing environment in which code is executed. Any interaction with the environment, such as I/O, access to resources, or operating system calls, can only be performed by invoking functions provided by the embedder and imported into a WebAssembly module. An embedder can establish security policies suitable for a respective environment by controlling or limiting which functional capabilities it makes available for import. Such considerations are an embedder’s responsibility and the subject of API definitions for a specific environment.
Because WebAssembly is designed to be translated into machine code running directly on the host’s hardware, it is potentially vulnerable to side channel attacks on the hardware level. In environments where this is a concern, an embedder may have to put suitable mitigations into place to isolate WebAssembly computations.
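The import-based security model described above can be illustrated with a minimal sketch. It assumes a hypothetical embedder written in Python; the host function names (`log`, `read_file`, `now`) and the helper `build_imports` are invented for illustration and are not part of this specification or of any embedding API.

```python
from typing import Callable, Dict, Set

# Hypothetical host functions an embedder might offer. The names are
# illustrative only; the specification does not define any of them.
HOST_FUNCTIONS: Dict[str, Callable[..., object]] = {
    "log": lambda message: None,
    "read_file": lambda path: b"",
    "now": lambda: 0,
}


def build_imports(allowed: Set[str]) -> Dict[str, Callable[..., object]]:
    """Return only the host functions that the policy allows for import."""
    return {name: fn for name, fn in HOST_FUNCTIONS.items() if name in allowed}


# A sandboxed module is given logging and a clock, but no file access.
imports = build_imports({"log", "now"})
```

Because a module can reach its environment only through imported functions, withholding `read_file` here would leave the module with no way to read files at all.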
1.2.1. Dependencies
WebAssembly depends on two existing standards:
- [IEEE-754-2019], for the representation of floating-point data and the semantics of respective numeric operations.
- [UNICODE], for the representation of import/export names and the text format.
However, to make this specification self-contained, relevant aspects of the aforementioned standards are defined and formalized as part of this specification, such as the binary representation and rounding of floating-point values, and the value range and UTF-8 encoding of Unicode characters.
Note
The aforementioned standards are the authoritative source of all respective definitions. Formalizations given in this specification are intended to match these definitions. Any discrepancy in the syntax or semantics described is to be considered an error.
1.3. Overview
1.3.1. Concepts
WebAssembly encodes a low-level, assembly-like programming language. This language is structured around the following concepts.
- Values
- WebAssembly provides only four basic value types. These are integers and [IEEE-754-2019] numbers, each in 32 and 64 bit width. 32 bit integers also serve as Booleans and as memory addresses. The usual operations on these types are available, including the full matrix of conversions between them. There is no distinction between signed and unsigned integer types. Instead, integers are interpreted by respective operations as either unsigned or signed in two’s complement representation.
- Instructions
- The computational model of WebAssembly is based on a stack machine. Code consists of sequences of instructions that are executed in order. Instructions manipulate values on an implicit operand stack [1] and fall into two main categories. Simple instructions perform basic operations on data. They pop arguments from the operand stack and push results back to it. Control instructions alter control flow. Control flow is structured, meaning it is expressed with well-nested constructs such as blocks, loops, and conditionals. Branches can only target such constructs.
- Traps
- Under some conditions, certain instructions may produce a trap, which immediately aborts execution. Traps cannot be handled by WebAssembly code, but are reported to the outside environment, where they typically can be caught.
- Functions
- Code is organized into separate functions. Each function takes a sequence of values as parameters and returns a sequence of values as results. [2] Functions can call each other, including recursively, resulting in an implicit call stack that cannot be accessed directly. Functions may also declare mutable local variables that are usable as virtual registers.
- Tables
- A table is an array of opaque values of a particular element type. It allows programs to select such values indirectly through a dynamic index operand. Currently, the only available element type is an untyped function reference. Thereby, a program can call functions indirectly through a dynamic index into a table. For example, this allows emulating function pointers by way of table indices.
- Linear Memory
- A linear memory is a contiguous, mutable array of raw bytes. Such a memory is created with an initial size but can be grown dynamically. A program can load and store values from/to a linear memory at any byte address (including unaligned). Integer loads and stores can specify a storage size which is smaller than the size of the respective value type. A trap occurs if an access is not within the bounds of the current memory size.
- Modules
- A WebAssembly binary takes the form of a module that contains definitions for functions, tables, and linear memories, as well as mutable or immutable global variables. Definitions can also be imported, specifying a module/name pair and a suitable type. Each definition can optionally be exported under one or more names. In addition to definitions, modules can define initialization data for their memories or tables that takes the form of segments copied to given offsets. They can also define a start function that is automatically executed.
- Embedder
- A WebAssembly implementation will typically be embedded into a host environment. This environment defines how loading of modules is initiated, how imports are provided (including host-side definitions), and how exports can be accessed. However, the details of any particular embedding are beyond the scope of this specification, and will instead be provided by complementary, environment-specific API definitions.
[1] In practice, implementations need not maintain an actual operand stack. Instead, the stack can be viewed as a set of anonymous registers that are implicitly referenced by instructions. The type system ensures that the stack height, and thus any referenced register, is always known statically.
[2] In the current version of WebAssembly, there may be at most one result value.
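The stack-machine model and the notion of a trap can be made concrete with a small sketch. It is an illustrative interpreter for four 32 bit integer instructions only, not the execution semantics defined by this specification; the tuple encoding of instructions, the function `run`, and the `Trap` class are assumptions made for the example.

```python
from typing import List, Tuple

MASK32 = 0xFFFFFFFF  # 32 bit integer results wrap modulo 2**32


class Trap(Exception):
    """Signals that execution was aborted by a trapping instruction."""


def run(code: List[Tuple]) -> List[int]:
    """Execute simple instructions in order on an implicit operand stack."""
    stack: List[int] = []
    for instr in code:
        op = instr[0]
        if op == "i32.const":
            stack.append(instr[1] & MASK32)
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif op == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        elif op == "i32.div_u":
            b, a = stack.pop(), stack.pop()
            if b == 0:
                # Division by zero aborts execution instead of yielding a value.
                raise Trap("integer divide by zero")
            stack.append(a // b)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack
```

For example, `run([("i32.const", 2), ("i32.const", 3), ("i32.add",), ("i32.const", 4), ("i32.mul",)])` leaves the single value 20 on the stack. The `Trap` exception propagates to the caller, mirroring how traps are reported to the outside environment rather than handled by WebAssembly code.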
1.3.2. Semantic Phases
Conceptually, the semantics of WebAssembly is divided into three phases. For each part of the language, the specification defines each of these phases.
- Decoding
- WebAssembly modules are distributed in a binary format. Decoding processes that format and converts it into an internal representation of a module. In this specification, this representation is modelled by abstract syntax, but a real implementation could compile directly to machine code instead.
- Validation
- A decoded module has to be valid. Validation checks a number of well-formedness conditions to guarantee that the module is meaningful and safe. In particular, it performs type checking of functions and the instruction sequences in their bodies, ensuring for example that the operand stack is used consistently.
- Execution
- Finally, a valid module can be executed. Execution can be further divided into two phases:
  - Instantiation. A module instance is the dynamic representation of a module, complete with its own state and execution stack. Instantiation executes the module body itself, given definitions for all its imports. It initializes globals, memories and tables and invokes the module’s start function if defined. It returns the instances of the module’s exports.
  - Invocation. Once instantiated, further WebAssembly computations can be initiated by invoking an exported function on a module instance. Given the required arguments, that executes the respective function and returns its results.
  Instantiation and invocation are operations within the embedding environment.
2. Structure
2.1. Conventions
WebAssembly is a programming language that has multiple concrete representations (its binary format and the text format). Both map to a common structure. For conciseness, this structure is described in the form of an abstract syntax. All parts of this specification are defined in terms of this abstract syntax.
2.1.1. Grammar Notation
The following conventions are adopted in defining grammar rules for abstract syntax.
- Terminal symbols (atoms) are written in sans-serif font:
2.1.2. Auxiliary Notation
When dealing with syntactic constructs the following notation is also used:
Moreover, the following conventions are employed:
- The notation
Productions of the following form are interpreted as records that map a fixed set of fields
The following notation is adopted for manipulating such records:
The update notation for sequences and records generalizes recursively to nested components accessed by “paths”
where
2.2. Values
WebAssembly programs operate on primitive numeric values. Moreover, in the definition of programs, immutable sequences of values occur to represent more complex data, such as text strings or other vectors.
2.2.1. Bytes
The simplest values are raw uninterpreted bytes. In the abstract syntax they are represented as hexadecimal literals.
2.2.2. Integers
Different classes of integers with different value ranges are distinguished by their bit width
The latter class defines uninterpreted integers, whose signedness interpretation can vary depending on context. In the abstract syntax, they are represented as unsigned values. However, some operations convert them to signed based on a two’s complement interpretation.
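The conversion between the unsigned representation and its two’s complement signed interpretation can be sketched as follows. The function names `signed` and `unsigned` are chosen for this example and are assumptions, not definitions taken from this section.

```python
def signed(n_bits: int, value: int) -> int:
    """Interpret an unsigned n_bits-wide integer as two's complement."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value is outside the unsigned range")
    # Values with the top bit set denote negative numbers.
    return value - 2 ** n_bits if value >= 2 ** (n_bits - 1) else value


def unsigned(n_bits: int, value: int) -> int:
    """Map a signed integer back to its unsigned n_bits-wide representation."""
    return value % 2 ** n_bits
```

Under this reading, the 32 bit pattern `0xFFFFFFFF` is 4294967295 when treated as unsigned and -1 when an operation treats it as signed.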
Note
The main integer types occurring in this specification are
2.2.3. Floating-Point
Floating-point data represents 32 or 64 bit values that correspond to the respective binary formats of the [IEEE-754-2019] standard (Section 3.3).
Every value has a sign and a magnitude. Magnitudes can either be expressed as normal numbers of the form
Possible magnitudes also include the special values
where
A canonical NaN is a floating-point value
An arithmetic NaN is a floating-point value
Note
In the abstract syntax, subnormals are distinguished by the leading 0 of the significand. The exponent of subnormals has the same value as the smallest possible exponent of a normal number. Only in the binary representation is the exponent of a subnormal encoded differently from the exponent of any normal number.
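The difference between the encoded and the abstract exponent of a subnormal can be observed with a short sketch for the 32 bit format. It relies on the [IEEE-754-2019] binary32 layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits); the helper names are assumptions made for this example.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the 32 bit pattern of x rounded to the binary32 format."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_fields(bits: int) -> tuple:
    """Split a 32 bit pattern into (sign, biased exponent, fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def abstract_exponent(bits: int) -> int:
    """Exponent of a finite value as it appears in the abstract syntax."""
    _, biased, _ = f32_fields(bits)
    if biased == 0xFF:
        raise ValueError("infinities and NaNs have no finite exponent")
    # A biased exponent of 0 encodes zero or a subnormal; in the abstract
    # syntax these share the smallest normal exponent, which is -126.
    return -126 if biased == 0 else biased - 127
```

The smallest subnormal (`0x00000001`) and the smallest normal number (`0x00800000`) both yield an abstract exponent of -126, although their encoded exponent fields are 0 and 1 respectively.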
2.2.4. Names
Names are sequences of characters, which are scalar values as defined by [UNICODE] (Section 2.4).
Due to the limitations of the binary format, the length of a name is bounded by the length of its UTF-8 encoding.
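Since the bound applies to the encoded form rather than to the number of characters, the relevant length can be computed as in the following sketch. The function name `name_byte_length` is an assumption for this example, and the concrete limit is deliberately not hard-coded because it is not stated in this passage.

```python
def name_byte_length(name: str) -> int:
    """Return the length in bytes of the UTF-8 encoding of a name."""
    # Python raises UnicodeEncodeError for lone surrogates, which are not
    # Unicode scalar values and therefore cannot occur in a name.
    return len(name.encode("utf-8"))
```

For instance, the five-character name `naïve` occupies six bytes, because `ï` is encoded in two bytes.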
2.3. Types
Various entities in WebAssembly are classified by types. Types are checked during validation, instantiation, and possibly execution.
2.3.1. Value Types
Value types classify the individual values that WebAssembly code can compute with and the values that a variable accepts.
The types
The types
2.3.2. Result Types
Result types classify the result of executing instructions or blocks, which is a sequence of values written with brackets.
Note
In the current version of WebAssembly, at most one value is allowed as a result. However, this may be generalized to sequences of values in future versions.
2.3.3. Function Types
Function types classify the signature of functions, mapping a vector of parameters to a vector of results, written as follows.
Note
In the current version of WebAssembly, the length of the result type vector of a valid function type may be at most
2.3.4. Limits
Limits classify the size range of resizeable storage associated with memory types and table types.
If no maximum is given, the respective storage can grow to any size.
2.3.5. Memory Types
Memory types classify linear memories and their size range.
The limits constrain the minimum and optionally the maximum size of a memory. The limits are given in units of page size.
2.3.6. Table Types
Table types classify tables over elements of element types within a size range.
Like memories, tables are constrained by limits for their minimum and optionally maximum size. The limits are given in numbers of entries.
The element type
Note
In future versions of WebAssembly, additional element types may be introduced.
2.3.7. Global Types
Global types classify global variables, which hold a value and can either be mutable or immutable.
2.3.8. External Types
External types classify imports and external values with their respective types.
2.4. Instructions
WebAssembly code consists of sequences of instructions. Its computational model is based on a stack machine in that instructions manipulate values on an implicit operand stack, consuming (popping) argument values and producing or returning (pushing) result values.
Note
In the current version of WebAssembly, at most one result value can be pushed by a single instruction. This restriction may be lifted in future versions.
In addition to dynamic operands from the stack, some instructions also have static immediate arguments, typically indices or type annotations, which are part of the instruction itself.
Some instructions are structured in that they bracket nested sequences of instructions.
The following sections group instructions into a number of different categories.
2.4.1. Numeric Instructions
Numeric instructions provide basic operations over numeric values of specific type. These operations closely match respective operations available in hardware.
Numeric instructions are divided by value type. For each type, several subcategories can be distinguished:
- Constants: return a static constant.
- Unary Operators: consume one operand and produce one result of the respective type.
- Binary Operators: consume two operands and produce one result of the respective type.
- Tests: consume one operand of the respective type and produce a Boolean integer result.
- Comparisons: consume two operands of the respective type and produce a Boolean integer result.
- Conversions: consume a value of one type and produce a result of another (the source type of the conversion is the one after the “
Some integer instructions come in two flavors, where a signedness annotation
2.4.2. Parametric Instructions
Instructions in this group can operate on operands of any value type.
The
The
2.4.3. Variable Instructions
Variable instructions are concerned with access to local or global variables.
These instructions get or set the values of variables, respectively. The
2.4.4. Memory Instructions
Instructions in this group are concerned with linear memory.
Memory is accessed with
The static address offset is added to the dynamic address operand, yielding a 33 bit effective address that is the zero-based index at which the memory is accessed. All values are read and written in little endian byte order. A trap results if any of the accessed memory bytes lies outside the address range implied by the memory’s current size.
Note
Future versions of WebAssembly might provide memory instructions with 64 bit address ranges.
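The address computation, bounds check, and byte order described above can be sketched for a 32 bit integer access. This is an illustration under stated assumptions, not the formal execution rule: memory is modelled as a Python `bytearray`, and the names `load_i32`, `store_i32`, and `Trap` are invented for the example.

```python
class Trap(Exception):
    """Signals an aborted execution, here an out-of-bounds memory access."""


def effective_address(base: int, offset: int) -> int:
    """Add the static offset to the dynamic address without wrapping.

    Both inputs are 32 bit unsigned values, so the sum needs up to 33 bits.
    """
    return base + offset


def load_i32(memory: bytearray, base: int, offset: int) -> int:
    """Read four bytes in little endian order, trapping if out of bounds."""
    ea = effective_address(base, offset)
    if ea + 4 > len(memory):
        raise Trap("out of bounds memory access")
    return int.from_bytes(memory[ea:ea + 4], "little")


def store_i32(memory: bytearray, base: int, offset: int, value: int) -> None:
    """Write four bytes in little endian order, trapping if out of bounds."""
    ea = effective_address(base, offset)
    if ea + 4 > len(memory):
        raise Trap("out of bounds memory access")
    memory[ea:ea + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")
```

Because the sum is not reduced modulo 2^32, a large base combined with a large offset cannot wrap around to a small, valid address; it fails the bounds check instead.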
The
2.4.5. Control Instructions
Instructions in this group affect the flow of control.
The
The
The
Each structured control instruction introduces an implicit label. Labels are targets for branch instructions that reference them with label indices. Unlike with other index spaces, indexing of labels is relative by nesting depth, that is, label
Note
This enforces structured control flow. Intuitively, a branch targeting a
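Relative label indexing can be sketched as a lookup into the list of enclosing constructs. The sketch assumes, as the full specification defines, that label index 0 denotes the innermost enclosing structured control instruction and that larger indices refer to constructs farther out; the function name and the string descriptions of constructs are hypothetical.

```python
from typing import List


def resolve_label(enclosing: List[str], label_index: int) -> str:
    """Resolve a relative label index against the enclosing constructs.

    `enclosing` lists the constructs from outermost to innermost.
    """
    if not 0 <= label_index < len(enclosing):
        raise IndexError("label index out of range")
    return enclosing[-1 - label_index]
```

With `enclosing = ["block A", "loop B", "block C"]`, index 0 resolves to `block C` and index 2 to `block A`. A branch can therefore only name a construct it is nested in.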
Branch instructions come in several flavors:
The
Note
In the current version of WebAssembly,
2.4.6. Expressions
Function bodies, initialization values for globals, and offsets of element or data segments are given as expressions, which are sequences of instructions terminated by an
In some places, validation restricts expressions to be constant, which limits the set of allowable instructions.
2.5. Modules
WebAssembly programs are organized into modules, which are the unit of deployment, loading, and compilation. A module collects definitions for types, functions, tables, memories, and globals. In addition, it can declare imports and exports and provide initialization logic in the form of data and element segments or a start function.
Each of the vectors – and thus the entire module – may be empty.
2.5.1. Indices
Definitions are referenced with zero-based indices. Each class of definition has its own index space, as distinguished by the following classes.
The index space for functions, tables, memories and globals includes respective imports declared in the same module. The indices of these imports precede the indices of other definitions in the same index space.
The index space for locals is only accessible inside a function and includes the parameters of that function, which precede the local variables.
Label indices reference structured control instructions inside an instruction sequence.
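The ordering of entries within an index space can be illustrated with a brief sketch. The helper names and the example identifiers are assumptions; only the ordering (imports before module-defined entries, parameters before local variables) is taken from the text above.

```python
from typing import List


def index_space(imported: List[str], defined: List[str]) -> List[str]:
    """Build an index space in which imports precede module definitions."""
    return list(imported) + list(defined)


def local_index_space(params: List[str], local_vars: List[str]) -> List[str]:
    """Build a function's local index space: parameters come first."""
    return list(params) + list(local_vars)
```

For a module importing `env.print` and defining `main` and `helper`, the function index space is `["env.print", "main", "helper"]`, so `main` has function index 1.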
2.5.2. Types
The
All function types used in a module must be defined in this component. They are referenced by type indices.
Note
Future versions of WebAssembly may add additional forms of type definitions.
2.5.3. Functions
The
The
The
The
Functions are referenced through function indices, starting with the smallest index not referencing a function import.
2.5.4. Tables
The
A table is a vector of opaque values of a particular table element type. The
Tables can be initialized through element segments.
Tables are referenced through table indices, starting with the smallest index not referencing a table import. Most constructs implicitly reference table index
Note
In the current version of WebAssembly, at most one table may be defined or imported in a single module, and all constructs implicitly reference this table
2.5.5. Memories
The
A memory is a vector of raw uninterpreted bytes. The
Memories can be initialized through data segments.
Memories are referenced through memory indices, starting with the smallest index not referencing a memory import. Most constructs implicitly reference memory index
Note
In the current version of WebAssembly, at most one memory may be defined or imported in a single module, and all constructs implicitly reference this memory
2.5.6. Globals
The
Each global stores a single value of the given global type. Its
Globals are referenced through global indices, starting with the smallest index not referencing a global import.
2.5.7. Element Segments
The initial contents of a table are uninitialized. The
The
Note
In the current version of WebAssembly, at most one table is allowed in a module. Consequently, the only valid
2.5.8. Data Segments
The initial contents of a memory are zero-valued bytes. The
The
Note
In the current version of WebAssembly, at most one memory is allowed in a module. Consequently, the only valid
2.5.9. Start Function
The
Note
The start function is intended for initializing the state of a module. The module and its exports are not accessible before this initialization has completed.
2.5.10. Exports
The
Each export is labeled by a unique name. Exportable definitions are functions, tables, memories, and globals, which are referenced through a respective descriptor.
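The uniqueness requirement on export names lends itself to a simple check, sketched below. The function name `duplicate_export_names` is an assumption for this example and does not correspond to a rule name in this specification.

```python
from collections import Counter
from typing import List


def duplicate_export_names(names: List[str]) -> List[str]:
    """Return, in sorted order, every export name that occurs more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

A set of exports is acceptable in this respect exactly when the returned list is empty.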
2.5.11. Imports
The
Each import is labeled by a two-level name space, consisting of a
Every import defines an index in the respective index space. In each index space, the indices of imports go before the first index of any definition contained in the module itself.
Note
Unlike export names, import names are not necessarily unique. It is possible to import the same
3. Validation
3.1. Conventions
Validation checks that a WebAssembly module is well-formed. Only valid modules can be instantiated.
Validity is defined by a type system over the abstract syntax of a module and its contents. For each piece of abstract syntax, there is a typing rule that specifies the constraints that apply to it. All rules are given in two equivalent forms:
- In prose, describing the meaning in intuitive form.
- In formal notation, describing the rule in mathematical form. [1]
Note
The prose and formal rules are equivalent, so that understanding of the formal notation is not required to read this specification. The formalism offers a more concise description in notation that is used widely in programming language semantics and is readily amenable to mathematical proof.
In both cases, the rules are formulated in a declarative manner. That is, they only formulate the constraints; they do not define an algorithm. The skeleton of a sound and complete algorithm for type-checking instruction sequences according to this specification is provided in the appendix.
3.1.1. Contexts
Validity of an individual definition is specified relative to a context, which collects relevant information about the surrounding module and the definitions in scope:
- Types: the list of types defined in the current module.
- Functions: the list of functions declared in the current module, represented by their function type.
- Tables: the list of tables declared in the current module, represented by their table type.
- Memories: the list of memories declared in the current module, represented by their memory type.
- Globals: the list of globals declared in the current module, represented by their global type.
- Locals: the list of locals declared in the current function (including parameters), represented by their value type.
- Labels: the stack of labels accessible from the current position, represented by their result type.
- Return: the return type of the current function, represented as an optional result type that is absent when no return is allowed, as in free-standing expressions.
In other words, a context contains a sequence of suitable types for each index space, describing each defined entry in that space. Locals, labels and return type are only used for validating instructions in function bodies, and are left empty elsewhere. The label stack is the only part of the context that changes as validation of an instruction sequence proceeds.
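One possible way to model such a context is an immutable record, sketched below. The field names follow the list above, but the Python representation (strings for value types, tuples for type sequences, the trailing underscores, and the method `with_label`) is an assumption for illustration and not the formal definition.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

ResultType = Tuple[str, ...]  # e.g. ("i32",) or ()


@dataclass(frozen=True)
class Context:
    types: Tuple = ()
    funcs: Tuple = ()
    tables: Tuple = ()
    mems: Tuple = ()
    globals_: Tuple = ()
    locals_: Tuple[str, ...] = ()
    labels: Tuple[ResultType, ...] = ()
    return_: Optional[ResultType] = None  # None when no return is allowed

    def with_label(self, result_type: ResultType) -> "Context":
        """Return a copy whose label stack has one new innermost label."""
        return replace(self, labels=(result_type,) + self.labels)
```

Extending the label stack yields a new context and leaves the original untouched, which reflects that only the labels change as validation descends into nested instruction sequences.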
More concretely, contexts are defined as records
In addition to field access written
- When spelling out a context, empty fields are omitted.
Note
We use indexing notation like
3.1.2. Prose Notation
Validation is specified by stylised rules for each relevant part of the abstract syntax. The rules not only state constraints defining when a phrase is valid, they also classify it with a type. The following conventions are adopted in stating these rules.
-
A phrase
Note
For example, if
-
The rules implicitly assume a given context
-
In some places, this context is locally extended to a context
3.1.3. Formal Notation
Note
This section gives a brief explanation of the notation for specifying typing rules formally. For the interested reader, a more thorough introduction can be found in standard textbooks. [2]
The proposition that a phrase
The formal typing rules use a standard approach for specifying type systems, rendering them into deduction rules. Every rule has the following general form:
Such a rule is read as a big implication: if all premises hold, then the conclusion holds. Some rules have no premises; they are axioms whose conclusion holds unconditionally. The conclusion always is a judgment
Note
For example, the typing rule for the
The instruction is always valid with type
An instruction like
Here, the premise enforces that the immediate local index
Finally, a structured instruction requires a recursive rule, where the premise is itself a typing judgement:
A
[1] The semantics is derived from the following article: Andreas Haas, Andreas Rossberg, Derek Schuff, Ben Titzer, Dan Gohman, Luke Wagner, Alon Zakai, JF Bastien, Michael Holman. Bringing the Web up to Speed with WebAssembly. Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM 2017.
[2] For example: Benjamin Pierce. Types and Programming Languages. The MIT Press 2002.
3.2. Types
Most types are universally valid. However, restrictions apply to function types as well as the limits of table types and memory types, which must be checked during validation.
3.2.1. Limits
Limits must have meaningful bounds that are within a given range.
3.2.1.1.
- The value of
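A check for limits along these lines might look as follows. Since the rule text is incomplete here, the sketch assumes the conditions of the full specification: the minimum must not exceed a given bound, and a maximum, if present, must not exceed that bound and must not be smaller than the minimum. The function name, parameter names, and the bound used in the usage note are assumptions for this example.

```python
from typing import Optional


def limits_valid(minimum: int, maximum: Optional[int], bound: int) -> bool:
    """Check that limits are meaningful and lie within the range `bound`."""
    if minimum > bound:
        return False
    if maximum is not None:
        # An explicit maximum must stay in range and not undercut the minimum.
        if maximum > bound or maximum < minimum:
            return False
    return True
```

With an illustrative bound of 65536, `limits_valid(1, None, 65536)` holds, whereas `limits_valid(2, 1, 65536)` does not, because the maximum is smaller than the minimum.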
3.2.2. Function Types
Function types may not specify more than one result.
3.2.2.1.
- The arity
Note
The restriction to at most one result may be removed in future versions of WebAssembly.
3.3. Instructions
Instructions are classified by function types
Note
For example, the instruction
Typing extends to instruction sequences
For some instructions, the typing rules do not fully constrain the type, and therefore allow for multiple types. Such instructions are called polymorphic. Two degrees of polymorphism can be distinguished:
- value-polymorphic: the value type
In both cases, the unconstrained types or type sequences can be chosen arbitrarily, as long as they meet the constraints imposed for the surrounding parts of the program.
Note
For example, the
and
are valid, with
The
is valid by assuming type
is invalid, because there is no possible type to pick for the
3.3.1. Numeric Instructions
3.3.1.1.
- The instruction is valid with type
3.3.1.2.
- The instruction is valid with type
3.3.1.3.
- The instruction is valid with type
3.3.1.4.
- The instruction is valid with type
3.3.1.5.
- The instruction is valid with type
3.3.1.6.
- The instruction is valid with type
3.3.2. Parametric Instructions
3.3.2.1.
- The instruction is valid with type
3.3.2.2.
- The instruction is valid with type
Note
Both
3.3.3. Variable Instructions
3.3.3.1.
- The local
3.3.3.2.
- The local
3.3.3.3.
- The local
3.3.3.4.
- The global
3.3.3.5.
- The global
3.3.4. Memory Instructions
3.3.4.1.
- The memory
3.3.4.2.
- The memory
3.3.4.3.
- The memory
3.3.4.4.
- The memory
3.3.4.5.
- The memory
3.3.4.6.
- The memory
3.3.5. Control Instructions
3.3.5.1.
- The instruction is valid with type
3.3.5.2.
- The instruction is valid with type
Note
The
3.3.5.9.
- The return type
Note
The
3.3.5.10.
- The function
3.3.5.11.
- The table
3.3.6. Instruction Sequences
Typing of instruction sequences is defined recursively.
3.3.6.1. Empty Instruction Sequence:
- The empty instruction sequence is valid with type
3.3.6.2. Non-empty Instruction Sequence:
- The instruction sequence
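The way individual instruction types compose over a sequence can be sketched by tracking a stack of value types. This is a simplified illustration, not the algorithm from the appendix: it covers only a handful of non-polymorphic numeric instructions, omits immediates, and ignores control instructions. The table `INSTR_TYPES` and the function `check_sequence` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

ValTypes = Tuple[str, ...]

# Input and output types of a few numeric instructions.
INSTR_TYPES: Dict[str, Tuple[ValTypes, ValTypes]] = {
    "i32.const": ((), ("i32",)),
    "i64.const": ((), ("i64",)),
    "i32.add": (("i32", "i32"), ("i32",)),
    "i32.eqz": (("i32",), ("i32",)),
    "i32.wrap_i64": (("i64",), ("i32",)),
}


def check_sequence(instrs: List[str], params: ValTypes = ()) -> ValTypes:
    """Return the stack types left by `instrs`, starting from `params`."""
    stack = list(params)
    for name in instrs:
        inputs, outputs = INSTR_TYPES[name]
        start = len(stack) - len(inputs)
        # The topmost stack entries must match the instruction's inputs.
        if start < 0 or tuple(stack[start:]) != inputs:
            raise TypeError(f"{name} expects {inputs}, stack is {tuple(stack)}")
        del stack[start:]
        stack.extend(outputs)
    return tuple(stack)
```

For example, `check_sequence(["i32.const", "i32.const", "i32.add"])` yields `("i32",)`, while `check_sequence(["i64.const", "i32.eqz"])` raises `TypeError` because the operand on the stack has the wrong type.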
3.3.7. Expressions
Expressions
3.3.7.1.
- The instruction sequence
3.3.7.2. Constant Expressions
- In a constant expression
Note
Currently, constant expressions occurring as initializers of globals are further constrained in that contained
The definition of constant expression may be extended in future versions of WebAssembly.
3.4. Modules
Modules are valid when all the components they contain are valid. Furthermore, most definitions are themselves classified with a suitable type.
3.4.8. Exports
Exports
3.4.8.1.
- The export description
3.4.8.2.
- The function
3.4.8.3.
- The table
3.4.8.4.
- The memory
3.4.8.5.
- The global
3.4.9. Imports
Imports
3.4.9.1.
- The import description
3.4.9.2.
- The function
3.4.9.3.
- The table type
3.4.9.4.
- The memory type
3.4.9.5.
- The global type
3.4.10. Modules
Modules are classified by their mapping from the external types of their imports to those of their exports.
A module is entirely closed, that is, its components can only refer to definitions that appear in the module itself. Consequently, no initial context is required. Instead, the context
- Let