Glossary - Database Manual

Tier	Recommended environments
`M10` and `M20`	Development, low-traffic production
`M30` and greater	Production

In Queryable Encryption, the JSON schema that defines which fields are queryable and which query types are permitted on those fields.

In computing, endianness refers to the order in which bytes are arranged. This ordering can refer to transmission over a communication medium or more commonly how the bytes are ordered in computer memory, based on their significance and position. For details, see big-endian and little-endian.

An encryption procedure where data is encrypted using a Data Encryption Key and the data encryption key is encrypted by another key called the Customer Master Key. The encrypted keys are stored as BSON documents in a MongoDB collection called the KeyVault.

Formula that calculates similarity from the distance between two vectors in multi-dimensional space. Euclidean distance is sensitive to the magnitude of the vectors. Atlas Vector Search supports the Euclidean similarity function both for indexing vectors and for searching for nearest neighbors.
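
As a minimal sketch of the distance calculation described above, the following Python computes the straight-line distance between two vectors. The function name and sample vectors are illustrative assumptions, and the sketch does not show how Atlas Vector Search converts a distance into a similarity score.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Classic 3-4-5 right triangle.
d_classic = euclidean_distance([0.0, 0.0], [3.0, 4.0])

# Same direction, different magnitude: the distance is still non-zero,
# which is what "sensitive to magnitude" means in practice.
d_scaled = euclidean_distance([1.0, 2.0], [2.0, 4.0])
```

The second pair of vectors points in the same direction, yet their distance is greater than zero because only their lengths differ.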

A property of a distributed system that allows changes to the system to propagate gradually. In a database system, this means that readable members aren't required to have the latest updates.

When using In-Use Encryption, explicitly specifying the encryption or decryption operation, keyID, and query type (for Queryable Encryption) or algorithm (for Client-Side Field Level Encryption) when working with encrypted data. Compare to automatic encryption.

A component of a query that resolves to a value. Expressions are stateless, meaning they return a value without mutating any of the values used to build the expression.

In the MongoDB Query Language, you can build expressions from the following components:

Component	Example
Constants	`3`
Operators	`$add`
Field path expressions	`"$<path.to.field>"`

For example, { $add: [ 3, "$inventory.total" ] } is an expression consisting of the $add operator and two input expressions:

The constant 3
The field path expression "$inventory.total"

The expression returns the result of adding 3 to the value at path inventory.total of the input document.
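
To illustrate how such an expression resolves to a value, here is a toy Python evaluator that handles only constants, field paths, and $add. It is a sketch for explanation, not how the server evaluates expressions; the function names and the sample document are assumptions.

```python
def resolve_field_path(path, document):
    """Follow a dotted path such as 'inventory.total' through nested dicts."""
    value = document
    for part in path.split("."):
        value = value[part]
    return value

def evaluate(expression, document):
    """Evaluate a tiny subset of expressions: constants, field paths, $add."""
    if isinstance(expression, str) and expression.startswith("$"):
        return resolve_field_path(expression[1:], document)
    if isinstance(expression, dict) and "$add" in expression:
        return sum(evaluate(operand, document) for operand in expression["$add"])
    return expression  # anything else is treated as a constant

doc = {"inventory": {"total": 7}}  # hypothetical input document
result = evaluate({"$add": [3, "$inventory.total"]}, doc)
```

The evaluator only reads from the input document, which mirrors the stateless behavior described above: the document is unchanged after evaluation.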

The process that allows a secondary member of a replica set to become primary in the event of a failure. See Automatic Failover.

A name-value pair in a document. A document has zero or more fields. Fields are analogous to columns in relational databases. See Document Structure.

Path to a field in a document. To specify a field path, use a string that prefixes the field name with a dollar sign ($).

A system level network filter that restricts access based on IP addresses and other parameters. Firewalls are part of a secure network. See Firewalls.

Free-to-use cluster tier that provides a small-scale development environment to host your data. Free clusters never expire, and provide access to a subset of Atlas features and functionality. Free clusters might also be referred to by their instance size, M0.

Tip

Atlas M0 (Free Cluster), M2, and M5 Limits

A system call that flushes all dirty, in-memory pages to storage. As applications write data, MongoDB records the data in the storage layer.

To provide durable data, WiredTiger uses checkpoints. For more details, see Journaling and the WiredTiger Storage Engine.

A geohash value is a binary representation of the location on a coordinate grid. See Geohash Values.

A geospatial data interchange format based on JavaScript Object Notation (JSON). GeoJSON is used in geospatial queries. For supported GeoJSON objects, see Geospatial Data. For the GeoJSON format specification, see https://tools.ietf.org/html/rfc7946#section-3.1.

Relating to geographical location. See Geospatial Queries.

Clusters with defined geographic zones to support location-aware read and write operations for globally distributed application instances and clients. You can enable global sharding on clusters of tier M30 and greater.

Tip

Create a Global Cluster

Geographic zone representing a subset of your global cluster distribution. Each global cluster supports up to 9 distinct global write zones. Each zone consists of one highest priority region and one or more electable, read-only, or analytics regions.

The available geographic regions depend on the selected cloud service provider.

A convention for storing large files in a MongoDB database. All of the official MongoDB drivers support the GridFS convention, as does the mongofiles program. See GridFS for Self-Managed Deployments.

See project.

See project ID.

A type of shard key that uses a hash of the value in the shard key field to distribute documents among members of the sharded cluster. See Hashed Indexes.

A health manager runs health checks on a health manager facet at a specified intensity level. The health manager checks are run at specified time intervals. A health manager can be configured to move a failing mongos out of a cluster automatically.

A set of features that a health manager can be configured to run health checks for. For example, you can configure a health manager to monitor and manage DNS or LDAP cluster health issues automatically. See Health Manager Facets for details.

A replica set member that cannot become primary and is invisible to client applications. See Hidden Replica Set Members.

Algorithm for performing efficient nearest neighbor search in multi-dimensional space. Atlas Vector Search performs ANN search with Hierarchical Navigable Small Worlds.

High availability indicates a system designed for durability, redundancy, and automatic failover. Applications supported by the system can operate without downtime for a long time period. MongoDB replica sets support high availability when deployed according to the best practices.

For guidance on replica set deployment architecture, see Replica Set Deployment Architectures.

Region in a multi-region cluster which Atlas prioritizes for primary eligibility during elections.

Tip

Configure High Availability and Workload Isolation

Method of combining different search methods, such as a full-text and a semantic search, to take advantage of their respective strengths. The results are combined by using a technique such as Reciprocal Rank Fusion (RRF).
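
The following Python sketch shows one common form of Reciprocal Rank Fusion: each document scores the sum of 1 / (k + rank) over the result lists in which it appears. The constant k = 60, the function name, and the sample rankings are assumptions for illustration, not values taken from this glossary.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; k=60 is an assumed smoothing constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

full_text = ["a", "b", "c"]  # hypothetical full-text search ranking
semantic = ["b", "d", "a"]   # hypothetical semantic search ranking
fused = reciprocal_rank_fusion([full_text, semantic])
```

Documents that rank well in both lists ("b" and "a" here) rise above documents that appear in only one list.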

An idempotent operation produces the same result with the same input when run multiple times.
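
As a small illustration, the Python sketch below contrasts an operation that can be repeated safely with one that cannot. The two functions are hypothetical stand-ins that operate on plain dictionaries, not MongoDB update operators.

```python
def set_status(doc):
    """Assigning a fixed value: applying it twice equals applying it once."""
    return {**doc, "status": "shipped"}

def increment_qty(doc):
    """Incrementing: every extra application changes the result again."""
    return {**doc, "qty": doc["qty"] + 1}

order = {"qty": 1, "status": "new"}  # hypothetical document

set_is_idempotent = set_status(set_status(order)) == set_status(order)
inc_is_idempotent = increment_qty(increment_qty(order)) == increment_qty(order)
```

Here `set_is_idempotent` is true and `inc_is_idempotent` is false, which is why retrying an increment after an ambiguous failure needs more care than retrying an assignment.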

Estimated performance improvement of an index that Performance Advisor suggests.

Tip

Review Index Ranking

A sort that must be performed in memory before the output is returned. In-memory sorts may impact performance for large data sets. Use an indexed sort to avoid an in-memory sort.

See Sort and Index Use for more information on indexed sort operations.

Encryption that secures data when transmitted, stored, and processed, and enables supported queries on that encrypted data. MongoDB provides two approaches to In-Use Encryption: Queryable Encryption and Client-Side Field Level Encryption.

A data structure that optimizes queries. See Indexes.

The range of index values that MongoDB searches when using an index to run a query. To learn more, see Multikey Index Bounds.

A sort where an index provides the sorted result. Sort operations that use an index often have better performance than an in-memory sort. See Use Indexes to Sort Query Results for more information.

A shell script used by a Linux platform's init system to start, restart, or stop a daemon process. If you installed MongoDB using a package manager, an init script is provided for your system as part of the installation. See the respective Installation Guide for your operating system.

The init system is the first process started on a Linux platform after the kernel starts, and manages all other processes on the system. The init system uses an init script to start, restart, or stop a daemon process, such as mongod or mongos. Recent Linux versions typically use the systemd init system and the systemctl command. Older Linux versions typically use the System V init system and the service command. See the Installation Guide for your operating system.

The replica set operation that replicates data from an existing replica set member to a new replica set member. See Initial Sync.

A lock on a resource that indicates the lock holder will read from (intent shared) or write to (intent exclusive) the resource using concurrency control at a finer granularity than that of the resource with the intent lock. Intent locks allow concurrent readers and writers of a resource. See What type of locking does MongoDB use?.

AWS VPC endpoint with a private IP address that sends traffic to the Atlas private endpoint service over AWS PrivateLink.

Tip

Learn About Private Endpoints in Atlas

A point in an operation when it can safely end. MongoDB only ends an operation at designated interrupt points. See Terminate Running Operations.

List of IP addresses and CIDR blocks with access to clusters within an Atlas project. For client connections over the public Internet, Atlas allows connections to a cluster only from entries in the corresponding project's IP access list. The access list may have up to 200 entries.

Atlas also allows client connections over nonpublic networking, such as network peering connections or private endpoints. These types of connections work irrespective of the IP access list. To learn more, see Set Up a Network Peering Connection and Learn About Private Endpoints in Atlas.

A revision to the IP (Internet Protocol) standard with a large address space to support Internet hosts.

The international date format used by mongosh to display dates. The format is YYYY-MM-DD HH:MM.SS.millis.

A scripting language. mongosh, the legacy mongo shell, and certain server functions use a JavaScript interpreter. See Server-side JavaScript for more information.

A sequential, binary transaction log used to bring the database into a valid state in the event of a hard shutdown. Journaling writes data first to the journal and then to the core data files. MongoDB enables journaling by default for 64-bit builds of MongoDB version 2.0 and newer. Journal files are pre-allocated and exist as files in the data directory. See Journaling.

JavaScript Object Notation. A plain text format for expressing structured data with support in many programming languages. For more information, see http://www.json.org. Certain MongoDB tools render an approximation of MongoDB BSON documents in JSON format. See MongoDB Extended JSON (v2).

A JSON document is a collection of fields and values in a structured format. For sample JSON documents, see http://json.org/example.html.

JSON with padding. Refers to a method of injecting JSON into applications. Presents potential security concerns.

A chunk that grows beyond the specified chunk size and cannot split into smaller chunks. For more details, see Indivisible/Jumbo Chunks.

Given a set of points P with a defined similarity function S, for a query point q, finds the set of k points p in P with the best values of S(p, q). Atlas Vector Search ENN search returns the exact top k points and ANN search returns k points that are similar to q, but not necessarily the k most similar to q.

The random string of bits used by an encryption algorithm to encrypt and decrypt data.

A MongoDB collection that stores the encrypted Data Encryption Keys as BSON documents.

Cross-platform protocol used to authenticate users and authorize them to access data on a cluster. You can use Atlas to manage user authentication and authorization from all MongoDB clients using your own LDAP server over TLS. A single LDAPS configuration applies to all clusters in an Atlas project.

An authorization policy that grants a user only the access that is essential to that user's work.

The format used for geospatial data before MongoDB version 2.4. This format stores geospatial data as points on a planar coordinate system (for example, [ x, y ]). See Geospatial Queries.

A LineString is an array of two or more positions. A closed LineString with four or more positions is called a LinearRing, as described in the GeoJSON LineString specification: https://tools.ietf.org/html/rfc7946#section-3.1.4. To use a LineString in MongoDB, see GeoJSON Objects.

String that contains the information necessary to connect from Cloud Manager or Ops Manager to Atlas during a live migration from a Cloud Manager or Ops Manager deployment to a cluster in Atlas.

When you are ready to live migrate data from a Cloud Manager or Ops Manager deployment, you generate a link-token in Atlas and then enter it in your Cloud Manager or Ops Manager organization's settings. You use the same link-token to migrate each deployment in your Cloud Manager or Ops Manager organization sequentially, one at a time. You can generate multiple link-tokens in Atlas. Use one unique link-token for each Cloud Manager or Ops Manager organization.

A byte order in which the least significant byte (little end) of a multibyte data value is stored at the lowest memory address.
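
A short Python sketch makes the byte order visible by serializing the same 32-bit value both ways. The sample value is arbitrary.

```python
value = 0x0A0B0C0D  # arbitrary 32-bit sample value

# Little-endian: least significant byte (0x0D) comes first.
little = value.to_bytes(4, byteorder="little")

# Big-endian: most significant byte (0x0A) comes first.
big = value.to_bytes(4, byteorder="big")

# Reading the bytes back with the matching byte order recovers the value.
round_trip = int.from_bytes(little, byteorder="little")
```

In hexadecimal, `little` is `0d 0c 0b 0a` and `big` is `0a 0b 0c 0d`.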

Process to seamlessly move an existing source replica set or sharded cluster to Atlas. During the live migration process, Atlas keeps the target cluster in sync with the remote source until you cut your applications over to the Atlas cluster. Atlas offers two modes of live migration:

Push live migration, known in the Atlas user interface as Live Migration from Ops Manager or Cloud Manager, where Atlas pushes a deployment from Cloud Manager or Ops Manager to Atlas.
Pull live migration, known in the Atlas user interface as General Live Migration, where Atlas pulls a deployment from a cloud or on-premise deployment to Atlas.

Tip

Legacy Live Migration (Pull) of Replica Sets to Atlas

MongoDB uses locks to ensure that concurrency does not affect correctness. MongoDB uses read locks, write locks and intent locks. For more information, see What type of locking does MongoDB use?.

Files that contain server events, such as incoming connections, commands run, and issues encountered. For more details, see Log Messages.

Logical volume manager. LVM is a program that abstracts disk images from physical devices and provides a number of raw disk manipulation and snapshot capabilities useful for system management. For information on LVM and MongoDB, see Back Up and Restore Using LVM on Linux.

Day and time of the week when Atlas should start weekly maintenance on your cluster. You can set your maintenance window in your Project Settings.

Important

Maintenance Window Considerations

Urgent Maintenance Activities: Urgent maintenance activities such as security patches cannot wait for your chosen window. Atlas will start those maintenance activities when needed.

Ongoing Maintenance Operations: Once maintenance is scheduled for your cluster, you cannot change your maintenance window until the current maintenance efforts have completed.

Maintenance Requires Replica Set Elections: Atlas performs maintenance the same way as the maintenance procedure described in the MongoDB Manual. This procedure requires at least one replica set election during the maintenance window per replica set.

Maintenance Starts As Close to the Hour As Possible: Maintenance always begins as close to the scheduled hour as possible, but in-progress cluster updates or unexpected system issues could delay the start time.

An aggregation process that has a "map" phase that selects the data and a "reduce" phase that transforms the data. In MongoDB, you can run arbitrary aggregations over data using map-reduce. For the map-reduce implementation, see Map-Reduce. For all approaches to aggregation, see Aggregation Operations.

A structure in programming languages that associates keys with values. Keys may contain embedded pairs of keys and values (for example, dictionaries, hashes, maps, and associative arrays). The properties of these structures depend on the language specification and implementation. Typically, the order of keys in mapping types is arbitrary and not guaranteed.

A hashing algorithm that calculates a checksum for the supplied data. The algorithm returns a fixed-length value that identifies the data. MongoDB uses md5 to identify chunks of data for GridFS. See filemd5.

Average of a set of numbers.

In a dataset, the median is the percentile value where 50% of the data falls at or below that value.

An individual mongod process. A replica set has multiple members. A member is also known as a node.

In Queryable Encryption, the internal collections MongoDB uses to enable querying on encrypted fields. See Metadata Collections.

Multipurpose Internet Mail Extensions. A standard set of type and encoding definitions used to declare the encoding and type of data in multiple data storage, transmission, and email contexts. The mongofiles tool provides an option to specify a MIME type to describe a file inserted into GridFS storage.

Number that occurs most frequently in a set of numbers.

The legacy MongoDB shell. The mongo process starts the legacy shell as a daemon connected to either a mongod or mongos instance. The shell has a JavaScript interface.

Starting in MongoDB v5.0, mongo is deprecated and mongosh replaces mongo as the client shell. See mongosh.

The MongoDB database server. The mongod process starts the MongoDB server as a daemon. The MongoDB server manages data requests and background operations. See mongod.

Visualization tool for your Atlas data. You can launch MongoDB Charts from your Atlas cluster to begin visualizing your data.

Tip

MongoDB Charts

The MongoDB sharded cluster query router. The mongos process starts the MongoDB router as a daemon. The MongoDB router acts as an interface between an application and a MongoDB sharded cluster and handles all routing and load balancing across the cluster. See mongos Instances.

MongoDB Shell. mongosh provides a shell interface to either a mongod or a mongos instance.

Starting in MongoDB v5.0, mongosh replaces mongo as the preferred shell.

Atlas cluster spanning multiple geographic regions. Multi-region clusters can increase availability and improve performance by routing application queries to the most appropriate geographic regions.

Multi-region clusters must contain electable nodes.

Multi-region clusters may contain read-only nodes and analytics nodes.

A namespace is a combination of the database name and the name of the collection or index: <database-name>.<collection-or-index-name>. All documents belong to a namespace. See Namespaces.
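
The sketch below splits a namespace string into its two parts in Python. It assumes that the database name contains no dot, so the first dot is the separator and any later dots belong to the collection or index name. The function name and sample values are hypothetical.

```python
def split_namespace(namespace):
    """Split '<database-name>.<collection-or-index-name>' at the first dot."""
    database, separator, collection = namespace.partition(".")
    if not separator or not database or not collection:
        raise ValueError(f"not a valid namespace: {namespace!r}")
    return database, collection

simple = split_namespace("sales.orders")
dotted = split_namespace("sales.orders.archive")  # dots after the first stay in the name
```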

Atlas tool that monitors collection-level query latency. You can view query latency metrics and statistics for certain hosts and operation types. Manage pinned namespaces and choose up to five namespaces to show in the corresponding query latency charts.

Tip

Monitor Collection-Level Query Latency

The order in which recordIds are created and stored in the WiredTiger index. The default sort order for a collection scan run on a single instance is natural order.

In replica sets, natural order is not guaranteed to be consistent and can differ between members.

In sharded collections, natural order is not defined. However, using $natural still forces each shard to perform a collection scan.

For details, see $natural and Return in Natural Order.

A network failure that separates a distributed system into partitions such that nodes in one partition cannot communicate with the nodes in the other partition.

Sometimes, partitions are partial or asymmetric. An example of a partial partition is a division of the nodes of a network into three sets, where members of the first set cannot communicate with members of the second set, and the reverse, but all nodes can communicate with members of the third set.

In an asymmetric partition, communication may be possible only when it originates with certain nodes. For example, nodes on one side of the partition can communicate with the other side only if they originate the communications channel.

Process by which two Internet networks connect and exchange traffic. You can directly peer your VPC with the Atlas VPC created for your MongoDB clusters. Using network peering, your application servers can directly connect to Atlas while remaining isolated from public networks.

Tip

Set Up a Network Peering Connection

An individual mongod process. A replica set has multiple nodes. A node is also known as a member.

No Operation (noop) is an I/O operation scheduler that allocates I/O bandwidth for incoming processes based on a first in, first out queue.

NVMe (Non-Volatile Memory Express) is a protocol for accessing high-speed storage media.

Available for M40+ clusters hosted on AWS

For applications hosted on AWS that require low-latency and high-throughput I/O, you can use the NVMe cluster class. The NVMe cluster class leverages a unique data protocol to greatly improve data access speeds.

NVMe clusters use a hidden secondary node consisting of a provisioned volume with high throughput and IOPS to facilitate backup.

Tip

Customize Cluster Storage

See ObjectId.

A 12-byte BSON type that is unique within a collection. The ObjectId is generated using the timestamp, computer ID, process ID, and a local process incremental counter. MongoDB uses ObjectId values as the default values for _id fields.
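
As an illustration of the timestamp component, the following Python sketch reads the creation time out of an ObjectId's hexadecimal string. It assumes the BSON layout in which the first 4 of the 12 bytes hold a big-endian count of seconds since the Unix epoch. The function name and sample ID are hypothetical, and the sketch does not interpret the remaining 8 bytes.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id):
    """Return the creation time encoded in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex digits)")
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp("507f1f77bcf86cd799439011")  # sample ID
```

Because the timestamp leads the value, ObjectIds created later generally sort after ObjectIds created earlier.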

See oplog.

Information about the execution of processes rather than their content, such as the number and time of insert, update, and delete operations.

A rejected query shape. For more details, see Block Slow Queries with Operation Rejection Filters.

See optime.

Any electable node or a read-only node in your Atlas cluster.

A keyword beginning with a $ used to express an update, complex query, or data transformation. For example, $gt is the query language's "greater than" operator. For available operators, see Operators.

A capped collection that stores an ordered history of logical writes to a MongoDB database. The oplog is the basic mechanism enabling replication in MongoDB. See Replica Set Oplog.

A temporary collection created during resharding operations that stores oplog entries from a donor shard.

Oplog buffer collections ensure that recipient shards can access oplog entries when they get deleted from the donor shard. Oplog buffer collections are removed when resharding is complete.

A temporary gap in the oplog because the oplog writes aren't in sequence. Replica set primaries apply oplog entries in parallel as a batch operation. As a result, temporary gaps in the oplog can occur from entries that aren't yet written from a batch.

oplog entries are time-stamped. The oplog window is the time difference between the newest and the oldest timestamps in the oplog. If a secondary node loses connection with the primary, it can only use replication to sync up again if the connection is restored within the oplog window.
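
The arithmetic can be sketched in Python as follows. The timestamps and helper names are hypothetical, and the catch-up check is a simplification of the rule stated above rather than the server's actual logic.

```python
from datetime import datetime, timedelta, timezone

def oplog_window(entry_timestamps):
    """Time difference between the newest and oldest oplog entries."""
    return max(entry_timestamps) - min(entry_timestamps)

def can_catch_up(disconnected_for, window):
    """Simplified rule: the outage must fit inside the oplog window."""
    return disconnected_for <= window

timestamps = [  # hypothetical oplog entry times
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 18, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
]
window = oplog_window(timestamps)
short_outage_ok = can_catch_up(timedelta(hours=12), window)
long_outage_ok = can_catch_up(timedelta(hours=48), window)
```

With a 30-hour window, a 12-hour outage is recoverable through replication, while a 48-hour outage is not.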

A reference to a position in the replication oplog. The optime value is a document that contains:

ts, the Timestamp of the operation.
t, the term in which the operation was originally generated on the primary.

A query plan that returns results in the order consistent with the sort() order. See Query Plans.

Logical grouping of Atlas projects. You can leverage an organization to manage billing, users, and security settings for the projects it contains.

Billing happens at the organization level while preserving visibility into usage in each project.
You can view all projects within an organization.
You can use teams to bulk assign organization users to projects within the organization.

Tip

Organizations

Unique 24-digit hexadecimal string used to identify your Atlas organization. The Return All Organizations endpoint returns the ID of all organizations that the authenticated user executing the API call can access.

A cursor that is not correctly closed or iterated over in your application code. Orphaned cursors can cause performance issues in your MongoDB deployment.

In a sharded cluster, orphaned documents are those documents on a shard that also exist in chunks on other shards. This is caused by a failed migration or an incomplete migration cleanup because of an atypical shutdown.

Orphaned documents are cleaned up automatically after a chunk migration completes. You no longer need to run cleanupOrphaned to delete orphaned documents.

A member of a replica set that cannot become primary because its members[n].priority is 0. See Priority 0 Replica Set Members.

A type of cache that locally stores memory for a specific CPU core. Per-CPU caches are used by the new version of TCMalloc, which is introduced in MongoDB 8.0.

A type of cache that locally stores memory for each application thread. Per-thread caches are used by the legacy version of TCMalloc, which is used in MongoDB 7.0 and earlier.

In a dataset, a percentile is a value where that percentage of the data is at or below the specified value. For details, see Calculation Considerations.
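
One simple way to compute such a value is the nearest-rank method, sketched below in Python. This is an illustrative assumption: it is not necessarily the calculation MongoDB performs, and the sample latencies are invented.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need a non-empty dataset and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 40, 18, 22, 13, 95, 17, 14]  # hypothetical samples
p50 = percentile(latencies_ms, 50)  # the median under this method
p90 = percentile(latencies_ms, 90)
```

For this sample, half of the values are at or below `p50` and nine of the ten values are at or below `p90`.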

Atlas tool that monitors slow queries executed on your cluster and suggests indexes to improve query performance. Each index that the Performance Advisor suggests includes an impact score indicating the potential performance improvement that index would bring.

Tip

Monitor and Improve Slow Queries

A process identifier. UNIX-like systems assign a unique-integer PID to each running process. You can use a PID to inspect a running process and send signals to it. See /proc File System.

A communication channel in UNIX-like systems allowing independent processes to send and receive data. In the UNIX shell, piped operations allow users to direct the output of one command into the input of another.

A series of operations in an aggregation. See Aggregation Pipeline.

A combination of query predicate, sort, projection, and collation. The plan cache query shape allows MongoDB to identify equivalent queries and analyze their performance.

For the query predicate, only the predicate structure and field names are used. The values in the query predicate aren't used. For example, a query predicate { type: 'food' } is equivalent to { type: 'drink' }.
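
The idea that only structure and field names matter can be sketched with a toy Python function that replaces every value in a predicate with a placeholder. It covers the predicate only and is not how MongoDB derives its shape hash; the function name and placeholder are assumptions.

```python
def predicate_shape(predicate):
    """Keep field names and operators; replace every leaf value with '?'."""
    if isinstance(predicate, dict):
        return {key: predicate_shape(value) for key, value in sorted(predicate.items())}
    if isinstance(predicate, list):
        return [predicate_shape(item) for item in predicate]
    return "?"

food_shape = predicate_shape({"type": "food"})
drink_shape = predicate_shape({"type": "drink"})
other_shape = predicate_shape({"qty": {"$gt": 5}})
```

The two `type` predicates collapse to the same shape, while the `qty` predicate has a different structure and therefore a different shape.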

To identify slow queries with the same plan cache query shape, each plan cache query shape has a hexadecimal planCacheShapeHash value. For more information, see planCacheShapeHash and planCacheKey.

Starting in MongoDB 8.0, the existing queryHash field is duplicated in a new field named planCacheShapeHash. If you're using an earlier MongoDB version, you'll only see the queryHash field. Future MongoDB versions will remove the deprecated queryHash field, and you'll need to use the planCacheShapeHash field instead.

A single coordinate pair as described in the GeoJSON Point specification: https://tools.ietf.org/html/rfc7946#section-3.1.2. To use a Point in MongoDB, see GeoJSON Objects.

An array of LinearRing coordinate arrays, as described in the GeoJSON Polygon specification: https://tools.ietf.org/html/rfc7946#section-3.1.6. For Polygons with multiple rings, the first must be the exterior ring and any others must be interior rings or holes.

MongoDB does not permit the exterior ring to self-intersect. Interior rings must be fully contained within the outer loop and cannot intersect or overlap with each other. See GeoJSON Objects.

A document after it was inserted, replaced, or updated. See Change Streams with Document Pre- and Post-Images.

A setting for each collection that allocates space for each document to maximize storage reuse and reduce fragmentation. powerOf2Sizes is the default for TTL Collections. To change collection settings, see collMod.

A document before it was replaced, updated, or deleted. See Change Streams with Document Pre- and Post-Images.

An operation performed before inserting data that divides the range of possible shard key values into chunks to facilitate easy insertion and high write throughput. In some cases pre-splitting expedites the initial distribution of documents in a sharded cluster by manually dividing the collection rather than waiting for the MongoDB balancer to do so. See Create Ranges in a Sharded Cluster.

Reduces memory and disk consumption by storing any identical index key prefixes only once, per page of memory. See Compression for more about WiredTiger's compression behavior.

In a replica set, the primary is the member that receives all write operations. See Primary.

A record's unique immutable identifier. In RDBMS software, the primary key is typically an integer stored in each row's id field. In MongoDB, the _id field stores a document's primary key, which is typically a BSON ObjectId.

Each database in a sharded cluster has a primary shard. It is the default shard for all unsharded collections in the database. See Primary Shard.

A configurable value that helps determine which members in a replica set are most likely to become primary. See members[n].priority.

A combination of specified resource and actions permitted on the resource. See privilege.

Logical grouping of clusters. You can have multiple clusters within a single project and multiple projects within a single organization.

Note

Project is synonymous with group.

Unique 24-digit hexadecimal string used to identify your Atlas project. The Get All Projects API endpoint returns the ID of all projects that the authenticated user executing the API call can access.

Note

Project ID is synonymous with group ID.

A document supplied to a query that specifies the fields MongoDB returns in the result set. For more information about projections, see Project Fields to Return from Query and Projection Operators.

Method of compressing the value of individual dimensions in a vector into a smaller range to reduce resource consumption and improve speed. Atlas Vector Search supports indexing and querying quantized vectors.

A read request. MongoDB uses a JSON form of query language that includes query operators with names that begin with a $ character. In mongosh, you can run queries using the db.collection.find() and db.collection.findOne() methods. See Query Documents.

A combination of the query optimizer and query execution engine that processes an operation.

A keyword beginning with $ in a query. For example, $gt is the "greater than" operator. For a list of query operators, see query operators.

A process that generates query plans. For each query, the optimizer generates a plan that matches the query to the index that returns the results as efficiently as possible. The optimizer reuses the query plan each time the query runs. If a collection changes significantly, the optimizer creates a new query plan. See Query Plans.

Most efficient execution plan chosen by the query planner. For more details, see Query Plans.

An expression that returns a boolean indicating whether a document matches the specified query. For example, { name: { $eq: "Alice" } } matches documents that have a field "name" whose value is the string "Alice".

Query predicates can contain child expressions and operators for more complex matching. To see available query operators, see Query and Projection Operators.

Atlas tool that diagnoses and monitors performance issues in your cluster. The Query Profiler can expose long-running queries and their performance statistics. You can filter the data returned by the Query Profiler to home in on specific namespaces and operation types.

A query shape is a set of specifications that group similar queries. For details, see Query Shapes.

A contiguous range of shard key values within a chunk. Data ranges include the lower boundary and exclude the upper boundary. MongoDB migrates data when a shard contains too much data of a collection relative to other shards. See Data Partitioning with Chunks and Sharded Cluster Balancer.
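
The boundary rule (lower bound included, upper bound excluded) can be sketched in Python as follows. The integer ranges and function names are hypothetical; real shard key values and range metadata are more involved.

```python
def in_range(key, lower, upper):
    """A range includes its lower boundary and excludes its upper boundary."""
    return lower <= key < upper

def find_range(key, ranges):
    """Return the first (lower, upper) pair that owns the key, or None."""
    for lower, upper in ranges:
        if in_range(key, lower, upper):
            return (lower, upper)
    return None

ranges = [(0, 100), (100, 200)]  # hypothetical adjacent ranges
owner_of_99 = find_range(99, ranges)
owner_of_100 = find_range(100, ranges)  # boundary value belongs to the upper range
```

Because the upper boundary is exclusive, adjacent ranges never overlap: the value 100 belongs only to the second range.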

Relational Database Management System. A database management system based on the relational model, typically using SQL as the query language.

Specifies a level of isolation for read operations. For example, you can use read concern to only read data that has propagated to a majority of nodes in a replica set. See Read Concern.

A shared lock on a resource such as a collection or database that, while held, allows concurrent readers but no writers. See What type of locking does MongoDB use?.

A setting that determines how clients direct read operations. Read preference affects all replica sets, including shard replica sets. By default, MongoDB directs reads to primaries. However, you may also direct reads to secondaries for eventually consistent reads. See Read Preference.

Replica set in a dedicated geographic region that supplements your electable node regions. You can use read-only nodes to localize data where it is most frequently read to improve performance.

Atlas monitoring service that displays current network traffic, database operations on your clusters, and hardware statistics about your host machines. Use the RTPP to visually evaluate query execution times, monitor network activity, and discover potential replication lag on secondary members of replica sets.

Tip

Monitor Real-Time Performance

Measures the fraction of true nearest neighbors that were returned by an ANN search. This measure reflects how closely the algorithm approximates the results of ENN search. The notation Recall@k refers to the measurement of how many of the true nearest neighbors were present in the top k results returned by Atlas Vector Search.
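
As a rough sketch of the Recall@k calculation, the following Python function compares the top k results of an approximate search with the top k true neighbors. The `recall_at_k` name and the sample document IDs are assumptions for illustration; in practice the true neighbors would come from an ENN search over the same data.

```python
def recall_at_k(ann_results, true_neighbors, k):
    """Fraction of the k true nearest neighbors present in the top k ANN results."""
    top_k = set(ann_results[:k])
    truth = set(true_neighbors[:k])
    if not truth:
        raise ValueError("true_neighbors must contain at least one item")
    return len(top_k & truth) / len(truth)


ann_results = ["a", "b", "d", "e"]      # hypothetical IDs returned by ANN search
true_neighbors = ["a", "b", "c", "d"]   # hypothetical IDs returned by ENN search
recall = recall_at_k(ann_results, true_neighbors, k=4)  # 3 of 4 found: 0.75
```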

A replica set member status indicating that a member is not ready to begin activities of a secondary or primary. Recovering members are unavailable for reads.

The CPU utilization relative to the amount of baseline CPU assigned to a cloud instance. You can calculate relative system CPU utilization by dividing the absolute system CPU utilization by the amount of baseline CPU assigned to a cloud instance.

MongoDB caps relative system CPU utilization at 100%. When a cloud provider throttles CPU utilization for a cloud instance, or bursts CPU utilization for an instance above the baseline amount of CPU available to that instance, the relative system CPU value is 100%.
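
The division and the 100% cap can be sketched as follows. This is an illustrative calculation only; the function name is hypothetical, and it assumes both inputs are expressed in the same unit (here, percent of a full CPU).

```python
def relative_system_cpu(absolute_pct, baseline_pct):
    """Relative utilization = absolute / baseline, reported as a percentage capped at 100."""
    if baseline_pct <= 0:
        raise ValueError("baseline_pct must be positive")
    return min(100.0, absolute_pct / baseline_pct * 100.0)


below_baseline = relative_system_cpu(10, 20)  # half of the baseline: 50.0
bursting = relative_system_cpu(30, 20)        # above the baseline: capped at 100.0
```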

Group of MongoDB servers that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

A feature allowing multiple database servers to share the same data. Replication ensures data redundancy and enables load balancing. See Replication.

The time period between the last operation in the primary's oplog and the last operation applied to a particular secondary. You typically want replication lag as short as possible. See Replication Lag.
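
In arithmetic terms, the lag is the difference between two timestamps. A minimal sketch, assuming you already have the two timestamps as timezone-aware datetimes (the function name and sample values are hypothetical):

```python
from datetime import datetime, timezone


def replication_lag_seconds(primary_last_op, secondary_last_applied):
    """Time of the primary's last oplog entry minus the last entry the secondary applied."""
    return (primary_last_op - secondary_last_applied).total_seconds()


primary_last_op = datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)
secondary_last_applied = datetime(2024, 1, 1, 12, 0, 7, tzinfo=timezone.utc)
lag = replication_lag_seconds(primary_last_op, secondary_last_applied)  # 3.0
```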

The subset of an application's memory currently stored in physical RAM. Resident memory is a subset of virtual memory, which includes memory mapped to physical RAM and to storage.

A database, collection, set of collections, or cluster. A privilege permits actions on a specified resource. See resource.

A set of privileges that permit actions on specified resources. Roles assigned to a user determine the user's access to resources and operations. See Security.

A process that reverts write operations to ensure the consistency of all replica set members. See Rollbacks During Replica Set Failover.

Process that restarts all nodes in the cluster in sequence. To maintain cluster availability, Atlas restarts one node at a time starting with a secondary node. Atlas always maintains a primary node until the rolling restart completes.

Scalar quantization involves selecting the minimum and maximum values across all indexed vectors within a segment for each dimension, and producing equally sized bins between them. The mappings for each of these dimensions to the bins yields the new quantized values. Atlas Vector Search supports automatic scalar quantization for your float32 vectors, and ingestion and indexing of your scalar quantized vectors from embedding providers.
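
The binning step can be sketched in plain Python. This is a simplified illustration of the general technique under stated assumptions, not Atlas Vector Search's implementation: the `scalar_quantize` function is hypothetical, the bin count is a parameter chosen for the example, and the whole input list stands in for one segment.

```python
def scalar_quantize(vectors, num_bins=256):
    """Map each dimension of each vector to an integer bin index.

    For every dimension, the minimum and maximum across all vectors
    define the range, which is split into num_bins equally sized bins.
    """
    dims = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dims)]
    maxs = [max(v[d] for v in vectors) for d in range(dims)]
    quantized = []
    for v in vectors:
        row = []
        for d in range(dims):
            span = maxs[d] - mins[d]
            if span == 0:
                row.append(0)  # all values identical in this dimension
            else:
                bin_index = int((v[d] - mins[d]) / span * num_bins)
                row.append(min(bin_index, num_bins - 1))  # max value goes in the last bin
        quantized.append(row)
    return quantized


vectors = [[0.0, 10.0], [0.5, 20.0], [1.0, 30.0]]
codes = scalar_quantize(vectors, num_bins=4)  # [[0, 0], [2, 2], [3, 3]]
```

Each quantized value needs far fewer bits than a float32, at the cost of losing precision within a bin.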

A replica set member that replicates the contents of the primary's data set. Secondary members may run read requests, but only the primary can run write operations. See Secondaries.

A database index that improves query performance by minimizing the amount of work that the query engine must perform to run a query. See Indexes.

See secondary. Also known as a secondary node.

A seed list is used by drivers and clients (like mongosh) for initial discovery of the replica set configuration. Seed lists can be provided as a list of host:port pairs (see Standard Connection String Format) or through DNS entries. For more information, see SRV Connection Format.

A MongoDB instance that is set up and maintained by an individual or organization, rather than by an external management or third-party service (such as MongoDB Atlas).

Search for values that have a similar meaning to the query. Semantic search captures the natural relationship between words or phrases even when there is no lexical overlap. Semantic search and vector search are often used interchangeably. Atlas Vector Search supports semantic search on vector data stored in Atlas clusters.

The arbitrary name given to a replica set. All members of a replica set must have the same name specified with the replSetName setting or the --replSet option.

A single mongod instance or replica set that stores part of a sharded cluster's total data set. Typically, in a production deployment, ensure all shards are part of replica sets. See Shards.

The field MongoDB uses to distribute documents among members of a sharded cluster. See Shard Keys.

The set of nodes comprising a sharded MongoDB deployment. A sharded cluster consists of config servers, shards, and one or more mongos routing processes. See Sharded Cluster Components.

A database architecture that partitions data by key ranges and distributes the data among two or more database instances. Sharding enables horizontal scaling. See Sharding.

Cluster category containing M0 (free tier) clusters. Shared clusters are generally used for development and small production workloads.

Tip

Atlas M0 (Free Cluster), M2, and M5 Limits

A method in mongosh that has a concise syntax for a database command. Shell helpers improve the interactive experience. See mongosh Methods.

Measures the similarity between two vectors. Atlas Vector Search supports euclidean, cosine, and dotProduct similarity functions.
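
For reference, the three underlying measures can be computed as in the sketch below. These are the textbook formulas written with hypothetical helper names; how Atlas Vector Search converts them into search scores is not covered here.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; sensitive to both angle and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, different magnitude
cos_ab = cosine_similarity(a, b)   # direction matches, so close to 1.0
dist_ab = euclidean_distance(a, b) # magnitudes differ, so distance is 5.0
```

The sample vectors point in the same direction, so cosine similarity treats them as alike, while euclidean distance still separates them because their magnitudes differ.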

A replication topology where only a single database instance accepts writes. Single-master replication ensures consistency and is the replication topology used by MongoDB. See Replica Set Primary.

A compression/decompression library to balance efficient computation requirements with reasonable compression rates. Snappy is the default compression library for MongoDB's use of WiredTiger. See WiredTiger compression documentation for more information.

A snapshot is a copy of the data in a mongod instance at a specific point in time. You can retrieve snapshot metadata for the whole cluster or replica set, or for a single config server in a cluster.

The CPU utilization metric that reflects a portion of CPU that a cloud instance currently uses to process software interrupt requests. On some cloud providers, this metric is useful for tracking CPU utilization on burstable instances.

The value compared against when sorting fields. To learn how MongoDB determines the sort key for non-numeric fields, see Comparison/Sort Order.

The division between chunks in a sharded cluster. See Data Partitioning with Chunks.

Structured Query Language (SQL) is used for interaction with relational databases.

Solid State Disk. High-performance storage that uses solid state electronics for persistence instead of rotating platters and movable read/write heads used by mechanical hard drives.

A stale read refers to when a transaction reads old (stale) data that has been modified by another transaction but not yet committed to the database.

An instance of mongod that runs as a single server and not as part of a replica set. To convert a standalone instance to a replica set, see Convert a Standalone Self-Managed mongod to a Replica Set.

Note

A standalone instance is not a replica set with only one member.

A temporary collection created on the recipient shard for each donor shard during resharding operations.

Stash collections temporarily hold documents that cannot be immediately inserted due to operation conflicts. For example, if a document's shard key has been updated, it now belongs to a different shard, and the order of operations applied to this document can be ambiguous. The recipient stores these documents in a stash collection until it can apply operations in the correct order.

The primary member of the replica set removes itself as primary and becomes a secondary member.

If a replica set loses contact with the primary, the secondaries elect a new primary. When the old primary learns of the election, it steps down and rejoins the replica set as a secondary.
If the user runs the replSetStepDown command, the primary steps down, forcing the replica set to elect a new primary.

The part of a database that is responsible for managing how data is stored and accessed, both in memory and on disk. Different storage engines perform better for specific workloads. See Storage Engines for Self-Managed Deployments for specific details on the built-in storage engines in MongoDB.

See natural order.

A property of a distributed system requiring that all members contain the latest changes to the system. In a database system, this means that any system that can provide data must contain the latest writes.

Subject Alternative Name (SAN) is an extension of the X.509 certificate which allows an array of values such as IP addresses and domain names that specify the resources a single security certificate may secure.

The replica set operation where members replicate data from the primary. Sync first occurs when MongoDB creates or restores a member, which is called initial sync. Sync then occurs continually to keep the member updated with changes to the replica set's data. See Replica Set Data Synchronization.

On UNIX-like systems, a logging process that provides a uniform standard for servers and processes to submit logging information. MongoDB provides an option to send output to the host's syslog system. See syslogFacility.

A label applied to a replica set member and used by clients to issue data-center-aware operations. For more information on using tags with replica sets, see Read Preference Tag Set Lists.

Note

Sharded cluster zones replace tags.

A document containing zero or more tags.

For a capped collection, a tailable cursor is a cursor that remains open after the client exhausts the results in the initial cursor. As clients insert new documents into the capped collection, the tailable cursor continues to retrieve documents.

Group of Atlas users in the same organization. You can use teams to grant access to the same group of Atlas users across multiple projects. All users in the team share the same project access.

For the members of a replica set, a monotonically increasing number that corresponds to an election attempt.

A collection that efficiently stores sequences of measurements over a period of time. See Time Series.

The state of a deployment of MongoDB instances. Includes:

Type of deployment (standalone, replica set, or sharded cluster).
Availability of servers.
Role of each server (primary, secondary, config server, or mongos).

Group of read or write operations. For details, see Transactions.

A component of MongoDB that manages transactions in a replica set or a sharded cluster. It coordinates the execution and completion of multi-document transactions across nodes and allows a complex operation to be treated as an atomic operation.

A text-based data format consisting of tab-separated values. This format is commonly used to exchange data between relational databases because the format is suited to tabular data. You can import TSV files using mongoimport.
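
To show what the format looks like in practice, here is a small sketch that parses tab-separated text with Python's standard csv module. The sample rows are made up for illustration, and this is separate from how mongoimport reads TSV files.

```python
import csv
import io

# Two data rows under a header row; fields are separated by tab characters.
tsv_text = "name\tqty\napple\t3\npear\t5\n"

reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
rows = [dict(row) for row in reader]
```

Every value is read as a string, so numeric fields such as qty need an explicit conversion after parsing.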

Time-to-live (TTL) is an expiration time or period for a given piece of information to remain in a cache or other temporary storage before the system deletes it or ages it out. MongoDB has a TTL collection feature. See Expire Data from Collections by Setting TTL.

An array that consistently grows larger over time. If a document field value is an unbounded array, the array may negatively impact performance. In general, design your schema to avoid unbounded arrays.

An index that enforces uniqueness for a particular field in a single collection. See Unique Indexes.

January 1st, 1970 at 00:00:00 UTC. Commonly used in expressing time, where the number of seconds or milliseconds since this point is counted.
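
As a short worked example, the following Python snippet counts the seconds and milliseconds between the epoch and a chosen moment. The chosen date is arbitrary.

```python
from datetime import datetime, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
moment = datetime(2000, 1, 1, tzinfo=timezone.utc)

# Seconds and milliseconds elapsed since the epoch.
seconds_since_epoch = int((moment - epoch).total_seconds())  # 946684800
millis_since_epoch = seconds_since_epoch * 1000
```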

A query plan that returns results in an order inconsistent with the sort() order. See Query Plans.

An option for update operations such as db.collection.updateOne() and db.collection.findAndModify(). If upsert is true, the update operation either:

Updates the document(s) matched by the query.
Or, if no documents match, inserts a new document. The new document has the field values specified in the update operation.

For more information about upserts, see Insert a New Document if No Match Exists (Upsert).
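
The update-or-insert behavior can be simulated over an in-memory list of dictionaries. This sketch is not a MongoDB driver call: `update_one_sketch` is a hypothetical helper, the query supports only equality on top-level fields, and the update simply sets the given fields.

```python
def update_one_sketch(collection, query, new_fields, upsert=False):
    """Update the first matching document, or insert one when upsert is True."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            doc.update(new_fields)
            return "updated"
    if upsert:
        # The inserted document combines the equality fields from the
        # query with the field values from the update.
        collection.append({**query, **new_fields})
        return "inserted"
    return "no match"


inventory = [{"sku": "a", "qty": 1}]
first = update_one_sketch(inventory, {"sku": "a"}, {"qty": 2}, upsert=True)   # match found
second = update_one_sketch(inventory, {"sku": "b"}, {"qty": 7}, upsert=True)  # no match
```

After both calls, the list holds the updated document for sku "a" and a newly inserted document for sku "b".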

System that stores vector embeddings and associated metadata, and enables nearest neighbor search on the stored vector embeddings. You can use Atlas as your vector database and Atlas Vector Search to perform vector search on the stored vector embeddings. You can use a vector database to implement RAG.

Data structure that efficiently processes nearest neighbor search queries. Atlas Vector Search supports creating indexes of type vector to index fields for running $vectorSearch queries.

Method of performing k nearest neighbor search over a set of vectors stored in a vector index. Atlas Vector Search supports ANN and ENN search for k nearest neighbors.

An application's working memory, typically residing both on disk and in physical RAM.

The default reference system and geodetic datum that MongoDB uses to calculate geometry over an Earth-like sphere for geospatial queries on GeoJSON objects. See the "EPSG:4326: WGS 84" specification: http://spatialreference.org/ref/epsg/4326/.

Returns values from a span of documents from a collection. See window operators.

The data that MongoDB uses most often.

Specifies whether a write operation has succeeded. Write concern allows your application to detect insertion errors or unavailable mongod instances. For replica sets, you can configure write concern to confirm replication to a specified number of members. See Write Concern.

A situation where two concurrent operations, at least one of which is a write, try to use a resource that violates the constraints for a storage engine that uses optimistic concurrency control. MongoDB automatically ends and retries one of the conflicting write operations.

An exclusive lock on a resource such as a collection or database. When a process writes to a resource, it takes an exclusive write lock to prevent other processes from writing to or reading from that resource. For more information on locks, see FAQ: Concurrency.

A data compression library that provides higher compression rates at the cost of more CPU, compared to MongoDB's use of snappy. You can configure WiredTiger to use zlib as its compression library. See WiredTiger compression documentation for more information.

A grouping of documents based on ranges of shard key values for a given sharded collection. Each shard in the sharded cluster can be in one or more zones. In a balanced cluster, MongoDB directs reads and writes for a zone only to those shards inside that zone. See the Zones manual page for more information.

A data compression library that provides higher compression rates and lower CPU usage when compared to zlib.