Various limits are applied to eDiscovery search tools in the Microsoft Purview compliance portal. This includes searches run on the Content search page and searches that are associated with an eDiscovery case on the eDiscovery (Standard) page. These limits help to maintain the health and quality of services provided to organizations. There are also limits related to the indexing of email messages in Exchange Online for search. You can't modify the limits for eDiscovery searches or email indexing, but you should be aware of them so that you can take these limits into consideration when planning, running, and troubleshooting eDiscovery searches.
Sites: 4,000 when searching all sites or 2,000 when searching up to 20 sites. [3]
The maximum number of variants returned when using a prefix wildcard to search for an exact phrase in a search query or when using a prefix wildcard and the NEAR Boolean operator. Limit: 10,000 [4]
The minimum number of alpha characters for prefix wildcards; for example, time*, one*, or set*. Limit: 3
The maximum number of mailboxes in a search that you can delete items in by doing a "search and purge" action (by using the New-ComplianceSearchAction -Purge command). If the search that you're doing a purge action for has more source mailboxes than this limit, the purge action will fail. For more information about search and purge, see Search for and delete email messages in your organization. Limit: 50,000
The maximum number of locations in a search that you can export items from. If the search that you're exporting has more locations than this limit, the export will fail. For more information, see Export content search results. Limit: 100,000
Note
1 Although you can search an unlimited number of mailboxes in a single search, you can only download the exported search results from a maximum of 100,000 mailboxes using the eDiscovery Export Tool in the compliance portal.
2 The intent of the preview page is to show a limited sample of the results. Even for massive searches with thousands of results, the number of items shown on the preview page can, and often will, be much less than the maximum possible value of 1,000. To see the complete search results, you need to export the results.
3 When searching SharePoint and OneDrive for Business locations, the characters in the URLs of the sites being searched are counted against this limit. This limit takes effect after the query is expanded and includes characters from the keyword query, any search permissions filters applied to the user, and the URLs of all site locations. This means the query will get expanded against each of the keywords. For example, if a search query has 15 keywords and additional parameters and conditions, the query gets expanded 15 times, each with the other parameters and conditions in the query. So even though the number of characters in the search query may be below the limit, it's the expanded query that may contribute to exceeding this limit.
5 For non-phrase queries (a keyword value that doesn't use double quotation marks) we use a special prefix index. This tells us that a word occurs in a document, but not where it occurs in the document. To do a phrase query (a keyword value with double quotation marks), we need to compare the position within the document for the words in the phrase. This means that we can't use the prefix index for phrase queries. In this case, we internally expand the query with all possible words that the prefix expands to; for example, "time*" can expand to "time OR timer OR times OR timex OR timeboxed OR ...". 10,000 is the maximum number of variants the word can expand to, not the number of documents matching the query. There is no upper limit for non-phrase terms.
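The prefix expansion described in the last note can be sketched in Python. This is an illustration under stated assumptions, not the service's implementation: `expand_prefix`, `to_or_query`, and the in-memory `vocabulary` list are hypothetical names, and truncating the result at the limit is only one plausible reading of "maximum number of variants returned".

```python
MAX_VARIANTS = 10_000   # variant limit quoted in the text
MIN_PREFIX_ALPHA = 3    # minimum alpha characters for a prefix wildcard


def expand_prefix(term, vocabulary, max_variants=MAX_VARIANTS):
    """Expand a prefix wildcard such as 'time*' into matching index words.

    Returns (variants, truncated). Hypothetical helper: the real index and
    its behavior at the limit are not described in the source.
    """
    if not term.endswith("*"):
        raise ValueError("expected a prefix wildcard ending in '*'")
    prefix = term[:-1]
    if sum(ch.isalpha() for ch in prefix) < MIN_PREFIX_ALPHA:
        raise ValueError("prefix needs at least 3 alpha characters")
    # Collect every distinct word in the vocabulary that starts with the prefix.
    matches = sorted({w for w in vocabulary if w.startswith(prefix)})
    # Assumption: only the first max_variants matches are kept.
    return matches[:max_variants], len(matches) > max_variants


def to_or_query(variants):
    """Join the variants the way the text writes the expanded query."""
    return " OR ".join(variants)


words = ["time", "timer", "times", "timex", "timeboxed", "tim", "clock"]
variants, truncated = expand_prefix("time*", words)
print(to_or_query(variants))
```

The limit applies to the length of `variants`, the words the prefix expands to, and not to the number of documents that match the resulting query.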
Microsoft collects performance information for searches run by all organizations. While the complexity of the search query can impact search times, the biggest factor that affects how long searches take is the number of mailboxes searched. Although Microsoft doesn't provide a Service Level Agreement for search times, the following table lists average search times for collection searches based on the number of mailboxes included in the search.
3 If the search results from a user's mailbox are larger than 10 GB, the search results for the mailbox will be exported in two (or more) separate PST files. If you choose to export all search results in a single PST file, the PST file will be split into additional PST files if the total size of the search results is larger than 10 GB.
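For planning purposes, the 10 GB split can be turned into a rough lower bound on the number of PST files to expect. The sketch below assumes files are filled up to the threshold before a new one is started, which the text does not state; `estimate_pst_files` is a hypothetical helper, not part of any export tool.

```python
import math

PST_SPLIT_GB = 10  # split threshold quoted in the text


def estimate_pst_files(result_size_gb, split_gb=PST_SPLIT_GB):
    """Estimate the minimum number of PST files for a given result size.

    Assumption: each file holds up to split_gb before another is started.
    The actual export may produce more files than this estimate.
    """
    if result_size_gb < 0:
        raise ValueError("size cannot be negative")
    return max(1, math.ceil(result_size_gb / split_gb))


for size in (4, 10, 25):
    print(size, "GB ->", estimate_pst_files(size), "file(s)")
```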
1 Parsing is the process where the indexing service extracts text from the attachment, removes unnecessary characters like punctuation and spaces, and then divides the text into words (in a process called tokenization), that are then stored in the index.
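The parsing steps in this note (extract text, drop punctuation and spaces, split into words, store in an index) can be sketched as follows. This is a minimal illustration, not the indexing service's actual tokenizer: the regular expression, the lowercasing step, and the names `tokenize` and `build_index` are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into words, discarding punctuation and whitespace."""
    return re.findall(r"\w+", text)


def build_index(attachments):
    """Map each token to the set of attachment names that contain it.

    attachments: dict of attachment name -> already extracted text.
    Lowercasing tokens is an assumption made for this sketch.
    """
    index = defaultdict(set)
    for name, text in attachments.items():
        for token in tokenize(text):
            index[token.lower()].add(name)
    return dict(index)


index = build_index({"a.docx": "Budget report", "b.pdf": "report, final"})
print(sorted(index["report"]))
```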
Check out the Resolve common eDiscovery issues article for basic troubleshooting steps that you can take to identify and resolve issues that you might encounter during an eDiscovery search or elsewhere in the eDiscovery process.
A gigabyte (GB) -- pronounced with two hard Gs -- is a unit of data storage capacity that is roughly equivalent to 1 billion bytes. In decimal notation (base 10), a gigabyte is exactly 1 billion bytes. In binary notation (base 2), a gigabyte is equal to 2^30 bytes, or 1,073,741,824 bytes. Giga comes from a Greek word meaning giant. Werner Buchholz is credited with coining the term byte in 1956, while helping design IBM's 7030 Stretch, the first transistorized supercomputer.
A gigabyte has been a common unit of capacity measurement for data storage products since the mid-1980s. In recent years, terabytes (TB) have become a more common unit of storage capacity measurement, especially for hard disk drives (HDDs) and solid-state drives (SSDs).
Cloud providers and hardware vendors still often refer to storage capacity costs in terms of the amount per gigabyte, although that's been slowly transitioning to costs per terabyte. Today's HDDs and flash SSDs can easily store hundreds of gigabytes of data or even thousands, which is why the TB label has been steadily replacing GB in many instances.
For example, an HDD might offer 500 GB of raw capacity but is currently storing only 200 GB of data. In addition to stored data, the gigabyte might also be used to refer to the amount of transferred data or storage throughput.
The difference between the decimal amount and the binary amount has caused a fair amount of confusion among consumers, especially if they also come across such terms as kibibyte, mebibyte, gibibyte or tebibyte.
Computer, storage and network systems use two standards to measure the number of bytes in a gigabyte: base 10 and base 2. The base 10 definition of gigabyte uses the decimal system to show that 1 GB is equal to 10^9 bytes, or 1 billion bytes. Today, most storage manufacturers and consumers use the base 10 standard to define a gigabyte.
Computers and their operating systems often use the base 2, or binary, form of measurement, in which 1 GB equals 1,073,741,824 bytes. In this model, a gigabyte is sometimes referred to as a gibibyte, although not all vendors take this approach, thus the confusion. In the early days of storage, this wasn't a significant issue because the discrepancy between the base 10 and base 2 measurements wasn't substantial. However, the differences became more pronounced as vendors started manufacturing storage media with more capacity.
Because of the differences between the two standards, users might see discrepancies on their systems in how the amount of storage is reported. For example, a manufacturer might show the capacity of an HDD as 500 GB, but the computer reports the HDD's capacity as 466 GB. Fortunately, many systems now use GiB when specifically referring to gibibytes, helping to clarify the differences.
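The 500 GB versus 466 GB discrepancy follows directly from the two definitions. A short Python check, with `gb_to_gib` as an illustrative helper name:

```python
GB = 10**9    # decimal gigabyte: 1,000,000,000 bytes
GIB = 2**30   # binary gigabyte (gibibyte): 1,073,741,824 bytes


def gb_to_gib(decimal_gb):
    """Convert a capacity stated in decimal GB to binary GiB."""
    return decimal_gb * GB / GIB


# A drive sold as 500 GB, as reported by a system that counts in base 2.
print(f"{gb_to_gib(500):.0f}")  # prints 466
```

The gap widens with capacity because each step up in prefix multiplies the roughly 7% difference at the gigabyte level by a further factor of 1.024.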
With today's data-intensive workloads, a gigabyte of data can be used up quickly. A single-layer digital video disc (DVD) can hold only 4.7 GB of data, and a double-layer disc can hold only 8.5 GB, a drop in the bucket compared to a 10 TB HDD. Even a typical laptop or desktop computer contains only 8 GB or 16 GB of RAM.
When purchasing smartphones, customers can often choose from multiple storage capacity options, which are typically based on available gigabytes. The more gigabytes, the higher the storage capacity -- and the higher the price tag.
For many customers, storage capacity is often one of the most important factors when choosing a phone. For example, Apple's iPhone 13 Pro offers four capacity options -- 128 GB, 256 GB, 512 GB and 1 TB -- while Samsung's Galaxy Z Flip3 5G offers only two options -- 128 GB or 256 GB. Most of today's phones use gigabyte as the measure of storage capacity, but as the iPhone 13 Pro indicates, it might be just a matter of time before gigabyte is supplanted by terabyte.
Regardless of which storage option customers choose, they should be aware that available capacity is often less than the total capacity. For example, an iPhone might use between 11 GB and 14 GB of storage space for iOS and the preinstalled apps -- the exact amount depends on the model and settings. Software updates can also affect available capacity.
In addition, gigabyte measurements can play a role when smartphone customers select their cellular service plans. These days, many plans offer unlimited calling and texting but restrict the amount of data that can be transferred to or from the device. In fact, the data amount is often the prime differentiator between plans.
For instance, T-Mobile offers the Essentials, Magenta and Magenta MAX plans. The Essentials plan provides 50 GB of data, the Magenta plan provides 100 GB and the MAX plan offers unlimited data. In addition, the Magenta plan gets up to 5 GB of mobile hotspot data, and the MAX plan gets 40 GB of high-speed data and unlimited data at 3G speeds.