As an open-source database (available on GitHub), Manticore Search was created in 2017 as a continuation of the Sphinx Search engine. Our development team took all the best features of Sphinx and significantly improved its functionality, fixing hundreds of bugs along the way (as detailed in our Changelog). With nearly complete code rewrites, Manticore Search is now a modern, fast, and lightweight database with full features and exceptional full-text search capabilities.
Manticore Search supports adding embeddings generated by your Machine Learning models to each document and then performing a nearest-neighbor search on them. This lets you build features like similarity search, recommendations, semantic search, and relevance ranking based on NLP algorithms, among others, including image, video, and sound search.
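For illustration, here is a minimal sketch of a nearest-neighbor search over stored embeddings, assuming a local Manticore instance listening on its MySQL-protocol port 9306; the `products` table, its 4-dimensional vector, and the values are hypothetical, and the exact KNN syntax may vary between versions:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // Manticore speaks the MySQL protocol
)

func main() {
	// Connect to a local Manticore instance via its MySQL-protocol listener.
	db, err := sql.Open("mysql", "tcp(127.0.0.1:9306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical table with a 4-dimensional float_vector attribute for embeddings.
	if _, err := db.Exec(`CREATE TABLE products (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' KNN_DIMS='4' HNSW_SIMILARITY='l2')`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`INSERT INTO products (id, title, embedding) VALUES (1, 'running shoe', (0.65, 0.13, 0.42, 0.08))`); err != nil {
		log.Fatal(err)
	}

	// Nearest-neighbor search against a query embedding produced by your ML model.
	rows, err := db.Query(`SELECT id, knn_dist() FROM products WHERE KNN (embedding, 3, (0.66, 0.12, 0.40, 0.07))`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		var dist float64
		if err := rows.Scan(&id, &dist); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, dist)
	}
}
```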
The cost-based query optimizer uses statistical data about the indexed data to evaluate the relative costs of different execution plans for a given query. This allows the optimizer to determine the most efficient plan for retrieving the desired results, taking into account factors such as the size of the indexed data, the complexity of the query, and the available resources.
Manticore offers both row-wise and column-oriented storage options to accommodate datasets of various sizes. The traditional and default row-wise storage option is available for datasets of all sizes - small, medium, and large - while the columnar storage option, provided through the Manticore Columnar Library, is intended for even larger datasets. The key difference between these storage options is that row-wise storage requires all attributes (excluding full-text fields) to be kept in RAM for optimal performance, while columnar storage does not, thus offering lower RAM consumption, but with a potential for slightly slower performance (as demonstrated by the statistics on https://db-benchmarks.com/).
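As an illustration, a whole table or individual attributes can be switched to the columnar engine at creation time. The sketch below reuses the `db` handle from the earlier example; the table names and columns are hypothetical, and the Manticore Columnar Library must be installed:

```go
// db is an open *sql.DB handle to Manticore's MySQL-protocol listener (see the earlier sketch).

// Whole-table columnar storage.
if _, err := db.Exec(`CREATE TABLE sales (title TEXT, price FLOAT, qty INT) engine='columnar'`); err != nil {
	log.Fatal(err)
}

// Mixed engines: keep hot attributes row-wise, move a large one to columnar storage.
if _, err := db.Exec(`CREATE TABLE sales2 (title TEXT, qty INT, price FLOAT engine='columnar')`); err != nil {
	log.Fatal(err)
}
```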
The Manticore Columnar Library uses the Piecewise Geometric Model (PGM) index, which exploits a learned mapping between the indexed keys and their location in memory. The succinctness of this mapping, coupled with a peculiar recursive construction algorithm, makes the PGM-index a data structure that dominates traditional indexes by orders of magnitude in space while still offering the best query and update time performance. Secondary indexes are ON by default for all numeric fields.
The Manticore Search daemon is developed in C++, offering fast start times and efficient memory utilization. The utilization of low-level optimizations further boosts performance. Another crucial component, called Manticore Buddy, is written in PHP and is utilized for high-level functionality that does not require lightning-fast response times or extremely high processing power. Although contributing to the C++ code may pose a challenge, adding a new SQL/JSON command using Manticore Buddy should be a straightforward process.
Data can be distributed across servers and data centers with any Manticore Search node acting as both a load balancer and a data node. Manticore implements virtually synchronous multi-master replication using the Galera library, ensuring data consistency across all nodes, preventing data loss, and providing exceptional replication performance.
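A minimal sketch of setting up replication over SQL, again reusing the `db` handle from the earlier example; the cluster name, table name, and node address are hypothetical:

```go
// On the first node: create a cluster and add an existing table to it.
if _, err := db.Exec(`CREATE CLUSTER posts`); err != nil {
	log.Fatal(err)
}
if _, err := db.Exec(`ALTER CLUSTER posts ADD products`); err != nil {
	log.Fatal(err)
}

// On another node: join the existing cluster via any of its members' addresses.
if _, err := db.Exec(`JOIN CLUSTER posts AT '10.12.1.35:9312'`); err != nil {
	log.Fatal(err)
}

// Writes to replicated tables are prefixed with the cluster name.
if _, err := db.Exec(`INSERT INTO posts:products (id, title) VALUES (1, 'hello')`); err != nil {
	log.Fatal(err)
}
```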
Manticore is equipped with an external tool, manticore-backup, and the BACKUP SQL command to simplify backing up and restoring your data. Alternatively, you can use mysqldump to make logical backups.
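For example, a backup can be triggered right over the SQL connection (reusing the `db` handle from the earlier sketch); the target directory is hypothetical and must be writable by searchd:

```go
// Back up all tables into a directory on the server; roughly what the
// manticore-backup CLI tool does when run on the server itself.
if _, err := db.Exec(`BACKUP TO /tmp/manticore-backups`); err != nil {
	log.Fatal(err)
}
```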
Manticore offers a special table type, the "percolate" table, which allows you to search queries instead of data, making it an efficient tool for filtering full-text data streams. Simply store your queries in the table, process your data stream by sending each batch of documents to Manticore Search, and receive only the results that match your stored queries.
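A sketch of the percolate flow, under the same assumptions as the earlier `db` examples (table name and queries are hypothetical): store the queries first, then feed documents and collect the queries that match.

```go
// Create a percolate table and store a couple of full-text queries in it.
if _, err := db.Exec(`CREATE TABLE pq_alerts (title TEXT, meta JSON) type='pq'`); err != nil {
	log.Fatal(err)
}
if _, err := db.Exec(`INSERT INTO pq_alerts (query) VALUES ('@title "running shoes"'), ('@title laptop')`); err != nil {
	log.Fatal(err)
}

// Feed a document from the stream; only the stored queries that match are returned.
rows, err := db.Query(`CALL PQ ('pq_alerts', '{"title": "new running shoes arrived"}', 1 AS docs_json, 1 AS query)`)
if err != nil {
	log.Fatal(err)
}
defer rows.Close()
```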
We love search, and we've done our best to make searching in this manual as convenient as possible. Of course, it's backed by Manticore Search. Besides using the search bar, which requires opening the manual first, there is an even easier way to find something: just open mnt.cr/your-search-keyword.
You cannot combine the two modes and need to decide which one you want to follow by specifying data_dir in your configuration file (which is the default behaviour). If you are unsure, our recommendation is to follow the RT mode: even if you need a plain table, you can build it with a separate plain table config and import it into your main Manticore instance.
Real-time tables can be used in both RT and plain modes. In the RT mode a real-time table is defined with a CREATE TABLE command, while in the plain mode it is defined in the configuration file. Plain (offline) tables are supported only in the plain mode. Plain tables cannot be created in the RT mode, but existing plain tables made in the plain mode can be converted to real-time tables and imported in the RT mode.
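For instance, in the RT mode (again reusing the `db` handle from the earlier sketch) a table is created on the fly, and a table converted from a plain table can be brought in with IMPORT TABLE; the names and path below are hypothetical:

```go
// RT mode: define the table with SQL instead of the configuration file.
if _, err := db.Exec(`CREATE TABLE products (title TEXT, price FLOAT)`); err != nil {
	log.Fatal(err)
}

// Import a table converted from a plain table; the path points to the table files on the server.
if _, err := db.Exec(`IMPORT TABLE products_archive FROM '/var/lib/manticore/products_archive'`); err != nil {
	log.Fatal(err)
}
```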
Package manticore implements a Client to work with Manticore Search over its internal binary protocol. In many cases it may also be used to work with the Sphinx Search daemon. It provides the Client connector.
The set of functions mostly imitates the API description of Manticore Search for PHP, but with a few changes that are more effective and idiomatic for Go (for example, error handling).
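A minimal connection sketch based on the API description above; the import path and the NewClient/SetServer/Open/Query names and signatures are assumptions and may differ from the actual package:

```go
package main

import (
	"fmt"
	"log"

	"github.com/manticoresoftware/go-sdk/manticore" // assumed import path
)

func main() {
	// Create a client and point it at searchd's binary API port (9312 by default).
	cl := manticore.NewClient()
	cl.SetServer("127.0.0.1", 9312)
	if _, err := cl.Open(); err != nil {
		log.Fatal(err)
	}
	defer cl.Close()

	// Run a simple full-text query against a hypothetical "products" index.
	res, err := cl.Query("running shoes", "products")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res)
}
```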
Checking whether a document matches any of the predefined criteria (queries) is performed via the CallPQ function, or via the HTTP /json/pq/<index>/_search endpoint. They return the list of matched queries and may include additional info such as the matching clause, filters, and tags.
`words` is a string that contains the keywords to highlight. They will be processed with respect to index settings. For instance, if English stemming is enabled in the index, shoes will be highlighted even if the keyword is shoe. Keywords can contain wildcards, which work similarly to the star-syntax available in queries.
The snippet extraction algorithm currently favors better passages (with closer phrase matches), and then passages with keywords not yet in the snippet. Generally, it will try to highlight the best match with the query, and it will also try to highlight all the query keywords, as far as the limits allow. If the document does not match the query, the beginning of the document, trimmed down according to the limits, will be returned by default. You can also return an empty snippet instead by setting the allow_empty option to true.
BuildKeywords extracts keywords from a query using the tokenizer settings for the given index, optionally with per-keyword occurrence statistics. It returns an array of hashes with per-keyword information. If necessary, it will connect to the server before processing.
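A short sketch of the call; `cl` is the connected client from the earlier example, and the signature shown is an assumption based on the description above:

```go
// Assumed signature: BuildKeywords(query, index string, hits bool).
// hits=true also requests per-keyword occurrence statistics.
keywords, err := cl.BuildKeywords("running shoes", "products", true)
if err != nil {
	log.Fatal(err)
}
fmt.Println(keywords)
```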
`opts` holds packed options. See the description of SearchPqOptions for details. In general, you need to make an instance of the options by calling NewSearchPqOptions(), set the desired flags and options, and then invoke CallPQ, providing the desired index, the set of documents, and the options.
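Roughly, the flow looks like the sketch below; `cl` is the connected client from the earlier example, and the flag and field names on the options are assumptions, so check SearchPqOptions in the package docs for the real ones:

```go
// Build the options, set the desired flags, then percolate a batch of JSON documents.
opts := manticore.NewSearchPqOptions()
opts.Flags = manticore.NeedDocs | manticore.NeedQuery // assumed flag names

docs := []string{
	`{"title": "new running shoes arrived"}`,
	`{"title": "laptop on sale"}`,
}
res, err := cl.CallPQ("pq_alerts", docs, opts)
if err != nil {
	log.Fatal(err)
}
fmt.Println(res)
```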
Attribute values updated using the UpdateAttributes() API call are kept in a memory-mapped file, which means the OS decides when the updates are actually written to disk. The FlushAttributes() call lets you enforce a flush, which writes all the changes to disk. The call will block until searchd finishes writing the data to disk, which might take seconds or even minutes depending on the total data size (.spa file size). All the currently updated indexes will be flushed.
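For example (with `cl` being the connected client from the earlier sketch; the returned value is assumed to be a flush tag):

```go
// Force in-memory attribute updates to be written to disk; blocks until searchd is done.
tag, err := cl.FlushAttributes()
if err != nil {
	log.Fatal(err)
}
log.Println("flushed, tag:", tag)
```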
You should call it to verify whether your request (such as Query()) completed, but with warnings. For instance, a search query against a distributed index might complete successfully even if several remote agents timed out. In that case, a warning message would be produced.
The warning message is not reset by this call, so you can safely call it several times if needed. If you issued a multi-query by running RunQueries(), individual warnings will not be stored in the client; instead, check the Warning field in each returned result of the slice.
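In code this might look like the sketch below; `cl` is the connected client from the earlier example, the GetLastWarning name is an assumption mirroring the PHP API, and the per-result Warning field is taken from the description above:

```go
// Single query: check the client-level warning after a successful call.
res, err := cl.Query("running shoes", "products")
if err != nil {
	log.Fatal(err)
}
if w := cl.GetLastWarning(); w != "" { // assumed accessor name
	log.Println("query completed with warning:", w)
}
_ = res
// For RunQueries(), check the Warning field of each result instead,
// as in the batch sketch further below.
```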
IsConnectError checks whether the last error was a network error on the API side or a remote error reported by searchd. It returns true if the last connection attempt to searchd failed on the API side, and false otherwise (if the error was remote, or there were no connection attempts at all).
Json performs a remote call of a JSON query, as if it were fired via an HTTP connection. It is intended to run updates and deletes, but it sometimes works in other cases as well. The general rule: if the endpoint accepts data via POST, it will work via a Json call.
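A sketch of such a call; `cl` is the connected client from the earlier example, and both the endpoint string and the Json signature are assumptions:

```go
// Fire a JSON update through the API connection, as if it were an HTTP POST.
resp, err := cl.Json("json/update", `{"index": "products", "id": 1, "doc": {"price": 19.9}}`)
if err != nil {
	log.Fatal(err)
}
fmt.Println(resp)
```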
RunQueries connects to searchd, runs a batch of queries, and obtains and returns the result sets. It returns nil and an error message on a general error (such as a network I/O failure), and a slice of result sets on success.
However, individual queries within the batch might very well fail. In this case, their respective result sets will contain a non-empty `error` message, but no matches or query statistics. In the extreme case, all queries within the batch could fail. There will still be no general error reported, because the API was able to successfully connect to searchd, submit the batch, and receive the results - but every result set will have a specific error message.
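A batch sketch under the same assumptions as before; `cl` is the connected client, and the AddQuery name and the result field names (Error, Total) are assumptions based on the PHP-style API this package mirrors:

```go
// Queue several queries, then run them as one batch over a single round trip.
cl.AddQuery("running shoes", "products")
cl.AddQuery("laptop", "products")

results, err := cl.RunQueries()
if err != nil {
	// General failure, e.g. a network I/O error: nothing was executed.
	log.Fatal(err)
}
for i, r := range results {
	if r.Error != "" {
		// This particular query failed; others may still have succeeded.
		log.Printf("query #%d failed: %s", i, r.Error)
		continue
	}
	log.Printf("query #%d: %d matches", i, r.Total)
}
```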
Under some circumstances, the server can be delayed in responding, either due to network delays or a query backlog. In either instance, this allows the client application programmer some degree of control over how their program interacts with searchd when it is not available, and ensures that the client application does not fail due to exceeding the execution limits.
SetMaxAlloc limits the size of the client's network buffer. For sending queries and receiving results, the client reuses a byte array, which can grow up to the required size. If the limit is reached, the array is released and a new one is created. Usually the API needs just a few kilobytes of memory, but sometimes the value may grow significantly, for example when you fetch a big result set with many attributes. Such a result set will be properly received and processed; however, at the next query the backing array used for it will be released and the occupied memory returned to the runtime.
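For example (with `cl` being the connected client from the earlier sketch; the exact signature is an assumption):

```go
// Cap the reusable network buffer at 8 MB; a larger one-off result set will still be
// processed, but its oversized buffer will be released after the next query.
cl.SetMaxAlloc(8 * 1024 * 1024)
```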