What is Elasticsearch? Elasticsearch is a scalable, open source search engine and database that has become increasingly popular among developers building cloud-based systems. When properly configured, it can ingest large volumes of data and query them efficiently.
It is straightforward to build and deploy an Elasticsearch cluster on Azure. You can create a set of Windows or Linux VMs, then download the appropriate Elasticsearch packages and install them on each VM. Alternatively, there is a published ARM template you can use with the Azure portal to automate most of the process.
Elasticsearch is highly configurable, but we’ve also seen many systems where a poor selection of options has led to slow performance. One explanation is that you need to take many factors into account to achieve the best and most responsive system, including:
* The cluster topology (client nodes, master nodes and data nodes)
* The structure of each index (the number of shards and replicas to specify)
* The virtual hardware (disk capacity and speed, amount of memory, number of CPUs)
* The allocation of resources on each cluster (disk layout, Java Virtual Machine memory usage, Elasticsearch queues and threads, I/O buffers)
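To make the index-structure factor above concrete, here is a minimal Python sketch of the JSON body you might send when creating an index with an explicit shard and replica count. The `number_of_shards` and `number_of_replicas` settings are standard Elasticsearch index settings. The helper names and the values 5 and 1 are illustrative assumptions, not recommendations from our tests.

```python
import json


def index_settings(shards: int, replicas: int) -> str:
    """Build the JSON body for an index-creation request (hypothetical helper)."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,      # primary shards
                "number_of_replicas": replicas,  # copies of each primary
            }
        }
    }
    return json.dumps(body, indent=2)


def total_shard_copies(shards: int, replicas: int) -> int:
    """Primaries plus replicas: the number of shard copies the data nodes must host."""
    return shards * (1 + replicas)


if __name__ == "__main__":
    # Example values only; choose them based on your own data volume and node count.
    print(index_settings(5, 1))
    print("shard copies to place:", total_shard_copies(5, 1))
```

The second helper shows why shards and replicas interact with the topology and hardware choices: every replica multiplies the number of shard copies that must fit on the data nodes.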
Do not consider these items in isolation, because the nature of the workloads you are running will also greatly influence the performance of the system. An installation optimized for data ingestion might not be well-tuned for queries, and vice versa, so you will need to balance the requirements of the different operations your system needs to support. For those reasons, we spent considerable time working through a series of configurations, performing numerous tests, and analyzing the results.
Our purpose was to demonstrate how you can design and build an Elasticsearch cluster to suit your own particular requirements, and to show you how you can test and tune its performance. This guidance is now available in the Azure documentation, where you will find a series of documents covering the following:
* General guidance on Elasticsearch, describing the configuration options available and how you can apply them to a cluster running on Azure
* Specific guidance on deploying, configuring, and testing an Elasticsearch cluster that must support a high level of data ingestion operations
* Guidance and considerations for Elasticsearch systems that must support mixed workloads and/or query-intensive systems
We used Apache JMeter to conduct the performance tests, incorporating JUnit tests written in Java. We then captured the performance data as a set of CSV files and used Excel to graph and analyze the results. We used Elasticsearch Marvel to monitor the systems while the tests were running.
If you’d like to try these steps on your own setup, the documentation provides instructions on how to create your own JMeter test environment and gather performance information from Elasticsearch, in addition to providing scripts to run our JMeter tests.
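If you would rather script the CSV analysis than use Excel, the following Python sketch shows one possible approach. It assumes the results file has a header row containing JMeter's `label`, `elapsed` (milliseconds), and `success` columns. The `summarize` function and the sample rows are made up for illustration and are not taken from our test runs.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(csv_text: str) -> dict:
    """Group JMeter samples by label and compute simple response-time statistics."""
    by_label = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        succeeded = row["success"].strip().lower() == "true"
        by_label[row["label"]].append((int(row["elapsed"]), succeeded))

    summary = {}
    for label, samples in by_label.items():
        times = [elapsed for elapsed, _ in samples]
        failures = sum(1 for _, ok in samples if not ok)
        summary[label] = {
            "samples": len(samples),
            "mean_ms": mean(times),
            "max_ms": max(times),
            "error_rate": failures / len(samples),
        }
    return summary


# Invented sample data, used only to show the expected file shape.
SAMPLE = """timeStamp,elapsed,label,responseCode,success
1457000000000,120,Bulk Insert,200,true
1457000000500,180,Bulk Insert,200,true
1457000001000,300,Bulk Insert,503,false
1457000001500,40,Search,200,true
"""

if __name__ == "__main__":
    for label, stats in summarize(SAMPLE).items():
        print(label, stats)
```

To run it against a real results file, read the file's contents and pass the text to `summarize`; adjust the column names if your JMeter configuration writes a different set of fields.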