Elasticsearch Configurations I Learned The Hard Way

Sadly, some things just aren’t in the basic manual, and you need to figure them out as you go along…

Naming

The most commonly missed setting is node.name. Yeah, I know it’s fun to see all those Marvel character names come up every time you restart a node, but it’s much harder to make sense of stats when you don’t know which server they are coming from.
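For example, a one-line elasticsearch.yml sketch (es-data-01 is a placeholder — use something that identifies the actual server, such as its hostname):

    # elasticsearch.yml
    node.name: es-data-01    # placeholder; match it to the server, e.g. its hostname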

Split brain

Always set discovery.zen.minimum_master_nodes to (at least) (number_of_master_eligible_nodes / 2) + 1. Otherwise you risk a situation where nodes that cannot communicate with the cluster (for any reason) form another cluster with the same name, completely separate from the original, with its own master and shards.
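For example, with three master-eligible nodes the quorum is (3 / 2) + 1 = 2, so the elasticsearch.yml line would look like this (adjust the number to your own node count):

    # elasticsearch.yml - assuming 3 master-eligible nodes
    discovery.zen.minimum_master_nodes: 2    # (3 / 2) + 1 = 2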

Turn off the self-destruct button

Don’t wait until someone on your team accidentally sends a DELETE * to the wrong server. Just don’t let it happen in the first place.
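One way to do that (a sketch — double-check the option name against your Elasticsearch version) is to require explicit index names for destructive operations, so wildcard and _all deletes are rejected:

    # elasticsearch.yml
    action.destructive_requires_name: true    # DELETE on * or _all is refused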

Memory

You probably already know that swapping kills performance. To keep the Elasticsearch process from being swapped out, set the bootstrap.mlockall: true config option, which locks the process address space in RAM.

If you do this, you should also define the ES_HEAP_SIZE environment variable. Note that Elastic recommends giving Elasticsearch no more than half of the machine’s memory (to leave some room for Lucene), and no more than 32 GB (so the JVM can keep using compressed 32-bit object pointers).
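A combined sketch of both settings — the 31g figure is only an example, sized for a machine with 64 GB of RAM:

    # elasticsearch.yml
    bootstrap.mlockall: true     # lock the heap in RAM so it can't be swapped out

    # environment variable, e.g. in /etc/default/elasticsearch or the init script
    ES_HEAP_SIZE=31g             # example: about half of a 64 GB machine, under 32 GB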

Monitoring

To detect cluster issues, it is important to have a visual view of the cluster state — shard allocation, master nodes, and data nodes. Either include this in your monitoring visualizations or use an existing tool; personally, I like head.
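Even without a UI, the _cat APIs give a quick plain-text view of the same information (assuming Elasticsearch is listening on localhost:9200):

    curl localhost:9200/_cat/health?v    # overall cluster status
    curl localhost:9200/_cat/nodes?v     # the nodes, and which one is the elected master
    curl localhost:9200/_cat/shards?v    # where every shard is allocated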

Recovery time

For large clusters with many shards, shard recovery can take a long time. It can be tuned with the cluster.routing.allocation.node_concurrent_recoveries and indices.recovery.max_bytes_per_sec settings. Keep in mind that recovery is an IO-intensive operation, so the speeds you set should match your hardware’s limits.
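A hedged example of the two settings in elasticsearch.yml — the numbers are illustrative only and should be matched to your disks and network:

    # elasticsearch.yml - example values only
    cluster.routing.allocation.node_concurrent_recoveries: 4
    indices.recovery.max_bytes_per_sec: 100mb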

Dedicated Master Nodes

For large clusters with many concurrent queries, have your master nodes separate from the data nodes. This is done by setting node.master: false or node.data: false in the appropriate node configurations (see the sketch after this list), and it has multiple benefits:

  • Lower load on data nodes, so they’ll be more available to answer queries — each master-eligible node must maintain the cluster state (in case the current master goes down), and this takes resources.
  • Makes sure the master node is free to keep the cluster healthy.
  • No split-brain problems when expanding your cluster — you won’t need to change minimum_master_nodes and restart all nodes every time you add a data node to the cluster.
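A sketch of the two roles (the hostnames in the comments are placeholders):

    # elasticsearch.yml on a dedicated master node, e.g. es-master-01
    node.master: true
    node.data: false

    # elasticsearch.yml on a data node, e.g. es-data-01
    node.master: false
    node.data: true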
