When you are in the AWS console you may have noticed that there are two options for search: ElasticSearch Service and CloudSearch. They sit in different sections of the console and both seem to offer search, but which should you use? As we are in need of search at Optimal, now was a good time to go on a fact-finding mission and answer a few questions.
What are AWS ElasticSearch Service and AWS CloudSearch?
ElasticSearch is an open source product from Elastic that is designed to help us search in a way that is highly available and abstracted from our datastore. Amazon ElasticSearch Service is AWS-hosted ElasticSearch: it takes care of set-up and management of the back end servers and provides us with an endpoint that we can get developing with.
AWS CloudSearch is an AWS-designed search interface that (since 2014) is backed by Apache Solr. It is mainly accessed through the AWS SDK (the native language libraries for AWS) or HTTP queries and is exclusive to the AWS cloud. Again, set-up and management of the CloudSearch servers are taken care of by Amazon, so we just have to develop and not worry about our search tripping over and falling flat on its face.
How does initial setup work?
Initial setup is relatively easy in the AWS Cloud. Both services get up and running from the AWS Console without any extra configuration or custom tools, and the endpoints you require are all provided clearly on the page. Once we have a search domain up and running there are a few different ways to get the index populated.
With CloudSearch we get many more buttons to get us started without having to write any code. The AWS Console can help us out with many things, including defining our indexes from an example DynamoDB table, an S3 bucket, or a document or file that we upload through the console. Though not a perfect system, it is a very good place to start and for many data sets it provides a very good index definition. Once we have an index defined the console also gives us an easy upload document button, which allows us to upload a document and start searching straight away. This is great if you have a static set of data (a store catalogue maybe) and just need data in so that you can search from a web portal. The CloudSearch portal also has a few tools to help us develop and debug, one being the ability to search right there in the portal and another to set up Expressions and Suggesters.
ElasticSearch is a little different once the endpoint is up and running. There are only a few options available from the console and all of them are around scaling and management of the ElasticSearch cluster. This is not of huge consequence though, as ElasticSearch is one of the most popular (if not the most popular) search tools/frameworks around. There are many third-party tools and libraries to get you started with searching, and with AWS managing the servers ElasticSearch can be a really quick way to get searching. Speaking of tools, AWS sets up Kibana for any ElasticSearch domain it runs, so we can easily get some searching and visualizations going once we have the index populated.
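If the data is not static and we want to push documents from code instead of the upload button, a minimal sketch could look like the following. It builds a CloudSearch document batch (a JSON list of "add" operations) and, separately, sends it with boto3's `cloudsearchdomain` client. The catalogue records, the field names and the function names are made up for illustration, and the fields are assumed to already exist in the domain's index definition.

```python
import json


def build_batch(records):
    """Turn plain dicts into a CloudSearch document batch of 'add' operations."""
    batch = []
    for record in records:
        # Everything except the id becomes an index field.
        fields = {key: value for key, value in record.items() if key != "id"}
        batch.append({"type": "add", "id": str(record["id"]), "fields": fields})
    return json.dumps(batch)


def upload_batch(doc_endpoint_url, batch_json):
    """Send the batch to the domain's document endpoint.

    Needs boto3, AWS credentials and a region configured in the environment.
    """
    import boto3  # imported here so build_batch works without the SDK installed

    client = boto3.client("cloudsearchdomain", endpoint_url=doc_endpoint_url)
    return client.upload_documents(
        documents=batch_json.encode("utf-8"),
        contentType="application/json",
    )


# Hypothetical catalogue data, purely for illustration.
catalogue = [
    {"id": 1, "title": "Rubber ducky", "price": 4},
    {"id": 2, "title": "Bath sponge", "price": 2},
]
batch_json = build_batch(catalogue)
```

Calling `upload_batch` with the document endpoint shown in the console would then do roughly what the upload button does for us.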
What is searching like?
ElasticSearch is backed by Apache Lucene, a Java library that many companies have used to power search before. CloudSearch is backed by Apache Solr which (you guessed it) is backed by Apache Lucene. Both engines therefore deliver very good search performance and great results. They also sweeten searching with things like help with spelling (because we are all 8 years old inside), suggesters and category search. If your index is well run and populated there is no reason searching shouldn’t be easy with these tools: just build a web page and away you go.
How hard is developing/integrating?
Development for the ElasticSearch offering is easy and mostly involves interacting with HTTP endpoints. AWS provides an easy to use interface for securing interaction as well so this is rather secure out of the box. If you or your team don’t like HTTP endpoints then there are many libraries in most languages.
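To give a feel for what interacting with HTTP endpoints means here, this is a minimal sketch that builds a request against ElasticSearch's `_search` endpoint using only the Python standard library. The endpoint URL, the `products` index and the `title` field are placeholders, and the sketch assumes the domain's access policy allows the caller in without signed requests.

```python
import json
import urllib.request


def build_search_request(endpoint, index, field, text):
    """Build (but do not send) a POST to <endpoint>/<index>/_search with a match query."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint, index and field names, purely for illustration.
request = build_search_request(
    "https://search-example.invalid/", "products", "title", "rubber ducky"
)
# Sending it would be: urllib.request.urlopen(request)
```

A client library wraps the same call, so switching to one later should not change the shape of the query much.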
CloudSearch is mostly limited to using the AWS SDK or the AWS CLI. This works well for teams that have some initial knowledge of AWS Access Keys, permissions, IAM users and other such AWS fun. Once you have that figured out though, it’s mostly the same as using a library with ElasticSearch.
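For comparison, here is roughly what a CloudSearch query through the SDK might look like, using boto3's `cloudsearchdomain` client. The helper names and the search term are our own, and the sketch assumes credentials and a region are already configured in the environment.

```python
def build_search_params(term, size=10):
    """Parameters for a simple-parser text search, returning at most `size` hits."""
    return {"query": term, "queryParser": "simple", "size": size}


def run_search(search_endpoint_url, params):
    """Run the search against the domain's search endpoint and return the matching ids.

    Needs boto3, AWS credentials and a region configured in the environment.
    """
    import boto3  # imported here so the parameter builder works without the SDK

    client = boto3.client("cloudsearchdomain", endpoint_url=search_endpoint_url)
    response = client.search(**params)
    return [hit["id"] for hit in response["hits"]["hit"]]


# Hypothetical search term, purely for illustration.
params = build_search_params("rubber ducky", size=5)
```

The search endpoint URL to pass to `run_search` is the one the console shows for the domain.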
How hard is it to keep running?
So now that we have search in the cloud, how hard is it to stop the whole thing falling over or being so slow that it’s unusable? Well, as we know, scaling with AWS is actually very easy, and with both CloudSearch and AWS ElasticSearch Service failover is free and built in. As expected for both these services, when/if a node goes down the service will bring another one up and bring it into line as soon as it can.
With AWS CloudSearch, scaling happens automatically and without any config or setup required. The only choices we have here are the replication of our search data and whether we are deployed in multiple availability zones or not (Multi AZ). This means that as your search gets used more or becomes more popular, Amazon will automatically add capacity when the domain is under strain, so that no one notices that everyone suddenly needs to search for a replacement rubber ducky.
With ElasticSearch, scaling is a little more of a manual process. When we set up an ElasticSearch cluster we pick the number of instances and the size of the instances that are in the cluster, but due to how ElasticSearch functions autoscaling would cause some weird performance hiccups. Instead, we just have standard scaling out: when we see our cluster struggling we should add extra instances permanently. Adding extra instances is an easy task and can all be done through the AWS console.
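Those two choices can also be set from code rather than the console. The sketch below uses boto3's `cloudsearch` configuration client with its `update_scaling_parameters` and `update_availability_options` calls. The domain name and the values are placeholders, and the helper names are ours.

```python
def build_scaling_requests(domain_name, replication_count, multi_az):
    """Build the two request payloads: desired replication count and Multi AZ on or off."""
    scaling = {
        "DomainName": domain_name,
        "ScalingParameters": {"DesiredReplicationCount": replication_count},
    }
    availability = {"DomainName": domain_name, "MultiAZ": multi_az}
    return scaling, availability


def apply_scaling(scaling, availability):
    """Send both settings to CloudSearch (needs boto3, credentials and a region)."""
    import boto3  # imported here so the builder works without the SDK installed

    client = boto3.client("cloudsearch")
    client.update_scaling_parameters(**scaling)
    client.update_availability_options(**availability)


# Placeholder domain name and values, purely for illustration.
scaling, availability = build_scaling_requests("product-search", 2, True)
```

Everything else about scaling stays with Amazon; these settings only cover the replication and availability zone choices.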
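If clicking through the console gets old, the same scale-out could be scripted. This is a sketch using boto3's `es` client and its `update_elasticsearch_domain_config` call. The domain name, instance type and instance count are example values, not recommendations.

```python
def build_scale_out_request(domain_name, instance_type, instance_count):
    """Build the payload that sets the cluster's instance type and instance count."""
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def apply_scale_out(request):
    """Apply the new cluster size (needs boto3, credentials and a region)."""
    import boto3  # imported here so the builder works without the SDK installed

    client = boto3.client("es")
    return client.update_elasticsearch_domain_config(**request)


# Example values only: grow a hypothetical domain to three instances.
request = build_scale_out_request("product-search", "m4.large.elasticsearch", 3)
```

Deciding when to run it is still up to us, since the cluster does not grow on its own.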
How much do I have to spend?
AWS CloudSearch has a varying price point depending on how much throughput your search system is getting and what size server your cluster is using. It will scale up the cluster as throughput increases, and this will cost you more for the time that the cluster is bigger. If you want Multi AZ or an increased replication count then this will cost you more as well. ElasticSearch Service is in a similar cost space, but its smallest server size is smaller (and, therefore, cheaper) than the smallest CloudSearch one. This won’t make much difference to most teams, but if you are a small team or if search is a very small part of a product then ElasticSearch might save you a few cents. Once the search starts to get used and the cluster is of a standard size there is no real difference in costs.
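To make the arithmetic concrete, here is a back-of-the-envelope sketch. The hourly rates below are invented placeholders, not real AWS prices, and the model only covers instance hours; check the current pricing pages before relying on any of it.

```python
HOURS_PER_MONTH = 730  # rough average number of hours in a month


def monthly_cost(hourly_rate, instance_count, hours=HOURS_PER_MONTH):
    """Instance cost only: hourly rate x number of instances x hours running."""
    return round(hourly_rate * instance_count * hours, 2)


# Placeholder rates, NOT real AWS prices.
small_single_node = monthly_cost(hourly_rate=0.02, instance_count=1)
standard_two_nodes = monthly_cost(hourly_rate=0.10, instance_count=2)
print(small_single_node, standard_two_nodes)
```

The point is the shape of the sum: a cheaper smallest instance only matters while the cluster stays tiny, and replication or Multi AZ multiplies the instance count.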
So, which do I choose?
As normal with AWS, there are horses for courses. If you are a team developing a product that needs to be deployed outside the AWS space, or you are developing something now that you plan to bring to AWS later, then the only choice is ElasticSearch, as this runs outside the cloud with the same (or similar) codebase. AWS CloudSearch is a good answer if you really don’t care about how the servers work and just want quick, reliable search for your application that does not get slow as you get more popular. Of course, even if you do deploy outside the AWS cloud, having your search in the cloud is not a bad idea, but if you are in the AWS cloud then it is a no-brainer. Either way, searching is a lot easier than it used to be (as much as we love Zookeeper) and all the applications that deserve it can now get quick, rich search with minimal management.
Hope that answered any questions that you had. If not, drop them in the comments and I will do my best to answer them.
Until next time!
Coffee to Code – Tim Gray
Tim blogs about the sharp end of code and the languages it is written in.