
Have you heard of full stack web developers? Unlike developers who specialise in either the front end or the back end, a full stack developer can do all of it alone. My area is databases. I’m quite confident in calling myself a relational database specialist. Last year I delved into graph databases and wrote a couple of blogs about the experience. This year has started with a quick dive into MongoDB, which is a document store. I’d like to think that I’m acquiring the skills of a full stack data developer, if we allow such a role to exist.

OrientDB Concepts

I found OrientDB to be a perfect tool if you’d like to get most of the NoSQL experience at once, as this platform combines the functionality of both document and graph databases. The simple trick behind this is that nodes and edges, the concepts of a graph database, are also documents. Another concept, borrowed from object-oriented programming, is the class. A class is a collection of documents, but unlike collections in, for example, MongoDB, a class can extend another class or be abstract, i.e. it exists only to hold a structure that other classes can inherit, and you can’t create a document of this class. This is how documents can be forced to have a structure. For example, nodes, which are called vertices in OrientDB, are documents of the base class V or of the child classes of V. To query data you can use an SQL-like syntax. You can also write functions in SQL or JavaScript. The OrientDB server has a RESTful API, so you can manage or query documents via HTTP requests. Do you feel dizzy yet? I do. On one hand this mixture of concepts and technologies is overwhelming, but I also find that it is combined creatively and with much elegance.
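To make the class concept a bit more concrete, here is a minimal Python sketch that only composes OrientDB SQL statements as strings. The class names `Person` and `Customer` and the property `name` are made up for illustration; only the base class V comes from OrientDB itself. Nothing is sent to a server, so treat it as a sketch of the idea rather than a recipe.

```python
def create_class(name, extends=None, abstract=False):
    """Compose an OrientDB SQL CREATE CLASS statement as a string."""
    parts = ["CREATE CLASS", name]
    if extends:
        parts += ["EXTENDS", extends]
    if abstract:
        parts.append("ABSTRACT")
    return " ".join(parts)


# Hypothetical schema: Person is an abstract vertex class that only holds
# structure, Customer inherits that structure and can hold real documents.
statements = [
    create_class("Person", extends="V", abstract=True),
    "CREATE PROPERTY Person.name STRING",
    create_class("Customer", extends="Person"),
    "CREATE VERTEX Customer SET name = 'Ann'",
]

for statement in statements:
    print(statement)
```

Under these assumptions, creating a vertex of `Person` directly would be refused because the class is abstract, while `Customer` documents inherit the `name` property.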

Getting Started

I first tried to simply install OrientDB and get my hands dirty, but that turned out to be impossible: I got stuck and had to look into the documentation for the proper way of doing things. OrientDB doesn’t have an installer package for Windows, and it wasn’t clear how to get to the graphic console. I must admit, though, that I’m spoiled by large corporations, while OrientDB is an open source tool built by a community of enthusiasts. So I tried another way to get started, and I enjoyed it much more! OrientDB has a free course on Udemy. Before my next try I watched a bunch of videos on how to run the instance, how to get to the graphic console, the concepts and the query language syntax. The course is of high educational quality and quite inspirational. After that I could easily start again, feeling armed to query anything.

Screenshot of the schema tab of MovieRatings database

The interface is quite similar to the one Neo4J has. However, in OrientDB the graphical representation of the graph (best described as “bubbles”) is not the default view of the data, but rather a separate tab in the menu. I had a very positive experience with it: everything is adjustable in just one click. Another tab is “Schema”, which is quite handy when you explore a new database, e.g. one of the many available example databases.

As I have mentioned, the query language is SQL-like with extra commands available. It took me just five minutes to write my first query, which gets the number of female and male users with an occupation title starting with “exec” (implying “executive”) who have rated the “Life of Brian” movie, which I addressed by its database-wide unique id. In this query I used the SELECT and TRAVERSE commands, which were mentioned in the course, but later I found that MATCH is also supported, a command also used by the Cypher language (Neo4J) and SQL Server graph.
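For readers who want to see the shape of such a query, here is a hedged Python sketch that composes one possible OrientDB SQL statement. It is not the query from the screenshot: the edge class `Rated`, the properties `gender` and `occupation`, and the record id `#12:34` are all assumptions about the schema, and this variant uses a nested SELECT with `expand(in(...))` instead of TRAVERSE.

```python
def ratings_by_gender(movie_rid, occupation_prefix):
    """Compose a query counting, per gender, the users who rated one movie.

    Assumes users point to movies through an edge class called 'Rated' and
    carry 'gender' and 'occupation' properties (hypothetical names).
    Values are interpolated directly for readability; do not do this with
    untrusted input.
    """
    return (
        "SELECT gender, count(*) AS users "
        f"FROM (SELECT expand(in('Rated')) FROM {movie_rid}) "
        f"WHERE occupation LIKE '{occupation_prefix}%' "
        "GROUP BY gender"
    )


# '#12:34' is a placeholder record id, not the real id of the movie.
query = ratings_by_gender("#12:34", "exec")
print(query)
```

The inner SELECT starts from the movie record and follows incoming `Rated` edges back to the users; the outer SELECT filters those users by occupation and groups them by gender.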

Many more males than females in executive positions rated the movie

I can’t say the experience was all positive and smooth. The graphic console is web-based, and the session time is limited. Say you have queried the data, switched to some other window, then returned to the browser where the OrientDB console is open and typed a new query. You hit the “Run” button to execute it, and instead of getting the results you are redirected to the logon screen; your query is gone, too! The web-based interfaces I have used before show a pop-up window over the screen to notify users that the session has expired and they need to reconnect. I believe the OrientDB developers should consider implementing this feature. A code completion aid would also be a nice thing to have. It is not possible to know all class and property names by heart, and switching between the “Browse” tab, where you type the query, and the “Schema” tab, where you can check the names, (surprise!) clears the query you are typing. The “Browse” screen shows the results of the last 10 executed commands, so, I guess, you can run a few simple SELECT * queries against all the classes you are planning to use, to have a list of all their properties before your eyes while typing.

OrientDB has an ETL tool, although it doesn’t look simple to use: it is all config-based. I believe it wouldn’t be too hard to establish a regular data load into the database, either by pulling data from the source with this tool, or by pushing it from another ETL tool, for example via the REST API.
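As an illustration of the “push” option, the following Python sketch prepares (but does not send) an HTTP request that would create one document through the REST API. The host, port 2480, the `admin`/`admin` credentials, the class name `Users` and the field `gender` are placeholders and assumptions; check the endpoint and payload against the REST documentation of your OrientDB version before relying on it.

```python
import base64
import json
from urllib.request import Request


def build_document_post(base_url, database, user, password, document):
    """Prepare a POST request that would create one document via REST.

    The request is only built here; pass it to urllib.request.urlopen()
    to actually send it to a running server.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url=f"{base_url}/document/{database}",
        data=json.dumps(document).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder server, credentials and document; '@class' names the target class.
request = build_document_post(
    "http://localhost:2480",
    "MovieRatings",
    "admin",
    "admin",
    {"@class": "Users", "gender": "F"},
)
print(request.get_method(), request.full_url)
```

An external ETL tool would do the equivalent in a loop, or in batches, with real credentials and error handling added.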

The security model is very simple: OrientDB has users and roles with a bunch of permissions (e.g. update, read, delete) which can be granted or revoked on a group of objects (e.g. database, function) or on a specific object (e.g. a particular class). Row-level security is also in place.

Why Use OrientDB?

If you have a team of analysts, they will love it. A graph database is a great tool for investigative-type analysis, like fraud detection, finding ways to improve user experience, or master data management. However, as I have mentioned in my blog about SQL Server graph, switching to a new platform could be a huge change, so a multi-model database is preferable for integration with existing data sources. The document store allows you to set up whatever level of structure you find satisfactory: from no structure at all to a copy of a relational database. All the familiar technologies and a very flexible query language make the entry into OrientDB databases quite easy for people with any background.

The OrientDB Community edition is free even for commercial use. It supports High Availability replication and sharding, which allows you to have a proper production database and still pay nothing. The Enterprise (i.e. paid) edition includes all the same functionality, plus monitoring and support. Therefore, one can give it a go at minimal cost.

Kate
Data masseuse

 

Kate writes technical blogs about data warehouses, including data vault.  If you want to know more about data vault, you can read our blogs on the topic, or look into becoming data vault certified!
