The use of graphs in databases: Advantages and options
I learned about databases (DB) in a course during my graduate program, around 2011 or 2012; I would need to check my transcript to be exact. At that time the consensus on which kind of database to use was very clear. The traditional option that I learned was the relational database, queried with the Structured Query Language (SQL). There I learned how to design a good database schema. For those not familiar with the term, a database schema is nothing more than the skeleton of the DB: how many tables we will have, what they are, and how they are connected to each other through relations. I also learned how to feed the database with data and how to query it once it is populated. We mostly used PostgreSQL in the course, but there are other options such as MySQL.
I used SQL to solve my DB problems throughout my career, until the beginning of this year. In my new job, my supervisor introduced me to a different type of database structure based on graphs. For a little context, graphs are a way to represent things and how they are connected to each other. The first use of a graph is credited to the mathematician Leonhard Euler, who in the 18th century used one to solve the problem of the bridges of the city of Königsberg. He reduced the city to a drawing of vertices (each region of the city) and edges (each bridge connecting two regions). This way of representing relationships is widely used in biology, especially in molecular biology, for example to represent interactions between genes. Below you can see a gene interaction network drawn as a graph, in a figure from the first paper that I published (Wajnberg et al., 2015).
GraphQL was developed at Facebook in 2012, interestingly at the same time that I was learning about SQL databases, but it was open-sourced only in 2015, so people basically started to use it more after that. Strictly speaking, GraphQL is a query language that treats data as a graph rather than a database itself, but some graph databases adopt its syntax. What are the advantages? Let's start with the difficulties that SQL can bring. First, in the schema: to connect two or more tables we need to add extra columns to them. These are the primary keys, which are unique IDs for each row, and the foreign keys, as in the example below:
diagram created at app.diagrams.net
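To make the primary key and foreign key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (sample, food, sample_id, food_id) are assumptions loosely based on the entities in this article, not the real schema, and SQLite stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3


def build_schema(conn):
    """Create two hypothetical tables linked by a foreign key."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE sample ("
        " sample_id INTEGER PRIMARY KEY,"  # unique ID for each row
        " name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE food ("
        " food_id INTEGER PRIMARY KEY,"
        # Foreign key: must match an existing sample.sample_id.
        " sample_id INTEGER NOT NULL REFERENCES sample(sample_id),"
        " name TEXT NOT NULL)"
    )


def try_insert_food(conn, food_id, sample_id, name):
    """Return True if the row was inserted, False if the foreign key was rejected."""
    try:
        conn.execute(
            "INSERT INTO food (food_id, sample_id, name) VALUES (?, ?, ?)",
            (food_id, sample_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO sample (sample_id, name) VALUES (1, 'sample-A')")

# Sample 1 exists, so this row can be linked to it.
print(try_insert_food(conn, 10, 1, "apple"))
# Sample 99 does not exist, so the database refuses the row.
print(try_insert_food(conn, 11, 99, "bread"))
```

The point of the sketch is that the link between the two tables lives in an extra column that we have to declare and fill in correctly ourselves.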
To be connected, each table stores the primary key of the previous table as a foreign key, so we need to be cautious and remember to add these foreign keys when we create the tables. With GraphQL it is more fluid! We don't need foreign keys, and many times we don't even need to define a schema beforehand. We just use JSON files in which nesting expresses the relationships, like the example below, which uses the same entities as the SQL example:
JSON format of the same schema from the SQL example, real data removed for privacy protection
graph representation of the data, with names erased
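Since the real data was removed from the figures, here is a small sketch of what such a nested document could look like, written in Python. The field names (name, foods) and the values are made up for illustration; the only idea taken from the text is that nesting replaces the foreign key.

```python
import json

# Hypothetical document: the foods are nested inside the sample they
# belong to, so no sample_id column is needed to link them.
sample = {
    "name": "sample-A",
    "foods": [
        {"name": "apple"},
        {"name": "bread"},
    ],
}


def to_json(doc):
    """Serialize the document with indentation, as it would appear in a JSON file."""
    return json.dumps(doc, indent=2)


def edges(doc):
    """List the (parent, child) pairs implied by the nesting."""
    return [(doc["name"], food["name"]) for food in doc.get("foods", [])]


print(to_json(sample))
print(edges(sample))
```

Each pair returned by edges corresponds to one edge in the graph drawing: the sample is one vertex, each food is another, and the nesting is the connection between them.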
In addition to the schema design, inserting data is also simpler. In SQL we always need to keep in mind which Sample_id and which food_id will link to the next table, while with graphs you don't need to worry about that, so there are fewer headaches. The same thing happens with queries: in SQL we need to INNER JOIN tables, while with graphs the query is simpler, because we just mention the names of the vertices we want to see in the result.
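The difference in querying can be sketched side by side in Python. This is only an illustration under assumptions: the tables, columns, and values are hypothetical, SQLite stands in for the SQL database, and a plain nested Python structure stands in for the graph side; it is not the API of Dgraph or any other graph database.

```python
import sqlite3

# SQL side: IDs must be chosen and repeated by hand to link the rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE food (
        food_id INTEGER PRIMARY KEY,
        sample_id INTEGER REFERENCES sample(sample_id),
        name TEXT
    );
    INSERT INTO sample VALUES (1, 'sample-A');
    INSERT INTO food VALUES (10, 1, 'apple'), (11, 1, 'bread');
    """
)


def foods_sql(conn, sample_name):
    """Find the foods of a sample by joining the two tables on the key."""
    rows = conn.execute(
        "SELECT food.name FROM sample "
        "INNER JOIN food ON food.sample_id = sample.sample_id "
        "WHERE sample.name = ? ORDER BY food.food_id",
        (sample_name,),
    ).fetchall()
    return [row[0] for row in rows]


# Graph-like side: the relationship is the nesting itself.
graph = [
    {"name": "sample-A", "foods": [{"name": "apple"}, {"name": "bread"}]},
]


def foods_nested(graph, sample_name):
    """Find the foods of a sample by walking from the sample to its children."""
    return [
        food["name"]
        for sample in graph
        if sample["name"] == sample_name
        for food in sample["foods"]
    ]


print(foods_sql(conn, "sample-A"))
print(foods_nested(graph, "sample-A"))
```

Both functions return the same foods, but only the SQL version has to know which column links the tables.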
Graph database options
print screen from dgraph webpage
Dgraph uses GraphQL syntax and also its own syntax called DQL, which has more shortcuts. However, for some reason DQL didn't become as popular, so most people who use Dgraph nowadays only use the GraphQL syntax. You can read more about it here. They have a paid cloud instance, like any other cloud service, but Dgraph is free to run on your own computer or server. There are clients for Python and Node.js, of course.
print screen of my azure portal login
Microsoft, of course, has also invested in graph databases and made one available in the Azure portal, using the Gremlin syntax. It is an interesting option for building your database with graphs. Gremlin syntax is a bit different because it doesn't use the JSON format; instead, data insertion and queries are written as something similar to command-line instructions.
There are other options, such as Neo4j, or implementing GraphQL syntax in your own code, for example by following this amazing tutorial by The Net Ninja. I recommend all of his tutorials; I learned to build React apps with them. Below you can find the introduction to a free course in which he shows how to integrate GraphQL with a React app.
Graphs are an interesting way to implement your database. Big companies with big data are using them, so they are worth a look and a chance. I hope this article opens the eyes of people willing to learn more about the topic. As I once read, devs should be in continuing education, updating themselves all the time.