Learning Cypher : Summary

Cypher is a query language for the Neo4j graph database. The basic model in Neo4j can be described as follows:

Each node can have a number of relationships with other nodes
Each relationship goes from one node either to another node or to the same node
Both nodes and relationships can have properties, and each property has a name and a value
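The following minimal Python sketch makes this model concrete. The class names Node, Relationship and Graph are hypothetical illustrations and not part of Neo4j's API. The comment above the usage lines shows roughly the Cypher CREATE statement that would build the same data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    id: int
    type: str
    start: int  # id of the node the relationship goes from
    end: int    # id of the node it goes to (may be the same node)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_node(self, *labels: str, **props: Any) -> Node:
        node = Node(len(self.nodes), set(labels), dict(props))
        self.nodes[node.id] = node
        return node

    def relate(self, a: Node, rel_type: str, b: Node, **props: Any) -> Relationship:
        rel = Relationship(len(self.relationships), rel_type, a.id, b.id, dict(props))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node: Node) -> List[Relationship]:
        return [r for r in self.relationships if r.start == node.id]


# Roughly: CREATE (a:User {name: 'Alice'})-[:FOLLOWS {since: 2014}]->(b:User {name: 'Bob'})
g = Graph()
alice = g.add_node("User", name="Alice")
bob = g.add_node("User", name="Bob")
g.relate(alice, "FOLLOWS", bob, since=2014)
# A relationship may start and end at the same node
g.relate(alice, "LIKES", alice)
```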

Cypher was first introduced in 2011, and since then the popularity of graph databases as a category has taken off. The following visual shows the pivotal moment:

[Figure: graph database popularity trend around Cypher's introduction]

Building on Cypher's popularity, Neo4j open-sourced the language as openCypher in October 2015. According to Neo4j's founders, the rationale was to establish a common query syntax that all graph databases could follow. Cypher provides a declarative syntax that is both readable and powerful, and it can express a rich set of patterns to match in a graph.

Via Neo4j’s blog:

Cypher is the closest thing to drawing on a white board with a keyboard. Graph databases are whiteboard friendly; Cypher makes them keyboard friendly.

Given that Cypher is now open source and has the potential to become the de facto standard in the graph database segment, anyone working with graph data should be familiar with its syntax. The syntax resembles SQL and has some Pythonic elements in how queries are formulated, so it can be picked up easily by reading a few articles. Do you really need a book for it? Not necessarily. Having said that, this book reads like a long tutorial and is not dense. It might be worth one’s time to read it for a nice tour of the various aspects of Cypher.

Chapter 1 : Querying Neo4j effectively with Pattern Matching

Querying a graph database through an API is usually very tedious. I experienced this first-hand while working on a graph database whose ONLY interface for obtaining graph data was an API. SPARQL is a relief in such situations, but it has a learning curve. I would not call it steep, but the syntax is a little different. Writing effective SPARQL queries requires getting used to thinking in triples, i.e. in subject-predicate-object terms. Cypher, on the other hand, is a declarative query language: it focuses on what the result should be rather than on how to obtain it. It is also human-readable and expressive.

The first part of the chapter gives instructions for setting up a new Neo4j instance. Neo4j can run in two ways: as a standalone server that clients call through an API, or as an embedded component inside an application. For learning purposes, a standalone server is the most convenient option, since it comes with a ready-made console for testing sample queries. The second part of the chapter introduces a few key elements of Cypher, such as:

MATCH
RETURN
() for nodes
[] for relationships
-> and <- for direction
-- for matching relationships regardless of direction
Filtering matches by specifying node labels and properties
Filtering relationships by specifying relationship types and properties
OPTIONAL MATCH to match optional paths
Assigning an entire path to a variable
Passing parameters to Cypher queries
Using built-in functions such as allShortestPaths
Matching paths that connect nodes via a variable number of hops
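Variable-length matching is the least obvious item in the list above. The following pure-Python sketch emulates what a query such as MATCH (a:User {name: $name})-[:FOLLOWS*1..3]->(b) RETURN DISTINCT b.name computes. It finds every node reachable in one to three hops along outgoing FOLLOWS relationships, and a single path never reuses the same relationship. The function var_length_targets and the tuple edge format are assumptions made for illustration. The $name parameter syntax is that of recent Neo4j releases; the 2.x releases current when the book was written used {name}.

```python
from typing import List, Set, Tuple

# (relationship id, start node, relationship type, end node)
Edge = Tuple[int, str, str, str]


def var_length_targets(edges: List[Edge], start: str, rel_type: str,
                       min_hops: int, max_hops: int) -> Set[str]:
    """Emulates MATCH (a)-[:TYPE*min..max]->(b) RETURN DISTINCT b for a fixed a."""
    results: Set[str] = set()

    def walk(node: str, depth: int, used: Set[int]) -> None:
        if depth >= min_hops:
            results.add(node)
        if depth == max_hops:
            return  # the upper bound keeps the search from exploring the whole graph
        for rel_id, src, typ, dst in edges:
            # Follow only outgoing relationships of the requested type, and never
            # traverse the same relationship twice within one path
            if src == node and typ == rel_type and rel_id not in used:
                walk(dst, depth + 1, used | {rel_id})

    walk(start, 0, set())
    return results


edges: List[Edge] = [
    (0, "alice", "FOLLOWS", "bob"),
    (1, "bob", "FOLLOWS", "carol"),
    (2, "carol", "FOLLOWS", "dave"),
    (3, "dave", "FOLLOWS", "erin"),
    (4, "carol", "FOLLOWS", "alice"),  # a cycle back to the start node
]
reachable = var_length_targets(edges, "alice", "FOLLOWS", 1, 3)
```

Here reachable is {"bob", "carol", "dave", "alice"}. erin is excluded because it is four hops away. alice reappears via the three-hop cycle, because Cypher forbids repeating a relationship within a path but does not forbid revisiting a node.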

Chapter 2 : Filter, Aggregate and Combine Results

This chapter introduces several Cypher clauses and functions for filtering results and extracting summary statistics about the nodes and relationships in a graph. The Cypher keywords explained in this chapter are:

WHERE for text and value comparisons
IN to filter based on certain values
“item identifier IN collection WHERE rule” pattern for working with collections, which is similar to a list comprehension in Python
LIMIT and SKIP for pagination. The examples do not use ORDER BY, which is crucial for obtaining consistent paginated results
ORDER BY for sorting results
COALESCE function to work around null values
COUNT(*) and COUNT(expression), with the subtle difference between the two highlighted: COUNT(*) counts rows, while COUNT(expression) skips null values
Aggregation functions such as MIN, MAX and AVG
COLLECT to gather the values of a property across the matched paths into a collection
CASE WHEN THEN ELSE END for conditional expressions
WITH to separate query parts
UNION and UNION ALL to combine the results of several queries
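Several of these keywords have null-handling or ordering semantics that are easy to get wrong. The sketch below mimics them in plain Python over a hypothetical list of :Book rows, with the roughly corresponding Cypher in the comments. It is an emulation for intuition only and does not reflect how Neo4j evaluates queries.

```python
from typing import Any, Dict, List

# Hypothetical :Book nodes as property maps; one rating is missing (null)
books: List[Dict[str, Any]] = [
    {"title": "Dune", "year": 1965, "rating": 4.5},
    {"title": "Neuromancer", "year": 1984, "rating": None},
    {"title": "Foundation", "year": 1951, "rating": 4.2},
    {"title": "Hyperion", "year": 1989, "rating": 4.3},
]


def coalesce(*values: Any) -> Any:
    """Like Cypher's coalesce(): the first non-null argument, else null."""
    return next((v for v in values if v is not None), None)


# count(*) counts every row; count(b.rating) skips rows where rating is null
count_star = len(books)
count_rating = sum(1 for b in books if b["rating"] is not None)

# avg(b.rating) also ignores nulls rather than treating them as 0
rated = [b["rating"] for b in books if b["rating"] is not None]
avg_rating = sum(rated) / len(rated)

# ORDER BY b.year SKIP 2 LIMIT 2: sort first, otherwise the page contents are not stable
by_year = sorted(books, key=lambda b: b["year"])
second_page = [b["title"] for b in by_year[2:4]]

# [b IN books WHERE b.year > 1960 | b.title] mirrors a Python list comprehension
after_1960 = [b["title"] for b in books if b["year"] > 1960]

# any(b IN books WHERE b.year < 1955) uses the same "x IN collection WHERE rule" form
has_pre_1955 = any(b["year"] < 1955 for b in books)

# CASE WHEN b.year < 1970 THEN 'classic' ELSE 'modern' END
eras = ["classic" if b["year"] < 1970 else "modern" for b in books]

# coalesce(b.rating, 0.0) substitutes a default for the missing rating
ratings_or_zero = [coalesce(b["rating"], 0.0) for b in books]
```

Note that count_star is 4 while count_rating is 3. That is the COUNT(*) versus COUNT(expression) distinction the chapter highlights.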

Chapter 3 : Manipulating the Database

This chapter covers create, update and delete operations on nodes and relationships. The Cypher keywords explained in the chapter are:

CREATE used to create nodes, relationships and paths
CREATE UNIQUE to create only the parts of a pattern that do not already exist
SET for changing properties and labels
MERGE to check for an existing pattern and create the pattern if it does not exist in the database
ON CREATE SET and ON MATCH SET for setting properties during MERGE operations
REMOVE for removing properties and labels
DELETE
FOREACH to loop through a collection, such as the nodes in a path
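MERGE behaves like a get-or-create, and the ON CREATE SET and ON MATCH SET variants decide which property update runs. The sketch below assumes a hypothetical :User node keyed by email with a visit counter. The docstring gives roughly the corresponding Cypher, and the Python emulates its effect on an in-memory list of nodes. merge_user is an illustrative helper, not a Neo4j API.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def merge_user(store: List[Node], email: str) -> Node:
    """Emulates:
    MERGE (u:User {email: $email})
    ON CREATE SET u.visits = 1
    ON MATCH SET u.visits = u.visits + 1
    """
    for node in store:
        if "User" in node["labels"] and node["props"].get("email") == email:
            node["props"]["visits"] += 1  # ON MATCH SET: the pattern already exists
            return node
    # ON CREATE SET: the pattern was not found, so create it with initial properties
    node = {"labels": {"User"}, "props": {"email": email, "visits": 1}}
    store.append(node)
    return node


users: List[Node] = []
for email in ["ann@example.com", "ann@example.com", "raj@example.com"]:
    merge_user(users, email)
```

Unlike CREATE, running the same MERGE twice does not duplicate the node. This is what makes MERGE suitable for repeatable imports.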

By the end of this chapter, a reader should be fairly comfortable executing CRUD queries. A Cypher query comprises three phases:

READ : This is the phase where you read data from the graph using MATCH and OPTIONAL MATCH clauses
WRITE : This is the phase where you modify the graph using CREATE, MERGE, SET, REMOVE, DELETE and similar clauses
RETURN : This is the phase where you choose what to return to the caller

Improving Performance

This chapter gives the following guidelines for writing efficient Cypher queries:

Use parameterized queries: wherever possible, write queries with parameters so that the engine can reuse the query's execution plan. This takes advantage of the fact that Neo4j caches execution plans
Avoid unnecessary clauses, such as DISTINCT, when your knowledge of the graph data shows they are not needed
Specify relationship direction in MATCH clauses wherever possible
Set a specific maximum depth when searching for variable-length paths
Profile queries to find inefficient constructions before they overload the server
Whenever there is a large number of nodes with a certain label, it is better to create an index. In fact, when importing a large RDF dataset, it is always better to create indexes on certain types of nodes.
Use uniqueness constraints if you are worried about duplicate property values
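The first guideline is easiest to see with a toy model. The sketch below assumes, as a simplification, that the engine caches one execution plan per distinct query string. The PlanCache class is hypothetical and only counts how often planning would happen. Inlining literal values produces a new query string for every value, while a parameterized query keeps the text constant.

```python
from typing import Any, Dict, Optional


class PlanCache:
    """Toy model: one cached plan per distinct query string (a simplification)."""

    def __init__(self) -> None:
        self.plans: Dict[str, str] = {}
        self.compilations = 0

    def run(self, query: str, params: Optional[Dict[str, Any]] = None) -> str:
        if query not in self.plans:
            self.compilations += 1  # planning is the expensive step we want to skip
            self.plans[query] = f"plan-{self.compilations}"
        return self.plans[query]  # params would be bound at execution time


names = ["Alice", "Bob", "Carol"]

# Literal values baked into the text: every query string is different
inlined = PlanCache()
for name in names:
    inlined.run(f"MATCH (u:User {{name: '{name}'}}) RETURN u")

# One parameterized query string reused with different values
parameterized = PlanCache()
for name in names:
    parameterized.run("MATCH (u:User {name: $name}) RETURN u", {"name": name})
```

A side benefit is that parameters keep user-supplied values out of the query text, which also guards against query injection.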

Chapter 4 : Migrating from SQL

The chapter covers the tasks involved in migrating data from an RDBMS to a graph database. There are three main tasks in migrating from SQL to a graph database:

Migrating the schema from RDBMS to Neo4j
Migrating the data from tables to Neo4j
Migrating queries to let your application continue working

It is best to start with an ER diagram that is close to the whiteboard representation of the data. A graph database represents the whiteboard model far more closely than a table structure with its primary keys, foreign keys and cardinalities. As a result, one can quickly work out the nodes and relationships the graph needs. To migrate the actual data, one exports it into the relevant CSV files and loads those files into Neo4j. The structure of the CSV files depends on the labels, nodes and relationships in the graph database schema. Migrating queries from the RDBMS world to the graph database world is far easier, since Cypher, like SQL, is declarative. The various business-requirement queries can be coded much more quickly in Cypher.
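As a rough illustration of the data-migration step, the sketch below uses only Python's csv module. It turns two hypothetical relational tables, departments and employees linked by a dept_id foreign key, into one CSV file per node type. It also writes one more file for the relationship that the foreign key implies. The LOAD CSV statement in the string sketches how the relationship file might then be loaded. The file name works_in.csv and the :Employee/:Department labels are assumptions. toInteger() is the function name in recent Neo4j versions; older 2.x releases used toInt().

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical relational tables joined by the dept_id foreign key
departments = [{"id": 1, "name": "Research"}, {"id": 2, "name": "Sales"}]
employees = [
    {"id": 10, "name": "Ann", "dept_id": 1},
    {"id": 11, "name": "Raj", "dept_id": 2},
    {"id": 12, "name": "Li", "dept_id": 1},
]


def to_csv(rows: List[Dict[str, Any]], fieldnames: List[str]) -> str:
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Each table becomes a node file; the foreign key becomes a relationship file
dept_nodes = to_csv(departments, ["id", "name"])
emp_nodes = to_csv([{"id": e["id"], "name": e["name"]} for e in employees], ["id", "name"])
works_in = to_csv(
    [{"employee_id": e["id"], "department_id": e["dept_id"]} for e in employees],
    ["employee_id", "department_id"],
)

# Sketch of loading the relationship file once both node files are loaded
LOAD_WORKS_IN = """
LOAD CSV WITH HEADERS FROM 'file:///works_in.csv' AS row
MATCH (e:Employee {id: toInteger(row.employee_id)})
MATCH (d:Department {id: toInteger(row.department_id)})
CREATE (e)-[:WORKS_IN]->(d)
"""
```

The foreign-key column disappears from the employee node file. In the graph it is represented only by the WORKS_IN relationship.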

Chapter 5 : Operators and Functions

The last section of the book contains a laundry list of operators and functions that can be used in a Cypher query. It works like a cheat sheet, but with more elaborate explanations of the various Cypher keywords.

Takeaway

This book gives a quick introduction to all the keywords needed to construct Cypher queries. In fact, it is fair to describe the book as a long tutorial whose sections and subsections can quickly bring a Cypher novice up to speed.