Graph Database Technologies: Neo4j and Cypher Query Language

I. Neo4j

Neo4j is an open-source graph database implemented in Java. Its data are stored as graphs (nodes and relationships) rather than in tables, which makes graph-shaped relationships more efficient to visualize and analyze. Neo4j was first created in Sweden in 2007 and is now widely used by many companies (see http://neo4j.com/customers/ for a list).

[Image: Neo Technology (picture from: http://www.slideshare.net/emileifrem/an-intro-to-neo4j-and-some-use-cases-jfokus-2011)]

Graphs are useful in many cases:

  • modeling a social network with nodes and relationships, with properties on both.
  • calculating the shortest path between two nodes. For example, if websites are nodes and the links between them are edges, the calculation tells us the minimum number of links needed to get from one website to another.
  • finding nodes with similar activity patterns. For example, an online dating website can use this to find people with the same in-degree and out-degree for certain activities.
  • matching complicated relationships. For example, finding all YouTube users with a specific registration date who were active on a certain video channel with more than 10M subscribers.
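The "similar activity patterns" case in the list above comes down to comparing degrees. Below is a minimal Python sketch, not Neo4j code: the `messages` edge list and the `degree_signature_groups` helper are hypothetical, and each person's activity is summarized as an (in-degree, out-degree) pair so that people with the same pair land in the same group.

```python
from collections import Counter, defaultdict

def degree_signature_groups(edges):
    """Group nodes that share the same (in-degree, out-degree) pair."""
    in_deg = Counter(dst for _, dst in edges)   # how many edges point at each node
    out_deg = Counter(src for src, _ in edges)  # how many edges leave each node
    nodes = set(in_deg) | set(out_deg)
    groups = defaultdict(list)
    for node in sorted(nodes):
        # Counter returns 0 for missing keys, so pure senders get in-degree 0
        groups[(in_deg[node], out_deg[node])].append(node)
    return dict(groups)

# Hypothetical "sent a message to" edges on a dating site
messages = [
    ("ann", "bob"), ("bob", "ann"),
    ("cat", "dan"), ("dan", "cat"),
    ("eve", "ann"),
]

groups = degree_signature_groups(messages)
print(groups)
```

Here bob, cat and dan each send and receive one message, so they share the (1, 1) group, while ann (receives two) and eve (only sends) end up in groups of their own.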

[Image: matching a complicated relationship (picture from the Neo4j 8/14/2014 webinar)]

For more use cases, see http://neo4j.com/use-cases/

Advantages of Neo4j:

In a traditional SQL database, the queries above can take a lot of code and a lot of memory to compute, because each hop through a relationship typically needs another join. In Neo4j with the Cypher language, the same queries can be written with minimal code and run with optimized performance.
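To make the comparison concrete, here is a minimal Python sketch of a friends-of-friends query (an illustration, not a benchmark, and not how either kind of database is actually implemented). The relational version mimics a SQL self-join by scanning a hypothetical `friendships` table once per hop; the graph version builds adjacency sets once and then only touches the neighbors it needs.

```python
from collections import defaultdict

# Hypothetical friendship data, stored like rows of a two-column SQL table
friendships = [
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("alice", "erin"), ("erin", "frank"),
]

def friends_of_friends_join(rows, person):
    """Relational style: every hop scans the whole table, like a self-join."""
    # Friendship is symmetric, so consider each row in both directions.
    both_ways = rows + [(b, a) for a, b in rows]
    first_hop = {b for a, b in both_ways if a == person}
    second_hop = {c for b, c in both_ways if b in first_hop}  # second full scan
    return second_hop - first_hop - {person}

def build_adjacency(rows):
    """Graph style, done once up front: each person points to their friends."""
    adjacency = defaultdict(set)
    for a, b in rows:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def friends_of_friends_graph(adjacency, person):
    """Graph style: only the neighbors of visited people are touched."""
    first_hop = adjacency[person]
    second_hop = {c for b in first_hop for c in adjacency[b]}
    return second_hop - first_hop - {person}

adjacency = build_adjacency(friendships)
print(friends_of_friends_join(friendships, "alice"))
print(friends_of_friends_graph(adjacency, "alice"))
```

Both functions return the set {"carol", "frank"} for "alice". The difference is the amount of data each one touches per hop: the join rescans every row, while the adjacency walk grows with the number of neighbors only.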

If you know a little about computer systems, you'll also see why Neo4j is a system-friendly, well-optimized product:

In Neo4j, nodes and relationships are stored in separate files. Each node is stored as a fixed-length record with a pointer to its first property and its first relationship. Each relationship is stored as a fixed-length record with pointers to the previous and next relationships of its nodes. Because every record has the same length, a record's position in the file can be computed directly from its ID (offset = ID × record size), which allows fast look-ups. (from Stanford CS145 summer class)
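The fixed-length record idea can be sketched with Python's `struct` module. The record layout below is a simplified, hypothetical one (an in-use flag plus two 4-byte pointers), not Neo4j's real on-disk format, which has more fields. The point it shows is that reading node N needs no search: its bytes start at `N * RECORD_SIZE`.

```python
import struct

# Hypothetical, simplified node record (NOT Neo4j's real on-disk layout):
# in-use flag (1 byte), first relationship ID (4 bytes), first property ID (4 bytes).
# "<" means little-endian with no padding, so every record is exactly 9 bytes.
NODE_RECORD = struct.Struct("<?ii")
RECORD_SIZE = NODE_RECORD.size

def write_node(store, node_id, first_rel, first_prop):
    """Write a node record at its fixed slot: offset = node_id * RECORD_SIZE."""
    offset = node_id * RECORD_SIZE
    needed = offset + RECORD_SIZE
    if len(store) < needed:
        # Grow the store; skipped slots stay zeroed, i.e. "not in use".
        store.extend(bytes(needed - len(store)))
    NODE_RECORD.pack_into(store, offset, True, first_rel, first_prop)

def read_node(store, node_id):
    """Jump straight to the record; no search or index lookup is needed."""
    return NODE_RECORD.unpack_from(store, node_id * RECORD_SIZE)

store = bytearray()
write_node(store, 0, first_rel=5, first_prop=-1)  # -1 = no properties
write_node(store, 2, first_rel=7, first_prop=3)
print(read_node(store, 2))  # (True, 7, 3)
```

Slot 1 was never written, so reading it returns `(False, 0, 0)`, which the in-use flag marks as an empty record. Real stores handle deletion and reuse of IDs, which this sketch leaves out.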

[Image: properties, as shown in the Neo4j server web interface]

II. Cypher

So what is Cypher? 

Cypher is a declarative graph query language for Neo4j. It’s a SQL-like language, but allows for more expressive and efficient querying and updating of the graph store.

The syntax is very intuitive:

 MATCH (actor:Person { name: 'Charlie Sheen' })-[:ACTED_IN]->(movie:Movie)
   RETURN movie

The query above finds all the movies in which a person (a node with the Person label) named “Charlie Sheen” acted.
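The pattern in the query above (a Person named Charlie Sheen connected by ACTED_IN to a Movie) can be spelled out over a tiny in-memory property graph in Python. The node IDs, the sample data and the `match_acted_in` helper are hypothetical; the sketch only shows what the pattern means, not how Neo4j executes it.

```python
# Hypothetical in-memory property graph: nodes carry labels and properties,
# relationships are (start_id, type, end_id) triples.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Charlie Sheen"}},
    2: {"labels": {"Person"}, "props": {"name": "Martin Sheen"}},
    3: {"labels": {"Movie"}, "props": {"title": "Wall Street"}},
    4: {"labels": {"Movie"}, "props": {"title": "Apocalypse Now"}},
}
relationships = [
    (1, "ACTED_IN", 3),
    (2, "ACTED_IN", 3),
    (2, "ACTED_IN", 4),
]

def match_acted_in(name):
    """Rough equivalent of
    MATCH (actor:Person {name: $name})-[:ACTED_IN]->(movie:Movie) RETURN movie
    """
    # Step 1: find start nodes with the Person label and the given name.
    actors = {
        node_id for node_id, node in nodes.items()
        if "Person" in node["labels"] and node["props"].get("name") == name
    }
    # Step 2: follow outgoing ACTED_IN relationships and keep Movie endpoints.
    return [
        nodes[end] for start, rel_type, end in relationships
        if start in actors and rel_type == "ACTED_IN" and "Movie" in nodes[end]["labels"]
    ]

titles = [movie["props"]["title"] for movie in match_acted_in("Charlie Sheen")]
print(titles)  # ['Wall Street']
```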

The example below finds every pair of distinct nodes connected by a path of no more than three relationships, and returns the length of the shortest path between them:

 MATCH p = shortestPath((a)-[*..3]-(b))
   WHERE a <> b
   RETURN a.name, b.name, length(p)
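The query above can be mimicked with breadth-first search, which finds shortest paths in an unweighted graph. In this hypothetical Python sketch, the graph is a small undirected adjacency dict; for every ordered pair of distinct nodes it records the shortest path length when that length is at most `max_hops=3`, mirroring `[*..3]` and `a <> b`. Like the undirected Cypher pattern, it reports both (a, b) and (b, a).

```python
from collections import deque

def bounded_shortest_paths(adjacency, max_hops=3):
    """Return {(a, b): hops} for distinct a, b joined within max_hops."""
    results = {}
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in adjacency[node]:
                if neighbor not in dist:  # first visit in BFS = shortest distance
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        for end, hops in dist.items():
            if end != start:  # mirrors WHERE a <> b
                results[(start, end)] = hops
    return results

# Hypothetical undirected "knows" graph: a chain ann - bob - cat - dan - eve
knows = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan"],
}

paths = bounded_shortest_paths(knows)
print(paths[("ann", "dan")])  # 3
```

Because ann and eve are four hops apart, the pair ("ann", "eve") is absent from the result, just as the `[*..3]` bound would exclude it.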

Does this remind you of LinkedIn's 1st-, 2nd- and 3rd-degree connections?

For more of Cypher's powerful functions, see the reference card: bit.ly/cypher-refcard

There are also advanced tracks where people do more powerful and creative things. For example, at a Graph Database meetup in Chicago, the presenter showed how to connect to one of the social networks (Facebook, Twitter, LinkedIn) and import profiles and relationships into your graph.


I'm looking forward to more exciting developments in graph databases.
