Gremlin vs Cypher Initial Thoughts @Neo4j

Hi,

The Neo4jClient now supports Cypher as a query language with Neo4j. However, I noticed the following:

  • Simple graph traversals are much more efficient when using Gremlin
  • Queries in Gremlin are 30-50% faster for simple traversals
  • Cypher is ideal for complex traversals where backtracking is required
  • Cypher is our choice of query language for reporting
  • Gremlin is our choice of query language for simple traversals where projections are not required
  • Cypher has an intrinsic table projection model, whereas Gremlin's table projection model relies on As steps, which can be cumbersome when backtracking (e.g. Back(), As() and CopySplit); in Cypher it is just comma-separated matches
  • Cypher is much better suited to outer joins than Gremlin: achieving similar results in Gremlin requires parallel querying with CopySplit, whereas Cypher only needs a Match clause with optional relationships
  • Gremlin is ideal when you need to retrieve very simple data structures
  • Table projection in Gremlin can be very powerful; however, outer joins can be very verbose

So, in a nutshell, we like to use Cypher when we need tabular data back from Neo4j; it is especially useful for outer joins.

Here are two queries that return the exact same data from Neo4j, one in Cypher and one in Gremlin.

Cypher Report Query

var resultSet = graphClient.RootNode
                .StartCypher("root")
                .Match(@"root-[:HOSTS]->(agency)
                       <-[:USER_BELONGS_TO]-(user)-[:USER_LINKED_TO_PROGRAM]
                       ->(program)
                       <-[:HAS_PROGRAM]-(centre),
                       (program)<-[:HAS_SUGGESTED_PROGRAM]-(referralDecisionsSection)
                       <-[:REFERRAL_HAS_DECISIONS_SECTION]-(referral)-[:CREATED_BY]
                       ->(createdByUser), (referral)-[:REFERRAL_HAS_WHO_SECTION]
                       ->(whoSection)-[:HAS_PARTICIPANT]->(participant)")
                .Where<Agency>(agency => agency.Key == userIdentifier.AgencyKey)
                .And()
                .Where<User>(user => user.Username == userIdentifier.Username)
                .And()
                .Where<Referral>(referral => referral.Completed == false)
                .Return((user, program, centre, createdByUser, referral, whoSection, participant) => 
                new ReferralByGroup
                {
                    UserFamilyName = createdByUser.As<User>().FamilyName,
                    UserGivenName = createdByUser.As<User>().GivenName,
                    Program = program.As<Program>().Name,
                    Centre = centre.As<Centre>().Name,
                    ReferralId = referral.As<Referral>().UniqueId,
                    ReferralDate = whoSection.As<ReferralWhoSection>().ReferralDate,
                    ParticipantName = participant.As<ReferralParticipant>().Name,
                    ParticipantDisplayOrder = participant.As<ReferralParticipant>().DisplayOrder,
                })
                .Results
                .ToArray();

Gremlin Report Query using Table Projections

            var resultSet = graphClient
                .RootNode
                .Out<Agency>(Hosts.TypeKey, a => a.Key == userIdentifier.AgencyKey)
                .In<User>(UserBelongsTo.TypeKey, u => u.Username == userIdentifier.Username)
                .Out<Program>(UserLinkedToProgram.TypeKey)
                .As("Program")
                .In<Centre>(HasProgram.TypeKey)
                .As("Centre")
                .BackV<Program>("Program")
                .In<ReferralDecisionsSection>(HasSuggestedProgram.TypeKey)
                .In<Referral>(ReferralHasDecisionsSection.TypeKey, r => r.Completed == false)
                .As("ReferralId")
                .Out<User>(CreatedBy.TypeKey)
                .As("UserGivenName")
                .As("UserFamilyName")
                .BackV<Referral>("ReferralId")
                .Out<ReferralWhoSection>(ReferralHasWhoSection.TypeKey)
                .As("ReferralDate")
                .Out<ReferralParticipant>(HasParticipant.TypeKey)
                .As("ParticipantDisplayOrder")
                .As("ParticipantName")
                .Table
                <ReferralByGroup, Program, Centre, Referral, User, User, ReferralWhoSection, ReferralParticipant,
                    ReferralParticipant>(
                        program => program.Name,
                        centre => centre.Name,
                        referral => referral.UniqueId,
                        user => user.FamilyName,
                        user => user.GivenName,
                        who => who.ReferralDate,
                        participant => participant.Name,
                        participant => participant.DisplayOrder
                )
                .ToArray();

Below are the parameterised scripts that the Neo4jClient sends for Cypher and Gremlin respectively, for those not familiar with the client.

Cypher

START root=node({p8})
MATCH root-[:HOSTS]->(agency)
                       <-[:USER_BELONGS_TO]-(user)-[:USER_LINKED_TO_PROGRAM]->(program)
                       <-[:HAS_PROGRAM]-(centre),
                       (program)<-[:HAS_SUGGESTED_PROGRAM]-(referralDecisionsSection)
                       <-[:REFERRAL_HAS_DECISIONS_SECTION]-(referral)-[:CREATED_BY]
                       ->(createdByUser), (referral)-[:REFERRAL_HAS_WHO_SECTION]
                       ->(whoSection)-[:HAS_PARTICIPANT]
                       ->(participant)
WHERE (agency.Key? = {p0}) AND (user.Username? = {p1}) AND (referral.Completed? = {p2})
RETURN createdByUser.FamilyName? AS UserFamilyName, createdByUser.GivenName? AS UserGivenName, program.Name? AS Program, centre.Name? AS Centre, referral.UniqueId? AS ReferralId, whoSection.ReferralDate? AS ReferralDate, participant.Name? AS ParticipantName, participant.DisplayOrder? AS ParticipantDisplayOrder

Gremlin

g.v(p0)
.out(p1).filter{ it[p2].equalsIgnoreCase(p3) }
.in(p4).filter{ it[p5].equalsIgnoreCase(p6) }
.out(p7).as(p8).in(p9).as(p10).back(p11)
.in(p12).in(p13).filter{ it[p14] == p15 }.as(p16)
.out(p17).as(p18).as(p19).back(p20)
.out(p21).as(p22).out(p23).as(p24).as(p25)
.table(new Table()){it[p26]}{it[p27]}{it[p28]}{it[p29]}{it[p30]}{it[p31]}{it[p32]}{it[p33]}
.cap

I have included the non-parameterised Cypher and Gremlin queries below, respectively.

Cypher

START root=node(0)
MATCH root-[:HOSTS]->(agency)<-[:USER_BELONGS_TO]-(user)-[:USER_LINKED_TO_PROGRAM]
->(program)
<-[:HAS_PROGRAM]-(centre),(program)
<-[:HAS_SUGGESTED_PROGRAM]-(referralDecisionsSection)
<-[:REFERRAL_HAS_DECISIONS_SECTION]-(referral)-[:CREATED_BY]->(createdByUser), (referral)-[:REFERRAL_HAS_WHO_SECTION]
->(whoSection)-[:HAS_PARTICIPANT]
->(participant)
WHERE (agency.Key? = "romikoagency") AND (user.Username? = "romiko.derbynew") AND (referral.Completed? = false)
RETURN createdByUser.FamilyName? AS UserFamilyName, createdByUser.GivenName? AS UserGivenName, program.Name? AS Program, centre.Name? AS Centre, referral.UniqueId? AS ReferralId, whoSection.ReferralDate? AS ReferralDate, participant.Name? AS ParticipantName, participant.DisplayOrder? AS ParticipantDisplayOrder

Gremlin

g.v('0').out('HOSTS').filter{ it['Key'].equalsIgnoreCase('romikoagency') }
.in('USER_BELONGS_TO').filter{ it['Username'].equalsIgnoreCase('romiko.derbynew') }
.out('USER_LINKED_TO_PROGRAM').as('Program')
.in('HAS_PROGRAM').as('Centre').back('Program')
.in('HAS_SUGGESTED_PROGRAM')
.in('REFERRAL_HAS_DECISIONS_SECTION').filter{ it['Completed'] == false }.as('ReferralId')
.out('CREATED_BY').as('UserGivenName').as('UserFamilyName').back('ReferralId')
.out('REFERRAL_HAS_WHO_SECTION').as('ReferralDate')
.out('HAS_PARTICIPANT').as('ParticipantDisplayOrder').as('ParticipantName')
.table(new Table()){it['Name']}{it['Name']}{it['UniqueId']}{it['FamilyName']}{it['GivenName']}{it['ReferralDate']}{it['Name']}{it['DisplayOrder']}.cap

#Neo4j Neo4jClient Cypher support added

Cypher support has now been added to the Neo4jClient.

The Neo4jClient also has a built-in custom Cypher REST result deserializer, courtesy of Tatham Oddie, so you do not need to worry about the serialization/deserialization logic.

Below are some sample queries to get you going.

You can look at the Source Code for other code samples in the Test project.

Simple Query from API

            return graphClient.RootNode
                .StartCypher("root")
                .Match("root-[:BELONGS]->(user)")
                .Return<SimpleResultDto>("user")
                .Results
                .OrderBy(u => u.Username);

Simple Query

        [Test]
        public void WhereBooleanOperations()
        {
            // http://docs.neo4j.org/chunked/1.6/query-where.html#where-boolean-operations
            // START n=node(3, 1)
            // WHERE (n.age < 30 and n.name = "Tobias") or not(n.name = "Tobias")
            // RETURN n

            var client = Substitute.For<IGraphClient>();
            var query = new CypherFluentQuery(client)
                .Start("n", (NodeReference)3, (NodeReference)1)
                .Where<FooNode>(n => (n.Age < 30 && n.Name == "Tobias") || n.Name != "Tobias")
                .Return<object>("n")
                .Query;

            Assert.AreEqual("START n=node({p0}, {p1})\r\nWHERE (((n.Age < {p2}) AND (n.Name = {p3})) OR (n.Name != {p4}))\r\nRETURN n".Replace("'","\""), query.QueryText);
            Assert.AreEqual(3, query.QueryParameters["p0"]);
            Assert.AreEqual(1, query.QueryParameters["p1"]);
            Assert.AreEqual(30, query.QueryParameters["p2"]);
            Assert.AreEqual("Tobias", query.QueryParameters["p3"]);
            Assert.AreEqual("Tobias", query.QueryParameters["p4"]);
        }

Simple Query Column Aliases

        [Test]
        public void ReturnColumnAlias()
        {
            // http://docs.neo4j.org/chunked/1.6/query-return.html#return-column-alias
            // START a=node(1)
            // RETURN a.Age AS SomethingTotallyDifferent

            var client = Substitute.For<IGraphClient>();
            var query = new CypherFluentQuery(client)
                .Start("a", (NodeReference)1)
                .Return(a => new ReturnPropertyQueryResult
                {
                    SomethingTotallyDifferent = a.As<FooNode>().Age
                })
                .Query;

            Assert.AreEqual("START a=node({p0})\r\nRETURN a.Age AS SomethingTotallyDifferent", query.QueryText);
            Assert.AreEqual(1, query.QueryParameters["p0"]);
        }

Below is a deserialization test. Note that the deserializer is internal to the client: you would never call it directly, and would always use one of the Return overloads instead. I have included it here for coders interested in deserializing Cypher REST results from Neo4j using expressions.
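Before the test itself, it may help to see the core idea in isolation: a Cypher REST response is a list of column names plus rows of values, and each row has to be mapped onto a typed object. The following is only a minimal standalone sketch of that mapping using reflection. It is not the Neo4jClient implementation (which works from the Return expression); CypherTableMapper and RelationshipRow are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT part of Neo4jClient.
public static class CypherTableMapper
{
    // Maps each row of a Cypher-style table (column names + row values)
    // onto a new T by matching column names to public settable properties.
    public static List<T> Map<T>(
        IReadOnlyList<string> columns,
        IEnumerable<IReadOnlyList<object>> rows)
        where T : class, new()
    {
        var properties = typeof(T).GetProperties()
            .Where(p => p.CanWrite)
            .ToDictionary(p => p.Name, StringComparer.Ordinal);

        var results = new List<T>();
        foreach (var row in rows)
        {
            var item = new T();
            for (var i = 0; i < columns.Count; i++)
            {
                // Columns with no matching property are skipped.
                if (!properties.TryGetValue(columns[i], out var property)) continue;

                // A null value leaves the property at its default.
                var value = row[i];
                if (value == null) continue;

                var targetType = Nullable.GetUnderlyingType(property.PropertyType)
                                 ?? property.PropertyType;
                property.SetValue(item, Convert.ChangeType(value, targetType));
            }
            results.Add(item);
        }
        return results;
    }
}

// Hypothetical DTO whose property names match the column aliases.
public class RelationshipRow
{
    public string RelationshipType { get; set; }
    public string Name { get; set; }
    public long UniqueId { get; set; }
}
```

Calling CypherTableMapper.Map<RelationshipRow> with the columns RelationshipType, Name and UniqueId and a row such as "HOSTS", "foo", 44321 gives an object with those three values set. Nested node or relationship payloads, like the Fooness column in the test, need more work than this sketch does.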

Deserialization Test Sample

        [Test]
        public void ShouldDeserializeTableStructureWithRelationships()
        {
            // Arrange
            const string queryText = @"
                START x = node({p0})
                MATCH x-[r]->n
                RETURN x AS Fooness, type(r) AS RelationshipType, n.Name? AS Name, n.UniqueId? AS UniqueId
                LIMIT 3";
            var query = new CypherQuery(
                queryText,
                new Dictionary<string, object>
                {
                    {"p0", 123}
                });

            var httpFactory = MockHttpFactory.Generate("http://foo/db/data", new Dictionary<RestRequest, HttpResponse>
            {
                {
                    new RestRequest
                    {
                        Resource = "/",
                        Method = Method.GET
                    },
                    new HttpResponse
                    {
                        StatusCode = HttpStatusCode.OK,
                        ContentType = "application/json",
                        Content =
                            @"{
                                'cypher' : 'http://foo/db/data/cypher',
                                'batch' : 'http://foo/db/data/batch',
                                'node' : 'http://foo/db/data/node',
                                'node_index' : 'http://foo/db/data/index/node',
                                'relationship_index' : 'http://foo/db/data/index/relationship',
                                'reference_node' : 'http://foo/db/data/node/0',
                                'extensions_info' : 'http://foo/db/data/ext',
                                'extensions' : {
                                'GremlinPlugin' : {
                                    'execute_script' : 'http://foo/db/data/ext/GremlinPlugin/graphdb/execute_script'
                                }
                                }
                            }".Replace('\'', '"')
                    }
                },
                {
                    new RestRequest
                    {
                        Resource = "/cypher",
                        Method = Method.POST,
                        RequestFormat = DataFormat.Json
                    }.AddBody(new CypherApiQuery(query)),
                    new HttpResponse
                    {
                        StatusCode = HttpStatusCode.OK,
                        ContentType = "application/json",
                        Content =
                            @"{
                                'data' : [ [ {
                                'start' : 'http://foo/db/data/node/0',
                                'data' : {
                                    'Bar' : 'bar',
                                    'Baz' : 'baz'
                                },
                                'property' : 'http://foo/db/data/relationship/0/properties/{key}',
                                'self' : 'http://foo/db/data/relationship/0',
                                'properties' : 'http://foo/db/data/relationship/0/properties',
                                'type' : 'HAS_REFERENCE_DATA',
                                'extensions' : {
                                },
                                'end' : 'http://foo/db/data/node/1'
                                }, 'HOSTS', 'foo', 44321 ], [ {
                                'start' : 'http://foo/db/data/node/1',
                                'data' : {
                                    'Bar' : 'bar',
                                    'Baz' : 'baz'
                                },
                                'property' : 'http://foo/db/data/relationship/1/properties/{key}',
                                'self' : 'http://foo/db/data/relationship/1',
                                'properties' : 'http://foo/db/data/relationship/1/properties',
                                'type' : 'HAS_REFERENCE_DATA',
                                'extensions' : {
                                },
                                'end' : 'http://foo/db/data/node/1'
                                }, 'LIKES', 'bar', 44311 ], [ {
                                'start' : 'http://foo/db/data/node/2',
                                'data' : {
                                    'Bar' : 'bar',
                                    'Baz' : 'baz'
                                },
                                'property' : 'http://foo/db/data/relationship/2/properties/{key}',
                                'self' : 'http://foo/db/data/relationship/2',
                                'properties' : 'http://foo/db/data/relationship/2/properties',
                                'type' : 'HAS_REFERENCE_DATA',
                                'extensions' : {
                                },
                                'end' : 'http://foo/db/data/node/1'
                                }, 'HOSTS', 'baz', 42586 ] ],
                                'columns' : [ 'Fooness', 'RelationshipType', 'Name', 'UniqueId' ]
                            }".Replace('\'', '"')
                    }
                }
            });
            var graphClient = new GraphClient(new Uri("http://foo/db/data"), httpFactory);
            graphClient.Connect();

            //Act
            var results = graphClient.ExecuteGetCypherResults<ResultWithRelationshipDto>(query);

            //Assert
            Assert.IsInstanceOf<IEnumerable<ResultWithRelationshipDto>>(results);

            var resultsArray = results.ToArray();
            Assert.AreEqual(3, resultsArray.Count());

            var firstResult = resultsArray[0];
            Assert.AreEqual(0, firstResult.Fooness.Reference.Id);
            Assert.AreEqual("bar", firstResult.Fooness.Data.Bar);
            Assert.AreEqual("baz", firstResult.Fooness.Data.Baz);
            Assert.AreEqual("HOSTS", firstResult.RelationshipType);
            Assert.AreEqual("foo", firstResult.Name);
            Assert.AreEqual(44321, firstResult.UniqueId);

            var secondResult = resultsArray[1];
            Assert.AreEqual(1, secondResult.Fooness.Reference.Id);
            Assert.AreEqual("bar", secondResult.Fooness.Data.Bar);
            Assert.AreEqual("baz", secondResult.Fooness.Data.Baz);
            Assert.AreEqual("LIKES", secondResult.RelationshipType);
            Assert.AreEqual("bar", secondResult.Name);
            Assert.AreEqual(44311, secondResult.UniqueId);

            var thirdResult = resultsArray[2];
            Assert.AreEqual(2, thirdResult.Fooness.Reference.Id);
            Assert.AreEqual("bar", thirdResult.Fooness.Data.Bar);
            Assert.AreEqual("baz", thirdResult.Fooness.Data.Baz);
            Assert.AreEqual("HOSTS", thirdResult.RelationshipType);
            Assert.AreEqual("baz", thirdResult.Name);
            Assert.AreEqual(42586, thirdResult.UniqueId);
        }

Conclusion

The Neo4jClient now supports both the Gremlin and Cypher query languages in one logical graphClient. This should prove sufficient for all graph query needs and CRUD operations, so we now get the best of both worlds: intrinsic Neo4j CRUD + Gremlin + Cypher.

We find a balance where Gremlin is used for simple Graph Traversals and Cypher is used as our reporting tool.

#Neo4j Gremlin queries with CopySplit/table leveraging Neo4jClient

Hi,

I would like to share some Gremlin querying using the .NET Neo4jClient.

Consider the following graph

[image: graph diagram]

The object is to produce a table of results that shows

  • ReferralDate (ReferralWhoSection Node)
  • ReferralId (Referral Node)
  • FamilyName (User Node)
  • GivenName (User Node)
The trick is that we want all referrals, including those that do not have a who section, in which case the ReferralDate will be NULL. We also want referrals that are indirectly linked to a program (via a decision) as well as the ones that are not.

ReferralDate   ReferralId   FamilyName   GivenName
13 Jan 2012    1            Derbynew     Romiko
NULL           2            Derbynew     Romiko

So what we are doing is essentially left/right joins on ReferralNode, ReferralDecisionNode and Program.
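For readers who think in SQL or LINQ terms, the NULL row in the table above is just what a left outer join yields. The following is a minimal in-memory C# sketch of that behaviour. It does not use Neo4jClient at all; the ReferralRow, WhoSectionRow and ReportRow records are hypothetical stand-ins whose values mirror the table, and records need C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the graph nodes; not Neo4jClient types.
public record ReferralRow(int UniqueId, string FamilyName, string GivenName);
public record WhoSectionRow(int ReferralId, DateTime ReferralDate);
public record ReportRow(DateTime? ReferralDate, int ReferralId, string FamilyName, string GivenName);

public static class LeftJoinSketch
{
    // Left outer join: every referral is kept, and ReferralDate is null
    // when the referral has no matching who section.
    public static List<ReportRow> Build(
        IEnumerable<ReferralRow> referrals,
        IEnumerable<WhoSectionRow> whoSections)
    {
        return referrals
            .GroupJoin(
                whoSections,
                r => r.UniqueId,
                w => w.ReferralId,
                (r, matches) => new { Referral = r, Matches = matches })
            .SelectMany(
                // DefaultIfEmpty turns "no matches" into a single null date.
                x => x.Matches.Select(w => (DateTime?)w.ReferralDate).DefaultIfEmpty(),
                (x, date) => new ReportRow(
                    date,
                    x.Referral.UniqueId,
                    x.Referral.FamilyName,
                    x.Referral.GivenName))
            .ToList();
    }
}
```

Given referrals 1 and 2 and a single who section for referral 1, Build returns two rows, and the row for referral 2 has a null ReferralDate. An inner join would have dropped that row entirely, which is the behaviour the query below has to work around.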

Let's see how we can do this with the .NET Neo4jClient:

            return graphClient
                .RootNode
                .CopySplitV<Referral>(
                    new IdentityPipe()
                        .Out(Hosts.TypeKey, a => a.Key == userIdentifier.AgencyKey)
                        .In(UserBelongsTo.TypeKey, u => u.Username == userIdentifier.Username)
                        .Out(UserLinkedToProgram.TypeKey, p => p.Name == "Foundation")
                        .In(HasSuggestedProgram.TypeKey)
                        .In(ReferralHasDecisionsSection.TypeKey, r => r.Completed == false)
                        .AggregateV("ReferralWithProgramFoundation"),
                    new IdentityPipe()
                        .Out(Hosts.TypeKey, a => a.Key == userIdentifier.AgencyKey)
                        .In(ReferralBelongsTo.TypeKey, r => r.Completed == false)
                )
                .FairMerge()
                .ExceptV("ReferralWithProgramFoundation")
                .GremlinDistinct()
                .Out(ReferralHasWhoSection.TypeKey)
                .As("ReferralDate")
                .In(ReferralHasWhoSection.TypeKey)
                .As("ReferralId")
                .Out(CreatedBy.TypeKey, u => u.Username == userIdentifier.Username)
                .As("UserGivenName")
                .As("UserFamilyName")
                .Table(
                    who => who.ReferralDate,
                    referral => referral.UniqueId,
                    user => user.FamilyName,
                    user => user.GivenName
                );

Notice the following

  • CopySplit uses the concept of an identity pipe as a continuation of the previous output
  • CopySplit will execute the two queries in parallel
  • One branch gets all the referrals in the system that have a program “Foundation”; the other gets all the referrals in the system
  • We store the referrals that have a program (“Foundation”) in an aggregate (variable)
  • We merge the parallel query results together with a FairMerge
  • We exclude referrals that are in a program called “Foundation” with an Except
  • We then deduplicate the results with GremlinDistinct
  • We then use As to mark the steps we need for table projections
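The split, merge, except and distinct steps above can be imitated on plain in-memory collections. This is only a rough sketch of the semantics, not how Gremlin or the Neo4jClient implement them. CopySplitFairMerge and ReferralsOutsideFoundation are hypothetical helpers, referrals are reduced to integer ids, and the fair merge is assumed to take one item from each branch in turn.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, NOT part of Neo4jClient or Gremlin.
public static class CopySplitSketch
{
    // Runs every branch over the same input and interleaves the branch
    // outputs round-robin, one item per branch per turn.
    public static IEnumerable<T> CopySplitFairMerge<T>(
        IEnumerable<T> input,
        params Func<IEnumerable<T>, IEnumerable<T>>[] branches)
    {
        var source = input.ToList(); // every branch sees the same input
        var enumerators = branches.Select(b => b(source).GetEnumerator()).ToList();
        try
        {
            var anyAdvanced = true;
            while (anyAdvanced)
            {
                anyAdvanced = false;
                foreach (var e in enumerators)
                {
                    if (e.MoveNext())
                    {
                        anyAdvanced = true;
                        yield return e.Current;
                    }
                }
            }
        }
        finally
        {
            foreach (var e in enumerators) e.Dispose();
        }
    }

    // Mirrors the bullet points: aggregate the "Foundation" referrals in one
    // branch, take all referrals in the other, merge, exclude, deduplicate.
    public static List<int> ReferralsOutsideFoundation(
        IEnumerable<int> allReferralIds,
        ISet<int> foundationReferralIds)
    {
        var aggregate = new HashSet<int>(); // plays the role of the named aggregate
        return CopySplitFairMerge<int>(
                allReferralIds,
                ids =>
                {
                    var hits = ids.Where(id => foundationReferralIds.Contains(id)).ToList();
                    aggregate.UnionWith(hits); // filled before any item is emitted
                    return hits;
                },
                ids => ids)
            .Where(id => !aggregate.Contains(id)) // the Except step
            .Distinct()                           // the deduplication step
            .ToList();
    }
}
```

With referrals 1, 2 and 3, of which only 2 belongs to “Foundation”, the merged stream is 2, 1, 2, 3 and the final result is 1, 3. The aggregate has to be complete before the exclusion runs, which is why the first branch materialises its hits up front.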

Note: Using the As step within a CopySplit pipe in conjunction with table projections produces undesired results. I am not sure whether Gremlin supports such operations; if you know, please contact me.

Visit Marko Rodriguez for in-depth discussions on Gremlin.