Lucene Full Text Indexing with Neo4j

Hi Guys,

I spent some time working on full text search for Neo4j. The basic goals were as follows.

    • Control which node each index entry points to
    • Full text search
    • All operations are done via REST
    • Can create an index when creating a node
    • Can update an index
    • Can check if an index exists
    • When bootstrapping Neo4j in the cloud, run index checks
    • Query the index using the Lucene full text query language
Download:
This is based on Neo4jClient:
Source Code at:

Introduction

So with the above objectives, I decided to go with Manual Indexing. The main reason is that I can add an index entry that points to node A based on values held in node B.

Imagine the following.

You have Node A with a list of name properties: Surname, FirstName and MiddleName. However, Node A also has a relationship to Node B, which holds other names, perhaps Display Names, Avatar Names and AKAs.

With manual indexing, all of the name entries from both Node A and Node B can point to Node A only.
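To make the idea concrete, here is a minimal in-memory sketch of what a manual index does conceptually: it maps a key/value pair to the set of nodes it points at, no matter which node the value was read from. This is a toy model only, not the Neo4j or Neo4jClient API; `ToyManualIndex` and its members are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of a manual index: (key, value) -> ids of the nodes it points at.
// Hypothetical illustration only; this is not how Neo4j stores its indexes.
public class ToyManualIndex
{
    private readonly Dictionary<(string Key, string Value), HashSet<long>> entries =
        new Dictionary<(string Key, string Value), HashSet<long>>();

    // Point a key/value pair at a node, regardless of where the value came from.
    public void Add(string key, string value, long nodeId)
    {
        var slot = (key, value.ToLowerInvariant());
        if (!entries.TryGetValue(slot, out var nodes))
        {
            nodes = new HashSet<long>();
            entries[slot] = nodes;
        }
        nodes.Add(nodeId);
    }

    // Case-insensitive exact lookup of the nodes a key/value pair points at.
    public IEnumerable<long> Lookup(string key, string value)
    {
        return entries.TryGetValue((key, value.ToLowerInvariant()), out var nodes)
            ? nodes
            : Enumerable.Empty<long>();
    }
}
```

If Node A has id 1, adding its surname and an AKA read from Node B both with `nodeId` 1 means a lookup on either value leads back to Node A.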

A REST call to the Neo4j server that adds such an index entry looks something like this in Fiddler.

[Image: Fiddler capture of the REST request]

Notice the following:

Url: http://localhost:7474/db/data/index/node/{IndexName}/{Key}/{Value}

So, if we were adding 3 names for the SAME client from 2 different nodes, we would use the same IndexName and Key, with a different Value in the Url each time. The node pointer (in the request body) is then the address of the node.
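As a rough sketch of the Url pattern above, the helper below composes the address for one index entry. The class and method names are hypothetical, and escaping each segment with `Uri.EscapeDataString` is an assumption added here so that values containing spaces or slashes still produce a valid Url; it is not something the client code in this post does.

```csharp
using System;

public static class IndexUrlSketch
{
    // Builds {dataRoot}/index/node/{indexName}/{key}/{value}.
    // Escaping the segments is an assumption of this sketch, not part of Neo4jClient.
    public static string BuildEntryUrl(string dataRoot, string indexName, string key, string value)
    {
        return string.Join("/",
            dataRoot.TrimEnd('/'),
            "index",
            "node",
            Uri.EscapeDataString(indexName),
            Uri.EscapeDataString(key),
            Uri.EscapeDataString(value));
    }
}
```

Calling `BuildEntryUrl("http://localhost:7474/db/data", "clients", "Name", "Bob")` gives `http://localhost:7474/db/data/index/node/clients/Name/Bob`. The request body would then carry the address of the node the entry should point to.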

Neo4jClient Nuget Package

I have updated Neo4jClient, which is on NuGet, to now support:

  • Creating Exact or FullText indexes on their own, so that the index simply exists
  • Creating Exact or FullText indexes when creating a node; the node reference is calculated automatically
  • Updating an index
  • Deleting entries from an index

Class diagram for the indexing solution in Neo4jClient:

[Image: class diagram]

RestSharp

The Neo4jClient package uses RestSharp, which makes all the index calls a trivial task for us. Let's have a look at some of the code inside the client to see how to consume the manual index API from .NET, and then in the next section we'll look at how to consume this code from another application.

 public Dictionary<string, IndexMetaData> GetIndexes(IndexFor indexFor)
        {
            CheckRoot();

            string indexResource;
            switch (indexFor)
            {
                case IndexFor.Node:
                    indexResource = RootApiResponse.NodeIndex;
                    break;
                case IndexFor.Relationship:
                    indexResource = RootApiResponse.RelationshipIndex;
                    break;
                default:
                    throw new NotSupportedException(string.Format("GetIndexes does not support indexfor {0}", indexFor));
            }

            var request = new RestRequest(indexResource, Method.GET)
            {
                RequestFormat = DataFormat.Json,
                JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
            };

            var response =  client.Execute<Dictionary<string, IndexMetaData>>(request);

            if (response.StatusCode != HttpStatusCode.OK)
                throw new NotSupportedException(string.Format(
                    "Received an unexpected HTTP status when executing the request.\r\n\r\n\r\nThe response status was: {0} {1}",
                    (int)response.StatusCode,
                    response.StatusDescription));

            return response.Data;
        }

        public bool CheckIndexExists(string indexName, IndexFor indexFor)
        {
            CheckRoot();

            string indexResource;
            switch (indexFor)
            {
                case IndexFor.Node:
                    indexResource = RootApiResponse.NodeIndex;
                    break;
                case IndexFor.Relationship:
                    indexResource = RootApiResponse.RelationshipIndex;
                    break;
                default:
                    throw new NotSupportedException(string.Format("IndexExists does not support indexfor {0}", indexFor));
            }

            var request = new RestRequest(string.Format("{0}/{1}",indexResource, indexName), Method.GET)
            {
                RequestFormat = DataFormat.Json,
                JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
            };

            var response = client.Execute<Dictionary<string, IndexMetaData>>(request);

            return response.StatusCode == HttpStatusCode.OK;
        }

        void CheckRoot()
        {
            if (RootApiResponse == null)
                throw new InvalidOperationException(
                    "The graph client is not connected to the server. Call the Connect method first.");
        }

        public void CreateIndex(string indexName, IndexConfiguration config, IndexFor indexFor)
        {
            CheckRoot();

            string nodeResource;
            switch (indexFor)
            {
                case IndexFor.Node:
                    nodeResource = RootApiResponse.NodeIndex;
                    break;
                case IndexFor.Relationship:
                    nodeResource = RootApiResponse.RelationshipIndex;
                    break;
                default:
                    throw new NotSupportedException(string.Format("CreateIndex does not support indexfor {0}", indexFor));
            }

            var createIndexApiRequest = new
                {
                    name = indexName.ToLower(),
                    config
                };

            var request = new RestRequest(nodeResource, Method.POST)
                {
                    RequestFormat = DataFormat.Json,
                    JsonSerializer = new CustomJsonSerializer {NullHandling = JsonSerializerNullValueHandling}
                };
            request.AddBody(createIndexApiRequest);

            var response = client.Execute(request);

            if (response.StatusCode != HttpStatusCode.Created)
                throw new NotSupportedException(string.Format(
                    "Received an unexpected HTTP status when executing the request..\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                    indexName,
                    (int) response.StatusCode,
                    response.StatusDescription));
        }

        public void ReIndex(NodeReference node, IEnumerable<IndexEntry> indexEntries)
        {
            CheckRoot();

            var nodeAddress = string.Join("/", new[] {RootApiResponse.Node, node.Id.ToString()});

            var updates = indexEntries
                .SelectMany(
                    i => i.KeyValues,
                    (i, kv) => new {IndexName = i.Name, kv.Key, kv.Value});

            foreach (var update in updates)
            {
                // Skip entries with no value instead of abandoning the remaining updates
                if (update.Value == null)
                    continue;

                string indexValue;
                if(update.Value is DateTimeOffset)
                {
                    indexValue = ((DateTimeOffset) update.Value).UtcTicks.ToString();
                }
                else if (update.Value is DateTime)
                {
                    indexValue = ((DateTime)update.Value).Ticks.ToString();
                }
                else
                {
                    indexValue = update.Value.ToString();
                }

                AddNodeToIndex(update.IndexName, update.Key, indexValue, nodeAddress);
            }
        }

        public void DeleteIndex(string indexName, IndexFor indexFor)
        {
            CheckRoot();

            string indexResource;
            switch (indexFor)
            {
                case IndexFor.Node:
                    indexResource = RootApiResponse.NodeIndex;
                    break;
                case IndexFor.Relationship:
                    indexResource = RootApiResponse.RelationshipIndex;
                    break;
                default:
                    throw new NotSupportedException(string.Format("DeleteIndex does not support indexfor {0}", indexFor));
            }

            var request = new RestRequest(string.Format("{0}/{1}", indexResource, indexName), Method.DELETE)
            {
                RequestFormat = DataFormat.Json,
                JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
            };

            var response = client.Execute(request);

            if (response.StatusCode != HttpStatusCode.NoContent)
                throw new NotSupportedException(string.Format(
                    "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                    indexName,
                    (int)response.StatusCode,
                    response.StatusDescription));
        }

        void AddNodeToIndex(string indexName, string indexKey, string indexValue, string nodeAddress)
        {
            var nodeIndexAddress = string.Join("/", new[] { RootApiResponse.NodeIndex, indexName, indexKey, indexValue });
            var request = new RestRequest(nodeIndexAddress, Method.POST)
            {
                RequestFormat = DataFormat.Json,
                JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
            };
            request.AddBody(string.Join("", client.BaseUrl, nodeAddress));

            var response = client.Execute(request);

            if (response.StatusCode != HttpStatusCode.Created)
                throw new NotSupportedException(string.Format(
                    "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                    indexName,
                    (int)response.StatusCode,
                    response.StatusDescription));
        }

        public IEnumerable<Node<TNode>> QueryIndex<TNode>(string indexName, IndexFor indexFor, string query)
        {
            CheckRoot();

            string indexResource;

            switch (indexFor)
            {
                case IndexFor.Node:
                    indexResource = RootApiResponse.NodeIndex;
                    break;
                case IndexFor.Relationship:
                    indexResource = RootApiResponse.RelationshipIndex;
                    break;
                default:
                    throw new NotSupportedException(string.Format("QueryIndex does not support indexfor {0}", indexFor));
            }

            var request = new RestRequest(indexResource + "/" + indexName, Method.GET)
                {
                    RequestFormat = DataFormat.Json,
                    JsonSerializer = new CustomJsonSerializer {NullHandling = JsonSerializerNullValueHandling}
                };

            request.AddParameter("query", query);

            var response = client.Execute<List<NodeApiResponse<TNode>>>(request);

            if (response.StatusCode != HttpStatusCode.OK)
                throw new NotSupportedException(string.Format(
                    "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
                    indexName,
                    (int) response.StatusCode,
                    response.StatusDescription));

            return response.Data == null
           ? Enumerable.Empty<Node<TNode>>()
           : response.Data.Select(r => r.ToNode(this));
        }
		

Using the Neo4jClient from within an application

Create an Index and check if it exists

This is useful when bootstrapping Neo4j, to see if there are any indexes that SHOULD be there and are not, so that you can enumerate all the nodes for that index and add entries.

public void CreateIndexesForAgencyClients()
        {
            var agencies = graphClient
                .RootNode
                .Out<Agency>(Hosts.TypeKey)
                .ToList();

            foreach (var agency in agencies)
            {
                var indexName = IndexNames.Clients(agency.Data);
                var indexConfiguration = new IndexConfiguration
                    {
                        Provider = IndexProvider.lucene,
                        Type = IndexType.fulltext
                    };

                if (!graphClient.CheckIndexExists(indexName, IndexFor.Node))
                {
                    Trace.TraceInformation("CreateIndexIfNotExists {0} for Agency Key {1}", indexName, agency.Data.Key);
                    graphClient.CreateIndex(indexName, indexConfiguration, IndexFor.Node);
                    PopulateAgencyClientIndex(agency.Data);
                }
            }
        }

Create an Index Node Entry when creating a node

 var indexEntries = GetIndexEntries(agency.Data, client, clientViewModel.AlsoKnownAses);

var clientNodeReference = graphClient.Create(
                client,
                new[] {new ClientBelongsTo(agencyNode.Reference)}, indexEntries);

public IEnumerable<IndexEntry> GetIndexEntries(Agency agency, Client client, IEnumerable<AlsoKnownAs> alsoKnownAses)
        {
            var indexKeyValues = new List<KeyValuePair<string, object>>
            {
                new KeyValuePair<string, object>(AgencyClientIndexKeys.Gender.ToString(), client.Gender)
            };

            if (client.DateOfBirth.HasValue)
            {
                var dateOfBirthUtcTicks = client.DateOfBirth.Value.UtcTicks;
                indexKeyValues.Add(new KeyValuePair<string, object>(AgencyClientIndexKeys.DateOfBirth.ToString(), dateOfBirthUtcTicks));
            }

            var names = new List<string>
            {
                client.GivenName,
                client.FamilyName,
                client.PreferredName,
            };

            if (alsoKnownAses != null)
            {
                names.AddRange(alsoKnownAses.Where(a => !string.IsNullOrEmpty(a.Name)).Select(aka => aka.Name));
            }

            indexKeyValues.AddRange(names.Select(name => new KeyValuePair<string, object>(AgencyClientIndexKeys.Name.ToString(), name)));

            return new[]
            {
                new IndexEntry
                {
                    Name = IndexNames.Clients(agency),
                    KeyValues = indexKeyValues.Where(v => v.Value != null)
                }
            };
        }
		

Reindex a node

Notice there was a call to PopulateAgencyClientIndex in the code above. This is done in our bootstrap to ensure indexes are always there as expected; if for some reason they are not, they are created and populated using the reindex feature.

void PopulateAgencyClientIndex(Agency agency)
        {
            var clients = graphClient
                .RootNode
                .Out<Agency>(Hosts.TypeKey, a => a.Key == agency.Key)
                .In<Client>(ClientBelongsTo.TypeKey);

            foreach (var client in clients)
            {
                var clientService = clientServiceCallback();
                var akas = client.Out<AlsoKnownAs>(IsAlsoKnownAs.TypeKey).Select(a => a.Data);
                var indexEntries = clientService.GetIndexEntries(agency, client.Data, akas);
                graphClient.ReIndex(client.Reference, indexEntries);
            }
        }
		

Querying a full text search index using Lucene

Below is sample code to query full text search. Say you have a person with

Name: Bob, Surname: Van de Builder, Aka1: Bobby, Aka2: Bobs, PreferredName: Bob The Builder

The index entries will need to look like this:

Key: Value
Name: Bob
Name: Van
Name: de
Name: Builder
Name: Bobby
Name: Bobs

Remember, Lucene has a whitespace analyser, so any name containing spaces MUST be broken into separate index entries. We split names on whitespace, and the resulting parts become our collection of index entries. This applies to full text indexes.
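The splitting step can be sketched roughly as follows. This is a hypothetical helper, not code from Neo4jClient or from our application; it assumes the same `KeyValuePair<string, object>` shape used for index key values earlier in the post, and dropping case-insensitive duplicates is a choice made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameSplitterSketch
{
    // Breaks every name into whitespace-separated parts and returns one
    // key/value pair per distinct part (duplicates ignored case-insensitively).
    public static List<KeyValuePair<string, object>> ToNameEntries(string key, IEnumerable<string> names)
    {
        return names
            .Where(n => !string.IsNullOrWhiteSpace(n))
            // A null separator array makes Split break on any whitespace character.
            .SelectMany(n => n.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Select(part => new KeyValuePair<string, object>(key, part))
            .ToList();
    }
}
```

For the names Bob, Van de Builder, Bobby, Bobs and Bob The Builder, this produces one `Name` entry each for Bob, Van, de, Builder, Bobby, Bobs and The.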

Note: If using an EXACT index match, then composite entries are needed for multiple words, since you are no longer using Lucene full text search capabilities, e.g.

Name: Bob The Builder

This is good to know, because things like postal code searches or Gender where exact matches are required do not need full text indexes.

Let's check out an example of querying an index.

        [Test]
        public void VerifyWhenANewClientIsCreateThatPartialNameCanBeFuzzySearchedInTheFullTextSearchIndex()
        {
            using (var agency = Data.NewTestAgency())
            using (var client = Data.NewTestClient(agency, c =>
            {
                c.Gender = Gender.Male;
                c.GivenName = "Joseph";
                c.MiddleNames = "Mark";
                c.FamilyName = "Kitson";
                c.PreferredName = "Joey";

                c.AlsoKnownAses = new List<AlsoKnownAs>
                    {
                       new AlsoKnownAs {Name = "J-Man"},
                       new AlsoKnownAs {Name = "J-Town"}
                    };
            }
                ))
            {
                var indexName = IndexNames.Clients(agency.Agency.Data);
                const string partialName = "+Name:Joe~+Name:Kitson~";
                var result = GraphClient.QueryIndex<Client>(indexName, IndexFor.Node, partialName);
                Assert.AreEqual(client.Client.Data.UniqueId, result.First().Data.UniqueId);
            }
        }
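The query string in the test uses standard Lucene query syntax: a leading `+` marks a clause as required, and a trailing `~` makes the term a fuzzy match. Below is a minimal sketch of a helper that builds such a query from single-word terms. `LuceneQuerySketch` is a hypothetical name and is not part of Neo4jClient; backslash-escaping the query parser's special characters is an addition made here so that names like J-Man are treated as literal text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class LuceneQuerySketch
{
    // Characters with special meaning in Lucene query syntax.
    private const string SpecialChars = "+-&|!(){}[]^\"~*?:\\/";

    // Prefixes each special character with a backslash.
    public static string Escape(string term)
    {
        var sb = new StringBuilder();
        foreach (var c in term)
        {
            if (SpecialChars.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds "+field:term1~ +field:term2~": every term is required and fuzzy.
    // Terms are expected to be single words (no whitespace).
    public static string AllFuzzy(string field, IEnumerable<string> terms)
    {
        return string.Join(" ", terms.Select(t => "+" + field + ":" + Escape(t) + "~"));
    }
}
```

`AllFuzzy("Name", new[] { "Joe", "Kitson" })` returns `+Name:Joe~ +Name:Kitson~`, which could be passed as the query argument to QueryIndex.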
		

Dates

You may have noticed in some of the code that when I store date entries in the index, I store them as Ticks, i.e. as long numbers. This is awesome, as it gives raw power to searching dates via ranges of longs.

 [Test]
        public void VerifyWhenANewClientIsCreateThatTheDateOfBirthCanBeRangeSearchedInTheFullTextSearchIndex()
        {
            // Arrange
            const long dateOfBirthTicks = 634493518171556320;
            using (var agency = Data.NewTestAgency())
            using (var client = Data.NewTestClient(agency, c =>
            {
                c.Gender = Gender.Male;
                c.GivenName = "Joseph";
                c.MiddleNames = "Mark";
                c.FamilyName = "Kitson";
                c.PreferredName = "Joey";
                c.DateOfBirth = new DateTimeOffset(dateOfBirthTicks, new TimeSpan());
                c.CurrentAge = null;
                c.AlsoKnownAses = new List<AlsoKnownAs>
                    {
                       new AlsoKnownAs {Name = "J-Man"},
                       new AlsoKnownAs {Name = "J-Town"}
                    };
            }
                ))
            {
                // Act
                var indexName = IndexNames.Clients(agency.Agency.Data);
                var partialName = string.Format("DateOfBirth:[{0} TO {1}]", dateOfBirthTicks - 5, dateOfBirthTicks + 5);
                var result = GraphClient.QueryIndex<Client>(indexName, IndexFor.Node, partialName);
                // Assert
                Assert.AreEqual(client.Client.Data.UniqueId, result.First().Data.UniqueId);
            }
        }
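One caveat worth sketching: the tick values are sent to the index as strings, and Lucene term range queries compare text lexicographically, so a range only behaves numerically while every stored value has the same number of digits. That holds for 18-digit tick values (roughly years 318 to 3169), which covers dates of birth. A hedged way to remove the assumption is to zero-pad ticks to a fixed width, as below. `TicksIndexSketch` is a hypothetical helper, not part of Neo4jClient, and the same padding would have to be applied both when writing index entries and when building queries.

```csharp
using System;
using System.Globalization;

public static class TicksIndexSketch
{
    // long.MaxValue has 19 digits, so padding to 19 keeps text order equal to numeric order.
    public static string ToSortableTicks(DateTimeOffset value)
    {
        return value.UtcTicks.ToString("D19", CultureInfo.InvariantCulture);
    }

    // Builds an inclusive Lucene range query such as DateOfBirth:[0634... TO 0634...].
    public static string RangeQuery(string field, DateTimeOffset from, DateTimeOffset to)
    {
        return string.Format("{0}:[{1} TO {2}]", field, ToSortableTicks(from), ToSortableTicks(to));
    }
}
```

With this padding, the tick value 634493518171556320 used in the test above is stored as `0634493518171556320`.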
		

Summary

Well, I hope you found this post useful. Neo4jClient is on NuGet, so have a bash at using it; I would love to know your feedback.

Download

NuGetPackage:
Source Code at:

Cheers

11 thoughts on “Lucene Full Text Indexing with Neo4j”

  1. Hey Romiko, great post. Really straightforward, and structured well!

    So you know, we’re going to blog about your findings at blog.neo4j.org, and redirect our post to your original one.

    Hope that’s cool!

    – neo4j team

  2. Just curious – is there is a specific reason the indexName is lower-cased in the CreateIndex function?

    Thanks!

    1. Hi,

      We just standardised our naming convention, where index names are lower case and relationships in Neo4j are uppercase, but you can have it as uppercase/lowercase if you wanted to :)

  3. Works great! Is it possible to search over multiple indexes?
    Like: Search for person with name “rom*” and is between 20 and 25 years old.

    1. Indeed, Cypher is definitely gaining headway and Neo4jClient is getting more support for Cypher. When I get some time I will look at other Cypher optimizations we could do to leverage indexes.
