Azure Search Autoindexing DocumentDB Data

Hello.

Wouldn’t it be nice if you could sync (index) your DocumentDB data to Azure Search on an automatic schedule? Yep, that’s what we’ll do today. By the end, your DocumentDB data will be indexed into Azure Search automatically, and you’ll be able to search it from Azure Search just as you would query your DocumentDB data.

Azure Search Indexing DocumentDB Data on a Schedule

The original, and excellent, walkthrough tutorial is here: https://azure.microsoft.com/en-us/documentation/articles/documentdb-search-indexer/. This is my version of that walkthrough.

I have covered Azure Search and DocumentDB before, so if you aren’t familiar with what they are and what they can do, please refer to Mastering Azure DocumentDB Part 1 and Mastering Azure Search.

To auto-index your DocumentDB data into Azure Search, you need two things: 1) a data source and 2) an indexer. The indexer also needs a target index to write into, which we’ll create along the way.

Creating Data Source

First, let’s connect Azure Search to your DocumentDB, and define how changed and deleted data are detected so that we only sync what we need.

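To create a data source, you send a POST to https://[service name].search.windows.net/datasources?api-version=[api-version], with your admin key in the “api-key” header. The full documentation is here: https://msdn.microsoft.com/en-us/library/azure/dn946876.aspx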
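As a rough sketch of what Postman sends, the Python snippet below builds (without sending) a POST request to an Azure Search collection such as “datasources”. The service name, admin key, and the api-version value “2015-02-28-Preview” are assumptions for illustration; check the documentation above for the version your service expects. “build_search_post” is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request

# Assumption: api-version used for the DocumentDB indexer preview.
API_VERSION = "2015-02-28-Preview"


def build_search_post(service_name: str, admin_key: str, resource: str,
                      body: dict, api_version: str = API_VERSION) -> urllib.request.Request:
    """Build, but do not send, a POST to an Azure Search collection
    such as "datasources", "indexes" or "indexers" (hypothetical helper)."""
    url = (f"https://{service_name}.search.windows.net/"
           f"{resource}?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Azure Search authenticates REST calls with the admin key in "api-key".
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_post("my-search-service", "<admin-key>",
                            "datasources", {"name": "mydocdbdatasource"})
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The same helper works for the index and indexer calls later on; only the resource name and body change.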

The request body is a JSON object, and it’s simple and straightforward. You need “name”, “description”, “type”, “credentials”, “container”, “dataChangeDetectionPolicy”, and “dataDeletionDetectionPolicy”.

Give it an easy-to-understand “name” and “description”, and set “type” to “documentdb”.

You can find the DocumentDB connection string in the portal. Here is my portal, showing where the connection string is:

[Screenshot: DocumentDB connection string in the Azure portal]

Copy and paste the connection string from the portal into “credentials” (as its “connectionString”). Then put your DocumentDB collection id as the “name” of “container”. Optionally, you can also add a “query” field containing a DocumentDB query that selects the data you want. I wouldn’t necessarily recommend it, but if your DocumentDB documents are nicely organized and relatively small, you can leave it blank and let the indexer index whole documents.

With DocumentDB, which is what we’re using, you don’t have to worry much about detecting changes: it’s handled by the _ts field, where DocumentDB stores the last-modified timestamp of each document.
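To make the high-water-mark idea concrete, here is a toy model in Python: remember the largest _ts seen so far and, on the next run, pick up only documents whose _ts is greater. This is a simplified illustration under my own assumptions, not how the service is actually implemented; “changed_since” is a hypothetical name, and _ts is treated as seconds since the Unix epoch.

```python
from typing import Dict, List, Tuple


def changed_since(docs: List[Dict], high_water_mark: int) -> Tuple[List[Dict], int]:
    """Toy high-water-mark change detection on DocumentDB's _ts field.

    Returns documents modified after the mark, plus the new mark to remember
    for the next run (hypothetical helper, not the service's implementation)."""
    changed = [d for d in docs if d["_ts"] > high_water_mark]
    new_mark = max((d["_ts"] for d in changed), default=high_water_mark)
    return changed, new_mark


if __name__ == "__main__":
    docs = [{"id": "a", "_ts": 100}, {"id": "b", "_ts": 200}]
    first, mark = changed_since(docs, 0)       # first run: everything is new
    second, mark = changed_since(docs, mark)   # nothing changed since
    print(len(first), len(second), mark)       # 2 0 200
```

Because _ts is maintained by DocumentDB itself, no extra “last modified” property is needed in your documents.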

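There isn’t much to choose for the data deletion detection policy, since only the soft delete policy is supported. With soft delete, you don’t physically remove a document; you mark it as deleted through a property of your choosing, and the indexer removes documents carrying that marker from the index. Since it’s the only option, I’ll go with that.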
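Similarly, here is a toy model of the soft delete policy in Python: documents whose marker property equals the marker value are treated as deletions, and everything else is indexed as usual. The property name “isDeleted” and the marker “true” are hypothetical choices, and the exact string comparison is a simplification that does not reproduce how the service compares values.

```python
from typing import Dict, List, Tuple

# Hypothetical soft delete settings; you choose these in the data source.
SOFT_DELETE_COLUMN = "isDeleted"
SOFT_DELETE_MARKER = "true"


def split_soft_deleted(docs: List[Dict], column: str = SOFT_DELETE_COLUMN,
                       marker: str = SOFT_DELETE_MARKER) -> Tuple[List[Dict], List[Dict]]:
    """Toy soft delete detection: marked documents should leave the index,
    the rest are indexed as usual. Uses exact equality as a simplification."""
    to_index, to_remove = [], []
    for doc in docs:
        if doc.get(column) == marker:
            to_remove.append(doc)
        else:
            to_index.append(doc)
    return to_index, to_remove


if __name__ == "__main__":
    docs = [{"id": "1"}, {"id": "2", "isDeleted": "true"}]
    keep, drop = split_soft_deleted(docs)
    print([d["id"] for d in keep], [d["id"] for d in drop])  # ['1'] ['2']
```

The practical consequence: to remove something from the search index, you update the DocumentDB document with the marker instead of deleting it outright.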

My request body looks like this:

[Screenshot: my data source request body in Postman]
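For reference, a body with the same fields can be built as a Python dictionary like this. The policy “@odata.type” names follow the original tutorial; the “;Database=...” suffix on the connection string, the “isDeleted”/“true” soft delete settings, and all sample values are assumptions for illustration. “documentdb_datasource” is a hypothetical helper.

```python
import json
from typing import Optional


def documentdb_datasource(name: str, description: str, connection_string: str,
                          database_id: str, collection_id: str,
                          query: Optional[str] = None) -> dict:
    """Build a DocumentDB data source definition (hypothetical helper)."""
    # Assumption: as in the original tutorial, the connection string names the
    # database with ";Database=<id>"; append it if the portal string lacks it.
    if "Database=" not in connection_string:
        connection_string = connection_string.rstrip(";") + f";Database={database_id}"
    container = {"name": collection_id}  # DocumentDB collection id
    if query:
        container["query"] = query  # optional DocumentDB SQL query
    return {
        "name": name,
        "description": description,
        "type": "documentdb",
        "credentials": {"connectionString": connection_string},
        "container": container,
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
        "dataDeletionDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
            "softDeleteColumnName": "isDeleted",  # hypothetical property name
            "softDeleteMarkerValue": "true",
        },
    }


if __name__ == "__main__":
    body = documentdb_datasource(
        "mydocdbdatasource", "DocumentDB data for Azure Search",
        "AccountEndpoint=https://myaccount.documents.azure.com:443/;AccountKey=<key>;",
        "mydatabase", "mycollection")
    print(json.dumps(body, indent=2))
```

Leaving “query” out matches the approach above of letting the indexer take whole documents.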

The request went through fine. Back in the portal, I now see the newly added data source. Good.

[Screenshot: the new data source in the Azure portal]

Creating Index

Next, the index. You can create an index through either the API or the portal. The original tutorial does it with the API, but I’ll go ahead and create mine in the portal. The index field names must match the fields returned by your DocumentDB query (or the document properties, if you don’t use a query). And since you can’t change or delete fields once you’ve created them, I’ll be very careful about which fields I make.

[Screenshot: creating the index in the Azure portal]

Follow the interface in the portal, click Save, and you’re done. Retrievable, Filterable, Sortable, Facetable, and Searchable are all self-explanatory: check them if you need them, and leave them unchecked if you don’t.
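If you prefer the API over the portal, the same index can be described as JSON. The Python sketch below builds an index definition with hypothetical field names (“id”, “title”, “rating”) and checks one rule: an index needs exactly one key field, and it must be of type Edm.String. Other attribute combinations are not validated here, and both helper names are hypothetical.

```python
from typing import Dict, List


def field(name: str, edm_type: str = "Edm.String", *, key: bool = False,
          searchable: bool = False, filterable: bool = False,
          sortable: bool = False, facetable: bool = False,
          retrievable: bool = True) -> Dict:
    """One index field; the flags mirror the portal's checkboxes."""
    return {"name": name, "type": edm_type, "key": key,
            "searchable": searchable, "filterable": filterable,
            "sortable": sortable, "facetable": facetable,
            "retrievable": retrievable}


def index_definition(name: str, fields: List[Dict]) -> Dict:
    """Index body for POST /indexes (hypothetical helper)."""
    keys = [f for f in fields if f["key"]]
    # An index needs exactly one key field, and it must be Edm.String.
    if len(keys) != 1 or keys[0]["type"] != "Edm.String":
        raise ValueError("index needs exactly one Edm.String key field")
    return {"name": name, "fields": fields}


if __name__ == "__main__":
    idx = index_definition("myindex", [
        field("id", key=True),                      # DocumentDB document id
        field("title", searchable=True),
        field("rating", "Edm.Int32", filterable=True, sortable=True),
    ])
    print([f["name"] for f in idx["fields"]])
```

Using the DocumentDB “id” as the key is a natural choice, since every document already has a unique string id.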

Creating Indexer

It seems I can’t create an indexer from the portal, so it’s back to Postman to send the request through the API:

[Screenshot: indexer creation request in Postman]

Now I have an indexer that runs every 15 minutes.
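The indexer body itself is small: a name, the data source, the target index, and a schedule whose interval is an ISO 8601 duration (“PT15M” for 15 minutes). The Python sketch below builds it; the names are hypothetical, and the 5-minute minimum check is an assumption about the service’s limits rather than something stated in this post.

```python
import json
from typing import Dict

# Assumption: the service does not accept intervals shorter than 5 minutes.
MIN_INTERVAL_MINUTES = 5


def schedule_interval(minutes: int) -> str:
    """ISO 8601 duration for the indexer schedule, e.g. 15 -> "PT15M"."""
    if minutes < MIN_INTERVAL_MINUTES:
        raise ValueError(f"interval must be at least {MIN_INTERVAL_MINUTES} minutes")
    return f"PT{minutes}M"


def indexer_definition(name: str, data_source_name: str,
                       target_index_name: str, every_minutes: int = 15) -> Dict:
    """Indexer body for POST /indexers (hypothetical helper)."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "schedule": {"interval": schedule_interval(every_minutes)},
    }


if __name__ == "__main__":
    print(json.dumps(indexer_definition("mydocdbindexer", "mydocdbdatasource", "myindex")))
```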

For details on creating an indexer, refer to https://msdn.microsoft.com/en-us/library/azure/dn946899.aspx

Back in the portal, I now see the newly created indexer.

[Screenshot: the new indexer in the Azure portal]

It even shows that the indexer ran successfully and read 2 documents.
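You can also check the same information through the API with a GET on the indexer’s status. The Python sketch below builds that request and summarizes the “lastResult” part of the response (“status”, “itemsProcessed”, “itemsFailed”). The URL shape follows the indexer REST API as I understand it, the api-version is an assumption, and both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict

# Assumption: same api-version as the other calls in this walkthrough.
API_VERSION = "2015-02-28-Preview"


def status_request(service_name: str, admin_key: str, indexer_name: str,
                   api_version: str = API_VERSION) -> urllib.request.Request:
    """Build, but do not send, GET /indexers/<name>/status."""
    url = (f"https://{service_name}.search.windows.net/indexers/"
           f"{indexer_name}/status?api-version={api_version}")
    return urllib.request.Request(url, headers={"api-key": admin_key}, method="GET")


def summarize_status(status: Dict) -> str:
    """Summarize the lastResult part of an indexer status response."""
    last = status.get("lastResult") or {}
    if not last:
        return "indexer has not run yet"
    return (f"{last.get('status')}: {last.get('itemsProcessed', 0)} processed, "
            f"{last.get('itemsFailed', 0)} failed")


if __name__ == "__main__":
    req = status_request("my-search-service", "<admin-key>", "mydocdbindexer")
    print(req.get_method(), req.full_url)
    # With real values: summarize_status(json.loads(urllib.request.urlopen(req).read()))
```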

Super! Now we can go back to the Azure Search article and do all the cool things Azure Search offers (and doesn’t offer).
