This Could Be True? Lighting Fast Query For Huge Data

alt There is a well known story about “Data” that is, Data sharing was a big problem, until the 2000’s. Today, this problem solved but another problem was born via this. We call as “Data Storing.” Think a web based report querying system that get millions request. In fact, this system need a “Lighting Fast Query” This could be possible, under large scalable systems and huge data. Of course, this can be possible with “Modern SOA” systems. One of the popular player of this system is “Database Management Systems”. There is two modern approach possible about this that is RDBMS and Non-Relational Management.

Today, I want to share one of my comparison result. I try to compare speed of MongoDb and Ef Core under huge data.

Test Machine System Info:

RAM:8 Gb, CPU:i5-4200H, OS: MS-Windows

I prefer to compare these on Dot Net Core 2.0 Framework. For this aim, I used MongoDb Driver and Entity Core 2.0 with Sql Server

Let’s give a quick info about MongoDb and Ef Core:

MongoDb

MongoDb is an NoSql Database solution. There isn't any relationship between collections. Data is stored as Bson format.

MongoDb C# Driver

This is an client library that is allow communicates to MongoDb. You can get detailed info at there
https://docs.mongodb.com/manual/applications/drivers/

MS-Sql

Sql Server is an RDMS. Data storage according to cell and column idea. RDMS was developed according to mathematical expression.

Ef Core

Ef Core is a Object Relational Mapping framework. It is an lightweight version of Ef. Ef Core support only Code First approach.
You can get detailed info at there.
https://docs.microsoft.com/en-us/ef/

Let’s Prepare Record For Testing

For testing purpose, I added 10.000.000 records to mongoDb and MS-Sql Server via Mongo on PowerShell and MS-Sql Server. I create test collection structure like this. (To generate 10.000.000 record, you can search on Bulk Insert concept on MongoDb and Ms-Sql) )

Here is the screenshots of data structures.

db.testCollection.insert({ RecType:1 Content:”Test Record” CreateTime:new Date("2018-01-01T16:00:00Z"); }) alt

At the following part, I accept that I have 4 record type like RecType: 1, 2, 3, 4 For “RecType:1”, recordcount =9.000.000. All of the rest record is shared as randomly between type 2, 3, 4

First Trial Failed

Let’s try to find documents that is RecType:1. Expected result count is around 9.000.000 :)
I used .NetCore 2.0 for only search progress. Also, I use Stopwatch Class to accurately measure elapsed time. Guess What? Computer has frozen.

Second Trial

After first unsuccess trial step, I decided to use “Index Pover”. Before to continue on Test Experiment, I want to give a summary about ‘Index’.

What is Index?

In short definition, I can say that An index is copied version of ordered data. Searching progress can be completed without looking each row. This progress can reduce the search time as very efficiently.

Indexing In MsSql

In MS-Sql, There are many type of index available. In this test, I focused Clustered and Nonclustered index.

You can get detailed indexing in Ms-Sql in this link
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/indexes

Clustered Index: As a simple definition, ‘Alphabetic ordered books’ can give as a good example. According to Ms-Sql rules, each table only can have one clustered index. Clustered index must be unique in this table. Primary Keys are the best example for this index type.

Nonclustered Index: This types indexes are the small copy of tables. In contrast to Clustered definition, there isn’t a unique rule for non clustered index. There is a constraint for this index type about on count.

* The maximum non clustered index count must be 900. In my opinion, If one day, I need a 900 index for a table, I will try to review of database design :) As we know that nonclustered index are the small copy of table.

Indexing In MongoDb

In mongodb, Indexes definitions are similar to other database system.

In these test, I used “Single and Multiple Index keys” definition. But, I have to say that “Geospatial Index” is very useful :) You can use them for location based search ;)

Single Field Index: This type index, contains only one field. This type index can support either direction(ascending, descending)

Multiple Index: This type index, contains many field for this reason contents are stored in array.

MongoDb has a very clear and useful document. Thanks a lot for MongoDb :), I would definitely recommend you review. You can get detailed info at the given link.
https://docs.mongodb.com/manual/indexes/#index-types

Here is the useful keyword for mongoDb “.explain(true)“. This keyword can show info about your query health, affected record count, execution plan, rejected plans and etc.
When you use “.explain(true)“ keyword, if you detect “CoolScan Index” is used for your search progress, this is the proof that your index not working :(

As lastly, “_id”==ObjectId field defined as default Index in MongoDb. ObjectId can be generated by Mongo as unique key with using a defined formula.

Okay, Let’s continue with Second Trial

As I said, After first unsuccess trial step, I decided to use “Index Pover”. For this aim In MsSql I used NonClustered Index. Single Index is chosen for MongoDb

Let’s look at my database structures one more time to remember :)

db.testCollection.insert({ RecType:1 Content:”Test Record” CreateTime:new Date("2018-01-01T16:00:00Z"); })

MsSql : I chosed “RecType” column to implement non-clustered Index

MongoDb: I choose “RecType” field for Single Type Index

In MsSql, I can’t choose “RecType” as Clustered Index. Because, clustered index have to be unique in each table. Of Course, this selection change my Query cost like this figure at the below

Test Result

I know that, I said a lot of thing except test result :) But they are important part to understand test.

I implement this test with ten million data. I execute search queries via dot.net core. I forced to use Indexes. At the below, I used average solutions.

MongoDb : 28.5 second

EfCore : 46.80 second

Here is the reminding part about my system info

Test Machine System Info:
RAM:8 Gb, CPU:i5-4200H, OS: MS-Windows

I prefer to compare these on Dot Net Core 2.0 Framework. For this aim, I used MongoDb Driver and Entity Core 2.0 with Sql Server