MongoDB is one of the biggest names in the NoSQL database space, but it isn’t well known within the South African enterprise space. Companies and database administrators are therefore loath to get involved with projects that use it. Experience in implementing MongoDB shows that this fear is groundless: installation is straightforward and, once MongoDB is up and running, very little support is required.
Traditional databases such as Microsoft SQL Server are based on the relational model. This has its benefits, but a relational structure rarely maps directly onto the object structure of the programming language used. MongoDB uses dynamic, document-oriented storage, so data can be stored in a way that closely matches the application’s object structure, which makes it easier to code against. It also supports sharding, which allows the data to be scaled horizontally across many machines to improve performance with little effort. And supporting it is simple, even in an environment dominated by SQL experience.
Developing with MongoDB is not the focus of this paper, but it is worth touching on to highlight some best practices:
1. MongoDB provides drivers for all the major programming languages to handle the application’s interactions with the database.
2. Writes are “fire and forget” by default. Implement Write Concern (Safe Mode) to ensure that writes perform a round trip to the server.
3. A mind-set shift is required when moving from programming against SQL to programming against MongoDB. The database models differ enough that there is a steep initial learning curve.
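As a minimal sketch of the write concern in point 2, assuming a driver that accepts the standard MongoDB connection string options, acknowledged writes can be requested in the connection URI. The host name, database name and helper function below are hypothetical; `w`, `journal` and `wtimeoutMS` are standard connection string options.

```python
from urllib.parse import urlencode


def build_mongo_uri(host, database, w=1, journal=True, wtimeout_ms=5000):
    """Build a MongoDB connection URI that asks for acknowledged writes.

    w           -- number of members that must acknowledge each write
    journal     -- wait until the write has reached the on-disk journal
    wtimeout_ms -- stop waiting for acknowledgement after this many ms
    """
    options = urlencode({
        "w": w,
        "journal": "true" if journal else "false",
        "wtimeoutMS": wtimeout_ms,
    })
    return f"mongodb://{host}/{database}?{options}"


# Hypothetical server and database names, for illustration only.
uri = build_mongo_uri("dbserver01:27017", "recon")
print(uri)
```

Keeping the write concern in the connection string means it can be tightened or relaxed per environment without changing application code.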
MongoDB is free to download from the mongodb.org site. It is optimised to run on 64-bit operating systems, and the latest versions make use of newer features in Windows Server 2008 R2 and later to enhance performance. Running in a 32-bit environment is not recommended for a production application, because 32-bit versions of MongoDB are limited to around 2 GB of data.
MongoDB is designed to run on its own server or virtual server. It uses memory-mapped files, so it will use all the memory it can. Running it alongside SQL Server or another program that uses variable amounts of memory will cause performance issues. The more memory available to MongoDB, the better it will perform. RAM, hard disk space and server architecture depend on the particular product and the problem it is solving. The following configurations were used in two separate implementations.
A generic reconciliation product running in an environment with around 50 users accessing the data and 8 000 transactions imported monthly.
1. The application and Mongo run on the same server.
   a. This is not an ideal setup, but the server is dedicated to this application. No other database server runs on it.
   b. It is a small implementation and the setup is suited to the current volumes.
1. Dual Xeon server
2. 8 GB memory
3. 60 GB drive
4. Windows Server 2008 R2
A massive document collection and tagging application that processed well over 1 000 000 documents of varying sizes in the space of 4 months. Mongo and the Application were running on different virtual servers on the same physical machine.
1. 2 x 12-core 2 GHz processors = 24 logical processors
2. 192 GB memory
3. 500 GB SSD
4. 2 TB drive
5. Windows Server 2012
1. 2 virtual processors
2. 64 GB memory
3. Access to both drives on the physical machine
1. Extract MongoDB. It is self-contained and can be installed to any folder on any drive.
2. Run it as a Windows Service to ensure that it starts automatically. Starting it from the command line after server restarts or failures adds unnecessary complexity to a production environment.
3. Create a configuration file and use it to manage the instance rather than using command-line arguments. The configuration file settings are functionally the same, but are easier to edit and manage, especially in large-scale deployments.
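The steps above can be sketched as a small script that renders a configuration file and the command that registers the Windows Service. This is an illustration under stated assumptions: it uses the YAML configuration format (MongoDB 2.6 and later; older versions use a `key = value` format), and the `C:\mongodb` folder layout and helper names are hypothetical. The `--config` and `--install` options of `mongod` are the documented way to install the service.

```python
from pathlib import PureWindowsPath


def render_config(db_path, log_path, port=27017, bind_ip="127.0.0.1"):
    """Return the text of a YAML-format mongod configuration file."""
    return (
        "systemLog:\n"
        "  destination: file\n"
        f"  path: {log_path}\n"
        "  logAppend: true\n"
        "storage:\n"
        f"  dbPath: {db_path}\n"
        "net:\n"
        f"  port: {port}\n"
        f"  bindIp: {bind_ip}\n"
    )


def service_install_command(mongod_exe, config_path):
    """Return the argument list that registers mongod as a Windows Service."""
    return [str(mongod_exe), "--config", str(config_path), "--install"]


# Hypothetical install folder; MongoDB can be extracted anywhere.
root = PureWindowsPath(r"C:\mongodb")
config_text = render_config(root / "data", root / "log" / "mongod.log")
command = service_install_command(root / "bin" / "mongod.exe", root / "mongod.cfg")

print(config_text)
print(" ".join(command))
```

The script only prints the file contents and the command. Writing the file and running the command from an elevated prompt are left as manual steps so that nothing is changed on the server by accident.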
The level of configuration required depends on the implementation. Below are common administrator concerns and how to address them in MongoDB:
MongoDB is designed to run in a trusted environment, so authentication and authorisation are not enabled by default. It is expected that only trusted network access will be allowed. Use secure mode if running on a public network. When authentication is enabled, all clients must provide credentials to access the databases. Authorisation determines which databases and operations a user has access to.
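To illustrate how authorisation separates levels of access, the sketch below builds `createUser` command documents for two users. It assumes MongoDB 2.6 or later, where the `createUser` database command and the built-in `read` and `readWrite` roles exist; the user names, password placeholder and `recon` database are hypothetical.

```python
import json


def create_user_command(name, password, roles):
    """Build a createUser command document.

    roles is a list of (role, database) pairs, e.g. ("read", "recon").
    """
    return {
        "createUser": name,
        "pwd": password,
        "roles": [{"role": role, "db": db} for role, db in roles],
    }


# The application may read and write; the reporting user may only read.
app_user = create_user_command("recon_app", "change-me", [("readWrite", "recon")])
report_user = create_user_command("recon_reports", "change-me", [("read", "recon")])

print(json.dumps(report_user, indent=2))
```

A driver or the mongo shell would send such a document to the target database. Giving each application its own user with the narrowest workable role keeps a reporting tool from modifying data.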
Replication keeps multiple copies of the database on different servers. This increases redundancy and improves data availability. In high-risk or always-on environments, replica sets should be configured. Different copies of the data can then be used for tasks like reporting, backup and disaster recovery. A replica set always has one primary mongod instance, which accepts all write operations from the clients. The secondaries then asynchronously apply all changes made on the primary to their own data sets. If the primary fails, the replica set holds an election to select a new primary, ensuring automatic failover. The most common setup is one primary and two secondaries.
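The reason three members is the common setup follows from how elections work: a new primary needs a strict majority of votes. The sketch below, which assumes every member has one vote, builds a replica set configuration document of the shape accepted by `rs.initiate()` and shows how many failures each set size can survive. The set name `rs0` and the host names are hypothetical.

```python
def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts."""
    return {
        "_id": name,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }


def votes_needed(member_count):
    """A primary is elected by a strict majority of voting members."""
    return member_count // 2 + 1


def failures_tolerated(member_count):
    """Members that can be lost while a majority can still be formed."""
    return member_count - votes_needed(member_count)


config = replica_set_config("rs0", ["db1:27017", "db2:27017", "db3:27017"])

for n in (2, 3, 5):
    print(n, "members:", votes_needed(n), "votes needed,",
          failures_tolerated(n), "failures tolerated")
```

A two-member set cannot elect a primary after a single failure, whereas a three-member set can, which is why adding the second secondary matters.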
MongoDB supports sharding to handle large datasets and high-throughput deployments. Vertical scaling has a practical limit because it relies on increasing the CPU, storage and RAM of a single machine. In contrast, sharding (horizontal scaling) splits the data across multiple machines. A sharded cluster (the set of nodes in a sharded deployment) consists of three config servers, one or more shards (each a single mongod instance or a replica set) and one or more mongos instances to route reads and writes between the application and the shards.
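The idea of splitting data across machines by a shard key can be shown with a toy simulation. This is not MongoDB's algorithm: MongoDB divides the shard key space into chunk ranges and balances those chunks between shards, whereas the sketch below simply hashes each key and takes the remainder. It is only meant to show how a well-chosen key spreads documents evenly.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Pick a shard by hashing the shard key (toy routing rule)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(keys, shard_count):
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, shard_count) for k in keys)


# 10 000 hypothetical document keys spread over four shards.
counts = distribute(range(10000), 4)
for shard, total in sorted(counts.items()):
    print(f"shard {shard}: {total} documents")
```

The same key always routes to the same shard, so a read for one document touches one machine, while the overall load is shared between all of them.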
MongoDB provides three different backup methods. Mongodump has proved effective in the implementations detailed above.
1. Mongodump produces high-fidelity BSON files and is suitable for smaller implementations. Mongorestore populates a database with the contents of the BSON files. Mongodump competes for resources with applications that are modifying the data, so if replica sets are in use, run the command on a secondary.
2. Copy the underlying data files to create a backup.
3. Hosted services like the MongoDB Management Service (MMS) continually back up replica sets and sharded systems and can provide point-in-time recovery. (https://mms.mongodb.com/)
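A scheduled mongodump backup might be scripted along the following lines. This is a sketch, not the script used in the implementations described above: the host names, database name and backup folder are hypothetical, and the functions only build the argument lists. `--host`, `--db` and `--out` are standard mongodump options, and mongorestore accepts the dump folder as its final argument.

```python
import datetime


def mongodump_command(host, database, backup_root, when):
    """Return the mongodump argument list for a dated backup folder."""
    out_dir = f"{backup_root}/{database}-{when:%Y%m%d}"
    return ["mongodump", "--host", host, "--db", database, "--out", out_dir]


def mongorestore_command(host, dump_dir):
    """Return the mongorestore argument list for a previous dump."""
    return ["mongorestore", "--host", host, dump_dir]


# Dump from a secondary (db2) and restore to another server (db1).
dump = mongodump_command("db2:27017", "recon", "D:/backups", datetime.date(2014, 1, 31))
restore = mongorestore_command("db1:27017", dump[-1])

print(" ".join(dump))
print(" ".join(restore))
```

Either list can be passed to `subprocess.run(..., check=True)` from a scheduled task. Pointing the dump at a secondary keeps the backup load away from the primary.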
From experience, once MongoDB is configured and running, it needs very little support. The system can be monitored using the reporting tools available in MongoDB or through third-party self-hosted or SaaS tools. The MongoDB tools include the following:
1. Utilities to diagnose issues and assess normal operation
   a. Mongostat to report on the load distribution on the server and help with capacity planning
   b. Mongotop to check the database activity
2. A REST interface to configure monitoring and alert scripts
3. An HTTP console, a web page for simple diagnostic and monitoring tasks
4. Various commands that can be run from the command line
If performance degradation is experienced, some common causes are:
1. Locking
2. Memory usage
3. Page faults
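A monitoring script could watch two of these causes by comparing successive outputs of the `serverStatus` command. The sketch below assumes the `extra_info.page_faults` and `globalLock.currentQueue.total` fields that `serverStatus` reports; the two samples are hypothetical and trimmed to just those fields.

```python
def page_fault_rate(earlier, later, seconds_between):
    """Page faults per second between two serverStatus samples."""
    delta = later["extra_info"]["page_faults"] - earlier["extra_info"]["page_faults"]
    return delta / seconds_between


def queued_operations(status):
    """Operations currently waiting for a lock."""
    return status["globalLock"]["currentQueue"]["total"]


# Hypothetical samples taken 60 seconds apart.
sample_1 = {"extra_info": {"page_faults": 1200},
            "globalLock": {"currentQueue": {"total": 0}}}
sample_2 = {"extra_info": {"page_faults": 1800},
            "globalLock": {"currentQueue": {"total": 3}}}

rate = page_fault_rate(sample_1, sample_2, seconds_between=60)
print(f"{rate:.1f} page faults/s, {queued_operations(sample_2)} operations queued")
```

A steadily rising page fault rate suggests the working set no longer fits in memory, while a growing queue points to lock contention. Suitable alert thresholds depend on the workload and are not implied by these numbers.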
MongoDB is a very powerful database and can be a beneficial addition to any company. It is easy to install and can easily be configured to scale according to the application’s needs. MongoDB is stable in production and requires little support beyond regular monitoring. Organisations and DBAs should not hold back from implementing MongoDB projects simply because it is unfamiliar.