Sunday, October 13, 2013

NoSQL Comparison: Redis, Cassandra, HBase, MongoDB, Neo4j

 

Data model | Performance | Scalability | Flexibility | Complexity | Functionality | Matching DB
Key-value stores | high | high | high | none | variable | Redis, Memcached, DynamoDB
Column stores | high | high | moderate | low | minimal | Cassandra, HBase
Document stores | high | variable | high | low | low | MongoDB, CouchDB, Couchbase
Graph databases | variable | variable | high | high | graph theory | Neo4j, Titan
Relational databases | variable | variable | low | moderate | high | Oracle, MS SQL, MySQL

Security: all four NoSQL models have weak authentication, weak authorization, no audit function and no encryption; relational databases have strong security.
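To make the "Data model" column more concrete, here is a rough sketch of how the same two hypothetical users, plus the fact that one follows the other, might be shaped in each NoSQL model. Plain Python dicts and lists stand in for the storage engines, and every key, field and function name is invented for illustration; it shows the shape of the data only, not any product's API.

```python
# Hypothetical sample data: user "alice" follows user "bob".
# Plain Python structures stand in for each storage model.
import json

# Key-value store: values are opaque blobs addressed only by key.
key_value = {
    "user:alice": json.dumps({"name": "Alice", "city": "Oslo"}),
    "user:bob": json.dumps({"name": "Bob", "city": "Lima"}),
    "follows:alice": json.dumps(["bob"]),
}

# Column store: row key -> column family -> column name -> value.
column_store = {
    "alice": {"profile": {"name": "Alice", "city": "Oslo"}, "follows": {"bob": ""}},
    "bob": {"profile": {"name": "Bob", "city": "Lima"}, "follows": {}},
}

# Document store: self-describing documents that can be filtered by field.
documents = [
    {"_id": "alice", "name": "Alice", "city": "Oslo", "follows": ["bob"]},
    {"_id": "bob", "name": "Bob", "city": "Lima", "follows": []},
]

# Graph database: nodes plus explicit, typed relationships between them.
nodes = {"alice": {"name": "Alice"}, "bob": {"name": "Bob"}}
edges = [("alice", "FOLLOWS", "bob")]


def get_value(key):
    """Key-value access: the only query is 'give me the value for this key'."""
    return json.loads(key_value[key])


def followees(user):
    """Column-style access: read one column family of a single row."""
    return sorted(column_store[user]["follows"])


def users_in_city(city):
    """Document-style query: filter on a field inside each document."""
    return [doc["_id"] for doc in documents if doc["city"] == city]


def followers_of(user):
    """Graph-style query: follow relationships that point at `user`."""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == user]


print(get_value("user:bob")["city"])  # Lima
print(followees("alice"))             # ['bob']
print(users_in_city("Oslo"))          # ['alice']
print(followers_of("bob"))            # ['alice']
```

Note that the key-value version can only be read back by key; finding users by city there would mean fetching and parsing every value.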

 

Name: Redis, Cassandra, HBase, MongoDB, Neo4j, MS SQL Server

Description
Redis: In-memory database with configurable trade-offs between performance and persistence
Cassandra: Wide-column store based on ideas of BigTable and Amazon Dynamo
HBase: Wide-column store based on Apache Hadoop and on concepts of BigTable
MongoDB: One of the most popular document stores
Neo4j: Open source graph database
MS SQL Server: Microsoft relational DBMS

Website
Redis: redis.io
Cassandra: cassandra.apache.org
HBase: hbase.apache.org
MongoDB: www.mongodb.org
Neo4j: neo4j.org
MS SQL Server: www.microsoft.com/sqlserver

Server operating systems
Redis: BSD, Linux, OS X, Windows
Cassandra: BSD, Linux, OS X, Windows
HBase: Linux, Unix, Windows(*)
MongoDB: Linux, OS X, Solaris, Windows
Neo4j: Linux, OS X, Windows
MS SQL Server: Windows

Database model
Redis: Key-value store
Cassandra: Wide column store
HBase: Wide column store
MongoDB: Document store
Neo4j: Graph DBMS
MS SQL Server: Relational DBMS

Data scheme
Redis: schema-free
Cassandra: schema-free
HBase: schema-free
MongoDB: schema-free
Neo4j: schema-free
MS SQL Server: yes

Predefined data types
Redis: no
Cassandra: yes
HBase: no
MongoDB: yes
Neo4j: yes
MS SQL Server: yes

Secondary indexes
Redis: no
Cassandra: restricted
HBase: no
MongoDB: yes
Neo4j: yes
MS SQL Server: yes
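The "Secondary indexes: no" entries for Redis and HBase mean that any lookup other than by primary key has to be maintained by the application itself. A minimal sketch of that pattern, assuming a Python dict stands in for the key-value store; the class KeyValueWithIndex and the city field are hypothetical, and no Redis or HBase client API is used.

```python
from collections import defaultdict


class KeyValueWithIndex:
    """Toy key-value store whose secondary index on one field is kept
    up to date by application code (hypothetical example)."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.data = {}                 # primary key -> record
        self.index = defaultdict(set)  # indexed value -> set of primary keys

    def _unindex(self, key, record):
        # Remove `key` from the index entry of the record's old value.
        value = record.get(self.indexed_field)
        keys = self.index.get(value)
        if keys is not None:
            keys.discard(key)
            if not keys:
                del self.index[value]

    def put(self, key, record):
        old = self.data.get(key)
        if old is not None:
            self._unindex(key, old)  # otherwise the old value would still find this key
        self.data[key] = record
        self.index[record.get(self.indexed_field)].add(key)

    def delete(self, key):
        old = self.data.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def get(self, key):
        return self.data.get(key)

    def find_by(self, value):
        # One index lookup instead of scanning every record.
        return sorted(self.index.get(value, ()))


store = KeyValueWithIndex("city")
store.put("user:1", {"name": "Ann", "city": "Oslo"})
store.put("user:2", {"name": "Ben", "city": "Oslo"})
store.put("user:2", {"name": "Ben", "city": "Rome"})  # moves user:2 in the index
print(store.find_by("Oslo"))  # ['user:1']
print(store.find_by("Rome"))  # ['user:2']
```

The design point is that every write path (put and delete) must update the index as well; forgetting one leaves stale entries, which is the bookkeeping a built-in secondary index would otherwise do for you.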


APIs and other access methods
Redis: proprietary protocol
Cassandra: Cassandra Query Language (CQL), Thrift
HBase: Java API, RESTful HTTP API, Thrift
MongoDB: proprietary protocol using JSON
Neo4j: Cypher query language, Java API, RESTful HTTP API
MS SQL Server: OLE DB, Tabular Data Stream (TDS), ADO.NET, JDBC, ODBC

Supported programming languages (C#, JavaScript, Java, Python, Ruby, PowerShell)
Redis: C, C#, C++, Java, JavaScript, Objective-C, Perl, PHP, Python, Ruby, …
Cassandra: C#, C++, Java, JavaScript, Perl, PHP, Python, Ruby, …
HBase: C, C#, C++, Java, PHP, Python, …
MongoDB: C, C#, C++, Java, JavaScript, Perl, PHP, PowerShell, Python, Ruby, …
Neo4j: C#, Java, JavaScript, Perl, PHP, Python, Ruby, …
MS SQL Server: C#, Java, PHP, Python, Ruby, Visual Basic

Server-side scripts (stored procedures)
Redis: Lua
Cassandra: no
HBase: coprocessors in Java
MongoDB: JavaScript
Neo4j: server plugins in Java
MS SQL Server: Transact-SQL and .NET languages

Triggers
Redis: no
Cassandra: no
HBase: yes
MongoDB: no
Neo4j: yes
MS SQL Server: yes

Partitioning methods
Redis: none
Cassandra: sharding
HBase: sharding
MongoDB: sharding
Neo4j: none
MS SQL Server: tables can be distributed across several files (horizontal partitioning), but no sharding

High availability
Redis: replication; automatic failover by Redis Sentinel
Cassandra: no automatic failover; relies on client failover (like MS SQL mirroring)
HBase: implemented by HDFS and ZooKeeper; automatic failover
MongoDB: replication; automatic failover
Neo4j: replication; automatic failover
MS SQL Server: clustering, replication, mirroring, AlwaysOn, etc.; automatic failover

Replication methods
Redis: master-slave replication
Cassandra: selectable replication factor; peer-to-peer replication
HBase: replication based on HDFS
MongoDB: master-slave replication
Neo4j: master-slave replication (Enterprise edition only)
MS SQL Server: snapshot, transactional, and peer-to-peer replication

MapReduce
Redis: no
Cassandra: yes
HBase: yes
MongoDB: yes
Neo4j: no
MS SQL Server: no

Consistency in distributed systems
Redis: n/a
Cassandra: eventual consistency, immediate consistency
HBase: immediate consistency
MongoDB: eventual consistency, immediate consistency
Neo4j: eventual consistency
MS SQL Server: n/a
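The "eventual consistency, immediate consistency" entries for Cassandra and MongoDB reflect that consistency can be tuned. With N replicas, a read that waits for R of them is guaranteed to contact at least one replica holding a write acknowledged by W replicas whenever R + W > N, because the two replica sets must then share a member. The sketch below only checks that arithmetic, with a brute-force cross-check; the function names are hypothetical, and it ignores node failures, read repair and conflict resolution.

```python
from itertools import combinations


def quorum(n):
    """Smallest majority of n replicas (what Cassandra calls QUORUM)."""
    return n // 2 + 1


def reads_see_latest_write(n, r, w):
    """Every read set of size r overlaps every write set of size w
    exactly when r + w > n (pigeonhole principle)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def overlap_always(n, r, w):
    """Brute-force check of the same property over all replica subsets."""
    replicas = range(n)
    return all(set(read) & set(write)
               for read in combinations(replicas, r)
               for write in combinations(replicas, w))


# Replication factor 3 with a few read/write consistency choices.
for r, w in [(1, 1), (quorum(3), quorum(3)), (1, 3)]:
    label = "immediate" if reads_see_latest_write(3, r, w) else "eventual"
    print(f"N=3, R={r}, W={w}: {label}")
# N=3, R=1, W=1: eventual
# N=3, R=2, W=2: immediate
# N=3, R=1, W=3: immediate
```

Reading and writing at QUORUM is the usual middle ground: with three replicas either one node can be down and reads still see the latest acknowledged write.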


Foreign keys
Redis: no
Cassandra: no
HBase: no
MongoDB: no
Neo4j: yes
MS SQL Server: yes

Transaction concepts
Redis: optimistic locking
Cassandra: no
HBase: no
MongoDB: no
Neo4j: ACID
MS SQL Server: ACID

Concurrency
Redis: yes
Cassandra: yes
HBase: yes
MongoDB: yes
Neo4j: yes
MS SQL Server: yes

Durability (data persistence)
Redis: yes
Cassandra: yes
HBase: yes
MongoDB: yes
Neo4j: yes
MS SQL Server: yes

Access control
Redis: very simple password-based access control; no native LDAP support, but third-party components can be used
Cassandra: access rights for users can be defined per object; no native LDAP support
HBase: access control lists (ACLs) depend on Hadoop and ZooKeeper
MongoDB: users can be defined with full access or read-only access; supports LDAP authentication (v2.5)
Neo4j: IP-level restrictions; no native LDAP support, but third-party components can be used
MS SQL Server: users with a fine-grained authorization concept

Specific characteristics
Redis: Redis places strong emphasis on performance: in every design decision, performance takes priority over features or memory requirements.
Cassandra: supports multi-data center replication
Neo4j: can also be used server-less as an embedded Java database
MS SQL Server: one of the "Big 3" commercial database management systems, alongside Oracle and DB2

Best used
Redis: for rapidly changing data with a foreseeable data size (it should mostly fit in memory)
Cassandra: when you write more than you read; writes are faster than reads
HBase: the best way to run Map/Reduce jobs on huge datasets
MongoDB: if you need dynamic queries, prefer defining indexes to writing Map/Reduce jobs, and need good performance on big data
Neo4j: for graph-style, rich or complex, interconnected data
MS SQL Server: …

Typical application scenarios
Redis: applications that can hold all data in memory and have high performance requirements, e.g. stock prices, real-time data collection, analytics, communication
Cassandra: distributed databases with many write operations; log and data analysis
HBase: search engines, log analysis
MongoDB: most things you would do with MySQL, but where predefined columns would hold you back
Neo4j: searching routes in social relations, public transport links, road maps, or network topologies
MS SQL Server: …
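The Neo4j scenario "searching routes in social relations, public transport links, road maps, or network topologies" is, at its core, a path query over relationships. As a rough illustration of the idea (not Neo4j's API or its Cypher query language), a breadth-first search over a hypothetical in-memory adjacency list finds a route with the fewest hops.

```python
from collections import deque


def shortest_route(graph, start, goal):
    """Breadth-first search: returns a path with the fewest hops, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by an equal or shorter path
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back through `previous` to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical "knows" relationships (undirected, so listed in both directions).
social = {
    "Ann": ["Ben", "Cat"],
    "Ben": ["Ann", "Dan"],
    "Cat": ["Ann", "Dan"],
    "Dan": ["Ben", "Cat", "Eve"],
    "Eve": ["Dan"],
}
print(shortest_route(social, "Ann", "Eve"))  # ['Ann', 'Ben', 'Dan', 'Eve']
```

A graph database stores the relationships directly, so each hop is a lookup of a node's neighbours; in a relational schema the same query typically needs one self-join per hop.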


Disadvantages
Redis:
· Redis is an in-memory store: all your data must fit in memory. An RDBMS usually stores data on disk and caches part of it in memory, so it can manage more data than you have memory. With Redis, you cannot.
· Redis is a data structure server. There is no query language and no support for relational algebra, so there are no ad-hoc queries. All data accesses must be anticipated by the developer, and proper data access paths must be designed in advance. A lot of flexibility is lost.
· Redis offers two persistence options: periodic snapshotting and append-only files. Neither is as safe as a real transactional server providing redo/undo logging, block checksumming, point-in-time recovery, flashback capabilities, etc.
· Redis only offers basic security at the instance level.
· A single Redis instance does not scale: it runs on one CPU core in single-threaded mode. To scale, several Redis instances must be deployed and started, and distribution and sharding are done on the client side.

Cassandra:
· No server-side automatic failover; client applications need to handle it.
· Weak security control.
· No join or subquery support, and limited support for aggregation.
· Sorting of data is a design decision: it can only be done in one of the predefined ways, and data is retrieved in that same order. That's all; there is nothing like ORDER BY or GROUP BY.
· A single column value may not be larger than 2 GB (some users have unofficially stored blobs larger than 2 GB).

HBase:
· The NameNode is a single point of failure.
· Map/Reduce jobs are less efficient.
· Relies on Hadoop and HDFS.

MongoDB:
· If something crashes while it is updating table contents, you risk losing data: repair takes a lot of time and, if you are unlucky, usually ends in 50-90% data loss. The only way to be fully safe is to keep two replicas in different data centers.
· Indexes take up a lot of RAM. They are B-tree indexes, and if you have many of them you can run out of system resources very quickly.
· Data size is typically larger because, for example, every document stores its own field names.
· Less flexibility for more complex queries (e.g. no joins).
· No support for transactions.
· A global lock allows either one write or multiple reads at a time, which makes concurrency less efficient.

Neo4j:
· Lack of tool and framework support.
· No support for ad-hoc queries.

MS SQL Server: …
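One of the Redis disadvantages above notes that distribution and sharding are done on the client side, meaning the client itself decides which instance owns each key. One common way to do that is consistent hashing, sketched minimally below. The class HashRing, the instance names and the choice of 100 virtual nodes per instance are assumptions for illustration, not the scheme used by any particular Redis client library.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring for client-side sharding (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        # First 8 bytes of MD5 as an integer: stable across processes,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.md5(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Several points per instance spread its share of keys around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        if not self._ring:
            raise LookupError("the ring has no instances")
        # The first point at or after the key's hash owns the key (wrapping around).
        i = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add("redis-d:6379")
moved = [key for key in keys if ring.node_for(key) != before[key]]
print(len(moved))  # only the keys taken over by redis-d move
```

Compared with a plain hash modulo the number of instances, adding a fourth instance here only moves the keys the new instance takes over (roughly a quarter in expectation); with modulo hashing, going from three to four instances would remap about three quarters of the keys.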
