Data Model | Performance | Scalability | Flexibility | Complexity | Functionality | Security | Matching DB |
Key-value Stores | high | high | high | none | variable | Weak authentication; weak authorization; no audit function; no encryption | Redis, Memcached, DynamoDB |
Column Store | high | high | moderate | low | minimal | | Cassandra, HBase |
Document Store | high | variable | high | low | low | | MongoDB, CouchDB, CouchBase |
Graph Database | variable | variable | high | high | graph theory | | Neo4j, Titan |
Relational Database | variable | variable | low | moderate | high | Strong | Oracle, MS SQL, MySQL |
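To make the data models in the table above more concrete, here is a minimal sketch of how one small record might look under each model. It uses only plain Python structures and the standard-library `sqlite3` module as a stand-in for a relational database; the record, the names (`kv_store`, `column_store`, and so on) and the layout are assumptions chosen for illustration, not the storage format of any product listed.

```python
import sqlite3

# Key-value store: one opaque value looked up by a single key.
kv_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# Column store: row key -> column family -> column -> value.
column_store = {"42": {"profile": {"name": "Ada", "city": "London"}}}

# Document store: a nested, self-describing document per record.
document_store = {
    "users": [{"_id": 42, "name": "Ada", "address": {"city": "London"}}]
}

# Graph database: nodes plus typed edges between them.
graph_nodes = {42: {"name": "Ada"}, 7: {"name": "Bob"}}
graph_edges = [(42, "FOLLOWS", 7)]

# Relational database: a fixed schema declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute("INSERT INTO users VALUES (?, ?, ?)", (42, "Ada", "London"))
row = conn.execute(
    "SELECT name, city FROM users WHERE id = ?", (42,)
).fetchone()
```

Only the relational variant needs its columns declared up front, which is one way to read the "low" flexibility rating in that row.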
Name | Redis | Cassandra | HBase | MongoDB | Neo4j | MS SQL Server |
Description | In-memory database with configurable trade-offs between performance and persistence | Wide-column store based on ideas of BigTable and DynamoDB | Wide-column store based on Apache Hadoop and on concepts of BigTable | One of the most popular document stores | Open source graph database | Microsoft relational DBMS |
Server operating systems | BSD | BSD | Linux | Linux | Linux | Windows |
Database model | Key-value store | Wide column store | Wide column store | Document store | Graph DBMS | Relational DBMS |
Data schema | schema-free | schema-free | schema-free | schema-free | schema-free | yes |
Predefined data types | no | yes | no | yes | yes | yes |
Secondary indexes | no | restricted | no | yes | yes | yes |
APIs and other access methods | proprietary protocol | Cassandra Query Language | Java API | proprietary protocol using JSON | Cypher query language | OLE DB |
Supported programming languages (C#, JavaScript, Java, Python, Ruby, PowerShell) | C | C# | C | C | C# | C# |
Server-side scripts (Stored Procedure) | Lua | no | Coprocessor in Java | JavaScript | Server Plugin in Java | Transact-SQL and .NET languages |
Triggers | no | no | yes | no | yes | yes |
Partitioning methods | none | Sharding | Sharding | Sharding | none | tables can be distributed across several files (horizontal partitioning), but no sharding |
High Availability | Replication; automatic failover via Redis Sentinel | No automatic failover; relies on client failover (like MS SQL mirroring) | Implemented by HDFS and ZooKeeper; automatic failover | Replication; automatic failover | Replication; automatic failover | Cluster, Replication, Mirroring, Always On, etc.; automatic failover |
Replication methods | Master-slave replication | Selectable replication factor; peer-to-peer replication | Replication based on HDFS | Master-slave replication | Master-slave replication (Enterprise only) | Snapshot, transactional, and peer-to-peer replication |
MapReduce | no | yes | yes | yes | no | no |
Consistency in distributed system | n/a | Eventual Consistency | Immediate Consistency | Eventual Consistency | Eventual Consistency | n/a |
Foreign keys | no | no | no | no | yes | yes |
Transaction concepts | optimistic locking | no | no | no | ACID | ACID |
Concurrency | yes | yes | yes | yes | yes | yes |
Durability (Data Persistent) | yes | yes | yes | yes | yes | yes |
Access Control | Very simple password-based access control; no native LDAP support, but a third-party component can be used | Access rights for users can be defined per object; no native LDAP support | Access Control Lists (ACLs); depends on Hadoop and ZooKeeper | Users can be defined with full access or read-only access; supports LDAP authentication (v2.5) | IP-level restrictions; no native LDAP support, but a third-party component can be used | Users with a fine-grained authorization concept |
Specific characteristics | Redis strongly emphasizes performance. In any design decision, performance has priority over features or memory requirements. | Supports multi-data-center replication | | | Can also be used server-less as an embedded Java database. | One of the "Big 3" commercial database management systems, alongside Oracle and DB2 |
Best used | For rapidly changing data with a foreseeable data size (should fit mostly in memory). | When there are more writes than reads (writes are faster than reads). | For running Map/Reduce jobs on huge datasets. | For dynamic queries, and if you prefer to define indexes rather than Map/Reduce jobs. Good performance on big data. | For graph-style, rich or complex, interconnected data. | ….. |
Typical application scenarios | Applications that can hold all data in memory and have high performance requirements: stock prices, real-time data collection, analytics, communication. | Distributed databases with many write operations: logging, data analysis | Search engines, log analysis | For most things you would do with MySQL, but where predefined columns would hold you back. | Searching routes in social relations, public transport links, road maps, or network topologies | ….. |
Disadvantages | · Redis is an in-memory store: all your data must fit in memory. An RDBMS usually stores the data on disk and caches part of it in memory, so it can manage more data than you have memory. With Redis, you cannot. · Redis is a data structure server. There is no query language and no support for relational algebra. No ad-hoc queries. All data accesses must be anticipated by the developer, and proper data access paths must be designed. A lot of flexibility is lost. · Redis offers two options for persistence: regular snapshotting and append-only files. Neither is as safe as a real transactional server providing redo/undo logging, block checksumming, point-in-time recovery, flashback capabilities, etc. · Redis only offers basic security at the instance level. · A single Redis instance is not scalable. It runs on only one CPU core in single-threaded mode. To get scalability, several Redis instances must be deployed and started. Distribution and sharding are done on the client side. | · No server-side automatic failover. Client applications need to handle it. · Weak security control · No join or subquery support, limited support for aggregation · Sorting of data is a design decision; it can be done in one of the predefined ways, and data can be retrieved in that same order, but that is all: there is nothing like ORDER BY or GROUP BY · A single column value may not be larger than 2GB (some users have unofficially stored blob files larger than 2GB). | · NameNode is a single point of failure. · Map/Reduce jobs are less efficient · Relies on Hadoop and HDFS | · If something crashes while it is updating 'table-contents', all data is lost. Repair takes a lot of time and usually ends in 50-90% data loss if you aren't lucky. So the only way to be fully secure is to have 2 replicas in different data centers. · Indexes take up a lot of RAM. They are B-tree indexes, and if you have many, you can run out of system resources really fast. · Data size in MongoDB is typically higher because, for example, each document has its field names stored in it · Less flexibility with more complex querying (e.g. no joins) · No support for transactions · Global lock for either a write or multiple reads, which makes concurrency less efficient. | · Lack of tools and framework support · No support for ad-hoc queries | ….. |
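The Redis column above notes that distribution and sharding are done on the client side. A minimal sketch of that idea, assuming a simple hash-and-modulo scheme: the client hashes each key and uses the result to choose which instance to talk to. The instance addresses and the `pick_instance` helper are hypothetical names introduced here for illustration, and no Redis client library is involved.

```python
import hashlib

# Hypothetical instance addresses; not taken from the original post.
INSTANCES = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]


def pick_instance(key: str, instances=INSTANCES) -> str:
    """Map a key to one instance using a stable hash of the key."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(instances)
    return instances[index]


# The same key always routes to the same instance.
target = pick_instance("user:42")
```

With plain modulo hashing like this, changing the number of instances remaps most keys; consistent hashing is a common way to reduce that churn.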