http://rickosborne.org/blog/2010/02/infographic-migrating-from-sql-to-mapreduce-with-mongodb/
Tuesday, November 4, 2014
Sunday, October 13, 2013
NoSQL Comparison: Redis, Cassandra, HBase, MongoDB, Neo4j
Data Model | Performance | Scalability | Flexibility | Complexity | Functionality | Security | Matching DB |
Key–value Stores | high | high | high | none | variable | Weak authentication, weak authorization, no audit function, no encryption | Redis, Memcached, DynamoDB |
Column Store | high | high | moderate | low | minimal | | Cassandra, HBase |
Document Store | high | variable | high | low | low | | MongoDB, CouchDB, CouchBase |
Graph Database | variable | variable | high | high | graph theory | | Neo4j, Titan |
Relational Database | variable | variable | low | moderate | high | Strong | Oracle, MS SQL, MySQL |
Name | Redis | Cassandra | HBase | MongoDB | Neo4j | MS SQL Server |
Description | In-memory database with configurable trade-offs between performance and persistence | Wide-column store based on ideas of BigTable and DynamoDB | Wide-column store based on Apache Hadoop and on concepts of BigTable | One of the most popular document stores | Open source graph database | Microsoft relational DBMS |
Website | | | | | | |
Server operating systems | BSD | BSD | Linux | Linux | Linux | Windows |
Database model | Key-value store | Wide column store | Wide column store | Document store | Graph DBMS | Relational DBMS |
Data schema | schema-free | schema-free | schema-free | schema-free | schema-free | yes |
Predefined data type | no | yes | no | yes | yes | yes |
Secondary indexes | no | restricted | no | yes | yes | yes |
APIs and other access methods | proprietary protocol | Cassandra Query Language | Java API | proprietary protocol using JSON | Cypher query language | OLE DB |
Supported programming languages (C#, JavaScript, Java, Python, Ruby, PowerShell) | C | C# | C | C | C# | C# |
Server-side scripts (Stored Procedure) | Lua | no | Coprocessor in Java | JavaScript | Server Plugin in Java | Transact-SQL and .NET languages |
Triggers | no | no | yes | no | yes | yes |
Partitioning methods | none | Sharding | Sharding | Sharding | none | tables can be distributed across several files (horizontal partitioning), but no sharding |
High Availability | Replication; automatic failover by Redis Sentinel | No automatic failover; relies on client failover (like MS SQL mirroring) | Implemented by HDFS and ZooKeeper; automatic failover | Replication; automatic failover | Replication; automatic failover | Cluster, Replication, Mirroring, Always On, etc.; automatic failover |
Replication methods | Master-slave replication | Selectable replication factor; peer-to-peer replication | Replication based on HDFS | Master-slave replication | Master-slave replication (Enterprise only) | Snapshot, transactional, and peer-to-peer replication |
MapReduce | no | yes | yes | yes | no | no |
Consistency in distributed system | n/a | Eventual Consistency | Immediate Consistency | Eventual Consistency | Eventual Consistency | n/a |
Foreign keys | no | no | no | no | yes | yes |
Transaction concepts | optimistic locking | no | no | no | ACID | ACID |
Concurrency | yes | yes | yes | yes | yes | yes |
Durability (Data Persistent) | yes | yes | yes | yes | yes | yes |
Access Control | Very simple password-based access control; no native LDAP support, but a third-party component can be used | Access rights for users can be defined per object; no native LDAP support | Access Control Lists (ACLs); depends on Hadoop and ZooKeeper | Users can be defined with full access or read-only access; supports LDAP authentication (v2.5) | IP-level restrictions; no native LDAP support, but a third-party component can be used | Users with a fine-grained authorization concept |
Specific characteristics | Redis strongly emphasizes performance. In any design decision, performance has priority over features or memory requirements. | Supports multi-data center replication | | | Can also be used server-less as an embedded Java database. | Is one of the "Big 3" commercial database management systems, besides Oracle and DB2 |
Best used | For rapidly changing data with a foreseeable data size (it should fit mostly in memory). | When you write more than you read; writes are faster than reads. | The best way to run Map/Reduce jobs on huge datasets. | Dynamic queries. If you prefer to define indexes, not Map/Reduce jobs. Good performance on big data. | For graph-style, rich or complex, interconnected data. | ….. |
Typical application scenarios | Applications that can hold all data in memory and that have high performance requirements: stock prices, real-time data collection, analytics, communication. | Distributed databases with many write operations; logging, data analysis | Search engines, log analysis | Most things you would do with MySQL, but where predefined columns hold you back. | Searching routes in social relations, public transport links, road maps, or network topologies | ….. |
Disadvantages | · Redis is an in-memory store: all your data must fit in memory. An RDBMS usually stores data on disk and caches part of it in memory, so it can manage more data than fits in memory; Redis cannot. · Redis is a data structure server. There is no query language and no support for relational algebra, so there are no ad hoc queries. All data accesses must be anticipated by the developer, and proper data access paths must be designed. A lot of flexibility is lost. · Redis offers two options for persistence: regular snapshotting and append-only files. Neither is as safe as a real transactional server providing redo/undo logging, block checksumming, point-in-time recovery, flashback capabilities, etc. · Redis only offers basic security at the instance level. · A single Redis instance is not scalable. It runs on only one CPU core in single-threaded mode. To get scalability, several Redis instances must be deployed and started. Distribution and sharding are done on the client side. | · No server-side automatic failover; client applications need to handle it. · Weak security control. · No join or subquery support, and limited support for aggregation. · Sorting of data is a design decision: it can be done in one of the predefined ways, and data is retrieved in that same order. There is nothing like ORDER BY or GROUP BY. · A single column value may not be larger than 2 GB (some users have unofficially stored blobs larger than 2 GB). | · The NameNode is a single point of failure. · Map/Reduce jobs are less efficient. · Relies on Hadoop and HDFS. | · If something crashes while it is updating 'table-contents', all data is lost. Repair takes a lot of time and usually ends in 50-90% data loss if you aren't lucky, so the only way to be fully safe is to have 2 replicas in different data centers. · Indexes take up a lot of RAM. They are B-tree indexes, and if you have many, you can run out of system resources really fast. · Data size in MongoDB is typically higher because, e.g., each document stores its field names. · Less flexibility for more complex querying (e.g. no joins). · No support for transactions. · Global lock for either one write or multiple reads, which makes concurrency less efficient. | · Lack of tool and framework support. · No support for ad hoc queries. | ….. |
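The Redis column above notes that distribution and sharding are done on the client side. A minimal sketch of that idea follows, written in Python for illustration. `FakeRedisNode` and `ShardedClient` are hypothetical names, and the nodes are plain in-memory dictionaries standing in for real Redis instances; no real client library is used.

```python
import hashlib


class FakeRedisNode:
    """Hypothetical in-memory stand-in for one Redis instance."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class ShardedClient:
    """Routes each key to one node by hashing the key on the client side."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def node_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash()) so
        # that every client process maps the same key to the same node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[index]

    def set(self, key, value):
        self.node_for(key).set(key, value)

    def get(self, key):
        return self.node_for(key).get(key)


# Three stand-in instances; the client decides where each key lives.
nodes = [FakeRedisNode(f"redis-{i}") for i in range(3)]
client = ShardedClient(nodes)
for i in range(100):
    client.set(f"user:{i}", i)
```

Simple modulo hashing like this remaps most keys whenever the number of nodes changes, which is why consistent hashing is a common alternative in practice.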
Friday, October 11, 2013
ServiceStack.Redis, C#
https://github.com/ServiceStack/ServiceStack.Redis
Redis Client API Overview
4 Types of RedisClients
You can choose the layer of abstraction to work with
1. ICacheClient:
2. IRedisNativeClient: methods map one-to-one to the native Redis commands of the same name, operating on raw binary data
3. IRedisClient: a higher-level Redis client API
4. IRedisTypedClient:
Using NuGet to Install
Example 1, IRedisNativeClient
Example 2, IRedisClient
Example 3, IRedisTypedClient
Example 4, Transaction
Example 5, Pub and Sub
Wednesday, October 9, 2013
NoSQL vs Relational Databases
Relational (Conventional) Database
NoSQL
Web Scale
Distributed Architecture
CAP Theorem
Consistency
Compromises
NoSQL Query
A MapReduce Example
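As an illustration of the MapReduce pattern named in this heading, here is a minimal single-process sketch in Python. The function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are assumptions made for this example; a real NoSQL engine would distribute the map and reduce steps across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


# Hypothetical input: each string stands in for one stored document.
documents = ["redis is fast", "mongodb is flexible", "redis is simple"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

The map step looks at one document at a time and the reduce step looks at one key at a time, which is what lets a database run both in parallel across shards.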
NoSQL Databases in the Cloud
NoSQL Categories
NoSQL vs Relational Databases
Performance
Appropriateness
Reasons for NoSQL
Reasons for Relational
BI and Big Data in NoSQL
BI
Big Data
Is NoSQL Right for Your Organization?
Training
OLTP Need?
Recruiting
Economics
Funding