
Comparison of Oracle, NoSQL, and NewSQL database technology

About the author: John Ryan is an experienced data warehouse architect, developer, and database administrator. He specializes in Kimball dimensional design on multi-terabyte Oracle systems and has more than 30 years of IT experience across industries as varied as mobile telephony and investment banking. This article was first published as part of a series on databases and big data.

01 The world has changed

The world has changed dramatically over the past 20 years. In 2000, only a few million people were on the Internet, typically through a 56k modem attached to a desktop computer, and Amazon sold only books. Today, billions of people are online around the clock, seven days a week, on smartphones or tablets. They buy almost everything online and interact with each other through social apps such as Facebook, Twitter, and Instagram. The trend is unstoppable.

Expectations have changed too. If a page does not refresh within a few seconds, we lose patience and go to another site. If a website is unreachable, we fear it is the end of civilization as we know it. If a major site is down, it makes global headlines.

"Instant gratification takes too long!" (Ladawn Clare-Panton)

Note: if you are not an experienced database architect, you may want to read my earlier articles on scalability and database architecture first.

02 What has changed?

The following conclusions can be drawn from the above:

  • Scalability: With potentially explosive traffic growth, IT systems need to scale up quickly to cope with exponential growth in transactions.
  • High availability: IT systems must run 24 hours a day, seven days a week, and must tolerate failures. (A 2011 outage at Bank of America affected 29 million customers for six days.)
  • High performance: As systems scale, performance has to keep up and remain stable and fast. At the extreme, Amazon estimates that each additional second of page load time costs the company $1.6 billion a year.
  • Velocity: More and more devices ship with networked sensors (the smartphone being the obvious example), so millions of transactions may need to be processed per second.
  • Real-time analytics: Overnight batch processing and business intelligence are no longer enough. The boundary between analytical and operational processing is blurring, and the demand for real-time decision-making is growing.

"The Internet of Things is sending velocity through the roof!" (Dr. Stonebraker, MIT)

These demands have produced the marketing term "translytical database": a hybrid solution that handles both high-volume transactions and real-time analytics in the same system.

03 What's the problem?

Delivering high performance while reducing costs (perhaps by running on commodity servers) is a challenge for every database vendor. The requirements, however, conflict with one another:

  • Performance: Minimize latency and complete transactions within milliseconds.
  • Availability: Keep running even if one or more nodes fail or are cut off from the network.
  • Scalability: Keep scaling to meet the demands of massive data volumes and transaction rates.
  • Consistency: Provide consistent, accurate results, especially in the event of a network failure.
  • Durability: Ensure that a change, once committed, is never lost.
  • Flexibility: Provide a general-purpose database solution that supports both transactional and analytical workloads.

The only realistic way to achieve massive, incremental scalability is to deploy a scale-out distributed system. Typically, to maximize availability, changes made on one node are immediately replicated to two or more other nodes. However, once data is distributed across multiple servers, trade-offs appear. For example:

3.1 Performance versus availability and durability

Many NoSQL databases replicate data to other nodes in the cluster to improve availability. If a database node crashes immediately after a write, the data still exists on other machines, so the change is durable. This requirement can be relaxed, however: the write can return immediately without waiting for replication. That maximizes performance but risks losing the change, which may turn out not to be durable after all.

▲ Geographically distributed systems
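The trade-off can be made concrete with a small simulation. This is only a sketch under stated assumptions: the `Node` and `Cluster` classes and the `wait_for_replicas` flag are hypothetical names invented for illustration and do not correspond to the API of any particular NoSQL product.

```python
class Node:
    """A hypothetical in-memory database node."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class Cluster:
    """One primary plus replicas; illustrative only."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.pending = []  # changes acknowledged but not yet replicated

    def write(self, key, value, wait_for_replicas):
        """Apply a write on the primary and acknowledge it to the client."""
        self.primary.data[key] = value
        if wait_for_replicas:
            # Durable path: copy to every replica before acknowledging.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Fast path: acknowledge now, replicate later.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        """Background replication; nothing is sent if the primary crashed."""
        if not self.primary.alive:
            self.pending.clear()  # the queued changes died with the primary
            return
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


def surviving_value(cluster, key):
    """What a client can still read from a replica once the primary is gone."""
    return cluster.replicas[0].data.get(key)


safe = Cluster(Node("primary"), [Node("r1"), Node("r2")])
safe.write("balance", 100, wait_for_replicas=True)
safe.primary.alive = False
safe.flush()

fast = Cluster(Node("primary"), [Node("r1"), Node("r2")])
fast.write("balance", 100, wait_for_replicas=False)
fast.primary.alive = False  # crash right after the acknowledgement
fast.flush()

print(surviving_value(safe, "balance"))  # 100
print(surviving_value(fast, "balance"))  # None
```

Both writes were acknowledged to the client, but only the one that waited for replication survives the crash. The fast path avoids the replication round trip at the price of possibly losing the change.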

3.2 Consistency versus availability

NoSQL databases support eventual consistency. For example, in the diagram above, if the network connection to New York is temporarily lost, there are two options:

  • Stop processing: New York loses availability.
  • Accept reads and writes: Reconcile the differences once the connection is restored. This risks returning stale or incorrect results, and conflicting writes may need to be resolved.

Clearly, NoSQL databases trade consistency for availability.
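The second option can be sketched as follows. The example assumes a last-write-wins rule based on a logical timestamp, which is one common reconciliation strategy but not necessarily what any given database uses. The `Site` class, the `reconcile` function, and the London site are hypothetical and added only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: int
    timestamp: int  # hypothetical logical clock supplied by the writer


class Site:
    """One data center holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, timestamp):
        self.store[key] = Versioned(value, timestamp)

    def read(self, key):
        return self.store[key].value


def reconcile(a, b):
    """Last-write-wins merge, run once the network link is restored."""
    for key in set(a.store) | set(b.store):
        va, vb = a.store.get(key), b.store.get(key)
        if va is None or (vb is not None and vb.timestamp > va.timestamp):
            winner = vb
        else:
            winner = va
        a.store[key] = winner
        b.store[key] = winner


london, new_york = Site("London"), Site("New York")
for site in (london, new_york):
    site.write("stock", 10, timestamp=1)

# The link to New York drops; both sites keep accepting writes.
london.write("stock", 7, timestamp=2)
new_york.write("stock", 9, timestamp=3)
during_partition = (london.read("stock"), new_york.read("stock"))

reconcile(london, new_york)
after_repair = (london.read("stock"), new_york.read("stock"))

print(during_partition)  # (7, 9): the two sites disagree
print(after_repair)      # (9, 9): consistent again, but London's update is gone
```

Both sites stayed available throughout, yet readers saw different values during the outage and one of the two writes was silently discarded on repair.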

3.3 Flexibility versus scalability

Compared with general-purpose relational databases such as Oracle and DB2, NoSQL databases are relatively inflexible; many, for example, do not support join operations. Besides the many that do not support the SQL language, some databases (for example, Neo4J and MongoDB) are designed for specific problem domains: graph processing and JSON data structures, respectively. Even databases such as HBase, Cassandra, and Redis abandon relational joins, and many also restrict access to a single primary key with no support for secondary indexes.

"Many databases claim to support 100% ACID transactions. In reality, few provide formal ACID guarantees." (Dr. Peter Bailis, Stanford University)

04 ACID versus eventual consistency

One of the main challenges in scaling database solutions is maintaining ACID consistency. Amazon's DynamoDB relaxes consistency constraints in exchange for speed. This solved the performance problem and led to a wave of NoSQL databases. Moreover, even the most successful databases (including Oracle) do not provide true ACID isolation. Of 18 databases surveyed, only three (VoltDB, Ingres, and Berkeley DB) supported serializability by default. The main reason is that serializability is hard to provide while maintaining performance. Eventual consistency is a particularly weak model.

"The system can return any data and still be eventually consistent." (Dr. Peter Bailis, Stanford)

Eventual consistency provides almost no consistency guarantees. The figure below illustrates the problem. One user withdraws $1 million from a bank account, but before the change is replicated, another user checks the balance. The only guarantee is that, provided there are no further writes, the system will eventually return a consistent result. How is that useful, let alone acceptable?

▲ Cassandra: eventual consistency
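The bank account scenario can be reproduced in a few lines. This is a minimal sketch, not a model of Cassandra itself: the `EventuallyConsistentAccount` class and the opening balance of $1.5 million are assumptions made for illustration.

```python
from collections import deque


class EventuallyConsistentAccount:
    """Two copies of one balance with asynchronous replication."""

    def __init__(self, balance):
        self.primary = balance
        self.replica = balance
        self.replication_queue = deque()

    def withdraw(self, amount):
        # The write is applied to the primary and queued for the replica.
        self.primary -= amount
        self.replication_queue.append(-amount)

    def read_from_replica(self):
        return self.replica

    def replicate(self):
        # Eventually the queued changes reach the replica.
        while self.replication_queue:
            self.replica += self.replication_queue.popleft()


account = EventuallyConsistentAccount(balance=1_500_000)
account.withdraw(1_000_000)

stale = account.read_from_replica()
account.replicate()
fresh = account.read_from_replica()

print(stale)  # 1500000: the withdrawal is not visible yet
print(fresh)  # 500000: correct only after replication catches up
```

The second user's read is served from the replica and returns the old balance. Nothing in the model bounds how long that window lasts, which is exactly the weakness the article describes.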

05 Rethinking the OLTP database

Ten years ago, Dr. Michael Stonebraker wrote the paper "The End of an Architectural Era", arguing that the 1970s-era database architecture of Oracle, Microsoft, and IBM was obsolete. He proposed that an OLTP database should have the following characteristics:

  • Dedicated to a single problem: Quickly execute short, predefined (not ad hoc) transactions with relatively simple query plans. In short, a dedicated OLTP platform.
  • ACID compliant: All transactions run single-threaded, providing full serializability by default.
  • Always available: Use data replication (not a hot standby) to provide high availability at almost no additional cost.
  • Geographically distributed: Run seamlessly on a grid of dispersed machines, which further improves resilience and gives better local performance.
  • Shared-nothing architecture: Multiple machines are connected in a peer-to-peer grid and share the load. Adding a machine is a seamless operation with no downtime, and losing a node causes only a slight performance degradation rather than bringing down the whole system.
  • Memory based: Run entirely in memory for absolute speed, with durability provided by in-memory replication to other nodes.
  • Bottlenecks eliminated: Completely redesign the database internals to run single-threaded, removing redo logging and the need for locking and latching, which are the most significant constraints on database performance.
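The idea behind single-threaded execution can be sketched as follows. This is an illustration of the general technique, not H-Store's implementation: the `Partition` class, its ticket mechanism, and the `transfer` procedure are hypothetical. Each partition's data is touched by exactly one worker thread, so transactions are serialized by construction and the data itself needs no locks or latches.

```python
import queue
import threading


class Partition:
    """One shard of data owned by exactly one worker thread."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        # Transactions execute one at a time, in arrival order.
        while True:
            job = self.inbox.get()
            if job is None:  # shutdown signal
                break
            procedure, args, ticket = job
            ticket["result"] = procedure(self.data, *args)
            ticket["done"].set()

    def submit(self, procedure, *args):
        """Queue a predefined procedure and return a ticket for its result."""
        ticket = {"done": threading.Event(), "result": None}
        self.inbox.put((procedure, args, ticket))
        return ticket

    def stop(self):
        self.inbox.put(None)
        self.worker.join()


def transfer(data, src, dst, amount):
    """A predefined transaction: it either applies fully or not at all."""
    if data.get(src, 0) < amount:
        return False
    data[src] -= amount
    data[dst] = data.get(dst, 0) + amount
    return True


partition = Partition({"alice": 100, "bob": 0})
tickets = []


def client():
    tickets.append(partition.submit(transfer, "alice", "bob", 30))


# Five clients submit concurrently; the partition still executes serially.
clients = [threading.Thread(target=client) for _ in range(5)]
for c in clients:
    c.start()
for c in clients:
    c.join()
for ticket in tickets:
    ticket["done"].wait()
partition.stop()

succeeded = sum(1 for ticket in tickets if ticket["result"])
print(succeeded, partition.data)  # 3 {'alice': 10, 'bob': 90}
```

Only three of the five transfers can succeed with a starting balance of 100, and the final state is the same whichever order the clients happen to arrive in. The only synchronized structure is the inbox queue; the account data is never shared between threads.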

To show that this was feasible, he built a prototype, the H-Store database, and demonstrated TPC-C benchmark performance 82 times that of a commercial competitor on the same hardware. The H-Store prototype processed 70,000 transactions per second, while the commercial competitor managed only 850 despite considerable tuning effort by database administrators.
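The 82-times figure follows directly from the two throughput numbers quoted above, as a quick check shows (the variable names are illustrative):

```python
h_store_tps = 70_000  # transactions per second reported for the H-Store prototype
rival_tps = 850       # transactions per second reported for the tuned competitor

speedup = h_store_tps / rival_tps
print(round(speedup, 1))  # 82.4
```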

06 Nothing is impossible!

Dr. Stonebraker's achievement is impressive. The previous TPC-C world record was about 1,000 transactions per CPU core, yet H-Store, running on a dual-core 2.8 GHz desktop computer, was 35 times faster than that record. His 2008 paper "OLTP Through the Looking Glass" explains why commercial databases (including Oracle) perform so poorly.

▲ Processing resource consumption of a relational database

As shown above, 93% of processing resources in a traditional (legacy) database system are consumed by overhead, including locking, latching, and buffer management. In total, only 7% of machine resources are devoted to the task at hand. Simply by eliminating these bottlenecks and using memory-based rather than disk-based processing, H-Store achieved the seemingly impossible: full ACID transactional consistency together with a speedup of several orders of magnitude.

07 NewSQL database technology


VoltDB, first released in 2010, is the commercial product that grew out of the H-Store prototype. It is a dedicated OLTP platform for web-scale transaction processing and real-time analytics. As the infographic shows, of some 250 commercial database solutions, only 13 are classed as NewSQL technology.

08 VoltDB

Like other NewSQL databases, VoltDB is designed to run entirely in memory, with the option of taking periodic disk snapshots. It runs on-premises on 64-bit Linux or in the cloud on AWS, Google, and Azure, and uses a horizontally scalable architecture.

Traditional relational databases write changes to disk-based log files. VoltDB instead applies changes in memory on multiple machines at the same time. For example, a K-safety factor of 2 guarantees no data loss even if two machines fail, because the data is held on at least three in-memory nodes. Transactions are submitted as Java stored procedures, which can be executed asynchronously in the database. Data is automatically partitioned (sharded) across the nodes in the system, although reference data can be replicated to maximize join performance. Unusually, VoltDB also supports semi-structured data in the form of JSON data structures.

In terms of performance, a 2015 benchmark showed VoltDB processing at least twice as fast as the NoSQL database Cassandra at only one-sixth of the AWS cloud processing cost. Finally, VoltDB version 6.4 passed the extremely stringent Jepsen distributed safety tests. By comparison, earlier tests of the NoSQL database Riak showed it dropping 30-70% of writes even with the strongest consistency setting, and Cassandra lost up to 5% of writes when using lightweight transactions.
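The partitioning and K-safety behaviour described above can be sketched in a few functions. This is a simplified model under stated assumptions: the hash-based partitioning, the round-robin replica placement, and all function names are hypothetical and are not VoltDB's actual algorithms. The sketch only illustrates why K + 1 copies on distinct nodes tolerate K failures.

```python
import hashlib
from itertools import combinations


def partition_for(key, partition_count):
    """Map a key to a partition with a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def replica_nodes(partition, node_count, k_safety):
    """Place a partition on k_safety + 1 distinct nodes (assumed placement)."""
    if k_safety + 1 > node_count:
        raise ValueError("need at least k_safety + 1 nodes")
    return [(partition + i) % node_count for i in range(k_safety + 1)]


def survives(failed_nodes, node_count, partition_count, k_safety):
    """True if every partition still has at least one live copy."""
    failed = set(failed_nodes)
    return all(
        set(replica_nodes(p, node_count, k_safety)) - failed
        for p in range(partition_count)
    )


NODES, PARTITIONS, K = 5, 5, 2

# Every possible pair of failed nodes leaves all partitions readable.
two_failures_ok = all(
    survives(pair, NODES, PARTITIONS, K)
    for pair in combinations(range(NODES), 2)
)
# Losing three nodes that hold all copies of one partition does not.
three_failures_ok = survives([0, 1, 2], NODES, PARTITIONS, K)

print(replica_nodes(partition_for("customer-42", PARTITIONS), NODES, K))
print(two_failures_ok, three_failures_ok)  # True False
```

With K = 2, each partition lives on three nodes, so no combination of two failures can remove every copy. A third failure can, which is why the guarantee is stated in terms of K.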

09 MemSQL

Like VoltDB, MemSQL is a horizontally scalable, in-memory distributed database designed for fast data ingestion and real-time analytics. It too runs on-premises or in the cloud, automatically partitions data across nodes, and executes queries in parallel on every CPU core.

Despite the many similarities with VoltDB, the figure above shows an important difference. MemSQL tries to balance the conflicting demands of real-time transactions and data-warehouse-style processing of historical data. To do so, it holds data in memory as a row store, backed by a column-oriented disk store, so that real-time (recent) data can be combined with historical results. This places it firmly in both the OLTP and the data warehouse space, although both solutions target the real-time data ingestion and analytics market.
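The row store plus column store combination can be illustrated with a toy table. This is a conceptual sketch only: the `HybridTable` class and its methods are hypothetical, both stores live in ordinary Python memory here, and nothing in it reflects MemSQL's real storage engine or SQL interface.

```python
class HybridTable:
    """Recent rows in a row store, older rows in a column store."""

    def __init__(self, columns):
        self.columns = columns
        self.row_store = []  # list of dicts, one per recent row
        self.column_store = {name: [] for name in columns}  # name -> values

    def insert(self, **row):
        """New data always lands in the row store first."""
        self.row_store.append(row)

    def flush_to_column_store(self):
        """Move all recent rows into the columnar history."""
        for row in self.row_store:
            for name in self.columns:
                self.column_store[name].append(row[name])
        self.row_store.clear()

    def total(self, column):
        """Aggregate across historical and recent data in one query."""
        historical = sum(self.column_store[column])
        recent = sum(row[column] for row in self.row_store)
        return historical + recent


sales = HybridTable(["region", "amount"])
sales.insert(region="EU", amount=120)
sales.insert(region="US", amount=80)
sales.flush_to_column_store()          # these rows are now history
sales.insert(region="EU", amount=50)   # fresh, still in the row store

print(sales.total("amount"))  # 250
```

Row-oriented storage suits fast single-row writes, while column-oriented storage suits scans over one attribute. A query that reads both, as `total` does, sees recent and historical data together.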

10 Which applications need NewSQL technology?

NewSQL suits any application that requires very fast ingestion and response times (1-2 milliseconds on average) and also needs the transactional accuracy that ACID guarantees provide, such as customer billing. Typical applications include:

  • Real-time authorization: For example, validating, recording, and authorizing mobile phone calls for analysis and billing. Typically, 99.999% of database operations must complete within 50 milliseconds.
  • Real-time fraud detection: Running complex analytical queries to determine accurately how likely a transaction is to be fraudulent before it is authorized.
  • Gaming analytics: Adjusting game difficulty dynamically in real time according to player ability and typical player behaviour. The goal is to retain existing players and convert free users into paying players. One customer using these techniques, where speed, high availability, and accuracy are all required, increased player spending by 40%.
  • Personalized web advertising: Selecting personalized web adverts dynamically in real time, recording each ad impression for billing purposes, and keeping the outcome for later analysis.
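A latency requirement such as "99.999% of operations within 50 milliseconds" can be checked against measurements as shown below. This is a minimal sketch: the function names are illustrative and the latency samples are made-up numbers, not benchmark results.

```python
def fraction_within(latencies_ms, limit_ms):
    """Share of operations that finished within the limit."""
    if not latencies_ms:
        raise ValueError("no measurements")
    within = sum(1 for latency in latencies_ms if latency <= limit_ms)
    return within / len(latencies_ms)


def meets_target(latencies_ms, limit_ms=50, target=0.99999):
    """True if the required share of operations met the latency limit."""
    return fraction_within(latencies_ms, limit_ms) >= target


# Hypothetical measurements: one million operations, mostly at 2 ms.
good_run = [2] * 999_995 + [75] * 5     # 5 slow operations
bad_run = [2] * 999_900 + [75] * 100    # 100 slow operations

print(meets_target(good_run))  # True
print(meets_target(bad_run))   # False
```

At this level, a run of one million operations can afford only ten that exceed the limit, so an average response time of 1-2 milliseconds says little by itself. The tail of the distribution is what determines whether the target is met.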

Compared with the vast majority of OLTP applications, these may seem unremarkable at first. In a world that is online 24 hours a day, seven days a week, however, they represent a new frontier for real-time analytics, and with the rise of the Internet of Things, a huge opportunity.

11 Conclusion

Although Hadoop is more closely associated with big data and has received a great deal of attention lately, database technology remains the cornerstone of any IT system. Likewise, NoSQL databases offer a fast, scalable alternative to relational databases, but despite the lure of license-free open source databases, you get what you pay for. Moreover, as VoltDB shows, a NewSQL database may actually be cheaper in the long run than the NoSQL alternative. Overall, if you have web-scale, OLTP, and/or real-time analytics requirements, you need to consider the NewSQL class of databases.
