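b********r Posts: 620 | 1 I'm about to start a project and have no clear direction, or rather too many possible directions. I'd appreciate corrections and advice from the experts here.
Our current development environment is C/C++ with Oracle. The rest is in English, which makes it easier to describe.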
There are two services, service_a and service_b.
service_a mainly runs a calculation for a given entity. If the calculated
result differs from what is persisted in the Oracle DB, it updates the
corresponding rows for that entity in the Oracle DB. Entities are
received continuously from some queuing services.
service_b mainly reads from the Oracle DB periodically, let us say every 4
hours. It reads all entities that service_a updated in the past 4 hours,
then writes them in a certain text file format to Amazon S3 for storage.
A little more background: in the past (before 2012), service_b read all
entities, not just the ones service_a had updated in the past 4 hours, to
create a complete snapshot of all entities for various clients. It then
turned out that Oracle could not handle that many concurrent writes and
reads, or degraded too much, so right before I joined this small company
someone changed the logic so that service_b reads only the delta (updated
entities) from the past 4 hours. They created another process that merges
the delta into the base to create the final snapshot somewhere else.
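As a rough illustration of the delta-into-base merge described above, here is a minimal Python sketch. The names `merge_delta`, `Snapshot`, and `Delta` are hypothetical, the in-memory dictionaries stand in for the real snapshot files, and the handling of deletions (a `None` value) is an assumption, since the post only mentions updated entities.

```python
from typing import Dict, Optional

# Hypothetical shapes: entity id -> serialized entity.
# A value of None in the delta marks a deletion (an assumption, not from the post).
Snapshot = Dict[str, str]
Delta = Dict[str, Optional[str]]


def merge_delta(base: Snapshot, delta: Delta) -> Snapshot:
    """Return a new snapshot: the base with the delta's changes applied."""
    merged = dict(base)  # copy so the previous snapshot stays intact
    for entity_id, value in delta.items():
        if value is None:
            merged.pop(entity_id, None)  # entity removed since the last run
        else:
            merged[entity_id] = value    # entity inserted or updated
    return merged


base = {"e1": "v1", "e2": "v2"}
delta = {"e2": "v2-new", "e3": "v3", "e1": None}
snapshot = merge_delta(base, delta)
```

At the real data size the merge would more likely stream over sorted files than hold everything in memory; the sketch only shows the upsert logic.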
Now my manager wants to go even further and get rid of Oracle completely,
maybe because Oracle is too costly to maintain. We are looking for an
open-source SQL/NoSQL DB such as MySQL or Cassandra, or a managed service
such as DynamoDB, that can replace Oracle so that both service_a and
service_b keep working without interruption.
Furthermore, it would be even better if, after Oracle is replaced, service_b
could read ALL entities directly from the new DB, not just UPDATED entities,
to create a complete snapshot without going through the current steps of
creating a delta, merging with the baseline, etc. Basically service_b would
go back to the way it worked before 2012.
Any good suggestions for a DB candidate? A very good, quick way to create
a replica of the current DB would be very nice. The new DB should
also support high data consistency and availability. Even if the consistency
cannot match Oracle's, we hope it can get as close as possible.
Currently we are considering the following candidates: Cassandra, DynamoDB,
MySQL. Are there other candidates? |
b********r Posts: 620 | 2 Bumping my own thread. Any help from the experts would be hugely appreciated.
[In reply to b********r]
|
p*****2 Posts: 21240 | 3
How big is the data?
What are the throughput and latency requirements?
[In reply to b********r]
|
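b********r Posts: 620 | 4 The size is not small. A complete snapshot of the whole database is probably around 1 TB in BerkeleyDB format. The updated entities in each cycle are of course much smaller, roughly 1 to 100 MB. Before 2012, every run read the entire current snapshot from Oracle, which hit the system fairly hard. Later that direct read from Oracle was dropped and service_b was set up as a workaround. Stale data is possible with this approach, but that is not a big concern for now.
For latency, we hope it is not much worse than Oracle. One to two times slower is acceptable; an order of magnitude is not.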
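To put the sizes above in perspective, here is a small back-of-the-envelope calculation in Python. It assumes that "1T" means 10^12 bytes and "100M" means 100 MB, that a full export would have to finish within one 4-hour cycle, and that the bytes read from the DB are comparable to the BerkeleyDB snapshot size. None of these assumptions is confirmed in the thread.

```python
SNAPSHOT_BYTES = 1e12        # ~1 TB full snapshot (figure from the post)
MAX_DELTA_BYTES = 100e6      # upper end of the 1-100 MB delta (from the post)
WINDOW_SECONDS = 4 * 3600    # one 4-hour export cycle (from the post)


def required_mb_per_s(num_bytes: float, seconds: float) -> float:
    """Sustained read rate in MB/s needed to move num_bytes within seconds."""
    return num_bytes / seconds / 1e6


full_rate = required_mb_per_s(SNAPSHOT_BYTES, WINDOW_SECONDS)
delta_rate = required_mb_per_s(MAX_DELTA_BYTES, WINDOW_SECONDS)
print(f"full snapshot: {full_rate:.1f} MB/s, largest delta: {delta_rate:.4f} MB/s")
```

Under these assumptions a full export needs a sustained read rate about four orders of magnitude higher than the largest delta, which is consistent with the full read having been the expensive part.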
[In reply to p*****2]
|
g*****g Posts: 34805 | 5 Use MySQL with a read-only replica, and front your DB with Memcached.
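One common way to put a cache in front of a read replica is the cache-aside pattern, sketched below in Python. The `CachedReader` class is hypothetical, and plain dictionaries stand in for the Memcached client and for a query against the MySQL replica, so this shows only the control flow, not real client calls.

```python
class CachedReader:
    """Cache-aside reads: try the cache first, fall back to the read-only replica."""

    def __init__(self, cache: dict, replica: dict):
        self.cache = cache        # stand-in for a Memcached client (assumption)
        self.replica = replica    # stand-in for a MySQL replica query (assumption)
        self.replica_reads = 0    # counts how often the replica was actually hit

    def get(self, entity_id):
        if entity_id in self.cache:
            return self.cache[entity_id]
        self.replica_reads += 1
        value = self.replica.get(entity_id)
        if value is not None:
            self.cache[entity_id] = value  # populate the cache on a miss
        return value

    def invalidate(self, entity_id):
        """Drop a cached entry, for example after the primary updates the entity."""
        self.cache.pop(entity_id, None)


reader = CachedReader(cache={}, replica={"e1": "v1"})
first = reader.get("e1")    # miss: goes to the replica
second = reader.get("e1")   # hit: served from the cache
reader.invalidate("e1")
third = reader.get("e1")    # miss again after invalidation
```

Whether a cache helps here depends on the read pattern; a periodic full scan like service_b's would mostly bypass it.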
[In reply to b********r]
|
p*****2 Posts: 21240 | 6 Cassandra would work for his case too, wouldn't it?
[In reply to g*****g]
|
s**x Posts: 7506 | 7 Is the data for different clients totally independent?
If so, simply shard the database by client id.
You can build your own sharding map. Converting from Oracle to MySQL should
be easy.
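A hand-built sharding map of the kind suggested above could look roughly like this Python sketch. The shard names, the client ids, and the CRC32 fallback for unmapped clients are all assumptions made for illustration.

```python
import zlib

# Hypothetical explicit map: pin known clients to specific shards.
SHARD_MAP = {"client_a": "mysql-shard-0", "client_b": "mysql-shard-1"}
# Hypothetical shard list used for clients that are not pinned.
DEFAULT_SHARDS = ["mysql-shard-0", "mysql-shard-1"]


def shard_for(client_id: str) -> str:
    """Pick the shard holding a client's data: explicit map first, then a stable hash."""
    if client_id in SHARD_MAP:
        return SHARD_MAP[client_id]
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(client_id.encode("utf-8")) % len(DEFAULT_SHARDS)
    return DEFAULT_SHARDS[index]
```

An explicit map makes it easy to move one large client to its own shard, at the cost of keeping the map up to date.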
Sharding or partitioning is the key for many distributed systems. |
g*****g Posts: 34805 | 8 Coming from Oracle, don't be that aggressive. Moving to MySQL is much easier in both implementation and tooling, and it is enough for his case.
[In reply to p*****2]
|
m*****k Posts: 731 | 9 I'm curious why your company didn't consider a DB master/slave replica when it started service b. If
Oracle is too expensive for that, switch to MySQL right away
instead of reinventing the wheel with the delta + merge process.
(Most likely service A's database-writing code is Oracle-dependent and nobody wanted to change it.) |
j**********3 Posts: 3211 | |
|
|
p*****2 Posts: 21240 | 11 What he is describing is backup, not replication.
[In reply to m*****k]
|
m*****k Posts: 731 | 12 Did he mention backup?
好虫 ("good bug") was also talking about replication for service b to read from, wasn't he?
[In reply to p*****2]
|
p*****2 Posts: 21240 | 13
Doesn't Oracle support replication? Let me read it again more carefully.
[In reply to m*****k]
|
p*****2 Posts: 21240 | 14
It sounds like backup, though? Isn't it being stored to S3?
[In reply to p*****2]
|
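m*****k Posts: 731 | 15 I see the S3 part now.
So their service b + merge process is meant to store a snapshot of Oracle to S3, which
does look like a backup. But before that, different clients must have been using this data, since he mentioned: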
“a little bit more background, in the past (before 2012) service_b
read all entities, not just updated ones by service_a in the past 4 hours to
create a complete snapshot of all entities for various clients.”
http://serverfault.com/questions/33760/oracle-real-time-databas |
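b********r Posts: 620 | 16 Many thanks, everyone! Your comments are very helpful.
Sorry, I wasn't clear in some places. The snapshot here is mainly for a replica, not for
backup. I don't know exactly why an Oracle replica wasn't used back then. It may have been
cost, or they may not have wanted replica creation to affect Oracle reads and writes.
The people involved have left, so there is no one to ask.
MySQL is now owned by Oracle as well, and we are a little worried that Oracle will tamper with it
so that MySQL falls further and further behind Oracle in performance. Otherwise, who would still use Oracle?
1) Under what conditions does MySQL's performance degrade noticeably compared with Oracle's?
For example, when a table has more than 1 million rows?
2) How often can we ask MySQL to create a read-only replica without impacting
its regular write/read performance? Hourly, every 10 minutes, every minute?
3) Suppose we believe Oracle performs better than MySQL in every situation (with the same cache, indexes, partitioning, etc.).
How do we convince people that switching to MySQL is the right choice (setting cost aside)?
4) Has anyone done an Oracle vs. MySQL side-by-side comparison under the same load?
Or can we only win by numbers: for example, we have only 2 Oracle instances, but with MySQL we could run 4 or 8,
since it is so cheap?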
[In reply to m*****k]
|
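g*****g Posts: 34805 | 17 Read replicas all work by asynchronously replaying the commit log. There may be a few seconds of delay,
but it does not affect performance.
A table with a few million records is nothing for MySQL. Performance is usually determined by architecture and design.
The performance difference between Oracle and MySQL is extremely small, if MySQL isn't actually the faster one.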
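The idea of a replica catching up by replaying the primary's commit log can be shown with a toy Python model. The `Primary` and `Replica` classes are hypothetical and purely in-memory; this is a conceptual sketch, not how MySQL implements its replication.

```python
class Primary:
    """Toy primary: applies writes locally and appends them to an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered commit log of (key, value) entries

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Toy replica: replays the primary's log later, so it can lag behind."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def lag(self, primary):
        """How many committed entries the replica has not applied yet."""
        return len(primary.log) - self.applied

    def catch_up(self, primary):
        """Replay every log entry committed since the last catch-up, in order."""
        pending = primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)


primary = Primary()
replica = Replica()
primary.write("e1", "v1")
primary.write("e1", "v2")
lag_before = replica.lag(primary)  # the replica has not replayed anything yet
replica.catch_up(primary)
lag_after = replica.lag(primary)
```

Because the primary never waits for the replica in this model, writes are not slowed down, and the replica's data is only as fresh as its last replay.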
[In reply to b********r]
|
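c******3 Posts: 296 | 18 S3 is 4 hours behind Oracle, which doesn't look like a replica. Is it more like something for third parties to consume?
Write a service-c. Right after service-a writes data into Oracle, it (asynchronously) calls service-c to write the
same data into S3. If service-c fails, service-c reads the same data back from Oracle
and writes it to S3.
service-b can then be retired.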
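The service-c flow described above (push the payload to S3, and on failure re-read the row from the DB and try again) might be sketched like this in Python. The function `publish_entity`, the `s3_put` and `db_read` callables, and the retry count are hypothetical stand-ins, and the sketch is synchronous for simplicity even though the post suggests an async call.

```python
def publish_entity(entity_id, payload, s3_put, db_read, max_retries=2):
    """Write payload to S3; if that fails, re-read the entity from the DB and retry.

    s3_put(key, value) and db_read(key) are injected stand-ins (assumptions)
    for a real S3 upload and a real database query.
    """
    try:
        s3_put(entity_id, payload)
        return "direct"
    except Exception:
        pass  # fall through to the recovery path
    for _ in range(max_retries):
        try:
            s3_put(entity_id, db_read(entity_id))  # the DB is the source of truth
            return "recovered"
        except Exception:
            continue
    return "failed"


# Simulated S3 that fails on the first call only.
store = {}
calls = {"n": 0}


def flaky_put(key, value):
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("simulated S3 failure")
    store[key] = value


result = publish_entity("e1", "v1", flaky_put, db_read=lambda key: "v1-from-db")
```

A "failed" result would still need handling, for example a periodic reconciliation pass, otherwise S3 can silently drift from the DB.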
The same applies after switching to MySQL: if MySQL-to-MySQL replication is too slow, have service-c write to MySQL. |
b********r Posts: 620 | 19 A related question: has anyone used MariaDB? How does it compare with MySQL?
[In reply to c******3]
|