Calling all experts: system design discussion - JobHunting board


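b********r
Posts: 620

I'm about to start a project and have no clear direction, or rather too many possible directions. I'd appreciate corrections and advice from the experts here.
Our current development environment is C/C++ and Oracle. I'll describe the rest in English, since that's easier.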
There are two services, service_a and service_b.
service_a mainly does calculation for a given entity. If the calculated
result differs from what is persisted in the Oracle DB, it updates the
corresponding rows for that entity in the Oracle DB. Entities are received
continuously from some queuing services.
service_b mainly reads from the Oracle DB periodically, let us say every 4
hours. It reads all entities that service_a updated in the past 4 hours,
then outputs them in a certain text file format to Amazon's S3 for storage.
A little more background: in the past (before 2012), service_b read all
entities, not just the ones service_a had updated in the past 4 hours, to
create a complete snapshot of all entities for various clients. It turned
out that Oracle could not handle that many concurrent writes and reads, or
degraded too much, so right before I joined this small shop someone changed
the logic so that service_b reads only the delta (updated entities) from the
past 4 hours. They created another process that merges the delta into the
base to create the final snapshot somewhere else.
Now my manager wants to go further and get rid of Oracle completely, maybe
because Oracle is too costly to maintain. We are looking for an open-source
SQL/NoSQL DB such as MySQL, Cassandra, DynamoDB, etc. that can replace
Oracle, so that both service_a and service_b keep working without
interruption.
Furthermore, it would be even better if, after Oracle is replaced, service_b
could read ALL entities directly from the new DB, not just the UPDATED ones,
and create a complete snapshot without going through the current steps of
creating a delta, merging it with the baseline, etc. Basically service_b
would go back to how it worked in the old days (before 2012).
Any good suggestions for a DB candidate? A very good / quick way to create a
replica of the current DB would be very nice. The new DB should also offer
strong data consistency and high availability. Even if its consistency
cannot match Oracle's, we hope it can get as close as possible.
Currently we are considering the following candidates: Cassandra, DynamoDB,
MySQL. Are there other candidates?
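
Editor's sketch: the delta + merge flow described in this post can be illustrated roughly as below. This is a minimal sketch only, not the poster's actual implementation. An in-memory SQLite table stands in for the Oracle DB, and the table name `entities`, the columns `entity_id`, `payload`, `updated_at`, and the helpers `read_delta` and `merge_delta` are all hypothetical names chosen for illustration. A real snapshot of the size discussed in this thread would not fit in a Python dict.

```python
import sqlite3

def read_delta(conn, since_ts):
    """Return (entity_id, payload) rows updated at or after since_ts.

    This mimics what service_b exports on each run (assumed schema).
    """
    cur = conn.execute(
        "SELECT entity_id, payload FROM entities "
        "WHERE updated_at >= ? ORDER BY entity_id",
        (since_ts,),
    )
    return cur.fetchall()

def merge_delta(base, delta_rows):
    """Return a new snapshot: the base with every delta row applied on top."""
    merged = dict(base)          # copy so the old base stays untouched
    merged.update(delta_rows)    # delta rows overwrite or add entities
    return merged

# In-memory stand-in for the database that service_a writes to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    "entity_id TEXT PRIMARY KEY, payload TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [("e1", "a", 100), ("e2", "b", 500), ("e3", "c", 900)],
)

base = {"e1": "a", "e2": "old"}          # previous full snapshot
delta = read_delta(conn, since_ts=400)   # only rows changed in the window
snapshot = merge_delta(base, delta)      # new full snapshot
```

The point of the design is that the database only serves a small range query per run, while the expensive full-snapshot work happens outside it.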

b********r
Posts: 620

Bumping my own thread. Begging the experts for help.

[In reply to b********r]

p*****2
Posts: 21240

How big is the data?
What are the throughput and latency requirements?

[In reply to b********r]

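b********r
Posts: 620

The size is not small. A complete snapshot of the whole database in BerkeleyDB format is probably around 1T; of course the updated entities in each run are far fewer, around 1~100M. Before 2012, every run read the entire current snapshot from Oracle, which hit the system fairly hard. Later we stopped reading it directly from Oracle and set up service_b as a workaround. Stale data can occur this way, but that is not a big concern for now.
For latency, we hope it won't be much worse than Oracle. 1 or 2 times slower is not a big problem; an order of magnitude slower is not acceptable.

[In reply to p*****2]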

g*****g
Posts: 34805

MySQL with a read-only replica. Front your DB with Memcached.

[In reply to b********r]
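
Editor's sketch: one common way to "front a DB with a cache" is the cache-aside pattern, shown below under stated assumptions. Plain dicts stand in for the MySQL primary, the read-only replica, and Memcached; `CachedReader` and `replicate` are hypothetical names, and no real client library is used. Writes go to the primary and invalidate the cache, while reads try the cache first and fall back to the replica.

```python
class CachedReader:
    """Cache-aside reads over a replica; writes go to the primary."""

    def __init__(self, primary, replica, cache):
        self.primary = primary      # dict standing in for the primary DB
        self.replica = replica      # dict standing in for the read replica
        self.cache = cache          # dict standing in for Memcached
        self.replica_reads = 0      # counts cache misses that hit the replica

    def get(self, key):
        if key in self.cache:       # cache hit: no database work at all
            return self.cache[key]
        self.replica_reads += 1
        value = self.replica.get(key)
        if value is not None:       # populate the cache for later reads
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.primary[key] = value
        self.cache.pop(key, None)   # invalidate so readers refetch


def replicate(primary, replica):
    """Toy replication step: copy the primary's state to the replica."""
    replica.update(primary)


primary, replica, cache = {}, {}, {}
store = CachedReader(primary, replica, cache)
store.put("e1", "v1")
replicate(primary, replica)
first = store.get("e1")     # miss: served by the replica, then cached
second = store.get("e1")    # hit: served by the cache
```

One caveat with this sketch: because replication is asynchronous, a read that arrives after an invalidation but before the replica catches up can put a stale value back into the cache, so a real deployment would also want an expiry time on cached entries.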

p*****2
Posts: 21240

Cassandra would work for his case too, wouldn't it?

[In reply to g*****g]

s**x
Posts: 7506

Are the data for different clients totally independent?
If so, simply shard the database by client id.
You can build your own sharding map. Converting from Oracle to MySQL should
be easy.
Sharding or partitioning is the key to many distributed systems.
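
Editor's sketch: a home-grown sharding map keyed by client id could look roughly like the following. This is an illustration under assumptions, not a recommendation of a specific scheme: the shard names, the `ShardMap` class, and the `overrides` table for pinning individual clients are all hypothetical.

```python
import hashlib


class ShardMap:
    """Maps a client id to a shard name.

    Explicit overrides win; every other client is placed by a stable hash.
    """

    def __init__(self, shards, overrides=None):
        self.shards = list(shards)
        self.overrides = dict(overrides or {})

    def shard_for(self, client_id):
        if client_id in self.overrides:
            return self.overrides[client_id]
        # hashlib gives the same digest on every run and every machine,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.sha256(client_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]


shard_map = ShardMap(
    ["db0", "db1", "db2"],
    overrides={"big_client": "db_dedicated"},  # pin a heavy client
)
target = shard_map.shard_for("client_42")
```

Hash-modulo placement moves most clients when the number of shards changes, which is one reason to keep an explicit lookup table for clients that must stay put.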

g*****g
Posts: 34805

Coming from Oracle, don't be that aggressive. From implementation to tooling, moving to MySQL is much easier. It is also enough for his case.

[In reply to p*****2]

m*****k
Posts: 731

I'm curious why your company didn't consider a DB master-slave replica when it started service b. If Oracle was too expensive for that, you could have switched to MySQL right away instead of reinventing the wheel with the delta + merge process.
(Most likely the code service A uses to write to the database was Oracle-dependent and nobody wanted to change it.)

j**********3
Posts: 3211

mark


p*****2
Posts: 21240

What he is describing isn't replication, it's backup.

[In reply to m*****k]

m*****k
Posts: 731

Did he say backup?
好虫 was also talking about replication for service b to read from, wasn't he?

[In reply to p*****2]

p*****2
Posts: 21240

Doesn't Oracle support replication? Let me read it again more carefully.

[In reply to m*****k]

p*****2
Posts: 21240

It sounds like a backup, doesn't it? Isn't it being stored to S3?

[In reply to p*****2]

m*****k
Posts: 731

I see the S3 part now.
So their service b + merge process exists to store a snapshot of Oracle to S3, which does look like a backup. But before that there must have been different clients using this data. He mentioned:
"a little bit more background, in the past (before 2012) service_b
read all entities, not just updated ones by service_a in the past 4 hours to
create a complete snapshot of all entities for various clients."
http://serverfault.com/questions/33760/oracle-real-time-databas

b********r
Posts: 620

Many thanks to the experts!! Everyone's input has been very enlightening!
Sorry, I wasn't very clear in places. The snapshot here is mainly for a replica, not for backup. I don't know exactly why an Oracle replica wasn't used at the time. It may have been because of money, or because they didn't want creating the replica to affect Oracle's reads and writes. The person has left, so there is no way to ask.
MySQL is now owned by Oracle too, and we are a bit worried that Oracle will tamper with it so that MySQL falls further and further behind Oracle in performance; otherwise who would still use Oracle?
1) Under what circumstances does MySQL's performance degrade noticeably compared with Oracle? For example, when a table has more than 1 M rows.
2) How often can we ask MySQL to create a read-only replica without impacting its regular write/read performance? Hourly, every 10 minutes, every 1 minute?
3) Suppose we believe Oracle's performance is better than MySQL's in every situation (with the same cache, index, partition, etc.). How do we convince people that switching to MySQL is the right choice (cost considerations aside)?
4) Has anyone here done an Oracle vs MySQL side-by-side comparison under the same load? Or can we only win by numbers: for example, we have just 2 Oracle instances, but we could run 4/8 MySQL instances since they are so cheap?

[In reply to m*****k]

g*****g
Posts: 34805

Read replicas all work by asynchronously replaying the commit log. There may be a few seconds of lag, and it has no impact on performance.
A table with a few M records is nothing for MySQL. Performance is usually determined by architecture and design. The performance difference between Oracle and MySQL is extremely limited, if MySQL isn't actually the faster one.

[In reply to b********r]
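c******3
Posts: 296

S3 lags Oracle by 4 hours, which doesn't look like a replica. It looks more like something for third parties to use?
Write a service-c. Right after service-a writes data to Oracle, it calls service-c (async) to write the same data to S3. If service-c fails, service-c reads the same data back out of Oracle and writes it to S3.
service-b can retire.
The same applies after switching to MySQL: if MySQL-to-MySQL replication is too slow, have service-c write to MySQL.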
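
Editor's sketch: the service-c idea above (push each write to the object store right away, and on failure re-read the row from the database and retry) might be sketched as follows. All names here are hypothetical: `FlakyStore` is a stand-in for S3 that fails a configurable number of times, a dict stands in for the database, and the call is synchronous for simplicity where the post suggests an async call.

```python
class FlakyStore:
    """Stand-in for the object store; fails the first `fail_times` uploads."""

    def __init__(self, fail_times=0):
        self.objects = {}
        self.fail_times = fail_times

    def put(self, key, value):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise IOError("simulated upload failure")
        self.objects[key] = value


class ServiceC:
    """Mirrors writes to the store and retries failures from the database."""

    def __init__(self, db, store):
        self.db = db                # dict standing in for the database
        self.store = store
        self.pending = set()        # entity ids whose upload failed

    def on_write(self, entity_id, payload):
        """Called by service-a after it has written to the database."""
        try:
            self.store.put(entity_id, payload)
        except IOError:
            self.pending.add(entity_id)

    def retry_pending(self):
        """Re-read each failed entity from the database and upload it again."""
        for entity_id in sorted(self.pending):   # sorted() copies the set
            try:
                self.store.put(entity_id, self.db[entity_id])
            except IOError:
                continue                         # leave it pending
            self.pending.discard(entity_id)


db = {}
store = FlakyStore(fail_times=1)
service_c = ServiceC(db, store)

db["e1"] = "v1"
service_c.on_write("e1", "v1")      # first upload fails and is queued
db["e2"] = "v1"
service_c.on_write("e2", "v1")      # second upload succeeds
pending_before = set(service_c.pending)
service_c.retry_pending()           # e1 is re-read from db and uploaded
```

Re-reading from the database on retry means the store receives the current row rather than the payload from the failed call, which matters if the entity was updated again in between.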

b********r
Posts: 620

A related question: has any expert here used MariaDB? How does it compare with MySQL?

[In reply to c******3]
