Comparing again: when is Mongo the better choice, and when is Redis? - JobHunting board

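This page is an excerpt and archive of the corresponding thread on MITBBS (未名空间). Posts less than a week old show at most 50 characters and older posts at most 500; visit the original thread for the full text.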

j**********3
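Posts: 3211

mongodb: document-oriented, JSON, on disk
redis: key-value pairs, in memory
Does this mean Mongo can store a large amount of data, while Redis, being in memory, has to keep the data set small?
It seems nobody really likes Mongo because it's unstable, so when would you use it?
Shall we compare some other databases too?
Newbie question, experts please go easy on me. Everyone, share your views :)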

p*****2
Posts: 21240

Write-heavy with very low latency: you probably have to use Redis.

j**********3
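Posts: 3211

Er-ye (二爷), you're the best as always!
One question: since Redis is in memory, won't a large amount of data simply not fit in memory?
Also, when is Redis a bad choice?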

[In reply to p*****2]

t*********r
Posts: 387

Mongo is just a stupid toy.
Don't use it for anything, unless you don't mind losing data.

G*****m
Posts: 5395

What are the equivalents of these two inside G (Google)?

[In reply to j**********3]

w****r
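Posts: 15252

Redis lives in memory: once the power goes out or something similar happens, everything is gone, and reloading it into memory takes a long time.
Memory is cheap these days, so capacity is hardly a concern; on a machine with 64 GB of RAM you could give half of it to Redis.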

p*****2
Posts: 21240

You can shard it.
The biggest problem is that Redis isn't scalable.

[In reply to j**********3]

j**********3
Posts: 3211

Everyone above says Mongo loses data. Isn't that terrifying?

[In reply to p*****2]

a*****u
Posts: 1712

Redis can be sharded: once a single machine can't hold the data, shard it.
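To make the sharding idea concrete, here is a minimal sketch of how Redis Cluster decides which node owns a key: it hashes the key with CRC16 (the XMODEM variant) modulo 16384 hash slots, hashing only the part inside `{...}` when the key carries such a hash tag, and each node owns a range of slots. Python's `binascii.crc_hqx` computes the same CRC; the three-node slot layout and the `node_for` helper are hypothetical, for illustration only.

```python
import binascii

NUM_SLOTS = 16384  # Redis Cluster's fixed number of hash slots


def hash_slot(key: str) -> int:
    """Map a key to a slot the Redis Cluster way: CRC16 (XMODEM) mod 16384.

    If the key contains a non-empty "{...}" hash tag, only the tag is hashed,
    so related keys can be forced into the same slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode(), 0) % NUM_SLOTS


# Hypothetical 3-node cluster, each node owning a contiguous slot range.
NODES = [("node-a", 0, 5460), ("node-b", 5461, 10922), ("node-c", 10923, 16383)]


def node_for(key: str) -> str:
    """Return the (hypothetical) node that owns the key's slot."""
    slot = hash_slot(key)
    for name, lo, hi in NODES:
        if lo <= slot <= hi:
            return name
    raise ValueError(f"slot {slot} is not assigned to any node")
```

Growing the cluster then means migrating some slot ranges to a new node instead of rehashing every key, and a shared tag such as `{user1000}` keeps related keys in one slot so multi-key commands still work.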

[In reply to j**********3]

p*****2
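Posts: 21240

No idea where that idea comes from.
I've used it for two years and it never happened.
Mongo is ranked number one among NoSQL databases.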

[In reply to j**********3]

b*****n
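Posts: 618

Just my personal opinion.
Mongo is more like a database; Redis is usually used as a cache rather than directly as a database.
That said, once things scale past a certain point, Mongo is more often used for indexing.
Mongo has a dynamic schema and supports more flexible queries, which makes it convenient; that's why so many people use it.
If all you need is a KV store, Redis is faster, and Redis can also be configured to persist data to disk.
Unless you really need a dynamic schema, Mongo is of marginal value. For typical use cases sharded MySQL is enough, and if you need a cache, add a layer of Redis or memcache.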
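A "sharded MySQL plus a Redis or memcache layer" setup is commonly wired as cache-aside: read the cache first, fall back to the database on a miss and populate the cache, and invalidate the cached entry on writes. The sketch below assumes plain dicts standing in for Redis and MySQL; the class name, the TTL, and the method names are hypothetical.

```python
import time


class CacheAside:
    """Minimal cache-aside sketch: a dict with per-entry expiry stands in for
    Redis/memcache, and `db` (another dict) stands in for sharded MySQL."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}     # key -> (value, expires_at)
        self.db_reads = 0   # counts trips to the source of truth

    def get(self, key):
        hit = self.cache.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]                    # cache hit, database untouched
        self.db_reads += 1
        value = self.db.get(key)             # miss: read the database
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        self.db[key] = value                 # write the source of truth first
        self.cache.pop(key, None)            # then drop the stale cached copy
```

Invalidating instead of updating the cache on writes keeps the database as the only place a value is written, at the cost of one extra miss after each update.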

j**********3
Posts: 3211

Thanks, expert!

[In reply to b*****n]

t*********r
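Posts: 387

The most basic requirement for a DB is strong consistency, and here Mongo's default is eventual consistency. Put bluntly, that means no consistency.
In general, a DB must provide data durability: once a write is committed, the data should survive on another replica even if the machine dies, or, if there is no replica, it should at least have been flushed to the local disk.
I don't know what it's like now, but many versions ago Mongo didn't replicate to a remote replica, and didn't even flush to the local disk, before sending the client an acknowledgement.
Would you dare use something like that?
Admittedly, you can change a lot of Mongo's config and write requirements to meet these guarantees, but with those settings the whole system generally slows to a crawl.
A product that calls itself a DB can't even meet the most basic DB requirements out of the box.
Heh, heh.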
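The difference being argued about is when the acknowledgement is sent relative to the flush. A toy model, assuming a single node whose writes sit in a memory buffer until flushed (the `Node` class and the `acked_but_lost` helper are hypothetical): if the node acknowledges before flushing, a crash loses writes the client was told had succeeded. In MongoDB itself this choice is exposed through the write concern (its `w` and `j` options).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy storage node: writes land in an in-memory buffer and only survive
    a crash once they have been flushed to 'disk'."""
    memory: list = field(default_factory=list)
    disk: list = field(default_factory=list)

    def write(self, record: str, flush_before_ack: bool) -> bool:
        self.memory.append(record)
        if flush_before_ack:
            self.flush()  # pay the flush cost before acknowledging
        return True       # acknowledgement: the client now assumes success

    def flush(self) -> None:
        self.disk.extend(self.memory)
        self.memory.clear()

    def crash_and_restart(self) -> None:
        self.memory.clear()  # anything not yet on disk is gone


def acked_but_lost(flush_before_ack: bool) -> list:
    """Write three records, crash before any background flush runs, and return
    the records that were acknowledged but did not survive."""
    node = Node()
    acked = []
    for record in ("a", "b", "c"):
        if node.write(record, flush_before_ack):
            acked.append(record)
    node.crash_and_restart()
    return [r for r in acked if r not in node.disk]
```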

p*****2
Posts: 21240

These days NoSQL systems generally choose AP in the CAP trade-off. If you want C, just use SQL. Mongo was never meant to replace SQL.

[In reply to t*********r]

t*********r
Posts: 387

Come on, come on,
explain to this old man what CAP actually means.

[In reply to p*****2]

r****c
Posts: 2585

Consistency
Availability
Partition Tolerance
P is basically achievable by everyone.

b*****n
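Posts: 618

How many use cases really need consistency that strong?
Eventual consistency is good enough in most cases; you always have to make a trade-off anyway.
Besides, is guaranteeing that a remote replica has flushed to disk really that simple?
There has been no end of discussion on this.
By your standard, many so-called strongly consistent systems wouldn't qualify.
In many cases systems deliberately don't wait for a flush to disk before treating a write as persisted, because otherwise performance wouldn't be acceptable.
I'd guess the only DB product you'd approve of is Spanner, and too bad there's nothing comparable outside Google.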

[In reply to t*********r]

t*********r
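Posts: 387

I won't wait for your reply; let me just go ahead and rant.
The common understanding of CAP is simply this: pick two of the three; you can't build a system that is consistent, available, and partition tolerant all at once.
Ask why, and some people will say a Berkeley professor said so in a paper, while others will say MIT grad students proved it in a paper.
It's all a con. The trick is simple: they tell you that you have two machines. If you tolerate a partition, then the machines are either consistent but not available (not servicing requests while partitioned), or available but not consistent (servicing requests while partitioned, but with no guarantee of serializability). That is the line of reasoning in the original paper "proving" CAP (if you're interested, see http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf).
But this "proof" quietly swaps one concept for another: the fact that a partition stops part of a distributed system from processing requests does not mean the system as a whole can't keep running. During a partition, as long as one side can prove it holds a majority, the system can abandon the minority side and update its membership set. That way consistency and availability can still be satisfied even under a partition. In practice this can be implemented with request redirects or a client proxy.
Lumping CA together with P is a false framing to begin with. C and A are clearly definable properties of a system; a partition is not. But let me state one thing clearly: in the presence of a partition, a distributed system as a whole can achieve CA.
A few years ago anyone could write some junk NoSQL KV store, and those stores got spread everywhere by people who don't understand distributed systems. Now coders think these substandard systems are the best that's possible.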
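The mechanism described above (only a side holding a strict majority of the current membership keeps serving, and the membership then shrinks to that side) can be sketched as follows. The function names and the set-based model are assumptions for illustration; real systems also have to agree on the membership change itself through a consensus protocol, which this sketch skips. Note that in the Gilbert and Lynch definition linked above, nodes on the minority side still count as non-failing nodes that must answer, which is the point the rest of the thread argues over.

```python
def serving_side(members: set, sides: list):
    """Return the side of a partition that may keep serving, or None.

    members: node ids in the current membership.
    sides: sets of node ids that can still reach each other.
    Only a side holding a strict majority of `members` may proceed; nodes on
    every other side must reject or redirect requests."""
    quorum = len(members) // 2 + 1
    for side in sides:
        if len(side & members) >= quorum:
            return side
    return None  # no majority anywhere: no side can safely make progress


def reconfigure(members: set, sides: list) -> set:
    """Drop the minority: the majority side becomes the new membership."""
    winner = serving_side(members, sides)
    return set(winner & members) if winner is not None else set(members)
```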

[In reply to p*****2]

t*********r
Posts: 387

Don't use "use cases" as an excuse for eventual consistency, passing off a defect as a trade-off.
Is it really that hard for the remote server to flush to disk before sending the acknowledgement back to the original server? Many KV stores on the market are garbage to begin with.

[In reply to b*****n]

t*********r
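Posts: 387

Actually, the remote side doesn't necessarily have to flush to disk, but a decent KV store should at least confirm that a majority has sent back acknowledgements.
Something like Mongo, which acknowledges the client without receiving any acknowledgements, heh, heh.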
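A minimal sketch of "acknowledge the client only after a majority has acknowledged", assuming a leader that counts its own copy plus follower replies; followers are modeled as hypothetical callables that return True when they acknowledge.

```python
from typing import Callable, List

Follower = Callable[[str], bool]  # hypothetical transport: True means "acked"


def quorum_write(record: str, followers: List[Follower], leader_log: List[str]) -> bool:
    """Store `record` on the leader, send it to every follower, and return True
    (that is, acknowledge the client) only if a majority of the whole group,
    leader included, holds the record. On False the record stays uncommitted
    on the leader and the client receives no acknowledgement."""
    leader_log.append(record)
    acks = 1  # the leader's own copy
    for send in followers:
        if send(record):
            acks += 1
    group_size = len(followers) + 1
    return acks >= group_size // 2 + 1
```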

b*********n
Posts: 26

Heh, what a little hothead.

g*****g
Posts: 34805

Kafka does exactly that. But you can't avoid data loss during a partition; in other words, availability is sacrificed. There's no silver bullet.

[In reply to t*********r]

t*********r
Posts: 387

> But you can't avoid data loss during partition
Example? I'm not convinced.

[In reply to g*****g]

b*****n
Posts: 618

This is a well-known issue for Kafka.
Basically every company using Kafka runs into it; Jay himself wrote a post discussing it:
http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafk

[In reply to t*********r]

t*********r
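Posts: 387

What's the point of posting a link without explaining the reasoning? Is Kafka the gospel truth? Not long ago Facebook even published a paper taking a swipe at a certain company's Kafka.
The very post you linked says:
> there is no correct algorithm for guaranteeing consistency in the face of f failures with fewer than 2f+1 servers
Conversely, there are correct algorithms for guaranteeing consistency (with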
availability) in the face of f failures with 2f+1 or more servers, e.g. some
partition with a majority.
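Why 2f+1: if up to f servers may fail, a write can only wait for n - f acknowledgements, and a later leader election can only hear from n - f servers. Those two groups are guaranteed to share a server (so the committed write is seen) exactly when n >= 2f+1. A brute-force check, with hypothetical helper names:

```python
from itertools import combinations


def quorums_always_intersect(n: int, f: int) -> bool:
    """True if every two groups of n - f servers (the most you can wait for
    when f servers may be down) share at least one server."""
    size = n - f
    groups = [set(c) for c in combinations(range(n), size)]
    return all(a & b for a in groups for b in groups)


def max_failures_tolerated(n: int) -> int:
    """Largest f with n >= 2f + 1."""
    return (n - 1) // 2
```

With 3 servers you can lose 1; with 4 you still can only lose 1, because two groups of 2 out of 4 may be disjoint.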
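Speaking of that company: a senior schoolmate of mine once read a blog post by someone there bragging about how fast Kafka was and inviting everyone to download it and verify. He downloaded the benchmark script, took one look, and, heh.
That company's script measured how fast async sends were, not the actual speed of acknowledgements.
Academia and industry are full of big hype artists; when they can't build a good system, they dodge responsibility by saying it's impossible.
Heh, heh.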

[In reply to b*****n]

g*****g
Posts: 34805

Maybe you should listen to the creator, eh?
By design, committed messages are always preserved during leadership change
whereas some uncommitted data could be lost. The leader and the ISR for each
partition are also stored in Zookeeper and are used during the failover of
the controller. Both the leader and the ISR are expected to change
infrequently since failures are rare.

[In reply to t*********r]

b*****n
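Posts: 618

Kafka isn't the gospel. I posted it to explain that when a partition triggers a master election, data loss can occur. I don't know how you read that article: what Jay explains is that this situation can't be avoided, and the only remedies are to give up A or to live with the data loss.
I also don't know whether you've ever actually used these systems. Are you saying you can conjure up an amazing system out of thin air, or that you're better than people like Jay?
Finally, let me teach you how to read an article: its very first line says it is a follow-up to another article:
https://aphyr.com/posts/293-call-me-maybe-kafka
That article explains in detail when data loss occurs.
I wanted to say earlier that you're the one confusing the CAP concepts; I just didn't want to be rude.
If you're really that good, build a KV store along your lines that beats everything on the market, then show a benchmark proving it meets the requirements on latency and scalability. The bar isn't even high, just Spanner's standard: web-app-level read/write lookups with 50 ms latency, no complex queries or transaction support needed, but it has to beat CAP.
Theories like yours basically fail in practice; I honestly don't know who's the one talking nonsense here.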

[In reply to t*********r]

g*****g
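Posts: 34805

Databases on the market can offer so-called tunable consistency, letting you give up one property now and another later, but none of them dares to claim it beats the CAP theorem. Trying to beat something that has been mathematically proven is like building a perpetual motion machine.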

[In reply to b*****n]

t*********r
Posts: 387

My comment was referring to the second half of your comment. I don't think Kafka necessarily represents the best point in the design space.
In particular, consider the case where Kafka uses a quorum rather than primary/backup for its replication scheme (as it applies to providing data loss guarantees during a partition).
Also, what do you think the guarantees for *uncommitted* data should be?

[In reply to g*****g]

t*********r
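Posts: 387

> Kafka isn't the gospel. I posted it to explain that when a partition triggers a master election, data loss can occur.
> I don't know how you read that article: what Jay explains is that this situation can't be avoided, and the only remedies are to give up A or to live with the data loss.
> Are you saying you can conjure up an amazing system out of thin air, or that you're better than people like Jay?
I never said anything about myself, heh.
All my post said is that CAP is a bogus concept used to con people into believing C and A can't both be had. Nor do I think the trade-off that Kafka's design leads to is necessary.
Here's a concrete example:
> The issue Kyle demonstrates makes for a good illustration. In this scenario Kyle kills off all but one node in the ISR, then writes to the remaining node (which is now the leader), then kills this node and brings back the other nodes. I actually think the issue here is what we call “unclean leader election” rather than our approach to quorums or anything specific to network partitions.
Do you think this passage is completely fine? I think it's a big problem. Forcing progress without a majority is a problem in itself; the data loss only shows that the ISR scheme, trying to tolerate f failures with f+1 machines, is Kafka shooting itself in the foot by design.
By contrast:
> An equivalent scenario can be constructed for a majority vote quorum. For example consider using a majority vote quorum with 3 nodes (so you can tolerate 1 failure). Now say that one of your nodes in this three node quorum is obliterated. If you accept a write with only two servers the failure of another server breaks the quorum property so you will no longer be able to elect a new master or guarantee consistency.
Compared with the first case, the quorum is clearly much better. With a mechanism for replacing failed machines, this is basically a non-issue.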
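The two quoted scenarios can be replayed in a toy model of three replicas. This is a hypothetical simplification, not Kafka's actual replication code: under "isr" the leader may commit alone once the in-sync set has shrunk to itself, and a stale replica may later be elected (unclean leader election); under "majority" a write needs 2 of 3 replicas. In Kafka itself this behavior is governed by settings such as `min.insync.replicas` and `unclean.leader.election.enable`.

```python
def acked_writes_lost(scheme: str) -> list:
    """Replay the quoted scenario on replicas r1, r2, r3 and return the writes
    the client saw acknowledged that no longer exist afterwards.

    scheme="isr": the in-sync set has shrunk to {r1}, so r1 alone may commit.
    scheme="majority": a write is acknowledged only when 2 of 3 replicas hold it."""
    logs = {"r1": [], "r2": [], "r3": []}
    alive = {"r1"}  # r2 and r3 were killed; r1 is now the leader
    acked = []
    for record in ("w1", "w2"):
        logs["r1"].append(record)  # the leader always stores it locally
        holders = sum(record in logs[r] for r in alive)
        needed = 1 if scheme == "isr" else 2
        if holders >= needed:
            acked.append(record)
    # r1 is killed and r2, r3 come back with their old, empty logs. Under
    # "isr" electing r2 is the unclean election; under "majority" r2 and r3
    # form a legitimate majority, and nothing they lack was ever acknowledged.
    new_leader = "r2"
    return [r for r in acked if r not in logs[new_leader]]
```

The majority variant pays for this with availability: while only r1 was up, it refused every write.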
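> https://aphyr.com/posts/293-call-me-maybe-kafka
> That article explains in detail when data loss occurs.
> I wanted to say earlier that you're the one confusing the CAP concepts; I just didn't want to be rude.
What's there to be shy about... it's just a forum post.
As I mentioned before, when a majority partition exists, all of CAP can be achieved; that doesn't conflict with the last article you linked, right? Without a majority partition it's indeed impossible, but that's not how CAP is defined.
I already said in an earlier post that CAP itself is a bogus concept that misleads people. My argument about CAP takes the "classic" paper I linked earlier as its starting point; if you don't accept that definition, there's nothing I can do.
> If you're really that good, build a KV store along your lines that beats everything on the market, then show a benchmark proving it meets the requirements on latency and scalability. The bar isn't even high, just Spanner's standard: web-app-level read/write lookups with 50 ms latency, no complex queries or transaction support needed, but it has to beat CAP.
What I'm bashing is junk like Mongo, but there are reliable products on the market you can use directly.
Just take Cassandra and set it to QUORUM, and you have an instance that beats CAP.
If you insist on tolerating a situation with only minority partitions, there's no solution, but that's not the standard definition of P.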
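The QUORUM argument rests on overlap: with replication factor RF, if a write waits for W replicas and a read for R replicas, then R + W > RF guarantees the read contacts at least one replica holding the latest acknowledged write. A sketch of that arithmetic for a single data center, where Cassandra's QUORUM is floor(RF/2) + 1 (the helper names are hypothetical). As later replies in the thread point out, overlap alone does not give Cassandra multi-row atomicity, and a write that reports failure may still have landed on some replicas.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a consistency level (single data center)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def reads_overlap_writes(write_level: str, read_level: str, rf: int = 3) -> bool:
    """R + W > RF: every read set shares a replica with every acknowledged write set."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


def replicas_that_may_fail(level: str, rf: int = 3) -> int:
    """How many replicas can be unreachable while requests at `level` still succeed."""
    return rf - replicas_required(level, rf)
```

QUORUM for both reads and writes at RF = 3 keeps the overlap while still tolerating one unreachable replica; ONE/ONE tolerates two but loses the overlap.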

[In reply to b*****n]

g*****g
Posts: 34805

If uncommitted data can be lost, and not all data can be committed all the time (e.g., during a leader change), it's easy to see that guaranteed high availability is a pipe dream. I don't see what you can argue with.

[In reply to t*********r]

g*****g
Posts: 34805

Cassandra doesn't support strong consistency. It doesn't support atomicity beyond a single row, and even on a single row it can give false negatives, namely a write reported as failed can still reach storage on a partition. I am a big fan of C*, but let's be realistic here: it doesn't beat CAP.

[In reply to t*********r]

b*****n
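Posts: 618

A quorum system is not A to begin with.
Cassandra's tunable consistency isn't the kind of C you're describing either; tuning it back and forth only raises the probability that the result you get is correct, a difference of a few more nines after the decimal point. It's not the same thing as the C you want at all.
A really comes down to how much latency you can accept. You could stop processing any request for the whole period during which a machine has failed and resume only after full recovery; if unbounded latency still counts as A, then you can achieve A. The key is how large that cost is and whether it's acceptable.
One thing that makes failure handling in distributed systems hard is that when you send a message to a remote node and get no ack, you can't tell whether the remote node died or there was a network partition, so if you want so-called C, you basically can't have both A and P. Kafka's design makes an important assumption: all machines are on the same network, and partitions are assumed not to happen.
Even so-called CP or CA systems aren't 100% CP or CA; there are always some edge cases that break them. It's a question of emphasis, and in the end it's all trade-offs.
Mongo not waiting for the remote ack is the result of such a trade-off. Kafka can also be configured to behave this way, or to wait for the remote ack before acking; it's all decided by the use case, and nothing is absolute.
With Mongo, first look at how likely data loss is. If the data is critical, it really is dangerous, but in most cases, if you can tolerate it, it doesn't matter. Given how many users Mongo has, it evidently still meets most people's needs.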

[In reply to t*********r]

t*********r
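Posts: 387

> Cassandra's tunable consistency isn't the kind of C you're describing either; tuning it back and forth only raises the probability that the result you get is correct, a difference of a few more nines after the decimal point. It's not the same thing as the C you want at all.
Are you sure running at QUORUM can't achieve the C I described? Give an example to back that up.
Of course, if every machine dies, there's nothing I can do.
> A really comes down to how much latency you can accept. You could stop processing any request for the whole period during which a machine has failed and resume only after full recovery; if unbounded latency still counts as A, then you can achieve A.
In theory A has nothing whatsoever to do with latency. I don't like talking nonsense, so let me just quote the literature: http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf
Available Data Objects
For a distributed system to be continuously available, every request
received by a non-failing node in the system must result in a response.
That is, any algorithm used by the service must eventually terminate. In
some ways this is a weak definition of availability: it puts no bound on how
long the algorithm may run before terminating, and therefore allows
unbounded computation.
QUORUM can fully achieve this. As you said yourself, "resume only after full recovery; if unbounded latency still counts as A, then you can achieve A." In practice this recovery can be made very fast; Amazon does this very well.
> A really comes down to how much latency you can accept. You could stop processing any request for the whole period during which a machine has failed and resume only after full recovery; if unbounded latency still counts as A, then you can achieve A. The key is how large that cost is and whether it's acceptable.
What you're describing is purely performance; it has nothing to do with A.
> One thing that makes failure handling in distributed systems hard is that when you send a message to a remote node and get no ack, you can't tell whether the remote node died or there was a network partition
Why would you need to tell them apart? In a quorum, any minority partition is simply treated as if it had died.
> Mongo not waiting for the remote ack is the result of such a trade-off. Kafka can also be configured to behave this way, or to wait for the remote ack before acking; it's all decided by the use case, and nothing is absolute.
If Mongo skips the ack for the sake of benchmarks and performance, fine, but then calling itself a DB is false advertising. Citing CAP to claim CA is impossible is even more laughable.
I already covered Kafka's ISR design and the question of when to commit; I won't rant about it again.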

[In reply to b*****n]

t*********r
Posts: 387

Kafka's ISR, together with its tolerance of f failures with f+1 nodes, is a weak guarantee, which is why data can be lost.

[In reply to g*****g]

t*********r
Posts: 387

It's not an issue if you just use it as a KV store, which is what was originally asked.
The second issue is implementation-specific. I looked more closely at the documentation and concede that this is the case in the implementation. But do you agree that if you take a basic quorum protocol, you can get a C* system? Ensure that you operate only on a majority, and you get A?

[In reply to g*****g]

g*****g
Posts: 34805

A KV store is not a general-purpose DB. Yes, a KV store, when keys are properly distributed, can guarantee timeline consistency at the key level and tolerate some failures. But that's not the same as C in CAP. Next time tell me something I don't already know.

[In reply to t*********r]

t*********r
Posts: 387

C in CAP is atomicity/serializability. Are you sure we agree on that?
I don't see why you think a KV store cannot achieve this level of guarantee.

[In reply to g*****g]

g*****g
Posts: 34805

It means atomic consistency, and when a KV store cannot provide that beyond a single key, it's certainly not C in CAP.
> The CAP Theorem is based on three trade-offs, one of which is "atomic consistency" (shortened to "consistency" for the acronym), about which the authors note, "Discussing atomic consistency is somewhat different than talking about an ACID database, as database consistency refers to transactions, while atomic consistency refers only to a property of a single request/response operation sequence. And it has a different meaning than the Atomic in ACID, as it subsumes the database notions of both Atomic and Consistent."[1]

[In reply to t*********r]

t*********r
Posts: 387

I fail to see how you derived this statement from your citation:
"when KV store cannot do that beyond single key, it's certainly not C in CAP."
Apart from whether it is feasible or not, why do you assert that multi-key support is a requirement for consistency?

[In reply to g*****g]

g*****g
Posts: 34805

So now I need to help you interpret what it means? "Database" below means a traditional RDBMS, so C in CAP means the A and C of an RDBMS. Can an RDBMS support A and C across multiple rows? I'm sorry, you really disappoint me. I thought I could learn something from you, but all I've figured out is that you lack the basics. You don't beat CAP; you just interpret it differently from how everyone else understands it.
> And it has a different meaning than the Atomic in ACID, as it subsumes the database notions of both Atomic and Consistent."[1]

[In reply to t*********r]

t*********r
Posts: 387

Hmmm, I don't remember that C includes AC, but I will concede I'm wrong in
this regard.
Nonetheless, I disagree with your statement that it is not possible to have
multikey transaction in KV stores in general.
You already agree that atomicity for each key is possible. To add multikey
transaction, consider the following design:
-For each transaction, sort the shards they touch into some order. This
ensures that transactions that touch the same keys will send their requests
to the corresponding shards in the same order.
-For each transaction, attempt to {commit the change,retrieve the value} in
each shard. If some operation violates serializability, roll back the
changes in previous shards and either fail the request or try again.
Otherwise, hold this change into a temporary log.
-Once all shards OK the change, signal each shard to commit the change permanently.
Basic idea is from here: http://hyperdex.org/papers/warp.pdf
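A single-process sketch of the steps above, assuming each shard is an in-memory object with a lock and a staging area (the `Shard` class and `transact` function are hypothetical). It follows the outline in the post: visit shards in one global order, validate and stage at each, roll back on conflict, and commit only after every shard has said OK. It is not HyperDex Warp's actual protocol, and a real system would also need the commit decision itself to be durable.

```python
import threading


class Shard:
    """Toy shard holding some keys, with one lock and a staging area."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.staged = {}
        self.lock = threading.Lock()


def transact(shards_by_key, writes, expected):
    """Apply `writes` (key -> new value) atomically across shards.

    expected: key -> value the transaction read earlier; if any current value
    differs, another transaction got there first and this one aborts.
    Shards are visited in sorted name order, so two transactions touching the
    same shards lock them in the same order and cannot deadlock."""
    involved = sorted({shards_by_key[k] for k in writes}, key=lambda s: s.name)
    locked = []
    try:
        # Phase 1: lock each shard in order, validate, and stage the change.
        for shard in involved:
            shard.lock.acquire()
            locked.append(shard)
            for k, v in writes.items():
                if shards_by_key[k] is shard:
                    if shard.data.get(k) != expected.get(k):
                        raise RuntimeError("conflict")
                    shard.staged[k] = v
        # Phase 2: every shard said OK, so make the staged changes permanent.
        for shard in involved:
            shard.data.update(shard.staged)
            shard.staged.clear()
        return True
    except RuntimeError:
        for shard in locked:
            shard.staged.clear()  # roll back whatever was staged so far
        return False
    finally:
        for shard in reversed(locked):
            shard.lock.release()
```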

[In reply to g*****g]

g*****g
Posts: 34805

You could simply use ZooKeeper to guarantee exclusivity and achieve distributed transactions, and performance will drop to its knees. There's a reason most KV stores don't even support rollback.
Once a transaction crosses shards, your write performance will be worse than an RDBMS's. It's just physics.

[In reply to t*********r]

t*********r
Posts: 387

Let's address the issues one at a time.
Do you concede that C and A are both achievable along with P with the
provision that during a partition, every shard has a surviving majority?
Bring it up if you disagree, but I don't think that performance is relevant
in the discussion of "can you achieve it?"

[In reply to g*****g]

g*****g
Posts: 34805

I feel you keep patching your definition of CAP along with your DB design. And you concede that your best design is deeply flawed, with no performance to begin with.
Performance issues mean slow access, and slow access means low availability.
If all your clients try to write the same keys, does it scale? C* does to a certain extent, with eventual consistency, and yours will not. Now that's basics.

[In reply to t*********r]

t*********r
Posts: 387

Performance: using ZK implies pessimistic CC, which IMO will probably
perform slightly worse in the general case.
I concede cross shard transactions will incur a latency cost, but scaling
RDBMS to very large scales is a pain in the ass. For smaller data sets,
crossing a few shards isn't going to be *that* bad.
I never made an argument that there is no tradeoff between C/A and *performance*. What I do have a problem with is the claim that you can't have both
C and A (since P is widely claimed and assumed as a part of life). If
anything, P is a poorly defined metric that can range anywhere between all
nodes being disjoint to only one node being cut off. It's not a useful
metric to reason about a system relative to C and A.

[In reply to g*****g]

t*********r
Posts: 387

I try to stick with the definitions as defined by the paper. For the case
previously, I've conceded it as a mistake and adjusted my presentation
accordingly.
Availability is a property. You have it, or you don't. It doesn't come for
free.
Do you agree with this definition:
"in some ways this is a weak definition of availability: it puts no bound on
how long the algorithm may run before terminating, and therefore allows
unbounded computation."
If all your clients try to write to the same key, it can slow to a
crawl (or gets an abort response). But low performance/{denied,aborted}
response is not equivalent to no availability.
EDIT: "no performance" is a hard assertion. I don't see why it can't perform
reasonably well unless everyone is writing to the same key.
In that case, you might as well have clients use some kind of local cache
before accessing a DB, because reasoning about code correctness when you can't reliably predict your DB values is difficult.

[In reply to g*****g]

g*****g
Posts: 34805

Availability is pretty well defined. It's pretty much equivalent to linear scalability: if, with more traffic, you can maintain latency by adding more machines, you have high availability.
And I said keys, not key. C* is susceptible to hot-row issues, but generally you can design around them; the end result is that many clients writing to a set of rows can still scale. In your case, the set is reduced to one row once you support transactions, and the effect is very obvious.

[In reply to t*********r]

t*********r
Posts: 387

Please point me to a reference that defines availability the way you have
described in the context of CAP.
My understanding is the following:
"For a distributed system to be continuously available, every request
received by a non-failing node in the system must result in a response [...]
this is a weak definition of availability: it puts no bound on how long the
algorithm may run before terminating, and therefore allows unbounded
computation."
From the same proof paper as before. I'm also under the impression that a node is permitted to reply to the client with an abort response. In any case, it is not consistent with your description.
I'm sure there are contexts where high/low availability refers to
scalability while maintaining some latency threshold, but IMO this is not
the same discussion.

[In reply to g*****g]

g*****g
Posts: 34805

Honestly, I only care about what the industry wants to achieve, not how to define the terms to prove someone right or wrong. And in this industry, high availability includes, but is not limited to, linear scalability.

[In reply to t*********r]

j**********3
Posts: 3211

Why won't you guys speak Chinese! Reading English makes my head spin.

t*********r
Posts: 387

This is bullshit.
When I make a mistake I concede my errors openly and objectively. When you
make a mistake you make excuses. You were citing the same source earlier,
and now you go back on it.
If you said something like "there is a tradeoff between consistency and
performance/linear scalability," I might think about it for a bit and then
agree with you. But you were ranting earlier about how disappointed you were
when I got my terms wrong, and how you think I don't know even the basics.
Now you make the same mistake and you say you don't really care.
I am disappointed in your attitude.
-------EDIT-------
These terms mean very different things. It's not about right or wrong, but
about reasoning about the properties and capabilities of the system.
I could easily have said that I don't really care about the C in ACID, as you
mentioned earlier, since KV stores don't provide that interface, but I
conceded anyway because people find it useful and it was part of the original
definition.
It's really not OK to make claims about CAP when you don't even use the same
definitions from the original context.

[Quoting g*****g's post:]

: Honestly, I only care about what the industry wants to achieve, not about how
: to define the terms to prove someone right or wrong. And in this industry,
: high availability includes, but is not limited to, linear scalability.

g*****g
Posts: 34805

If you come up with a product that beats CAP, you will be a billionaire. If
you come up with a peer-reviewed paper that does it, you won't need to worry
about your next job. It's your duty to prove yourself when you make such a
bold claim. I can be totally wrong, and that still doesn't prove you beat
CAP. We can agree to disagree, so what are you trying to achieve here?


[Quoting t*********r's post:]

: This is bullshit.
: When I make a mistake I concede my errors openly and objectively. When you
: make a mistake you make excuses. You were citing the same source earlier,
: and now you go back on it.
: If you said something like "there is a tradeoff between consistency and
: performance/linear scalability," I might think about it for a bit and then
: agree with you. But you were ranting earlier about how disappointed you were
: when I got my terms wrong, and how you think I don't know even the basics.
: Now you make the same mistake and you say you don't really care.
: I am disappointed in your attitude.

t*********r
Posts: 387

Because this is a solved problem and is not particularly interesting at this
point in academia or industry. I've presented an argument outlining why the
CAP tradeoff is not a universal requirement if you're willing to compromise
on latency and meet specific failure requirements. I've linked you a paper
that achieves all three properties as defined by peer-reviewed definitions,
and yet you still assert I have not provided sufficient proof. It's sad to
see you resort to ad hominem arguments at this point.
Take a look here:
https://foundationdb.com/key-value-store/white-papers/the-cap-theorem
These guys built a system that more or less circumvented CAP and got bought
by Apple.


[Quoting g*****g's post:]

: If you come up with a product that beats CAP, you will be a billionaire. If
: you come up with a peer-reviewed paper that does it, you won't need to worry
: about your next job. It's your duty to prove yourself when you make such a
: bold claim. I can be totally wrong, and that still doesn't prove you beat
: CAP. We can agree to disagree, so what are you trying to achieve here?

g*****g
Posts: 34805

The paper repeats many times that it didn't beat the CAP theorem. If, during
a partition, a minority cluster could keep accepting writes and still come out
with integrity after the partitions merge again, that would be something. But
the paper simply says the minority cannot achieve a majority and will stop
accepting writes, i.e., it loses write availability during the partition.
It's not a wrong design, but it's not a new design either. I guess that's why
they never became mainstream and had an early exit.


[Quoting t*********r's post:]

: Because this is a solved problem and is not particularly interesting at this
: point in academia or industry. I've presented an argument outlining why the
: CAP tradeoff is not a universal requirement if you're willing to compromise
: on latency and meet specific failure requirements. I've linked you a paper
: that achieves all three properties as defined by peer-reviewed definitions,
: and yet you still assert I have not provided sufficient proof. It's sad to
: see you resort to ad hominem arguments at this point.
: Take a look here:
: https://foundationdb.com/key-value-store/white-papers/the-cap-theorem

t*********r
Posts: 387

I really wish you would follow what I've written, but I'll clarify anyway.
First, while the whitepaper has a good solution, their definition of A is
likewise incorrect. Reconsider their approach with the peer-reviewed
definition.
Second, I've stated before that only the majority partition should proceed
during a partition. The minority N nodes in a 2N+1 configuration should tell
the client to redirect to the majority. This process violates none of C, A
(note that only a response is required, not necessarily a positive one; a
"service temporarily unavailable" reply or a redirect is perfectly acceptable
under the formal definition), or P.
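The rule described above (only the majority side of a 2N+1 cluster accepts writes; the minority still answers, but with a redirect or a "temporarily unavailable" reply) can be sketched in Python as follows. This is a hypothetical illustration, not FoundationDB's actual protocol: `handle_write`, `Reply`, `majority_hint`, and the status strings are made up, and replication and leader election are omitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reply:
    status: str       # "ok", "redirect", or "unavailable"
    detail: str = ""

def has_majority(reachable: int, cluster_size: int) -> bool:
    # reachable counts this node plus the peers it can still talk to.
    # Strict majority of the configured cluster: N+1 out of 2N+1.
    return reachable > cluster_size // 2

def handle_write(key: str, reachable: int, cluster_size: int,
                 majority_hint: Optional[str] = None) -> Reply:
    if has_majority(reachable, cluster_size):
        # Majority side: allowed to commit (replication omitted in this sketch)
        return Reply("ok", f"committed {key}")
    if majority_hint is not None:
        # Minority side: respond right away and point the client elsewhere
        return Reply("redirect", majority_hint)
    # Minority side with no known majority: still respond, just negatively
    return Reply("unavailable", "service temporarily unavailable")

# 5-node cluster (N = 2) split into sides of 3 and 2
print(handle_write("x", reachable=3, cluster_size=5))
print(handle_write("x", reachable=2, cluster_size=5, majority_hint="node-a"))
print(handle_write("x", reachable=2, cluster_size=5))
```

Every request gets an answer here; whether a negative answer counts toward A is exactly the point the thread disputes.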

g*****g
Posts: 34805

I really wish you would just give a solid presentation, not "this will work
with 10 patches." Since FoundationDB's official website doesn't say they beat
it, I'd rather take that at face value than believe one more patch will work
wonders. Don't forget the cluster can split three ways, and then no writes
can be done at all.
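The three-way split case can be checked with a short Python sketch, assuming the same strict-majority rule as in the previous post: a partition may accept writes only if it holds more than half of all nodes. The function name and the example splits are hypothetical.

```python
from typing import List

def writable_partitions(partition_sizes: List[int]) -> List[int]:
    # A partition may accept writes only with a strict majority of all nodes
    total = sum(partition_sizes)
    return [size for size in partition_sizes if size > total // 2]

print(writable_partitions([3, 2]))     # [3]  two-way split of 5 nodes
print(writable_partitions([2, 2, 1]))  # []   three-way split of 5 nodes
print(writable_partitions([3, 2, 2]))  # []   three-way split of 7 nodes
```

At most one partition can ever hold a strict majority, which keeps writes consistent, but a three-way split can leave no side able to write at all.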


[Quoting t*********r's post:]

: I really wish you would follow what I've written, but I'll clarify anyway.
: First, while the whitepaper has a good solution, their definition of A is
: likewise incorrect. Reconsider their approach with the peer-reviewed
: definition.
: Second, I've stated before that only the majority partition should proceed
: during a partition. The minority N nodes in a 2N+1 configuration should tell
: the client to redirect to the majority. This process violates none of C, A
: (note that only a response is required, not necessarily a positive one; a
: "service temporarily unavailable" reply or a redirect is perfectly acceptable
: under the formal definition), or P.
