Cassandra Rewritten In C++, Ten Times Faster - Programming board


T********i
Posts: 2416

At Cassandra Summit opening today, Avi Kivity and Dor Laor (who had
previously written KVM and OSv) announced ScyllaDB — an open-source C++
rewrite of Cassandra, the popular NoSQL database. ScyllaDB claims to achieve
a whopping 10 times more throughput per node than the original Java code,
with sub-millisecond 99%ile latency. They even measured 1 million
transactions per second on a single node. The performance of the new code is
attributed to writing it in Seastar — a C++ framework for writing complex
asynchronous applications with optimal performance on modern hardware.

l*******m
Posts: 1096

This is headline news in a lot of places.

[In reply to T********i]

: At Cassandra Summit opening today, Avi Kivity and Dor Laor (who had
: previously written KVM and OSv) announced ScyllaDB — an open-source C++
: rewrite of Cassandra, the popular NoSQL database. ScyllaDB claims to achieve
: a whopping 10 times more throughput per node than the original Java code,
: with sub-millisecond 99%ile latency. They even measured 1 million
: transactions per second on a single node. The performance of the new code is
: attributed to writing it in Seastar — a C++ framework for writing complex
: asynchronous applications with optimal performance on modern hardware.

z****e
Posts: 54598

"a C++ framework for writing complex asynchronous applications with optimal
performance on modern hardware"
"The First Rule of Program Optimization: Don't do it. The Second Rule of
Program Optimization (for experts only!): Don't do it yet." — Michael A.
Jackson

g*****g
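Posts: 34805

This is a good thing. If C* had been written in C++ from the start, it
probably still wouldn't have reached 1.0. Get funded first, then optimize
gradually: that is the right way to go.
That said, "10 times faster" is probably hype; averaging twice as fast would
already be good. "Up to 10 times" is much like "up to 90% off".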

T********i
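Posts: 2416

ScyllaDB claims to achieve a whopping 10 times more throughput per node than
the original Java code, with sub-millisecond 99%ile latency.
Where did you see "up to"?

[In reply to g*****g]

: This is a good thing. If C* had been written in C++ from the start, it
: probably still wouldn't have reached 1.0. Get funded first, then optimize
: gradually: that is the right way to go.
: That said, "10 times faster" is probably hype; averaging twice as fast would
: already be good. "Up to 10 times" is much like "up to 90% off".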

h**********c
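Posts: 4120

Speaking purely about I/O, the JVM's file descriptors, sockets, and pipes can
be mapped onto the OS's descriptors, so in essence it is independent of the
physical characteristics of the outside world.
Whether async is fast still depends on how the kernel handles select
notifications. That should depend only on the kernel and have nothing to do
with the programming language.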
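
For readers unfamiliar with what "the kernel handling select" refers to, here is a minimal sketch in Python (chosen only for brevity; it is not a language discussed in the thread). It uses the standard `selectors` module and a local `socket.socketpair()` as a stand-in for a real connection. It only shows that a runtime asks the kernel which descriptors are ready; it does not settle whether the language around that call affects performance, which is exactly what the replies dispute.

```python
import selectors
import socket

# DefaultSelector picks the best readiness API the OS offers
# (epoll, kqueue, or plain select).
sel = selectors.DefaultSelector()

# A connected local socket pair stands in for a client/server connection.
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ)

writer.sendall(b"ping")

# Ask the kernel which registered descriptors are readable, then read them.
received = b""
for key, _events in sel.select(timeout=1.0):
    received += key.fileobj.recv(1024)

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
print(received)
```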

T********i
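Posts: 2416

You're wrong.

[In reply to h**********c]

: Speaking purely about I/O, the JVM's file descriptors, sockets, and pipes can
: be mapped onto the OS's descriptors, so in essence it is independent of the
: physical characteristics of the outside world.
: Whether async is fast still depends on how the kernel handles select
: notifications. That should depend only on the kernel and have nothing to do
: with the programming language.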

h**********c
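Posts: 4120

What got rewritten is the kernel-level part anyway, yet they insist on
dragging Java into the comparison. In that case, compare wide-character
handling and date functions instead.

[In reply to T********i]

: You're wrong.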

g*****g
Posts: 34805

http://www.eweek.com/database/scylladb-database-emerges-out-of-
"We're building a really fast database for NoSQL workloads," Kivity told
eWEEK. "ScyllaDB is 100 percent compatible with Cassandra, and applications
will run up to 10 times faster."

[In reply to T********i]

: ScyllaDB claims to achieve a whopping 10 times more throughput per node than
: the original Java code, with sub-millisecond 99%ile latency.
: Where did you see "up to"?

d****n
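Posts: 1637

I'm the old Chinese doctor, and I specialize in curing bragging. Headaches,
fevers, and low blood pressure are not my department. If you feel like
bragging, you'd be better off pleasuring yourself: it's cheap, satisfying, and
stress-free. Eating a bit is fine, drinking a bit is fine, but the people who
brag never have any real ability. If someone brags, send them to the old
Chinese doctor for a round of Five Poisons slaps to the head. A brick cell
phone in my hand, a pager on my belt, a big pair of shorts on, and a Yuxi
cigarette in my mouth.
I'm the old Chinese doctor. I take taxis when I go out, and sometimes I find a
couple of girls for a threesome. I'm the old Chinese doctor, driving my
tractor all over the country, curing bragging.
I'm the old Chinese doctor, grinning all day long. When I hear someone
bragging, they get a kicking. Brag every day and sooner or later lightning
will strike you; if the lightning doesn't finish you off, there's still the
old Chinese doctor.
This old Chinese doctor is from Jinzhou: Wei Xiaobao, the great ranter of
Jinzhou. Curing bragging is all I do. Headaches, fevers, and low blood
pressure are not my department. Don't brag; whoever brags gets kicked. Big
brother is the old Chinese doctor, and he cures bragging.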

[In reply to g*****g]

: http://www.eweek.com/database/scylladb-database-emerges-out-of-
: "We're building a really fast database for NoSQL workloads," Kivity told
: eWEEK. "ScyllaDB is 100 percent compatible with Cassandra, and applications
: will run up to 10 times faster."


w**z
Posts: 8232

I'll go take a look tomorrow. Isn't the summit supposed to officially start
only tomorrow?

[In reply to T********i]

l*********s
Posts: 5409

Couchbase is written in C++ and is 6 times faster than C*. A rewrite being 10
times faster is no big deal.

[In reply to g*****g]

z****e
Posts: 54598

erlang

[In reply to l*********s]

: Couchbase is written in C++ and is 6 times faster than C*. A rewrite being 10
: times faster is no big deal.

z****e
Posts: 54598

interesting thing is
some one here cares more about " Ten Times Faster"
while the same guy tries to tell me don't use udp
is this a joke?

z****e
Posts: 54598

the io between server and the db is not that important
since u already have async mechanism to return rather than
sitting there waiting for the result

g*****g
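Posts: 34805

http://www.scylladb.com/technology/cassandra-vs-scylla-benchmar
Looking at this benchmark, it doesn't look like 10x to me; maybe 8x? But the
key point is that it is misleading: nobody runs C* on a single machine. You
use at least 3 nodes. I suspect replication will be dominated by network I/O
latency, which would shrink the gap considerably, possibly to the point where
it no longer matters. For example, if network latency is 1 ms and kernel
bypass saves you 0.1 ms, the saving is negligible.
The touted single-machine sharding optimization is meaningless: when a machine
goes down, everything on it goes down, and one of the reasons to use C* is
high availability. Keeping copies of the data on multiple machines is a must,
which in turn means that in a cluster, with replication factor > 1, the gain
is immediately cut down heavily.
Until cluster benchmarks come out, I remain skeptical about this thing.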
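
The arithmetic behind this argument can be sketched as follows, in Python. The 1 ms network latency and the 0.1 ms kernel-bypass saving come from the post; the 0.2 ms local processing cost, the function names, and the assumption that replicas are contacted in parallel (so one round trip is paid whenever RF > 1) are all hypothetical and only there to make the numbers concrete.

```python
def replicated_write_latency_ms(local_ms, network_rtt_ms, replication_factor):
    """Client-visible latency when the coordinator waits for the replicas.

    Assumption: replicas are written in parallel, so one network round trip
    is paid once if any remote replica has to acknowledge the write.
    """
    remote_hops = 1 if replication_factor > 1 else 0
    return local_ms + remote_hops * network_rtt_ms


def speedup(baseline_ms, optimized_ms):
    """How many times faster the optimized path is than the baseline."""
    return baseline_ms / optimized_ms


# From the post: 1 ms network latency, 0.1 ms saved by kernel bypass.
# Hypothetical: the baseline spends 0.2 ms on local processing.
baseline_local, optimized_local, rtt = 0.2, 0.1, 1.0

single_node = speedup(
    replicated_write_latency_ms(baseline_local, rtt, 1),
    replicated_write_latency_ms(optimized_local, rtt, 1),
)
clustered = speedup(
    replicated_write_latency_ms(baseline_local, rtt, 2),
    replicated_write_latency_ms(optimized_local, rtt, 2),
)
print(f"single node speedup: {single_node:.2f}x")
print(f"RF=2 speedup:        {clustered:.2f}x")
```

Note that this models per-request latency only. The benchmark being discussed reports throughput, which this sketch does not model, so it illustrates the shape of the argument without confirming or refuting the benchmark.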

z****e
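Posts: 54598

This is a good thing. Even single-node is nice: once the API is unified,
migrating from a single node to a cluster becomes easy.
Time to ditch Mongo.

[In reply to g*****g]

: http://www.scylladb.com/technology/cassandra-vs-scylla-benchmar
: Looking at this benchmark, it doesn't look like 10x to me; maybe 8x? But the
: key point is that it is misleading: nobody runs C* on a single machine. You
: use at least 3 nodes. I suspect replication will be dominated by network I/O
: latency, which would shrink the gap considerably, possibly to the point where
: it no longer matters. For example, if network latency is 1 ms and kernel
: bypass saves you 0.1 ms, the saving is negligible.
: The touted single-machine sharding optimization is meaningless: when a machine
: goes down, everything on it goes down, and one of the reasons to use C* is
: high availability. Keeping copies of the data on multiple machines is a must,
: which in turn means that in a cluster, with replication factor > 1, the gain
: is immediately cut down heavily.
: Until cluster benchmarks come out, I remain skeptical about this thing.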

N*****m
Posts: 42603

GC pauses are a problem.
Even your own dynamite mentions it, which is why they didn't go with Java.
Overall, though, C* is good enough.

[In reply to g*****g]

w***g
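Posts: 5958

This is awesome. I want to benchmark it when I have time.
I think that by drawing on the lessons learned from C* and adjusting some
architectural details, 10x is possible. It is not necessarily all thanks to
C++. Still, once a wheel's API and so on have fully stabilized, pursuing
ultimate performance basically means rewriting it in C++.

[In reply to T********i]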

a9
Posts: 21638

Haha, with just an empty skeleton, being 10 times faster is nothing special.
Once all the features are implemented, will it even be twice as fast?

[In reply to T********i]


S*******e
Posts: 525

From Slashdot:
Rewrites are easier than the first strike (Score:5, Insightful)
Wow, two years ago everyone here told us that NoSQL is evil and tried to
convince us that we should stick to MySQL.
Now everyone tells us Java is evil, because a rewrite in C++ is faster.
What a surprise.
If I would rewrite Cassandra from scratch, in Java, it also would be faster
than the actual code.
Why? Because all the learning the original team did over a course of a
decade I can reuse and improve on.
Keep in mind, the rewrite uses a new framework and new concepts for
concurrency. Concurrency is one of the core areas where computing in future
will certainly make lots of progress.
I for my part I'm waiting for a Lucene rewrite, regardless in what language.
Probably the worst OSS code I have ever seen ... actually the worst code
regardless of OSS or closed source.

l*******m
Posts: 1096

The analysis makes some sense. They themselves say "optimal performance on
modern hardware". It probably only has an advantage with a 10G NIC.

[In reply to g*****g]

d****i
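Posts: 4809

Something like a database really does need to be written in native code, just
as all RDBMSs are written in C or C++. If NoSQL wants lower latency and higher
throughput, native code is the better fit, with a separate API binding written
for each application-layer language: PHP, Python, Java, Node. So this is not
surprising.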

t*****n
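Posts: 4908

So the pursuit of performance in languages never ends. Something like Java,
with its protective wrapping, is only suited to a startup's quick-and-dirty
hacking.

[In reply to d****i]

: Something like a database really does need to be written in native code, just
: as all RDBMSs are written in C or C++. If NoSQL wants lower latency and higher
: throughput, native code is the better fit, with a separate API binding written
: for each application-layer language: PHP, Python, Java, Node. So this is not
: surprising.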

h*i
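Posts: 3446

Agreed. The benefit of NoSQL is running on multiple machines, and the main
problems with multiple machines are network latency and unreliability. Isn't
chasing single-machine performance putting the cart before the horse?
Speaking of which, C*'s distributed design does have problems; see
https://aphyr.com/posts/294-call-me-maybe-cassandra/
But those design problems cannot be solved by writing it in C++.
In these Jepsen tests of distributed designs, the only software that has come
out clean so far is zookeeper. Everything else that has been tested (C*,
Mongo, Kafka, ES, Riak, Aerospike, and so on) has problems: partitions cause
inconsistency in all of them.
Among the commonly used distributed databases, couchbase has not been tested
yet. Its networking layer is written in erlang, so maybe it is fine? There is
also FoundationDB, which Apple acquired; its own testing includes Jepsen, so
it is probably fine too.

[In reply to g*****g]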

l*********s
Posts: 5409

Agreed. Browsers are the same story. It's not that Java can't do it; it's that
there would be no users.

[In reply to d****i]

g*****g
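Posts: 34805

On a single machine C++ is certainly faster. But a NoSQL architecture has to
run as a cluster. I think the reason they haven't done a cluster benchmark is
probably that there is no significant improvement. With RF=2, the client has
to wait for the coordinator node to write to another node and get a response.
That round-trip latency is far greater than the latency that things like
kernel bypass can save.
As I said before, a single-machine benchmark is meaningless. A 3-node, RF=2
setup would be convincing.

[In reply to d****i]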

N*****m
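Posts: 42603

For C* and Riak that doesn't count as a problem; they are eventually
consistent by design.
If you turn on allow-mult in Riak, it can guarantee strong consistency.
Also, etcd and consul have no problems either.

[In reply to h*i]

: Agreed. The benefit of NoSQL is running on multiple machines, and the main
: problems with multiple machines are network latency and unreliability. Isn't
: chasing single-machine performance putting the cart before the horse?
: Speaking of which, C*'s distributed design does have problems; see
: https://aphyr.com/posts/294-call-me-maybe-cassandra/
: But those design problems cannot be solved by writing it in C++.
: In these Jepsen tests of distributed designs, the only software that has come
: out clean so far is zookeeper. Everything else that has been tested (C*,
: Mongo, Kafka, ES, Riak, Aerospike, and so on) has problems: partitions cause
: inconsistency in all of them.
: Among the commonly used distributed databases, couchbase has not been tested
: yet. Its networking layer is written in erlang, so maybe it is fine? There is
: also FoundationDB, which Apple acquired; its own testing includes Jepsen, so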

e*******o
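Posts: 4654

Feels like comparing this with mongo is apples to oranges.

[In reply to z****e]

: This is a good thing. Even single-node is nice: once the API is unified,
: migrating from a single node to a cluster becomes easy.
: Time to ditch Mongo.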

N*****m
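Posts: 42603

It seems their current implementation is not yet a cluster-level drop-in
replacement.

[In reply to g*****g]

: On a single machine C++ is certainly faster. But a NoSQL architecture has to
: run as a cluster. I think the reason they haven't done a cluster benchmark is
: probably that there is no significant improvement. With RF=2, the client has
: to wait for the coordinator node to write to another node and get a response.
: That round-trip latency is far greater than the latency that things like
: kernel bypass can save.
: As I said before, a single-machine benchmark is meaningless. A 3-node, RF=2
: setup would be convincing.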


h*i
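Posts: 3446

Losing data does not count as eventually consistent. If my data has been lost,
what use is it to me that you are eventually consistent with yourself? By that
logic, 100% data loss = 100% guaranteed consistency.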
"No. Cassandra lightweight transactions are not even close to correct.
Depending on throughput, they may drop anywhere from 1-5% of acknowledged
writes–and this doesn’t even require a network partition to demonstrate.
It’s just a broken implementation of Paxos. "

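[In reply to N*****m]

: For C* and Riak that doesn't count as a problem; they are eventually
: consistent by design.
: If you turn on allow-mult in Riak, it can guarantee strong consistency.
: Also, etcd and consul have no problems either.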

l*********s
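Posts: 5409

100% data loss = 100% guaranteed consistency.
Explained clearly and simply. Nice!

[In reply to h*i]

: Losing data does not count as eventually consistent. If my data has been lost,
: what use is it to me that you are eventually consistent with yourself? By that
: logic, 100% data loss = 100% guaranteed consistency.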
: "No. Cassandra lightweight transactions are not even close to correct.
: Depending on throughput, they may drop anywhere from 1-5% of acknowledged
: writes–and this doesn’t even require a network partition to demonstrate.
: It’s just a broken implementation of Paxos. "

N*****m
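Posts: 42603

The C* data loss was caused by bugs; judging by those two JIRA tickets, they
should already be fixed.
Whether eventual consistency is acceptable depends on the application. If your
operations are CRDTs, it is no problem.
All of this depends on your use case.

[In reply to h*i]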
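
To make the CRDT remark concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs, in Python. The class and method names are hypothetical and this is not how Cassandra or Riak implement counters. It only illustrates why such operations tolerate eventual consistency: merging is commutative, associative, and idempotent, so replicas converge regardless of the order or repetition of merges.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Taking the max per slot makes merge safe to repeat and reorder.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)  # update accepted by replica a
b.increment(2)  # concurrent update accepted by replica b

# Replicas exchange state in arbitrary order; one merge is even repeated.
a.merge(b)
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

Both replicas end up with the same total without coordinating on the individual writes. This says nothing about acknowledged writes that are dropped outright, which is the separate complaint raised earlier in the thread.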

J****R
Posts: 373

I just chatted briefly with their CEO. It looks like a stable release won't be
out until January next year. Let's wait and see.

c********l
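Posts: 125

Why doesn't Cassandra Summit do it like blizzardcon: stream live on twitchtv,
let people watch online, and sell virtual tickets? That would be so classy.

[In reply to J****R]

: I just chatted briefly with their CEO. It looks like a stable release won't be
: out until January next year. Let's wait and see.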

J****R
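Posts: 373

They did handle this poorly. Some hot sessions were held in small rooms and
you simply couldn't squeeze in. I didn't see anyone recording either, so they
probably won't be available online later.

[In reply to c********l]

: Why doesn't Cassandra Summit do it like blizzardcon: stream live on twitchtv,
: let people watch online, and sell virtual tickets? That would be so classy.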

g*****g
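Posts: 34805

If you buy the $200 priority pass, you can get in.

[In reply to J****R]

: They did handle this poorly. Some hot sessions were held in small rooms and
: you simply couldn't squeeze in. I didn't see anyone recording either, so they
: probably won't be available online later.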

z****e
Posts: 54598

the biggest latency is network io
that is why goodbug says need to set rf = 2
then c the benchmark again
it is all distributed sys. today

[In reply to d****i]

c***n
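Posts: 809

allow_multi is still eventual consistency. It merely allows siblings; when the
siblings get synced is still eventual.

[In reply to N*****m]

: For C* and Riak that doesn't count as a problem; they are eventually
: consistent by design.
: If you turn on allow-mult in Riak, it can guarantee strong consistency.
: Also, etcd and consul have no problems either.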

w**z
Posts: 8232

I posted a coupon for it a while back, for half price.

[In reply to g*****g]

: If you buy the $200 priority pass, you can get in.


p*u
Posts: 2454

cluster benchmarks just came out:
http://www.scylladb.com/technology/cassandra-vs-scylla-benchmar

[In reply to z****e]

: the biggest latency is network io
: that is why goodbug says need to set rf = 2
: then c the benchmark again
: it is all distributed sys. today

l*********s
Posts: 5409

SeaStar framework looks interesting and modern.

[In reply to p*u]

: cluster benchmarks just came out:
: http://www.scylladb.com/technology/cassandra-vs-scylla-benchmar

p*u
Posts: 2454

it has an implementation of "future" for async programming...
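
For anyone who has not met the "future" idea, here is a rough sketch of the general pattern using Python's standard `asyncio` module. This is not Seastar's API (Seastar is a C++ framework and its own future type is not shown here), and `fetch_row` is a hypothetical stand-in for a non-blocking database read. The point is only that the caller gets a handle right away and collects the result later instead of sitting there waiting.

```python
import asyncio


async def fetch_row(key):
    # Hypothetical stand-in for a non-blocking read; the sleep yields control
    # to the event loop the way a pending I/O operation would.
    await asyncio.sleep(0.01)
    return f"value-for-{key}"


async def main():
    # ensure_future schedules each read and returns a future-like Task at once.
    pending = [asyncio.ensure_future(fetch_row(k)) for k in ("k1", "k2")]
    # Both reads are now in flight; gather returns results in request order.
    return await asyncio.gather(*pending)


results = asyncio.run(main())
print(results)
```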

[In reply to l*********s]

: SeaStar framework looks interesting and modern.
