由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Programming版 - HOW TO DELETE IN KEY-VALUE STORE
相关主题
问个C++中重复删除指针的问题好热闹啊
为什么Cache LRU多用doubly linked list而不是single linked list来实现呢?Indiana大学的牛人
没有data store的model有什么用吗?说说12306需要多少台机器
小公司的网站也要用memcached之类的cache吗?mongo,dynamo,cassandra,hbase ,谁会是赢家,谁会落寞?
我的一个客户案例(high traffic),请大家批判分析指点分布式系统back end需要哪些知识?
git的官方文档真叫一个烂Spark 和 Dynamodb 之间 如何 连接
Android memory leaknode- mongoose的思考
linux 能查到 deleted file list 吗有多少人的公司再用卡三爪?跟呆拿魔比 有什么优势么?
相关话题的讨论汇总
话题: delete话题: log话题: value话题: key话题: store
进入Programming版参与讨论
1 (共1页)
k****r
发帖数: 807
1
To DANIU men,
I am trying to study distributed key-value store. And have a question.
Like Dynamo and Voldemort, the data is append-only written, and the index is
caches. I understand the write and retrieve procedures. However, how to
deal with delete in these systems?
Thanks,
g*****g
发帖数: 34805
2
Isn't delete just a write? Delete operation is appended in commit log,
during compaction, the row is removed.

is

【在 k****r 的大作中提到】
: To DANIU men,
: I am trying to study distributed key-value store. And have a question.
: Like Dynamo and Voldemort, the data is append-only written, and the index is
: caches. I understand the write and retrieve procedures. However, how to
: deal with delete in these systems?
: Thanks,

w***g
发帖数: 5958
3
我不知道具体系统怎么实现的。log structure storage的惯用做法应该是
先把对应的记录定位了,打上叉叉表示删掉了。(能search就能定位记录)
然后用打叉叉的记录多了以后用garbage collection来回收空间。

is

【在 k****r 的大作中提到】
: To DANIU men,
: I am trying to study distributed key-value store. And have a question.
: Like Dynamo and Voldemort, the data is append-only written, and the index is
: caches. I understand the write and retrieve procedures. However, how to
: deal with delete in these systems?
: Thanks,

k****r
发帖数: 807
4
re: Isn't delete just a write? Delete operation is appended in commit log,
during compaction, the row is removed.
If delete is one special type of write, can I understand the it as a rewrite
on a key?
BTW, how to swap the page cache for keys? Lets say, the older data file and
index file are like keyA-valueA1, keyB-valueB1, keyC-valueC1, and the new
coming ones are: keyA-valueA2, keyC-valueC2. Then, what is swap doing on
cache? it should still be like keyA...keyB...keyC, but only value of A and B
are updated, right? Then, if the new operation is delB, the keyB value is
set as a special value?
re: 我不知道具体系统怎么实现的。log structure storage的惯用做法应该是
先把对应的记录定位了,打上叉叉表示删掉了。(能search就能定位记录)
然后用打叉叉的记录多了以后用garbage collection来回收空间。
garbage collection is operated during SWAP to new version of data?
g*****g
发帖数: 34805
5
Every row/column has a flag, you mark it as invalid, that's a delete. All
write operations will also invalidate and/or update the cache for given row.

rewrite
and
B

【在 k****r 的大作中提到】
: re: Isn't delete just a write? Delete operation is appended in commit log,
: during compaction, the row is removed.
: If delete is one special type of write, can I understand the it as a rewrite
: on a key?
: BTW, how to swap the page cache for keys? Lets say, the older data file and
: index file are like keyA-valueA1, keyB-valueB1, keyC-valueC1, and the new
: coming ones are: keyA-valueA2, keyC-valueC2. Then, what is swap doing on
: cache? it should still be like keyA...keyB...keyC, but only value of A and B
: are updated, right? Then, if the new operation is delB, the keyB value is
: set as a special value?

w***g
发帖数: 5958
6
按goodbug说的,日志里就是
keyA-valueA1
keyB-valueB1
keyC-valueC1
keyA-valueA2,
keyC-valueC2.
delB
读的时候往回找,先找到啥是啥。看到delB就表示B已经没了。
这就真是纯log structure了。
不过我没想明白on-disk索引怎么做,所以说了打叉叉那个办法。
log structure当时被想出来的假设是大内存能够保证足够高的cache hit rate,
所以读磁盘的效率差并不重要,而按log的顺序写入能保证最大化写入吞吐量。
cache在内存中,更新和log顺序没有关系。如果索引在内存中,也和log没关系。
其实以后SSD多了,顺序写的优势也就不再那么重要了。

rewrite
and
B

【在 k****r 的大作中提到】
: re: Isn't delete just a write? Delete operation is appended in commit log,
: during compaction, the row is removed.
: If delete is one special type of write, can I understand the it as a rewrite
: on a key?
: BTW, how to swap the page cache for keys? Lets say, the older data file and
: index file are like keyA-valueA1, keyB-valueB1, keyC-valueC1, and the new
: coming ones are: keyA-valueA2, keyC-valueC2. Then, what is swap doing on
: cache? it should still be like keyA...keyB...keyC, but only value of A and B
: are updated, right? Then, if the new operation is delB, the keyB value is
: set as a special value?

k****r
发帖数: 807
7
Thank you for your reply. And it should make sense, especially for the
system with the LRU cache.
However, I don't remember there is related field in Dynamo/Voldemort systems
. Also, I though its caching for versions is not similar with LRU....

row.

【在 g*****g 的大作中提到】
: Every row/column has a flag, you mark it as invalid, that's a delete. All
: write operations will also invalidate and/or update the cache for given row.
:
: rewrite
: and
: B

w***g
发帖数: 5958
8
log-structured file system是1990年左右出现的,在HDD时代是一个很牛B的发现。
其实现在SSD盛行后已经没啥优势了。之所有目前盛行,我觉得主要是industry还没缓
过劲来,
还有就是实现比较容易。类似的,event-vs-thread以及nginx的牛B是前multi-core时
代的事情。
现在核越来越多了,单核计算能力也越来越强,未来的方向是明显利于thread的。
但是industry还没缓过劲来,所以会阶段性地出现continuation passing style大行其
道的局面。
node.js之类的应该火不了几年了。

【在 w***g 的大作中提到】
: 按goodbug说的,日志里就是
: keyA-valueA1
: keyB-valueB1
: keyC-valueC1
: keyA-valueA2,
: keyC-valueC2.
: delB
: 读的时候往回找,先找到啥是啥。看到delB就表示B已经没了。
: 这就真是纯log structure了。
: 不过我没想明白on-disk索引怎么做,所以说了打叉叉那个办法。

k****r
发帖数: 807
9
Also, what is the relations between the log, and the data in disk/memory.'
Thanks,
g*****g
发帖数: 34805
10
分布式系统里删除特殊的地方在于不是所有节点都会收到这条指令,可能会丢。如果你
真的删除了数据,你怎么确定是一个写操作这个节点没收到,还是这个删除别的节点没
收到?
所以 Cassandra一类的系统是写入一个特殊值叫 tombstone,只在一定时间之后做
compaction,这样出错的概率就很低。

【在 w***g 的大作中提到】
: 按goodbug说的,日志里就是
: keyA-valueA1
: keyB-valueB1
: keyC-valueC1
: keyA-valueA2,
: keyC-valueC2.
: delB
: 读的时候往回找,先找到啥是啥。看到delB就表示B已经没了。
: 这就真是纯log structure了。
: 不过我没想明白on-disk索引怎么做,所以说了打叉叉那个办法。

相关主题
git的官方文档真叫一个烂好热闹啊
Android memory leakIndiana大学的牛人
linux 能查到 deleted file list 吗说说12306需要多少台机器
进入Programming版参与讨论
g*****g
发帖数: 34805
11
Every system is different, my knowledge is on Cassandra and it's not
necessary accurate for other system. Think of it as one possible solution.

systems

【在 k****r 的大作中提到】
: Thank you for your reply. And it should make sense, especially for the
: system with the LRU cache.
: However, I don't remember there is related field in Dynamo/Voldemort systems
: . Also, I though its caching for versions is not similar with LRU....
:
: row.

w***g
发帖数: 5958
12
传统数据库的数据主要存在一个磁盘上的B+树(或者hash)表中,log是一个为了保证数据
完整性的机制。更新B+树之前先把对应的操作写入log中。这样如果B+树更新到一半系统
断电导致数据结构损坏,可以通过replay log的办法重建B+树。后来人们发现其实所有
的数据都在log里,要用的时候去log里也能找出数据来,就觉得可以把B+树扔掉了。
然后log就变成了磁盘上唯一的数据结构。数据库的log和一般程序的日志不一样。
数据库的log存的是数据本身,所以必须存在于磁盘上。内存中只能是索引或者cache。

【在 k****r 的大作中提到】
: Also, what is the relations between the log, and the data in disk/memory.'
: Thanks,

k****r
发帖数: 807
13
Many thanks for DANIUMEN!!! I think I still have a lot of things to learn:)
w**z
发帖数: 8232
14
In case of C*, make sure to run repair at least once within GC grade period.
Otherwise, tombstones may come back to live. As goodbug said, deletes might
not get to all the nodes because of the eventual consistency, and repair
will fix that.
Read this blog if you want to know more, delete in distributes system is
tricky.
http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones

【在 g*****g 的大作中提到】
: 分布式系统里删除特殊的地方在于不是所有节点都会收到这条指令,可能会丢。如果你
: 真的删除了数据,你怎么确定是一个写操作这个节点没收到,还是这个删除别的节点没
: 收到?
: 所以 Cassandra一类的系统是写入一个特殊值叫 tombstone,只在一定时间之后做
: compaction,这样出错的概率就很低。

w**z
发帖数: 8232
15
commit log is for durability. C* writes to memtable along with commit log.
In case node crashes before the memtable is flushed to disk, it will recover
from commit log.
For reads, it doesn't go to commit log, it goes to memtable and sstable and
merge them.

【在 w***g 的大作中提到】
: 按goodbug说的,日志里就是
: keyA-valueA1
: keyB-valueB1
: keyC-valueC1
: keyA-valueA2,
: keyC-valueC2.
: delB
: 读的时候往回找,先找到啥是啥。看到delB就表示B已经没了。
: 这就真是纯log structure了。
: 不过我没想明白on-disk索引怎么做,所以说了打叉叉那个办法。

k****r
发帖数: 807
16
Nice information. Let me study first :P

period.
might

【在 w**z 的大作中提到】
: In case of C*, make sure to run repair at least once within GC grade period.
: Otherwise, tombstones may come back to live. As goodbug said, deletes might
: not get to all the nodes because of the eventual consistency, and repair
: will fix that.
: Read this blog if you want to know more, delete in distributes system is
: tricky.
: http://thelastpickle.com/blog/2011/05/15/Deletes-and-Tombstones

1 (共1页)
进入Programming版参与讨论
相关主题
有多少人的公司再用卡三爪?跟呆拿魔比 有什么优势么?我的一个客户案例(high traffic),请大家批判分析指点
为什么大牛说hbase是strong consistency的?git的官方文档真叫一个烂
关于aws问goodbug老师一个问题Android memory leak
AWS早上有挂了linux 能查到 deleted file list 吗
问个C++中重复删除指针的问题好热闹啊
为什么Cache LRU多用doubly linked list而不是single linked list来实现呢?Indiana大学的牛人
没有data store的model有什么用吗?说说12306需要多少台机器
小公司的网站也要用memcached之类的cache吗?mongo,dynamo,cassandra,hbase ,谁会是赢家,谁会落寞?
相关话题的讨论汇总
话题: delete话题: log话题: value话题: key话题: store