Page 9 - Summary of discussions about "indexed" - 话题女王

B*****g
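Posts: 34098

From thread: Database board - A question about SQL Server indexes

How heavily does this date column have to be used before it deserves the clustered index, instead of putting the clustered index on the PK?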

z***y
Posts: 7151

From thread: Database board - A question about SQL Server indexes

Right, putting the clustered index on the PK is not always reasonable. It all depends on how the field is used.

t****n
Posts: 263

From thread: Database board - A question about indexes

Which database are you using? If it is SQL Server, it looks like accountID+
date is a good candidate for the clustered index. In SQL Server, a clustered
index is actually the table itself, so have no fear of inserting.

z*3
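Posts: 33

From thread: Database board - A question about indexes

Personally I think this is a fairly complicated question. If an ordinary index means a B+ tree, then not
every insert triggers a tree-rebalancing operation. Rebalancing only happens when a node is completely
full, perhaps along with left rotation and right rotation operations.
If you use MySQL, you can set the engine of the frequently inserted database to InnoDB and avoid
committing on every insert: insert a batch of rows first, then commit. Then use master-slave replication
to sync the data to a separate database that serves most of the queries, with MyISAM as its engine.
For Oracle, I think you could build a bitmap index on the date column. That way there are no tree
operations; each insert either does nothing or just updates the upper or lower bound of some gap. Then
create a job that runs dbms_stats.gather_stats at regular intervals, so the execution plan is refreshed
sooner and performs better.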

w*******e
Posts: 1622

From thread: Database board - A question about indexes

I don't quite agree.
With a clustered index, frequent inserts cause fragmentation (both internal and
external), right?
That's why rebuild or reorganize index exists.....
(so it is not "no fear of inserting")

s******e
Posts: 493

From thread: Database board - A question about indexes

A bitmap index on a date field: if the date is really a timestamp, that is abusing
the bitmap index.

j*****n
Posts: 1781

From thread: Database board - A question about indexes

Less fragmentation will improve performance, right?
Since accountid is in the clustered index, an index seek will be used.

i****a
Posts: 36252

From thread: Database board - SQL 2000 create index problem

No, they are INT. I was actually able to create the indexes one by one by
issuing SQL commands.
I found out that doing this via the Management Studio interface creates all 3
indexes in one shot in a single transaction... that's why it was taking
forever and filling up the log file.

B*****g
Posts: 34098

From thread: Database board - Help optimizing a MySQL index (repost)

Just tested on our development server.
Count over 3xxM rows: 2 secs.
WHERE on a non-indexed column: 12 secs.
WHERE on an indexed column: 0.078 ms

c*****d
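Posts: 6045

From thread: Database board - Help optimizing a MySQL index (repost)

Wow, no wonder select count(*) from table only takes 3 seconds.
That plan goes through the PK, so of course it's fast.
Your experiment shows that on your machine a full table scan of 3M rows takes about 10 seconds, which is a bit slow.
select count(*) from table -- 3 sec pk index scan
select count(*) from table where indexcol -- 0.087 sec index scan
select count(*) from table where noindexcol -- 12 sec full table scan
For your select count(pk) from table, in Oracle you could add a hint
to force the PK; after tuning it should come in under 3 sec.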

c*****d
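Posts: 6045

From thread: Database board - Help optimizing a MySQL index (repost)

UNIQUE does not guarantee NOT NULL.
If the optimizer uses an index fast full scan,
how would it know the number of NULL values for count(unique index col)?
Even though in your case the col used to be the PK.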
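
A minimal sketch of the point above, using Python's sqlite3 module as a stand-in (an assumption: the thread is about MySQL and Oracle, whose optimizers and NULL handling in indexes differ in detail). The table `t` and column `code` are hypothetical. It shows only that a UNIQUE column can still hold several NULLs, and that COUNT(col) skips them while COUNT(*) does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "code" is UNIQUE but not declared NOT NULL, so NULLs are still allowed.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO t (code) VALUES (?)",
    [("a",), ("b",), (None,), (None,)],
)

# COUNT(*) counts rows; COUNT(code) counts only rows where code IS NOT NULL.
count_all = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
count_code = conn.execute("SELECT COUNT(code) FROM t").fetchone()[0]
print(count_all, count_code)
```

Because the two counts can differ, an engine cannot answer COUNT(*) from a unique index alone unless it knows the column is NOT NULL or the index also records NULL entries.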

v****s
Posts: 1112

From thread: Database board - Help optimizing a MySQL index (repost)

Because there is no index on intVal in my case! The index B-tree is built
on top of a varchar column.
But I don't know how MySQL recognizes this type mismatch and converts it silently.
That's why it took 3 sec:
converting int to varchar

i****a
Posts: 36252

From thread: Database board - When should I reorganize Index/Rebuild Index?

Check index fragmentation.

B*****g
Posts: 34098

From thread: Database board - When should I reorganize Index/Rebuild Index?

Are you an administrator?

i****a
Posts: 36252

From thread: Database board - When should I reorganize Index/Rebuild Index?

Then rebuild only the indexes that are fragmented.

g***l
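Posts: 18555

From thread: Database board - When should I reorganize Index/Rebuild Index?

Yes, you can write an SP that checks the fragmentation of all indexes and then rebuilds those whose
fragmentation has reached a certain level, say 40% or more. Note that the rebuild should run when the
system is idle, in the middle of the night, with no other jobs or backups running. Given how long yours
takes, your data may need to be archived.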

a9
Posts: 21638

From thread: Database board - When should I reorganize Index/Rebuild Index?

How much does index fragmentation really matter? It feels like it doesn't make much difference in practice?

c*****d
Posts: 6045

From thread: Database board - Doubts about clustered index

Don't be fooled by the name.
Oracle's cluster-index table and the clustered index in MS SQL Server are not the same concept.

g***l
Posts: 18555

From thread: Database board - How to delete 40 millions records in a 400 millions indexed table fast?

Create an identical table named TABLE_TEMP (mind the PK and indexes), BULK INSERT (or BCP) the
records you want to keep into TABLE_TEMP, drop the old table, rename TABLE_TEMP to TABLE, and
recompile the table. If the indexes were not set up well and there is no partitioning, this is the only way.

s*****o
Posts: 303

From thread: Database board - How to delete 40 millions records in a 400 millions indexed table fast?

Have you tried disabling the index and then rebuilding it? Not dropping the index.

g***l
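Posts: 18555

From thread: Database board - How to delete 40 millions records in a 400 millions indexed table fast?

Wouldn't the delete be even slower without an index? The column in the DELETE's WHERE clause needs an index, and then you delete in batches.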
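
The batched delete described above can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual procedure: it uses Python's sqlite3 as a stand-in for SQL Server, and the table `records`, the flag column `obsolete`, and the helper `delete_in_batches` are all hypothetical names.

```python
import sqlite3

def delete_in_batches(conn, batch_size):
    """Delete rows flagged obsolete in small chunks; return total rows deleted."""
    total = 0
    while True:
        # The subquery uses the index on "obsolete" to find the next chunk.
        cur = conn.execute(
            "DELETE FROM records WHERE id IN "
            "(SELECT id FROM records WHERE obsolete = 1 LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch keeps the log small
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, obsolete INTEGER)")
conn.execute("CREATE INDEX idx_records_obsolete ON records (obsolete)")
conn.executemany(
    "INSERT INTO records (obsolete) VALUES (?)",
    [(1 if i % 10 == 0 else 0,) for i in range(1000)],
)
conn.commit()

deleted = delete_in_batches(conn, 30)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(deleted, remaining)
```

Committing after each batch is the design choice that matters here: each transaction stays short, so locks and log growth stay bounded even when the total number of deleted rows is large.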

b******t
Posts: 10

From thread: Database board - how to remove fulltext index?

The table has
fulltext(field1,field2);
How do I remove this index?
I tried ALTER TABLE table DROP INDEX/FULLTEXT fields ... and the like, but none of it worked.

l******t
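Posts: 9

From thread: Database board - How do you use an index in SQL?

An index is essentially a B-tree, similar to a binary search tree. Lookups no longer have to scan
row by row, so they get much faster. Google "SQL index" and you will find plenty of tutorials on how to use one.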

a*****i
Posts: 215

From thread: DotNet board - difference between property and indexer

I don't know what index you're talking about. Do you mean indexed properties,
like the common Item property from collections? They allow you to get back
a collection of values by optionally specifying a key or a combination of keys.

n*********g
Posts: 75

From thread: DotNet board - difference between property and indexer

There is an indexer in C#.
An indexer is a smart array, just as a property is a smart field.

b******e
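Posts: 1861

From thread: Java board - solr shared index file solution

If we don't use a single Solr server and instead every JVM runs its own Solr service, is there a good
way to handle shared reads and writes of the Lucene index files? Infinispan only supports in-memory
indexes. Isn't there any other open-source, file-based option?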

l*****k
Posts: 587

From thread: Programming board - Google Index machine kills server

[The following text is reposted from the Internet board]
Sender: leohawk (leohawk), Board: Internet
Title: Google Index machine kills server
Posted from: BBS 未名空间站 (Thu Oct 4 11:27:57 2007), on-site
I just realized that Google indexing and querying could
really hit a server so hard that it kills it.
1. The Google crawl is fine.
2. However, if someone uses the Google appliance to run a query, the
security model requires Google to check whether the current user can access
each and every returned page, which sends a burst of
requests to Apache, apache will send al

v****s
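Posts: 1112

From thread: Programming board - Help optimizing a MySQL index

My current project needs to look up the value between two nodes in MySQL.
Table columns:
LOOKUPTABLE (INT id, VARCHAR node1, VARCHAR node2, INT value)
The problem is that this table has 10 million rows, and one select query takes about 0.8 sec:
select value from LOOKUPTABLE where node1 = 'a' and node2 = 'b';
I tried to optimize the query with an index, but I don't know which kind of index is best. Thanks! There is a baozi reward!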
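
One common candidate for a query that filters on both columns with equality is a composite index on (node1, node2). The sketch below is an assumption-laden illustration rather than an answer from the thread: it uses Python's sqlite3 as a stand-in for MySQL, a small synthetic data set, and a hypothetical index name `idx_node1_node2`. It only checks that the query planner picks the composite index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LOOKUPTABLE "
    "(id INTEGER PRIMARY KEY, node1 TEXT, node2 TEXT, value INTEGER)"
)
# Synthetic rows: every (node1, node2) pair is unique.
conn.executemany(
    "INSERT INTO LOOKUPTABLE (node1, node2, value) VALUES (?, ?, ?)",
    [(f"n{i % 100}", f"n{i // 100}", i) for i in range(10_000)],
)
# Composite index covering both equality predicates of the query.
conn.execute("CREATE INDEX idx_node1_node2 ON LOOKUPTABLE (node1, node2)")

query = "SELECT value FROM LOOKUPTABLE WHERE node1 = ? AND node2 = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("n7", "n42")).fetchall()
plan_text = " ".join(row[3] for row in plan)  # 4th column holds the plan detail
rows = conn.execute(query, ("n7", "n42")).fetchall()
print(plan_text)
print(rows)
```

In MySQL the equivalent check would be an EXPLAIN on the same SELECT; the timings and plan output there will differ from SQLite's.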

F********g
Posts: 475

From thread: Programming board - Is the TIOBE Index reliable? Has Python lost a lot of momentum lately?

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.htm

w***g
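Posts: 5958

From thread: Programming board - DynamoDB can only create indexes at create-table time

It is just as you say: design the key properly when you create the table. If that doesn't work out,
create a new table and migrate the data over. Distributed key-value stores like Cassandra follow a
different design philosophy from traditional databases, so they are used differently too. The index
of a traditional key-value store is usually a B+-tree or a hash table. Both assume random disk access,
and once the cache runs short, parallel reads and writes, or even single-threaded reads and writes,
fall apart. Re-importing a database of a few dozen GB is already slow and laborious. Cassandra's data,
as far as I understand, is stored log-style, meaning new data is simply appended to the end of a file.
That makes building an index harder and more costly. The upside is that writes are very efficient, and
because many machines and many disks read and write at the same time, re-importing the data is child's
play. And since cheap disks are used, space is plentiful and keeping a few extra copies of the data is
no concern. Any internet startup with some promise grows exponentially, meaning the data added in one
period is roughly equal to all the data accumulated before it, so re-importing every so often can be
treated as routine.
MongoDB is very different from Cassandra and closer to a traditional database in design, ...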

e********2
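Posts: 495

From thread: Programming board - Can anyone explain Cassandra secondary indexes?

The key is meeting the conditions for using a secondary index: each index value should correspond to at least 200 rows.
Thousands or tens of thousands are possible too.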

g*****g
Posts: 34805

From thread: Programming board - Can anyone explain Cassandra secondary indexes?

In short: if the data volume is large, don't use a secondary index. Build your own index CF.

l******9
Posts: 579

From thread: Programming board - create a unique primary key that can be indexed IBM netezza (repost)

[The following text is reposted from the Database board]
Sender: light009 (light009), Board: Database
Title: create a unique primary key that can be indexed IBM netezza sql
Posted from: BBS 未名空间站 (Sun Nov 16 23:34:32 2014, US Eastern)
I need to create a unique primary key that can also be indexed in IBM
Netezza SQL server.
I find that Netezza does not support UNIQUE keys.
Is there a way to get around this problem?
Thanks

z****e
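Posts: 54598

From thread: Programming board - TIOBE Index for December 2015

Not just that; js and the like have all peaked too. That index is even down 0.7% from last year.
Judging by the buzz, this March was a high point, and the bubble burst mid-year.
It roughly matches the topics discussed here: plenty at the start of the year, mostly gone silent by mid-year.
The last time Java started climbing was when the .com bubble burst, and this time is about the same.
Java's index feels like a contrarian bubble indicator: when it is at a low, the bubble is at its height;
conversely, when it climbs, the bubble has burst and it is time to return to traditional values.
Barring surprises, the Republicans will take office next year, and in France even the far right could produce a president.
With the whole world turning right, there will probably be war again; once QE ends, investors will be more conservative.
Traditional big companies will enjoy a spring for a while.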

w*s
Posts: 7227

From thread: Programming board - The pandas index is damn weird

rawx = pd.DataReader("agio", "yahoo", start, end)['Low']
print "-----00------"
print rawx[0:2]
print rawx[0:2].index
print rawx[0:2].index.values
Output:
-----00------
Date
2015-12-01 62.330002
2015-12-02 63.730000
Name: Low, dtype: float64
DatetimeIndex(['2015-12-01', '2015-12-02'], dtype='datetime64[ns]', name=u'Date', freq=None)
['2015-11-30T19:00:00.000000000-0500' '2015-12-01T19:00:00.000000000-0500']
Why do the last 2 lines show different times?
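
One likely explanation, offered as an assumption since the thread does not confirm it: the index stores each date as a UTC instant, and older NumPy releases printed datetime64 values in the machine's local time zone (here UTC-5), so the same instant appears as 19:00 on the previous day. The stdlib-only sketch below does not use pandas; it just checks that the two printed forms denote the same moment.

```python
from datetime import datetime, timedelta, timezone

# Midnight UTC on the first index date shown in the output above.
utc_midnight = datetime(2015, 12, 1, 0, 0, tzinfo=timezone.utc)

# A fixed UTC-5 offset, matching the "-0500" suffix in the printed values.
eastern = timezone(timedelta(hours=-5))
local_view = utc_midnight.astimezone(eastern)

# Aware datetimes compare by instant, not by wall-clock fields.
same_instant = local_view == utc_midnight
local_text = local_view.isoformat()
print(local_text, same_instant)
```

If that is what happened, the data itself is unchanged; only the string representation of `.index.values` differs.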

n*********u
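Posts: 1030

From thread: Programming board - Indexing PDF and doc files: Elasticsearch or Solr?

Both use Lucene as the underlying search core, so their search features are basically the same.
ES is a bit easier to get started with: it automatically detects the data you dump in and indexes it.
The key point is that every task can be handled through the REST API.
Solr is a bit old-fashioned: the index settings must be written into a config file before you can add
data, and the config file is an extremely long XML file. A while ago we added a config to something at
work; it was really only 20 lines of new settings, yet it produced a diff of over a thousand lines.
Once everything is set up, the search-side workload is about the same for both.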

d******c
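Posts: 2407

From thread: Programming board - Jeff Dean's new trick: deep learning as an index

Conceptually it is quite interesting: knowing the characteristics of the data lets you choose a more
suitable index data structure, just as different types of data have different best compression
algorithms. Then DP is used to make the choice. The index is treated as a model.
Replacing engineers' manual tuning with ML is a major trend going forward. It will go a step further
once the tuning of ML itself can be done with ML.
Looks like backend people won't get by without understanding ML in the future.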

H****J
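Posts: 326

From thread: TeX board - How to customize the number of index columns?

Indexes generated with \printindex always have two columns.
Is there a way to customize the number of index columns, say three?
Thanks.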

l******n
Posts: 9344

From thread: TeX board - LaTeX index question

I want the index to look like this:
chapter 1 XXX.......................1
chapter 1.1 XXXXX....................2
Can index.sty do this?
Also, how do I handle the capitalization of the titles?

T*******n
Posts: 493

From thread: TeX board - LaTeX index question

Shouldn't this be in the table of contents and not duplicated in the index?
I don't think I've ever seen chapter/section entries in an index.

M*******s
Posts: 108

From thread: Biology board - A question about a professor's h-index

102 references between 1967 and 2007
102 references cited (6363 citations)
62.38 citations per reference (median=30.00)
62.38 citations per cited reference (median=30.00)
h-index: 40 (a=3.98, m=1.00)
g-index: 78
Given the above, does this professor count as academically impressive?

w********h
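Posts: 12367

From thread: Biology board - H-index of academician candidates

The h-index has to be read together with citations.
Some people publish fewer papers, so their h-index cannot go up, but their citations come mainly from blockbuster papers.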

y******8
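Posts: 1764

From thread: Biology board - If Rao Yi's h-index really is only 34...

The h-index is only suitable for ordinary people judging scholars in fields they don't know, and for bibliometric research.
As far as I know, no well-known honor is awarded on the basis of h-index or citations. In science,
originality is still what matters most.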

l*h
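Posts: 4124

From thread: Biology board - Re: If Rao Yi's h-index really is only 34... (repost)

[The following text is reposted from the Joke board]
Sender: lmh (Low, Medium, High), Board: Joke
Title: Re: If Rao Yi's h-index really is only 34... (repost)
Posted from: BBS 未名空间站 (Sat Sep 10 21:45:09 2011, US Eastern)
Sender: peoplem (我爱我家), Board: Biology
Title: Re: If Rao Yi's h-index really is only 34...
Posted from: BBS 未名空间站 (Wed Aug 31 17:14:04 2011, US Eastern)
I don't know whether you are just saying that offhand or actually know of someone who really "wasted
4-5 years". Frankly, I don't entirely agree with some of Rao Yi's other words and actions, but when it
comes to being a mentor, in all my years in the US, plus time at n different institutions in China, I
have never felt that anyone surpassed Rao Yi in caring for and supporting students.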
b**z
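Posts: 1351

From thread: Biology board - What does the h-index actually mean?

It is the crossover point between the number of papers published and the number of citations.
For example, if you have published 20 papers and 15 of them each have more than 15 citations, then H-INDEX = 15.
When applying for faculty positions, H-INDEX = 15 is basically a benchmark.
Across academic fields:
Top-star level >50
Rising-star level >30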
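
The definition above can be sketched in a few lines, using the standard formulation: the h-index is the largest h such that h papers each have at least h citations. The function name `h_index` and the sample citation counts are illustrative assumptions, not data from the thread.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# 20 papers: 15 with 20 citations each (more than 15), 5 with only 3.
example = [20] * 15 + [3] * 5
result = h_index(example)
print(result)
```

Sorting in descending order first means the loop can stop at the first paper whose citation count falls below its rank.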

g**********t
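Posts: 475

From thread: Biology board - What does the h-index actually mean?

It depends on the field, and on whether the professor is willing to churn out filler papers. The top
figure and founding father of my field has an h-index of only a little under thirty. Yet the total
citation count of his papers is over twenty-five thousand. His h-index is relatively low because he
has few papers, only about sixty in his whole career.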

b******k
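Posts: 2321

From thread: Biology board - How high an h-index makes a reasonably good PI?

That's true.
Honestly, I have never understood what the h-index is good for. If every one of my papers cited all of
my previous papers, then publishing 20 papers in the Pingdingshan Coal Mining Institute journal would
give me an h-index of 10 as well..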
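
The arithmetic behind that claim can be checked with a short sketch. It assumes the only citations are self-citations and that paper k (in publication order) is cited once by each later paper; the helper `h_index` is an illustrative implementation of the usual definition.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

n_papers = 20
# Paper k is cited by the (n_papers - k) papers published after it.
self_citations = [n_papers - k for k in range(1, n_papers + 1)]
result = h_index(self_citations)
print(self_citations[0], self_citations[-1], result)
```

The counts run from 19 down to 0, and the paper at rank r has 20 - r citations, so the condition 20 - r >= r holds up to r = 10.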

z*******6
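Posts: 679

From thread: Biology board - How high an h-index makes a reasonably good PI?

Doesn't everyone use Web of Science to look up the h-index? I thought that was the authoritative source...
Doesn't h-index mean "N papers each cited at least N times"?... What are the different calculation methods everyone is talking about?
It does seem that co-authored papers are generally not excluded... but if you want to count only independent papers, that can be done too...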

Z******5
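Posts: 435

From thread: Biology board - How high an h-index makes a reasonably good PI?

I also think the h-index should have a dedicated variant that counts only first-author or
corresponding-author papers; it could be called the FC-index. It should also state the number of years
since the first paper was published, e.g. for 5 years it would be FC-index5, so the information is more precise.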
