由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - 怎么初始化大表?
相关主题
[转载] Can anyone interpret this simple SQL?请教高手,包子谢
How to get other columns after UNION?也問 Common Table Expression 问题
correlated subqueryurgent help! insert value into table
About INSERT IGNOREA sql question
问一个SQL Server的问题MS T-SQL 问题
sql的2个问题 (转载)How to replace 0 with empty?
sql server 面试题 (6)Import 50GB data from multiple .txt file into MS SQL database
question about MS SQL mirroring...请问如何实现这样一个mysql的query, 谢谢
相关话题的讨论汇总
话题: select话题: table话题: insert话题: logging话题: minimal
进入Database版参与讨论
1 (共1页)
i********r
发帖数: 12113
1
SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
primary key要快一些,还是慢。
有什么方法能快速建这个表?建Partitioned table有用么?
p********l
发帖数: 279
2
据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配
足够的空间给这个表所在的文件组。

【在 i********r 的大作中提到】
: SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
: D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
: 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
: primary key要快一些,还是慢。
: 有什么方法能快速建这个表?建Partitioned table有用么?

y****9
发帖数: 144
3
1. Why 20 min is not good enough?If this is a one time job, I would put
more emphasis on correctness of implementation than speed
2. If speed is a concern, I am thinking of in SQL server, clustered index is
preferred to heap table. So PK is usually your clusterd index, so if you
can inset into the table with data ordered by (A, B, C), may be it can be
faster. i.e create the table structure with clustered index , then insert
with orderd records.
just my 2 cents.

【在 i********r 的大作中提到】
: SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
: D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
: 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
: primary key要快一些,还是慢。
: 有什么方法能快速建这个表?建Partitioned table有用么?

y*****g
发帖数: 677
4
如果是你的数据文件已经分配好了,我就不知道还能如何提高速度,
除了先插数据,then build INDEX。
通常是很快了。 50 M 行在20分钟完成,不算赖!
s**********o
发帖数: 14359
5
SELECT INTO是最快的了,因为没有LOG,
如果速度还慢说明你的内存和IO有问题,考虑
上更多的内存和速度快的STORAGE
i********r
发帖数: 12113
6
SELECT INTO快多了,现在只要6,7分钟

【在 p********l 的大作中提到】
: 据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配
: 足够的空间给这个表所在的文件组。

y****9
发帖数: 144
7
when you use 'select into', how do you populate null to the D, E, F etc
column?
when I use, for example, select ...,xxx as C, null as D into xxx from ...,
I find that D will have int data type. If I want D to be other data type,
how to do it?

【在 i********r 的大作中提到】
: SELECT INTO快多了,现在只要6,7分钟
B*****g
发帖数: 34098
8
不能把表建好后再加column吗?

.,
,

【在 y****9 的大作中提到】
: when you use 'select into', how do you populate null to the D, E, F etc
: column?
: when I use, for example, select ...,xxx as C, null as D into xxx from ...,
: I find that D will have int data type. If I want D to be other data type,
: how to do it?

y****9
发帖数: 144
9
I checked the BOL:
-- abount SELECT ... INTO clause
The amount of logging for SELECT...INTO depends on the recovery model in
effect for the database. Under the simple recovery model or bulk-logged
recovery model, bulk operations are minimally logged. With minimal logging,
using the SELECT… INTO statement can be more efficient than creating a
table and then populating the table with an INSERT statement.
---
But it seems that we can use minimal logging for isnert into select as well
as long as the following requirement is met:
--- from BOL http://technet.microsoft.com/en-us/library/ms174335(v=SQL.100).aspx
Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging
You can use INSERT INTO SELECT FROM
to efficiently transfer a large number of rows from one table, such as a
staging table, to another table with minimal logging. Minimal logging can
improve the performance of the statement and reduce the possibility of the
operation filling the available transaction log space during the transaction
.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
-------------------
So if we ensure minimal logging condition is met for both case, I don't
understand why SELECT ... INTO will be faster than INSERT INTO ... SELECT

【在 s**********o 的大作中提到】
: SELECT INTO是最快的了,因为没有LOG,
: 如果速度还慢说明你的内存和IO有问题,考虑
: 上更多的内存和速度快的STORAGE

y****9
发帖数: 144
10
valid option, but I am not sure to initailize 50 million row with NULL don't
take siginifcant time. If I remeber correctly, in Oracle it will take time.
I will test if get chance

【在 B*****g 的大作中提到】
: 不能把表建好后再加column吗?
:
: .,
: ,

相关主题
sql的2个问题 (转载)请教高手,包子谢
sql server 面试题 (6)也問 Common Table Expression 问题
question about MS SQL mirroring...urgent help! insert value into table
进入Database版参与讨论
y****9
发帖数: 144
11
I think I got wrong impression. Both in Oracle and SQL Server, when adding a
column to an existing table, if not specifying default value, it will be
NULL and it won't need to set the value row by row, so almost no time to
take.
So for OP, using 'select into' to create table with A, B, C column, then add
E, D, F is a valid option.

't
time.

【在 y****9 的大作中提到】
: valid option, but I am not sure to initailize 50 million row with NULL don't
: take siginifcant time. If I remeber correctly, in Oracle it will take time.
: I will test if get chance

y****9
发帖数: 144
12
In terms of the transaction log generated, I did some test. My observation/
conclusion:
----
in terms of the amount of transaction log used, using "INSERT INTO ...
SELECT" with "TABLOCK" hint is essentially same as using "SELECT ... INTO"
regardless of recovery mode. Without the "TABLOCK" hint, regardless of the
recovery mode, using "INSERT INTO ... SELECT" will generate same amount of
transaction log and 4 times more than that in the case of FULL recovery mode
with the hint ( i.e. 125406 KB vs 32958 KB in my test case). People think "
INSERT INTO ... SELECT" is slow may be because they don't always remember to
add this hint when appropriate.
---
Detailed test see : http://oracle-study-notes.blogspot.com/2012/11/sql-server-insert-into-select-vs-select.html

,
well

【在 y****9 的大作中提到】
: I checked the BOL:
: -- abount SELECT ... INTO clause
: The amount of logging for SELECT...INTO depends on the recovery model in
: effect for the database. Under the simple recovery model or bulk-logged
: recovery model, bulk operations are minimally logged. With minimal logging,
: using the SELECT… INTO statement can be more efficient than creating a
: table and then populating the table with an INSERT statement.
: ---
: But it seems that we can use minimal logging for isnert into select as well
: as long as the following requirement is met:

i********r
发帖数: 12113
13
SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
primary key要快一些,还是慢。
有什么方法能快速建这个表?建Partitioned table有用么?
p********l
发帖数: 279
14
据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配
足够的空间给这个表所在的文件组。

【在 i********r 的大作中提到】
: SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
: D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
: 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
: primary key要快一些,还是慢。
: 有什么方法能快速建这个表?建Partitioned table有用么?

y****9
发帖数: 144
15
1. Why 20 min is not good enough?If this is a one time job, I would put
more emphasis on correctness of implementation than speed
2. If speed is a concern, I am thinking of in SQL server, clustered index is
preferred to heap table. So PK is usually your clusterd index, so if you
can inset into the table with data ordered by (A, B, C), may be it can be
faster. i.e create the table structure with clustered index , then insert
with orderd records.
just my 2 cents.

【在 i********r 的大作中提到】
: SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
: D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
: 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
: primary key要快一些,还是慢。
: 有什么方法能快速建这个表?建Partitioned table有用么?

y*****g
发帖数: 677
16
如果是你的数据文件已经分配好了,我就不知道还能如何提高速度,
除了先插数据,then build INDEX。
通常是很快了。 50 M 行在20分钟完成,不算赖!
s**********o
发帖数: 14359
17
SELECT INTO是最快的了,因为没有LOG,
如果速度还慢说明你的内存和IO有问题,考虑
上更多的内存和速度快的STORAGE
i********r
发帖数: 12113
18
SELECT INTO快多了,现在只要6,7分钟

【在 p********l 的大作中提到】
: 据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配
: 足够的空间给这个表所在的文件组。

y****9
发帖数: 144
19
when you use 'select into', how do you populate null to the D, E, F etc
column?
when I use, for example, select ...,xxx as C, null as D into xxx from ...,
I find that D will have int data type. If I want D to be other data type,
how to do it?

【在 i********r 的大作中提到】
: SELECT INTO快多了,现在只要6,7分钟
B*****g
发帖数: 34098
20
不能把表建好后再加column吗?

.,
,

【在 y****9 的大作中提到】
: when you use 'select into', how do you populate null to the D, E, F etc
: column?
: when I use, for example, select ...,xxx as C, null as D into xxx from ...,
: I find that D will have int data type. If I want D to be other data type,
: how to do it?

相关主题
A sql questionImport 50GB data from multiple .txt file into MS SQL database
MS T-SQL 问题请问如何实现这样一个mysql的query, 谢谢
How to replace 0 with empty?user_objects 与CAT view
进入Database版参与讨论
y****9
发帖数: 144
21
I checked the BOL:
-- abount SELECT ... INTO clause
The amount of logging for SELECT...INTO depends on the recovery model in
effect for the database. Under the simple recovery model or bulk-logged
recovery model, bulk operations are minimally logged. With minimal logging,
using the SELECT… INTO statement can be more efficient than creating a
table and then populating the table with an INSERT statement.
---
But it seems that we can use minimal logging for isnert into select as well
as long as the following requirement is met:
--- from BOL http://technet.microsoft.com/en-us/library/ms174335(v=SQL.100).aspx
Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging
You can use INSERT INTO SELECT FROM
to efficiently transfer a large number of rows from one table, such as a
staging table, to another table with minimal logging. Minimal logging can
improve the performance of the statement and reduce the possibility of the
operation filling the available transaction log space during the transaction
.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
-------------------
So if we ensure minimal logging condition is met for both case, I don't
understand why SELECT ... INTO will be faster than INSERT INTO ... SELECT

【在 s**********o 的大作中提到】
: SELECT INTO是最快的了,因为没有LOG,
: 如果速度还慢说明你的内存和IO有问题,考虑
: 上更多的内存和速度快的STORAGE

y****9
发帖数: 144
22
valid option, but I am not sure to initailize 50 million row with NULL don't
take siginifcant time. If I remeber correctly, in Oracle it will take time.
I will test if get chance

【在 B*****g 的大作中提到】
: 不能把表建好后再加column吗?
:
: .,
: ,

y****9
发帖数: 144
23
I think I got wrong impression. Both in Oracle and SQL Server, when adding a
column to an existing table, if not specifying default value, it will be
NULL and it won't need to set the value row by row, so almost no time to
take.
So for OP, using 'select into' to create table with A, B, C column, then add
E, D, F is a valid option.

't
time.

【在 y****9 的大作中提到】
: valid option, but I am not sure to initailize 50 million row with NULL don't
: take siginifcant time. If I remeber correctly, in Oracle it will take time.
: I will test if get chance

y****9
发帖数: 144
24
In terms of the transaction log generated, I did some test. My observation/
conclusion:
----
in terms of the amount of transaction log used, using "INSERT INTO ...
SELECT" with "TABLOCK" hint is essentially same as using "SELECT ... INTO"
regardless of recovery mode. Without the "TABLOCK" hint, regardless of the
recovery mode, using "INSERT INTO ... SELECT" will generate same amount of
transaction log and 4 times more than that in the case of FULL recovery mode
with the hint ( i.e. 125406 KB vs 32958 KB in my test case). People think "
INSERT INTO ... SELECT" is slow may be because they don't always remember to
add this hint when appropriate.
---
Detailed test see : http://oracle-study-notes.blogspot.com/2012/11/sql-server-insert-into-select-vs-select.html

,
well

【在 y****9 的大作中提到】
: I checked the BOL:
: -- abount SELECT ... INTO clause
: The amount of logging for SELECT...INTO depends on the recovery model in
: effect for the database. Under the simple recovery model or bulk-logged
: recovery model, bulk operations are minimally logged. With minimal logging,
: using the SELECT… INTO statement can be more efficient than creating a
: table and then populating the table with an INSERT statement.
: ---
: But it seems that we can use minimal logging for isnert into select as well
: as long as the following requirement is met:

i********r
发帖数: 12113
25
select CAST(NULL AS yourtype) as D

.,
,

【在 y****9 的大作中提到】
: when you use 'select into', how do you populate null to the D, E, F etc
: column?
: when I use, for example, select ...,xxx as C, null as D into xxx from ...,
: I find that D will have int data type. If I want D to be other data type,
: how to do it?

1 (共1页)
进入Database版参与讨论
相关主题
请问如何实现这样一个mysql的query, 谢谢问一个SQL Server的问题
user_objects 与CAT viewsql的2个问题 (转载)
Need help on a strange SQL server problemsql server 面试题 (6)
请教Load table的问题question about MS SQL mirroring...
[转载] Can anyone interpret this simple SQL?请教高手,包子谢
How to get other columns after UNION?也問 Common Table Expression 问题
correlated subqueryurgent help! insert value into table
About INSERT IGNOREA sql question
相关话题的讨论汇总
话题: select话题: table话题: insert话题: logging话题: minimal