Roundup of discussions about ncol - 话题女王

i**********e
Posts: 1145

From thread: JobHunting board - Fresh onsite interview report

Here is my Boggle solver, DFS + trie.
It finds all answers for a 5x5 board in under a second.
#include <cassert>   // assert
#include <cstring>   // memset
// (the names of the other five headers did not survive in this excerpt)
using namespace std;
struct Trie {
  bool end;
  Trie *children[26];
  Trie() {
    end = false;
    memset(children, 0, sizeof(children));  // null out all child pointers
  }
  void insert(const char *word) {
    const char *s = word;
    Trie *p = this;
    while (*s) {
      int j = *s-'A';
      assert(0 <= j && j < 26);
      if (!p->childre...

M**********s
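Posts: 8

From thread: JobHunting board - How do you solve this one with 4-D DP?

The discussion below assumes m == n, i.e. the data is a square matrix, which keeps the complexity simpler to write.
For a non-square matrix, let N = max(m, n); the big-O bounds still hold.
This problem has quite a few details.
I originally thought I could write the O(N^3) solution correctly in one go, but the OJ showed it had a bug.
Anyone who thinks they can get it right on the first try should give it a go: http://soj.me/1767

The O(N^4) time complexity of the 4-D DP can clearly be improved.
As mentioned earlier, three dimensions achieve O(N^3) time.
The problem is to plan two non-overlapping routes from the top-left corner to the bottom-right corner at the same time, maximizing the score.
The trick is to consider the same step of both routes together each time.
Let d be the Manhattan distance from the top-left corner.
d=1, the first step: consider (0, 1), (1, 0)
The only choice is left route (0, 1), right route (1, 0)
d=2, the second step: consider (0, 2), (1, 1), (2, 0)
There are three choices: left (0, 2) right (1, 1); left (0, 2) right (2, 0); left (1, 1) right (2, 0)
Define the DP table dp[d][c1][c2] as
the maximum score when the left route has reached (r1, c1) and the right route has reached (r2, c2), where
r1 = d - c1, r2 = d - c2
This is also what someone mentioned earlier about using ...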
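The post is cut off before any code appears, so this is not the poster's solution, only a rough Python sketch of the dp[d][c1][c2] idea described above. My assumptions: the grid is a list of lists with at least 2 rows and 2 columns, the two routes may share only the start and end cells, those two cells are scored once, and only layer d-1 is kept in memory. The name two_path_max is made up for illustration.

```python
def two_path_max(grid):
    """Max total score of two routes that only meet at the two corners."""
    m, n = len(grid), len(grid[0])
    if m < 2 or n < 2:
        return None  # two disjoint routes cannot exist (assumption: report None)
    neg = float("-inf")
    last = m + n - 2  # Manhattan distance of the bottom-right corner
    # prev maps (c1, c2) -> best score at distance d-1; both routes start at (0, 0)
    prev = {(0, 0): grid[0][0]}
    for d in range(1, last):
        cur = {}
        lo, hi = max(0, d - (m - 1)), min(n - 1, d)  # valid columns at distance d
        for c1 in range(lo, hi + 1):
            for c2 in range(lo, c1):  # c2 < c1 keeps the routes apart
                best = neg
                for p1 in (c1, c1 - 1):      # route 1 came from above or from the left
                    for p2 in (c2, c2 - 1):  # same for route 2
                        best = max(best, prev.get((p1, p2), neg))
                if best > neg:
                    cur[(c1, c2)] = best + grid[d - c1][c1] + grid[d - c2][c2]
        prev = cur
    # the last step joins both routes at the bottom-right corner
    return prev[(n - 1, n - 2)] + grid[m - 1][n - 1]


print(two_path_max([[1, 1, 1], [1, 100, 1], [1, 1, 1]]))
```

Each layer has O(N^2) states with four predecessors each, which gives the O(N^3) time mentioned in the post.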

M**********s
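Posts: 8

From thread: JobHunting board - A twitter question

Note that
1. Up[0]+...+Up[i-1] == Down[i+1]+...+Down[n-1] depends only on i
2. Left[0]+...+Left[j-1] == Left[j+1]+...+Left[m-1] depends only on j
So when equation 1 holds, call row i a balanced row;
when equation 2 holds, call column j a balanced column.
Rows and columns are independent of each other, so the total number of balancing cells = number of balanced rows * number of balanced columns.
What remains is implementation detail.
In fact, you only need to store the sum of every row and every column; the loop can then decide whether a row or column is balanced.
Below is an implementation with O(m+n) space and O(m*n) time.
It is actually quite simple, just a bit long.
int balancingCells(vector<vector<int> >& matrix) {
  if (matrix.empty()) return 0;
  int nRow = matrix.size(), nCol = matrix[0].size();
  vector<int> rowSum(nRow, 0), colSum(nCol, 0);
  int total = 0;
  for (int r=0; r for (int c=...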
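Since the C++ above is cut off, here is a small Python sketch of the same counting argument, not the poster's code. It assumes a rectangular list of lists of numbers; balancing_cells and balanced_count are names I made up.

```python
def balancing_cells(matrix):
    """Count cells whose row splits the row sums evenly and whose column
    splits the column sums evenly (balanced rows * balanced columns)."""
    if not matrix or not matrix[0]:
        return 0
    row_sum = [sum(row) for row in matrix]
    col_sum = [sum(col) for col in zip(*matrix)]

    def balanced_count(sums):
        total = sum(sums)
        prefix = 0  # sum of everything before the current index
        count = 0
        for s in sums:
            if prefix == total - prefix - s:  # before == after
                count += 1
            prefix += s
        return count

    return balanced_count(row_sum) * balanced_count(col_sum)


print(balancing_cells([[1, 2, 1], [3, 4, 3], [1, 2, 1]]))
```

Computing the sums takes O(m*n) time and the two sum lists take O(m+n) space, matching the bounds stated in the post.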

r********g
Posts: 1351

From thread: JobHunting board - request solutions to 2 questions on leetcode

I wrote one; it may be a bit verbose...
class Solution {
public:
  void setZeroes(vector<vector<int> > &M) {
    // Start typing your C/C++ solution below
    // DO NOT write int main() function

    int nRow = M.size();
    if(nRow <= 0) return;
    int nCol = M[0].size();
    if(nCol <= 0) return;

    // remember whether the first row / first column contain a zero
    bool rowZero = false;
    bool colZero = false;
    for(int i = 0; i < nRow; i++) if(!M[i][0]) colZero = true;
    for(int j = 0; j < nCol; j++) if(!M[0][j]) rowZer...
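The C++ is truncated right after the two flags, so the rest is a guess. A Python sketch of how this kind of solution usually continues, assuming it uses the first row and first column as marker storage (my assumption, the post does not show it):

```python
def set_zeroes(M):
    """Zero every row and column that contains a 0, in place, O(1) extra space."""
    if not M or not M[0]:
        return
    n_row, n_col = len(M), len(M[0])
    # remember whether the first column / first row themselves hold a zero
    col_zero = any(M[i][0] == 0 for i in range(n_row))
    row_zero = any(M[0][j] == 0 for j in range(n_col))
    # use the first row and column as markers for the inner cells
    for i in range(1, n_row):
        for j in range(1, n_col):
            if M[i][j] == 0:
                M[i][0] = 0
                M[0][j] = 0
    for i in range(1, n_row):
        for j in range(1, n_col):
            if M[i][0] == 0 or M[0][j] == 0:
                M[i][j] = 0
    # finally clear the first column / row if they had a zero originally
    if col_zero:
        for i in range(n_row):
            M[i][0] = 0
    if row_zero:
        for j in range(n_col):
            M[0][j] = 0


grid = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
set_zeroes(grid)
print(grid)
```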

n******7
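Posts: 12463

From thread: Linux board - A question about printing columns

Counting columns with awk is not worth it, and it can go wrong if every row has blank columns; switching to grep is probably better...
---
I only know basic shell features, so I painfully hacked this together; done this way, you might as well use perl/python...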
#ncol=`awk '{print NF}' $1 | sort -r | head -1`
ncol=`head -1 $1 | grep -o $'\t' | wc -l`
ncol=$((ncol+1))
nrow=`wc -l $1| cut -f 1 -d ' ' `
good_cols=()
for ((i=1;i<=$ncol;i++))
do
non_blank=`cut -f $i $1 | grep -v '^$' |wc -l | cut -f 1 -d ' '`
if (( $(echo "$non_blank == $nrow" | bc -l) ))
then
good_cols=("${good_cols[@]}" $i)
fi
done
SAVE_IFS=$IFS
IFS=","
cols="${good_cols[*]...

t*****w
Posts: 254

From thread: Statistics board - How should I prepare for R in interviews?

When I had my job interviews, they always tested my SAS skills. However, I use R
all the time. To help your preparation, read my R code and see how much of it you
can understand.
%in%
?keyword
a<-matrix(0,nrow=3,ncol=3,byrow=T)
a1 <- a1/(t(a1)%*%spooled%*%a1)^.5 #standardization in discrim
a1<- a>=2; a[a1]
abline(h = -1:5, v = -2:3, col = "lightgray", lty=3)
abline(h=0, v=0, col = "gray60")
abs(r2[i])>r0
aggregate(iris[,1:4], list(iris$Species), mean)
AND: &; OR: |; NOT: !
anova(lm(data1[,3]~data1[,1...

n******7
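Posts: 12463

From thread: Linux board - A question about printing columns

I just realized I was being dumb: the row count does not need to be computed.
If paste could be put to use, maybe the second cut pass could be avoided.
ncol=`head -1 $1 | grep -o $'\t' | wc -l`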
ncol=$((ncol+1))
good_cols=()
for ((i=1;i<=$ncol;i++))
do
if !(cut -f $i $1 | grep -m 1 '^$' > /dev/null )
then
good_cols=("${good_cols[@]}" $i)
fi
done
SAVE_IFS=$IFS
IFS=","
cols="${good_cols[*]}"
IFS=$SAVE_IFS
cut -f $cols $1
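Since the earlier post says this might as well be done in perl/python, here is a rough Python take on the same task: keep only the tab-separated columns that have no blank cell in any row. It assumes the whole file fits in memory and that the first line defines the column count, as in the shell version; full_columns is a made-up name.

```python
def full_columns(lines, sep="\t"):
    """Return the rows restricted to columns that are non-blank in every row."""
    rows = [line.rstrip("\n").split(sep) for line in lines]
    if not rows:
        return []
    ncol = len(rows[0])  # like the shell script, trust the first line
    good = [j for j in range(ncol)
            if all(j < len(r) and r[j] != "" for r in rows)]
    return [sep.join(r[j] for j in good) for r in rows]


sample = ["a\t\tc\n", "d\te\tf\n"]
for out_line in full_columns(sample):
    print(out_line)
```

With a real file one would pass an open file object, for example full_columns(open(path)).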

d*******o
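Posts: 5897

From thread: Programming board - Reading and writing multiple large files in C++

This is on Linux. There are several hundred text input files of 2 GB each. Each file contains one matrix of
doubles, roughly tens of millions of elements; every file's matrix has the same number of rows and the same
number of columns, say NRow rows and NCol columns.
Some computation has to be done on every element of every input file, with the following properties/constraints:
1. Within one input file, elements at different row/column positions are independent: computing one element
needs no information from any other element;
2. Across files, elements with the same row and column indices are related. If the several hundred files are
put in order, computing an element of one file requires information from the elements with the same row and
column indices in the preceding files;
3. After all (NRow * NCol) elements of one input file have been computed, each element yields one
output element, so there are NRow * NCol output elements in total, and they all go into one
output file;
4. In the end the number of output files equals the number of input files, and each output file also contains a
matrix of the same size as the input.
With this much data it is impossible to read everything into memory at once, so it has to be read in pieces; because...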
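One way to read constraints 1 and 2 is that the work can be done one matrix row at a time across all files in order, so only one row per file is ever in memory. A Python sketch of that idea follows (the thread is about C++, but the structure carries over). The actual per-element computation is not given in the post, so a running sum across files stands in for it; process_in_lockstep and combine are hypothetical names, and with several hundred files the open-file limit of the system may need raising.

```python
import contextlib


def process_in_lockstep(in_paths, out_paths, combine=lambda prev, x: prev + x):
    """Read row k from every input file, carry state from file to file,
    and write row k of every output file, one row at a time."""
    with contextlib.ExitStack() as stack:
        ins = [stack.enter_context(open(p)) for p in in_paths]
        outs = [stack.enter_context(open(p, "w")) for p in out_paths]
        for lines in zip(*ins):          # the same row from every file
            state = None                 # depends only on earlier files
            for line, out in zip(lines, outs):
                row = [float(tok) for tok in line.split()]
                if state is None:
                    state = row          # first file: nothing before it
                else:
                    state = [combine(s, x) for s, x in zip(state, row)]
                out.write(" ".join(repr(v) for v in state) + "\n")
```

If the real computation needs more than a running value per element, state would become whatever summary of the earlier files it requires.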

D******n
Posts: 2836

From thread: Statistics board - a R loop question

This is definitely worth some baozis...
kf=c(0,1,0,0,0,1,1,0,0,0,0,0)
kf=matrix(kf,3)
nrow <- nrow(kf);
ncol <- ncol(kf);
oripos <- which(t(kf)==1,arr.ind=T)[,1];
nowpos <- oripos;
rowptr <- 1;
cat(nowpos,'\n');
while (rowptr<=nrow)
{
if ( nowpos[rowptr] < ncol ) { nowpos[rowptr] = nowpos[rowptr]+1;
rowptr = 1;
cat(nowpos,'\n');
}
else {
nowpos[rowptr] = oripos[rowptr];
rowptr = rowptr + 1;
}
}

s*******a
Posts: 705

From thread: Statistics board - A question about R

cbind(A,B[,names(A)])[,rep(1:ncol(A),each=2)+rep(c(0,ncol(A)),ncol(A))]
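The index expression in that one-liner is 1, n+1, 2, n+2, ... with n = ncol(A), which interleaves the columns of A with the matching columns of B. A small Python sketch of the same index trick (0-based, and assuming B's columns are already in A's order; the function names are mine):

```python
def interleave_order(n):
    """0-based version of rep(1:n, each=2) + rep(c(0, n), n)."""
    return [i + offset for i in range(n) for offset in (0, n)]


def interleave_columns(a_rows, b_rows):
    """Bind A and B side by side, then reorder columns as A1, B1, A2, B2, ..."""
    n = len(a_rows[0])
    order = interleave_order(n)
    return [[(ra + rb)[k] for k in order] for ra, rb in zip(a_rows, b_rows)]


print(interleave_columns([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
```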

o****o
Posts: 8077

From thread: Statistics board - Anyone here using C++?

It's a Pentium mobile. R has several ATLAS-based RBLAS.DLL builds for different CPUs
[http://cran.r-project.org/bin/windows/contrib/ATLAS/]. A new RBLAS.DLL compiled for C2D is now available and is somewhat faster. I compared the new one with the ones for older CPUs that I had been using; the C2D build is twice as fast as the stock one at computing an SVD:
Stock (R version 2.8.1)
> x<-matrix(rnorm(1000^2), ncol=1000)
> system.time(y<-svd(x))
user system elapsed
8.42 0.05 8.50
>
###################
P3/P2:
> x<-matrix(rnorm(1000^2), ncol=1000)
> system.time(y<-svd(x))
user system elapsed
6.50 0.11 6.69
>
###################
P4:
> x<-matrix(rnorm(1000^2), ncol=1000)
> system.time(y<-svd(x))
user system elapsed
6.21 0.07 6.

f***a
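Posts: 329

From thread: Statistics board - Asking the experts: how to replace the columns of one table with the columns of another?

Save the original data set with the missing values as "dat_missing.txt", and save the fill-in tables as
"cp_a.txt", "cp_b.txt", "cp_c.txt" (don't forget that the first line of each file is the column names). Then just
run the R code below. The output is in "out.txt".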
dat.m <- read.table("dat_missing.txt",header=T)
out <- matrix(0,nrow(dat.m),ncol(dat.m)-1)
for(lt in letters[1:3])
{
indx <- dat.m[,1]==lt
r <- dat.m[indx,-1]
cp <- read.table(paste("cp_",lt,".txt",sep=""),header=T)[,-1]
dat <- cbind(r,cp)
n <- ncol(r); na <- ncol(cp);
res <- matrix(as.numeric(apply(dat,1,function(t)
{
rr <- t[1:n];

S******y
Posts: 1123

From thread: Statistics board - Why can biglm handle data sets larger than memory?

I have been looking into R's biglm lately.
The R documentation says: "biglm creates a linear model object that uses only p^2
memory for p variables. It can be updated with more data using update. This
allows linear regression on data sets larger than memory."
I read the source code below but still don't understand how update is actually implemented...
> biglm::biglm
function (formula, data, weights = NULL, sandwich = FALSE)
{
tt <- terms(formula)
if (!is.null(weights)) {
if (!inherits(weights, "formula"))
stop("`weights' must be a formula")
w <- ...
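To illustrate why p^2 memory can be enough, here is a Python sketch that swaps in a simpler technique than the one biglm uses: as far as I know biglm updates a QR factorization incrementally, whereas this sketch just accumulates X'X and X'y chunk by chunk and solves the normal equations at the end. Both keep only O(p^2) numbers no matter how many rows are seen. The class and method names are made up, and the solver is a plain Gauss-Jordan elimination for small p.

```python
class ChunkedLeastSquares:
    """Least squares fitted from data chunks, storing only X'X and X'y."""

    def __init__(self, p):
        self.p = p
        self.xtx = [[0.0] * p for _ in range(p)]  # p x p, independent of n
        self.xty = [0.0] * p

    def update(self, rows, ys):
        """Fold one chunk of rows (each of length p) and responses into the sums."""
        for x, y in zip(rows, ys):
            for i in range(self.p):
                self.xty[i] += x[i] * y
                for j in range(self.p):
                    self.xtx[i][j] += x[i] * x[j]

    def coef(self):
        """Solve (X'X) b = X'y by Gauss-Jordan elimination with pivoting."""
        p = self.p
        a = [row[:] + [b] for row, b in zip(self.xtx, self.xty)]
        for col in range(p):
            pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
            a[col], a[pivot] = a[pivot], a[col]
            for r in range(p):
                if r != col:
                    f = a[r][col] / a[col][col]
                    for k in range(col, p + 1):
                        a[r][k] -= f * a[col][k]
        return [a[i][p] / a[i][i] for i in range(p)]


model = ChunkedLeastSquares(2)
model.update([[1, 0], [1, 1], [1, 2]], [1, 3, 5])  # first chunk of y = 1 + 2x
model.update([[1, 3], [1, 4]], [7, 9])             # second chunk, folded in later
print(model.coef())
```

Normal equations are numerically weaker than a QR update, which is presumably one reason the package does it differently.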

y****6
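Posts: 264

From thread: Statistics board - Let's all complain about R's problems

You have a point, but in general we expect the same type of result for the same type of input. In this
example, for matrices of the same size and type, apply(A, 1, table) can return three different kinds of result,
which leaves me speechless: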
> apply(matrix(c(rep(1, 10), rep(2,10)), ncol=4), 1, table)
[,1] [,2] [,3] [,4] [,5]
1 2 2 2 2 2
2 2 2 2 2 2
> apply(matrix(c(rep(2, 16), rep(3,4)), ncol=4), 1, table)
[[1]]
2
4
[[2]]
2 3
3 1
[[3]]
2 3
3 1
[[4]]
2 3
3 1
[[5]]
2 3
3 1
> apply(matrix(c(rep(2, 16), rep(2,4)), ncol=4), 1, table)
[1] 4 4 4 4 4

i**********e
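Posts: 1145

From thread: JobHunting board - For this LI question, is there really no solution better than linear?

No binary search needed; start from the top-right or the bottom-left corner.
Top-right corner:
At each step move one step left (if the target is smaller than the current number) or one step down (if the target is larger than the current number);
if they are equal, return true.
Worst case it finishes in nRows + nCols steps.
Someone has already proved that it cannot be done better than O(nRows + nCols):
http://www.quora.com/You-are-given-an-MxN-matrix-of-numbers-wit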
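The walk from the top-right corner can be sketched in a few lines of Python. This assumes every row and every column of the matrix is sorted in ascending order, which is what the description implies; search_sorted_matrix is a made-up name.

```python
def search_sorted_matrix(matrix, target):
    """Staircase search: at most nRows + nCols steps."""
    if not matrix or not matrix[0]:
        return False
    r, c = 0, len(matrix[0]) - 1       # start at the top-right corner
    while r < len(matrix) and c >= 0:
        value = matrix[r][c]
        if value == target:
            return True
        if target < value:
            c -= 1                     # everything below in this column is larger
        else:
            r += 1                     # everything to the left in this row is smaller
    return False


m = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(search_sorted_matrix(m, 6), search_sorted_matrix(m, 10))
```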

a******e
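Posts: 124

From thread: JobHunting board - A question about deleting nodes in C++

You can delete it in the main function, or write a separate function dedicated to deleting the dynamically
allocated pointers.
See the example below
http://www.codeproject.com/Articles/21909/Introduction-to-dynam
template <typename T>
T **AllocateDynamicArray( int nRows, int nCols)
{
  T **dynamicArray;
  dynamicArray = new T*[nRows];
  for( int i = 0 ; i < nRows ; i++ )
    dynamicArray[i] = new T [nCols];
  return dynamicArray;
}
template <typename T>
void FreeDynamicArray(T** dArray, int nRows)
{
  // every row was allocated with its own new[], so every row must be freed
  for( int i = 0 ; i < nRows ; i++ )
    delete [] dArray[i];
  delete [] dArray;
}
int main()...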

K*****2
Posts: 9308

From thread: Joke board - Getting academic: how many Asians need to be on the plane to prove there was no racial discrimination

> x=matrix(c(54,1,14,3),ncol=2)
> fisher.test(x)
Fisher's Exact Test for Count Data
data: x
p-value = 0.03867
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.8184615 615.7103837
sample estimates:
odds ratio
11.0593
> x=matrix(c(55,1,13,3),ncol=2)
> fisher.test(x)
Fisher's Exact Test for Count Data
data: x
p-value = 0.03225
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.8907116 674....

G***G
Posts: 16778

From thread: Programming board - Getting the number of rows in MATLAB

How do I get the total number of rows of a matrix in MATLAB?
For example,
A=[1 2 3;
4 5 6;
7 8 9];
[nrow,ncol]=size(A);
Is there another command that returns only nrow and not ncol?

s********y
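Posts: 64

From thread: Programming board - A basic question about R

After running opt <- yahoo.getAllOptions("IBM"), the program below downloads the IBM option data to a
temporary file. How can I export the data in the temporary file to a TXT file?
Thanks!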
------------------------------------------------------------------
require(fCalendar)
require(fredImport)
## workaround for R 2.1.1:
Sys.timezone <- function ()
as.vector(Sys.getenv("TZ"))
yahoo.getOption <- function(ticker="QQQQ",maturity="2005-12",file="tempfile01",method="internal",get.short.rate=TRUE) {
############################################################################...

k**********g
Posts: 989

From thread: Programming board - How to remove the "jackie chan" text? (repost)

Depends on how big the area is.
More precisely, what is the maximum distance from the set of missing pixels
to the nearest valid pixel.
The general technique is called Infilling.
Here's just an example from Google search result. http://research.microsoft.com/pubs/67276/criminisi_tip2004.pdf
When the text is very thin, a masked local averaging will do the job.
img = imread( filename ) ;
[ nrows, ncols, nchan ] = size ( img ) ;
imgColor = double ( img ) .* (1 / 255) ;
imgGray = rgb2gray ( imgColor...
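The MATLAB code is cut off, so as a stand-in here is a plain-Python sketch of what a masked local average could look like: each missing pixel is replaced by the mean of the valid pixels in a small window around it. The grayscale list-of-lists layout, the radius parameter and the function name are my assumptions, not the poster's code.

```python
def masked_local_average(img, mask, radius=1):
    """img: 2-D list of floats; mask: 2-D list, truthy where the pixel is missing.
    Missing pixels get the average of valid neighbours within the given radius."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue               # valid pixel, keep as is
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if not mask[yy][xx]:
                        total += img[yy][xx]
                        count += 1
            if count:                  # leave the pixel alone if no valid neighbour
                out[y][x] = total / count
    return out


img = [[1.0, 1.0, 1.0], [1.0, 9.0, 1.0], [1.0, 1.0, 1.0]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(masked_local_average(img, mask))
```

This only works when the text strokes are thinner than the window, which is the case the post describes; wider gaps need a real infilling method.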

m*****n
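Posts: 3575

From thread: Programming board - R seems never to have seriously considered the problem of modifying global variables

A simple problem: set one column of a matrix to zero
--------------------------------------
# first make a matrix
A = matrix(1:10, ncol=2)
# now set its second column to zero
A[,2]=0
# check
A
# restore
A = matrix(1:10, ncol=2)
======================================
I want to package this behavior into a function; more broadly, it modifies part of the data
--------------------------------------
zero_col <- function(data, coln){data[,coln]=0; return(data)}
# apply the function
zero_col(A,2)
# check again whether A was modified
A
======================================
This gives two different results: only the data inside the function was changed, not the global variable A.
So how do you change the global variable? There is "<<-", which modifies global variables
--------------------------------------
zero_col ...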

l**********1
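Posts: 5204

From thread: Biology board - Can anyone talk about what knowledge is needed to learn NGS data analysis from scratch?

Continued:
Fourth movement: Finale
Find the R source code programs in a relevant PhD dissertation.
Once you can debug it, or even rewrite it for another task,
you have already mastered NGS coding skills.
For example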
http://www.dspace.cam.ac.uk/handle/1810/218542
DSpace at Cambridge
title: Genome-wide analyses using bead-based microarrays
Authors: Dunning, Mark J
Issue Date: 4-Sep-2008
Files in This Item:
File Description Size Format
dunning_thesis_.pdf 10.47 MB Adobe PDF
its Appendix B
R source Code f...

g****y
Posts: 199

From thread: Computation board - [Collection] How do I read a file written by Matlab into fortran?

☆─────────────────────────────────────☆
walp (none) wrote on (Thu Jul 10 01:05:02 2008):
The statements below generate a 30×30 matrix and write it to the file matrixA.dat
% generate a matrix
A = delsq(numgrid('C',8));
[nrow,ncol]=size(A);
% write the matrix to a .dat file
fid=fopen('matrixA.dat','w');
for i=1:nrow
for j=1:ncol
count=fwrite(fid, A(i,j),'double');
end
end
fclose('all');
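How do I read a simple file like this in fortran77? I tried the open and read arguments many times without
success and got several kinds of errors, either "formatted I/O on unformatted unit" or "record is too
long", and so on.
I am not very familiar with fortran77 input/output; I tried on my own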
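It may help to look at what that MATLAB loop actually puts on disk: fwrite with 'double' writes raw 8-byte values back to back, row by row here because the loop runs over i and then j, with no record markers. That is probably why a sequential unformatted read complains. A Python sketch that reads such a file, assuming little-endian byte order (typical on x86, but an assumption) and a made-up function name:

```python
import struct


def read_raw_doubles(path, nrow, ncol):
    """Read nrow*ncol raw little-endian doubles written row by row."""
    with open(path, "rb") as f:
        data = f.read()
    # "<" = little-endian, "d" = 8-byte double; raises struct.error on a size mismatch
    values = struct.unpack("<%dd" % (nrow * ncol), data)
    return [list(values[i * ncol:(i + 1) * ncol]) for i in range(nrow)]
```

For the file in the post this would be called as read_raw_doubles("matrixA.dat", nrow, ncol) with the dimensions printed by size(A). On the Fortran side the same layout usually calls for a direct-access or stream-style read rather than a sequential unformatted one.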

h****s
Posts: 16779

From thread: Statistics board - A SAS question

This should do it:
proc iml;
use data;
read all var {ID} into ID;
read all var {AA} into AA;
read all var {Trt} into Trt;
close data;
ID_Trt1_AA3 = ID[loc(AA = 3 & Trt = 1)];
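Num_ID_Trt1_AA3_unique = nrow(unique(ID_Trt1_AA3)) * ncol(unique(ID_Trt1_AA3));
Num_Trt1 = sum(Trt = 1); /* count the observations with Trt = 1 */
print Num_ID_Trt1_AA3_unique Num_Trt1;
quit;
I don't have SAS at hand, but it should be roughly like this. To get your output format you will need to control the output stream further.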

s*****n
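Posts: 2174

From thread: Statistics board - Asking for an approach to a probability problem

A vector in itself is neither a row vector nor a column vector; the question only arises when it is converted to a matrix. R treats
it as a row or column vector as needed. For example: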
> c(1,2) %*% matrix(c(1,2,3,4), ncol=2)
[,1] [,2]
[1,] 5 11
> matrix(c(1,2,3,4), ncol=2) %*% c(1,2)
[,1]
[1,] 7
[2,] 10
However, when a vector is converted to a matrix, it is treated as a column vector by default. For example:
> matrix(c(1,2))
[,1]
[1,] 1
[2,] 2
If you write a vector and explicitly need a row vector, you can usually get it with a transpose:
> t(c(1,2))
[,1] [,2]
[1,] 1 2
What happens here is that the vector is first converted to a matrix and then transposed.

vector?

t**i
Posts: 688

From thread: Statistics board - About an error message from R's simplex

enj = rnorm(3, mean=3,sd=1)
M1 = matrix(rnorm(40,mean=0,sd=1),ncol=4,byrow=TRUE)
M2 = matrix(rnorm(40,mean=0,sd=1),ncol=4,byrow=TRUE)
require(boot)
simplex(a=enj,A1=M1[,2:4],b1=t(M1[,1]),A2=M2[,2:4],b2=t(M2[,1]),maxi=TRUE)
Linear Programming Results
Call : simplex(a = enj, A1 = M1[, 2:4], b1 = t(M1[, 1]), A2 = M2[, 2:4],
b2 = t(M2[, 1]), maxi = TRUE)
Maximization Problem with Objective Function Coefficients
x1 x2 x3
2.088811 4.114023 2.513823
No feasible solution could be f

b**********e
Posts: 531

From thread: Statistics board - SAS code question, special two do loop

Please help me look at this code. What are these two DO loops doing? Experts, please
help!
* IT is a dataset that the previous program created;
%MACRO FREQ;
PROC IML;
S="SUB1":"SUB9";
USE IT (KEEP=FORM LEVEL SUB1-SUB9);
READ ALL VAR S WHERE (FORM="NEWFORM") INTO M ;
USE IT (KEEP=FORM LEVEL SUB1-SUB9);
READ ALL VAR S WHERE (FORM="OLDFORM" ) INTO K ;
FREQ=J(MAX+1,2*NCOL(M),0);
DO I=1 TO NROW(K);
DO J=1 TO NCOL(K);
IF K[I,J] ^=. THEN

y********i
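Posts: 205

From thread: Statistics board - Does SAS have anything like a single-step execution command? Why does the result never change no matter how I change the parameter t set in my proc iml?

I recently wrote the code below. The original code had a small problem in the definition part, so I changed
that part. Before the change, changing t changed the result; after my small
change, the result stays the same no matter how I change t, and it stays the same even if I delete t. What
could be the reason? On every run the log shows no problems. But if I set t to 0, it reports an error.
Could an expert give some pointers or a rough guess at the cause?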
proc iml;
/* Used for debugging only */
*reset log print details;
start COptimal(xx)
global (Weight, Var1, b0, b1, b2, t);
/* nrow(xx)=1, ncol(xx)=Nd*2 */
x=shape(xx,nrow(xx)*ncol(xx)/2,2);

F=j(nrow(x),1) || x[,1] || x[

g**r
Posts: 425

From thread: Statistics board - Can someone help improve this R program?

My data:
a=matrix(c(3,4,5,6,7,6),nrow=2,byrow=TRUE)
The result I want:
b=matrix(0,nrow=2,ncol=10)
for(i in 1:nrow(a))for(j in 1:ncol(a))
b[i,a[i,j]]=1+b[i,a[i,j]]
But this method feels too clumsy. I am not fluent in R, so please help.
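What the double loop computes is a per-row tabulation: row i of b counts how often each value 1..10 occurs in row i of a. A short Python sketch of the same thing, assuming the values are integers between 1 and nbins (row_tabulate is a made-up name):

```python
def row_tabulate(a, nbins):
    """For each row, count the occurrences of the values 1..nbins."""
    b = [[0] * nbins for _ in a]
    for i, row in enumerate(a):
        for v in row:
            b[i][v - 1] += 1   # value v goes to column v (1-based in R)
    return b


print(row_tabulate([[3, 4, 5], [6, 7, 6]], 10))
```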

s*********e
Posts: 1051

From thread: Statistics board - A question about R's apply function

Is this what you need?
summ = function(x){
mean = mean(x)
sd = sd(x)
max = max(x)
list(mean=mean, sd=sd, max=max)
}
Z <- matrix(rnorm(100), nrow = 20, ncol = 5)
library(foreach)
test <- data.frame(foreach(i = 1:ncol(Z), .combine = rbind) %do% summ(Z[, i]), row.names = NULL)
print(test)

a****u
Posts: 95

From thread: Statistics board - In R, how to do an rbind without duplicates?

How about using rbind first and then aggregate:
x<-matrix(1:10,ncol=2,byrow=TRUE)
y<-matrix(1:20,ncol=2,byrow=TRUE)
z<-rbind(x,y)
aggregate(z,by=list(z[,1],z[,2]),FUN=tail,1)
If duplicates only need to be removed based on one column, you can try match and
rbind:
index<-match(x[,1],y[,1])
# keep only the rows of x whose key has no match in y
rbind(x[is.na(index), , drop=FALSE],y)
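For comparison, the whole-row version of the idea in Python: stack the rows of both tables and keep the first occurrence of each distinct row. Unlike the aggregate call this keeps the original row order; rbind_unique is a made-up name.

```python
def rbind_unique(x, y):
    """Stack the rows of x and y, dropping rows that already appeared."""
    seen = set()
    out = []
    for row in list(x) + list(y):
        key = tuple(row)          # rows must be hashable to test membership
        if key not in seen:
            seen.add(key)
            out.append(list(row))
    return out


print(rbind_unique([[1, 2], [3, 4]], [[1, 2], [5, 6]]))
```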

a*****n
Posts: 230

From thread: Statistics board - Huge speed increase in R for scalar intensive computation

I have been waiting for this for 10 years. But this noon, I saw this line:
The byte code compiler and interpreter now include new instructions that
allow many scalar subsetting and assignment and scalar arithmetic operations
to be handled more efficiently. This can result in significant performance
improvements in scalar numerical code.
I immediately downloaded the latest R build and benchmarked it against a KD-tree
code I wrote.
R Build from 20 days ago: 8.45 minutes
R build of today: 5.01 minutes
A...

k*******a
Posts: 772

From thread: Statistics board - Scanning a matrix in R

## function to find sums of all subset matrix: k_row * k_col
move_sum <- function(x, k) diff(c(0, cumsum(x)), k)
move_sum_mat <- function(mat, k_row, k_col) {
new_mat <- t(apply(mat, 1, move_sum, k = k_col))
new_mat <- apply(new_mat, 2, move_sum, k = k_row)
new_mat
}
## test matrix
mat <- matrix(sample(1:10, 100, replace=T), nrow=10, ncol=10)
## get sums for all subset matrix with 4 rows and 9 cols
mymat <- move_sum_mat(mat, 4, 9)
## find starting row and col for max sub matrix
which(mym...
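The R code uses the cumulative-sum trick: a window sum is the difference of two prefix sums, applied first along rows and then along columns. Since the last line is cut off, here is a Python sketch of the same approach, including locating the best window. The function names mirror the R ones; best_window and its first-occurrence tie-breaking are my additions.

```python
from itertools import accumulate


def move_sum(x, k):
    """Sums of every length-k window of x, via prefix sums."""
    c = [0] + list(accumulate(x))
    return [c[i + k] - c[i] for i in range(len(x) - k + 1)]


def move_sum_mat(mat, k_row, k_col):
    """Sums of every k_row x k_col sub-matrix of mat."""
    rows = [move_sum(r, k_col) for r in mat]                     # slide along columns
    cols = [move_sum(list(col), k_row) for col in zip(*rows)]   # then along rows
    return [list(r) for r in zip(*cols)]


def best_window(sums):
    """0-based (row, col) of the first maximum in the window-sum matrix."""
    best = None
    for i, row in enumerate(sums):
        for j, v in enumerate(row):
            if best is None or v > best[0]:
                best = (v, i, j)
    return best[1], best[2]


sums = move_sum_mat([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2, 2)
print(sums, best_window(sums))
```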

y****6
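Posts: 264

From thread: Statistics board - Let's all complain about R's problems

I have been using R a lot lately. R really is convenient, but its design is not rigorous at all; it is full of surprises in use, which is annoying.
This is not just a problem with user-written libraries; the R language itself has plenty of flaws. I'll get the ball rolling with one
example; follow up with your own complaints:
apply returns inconsistent results, for example
apply(matrix(c(rep(1, 10), rep(2,10)), ncol=4), 1, table)[,1]
works fine, but
apply(matrix(c(rep(1, 10), rep(1,10)), ncol=4), 1, table)[,1]
raises an error.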

l*****a
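Posts: 14598

From thread: JobHunting board - How to pass the large test for the zigzag conversion problem on leetcode?

Use a StringBuilder,
then append column by column; each column is also a StringBuilder. Even-numbered columns have no head and tail; reverse them before inserting
into the overall result.
### Note: pad the last column so it is full
Finally, for the combined string:
for(int j=0;j // this is one row
for(int i=0;i str.charAt(i*col+j)
}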
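The loop bounds in the snippet above were lost, so it cannot be completed reliably. As an alternative, here is a different and more common technique for the same problem: walk the string once and drop each character into a per-row bucket, bouncing between the top and bottom rows. This is not the column-based method described in the post; zigzag_convert is a made-up name.

```python
def zigzag_convert(s, num_rows):
    """Write s in a zigzag over num_rows rows, then read it row by row."""
    if num_rows <= 1 or num_rows >= len(s):
        return s
    rows = [[] for _ in range(num_rows)]
    r, step = 0, 1
    for ch in s:
        rows[r].append(ch)
        if r == 0:
            step = 1               # bounce off the top row
        elif r == num_rows - 1:
            step = -1              # bounce off the bottom row
        r += step
    return "".join("".join(row) for row in rows)


print(zigzag_convert("PAYPALISHIRING", 3))
```

It runs in time linear in the length of the string, which should be enough for a large test case.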

w******4
Posts: 488

From thread: Tennessee board - This is a really long^3 week

> work <- work[order(work$weekdays),]
> ncol(sapply(split(work,work$weekdays),function(Z){
tmp<-Z[1,]
tmp}))
The result is 5. And this is the first week of Year 2012...

k****f
Posts: 3794

From thread: Programming board - Getting the number of rows in MATLAB

nrow=size(A,1);
ncol=size(A,2);

t***q
Posts: 418

From thread: Programming board - Urgent, baozi offered: how to improve the EFFICIENCY of a SCRIPT.

My friend's code in R:
da1=read.csv(paste(filepath,"file1.csv",sep=""))
da2=read.csv(paste(filepath,"file2.csv",sep=""))
da1 <- t(da1)
da1 <- as.vector(da1)
da2 <- t(da2)
da2 <- as.vector(da2)
info <- matrix(NA,nrow=length(da1),ncol=40)
position <- 1:length(da2)
for(i in 1:length(da1))
{
a=levenshteinSim(da1[i],da2)
pos=position[a==max(a)]
temp=c(i,pos,max(a),da1[i],da2[pos])
info[i,1:length(temp)] <- temp
if(i %% 500 ==0)
cat("#")
}

t********o
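Posts: 48

From thread: Unix board - [Repost] about header file

[The following text is reposted from the Programming board]
[Originally posted by tsingditto]
Is there something wrong with this header?
Why do I get this every time I include it: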
'struct CSCMatrix' declared inside parameter list,
its scope is only this definition or declaration, which is probably not what
you want
#ifndef CAPLOT_H
#define CAPLOT_H
typedef struct
{
int nrow;
int ncol;
int nnz;
int *colptr;
int *rowind;
double *nzval;
double *dx;
double *dy;
} CSCMatrix;
extern double bmsize(int *rind, int *cptr,int m,int n);
extern void scale(struct CSCMatrix *csc);
extern void SAnetSVD(

l**********1
Posts: 5204

From thread: Biology board - How to make a scatter plot for microarray data?

pls refer
> posted on FRIDAY, JULY 6, 2012
Fix Overplotting with Colored Contour Lines
I saw this plot in the supplement of a recent paper comparing microarray
results to RNA-seq results. Nothing earth-shattering in the paper - you've
probably seen a similar comparison many times before - but I liked how they
solved the overplotting problem using heat-colored contour lines to indicate
density. I asked how to reproduce this figure using R on Stack Exchange,
and my question was quickly answered b...

h*********9
Posts: 35

From thread: Biology board - A question about DNA methylation calculation

These analyses involve a lot of customization, and I don't know which software packages can do it, but with R it can be
done in half a day.
The basic approach is to encode both the ChIP-Seq peak regions and the methylated sites as R GRanges
objects, and then use the operations R provides on GRanges to find and count overlaps.
To compute differentially methylated sites, you can use beta-binomial
regression. If you want to identify differentially methylated regions,
a hidden Markov model is a good option.
I recently wrote a regression hidden Markov model for methylation data but have not had time to test
it yet.
Below is some sample code:
library(GenomicRanges)
library(GenomicFeatures)
library(da...
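The sample code is cut off after the library calls. To show the counting step in isolation, here is a tiny Python sketch of "count methylated sites per peak" on a single chromosome, using sorted positions and binary search. It is a toy stand-in for the GRanges overlap operations, with closed intervals and a made-up function name, and it ignores chromosomes and strands.

```python
from bisect import bisect_left, bisect_right


def count_sites_in_peaks(peaks, sites):
    """peaks: list of (start, end), inclusive; sites: list of positions.
    Returns the number of sites falling inside each peak."""
    sites = sorted(sites)
    return [bisect_right(sites, end) - bisect_left(sites, start)
            for start, end in peaks]


print(count_sites_in_peaks([(10, 20), (30, 40)], [5, 10, 15, 20, 25, 40, 41]))
```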

s*******s
Posts: 1568

From thread: Quant board - another interesting probability question

I wrote an R program:
n = 100
m = 5
p = 0.5
v1= 0:n
s0=dbinom(v1,n,p)
Pmatrix = matrix(0, nrow = n+1, ncol = n+1)
for (i in 0:n) {
v1 = 0:i
Pmatrix[i+1,1:(i+1)] = dbinom(v1,i,p)
}

for (i in 1:m) {
s0=s0%*%Pmatrix
}

E = 0;
for (i in 0:n) {
E = E + s0[i+1]*i
}

print(E)
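Reading the R code: s0 is the Binomial(n, p) distribution, and each multiplication by Pmatrix sends a count i to Binomial(i, p), so after m rounds the count should be Binomial(n, p^(m+1)) with mean n * p^(m+1). The original question is not quoted here, so that interpretation is my own. A Python sketch that redoes the computation with the standard library only and can be checked against the closed form (function names are mine):

```python
from math import comb


def binom_pmf(n, p):
    """Probabilities of 0..n successes in n trials."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def expected_after_rounds(n, m, p):
    """Start from Binomial(n, p), apply m more rounds of binomial thinning,
    and return the expected count, mirroring the R loop."""
    dist = binom_pmf(n, p)                           # s0 in the R code
    rows = [binom_pmf(i, p) for i in range(n + 1)]   # Pmatrix, row i = Binomial(i, p)
    for _ in range(m):
        new = [0.0] * (n + 1)
        for i, w in enumerate(dist):
            for k, q in enumerate(rows[i]):
                new[k] += w * q
        dist = new
    return sum(k * w for k, w in enumerate(dist))


print(expected_after_rounds(100, 5, 0.5), 100 * 0.5 ** 6)
```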

k*******d
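Posts: 1340

From thread: Quant board - the third round interview for commodities group@MS

3. I think he was asking about the volatility smile. It is in Hull's book: the distribution is not strictly
lognormal. Where K is large the tail is thinner than lognormal, and where K is small the tail is fatter; that is for equity
options. For foreign currency options both tails are fat.
7.
int** Darray = new int*[nRow];
for (int i = 0; i < nRow; i++) Darray[i] = new int[nCol];
int a[3][4] is not dynamic.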
int a[3][4] is not dynamic.

?
it'

s*****n
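Posts: 2174

From thread: Statistics board - Suddenly confused about R in straight-line fitting

Of course they differ the way you did it. Noise of the same size affects the two models differently.
The first model has a large amplitude to begin with, so the noise matters relatively little.
The second model has a small amplitude, so the noise matters relatively more.
A fair comparison should halve the noise term in the second model.
For example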
x <- 0:5
result <- matrix(NA, ncol=2, nrow=1000)
for (i in 1:1000){
y1 <- c(200, 190, 180, 170, 160, 150) + rnorm(6) * 5
y2 <- c(200, 195, 190, 185, 180, 175) + rnorm(6) * 2.5
result[i, 1] <- summary(lm(y1~x))$r.squared
result[i, 2] <- summary(lm(y2~x))$r.squared
}
apply(result, 2, mean)
[1] 0.9453929 0.9474898
The two are about the same.
Strictly speaking, the noise terms are not exactly in a two-to-one relationship,
only approximately. Strictly

s*****n
Posts: 2174

From thread: Statistics board - How to operate on a whole column of data in R?

x <- seq(from = as.Date("1998-01-01"), by = "1 month", length = 60)
data <- matrix(as.character(x), ncol=12, byrow = T)
print(data)

q**j
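Posts: 10612

From thread: Statistics board - How to operate on a whole column of data in R?

I tried again today and got a bit closer to success.
y = unlist(lapply(Data$Date, seq,length=12))
z = matrix(y,nrow=8640,ncol=12,byrow=T)
This produces a matrix like that.
But there are a few new problems.
1. lapply will not let me use an argument like by="1 month", so I get 12 consecutive days rather than
months.
2. The resulting z holds numbers rather than dates. I checked: the numbers are the correct day counts since 1970-01-01. How
do I convert such a numeric matrix into a matrix of dates?
One last small question: what is the difference between as.date and as.Date? I read through the manual and still have no idea.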

q**j
Posts: 10612

From thread: Statistics board - Two small questions about R

That is very good. I am not aware of the assign tool.
Another issue is:
Result = matrix(0,nrow=2,ncol=3)
for (j in 1:3)
{
colnames(Result)[j] = paste("item",j,sep="")
}
can you see why it always gives error message? but
colnames(Result) = c("Good","Better","Best")
will work as expected. I am quite puzzled here.

s*****n
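Posts: 2174

From thread: Statistics board - Two small questions about R

When you define the matrix, the default dimnames is NULL, i.e. it does not exist.
Below, you take index [j] of something that does not exist, which naturally raises an error. But
colnames(Result) = c("Good","Better","Best")
is an ordinary assignment, and that is fine. The problem lies in taking index [j].
It is just like the following example
> a[3] <- 1
Error: object "a" not found
Without having defined a, indexing a[3] does not work.
> a <- c(NA, NA, 1)
Assigning to a as a whole is fine.
Solutions:
1. (Recommended) Use a data.frame. A data frame is a special kind of matrix defined in R that carries
somewhat richer information than a matrix. If you can use a data frame, don't use a matrix.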
Result <- data.frame(matrix(0,nrow=2,ncol=3))
for (j in 1:3){
names(Result)[j] = paste("item",j,sep="")
}
}
2. If you really must use a matrix, then first the dimnames slot...
s*****n
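Posts: 2174

From thread: Statistics board - Did some "R" again today: thoughts and questions.

For the same task, SAS has the SAS way and R has the R way. Judging one by the framework of the
other is pointless. Most of the questions you have asked these past days are of this kind. I had intended to stop answering
your questions, but since you want to master both, I hope you can set aside the SAS mindset and think about the problem itself.
Question 1:
The rename you describe is nothing more than giving an object a new name without copying the object itself.
R can do exactly that; there is nothing hard about it.
A <- matrix(0, ncol=10000, nrow=10000) ## a big object
B <- A ## B points to the same memory location as A
rm(A) ## Remove the pointer of A.
For the vast majority of objects in R and Splus, assigning one to another object does not copy it; it only creates
a pointer, unless you later change the value. From this angle, it does not matter whether you use a package. The
rename statement in SAS works the same way underneath; it is just wrapped up for you as a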
