minruns <- 1 ## min number or runs maxruns <- 100 ## max number or runs snps <- 10000 ## number of SNPs samplesN <- 100 ## number of individuals avDistSnps <- 100 ## average distance in bases between SNPs avDistMinor <- 25 ## average distance between minor alleles, thus 1/avDistMinor is the average MAF noImplanted <- 1 ## how many haplotypes implanted <- 10 ## in how many samples length <- 100 ## length of haplotypes minors <- 20 ## number of tagSNVs mismatches <- 0 ## number of mismatches in hapltopye mismatchImplanted <- 0.5 # inpercent overlap <- 50 ## minimal haplotype overlap between chromosomes noOverwrite <- FALSE ## noOverwrite=TRUE ensures that a haplotype is not superimposed by another haplotype write(paste("maxruns: ",maxruns,sep=""),file="dataSimParameters.txt",ncolumns=10000000) write(paste("snps: ",snps,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("samplesN: ",samplesN,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("avDistSnps: ",avDistSnps,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("avDistMinor: ",avDistMinor,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("noImplanted: ",noImplanted,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("implanted: ",implanted,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("length: ",length,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("minors: ",minors,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) write(paste("mismatches: ",mismatches,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) if (mismatches>0) { write(paste("mismatchImplanted: ",mismatchImplanted,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) } write(paste("overlap: ",overlap,sep=""),file="dataSimParameters.txt",append=TRUE,ncolumns=10000000) rateSnps <- 1/avDistSnps rateMinor <- 1/avDistMinor haploN <- 2*samplesN dataA <- 1:samplesN nucleotide <- c("A","T","C","G") physPos <- vector("integer",snps) snpMajor <- vector("character",snps) snpMinor <- vector("character",snps) g2 <- 2*(1:snps) g1 <- g2 -1 s2 <- 2*dataA s1 <- s2-1 id <- as.character(dataA) zeroid <- as.character(rep(0,samplesN)) famID <- id dim(famID) <- c(samplesN,1) ID <- id dim(ID) <- c(samplesN,1) patID <- zeroid dim(patID) <- c(samplesN,1) matID <- zeroid dim(matID) <- c(samplesN,1) sex <- zeroid dim(sex) <- c(samplesN,1) phen <- zeroid dim(phen) <- c(samplesN,1) snpNames <- as.character(1:snps) chr <- rep(1,snps) alleleI <- matrix(0,haploN,snps) alleleN <- matrix("N",haploN,snps) ################# for (run in minruns:maxruns) { write(noImplanted,file=paste("dataSim",run,"Impl.txt",sep=""),ncolumns=10000000) distsSnps <- rexp(snps,rateSnps) aa <- as.integer(0) for (i in 1:snps) { aa <- aa + 1 + as.integer(round(distsSnps[i])) physPos[i] <- aa mami <- sample(4,2) snpMajor[i] <- nucleotide[mami[1]] snpMinor[i] <- nucleotide[mami[2]] } for (s in 1:haploN) { for (i in 1:snps) { if (runif(1)1)&&(noOverwrite)) { for (no2Im in 1:(noIm-1)) { if ((length(intersect(impl,allimpl[[no2Im]]))>0)&&(length(intersect(inter,allinter[[no2Im]]))>0)) { notfoundIBD <- TRUE } } } } impl=sort(impl) implG <- (impl+1)%/%2 write(impl,file=paste("dataSim",run,"Impl.txt",sep=""),append=TRUE,ncolumns=10000000) write(implG,file=paste("dataSim",run,"Impl.txt",sep=""),append=TRUE,ncolumns=10000000) write(inter,file=paste("dataSim",run,"Impl.txt",sep=""),append=TRUE,ncolumns=10000000) write(implantI,file=paste("dataSim",run,"Impl.txt",sep=""),append=TRUE,ncolumns=10000000) write(implantN,file=paste("dataSim",run,"Impl.txt",sep=""),append=TRUE,ncolumns=10000000) intervalImp <- matrix(0,implanted,4) for (i in 1:implanted) { startI <- 1+sample((length-overlap)%/%2,1) endI <- length - 1 - sample((length-overlap)%/%2,1) interI <- inter[startI:endI] alleleIimp[impl[i],interI] <- implantI[startI:endI] alleleNimp[impl[i],interI] <- implantN[startI:endI] interImp <- c(inter[startI],inter[endI],startI,endI) intervalImp[i,] <- interImp write(interImp,file=paste("dataSim",run,"Impl.txt",sep=""),append=TRUE,ncolumns=10000000) } if (mismatches > 1) { posMismatcht <- 1+ sample((length-2),mismatches) posMismatch <- inter[posMismatcht] noMisSamples <- round(mismatchImplanted*implanted) for (i1 in posMismatch) { sampind <- sample(implanted,noMisSamples) for (i1S in sampind) { if ((i1>=intervalImp[i1S,3])&&(i1<=intervalImp[i1S,4])) { whichSample <- impl[i1S] if (alleleIimp[whichSample,i1]==0) { alleleIimp[whichSample,i1] <- 1 alleleNimp[whichSample,i1] <- snpMinor[i1] } else { alleleIimp[whichSample,i1] <- 0 alleleNimp[whichSample,i1] <- snpMajor[i1] } } } } } } alleleIimpGeno <- alleleIimp[s1,] + alleleIimp[s2,] dummyL <- 1:snps dummyC <- rep(0,snps) dummy <- as.character(dummyL) annot <- list() annot[[1]] <- dummyL annot[[2]] <- pos annot[[3]] <- snpNames annot[[4]] <- snpMajor annot[[5]] <- snpMinor annot[[6]] <- dummy annot[[7]] <- dummy annot[[8]] <- dummy annot[[9]] <- dummy annot[[10]] <- freq annot[[11]] <- dummyC save(snps,haploN,samplesN,dataA,pos,pos1,snpNames,snpMajor,snpMinor,freq,annot,file=paste("dataSim",run,"Anno.Rda",sep="")) save(impl,implG,inter,implantI,implantN,intervalImp,file=paste("dataSim",run,"Impl.Rda",sep="")) # BEAGLE ######## v1 <- rep(1,haploN) dim(v1) <- c(1,haploN) v0 <- as.numeric(rbind(dataA,dataA)) dim(v0) <- c(1,haploN) dataB1 <- rbind(v0,v1,t(alleleNimp)) col1 <- c("id","BC58",snpNames) dim(col1) <- c((snps+2),1) col2 <- rep("M",snps) col2 <- c("I","A",col2) dim(col2) <- c((snps+2),1) dataB <- cbind(col2,col1,dataB1) write.table(dataB,file=paste("dataSim",run,"beagle.txt",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE) # PLINK ####### dataP1 <- matrix("character",nrow=samplesN,ncol=2*snps) dataP1[,g1] <- alleleNimp[s1,] dataP1[,g2] <- alleleNimp[s2,] dataP <- cbind(plinkCols,dataP1) write.table(dataP,file=paste("dataSim",run,"plink.ped",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE) write.table(plinkmap,file=paste("dataSim",run,"plink.map",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t") write.table(plinkCols,file=paste("dataSim",run,"plink.fam",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t") #MCMC ##### write.table(alleleIimpGeno,file=paste("dataSim",run,"mcmc.genotype",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t") write.table(mcmc,file=paste("dataSim",run,"mcmc.posmaf",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t") write.table(initM,file=paste("dataSim",run,"mcmc.initz",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE,sep="\t") #RELATE ######## write.table((alleleIimpGeno+1),file=paste("dataSim",run,"relate.geno",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE) write.table(pos1,file=paste("dataSim",run,"relate.pos",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE) write.table(chr,file=paste("dataSim",run,"relate.chr",sep=""),quote=FALSE,row.names=FALSE,col.names=FALSE) #FABIA ###### write(as.integer(haploN),file=paste("dataSim",run,"fabia.txt",sep=""),ncolumns=10000000) write(as.integer(snps),file=paste("dataSim",run,"fabia.txt",sep=""),append=TRUE,ncolumns=10000000) for (i in 1:haploN) { a1 <- which(alleleIimp[i,]>0.01) al <- length(a1) b1 <- alleleIimp[i,a1] a1 <- a1 - 1 dim(a1) <- c(1,al) dim(b1) <- c(1,al) write(al,file=paste("dataSim",run,"fabia.txt",sep=""),append=TRUE,ncolumns=10000000) write(a1,file=paste("dataSim",run,"fabia.txt",sep=""),append=TRUE,ncolumns=10000000) write(format(as.double(b1),nsmall=1),file=paste("dataSim",run,"fabia.txt",sep=""),append=TRUE,ncolumns=10000000) } }