R - Count the occurrence of specific combinations of characters in a list


My question is simple, but I can't manage to work it out. I have run a variable selection method in R on 2000 genes using 1000 iterations, and in each iteration I got a combination of genes. I want to count the number of times each combination of genes occurs. For example, I have

    # iteration 1
    genes[1] "a" "b" "c"
    # iteration 2
    genes[2] "a" "b"
    # iteration 3
    genes[3] "a" "c"
    # iteration 4
    genes[4] "a" "b"

and I would like this to give me

    "a" "b" "c"  1
    "a" "b"      2
    "a" "c"      1

I have unlisted the list and got the number of times each gene comes up, but what I am interested in is the combination. I tried to create a table, but I have an unequal length for each gene vector. Thanks in advance.

The only way I can think of is to paste them and then use table, as follows:

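    # example data inferred from the outputs shown in this answer (an assumption)
    my_genes <- list(c("a", "b", "c"), c("a", "b"), "c", c("a", "b"))

    genes_p <- sapply(my_genes, paste, collapse=";")
    freq <- as.data.frame(table(genes_p))
    #   genes_p Freq
    # 1     a;b    2
    # 2   a;b;c    1
    # 3       c    1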

The above solution assumes that the genes are sorted by name and that the same gene id doesn't occur more than once within an element of the list. If you want to account for both, then:

    # sort genes before pasting
    genes_p <- sapply(my_genes, function(x) paste(sort(x), collapse=";"))

    # sort + unique
    genes_p <- sapply(my_genes, function(x) paste(sort(unique(x)), collapse=";"))
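To see why the sorting step matters, here is a small sketch. The list `unsorted_genes` is made-up illustration data, not the OP's: it holds the same pair of genes in two different orders, plus one vector with a repeated id.

```r
# Hypothetical data for illustration only (not from the question)
unsorted_genes <- list(c("a", "b"), c("b", "a"), c("a", "a", "b"))

# Plain paste: the three elements are treated as three different combinations
plain <- table(sapply(unsorted_genes, paste, collapse = ";"))

# sort + unique first: all three collapse to the same key "a;b"
normalised <- table(sapply(unsorted_genes,
                           function(x) paste(sort(unique(x)), collapse = ";")))

print(plain)
print(normalised)
```

Without normalising, the same set of genes can be split across several table entries, which would understate its frequency.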

Edit: Following the OP's question in the comments, the idea is to get all combinations of 2'ers (so to say), wherever possible, and then take the table. First I'll break down the code and write the steps separately for understanding. Then I'll group them into a one-liner.

    # first we want all possible combinations of length 2 here
    # that is, if the vector is:
    v <- c("a", "b", "c")
    combn(v, 2)
    #      [,1] [,2] [,3]
    # [1,] "a"  "a"  "b"
    # [2,] "b"  "c"  "c"

This gives all the combinations taken 2 at a time. Now, you can paste each of them similarly. combn also accepts a function argument that is applied to each combination.

    combn(v, 2, function(y) paste(y, collapse=";"))
    # [1] "a;b" "a;c" "b;c"

So, for each set of genes in your list, you can do the same by wrapping this in sapply as follows:

    sapply(my_genes, function(x) combn(x, min(length(x), 2),
                                       function(y) paste(y, collapse=";")))

The min(length(x), 2) is required because some elements of your gene list can have just 1 gene.

    # [[1]]
    # [1] "a;b" "a;c" "b;c"
    #
    # [[2]]
    # [1] "a;b"
    #
    # [[3]]
    # [1] "c"
    #
    # [[4]]
    # [1] "a;b"
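A quick check of why the min(length(x), 2) guard is needed: asking combn for pairs from a single-element vector fails, while asking for 1 element simply returns the gene itself. This is only an illustrative sketch; the variable names are made up, and tryCatch is used so that the snippet runs to the end.

```r
single <- "c"

# Without the guard: combn() cannot pick 2 items out of 1 and raises an error
res_unguarded <- tryCatch(
  combn(single, 2, function(y) paste(y, collapse = ";")),
  error = function(e) "error"
)

# With the guard: m becomes 1, so the single gene is returned as-is
res_guarded <- combn(single, min(length(single), 2),
                     function(y) paste(y, collapse = ";"))

print(res_unguarded)
print(as.vector(res_guarded))
```

Note that this relies on the genes being character vectors; combn treats a single positive whole number n as the sequence 1 to n, so numeric ids would behave differently.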

Now, you can unlist this to get a vector and then use table to get the frequencies:

    table(unlist(sapply(my_genes, function(x) combn(x, min(length(x), 2),
                        function(y) paste(y, collapse=";")))))
    # a;b a;c b;c   c
    #   3   1   1   1

You can in turn wrap this with as.data.frame(.) to get a data.frame:

    as.data.frame(table(unlist(sapply(my_genes, function(x) combn(x, min(length(x), 2),
                          function(y) paste(y, collapse=";"))))))
    #   Var1 Freq
    # 1  a;b    3
    # 2  a;c    1
    # 3  b;c    1
    # 4    c    1
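If this is needed more than once, the steps can be wrapped in a small helper. This is a sketch rather than part of the original answer: the function name `count_combos` and its argument `k` are hypothetical, the example data is assumed from the outputs shown in the answer, and the helper adds sort(unique(x)) so that gene order and duplicates do not matter.

```r
# Hypothetical helper: count how often each k-gene combination occurs.
# Assumes every element of gene_list is a character vector of length >= 1.
count_combos <- function(gene_list, k = 2) {
  combos <- lapply(gene_list, function(x) {
    x <- sort(unique(x))                 # normalise order, drop duplicates
    combn(x, min(length(x), k),          # use fewer genes if x is shorter than k
          function(y) paste(y, collapse = ";"))
  })
  as.data.frame(table(unlist(combos)), stringsAsFactors = FALSE)
}

# Example data assumed from the outputs shown in the answer
my_genes <- list(c("a", "b", "c"), c("a", "b"), "c", c("a", "b"))
res <- count_combos(my_genes)
print(res)
```

Using lapply here instead of sapply always yields a list, so the result does not change shape when all gene vectors happen to have the same length.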
