r - Count the occurrence of specific combinations of characters in a list
My question is simple, but I can't manage to work it out. I have run a variable selection method in R on 2000 genes using 1000 iterations, and in each iteration I got a combination of genes. I would like to count the number of times each combination of genes occurs, in R. For example, I have:
    # iteration 1
    genes[1] "a" "b" "c"
    # iteration 2
    genes[2] "a" "b"
    # iteration 3
    genes[3] "a" "c"
    # iteration 4
    genes[4] "a" "b"
and I would like this to give me:

    "a" "b" "c"  1
    "a" "b"      2
    "a" "c"      1
I have unlisted the list and got the number of times each gene comes up, but I am interested in the combinations. I tried to create a table, but the gene vectors have unequal lengths. Thanks in advance.
One way I can think of is to paste them together and then use table, as follows:
    genes_p <- sapply(my_genes, paste, collapse=";")
    freq <- as.data.frame(table(genes_p))
    #   genes_p Freq
    # 1     a;b    2
    # 2   a;b;c    1
    # 3       c    1
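For a self-contained run, here is a minimal sketch of the same paste-and-table idea. The contents of `my_genes` are an assumption, reconstructed from the output shown above rather than given in the post, and `vapply` is swapped in for `sapply` only so that the result is guaranteed to be a character vector.

```r
# Assumed input: one character vector of selected genes per iteration.
my_genes <- list(c("a", "b", "c"), c("a", "b"), "c", c("a", "b"))

# Collapse each gene vector into a single key such as "a;b;c".
genes_p <- vapply(my_genes, paste, FUN.VALUE = character(1), collapse = ";")

# Tabulate the keys; keep them as plain strings rather than factors.
freq <- as.data.frame(table(genes_p), stringsAsFactors = FALSE)

# Most frequent combinations first.
freq <- freq[order(-freq$Freq), ]
print(freq)
```

Splitting a key back into its genes is then just `strsplit(freq$genes_p, ";", fixed = TRUE)`, provided no gene ID contains the separator.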
The above solution assumes that the genes are sorted by name and that the same gene ID doesn't occur more than once within an element of the list. If you want to account for both, then:
    # sort genes before pasting
    genes_p <- sapply(my_genes, function(x) paste(sort(x), collapse=";"))

    # sort + unique
    genes_p <- sapply(my_genes, function(x) paste(sort(unique(x)), collapse=";"))
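To see why the normalisation matters, the sketch below uses a small made-up list (`messy` is hypothetical, not the OP's data) in which the same two genes appear in a different order and with a duplicate. Without `sort` and `unique`, each variant becomes its own key.

```r
# Hypothetical input: the same gene set written three different ways.
messy <- list(c("b", "a"), c("a", "b"), c("a", "a", "b"))

# Pasting as-is yields three distinct keys.
raw_keys <- vapply(messy, paste, FUN.VALUE = character(1), collapse = ";")

# Sorting and de-duplicating first maps all three to the same key.
norm_keys <- vapply(
  messy,
  function(x) paste(sort(unique(x)), collapse = ";"),
  FUN.VALUE = character(1)
)

print(table(raw_keys))
print(table(norm_keys))
```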
Edit: Following the OP's question in the comments, the idea is to get all combinations of 2'ers (so to say), wherever possible, and then take the table. First I'll break the code down and write the steps separately for understanding. Then I'll group them into a one-liner.
    # First, we want all possible combinations of length 2.
    # That is, if the vector is:
    v <- c("a", "b", "c")
    combn(v, 2)
    #      [,1] [,2] [,3]
    # [1,] "a"  "a"  "b"
    # [2,] "b"  "c"  "c"
This gives all the combinations taken 2 at a time. Now you can paste them in a similar way, since combn also accepts a function argument.
    combn(v, 2, function(y) paste(y, collapse=";"))
    # [1] "a;b" "a;c" "b;c"
So, for each set of genes in your list, you can do the same by wrapping this in sapply, as follows:
    sapply(my_genes, function(x)
        combn(x, min(length(x), 2), function(y) paste(y, collapse=";")))
The min(length(x), 2) is required because some of your gene lists can contain just 1 gene.
    # [[1]]
    # [1] "a;b" "a;c" "b;c"
    #
    # [[2]]
    # [1] "a;b"
    #
    # [[3]]
    # [1] "c"
    #
    # [[4]]
    # [1] "a;b"
Now you can unlist this to get a vector and then use table to get the frequency:
    table(unlist(sapply(my_genes, function(x)
        combn(x, min(length(x), 2), function(y) paste(y, collapse=";")))))
    #
    # a;b a;c b;c   c
    #   3   1   1   1
You can in turn wrap this in as.data.frame(.) to get a data.frame:
    as.data.frame(table(unlist(sapply(my_genes, function(x)
        combn(x, min(length(x), 2), function(y) paste(y, collapse=";"))))))
    #   Var1 Freq
    # 1  a;b    3
    # 2  a;c    1
    # 3  b;c    1
    # 4    c    1
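If the subset size needs to vary, the one-liner can be wrapped in a small function. This is only a sketch: `count_gene_subsets` and its `k` argument are hypothetical names, the `my_genes` contents are assumed from the output above, and `lapply` replaces `sapply` so the result is never simplified to a matrix when all gene sets happen to have the same length.

```r
# Assumed input, reconstructed from the output shown in the answer.
my_genes <- list(c("a", "b", "c"), c("a", "b"), "c", c("a", "b"))

# Hypothetical helper: count every size-k subset across all gene sets.
# Sets with fewer than k genes are counted whole, as in the answer.
count_gene_subsets <- function(gene_sets, k = 2) {
  gene_sets <- Filter(length, gene_sets)   # drop empty iterations
  keys <- lapply(gene_sets, function(x) {
    x <- sort(unique(x))                   # normalise order and duplicates
    combn(x, min(length(x), k), FUN = paste, collapse = ";")
  })
  as.data.frame(table(unlist(keys)), stringsAsFactors = FALSE)
}

pairs <- count_gene_subsets(my_genes, k = 2)
print(pairs)
```

Calling it with `k = 3` on the same list counts "a;b;c" once, "a;b" twice, and "c" once, which matches the whole-combination table from the first part of the answer.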