regex - Avoiding this use of sapply in R data.table
I've written a function to remove everything from the first parenthesis onwards in a string:
until_parentheses <- function(string) {
  one <- stringr::str_split_fixed(string, "\\(", 2)[1, 1]
  res <- stringr::str_trim(one)
  return(res)
}

and I have a data.table column that looks (something) like this:
messy <- paste(letters[1:10], paste0(c(" (", letters[1:2], ")"), collapse = ""))
dt <- data.table(messy)

When I try to use until_parentheses() on the messy column like so:
dt[, ":=" (clean = until_parentheses(messy))]

the function is applied only to the first element of messy, and that single result is repeated 10 times in the clean column.
In order to have the clean column come out how I want, I'm using sapply:
dt[, ":=" (clean_2 = sapply(messy, until_parentheses))]

This gives the result I want, but it takes a long time to run when dt is long.
I feel there are problems with both the until_parentheses() function and the data.table method. Does anyone have a solution that makes the use of sapply redundant in this instance?
Thanks!
You can use gsub, which is vectorized:
dt[, clean_3 := gsub(' +[(].*', '', messy)] ## replace everything from the spaces before the first "(" onwards with an empty string
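
As a rough sketch of why the original function only handled the first element: str_split_fixed() returns a matrix with one row per input string, and indexing it with [1, 1] keeps only the first row. Taking the whole first column with [, 1] should make the function vectorized, so sapply is no longer needed. The names until_parentheses_vec and strip_from_paren below are made up for illustration, and the base R variant assumes you also want to match when there is no space before the parenthesis (the ' +' pattern above needs at least one).

```r
library(data.table)

# Same sample data as in the question: "a  (ab)", "b  (ab)", ...
messy <- paste(letters[1:10], paste0(c(" (", letters[1:2], ")"), collapse = ""))
dt <- data.table(messy)

# Hypothetical vectorized rewrite of until_parentheses():
# [, 1] keeps the first column for every row, whereas [1, 1]
# kept only the first row of the split matrix.
until_parentheses_vec <- function(string) {
  first_part <- stringr::str_split_fixed(string, "\\(", 2)[, 1]
  stringr::str_trim(first_part)
}

# Hypothetical base R alternative: remove optional whitespace,
# the first "(" and everything after it. sub() is enough here
# because the pattern already runs to the end of the string.
strip_from_paren <- function(string) {
  sub("[[:space:]]*[(].*", "", string)
}

dt[, clean_vec := until_parentheses_vec(messy)]
dt[, clean_sub := strip_from_paren(messy)]
```

Both columns should come out as the plain letters for this sample data. One difference to keep in mind: str_trim() also strips leading whitespace, while the sub() version only removes whitespace directly before the parenthesis.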