R - Breaking down a timed sequence into episodes
I'm trying to break down a vector of event times into episodes. An episode must meet 2 criteria: 1) it consists of 3 or more events, and 2) those events have inter-event times of 25 time units or less. My data is organized in a data frame, as shown below.
So far, I have figured out that I can find the difference between events with diff(eventtime). By creating a logical vector that marks the events meeting the 2nd (inter-event) criterion, I can use rle(episodetimecriterion) to get the total number, and length, of episodes.
    eventtime  timedifferencebetweennextevent  episodetimecriterion
           25                              NA                    NA
           75                              50                  TRUE
          100                              25                  TRUE
          101                               1                  TRUE
          105                               4                  TRUE
          157                              52                 FALSE
          158                               1                  TRUE
          160                               2                  TRUE
          167                               7                  TRUE
          169                               2                  TRUE
          170                               1                  TRUE
          175                               5                  TRUE
          178                               3                  TRUE
          278                             100                 FALSE
          302                              24                  TRUE
          308                               6                  TRUE
          320                              12                  TRUE
          322                             459                 FALSE
However, I would also like to know the timing of the episodes, and rle() doesn't let me do that.
Ideally, I would like to generate a data frame that looks like this:

    episode  eventsperepisode  episodestarttime  episodeendtime
          1                 4                75             105
          2                 7               158             178
          3                 3               302             322
I know this is probably a simple problem, but being new to R, the only solution I can envision is a series of loops. Is there a way of doing this without loops? Or is there a package that lends itself to this sort of analysis?

Thanks!

Edited for clarity. Added the desired outcome data frame, and expanded the example data to make it clearer.
You've got the pieces you need. You just need to make a variable that gives each episode a number/name so that you can group by it. rle(...)$lengths gives you the run lengths, so just use rep to repeat a number that number of times:

    # runs isn't strictly needed as a variable, but it makes the code more readable
    runs <- rle(df$episodetimecriterion)$lengths
    df$episode <- rep(seq_along(runs), runs)
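To see what the rle/rep combination does in isolation, here is a minimal sketch on a toy logical vector. The values in `criterion` are made up for illustration and are not the post's data. Note that rle() treats a leading NA as a run of its own, which is why the first row of the real data ends up with its own episode number.

```r
# Toy logical vector (hypothetical values, not the data from the post).
criterion <- c(NA, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)

# rle() collapses consecutive equal values into runs.
runs <- rle(criterion)
runs$lengths  # 1 2 1 3
runs$values   # NA TRUE FALSE TRUE

# Repeat each run's index once per element in that run,
# giving every element the id of the run it belongs to.
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id        # 1 2 2 3 4 4 4
```

The ids are only labels for grouping; runs of FALSE (and the NA) get ids too, so they still need to be filtered out afterwards.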
So df looks like

    > head(df)
      eventtime timedifferencebetweennextevent episodetimecriterion episode
    1        25                             NA                   NA       1
    2        75                             50                 TRUE       2
    3       100                             25                 TRUE       2
    4       101                              1                 TRUE       2
    5       105                              4                 TRUE       2
    6       157                             52                FALSE       3
Now use dplyr to summarize the data:

    library(dplyr)
    df2 <- df %>%
      filter(episodetimecriterion) %>%   # keep only rows meeting the inter-event criterion
      group_by(episode) %>%
      summarise(eventsperepisode = n(),
                episodestarttime = min(eventtime),
                episodeendtime = max(eventtime))
Which returns

    > df2
    Source: local data frame [3 x 4]

      episode eventsperepisode episodestarttime episodeendtime
        (int)            (int)            (dbl)          (dbl)
    1       2                4               75            105
    2       4                7              158            178
    3       6                3              302            320
If you want the episode numbers to be integers starting at one, you can clean this up with

    df2$episode <- 1:nrow(df2)
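The dplyr summary above filters only on the inter-event criterion. The question's first criterion (3 or more events) happens to hold for every run in this example, but it may not in other data. The following is a rough base-R sketch of the same summary with that extra filter added, in case dplyr is not available. The name `min_events` is a hypothetical parameter introduced here, and the data is copied from the post.

```r
# Two columns copied from the post's data.
df <- data.frame(
  eventtime = c(25, 75, 100, 101, 105, 157, 158, 160, 167, 169,
                170, 175, 178, 278, 302, 308, 320, 322),
  episodetimecriterion = c(NA, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
                           TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# Label each run of identical criterion values.
runs <- rle(df$episodetimecriterion)$lengths
df$episode <- rep(seq_along(runs), runs)

# Keep only rows meeting the inter-event criterion; which() drops FALSE and NA.
kept <- df[which(df$episodetimecriterion), ]

# Summarise each remaining run: event count, first time, last time.
times_by_episode <- split(kept$eventtime, kept$episode)
episodes <- data.frame(
  eventsperepisode = lengths(times_by_episode),
  episodestarttime = vapply(times_by_episode, min, numeric(1)),
  episodeendtime   = vapply(times_by_episode, max, numeric(1))
)

# Criterion 1: drop runs with fewer than min_events events (hypothetical name).
min_events <- 3
episodes <- episodes[episodes$eventsperepisode >= min_events, ]

# Renumber the surviving episodes from 1.
episodes <- cbind(episode = seq_len(nrow(episodes)), episodes)
rownames(episodes) <- NULL
episodes
```

On the post's data this should give the same three episodes as df2. With dplyr, a similar effect could be had by appending filter(eventsperepisode >= 3) to the pipeline.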
Data

If anyone wants to play with the data, these are the results of dput(df) before running the above code:

    df <- structure(list(
        eventtime = c(25, 75, 100, 101, 105, 157, 158, 160, 167, 169,
                      170, 175, 178, 278, 302, 308, 320, 322),
        timedifferencebetweennextevent = c(NA, 50, 25, 1, 4, 52, 1, 2, 7, 2,
                                           1, 5, 3, 100, 24, 6, 12, 459),
        episodetimecriterion = c(NA, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE,
                                 TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)),
      .names = c("eventtime", "timedifferencebetweennextevent", "episodetimecriterion"),
      row.names = c(NA, -18L),
      class = "data.frame")