`sm_regression.Rmd`

The `sm_regression()`

function performs kernel weighted regression. After the the model is specified (`data = cancertx, method = "coxph", formula = Surv(time) ~ age`

), an ancillary variable is specified (`weighting_var = "marker"`

) and the model is estimated weighted on this variable.

The following examples focus on time-to-event endpoints. The example data set is simulated from a Cox regression model.

\[ h(t) = h_0(t)e^{\textbf{X}\beta} \]

with \(h_0(t) = 1\) and \(\textbf{X}\beta = 0.02 * age - 0.2 * marker\). The independent variables age and marker are independent from one another. The example data set containts the following columns:

```
time Years from cancer treatment to death
age Age at treatment
marker Marker level at treatment
survt1 True 1 year survival probability
```

```
library(sjosmooth)
library(dplyr)
# loading data from sjosmooth github page
load(url("https://github.com/ddsjoberg/sjosmooth/blob/master/examples/cancertx.rda?raw=true"))
cancertx %>% select(time, age, marker, survt1) %>% head() %>% knitr::kable()
```

time | age | marker | survt1 |
---|---|---|---|

4.396865 | 12.79050 | 10.052925 | 0.8411829 |

1.761713 | 32.70496 | 8.164047 | 0.6867428 |

4.401422 | 23.16842 | 8.725925 | 0.7576508 |

1.027328 | 30.00945 | 12.923154 | 0.8715716 |

1.979813 | 31.51574 | 9.909102 | 0.7719386 |

2.009533 | 38.03416 | 7.372298 | 0.6127544 |

The simualted data contains 10,000 observations.

Kernel smoothing can be computationally intense on large data sets. The following code was run to get the example object `sm_regression_ex1`

and the results saved to GitHub.

```
library(survival)
library(sjosmooth)
sm_regression_ex1 =
sm_regression(
data = cancertx,
method = "coxph",
formula = Surv(time) ~ age,
weighting_var = "marker",
newdata = dplyr::data_frame(marker = seq(5, 15, 0.5))
)
```

The data in the `cancertx`

data frame was simulated from a Cox regression model with linear predictor.

\[ \textbf{X}\beta = 0.02 * age - 0.2 * marker \]

In the model age and marker level are independent, and therefore, the univariate model the slope coefficient for age remains \(\beta_{age} = 0.02\)

The beta coefficient is the same across all marker levels. In the figure below, the solid black line is the estimates beta coefficient (with 95% CI), and the dashed line is the true value.

```
library(ggplot2)
library(survival)
# loading example results from sjosmooth github page
load(url("https://github.com/ddsjoberg/sjosmooth/blob/master/examples/sm_regression_ex1.rda?raw=true"))
# plotting regression model coefficient by marker level
plot_data =
sm_regression_ex1 %>%
mutate(
age.coef = purrr::map_dbl(model_obj, ~coef(.x)),
age.ci = purrr::map(model_obj, ~confint(.x) %>% as_data_frame())
) %>%
select(newdata, age.coef, age.ci) %>%
tidyr::unnest(age.ci, newdata)
ggplot(plot_data, aes(x = marker, y = age.coef)) +
geom_line() +
geom_ribbon(aes(ymin = `2.5 %`, ymax = `97.5 %`), alpha = 0.3) +
geom_hline(aes(yintercept = 0.02), linetype = "dashed") +
scale_y_continuous(limits = c(0, 0.04)) +
labs(
y = "slope for age regressed on time to death"
)
```