Introduction

The sm_regression() function performs kernel weighted regression. After the the model is specified (data = cancertx, method = "coxph", formula = Surv(time) ~ age), an ancillary variable is specified (weighting_var = "marker") and the model is estimated weighted on this variable.

The following examples focus on time-to-event endpoints. The example data set is simulated from a Cox regression model.

$h(t) = h_0(t)e^{\textbf{X}\beta}$

with $$h_0(t) = 1$$ and $$\textbf{X}\beta = 0.02 * age - 0.2 * marker$$. The independent variables age and marker are independent from one another. The example data set containts the following columns:

time    Years from cancer treatment to death
age     Age at treatment
marker  Marker level at treatment
survt1  True 1 year survival probability
library(sjosmooth)
library(dplyr)

cancertx %>% select(time, age, marker, survt1) %>% head() %>% knitr::kable()
time age marker survt1
4.396865 12.79050 10.052925 0.8411829
1.761713 32.70496 8.164047 0.6867428
4.401422 23.16842 8.725925 0.7576508
1.027328 30.00945 12.923154 0.8715716
1.979813 31.51574 9.909102 0.7719386
2.009533 38.03416 7.372298 0.6127544

The simualted data contains 10,000 observations.

Example

Kernel smoothing can be computationally intense on large data sets. The following code was run to get the example object sm_regression_ex1 and the results saved to GitHub.

library(survival)
library(sjosmooth)
sm_regression_ex1 =
sm_regression(
data = cancertx,
method = "coxph",
formula = Surv(time) ~ age,
weighting_var = "marker",
newdata = dplyr::data_frame(marker = seq(5, 15, 0.5))
)

The data in the cancertx data frame was simulated from a Cox regression model with linear predictor.

$\textbf{X}\beta = 0.02 * age - 0.2 * marker$

In the model age and marker level are independent, and therefore, the univariate model the slope coefficient for age remains $$\beta_{age} = 0.02$$

The beta coefficient is the same across all marker levels. In the figure below, the solid black line is the estimates beta coefficient (with 95% CI), and the dashed line is the true value.

library(ggplot2)
library(survival)

# plotting regression model coefficient by marker level
plot_data =
sm_regression_ex1 %>%
mutate(
age.coef = purrr::map_dbl(model_obj, ~coef(.x)),
age.ci = purrr::map(model_obj, ~confint(.x) %>% as_data_frame())
) %>%
select(newdata, age.coef, age.ci) %>%
tidyr::unnest(age.ci, newdata)

ggplot(plot_data, aes(x = marker, y = age.coef)) +
geom_line() +
geom_ribbon(aes(ymin = 2.5 %, ymax = 97.5 %), alpha = 0.3) +
geom_hline(aes(yintercept = 0.02), linetype = "dashed") +
scale_y_continuous(limits = c(0, 0.04)) +
labs(
y = "slope for age regressed on time to death"
)