Check out these statistical review tips from JTCVS Statistical Editors
Tips for reviewing manuscripts with Logistic Regression
Methods
- Indicate how all binary outcomes were defined.
- State all primary exposures and predictor variables, and definitions.
- State the variable selection methodology for multivariable logistic regression modeling. In observational studies, ensure that all measured confounding is accounted for.
- Ensure that the sample size and number of outcome events are adequate for the number of variables to be included in the multivariable logistic regression model. Consider the events per variable (EPV) ratio and a statistical power calculation.
- Describe methods used to examine assumptions of logistic regression.
- For clustered, hierarchical, or nested data, consider using conditional logistic regression, mixed-effects modeling, or generalized estimating equations (GEE).
- Consider examining interaction terms to assess effect modification between covariates.
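The events per variable (EPV) check mentioned above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name and the study numbers are hypothetical, and the common rule of thumb of roughly 10 events per variable is an assumption that is debated and is not stated in these tips.

```python
def events_per_variable(n_events, n_non_events, n_parameters):
    """Return the events-per-variable (EPV) ratio for a binary-outcome model.

    The limiting sample size is the smaller of the two outcome groups.
    n_parameters should count all candidate model terms that were considered,
    not only the variables retained in the final model.
    """
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return min(n_events, n_non_events) / n_parameters


# Hypothetical study: 60 events among 800 patients, 9 candidate parameters.
epv = events_per_variable(n_events=60, n_non_events=740, n_parameters=9)
print(f"EPV = {epv:.1f}")
```

A low EPV suggests the model may be overfit; a reviewer might then ask for fewer candidate variables, penalized estimation, or a formal sample size justification.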
Results
- Provide descriptive summary statistics for all variables.
- Present measures of effect (i.e., odds ratios) with measures of variability (confidence intervals).
- Consider presenting results from logistic regression using a forest plot.
- Interpret statistical results in the context of confounding and bias.
- Evaluate model fit statistics, including the area under the receiver operating characteristic curve (AUC) of the multivariable model for discrimination, and the Hosmer-Lemeshow goodness-of-fit (GOF) test for calibration.
- For model development and validation studies, include an external validation assessment of the model.
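As a rough illustration of two quantities named in the list above, the sketch below converts a logistic regression coefficient into an odds ratio with a Wald confidence interval and computes the AUC as a concordance probability. The function names, coefficient, standard error, and patient data are all hypothetical; in practice these values come from the fitted model in the authors' statistical software.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with a Wald confidence interval (z=1.96 gives about 95%)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def auc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen event has a higher predicted risk than a randomly chosen
    non-event (ties count one half)."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    concordant = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))


# Hypothetical coefficient and standard error for one predictor.
or_, lower, upper = odds_ratio_with_ci(beta=0.47, se=0.18)
print(f"OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Hypothetical outcomes and predicted probabilities for four patients.
print(f"AUC = {auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):.2f}")
```

The pairwise AUC loop is quadratic in the sample size, which is fine for illustration but slow for large cohorts.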
Tips for reviewing manuscripts with Longitudinal Data Analysis
Methods
- Indicate how longitudinal data were collected, including whether the design is balanced.
- Summarize follow-up time in the cohort.
- Consider whether time-varying covariates are applicable.
- Consider presenting the primary outcome data over time as a spaghetti plot.
- Perform longitudinal regression analysis using appropriate models that account for repeated measures over time, such as generalized estimating equations (GEE), with appropriate specifications for the distribution and link function.
- Utilize an appropriate covariance (correlation) matrix given structural assumptions of the repeated-measures data.
- State the variable selection methodology for longitudinal regression modeling. In observational studies, ensure that all measured confounding is accounted for.
- Establish methods to evaluate and handle missing data, such as complete-case analysis (suitable for studies with minimal incomplete data).
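To make the covariance (correlation) structure tip more concrete, the sketch below builds two commonly used working correlation matrices for repeated measures: exchangeable (equal correlation between any two visits) and first-order autoregressive, AR(1) (correlation decays with the gap between visits). The function names and the correlation value of 0.5 are hypothetical, chosen only for illustration.

```python
def exchangeable(n_times, rho):
    """Working correlation where every pair of visits shares correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]


def ar1(n_times, rho):
    """Working correlation where visits k steps apart have correlation rho**k."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


# Hypothetical example: four visits, correlation parameter 0.5.
for name, matrix in [("exchangeable", exchangeable(4, 0.5)),
                     ("AR(1)", ar1(4, 0.5))]:
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:.3f}" for value in row))
```

The AR(1) form assumes equally spaced visits; with irregular follow-up, a reviewer may reasonably ask why a given structure was chosen.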
Results
- Provide descriptive summary statistics for all variables.
- Display confidence intervals for estimates of the outcome over time, as appropriate.
- Ensure that longitudinal figures, including axes, labels, and legends, are clear and interpretable.
- Present measures of effect (e.g., coefficient, odds ratio, or risk ratio) with measures of variability (confidence intervals).
- Consider presenting results from longitudinal regression analysis using a forest plot.
Tips for reviewing manuscripts using Meta-Analysis
Methods
- Indicate inclusion/exclusion criteria of the Systematic Review of literature and methods of data collection.
- Define outcomes and exposures of interest in the context of the primary objectives of the Meta-Analysis.
- Indicate methods to assess publication bias (e.g. funnel plot, Egger’s test).
- Indicate methods to assess heterogeneity (e.g. I-squared) and define thresholds for substantial heterogeneity.
- State an appropriate statistical approach for calculating pooled effect estimates, including selection of the appropriate measure of effect and modeling technique based on heterogeneity.
- Ensure that all additional analyses and sensitivity analyses are clearly explained and justified.
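The heterogeneity and pooling tips above can be illustrated with a small sketch that computes Cochran's Q, I-squared, and a random-effects pooled estimate. It uses the DerSimonian-Laird estimator of between-study variance, which is one common choice and an assumption here, since these tips do not name a specific method. The function name and the three study estimates are hypothetical.

```python
import math


def random_effects_meta(effects, variances, z=1.96):
    """Pool study effects (e.g. log odds ratios) with their variances using
    inverse-variance weights and the DerSimonian-Laird tau-squared."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Cochran's Q and I-squared (percentage of variation due to heterogeneity).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau-squared to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "ci": (pooled - z * se, pooled + z * se),
            "Q": q, "I2": i2, "tau2": tau2}


# Hypothetical log odds ratios and variances from three studies.
result = random_effects_meta([0.10, 0.35, 0.60], [0.02, 0.03, 0.05])
print({key: result[key] for key in ("pooled", "I2", "tau2")})
```

When tau-squared is estimated as zero, the random-effects result coincides with the fixed-effect (common-effect) estimate.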
Results
- Provide a flow chart or similar diagram to indicate the selection of studies included in the Systematic Review and Meta-Analysis.
- Provide information on each included publication (author, year, sample sizes, primary analyses/conclusion). This may be included as a large supplemental table or online materials.
- Clearly summarize the results of effect sizes for the primary comparison(s) of interest from all included studies and the pooled/synthesized effect estimate. Measures of variability (e.g. confidence intervals) should be reported. This can be done using a forest plot figure.
- Ensure that the reporting of the meta-analysis adheres to guidelines for high quality reporting (PRISMA checklist, R-AMSTAR score, etc.).
- Ensure that all figures, including axes, labels, and legends, are clear and interpretable.
Tips for reviewing manuscripts using Machine Learning and AI
Methods
- Indicate the overall purpose/goals of the study in the context of machine learning and AI.
- State the methods to be used for Exploratory Data Analysis (EDA).
- Define the machine learning approach and state how it is suitable for the aims of the study.
- Indicate processes for feature selection for all machine learning models.
- Define methodology for model development/training and testing/validation.
- For classification modeling, consider methods to handle substantial class imbalances.
- State the metrics that will be evaluated for determining model performance and define the criteria that would indicate good or poor performance.
- Explain how model parameters will be tuned and how optimal models will be identified (as appropriate).
- State the computing methods and software that will be implemented.
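The tips on class imbalance and performance metrics can be illustrated with a short sketch showing why accuracy alone can mislead when events are rare. The function name and the data are hypothetical: 5 events among 100 patients and a degenerate classifier that predicts "no event" for everyone.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and balanced accuracy
    for binary labels coded 1 (event) and 0 (non-event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical imbalanced data: 5 events, 95 non-events, all predicted 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
```

Here the accuracy is high even though no event is detected, which is why manuscripts should pre-specify metrics that reflect performance in both classes.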
Results
- Summarize the sample sizes utilized for machine learning and AI modeling and analyses.
- Clearly report results of Exploratory Data Analysis (EDA), with focus on clinically meaningful findings.
- Report the results of machine learning models and associated model performance in the testing/validation set. Report the findings of the comparison of performance across models (if applicable).
- Present real-world implications and generalizability of the machine learning and AI analysis.
- Report feature importance in the machine learning models (as appropriate).
- Consider reporting SHapley Additive exPlanations (SHAP) values to visually explain the machine learning models by highlighting how impactful features contribute to the predictions. This tool is available for Python and R.
- Ensure that all figures, including axes, labels, and legends, are clear and interpretable.
Tips for reviewing manuscripts with Kaplan-Meier Curves and Cox Regression
Methods
- Indicate how all time-to-event data were collected.
- Summarize follow-up time in the cohort.
- Include numbers at risk for all Kaplan-Meier curves. Truncate curves where the number at risk falls below a clinically meaningful number.
- State statistical methods for comparing Kaplan-Meier curves (for example, log-rank testing).
- State the variable selection methodology for Cox regression modeling. In observational studies, ensure that all measured confounding is accounted for.
- Describe methods used to examine assumptions of Cox regression, including the proportional hazards assumption.
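To show where the numbers at risk under a Kaplan-Meier curve come from, the sketch below computes the product-limit estimate by hand. The function name and the five follow-up times are hypothetical; published analyses would normally rely on an established survival analysis package rather than code like this.

```python
def kaplan_meier(times, events):
    """Return (time, n_at_risk, n_events, survival) at each distinct event time.

    events[i] is 1 if the event was observed at times[i] and 0 if censored.
    Patients censored at a given time are counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    table = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            table.append((t, n_at_risk, n_events, survival))
        n_at_risk -= n_events + n_censored
    return table


# Hypothetical follow-up times (years) and event indicators for 5 patients.
for t, at_risk, d, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(f"t={t}: at risk={at_risk}, events={d}, S(t)={s:.2f}")
```

As the number at risk shrinks, each additional event produces a larger drop in the curve, which is the reason for truncating curves once few patients remain.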
Results
- Display confidence intervals for survival estimates, as appropriate.
- Ensure that Kaplan-Meier curve figures, including axes, labels, and legends, are clear and interpretable.
- Present measures of effect (i.e., hazard ratios) with measures of variability (confidence intervals).
- Consider presenting results from Cox regression using a forest plot.
Tips for reviewing manuscripts with Propensity Score Matching (PSM)
Methods
- Provide a detailed explanation of the model used to estimate propensity scores, including the type of model, variables included and the presence of any non-linear or interaction terms.
- Describe the method used to select variables for inclusion in the propensity score model.
- When using propensity score matching, explicitly describe the matching algorithm used, including the distance metric and ratio of matching.
- Describe the metric used to assess balance (often the standardized mean difference (SMD)), and indicate the threshold used for determining adequate balance.
- When using propensity score weighting, explicitly describe and provide justification for the type of weight used.
- When stratification by the propensity scores is implemented, provide justification for the strata analyzed.
- Describe the methods used to examine the assumptions of propensity score methods, including checks for adequate overlap of the propensity score distributions and for extreme values, and describe how balance between groups on baseline factors was assessed after matching.
- Analyses in propensity-matched cohorts should incorporate the matched pairs or sets, which can be done by including a random effect for the matched sets.
- Propensity matched analyses may be conducted alongside a secondary analysis using a standard multivariable regression model for comparison.
Results
- When using matching, report the sample size of the pre- and post-matching cohorts.
- Describe the balance of variables across exposure groups both in the initial cohort and following matching or weighting. Comparison of balance should not utilize p-values; standardized mean differences are the preferred metric for assessing balance in this context.
- If not referenced in the methods section, provide visual representations of the distributions of propensity scores and/or weights; this may be included in the supplemental material.
- Provide estimates of the effect of interest using both standard multivariable regression models and the selected propensity score method (e.g., provide both unweighted and weighted estimates). Include measures of uncertainty (e.g., confidence intervals).
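The standardized mean difference (SMD) used for balance assessment can be sketched as follows. The function names and the age values are hypothetical, and the frequently used cutoff of an absolute SMD below 0.1 is an assumption here; the tips ask authors to state their own threshold.

```python
import math
from statistics import mean, variance


def smd_continuous(treated, control):
    """SMD for a continuous covariate: difference in means divided by the
    square root of the average of the two group (sample) variances."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def smd_binary(p_treated, p_control):
    """SMD for a binary covariate given the proportion in each group.
    Undefined when both proportions are 0 or both are 1."""
    pooled_sd = math.sqrt(
        (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    )
    return (p_treated - p_control) / pooled_sd


# Hypothetical ages: treated group versus controls before and after matching.
treated = [70, 72, 68, 75, 71]
control_before = [60, 65, 58, 62, 66]
control_after = [69, 73, 67, 74, 72]
print(f"SMD before matching: {smd_continuous(treated, control_before):.2f}")
print(f"SMD after matching:  {smd_continuous(treated, control_after):.2f}")
```

Unlike a p-value, the SMD does not depend on sample size, so it remains comparable between the larger pre-matching cohort and the smaller matched cohort.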
Tips for reviewing manuscripts with Competing Risks Analysis
Methods
- Indicate how all time-to-event data were collected.
- Summarize follow-up time in the cohort.
- Summarize the number of patients experiencing the competing event (informative censoring), the number of patients experiencing the event of interest, and the number of patients non-informatively censored. These can be referred to as End States.
- Consider comparing cumulative incidence of competing risks using Gray’s test.
- Perform model-based competing risks analysis, often using the Fine-Gray model.
- State the variable selection methodology for competing risks regression modeling. In observational studies, ensure that all measured confounding is accounted for.
- Traditional time-to-event regression (e.g., Cox regression) may be performed as a sensitivity analysis for comparison to the competing risks analysis results.
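For the cumulative incidence estimates referred to above, the sketch below computes a nonparametric cumulative incidence function in the presence of competing events (the Aalen-Johansen form). The function name, the state coding, and the five patients are hypothetical; this illustrates the descriptive estimate only and is not the Fine-Gray regression model.

```python
def cumulative_incidence(times, states, cause):
    """Return (time, cumulative incidence) at each time the given cause occurs.

    states[i] is 0 for censored, or a positive integer naming the event type.
    At each time, the increment is the all-cause event-free survival just
    before that time multiplied by the cause-specific event fraction.
    """
    data = sorted(zip(times, states))
    n_at_risk = len(data)
    event_free = 1.0
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_cause = d_any = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            state = data[i][1]
            n_here += 1
            d_any += state != 0
            d_cause += state == cause
            i += 1
        if d_cause:
            cif += event_free * d_cause / n_at_risk
            curve.append((t, cif))
        event_free *= 1 - d_any / n_at_risk
        n_at_risk -= n_here
    return curve


# Hypothetical data: 1 = event of interest, 2 = competing event, 0 = censored.
times = [1, 2, 3, 4, 5]
states = [1, 2, 0, 1, 2]
print(cumulative_incidence(times, states, cause=1))
print(cumulative_incidence(times, states, cause=2))
```

Treating competing events as censored and reporting one minus the Kaplan-Meier estimate generally overstates the cumulative incidence of the event of interest, which is why the end states should be summarized separately.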
Results
- Display confidence intervals for survival or cumulative incidence estimates, as appropriate.
- Ensure that time-to-event figures, including axes, labels, and legends, are clear and interpretable.
- Present measures of effect (i.e., hazard ratios, such as subdistribution hazard ratios from the Fine-Gray model) with measures of variability (confidence intervals).
- Consider presenting results from competing risks regression using a forest plot.