Open Agenda

SCUG, March 2018

Actual Topics

Python & R tradeoffs on the following dimensions
- production system vs research
- computer science background vs stats background
- data manipulation vs analysis
- propagation of ideas/manuscripts to external audiences
- development costs
knitr & automated reports
- some overlap with this 2013 presentation and this 2014 presentation.
GitHub
- some overlap with this 2014 presentation and this 2014 presentation.
benefits of promoting consistency of files/patterns across projects, and using skeletons (example).
REDCap & research
- creating REDCap projects
- token security
- REDCapR
- some overlap with this 2014 presentation.
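For the token-security item, here is a minimal sketch of keeping an API token out of committed code by reading it from an environment variable. The variable name `REDCAP_TOKEN_DEMO` and the helper `retrieve_token()` are hypothetical, and the check assumes the token is a 32-character hexadecimal string. REDCapR has its own credential helpers (such as `REDCapR::retrieve_credential_local()`), which are not shown here.

```r
# Hypothetical helper: read a token from an environment variable instead of
# hard-coding it in a script that ends up on GitHub.
retrieve_token <- function(variable_name = "REDCAP_TOKEN_DEMO") {
  token <- Sys.getenv(variable_name, unset = NA_character_)
  if (is.na(token)) {
    stop("The environment variable `", variable_name, "` is not set.")
  }
  # Assumption: a token is 32 hexadecimal characters.
  if (!grepl("^[0-9A-Fa-f]{32}$", token)) {
    stop("The token is not a 32-character hexadecimal string.")
  }
  token
}

# Demo with a fake token.  In practice, set the variable outside the repo
# (for example in ~/.Renviron) so it is never committed.
Sys.setenv(REDCAP_TOKEN_DEMO = "0123456789ABCDEF0123456789ABCDEF")
token <- retrieve_token()
nchar(token)
```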

Possible Topics (that weren't covered today)

yaml & csv
- flatten/denormalize list to data.frame example
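A minimal base-R sketch of the flatten step, assuming the yaml file parses to a list of records with the same fields. The field names and values are made up for illustration.

```r
# A nested list, shaped like what a yaml file of records typically parses to.
counties <- list(
  list(county_id = 1L, name = "Adair"  , region_id = 3L),
  list(county_id = 2L, name = "Alfalfa", region_id = 1L),
  list(county_id = 3L, name = "Atoka"  , region_id = 5L)
)

# Convert each record to a one-row data.frame, then stack the rows.
ds <- do.call(
  rbind,
  lapply(counties, function(x) as.data.frame(x, stringsAsFactors = FALSE))
)

# Write the flat version to csv (a temp file here, to avoid side effects).
path_csv <- tempfile(fileext = ".csv")
write.csv(ds, path_csv, row.names = FALSE)
ds
```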
controlling long pipelines with flow files, such as osdh-flow.R
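One plausible shape for a flow file, sketched under assumptions: an ordered vector of scripts that are sourced one at a time. This is not the contents of osdh-flow.R. The script names `1-load.R` and `2-derive.R` are hypothetical stand-ins, created in a temp directory so the sketch runs anywhere.

```r
# Create two tiny stand-in scripts; a real project would point at its own files.
directory <- tempfile("flow-demo-")
dir.create(directory)
writeLines("ds_raw <- data.frame(x = 1:3)", file.path(directory, "1-load.R"))
writeLines(
  c("ds_clean    <- ds_raw", "ds_clean$x2 <- ds_clean$x^2"),
  file.path(directory, "2-derive.R")
)

# The flow itself: run each script in order, in one shared environment,
# so later scripts can see what earlier scripts produced.
paths            <- file.path(directory, c("1-load.R", "2-derive.R"))
flow_environment <- new.env()
for (path in paths) {
  message("Starting `", basename(path), "` at ", format(Sys.time()))
  source(path, local = flow_environment)
}

flow_environment$ds_clean
```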
config package
- centralize your project-wide settings so they're available & consistent across multiple files.
- similar to a project-wide 'declare-globals' chunk.
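A minimal sketch of the idea, assuming the config package is installed. The setting names and values are hypothetical, and the yaml is written to a temp file only to keep the example self-contained. A real project would usually keep a `config.yml` in the repo root.

```r
# Write a small config file.  The `default` section is what config::get() reads
# unless another configuration is requested.
path_config <- tempfile(fileext = ".yml")
writeLines(
  c(
    'default:',
    '  path_county_month: "data-public/derived/county-month.csv"',
    '  fte_upper: 40'
  ),
  path_config
)

# Every script in the project reads the same file, so the values stay consistent.
config <- config::get(file = path_config)
config$path_county_month
config$fte_upper
```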
text editors
- my favorites: RStudio, Atom, and Notepad++.
- find & replace across files with regexes: Atom
- easy zooming in & out, which is especially nice when sharing screens: tie -- Atom & Notepad++
- multicolumn select: 1st place--RStudio and 2nd place--Atom (with the Sublime-Style-Column-Selection package)
tight text control
- base::sprintf()
- glue::glue() & friends
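A short side-by-side sketch of the two approaches, assuming the glue package is installed. The values are made up for illustration.

```r
county <- "Adair"
fte    <- 3.14159
month  <- as.Date("2012-01-01")

# base::sprintf(): positional placeholders with explicit format codes.
label_sprintf <- sprintf("%s county had %0.2f FTE in %s.", county, fte, format(month, "%Y-%m"))
label_sprintf

# Zero-padding is handy for ids that need to sort as text.
id_padded <- sprintf("county-%02d", 7L)
id_padded

# glue::glue(): R expressions are embedded in braces.
label_glue <- glue::glue("{county} county had {round(fte, 2)} FTE in {format(month, '%Y-%m')}.")
label_glue
```

Both labels contain the same text here. `sprintf()` gives finer control over widths and decimals, while `glue()` tends to be easier to read when there are many values.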
Landing page for documentation across projects, such as BbmcResources
writing style guides with your team
- project-specific, such as the dashboard example.
- external consumption, such as the REDCap API Troubleshooting Guide.
- language-specific, such as the
  - tidyverse style guide for R, which is derived from
  - Google's Style Guide for R and
  - Hadley's Style Guide for R (this one is probably more representative of what your team might produce to unify your projects)
Use skeleton repos to jumpstart your projects, such as RAnalysisSkeleton
verify-values

# ---- verify-values -----------------------------------------------------------
# Sniff out problems
# OuhscMunge::verify_value_headstart(ds)
checkmate::assert_integer(ds$county_month_id    , lower=          1L              , any.missing=FALSE, unique=TRUE )
checkmate::assert_integer(ds$county_id          , lower=          1L   , upper=77L, any.missing=FALSE, unique=FALSE)
checkmate::assert_date(   ds$month              , lower="2012-01-01"              , any.missing=FALSE)
checkmate::assert_integer(ds$region_id          , lower=          1L   , upper=20L, any.missing=FALSE)
checkmate::assert_numeric(ds$fte                , lower=          0    , upper=40L, any.missing=FALSE)
checkmate::assert_logical(ds$fte_approximated                                     , any.missing=FALSE)

inequality joins with sqldf

Bounded by another table, using a join

library(magrittr) # provides the %>% pipe used in these queries
d2 <- "
  SELECT
    o.[.record_matching_id],
    o.gender,
    o.age_months,
    o.bmi,
    p.percentile     AS percentile_lower,
    p.value
  FROM d_observed AS o
    LEFT OUTER JOIN d_pop_long AS p ON
      o.age_months = p.age_months AND
      o.gender     = p.gender     AND
      p.value      < o.bmi
  " %>%
  sqldf::sqldf(
    stringsAsFactors = FALSE
  )

Cumulative counts, by joining a table to itself

ds_visit_cumulative_count <- "
  SELECT
    b.week,
    b.program_code,
    b.worker_name,
    COUNT(DISTINCT a.case_number) AS client_distinct_cumulative_by_worker
  FROM ds_visit_3 a
    JOIN ds_visit_3 b ON
      (a.week <= b.week)
      AND (a.program_code = b.program_code AND a.worker_name = b.worker_name)
  GROUP BY b.program_code, b.worker_name, b.week
  ORDER BY b.program_code, b.worker_name, b.week
" %>%
  sqldf::sqldf()
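
To try the cumulative query without the real data, here is a toy run, assuming the sqldf package is installed. Every value in this `ds_visit_3` is made up, and the query is passed directly to `sqldf::sqldf()` rather than piped.

```r
# Toy visits for one worker in one program.  Client 101 is seen in two weeks,
# so it should be counted only once in the running total.
ds_visit_3 <- data.frame(
  week             = c(1L, 1L, 2L, 2L, 3L),
  program_code     = "A",
  worker_name      = "Smith",
  case_number      = c(101L, 102L, 101L, 103L, 104L),
  stringsAsFactors = FALSE
)

ds_toy_cumulative <- sqldf::sqldf("
  SELECT
    b.week,
    b.program_code,
    b.worker_name,
    COUNT(DISTINCT a.case_number) AS client_distinct_cumulative_by_worker
  FROM ds_visit_3 a
    JOIN ds_visit_3 b ON
      (a.week <= b.week)
      AND (a.program_code = b.program_code AND a.worker_name = b.worker_name)
  GROUP BY b.program_code, b.worker_name, b.week
  ORDER BY b.program_code, b.worker_name, b.week
")

# Distinct clients seen so far: 2 by week 1, 3 by week 2, 4 by week 3.
ds_toy_cumulative
```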

Windows of time, using a join

ds_client_week_visit_goal <- "
  SELECT
    p.case_number,
    p.program_code,
    p.worker_name_last                AS worker_name,
    p.week_start_inclusive,
    --COUNT(v.visit_date)              AS visit_week_scheduled_count,
    SUM(v.visit_completed)           AS visit_week_completed_count
  FROM ds_possible_client_week p
    LEFT JOIN ds_visit v ON (
      p.case_number=v.case_number
      AND
      (p.week_start_inclusive <= v.visit_date AND v.visit_date<p.week_stop_exclusive)
    )
  GROUP BY p.case_number, p.week_start_inclusive
  ORDER BY p.case_number, p.week_start_inclusive
" %>%
  sqldf::sqldf()
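
A toy run of the window join, assuming the sqldf package is installed. All values are made up, and `visit_completed` is stored as 0/1 integers here to keep the sum unambiguous. The visit on 2018-03-12 sits exactly on a boundary, so it shows how the inclusive start and exclusive stop assign it to the second week.

```r
# Three candidate weeks for one client.
ds_possible_client_week <- data.frame(
  case_number          = 1L,
  program_code         = "A",
  worker_name_last     = "Smith",
  week_start_inclusive = as.Date(c("2018-03-05", "2018-03-12", "2018-03-19")),
  week_stop_exclusive  = as.Date(c("2018-03-12", "2018-03-19", "2018-03-26")),
  stringsAsFactors     = FALSE
)

# Two visits in the first week (one completed) and one on the week boundary.
ds_visit <- data.frame(
  case_number     = 1L,
  visit_date      = as.Date(c("2018-03-06", "2018-03-08", "2018-03-12")),
  visit_completed = c(1L, 0L, 1L)
)

ds_toy_week <- sqldf::sqldf("
  SELECT
    p.case_number,
    p.week_start_inclusive,
    SUM(v.visit_completed) AS visit_week_completed_count
  FROM ds_possible_client_week p
    LEFT JOIN ds_visit v ON (
      p.case_number = v.case_number
      AND
      (p.week_start_inclusive <= v.visit_date AND v.visit_date < p.week_stop_exclusive)
    )
  GROUP BY p.case_number, p.week_start_inclusive
  ORDER BY p.case_number, p.week_start_inclusive
")

# The third week has no visits; the LEFT JOIN keeps the row with a missing sum.
ds_toy_week$visit_week_completed_count
```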
