SOCI 3040 – Quantitative Research Methods
Department of Sociology, Memorial University
January 7, 2025
My name is John McLevey; you can call me John, Professor McLevey, or Dr. McLevey (he/him).
Professor & Head of Sociology
New to Memorial after 11 years at University of Waterloo
Who are you?
Do you have any previous quant courses / experience?
What are your expectations for this course?
SOCI 3040, Quantitative Research Methods, will familiarize students with the procedures for understanding and conducting quantitative social science research. It will introduce students to the quantitative research process, hypothesis development and testing, and the application of appropriate tools for analyzing quantitative data. All sections of this course count towards the HSS Quantitative Reasoning Requirement (see mun.ca/hss/qr). (PR: SOCI 1000 or the former SOCI 2000)
This section of SOCI 3040 is an introduction to quantitative research methods, from planning an analysis to sharing the final results. Following the workflow from Rohan Alexander’s (2023) Telling Stories with Data, you will learn how to:
plan an analysis and sketch your data and endpoint
simulate some data to “force you into the details”
acquire, assess, and prepare empirical data for analysis
explore and analyze data by creating visualizations and fitting models
share the results of your work with the world!
flowchart LR
p[[Plan]]
sim[[Simulate]]
a[[Acquire]]
e[[Explore / Analyze]]
s[[Share]]
p --> sim --> a --> e --> s
“A lack of clear communication sometimes reflects a failure by the researcher to understand what is going on, or even what they are doing.” (Alexander 2023)
Core foundation of quantitative research methods
Bridge between analysis and understanding
Essential skill for modern researchers

You will use this workflow in the context of learning foundational quantitative research skills, including conducting exploratory data analyses and fitting, assessing, and interpreting linear and generalized linear models. Reproducibility and research ethics are considered throughout the workflow, and the entire course.
Plan, Simulate, Acquire, Explore / Analyze, Share
deliberate, reasoned decisions
purposeful adjustments
even 10 minutes of planning is valuable
Plan, Simulate, Acquire, Explore / Analyze, Share
Forces detailed thinking
Clarifies expected data structure and distributions
Helps with cleaning and preparation
Identifies potential issues beforehand
Provides clear testing framework
Ensures data meets expectations
“Almost free” with modern computing
Provides “an intimate feeling for the situation” (Hamming [1997] 2020)
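As a sketch of what this step can look like in R (the variables here, age and hours_online, are hypothetical examples, not data from the course):

```r
# Simulate a small survey-like dataset before acquiring the real one.
set.seed(853)

n <- 500
sim_data <- data.frame(
  age = sample(18:90, size = n, replace = TRUE),
  hours_online = round(rexp(n, rate = 1 / 3), 1)  # right-skewed, mean near 3
)

# Write down our expectations as explicit tests.
stopifnot(
  nrow(sim_data) == n,
  all(sim_data$age >= 18 & sim_data$age <= 90),
  all(sim_data$hours_online >= 0)
)
```

Writing the expectations as `stopifnot()` checks is what makes simulation "almost free" testing: the same checks can later be run against the real data.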
Plan, Simulate, Acquire, Explore / Analyze, Share
Often overlooked but crucial stage
Many difficult decisions required: data sources, formats, permissions.
Can significantly affect statistical results (Huntington-Klein et al. 2021)
Common challenges: quantity (too little or too much data) and quality
Plan, Simulate, Acquire, Explore / Analyze, Share
Begin with descriptive statistics
Move to statistical models
Remember: Models are tools, not truth, and they reflect our previous decisions, data acquisition choices, and cleaning procedures.
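A minimal illustration of this ordering in R, using simulated data (the true slope of 0.5 is an assumption of the sketch, not a claim about any real dataset):

```r
# Descriptive statistics first, then a simple linear model.
set.seed(853)
example_data <- data.frame(x = runif(200, 0, 10))
example_data$y <- 2 + 0.5 * example_data$x + rnorm(200, sd = 1)

summary(example_data)                   # start descriptive

fit <- lm(y ~ x, data = example_data)   # then move to a model
summary(fit)
```

Because we simulated the data ourselves, we know the model's estimated slope should land near 0.5; with real data we never have that luxury, which is exactly why the earlier simulation step matters.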
Plan, Simulate, Acquire, Explore / Analyze, Share
High-fidelity communication is essential
Document all decisions
Build credibility through transparency
Include:
What was done
Why it was done
What was found
Weaknesses of the approach
Communication
Reproducibility
Ethics
Questions
Measurement
Data Collection
Data Cleaning
Exploratory Data Analysis
Modeling
Scaling

“Simple analysis, communicated well, is more valuable than complicated analysis communicated poorly.” (Alexander 2023)
“One challenge is that as you immerse yourself in the data, it can be difficult to remember what it was like when you first came to it.” (Alexander 2023)
Everything must be independently repeatable.
Requirements:
Open access to code
Data availability or simulation
Automated testing
Clear documentation
Aim for autonomous end-to-end reproducibility
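Automated testing can start very small. A base-R sketch of a validation step that halts a pipeline when the data violate expectations (the column names here are hypothetical):

```r
# A lightweight validation check that fails loudly.
raw <- data.frame(
  respondent_id = 1:5,
  age = c(25, 34, 19, 51, 43)
)

stopifnot(
  !any(duplicated(raw$respondent_id)),  # identifiers must be unique
  all(!is.na(raw$age)),                 # no missing ages
  all(raw$age >= 18)                    # adults only, per sampling frame
)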
“This means considering things like: who is in the dataset, who is missing, and why? To what extent will our story perpetuate the past? And is this something that ought to happen?” (Alexander 2023)
Consider the full context of the dataset (D’Ignazio and Klein 2020)
Acknowledge the social, cultural, and political forces (Crawford 2021)
Use data ethically with concern for impact and equity
Questions evolve through understanding
Challenge of operationalizing variables
Curiosity is essential, drives deeper exploration
Value of “hybrid” knowledge that combines multiple disciplines
Comfort with asking “dumb” questions
“The world is so vibrant that it is difficult to reduce it to something that is possible to consistently measure and collect.” (Alexander 2023)
Measuring even simple things is challenging. Take height: shoes on or off? Height varies with time of day, and different tools yield different results. More complex measurements are even harder: how do we measure happiness or pain?
Measurement requires decisions and is not value-free. Context and purpose guide all measurement choices.

“Data never speak for themselves; they are the puppets of the ventriloquists that cleaned and prepared them.” (Alexander 2023)
Collection determines possibilities
What and how we measure matters.
Cleaning requires many decisions
E.g., Handling “prefer not to say” and open-text responses.
Document every step
To ensure transparency and reproducibility.
Consider implications of choices
E.g., ethics, representation.
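As a sketch of one such decision in R (the response values are hypothetical), here is one defensible way to handle "Prefer not to say", with the choice recorded in the code itself:

```r
# Hypothetical survey responses, including a "Prefer not to say" option.
responses <- c("Agree", "Disagree", "Prefer not to say", "Agree", NA)

# Decision: treat "Prefer not to say" as missing data.
# This is a choice, not a neutral act -- document it and its rationale.
cleaned <- ifelse(responses == "Prefer not to say", NA, responses)

table(cleaned, useNA = "ifany")
```

A different, equally documentable choice would be to keep "Prefer not to say" as its own category; what matters is that the decision is visible and reproducible, not buried in a spreadsheet edit.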
Iterative process
Never truly complete
Shapes understanding
Tool for understanding
Not a recipe to follow
Just one representation of reality
Statistical significance \(\neq\) scientific significance
Statistical models help us explore the shape of the data; they are like echolocation
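To see why the distinction matters, consider a simulation in which the true difference between two groups is substantively trivial, yet a large sample makes it statistically significant (the group means here are assumptions of the sketch):

```r
# With n large enough, even a negligible difference is "significant".
set.seed(853)
n <- 100000
group_a <- rnorm(n, mean = 0.00)
group_b <- rnorm(n, mean = 0.02)  # a tiny, substantively meaningless gap

result <- t.test(group_a, group_b)
result$p.value                    # very small p-value...
mean(group_b) - mean(group_a)     # ...despite a trivial effect size
```

The p-value answers a narrow question about sampling variability; whether a 0.02 difference matters is a scientific and substantive judgment that no test can make for us.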
Using programming languages like R and Python
Handle large datasets efficiently
Automate repetitive tasks
Share work widely and quickly
Outputs can reach many people easily
APIs can make analyses accessible in real-time
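A small sketch of what automating a repetitive task looks like in R; to keep it self-contained, the example first writes two small CSV files to a temporary folder (the file and column names are hypothetical):

```r
# Automate a repetitive task: summarize many data files in one pass.
dir <- file.path(tempdir(), "soci3040_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:5), file.path(dir, "b.csv"), row.names = FALSE)

# The same three lines work unchanged for 2 files or 2,000.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
summaries <- lapply(files, function(path) {
  d <- read.csv(path)
  data.frame(file = basename(path), rows = nrow(d))
})
result <- do.call(rbind, summaries)
result
```

This is the payoff of scripting over point-and-click work: the code scales to any number of files, and rerunning it reproduces the result exactly.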
To a certain extent we are wasting our time. We have a perfect model of the world—it is the world! But it is too complicated. If we knew perfectly how everything was affected by the uncountable factors that influence it, then we could forecast perfectly a coin toss, a dice roll, and every other seemingly random process each time. But we cannot. Instead, we must simplify things to that which is plausibly measurable, and it is that which we define as data. Our data are a simplification of the messy, complex world from which they were derived.
There are different approximations of “plausibly measurable”. Hence, datasets are always the result of choices. We must decide whether they are nonetheless reasonable for the task at hand. We use statistical models to help us think deeply about, explore, and hopefully come to better understand, our data. (Alexander 2023)
Through skillful
reduction 👨‍🍳
“Ultimately, we are all just telling stories with data, but these stories are increasingly among the most important in the world.” (Alexander 2023)
Telling good stories with data is difficult but rewarding.
Develop resilience and intrinsic motivation.
Accept that failure is part of the process.
Consider possibilities and probabilities.
Learn to make trade-offs.
No perfect analysis exists.
Aim for transparency and continuous improvement.
Brightspace
Course materials website: johnmclevey.com/SOCI3040/
Before class: Complete the assigned reading
In class: Introduction to R and RStudio