The problem of knowledge
Goals of the lecture
- What is knowledge and how do we produce it?
- Perspectives on scientific knowledge.
- Epistemological and ethical challenges facing CSS.
How do we learn about the world?
There are many ways to construct knowledge.
- Subjective experience
- Intuition and reason
- Empirical research and measurement

Why not just subjective experience?
Subjective experience refers to the thoughts and experiences an individual has.
- For many things, no substitute for personal experience.
- “Anecdotal” evidence can still function as a kind of evidence.
- Cultural knowledge (rituals, customs, etc.) also functions as a kind of evolved wisdom.
- But over-reliance on anecdotes can also lead us astray when it comes to building models of the world.
Why not just intuition and reason?
Intuition and reason rely on logic and abstract thinking to understand the world.
- Rationalist tradition: knowledge comes from reasoning logically.
- Mathematics and formal logic are powerful tools for understanding the world.
- But reason alone can lead to elegant theories that turn out to be empirically wrong.
- Sometimes absurdly so, e.g., Zeno’s paradoxes, which “prove” that motion is impossible.
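- Sometimes empirical findings are counterintuitive, e.g., Galileo’s work on gravity, which showed that (air resistance aside) heavier objects do not fall faster than lighter ones.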
The rise of empirical science
Empiricism is the idea that knowledge comes primarily from direct sensory experience and observation.
- Combines an experience-based approach with systematic measurement and experimentation.
- Over time, systematic observations can help form theories.
- In turn, theories guide observation.
- But how exactly should we do empirical science? Many different philosophies…
Empiricism: various philosophies
Historically, scientists and philosophers of science have made different arguments about how science works, both descriptively and prescriptively.
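- Verificationism (logical positivism): statements are meaningful if and only if they can be verified.
- Problem: many general statements can’t be conclusively verified from a finite number of observations (the problem of induction).
- Falsificationism (Popper): theories cannot be proven, but they can be proven wrong.
- E.g., a single black swan can disprove the claim “All swans are white.”
- Problem: science doesn’t always work this way; scientists often retain a theory despite observations that seem to contradict it.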
Beyond falsification
- Science operates under paradigms (Kuhn) or research programmes (Lakatos).
- Pragmatism: Focus on producing useful explanations.
So what makes a good explanation?
One goal of science is to produce explanations of natural phenomena.
- Covering law: show how phenomena emerge from general principles
- Causal-mechanical: identify manipulable causes and trace mechanisms
- Pragmatic: serve the practical needs of the question-asker
- Also connects to deep questions about what constitutes “scientific understanding” of a phenomenon.
Prediction vs. understanding
- Increasingly, machine learning models can make accurate predictions…
- …but we don’t always know why the model works.
- Having a model that predicts which word you’ll say next doesn’t mean we understand why you said it.
- Yet simpler, more interpretable models sometimes sacrifice predictive accuracy.
- This complexity/accuracy trade-off is pervasive in statistical modeling.
- It also connects to more general challenges facing CSS.
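The interpretability/accuracy trade-off can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: the data are simulated from y = sin(x) plus noise, the “interpretable” model is a straight line, the “flexible” model is a degree-5 polynomial, and accuracy is mean squared error on a held-out quarter of the data. The helper `heldout_mse` is hypothetical, written only for this example.

```python
import numpy as np

# Simulated data (an assumption for illustration): the true relationship is nonlinear.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=400)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

# x is drawn at random, so the first 300 points form a random training split
# and the last 100 points a random held-out split.
train, test = np.arange(300), np.arange(300, 400)

def heldout_mse(degree):
    """Fit a polynomial of the given degree on the training split; return held-out MSE."""
    coefs = np.polyfit(x[train], y[train], deg=degree)
    preds = np.polyval(coefs, x[test])
    return np.mean((y[test] - preds) ** 2)

# Degree 1: a single, easily described slope. Degree 5: six coefficients
# that predict better but do not map onto a simple story about x and y.
linear_slope = np.polyfit(x[train], y[train], deg=1)[0]
mse_linear = heldout_mse(1)
mse_flexible = heldout_mse(5)
print(f"linear slope: {linear_slope:.2f}")
print(f"held-out MSE, linear: {mse_linear:.3f}; degree 5: {mse_flexible:.3f}")
```

On data like these, the straight line is easy to summarize but predicts poorly; the polynomial predicts much better, yet its coefficients say little on their own about how x relates to y.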
Epistemological challenges in CSS
Any given empirical claim can be evaluated in terms of several kinds of validity.
Here, we’ll focus on these validities with respect to CSS specifically.
Construct validity
Are we measuring what we think we’re measuring?
Internal validity
Can we establish causal relationships?
Statistical validity
Are our analytical methods appropriate?
External validity
Do our findings generalize beyond our sample?
Construct validity in CSS
Construct validity refers to how well a variable is operationalized, i.e., how well a measurement captures the abstract construct it is meant to represent.
Many variables are somewhat abstract: how do we measure them?
Examples of hard constructs to operationalize:
- Happiness and well-being.
- Social connectedness.
- Inequality and poverty.
- Political polarization.
Key questions:
- What aspects are we capturing?
- What are we missing?
Internal validity: establishing causation
Internal validity is an indication of a study’s ability to eliminate alternative explanations for the effect of interest.
- We’ve all heard correlation does not imply causation.
- Best way to establish causation is through experiments (RCTs), but that’s not always possible (or realistic) in many CSS domains.
- Examples: social media use and mental health; digital campaigning and voter turnout; and many more.
- Additionally, experimental control sometimes trades off with external validity!
- If the design is observational (no experiment), we need to account for possible confounds: variables that affect both the presumed cause and the outcome.
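A short simulation shows why confounds matter in observational data. It is a sketch with made-up numbers: a hypothetical confounder `z` drives both the “treatment” `x` and the outcome `y`, while `x` has no causal effect on `y` at all. Regressing `y` on `x` alone suggests an effect; adding `z` to the regression removes it. The helper `ols` is hypothetical, a thin wrapper around NumPy’s least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical setup: z is a confounder that raises both x and y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * z + rng.normal(size=n)  # x does not appear: its true causal effect on y is zero

def ols(design, outcome):
    """Ordinary least squares coefficients via np.linalg.lstsq."""
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

ones = np.ones(n)
naive = ols(np.column_stack([ones, x]), y)[1]        # regress y on x only
adjusted = ols(np.column_stack([ones, x, z]), y)[1]  # also condition on z
print(f"naive slope: {naive:.2f}, adjusted slope: {adjusted:.2f}")
```

Adjustment works here only because the simulation guarantees that `z` is the sole confounder and is measured without error; in real observational data neither is guaranteed.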
Statistical validity in CSS
Statistical validity is the extent to which a study’s statistical conclusions are accurate.
- Could include reporting the margin of error associated with a claim (e.g., \(10 \pm 2\)).
- Also encompasses common pitfalls.
- Flexible models applied to large datasets are a recipe for inadvertent p-hacking.
- Important to use methods like cross-validation to avoid overfitting.
- Fundamentally an issue of research ethics!
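As a sketch of how cross-validation exposes overfitting, the block below implements k-fold cross-validation by hand with NumPy (libraries such as scikit-learn offer ready-made versions). The data are simulated under an assumed linear truth, polynomial degree stands in for model flexibility, and `kfold_mse` is a hypothetical helper written for this example.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a polynomial fit across k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then score on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        preds = np.polyval(coefs, x[test_idx])
        errors.append(np.mean((y[test_idx] - preds) ** 2))
    return float(np.mean(errors))

# Simulated data: the true relationship is linear, so extra flexibility only fits noise.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=40)
y = 2 * x + rng.normal(scale=0.5, size=x.size)

results = {}
for degree in (1, 3, 12):
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    train_mse = float(np.mean((y - fitted) ** 2))
    cv_mse = kfold_mse(x, y, degree)
    results[degree] = (train_mse, cv_mse)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, 5-fold CV MSE {cv_mse:.3f}")
```

Training error cannot increase as the degree grows, because each higher-degree model contains the lower-degree ones; the cross-validated error, computed on folds the model never saw, instead tends to rise once the extra flexibility starts fitting noise.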
External validity: Who and when?
External validity refers to how well a given claim generalizes to the population of interest.
- Much social science research focuses on WEIRD populations (Western, Educated, Industrialized, Rich, Democratic).
- Digital data can be even more skewed, since they come from the users of a particular platform or technology.
- How well do conclusions based on a given sample generalize across populations and times?
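The effect of a skewed sample on a simple estimate can be simulated directly. Everything here is invented for illustration: a population split into two hypothetical groups with different average outcomes, and a sampling process in which members of one group are five times as likely to end up in the data, loosely mimicking a platform whose users are not representative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population of 100,000 people: 30% in group A, 70% in group B,
# with a higher average outcome in group A (all numbers invented).
n_pop = 100_000
in_a = rng.random(n_pop) < 0.3
outcome = np.where(in_a, 5.0, 3.0) + rng.normal(size=n_pop)

# Skewed inclusion: group A members are 5x as likely to appear in the data.
p_include = np.where(in_a, 0.05, 0.01)
sampled = rng.random(n_pop) < p_include

pop_mean = outcome.mean()
sample_mean = outcome[sampled].mean()
share_a = in_a[sampled].mean()
print(f"population mean {pop_mean:.2f}, sample mean {sample_mean:.2f}")
print(f"share of group A: population {in_a.mean():.2f}, sample {share_a:.2f}")
```

The sample mean lands well above the population mean simply because of who was sampled; collecting more data through the same process would not close the gap.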
Ethical challenges in CSS
CSS research involves many important ethical questions.
- Privacy and consent.
- Algorithmic bias.
- Research ethics (reproducibility, etc.).
Summary, and moving forward
- CSS is pluralistic in terms of methods and research questions: no single “correct” approach.
- Multiple epistemological challenges facing empirical science.
- This course will focus on statistical methods, but we’ll also touch on other core issues, especially construct validity.