The problem of knowledge

Goals of the lecture

What is knowledge and how do we produce it?
Perspectives on scientific knowledge.
Epistemological and ethical challenges facing CSS.

How do we learn about the world?

There are many ways to construct knowledge.

Subjective experience

Intuition and reason

Empirical research and measurement

Why not just subjective experience?

Subjective experience refers to the thoughts and experiences an individual has.

For many things, there is no substitute for personal experience.
“Anecdotal” evidence can still function as a kind of evidence.
Cultural knowledge (rituals, customs, etc.) also functions as a kind of evolved wisdom.
But over-reliance on anecdotes can also lead us astray when it comes to building models of the world.

Why not just intuition and reason?

Intuition and reason rely on logic and abstract thinking to understand the world.

Rationalist tradition: knowledge comes from reasoning logically.
Mathematics and formal logic are powerful tools for understanding.
But reason alone can lead to elegant theories that seem empirically wrong.
- Sometimes absurdly so, e.g., Zeno’s paradox
- Sometimes the empirical answer is counterintuitive, e.g., Galileo’s work on gravity.

The rise of empirical science

Empiricism is the idea that knowledge comes primarily from direct sensory experience and observation.

Combines experience-based approach with systematic measurement and experimentation.
Over time, systematic observations can help form theories.
In turn, theories guide observation.
But how exactly should we do empirical science? Many different philosophies…

💭 Check-in

What’s your intuitive theory of how science works (or should work)?

Empiricism: various philosophies

Historically, scientists and philosophers of science have made different arguments about how science works, both descriptively and prescriptively.

Logical positivism

Statements are meaningful if and only if they can be verified.
Problem: Many statements can’t be positively verified (the problem of induction).

Falsification

Theories cannot be proven, but can be proven wrong.
E.g., a single black swan can disprove the claim: All swans are white.
Problem: Science doesn’t always work this way…

Beyond falsification

Science operates under paradigms or research programmes.
Pragmatism: Focus on producing useful explanations.

So what makes a good explanation?

One goal of science is to produce explanations of natural phenomena.

💭 Check-in

What do you think makes a good explanation?

Covering law: show how phenomena emerge from general principles
Causal-mechanical: identify manipulable causes and trace mechanisms
Pragmatic: serve the practical needs of the question-asker
Also connects to deep questions about what constitutes “scientific understanding” of a phenomenon.

Prediction vs. understanding

💭 Check-in

Does being able to predict something mean we understand it?

Increasingly, machine learning models can make accurate predictions…
…but we don’t always know why the model works.
- A model that predicts which word you’ll say doesn’t mean we know why you said that word.
Yet simpler, more interpretable models sometimes come at the cost of predictive accuracy.
This complexity/accuracy trade-off is pervasive in statistical modeling.
It also connects to more general challenges facing CSS.

Epistemological challenges in CSS

Any given empirical claim can be evaluated according to several validities.

Here, we’ll focus on these validities with respect to CSS specifically.

Construct validity

Are we measuring what we think we’re measuring?

Internal validity

Can we establish causal relationships?

Statistical validity

Are our analytical methods appropriate?

External validity

Do our findings generalize beyond our sample?

Construct validity in CSS

Construct validity refers to how well a variable is operationalized.

Many variables are somewhat abstract: how do we measure them?

Examples of hard constructs to operationalize:

Happiness and well-being.
Social connectedness.
Inequality and poverty.
Political polarization.

Key questions:

What aspects are we capturing?
What are we missing?

💭 Check-in

With a partner, choose one of these constructs. How might you operationalize it? What are the limitations of this approach?

Internal validity: establishing causation

Internal validity is an indication of a study’s ability to eliminate alternative explanations for the effect of interest.

We’ve all heard that correlation does not imply causation.
The best way to establish causation is through experiments (RCTs), but these are not always possible (or realistic) in CSS domains.
- Examples: social media use and mental health; digital campaigning and voter turnout; and many more.
Additionally, experimental control sometimes trades off with external validity!
If design is observational (no experiment), need to account for possible confounds.
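
To make the idea of a confound concrete, here is a minimal simulation sketch. It assumes a hypothetical confound `z` that drives both a hypothetical exposure `x` and a hypothetical outcome `y`, with no causal link between `x` and `y` at all. The variable names, effect sizes, and the residual-based adjustment are illustrative assumptions, not taken from any particular study.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def residualize(target, control):
    """Remove the part of `target` that is linearly predicted by `control`."""
    mean_t, mean_c = mean(target), mean(control)
    slope = (
        sum((c - mean_c) * (t - mean_t) for c, t in zip(control, target))
        / sum((c - mean_c) ** 2 for c in control)
    )
    return [(t - mean_t) - slope * (c - mean_c) for t, c in zip(target, control)]


rng = random.Random(42)
n = 5000

# Hypothetical data-generating process: z causes both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # x depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]  # y depends on z only, never on x

raw_r = pearson_r(x, y)
adjusted_r = pearson_r(residualize(x, z), residualize(y, z))

print(f"Correlation of x and y, ignoring z:  {raw_r:.2f}")
print(f"Correlation of x and y, adjusting z: {adjusted_r:.2f}")
```

Under these assumptions, the unadjusted correlation should be clearly positive (about 0.5 in expectation) even though `x` has no effect on `y`, while the adjusted correlation should be close to zero. In a real observational study the confounds are rarely all measured, which is why this kind of adjustment can reduce but not eliminate alternative explanations.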

💭 Check-in

With a partner, think of some observational CSS studies you’ve read about. What might be alternative explanations for the effect of interest?

Statistical validity in CSS

Statistical validity is the extent to which a study’s statistical conclusions are accurate.

Could include reporting the margin of error associated with a claim (e.g., \(10 \pm 2\)).
Also encompasses common pitfalls.
- Flexible models applied to large datasets are a recipe for inadvertent p-hacking.
- Important to use methods like cross-validation to avoid overfitting.
Fundamentally an issue of research ethics!
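
The point about cross-validation can be sketched with a small simulation. This is a minimal illustration under stated assumptions: the data are simulated (a sine curve plus noise), the model is a simple nearest-neighbour average, and all function names (`knn_predict`, `cross_validated_mse`, and so on) are hypothetical helpers written for this example rather than a library API.

```python
import math
import random


def knn_predict(train, x, n_neighbors):
    """Predict y at x as the mean y of the nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(train, test, n_neighbors):
    """Average squared prediction error on `test` for a model fit on `train`."""
    squared_errors = [(knn_predict(train, x, n_neighbors) - y) ** 2 for x, y in test]
    return sum(squared_errors) / len(squared_errors)


def cross_validated_mse(data, n_neighbors, n_folds=5):
    """k-fold cross-validation: each fold is held out once for evaluation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        fold_errors.append(mean_squared_error(train, held_out, n_neighbors))
    return sum(fold_errors) / n_folds


rng = random.Random(0)

# Simulated data (an assumption for illustration): y = sin(x) + noise.
xs = [rng.uniform(0, 6) for _ in range(200)]
data = [(x, math.sin(x) + rng.gauss(0, 0.5)) for x in xs]

results = {}
for n_neighbors in (1, 10):
    train_mse = mean_squared_error(data, data, n_neighbors)  # evaluated on its own training data
    cv_mse = cross_validated_mse(data, n_neighbors)          # evaluated on held-out folds
    results[n_neighbors] = (train_mse, cv_mse)
    print(f"neighbors={n_neighbors:2d}  train MSE={train_mse:.3f}  cross-validated MSE={cv_mse:.3f}")
```

The very flexible model (one neighbour) reproduces its training data perfectly, so its training error says nothing about how it will do on new data. Its held-out error is what reveals the overfitting, and in this setup the smoother model (ten neighbours) should do better on held-out folds.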

💭 Check-in

Suppose you analyze the correlation between various personality traits and hundreds of outcome measures (life satisfaction, income, etc.). You find significant results for about \(5 \%\) of your analyses. What’s a potential concern here?
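
One way to explore this question is by simulation. The sketch below assumes a hypothetical trait and 1,000 hypothetical outcome measures that are all pure noise, so no true relationship exists. The p-values use the Fisher z approximation (an assumption chosen to keep the example dependency-free), and the Bonferroni threshold is shown as one common correction, not the only option.

```python
import math
import random
from statistics import NormalDist


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def approx_p_value(r, n):
    """Two-sided p-value for a correlation, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)
n_people, n_outcomes, alpha = 50, 1000, 0.05

# Hypothetical trait scores for each person.
trait = [rng.gauss(0, 1) for _ in range(n_people)]

p_values = []
for _ in range(n_outcomes):
    # Each outcome is pure noise, generated independently of the trait.
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    p_values.append(approx_p_value(pearson_r(trait, outcome), n_people))

n_significant = sum(p < alpha for p in p_values)
n_bonferroni = sum(p < alpha / n_outcomes for p in p_values)

print(f"'Significant' at p < {alpha}: {n_significant} of {n_outcomes}")
print(f"Significant after Bonferroni correction: {n_bonferroni} of {n_outcomes}")
```

Because every null hypothesis here is true by construction, roughly \(5 \%\) of the tests should still come out "significant" at the uncorrected threshold. A \(5 \%\) hit rate across many analyses is therefore about what chance alone would produce.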

External validity: Who and when?

External validity refers to how well a given claim generalizes to the population of interest.

Much social science research focuses on WEIRD populations (Western, Educated, Industrialized, Rich, Democratic).
- Digital data exhibits even more skew.
How well do conclusions based on a given sample generalize across populations and times?

💭 Check-in

With a partner, talk about a CSS-related study you’ve read about. How well do you think the conclusions generalize from the sample (people, society, time, etc.) studied?

Ethical challenges in CSS

CSS research involves many important ethical questions.

Privacy and consent.
Algorithmic bias.
Research ethics (reproducibility, etc.).

Summary, and moving forward

CSS is pluralistic in terms of methods and research questions: no single “correct” approach.
Empirical science faces multiple epistemological challenges.
This course will focus on statistical methods, but we’ll also touch on other core issues, especially construct validity.

💭 Key takeaway

Producing knowledge is hard, but methodological and theoretical principles can help guide us.