Stanford Seminar - Replication strategies for more robust human simulation

08 Jun 2024

Using LLMs for Social Scientific Research

LLMs can advance social scientific inquiry and simulate human behavior.
LLMs can be used to understand how people make decisions, interact with each other, and form opinions.
There are challenges in using LLMs for social scientific research, such as sampling bias and variation in model outputs.
Appropriate interfaces and standards are needed for using LLMs in social scientific research.

There are concerns about the validity and reproducibility of social science findings generated with LLMs.
Some studies embrace transparency and reproducibility by providing prompting materials and input data.
Open-source models and data are advocated to understand biases and ensure reproducibility.
The distinct threats to reproducing social science with AI models need to be assessed.
Prior work focused on estimating bias and sampling problems in other new research settings.

LLM-specific threats:
- Prompt sensitivity: Idiosyncrasies in crafting prompts affect generalizability.
- Stochasticity: Inherent randomness impacts consistency and reliability.
- Memorization: Reproducing artifacts of training data leads to biased simulations.
Sensitivity probes:
- Perturbation: Observing effects of small changes to prompts or parameters.
- Data augmentation: Assessing sensitivity to variations in input data.
- Model comparison: Comparing results across different LLMs or datasets.

Perturbation: Systematically varying prompts and settings to assess sensitivity.
- Dimensions of perturbation: study protocol, settings, prompting strategies, model version.
Iteration: Drawing multiple samples to understand distributional characteristics.
Re-replication: Replicating existing replications to assess consistency.
Perturbation and iteration can be combined to understand the sampling distribution of perturbed results.
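
A minimal sketch of how perturbation and iteration might be combined, assuming Python. The function `simulate_choice` is a hypothetical stand-in for a real model call, and the prompt variants, temperatures, and choice probabilities are invented so the example runs offline. None of them come from the talk.

```python
import itertools
import random
import statistics

# Hypothetical perturbation dimensions (placeholders, not the talk's settings).
PROMPT_VARIANTS = ["Choose A or B.", "Which do you pick: A or B?"]
TEMPERATURES = [0.2, 1.0]


def simulate_choice(prompt: str, temperature: float, rng: random.Random) -> int:
    """Stand-in for an LLM call: returns 1 if the simulated respondent picks A, else 0.

    The probabilities are invented purely so the sketch runs without a model.
    """
    base = 0.55 if prompt.endswith("?") else 0.65
    p_a = base - 0.1 * temperature
    return 1 if rng.random() < p_a else 0


def run_grid(n_iterations: int = 200, seed: int = 0) -> dict:
    """Iteration: draw n_iterations samples for every perturbed condition."""
    rng = random.Random(seed)
    results = {}
    # Perturbation: every combination of prompt wording and temperature.
    for prompt, temp in itertools.product(PROMPT_VARIANTS, TEMPERATURES):
        draws = [simulate_choice(prompt, temp, rng) for _ in range(n_iterations)]
        results[(prompt, temp)] = statistics.mean(draws)
    return results


def summarize(results: dict) -> dict:
    """Describe how much the estimated choice rate moves across conditions."""
    rates = list(results.values())
    return {
        "min": min(rates),
        "max": max(rates),
        "spread": max(rates) - min(rates),
        "stdev": statistics.stdev(rates),
    }
```

Calling `summarize(run_grid())` gives one choice rate per perturbed condition plus the spread across conditions. A large spread relative to the effect of interest would suggest the result is sensitive to prompting or settings.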

Replication: Repeating a study to confirm or refute the original findings.
Re-replication: Replicating an existing replication to assess consistency.
Re-replication is not as common in social science as replication and meta-analysis.
The talk also considers the implications of replications and re-replications for social science.

The case study uses a language model to simulate a social science experiment.
The model's choices are compared to human data from the original study.
The model's overall patterns resemble the human data, but its point estimates are extreme.
Perturbing prompting and settings produces substantial variations in model output.
Probing and exploring settings reveal important information about result sensitivity.
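
One way to check "same pattern, more extreme estimates" is sketched below, assuming Python and assuming each condition is summarized as the share of respondents choosing a target option. The `ConditionResult` type, the `compare` helper, and the numbers are hypothetical placeholders for illustration. They are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class ConditionResult:
    name: str
    human_rate: float  # share of human participants choosing the target option
    model_rate: float  # share of simulated responses choosing the target option


def extremity(rate: float) -> float:
    """Distance of a proportion from 0.5; larger means a more extreme estimate."""
    return abs(rate - 0.5)


def compare(conditions):
    """Check whether the model matches the human direction and how extreme it is."""
    rows = []
    for c in conditions:
        # Same direction: both rates fall on the same side of 0.5.
        same_direction = (c.human_rate - 0.5) * (c.model_rate - 0.5) > 0
        rows.append({
            "condition": c.name,
            "same_direction": same_direction,
            "gap": c.model_rate - c.human_rate,
            "more_extreme": extremity(c.model_rate) > extremity(c.human_rate),
        })
    return rows


# Placeholder numbers for illustration only; they are NOT results from the talk.
example = [
    ConditionResult("condition_1", human_rate=0.70, model_rate=0.95),
    ConditionResult("condition_2", human_rate=0.35, model_rate=0.05),
]
for row in compare(example):
    print(row)
```

Reporting direction agreement separately from the size of the gap keeps the two claims apart: a model can reproduce the qualitative pattern while still overstating the point estimates.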