The observation method in psychology involves directly and systematically witnessing and recording measurable behaviors, actions, and responses in natural or contrived settings without attempting to intervene or manipulate what is being observed.
Used to describe phenomena, generate hypotheses, or validate self-reports, psychological observation can be either controlled or naturalistic with varying degrees of structure imposed by the researcher.
There are different types of observational methods, and distinctions need to be made between:
1. Controlled Observations
2. Naturalistic Observations
3. Participant Observations
In addition to the above categories, observations can also be either overt/disclosed (the participants know they are being studied) or covert/undisclosed (the researcher keeps their real identity a secret from the research subjects, acting as a genuine member of the group).
In general, conducting observational research is relatively inexpensive, but data processing and analysis remain highly time-consuming and resource-intensive.
The considerable investment of coder time needed for training, maintaining reliability, preventing drift, and coding complex, dynamic interactions places practical barriers in the way of researchers with limited resources.
Controlled Observation
Controlled observation is a research method for studying behavior in a carefully controlled and structured environment.
The researcher sets specific conditions, variables, and procedures to systematically observe and measure behavior, allowing for greater control and comparison of different conditions or groups.
The researcher decides where the observation will occur, at what time, with which participants, and in what circumstances, and uses a standardized procedure. Participants are randomly allocated to each independent variable group.
Rather than writing a detailed description of all behavior observed, it is often easier to code behavior according to a previously agreed scale using a behavior schedule (i.e., conducting a structured observation).
The researcher systematically classifies the behavior they observe into distinct categories. Coding might involve numbers or letters to describe a characteristic or the use of a scale to measure behavior intensity.
The categories on the schedule are coded so that the data collected can be easily counted and turned into statistics.
For example, Mary Ainsworth used a behavior schedule to study how infants responded to brief periods of separation from their mothers. During the Strange Situation procedure, the infant’s interaction behaviors directed toward the mother were measured, e.g.,
- Proximity and contact-seeking
- Contact maintaining
- Avoidance of proximity and contact
- Resistance to contact and comforting
The observer noted down the behavior displayed during 15-second intervals and scored the behavior for intensity on a scale of 1 to 7.
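Below is a minimal sketch of how such a behavior schedule might be represented and tallied. The category names follow Ainsworth's schedule, but the interval scores and the summary step are invented purely for illustration.

```python
# Hypothetical structured-observation record: one row per 15-second interval,
# with each interaction category scored for intensity on a 1-7 scale.
intervals = [
    {"proximity_seeking": 6, "contact_maintaining": 5, "avoidance": 1, "resistance": 2},
    {"proximity_seeking": 7, "contact_maintaining": 6, "avoidance": 1, "resistance": 1},
    {"proximity_seeking": 3, "contact_maintaining": 2, "avoidance": 4, "resistance": 2},
]

# Turn the coded schedule into simple statistics (mean intensity per category).
categories = intervals[0].keys()
means = {c: round(sum(i[c] for i in intervals) / len(intervals), 2) for c in categories}
print(means)
```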
Sometimes participants’ behavior is observed through a two-way mirror, or they are secretly filmed. Albert Bandura used this method to study aggression in children (the Bobo doll studies).
Much research has also been carried out in sleep laboratories, where electrodes are attached to participants' scalps and an electroencephalograph (EEG) records the changes in the brain's electrical activity during sleep.
Controlled observations are usually overt as the researcher explains the research aim to the group so the participants know they are being observed.
Controlled observations are also usually non-participant as the researcher avoids direct contact with the group and keeps a distance (e.g., observing behind a two-way mirror).
Strengths
- Controlled observations can be easily replicated by other researchers by using the same observation schedule. This means it is easy to test for reliability.
- The data obtained from structured observations is easier and quicker to analyze as it is quantitative (i.e., numerical) – making this a less time-consuming method compared to naturalistic observations.
- Controlled observations are fairly quick to conduct, which means that many observations can take place within a short amount of time. A large sample can therefore be obtained, making the findings more representative and easier to generalize to a larger population.
Limitations
- Controlled observations can lack validity due to the Hawthorne effect and demand characteristics: when participants know they are being watched, they may act differently.
Naturalistic Observation
Naturalistic observation is a research method in which the researcher studies behavior in its natural setting without intervention or manipulation.
It involves observing and recording behavior as it naturally occurs, providing insights into real-life behaviors and interactions in their natural context.
Naturalistic observation is a research method commonly used by psychologists and other social scientists.
This technique involves observing and studying the spontaneous behavior of participants in natural surroundings. The researcher simply records what they see in whatever way they can.
In unstructured observations, the researcher records all relevant behavior without a predetermined coding system. There may be too much to record, and the behaviors recorded may not necessarily be the most important, so this approach is usually used as a pilot study to see what types of behavior should be recorded.
Compared with controlled observations, it is like the difference between studying wild animals in a zoo and studying them in their natural habitat.
With regard to human subjects, Margaret Mead used this method to research the way of life of different tribes living on islands in the South Pacific. Kathy Sylva used it to study children at play by observing their behavior in a playgroup in Oxfordshire.
Collecting Naturalistic Behavioral Data
Technological advances are enabling new, unobtrusive ways of collecting naturalistic behavioral data.
The Electronically Activated Recorder (EAR) is a digital recording device participants can wear to periodically sample ambient sounds, allowing representative sampling of daily experiences (Mehl et al., 2012).
Studies program EARs to record 30-50 second sound snippets multiple times per hour. Although coding the recordings requires extensive resources, EARs can capture spontaneous behaviors like arguments or laughter.
EARs minimize participant reactivity since sampling occurs outside of awareness. This reduces the Hawthorne effect, where people change behavior when observed.
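As a rough illustration of this kind of ambient-sound sampling protocol, the sketch below generates a random recording schedule. The snippet length and sampling rate are assumptions chosen for illustration, not the parameters of any specific EAR study.

```python
import random

def ear_schedule(hours=12, snippets_per_hour=4, snippet_secs=40, seed=1):
    """Generate random (start_second, duration) recording windows for each hour,
    mimicking how an EAR-style device might periodically sample ambient sound."""
    random.seed(seed)
    schedule = []
    for hour in range(hours):
        for _ in range(snippets_per_hour):
            start = hour * 3600 + random.randint(0, 3600 - snippet_secs)
            schedule.append((start, snippet_secs))
    return sorted(schedule)

print(ear_schedule()[:5])  # first few sampling windows (seconds from start of day)
```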
The SenseCam is another wearable device that passively captures images documenting daily activities. Though primarily used in memory research currently (Smith et al., 2014), systematic sampling of environments and behaviors via the SenseCam could enable innovative psychological studies in the future.
Strengths
- By being able to observe the flow of behavior in its own setting, studies have greater ecological validity.
- Like case studies, naturalistic observation is often used to generate new ideas. Because it gives the researcher the opportunity to study the total situation, it often suggests avenues of inquiry not thought of before.
- Naturalistic observation can capture actual behaviors as they unfold in real time, allowing researchers to analyze sequential patterns of interaction, measure base rates of behaviors, and examine socially undesirable or complex behaviors that people may not self-report accurately.
Limitations
- These observations are often conducted on a micro (small) scale and may lack a representative sample (biased in relation to age, gender, social class, or ethnicity). This may result in the findings lacking the ability to generalize to wider society.
- Naturalistic observations are less reliable as extraneous variables cannot be controlled. This makes it difficult for another researcher to repeat the study in exactly the same way.
- Highly time-consuming and resource-intensive during the data coding phase (e.g., training coders, maintaining inter-rater reliability, preventing judgment drift).
- With observations, we do not have manipulations of variables (or control over extraneous variables), meaning cause-and-effect relationships cannot be established.
Participant Observation
Participant observation is a variant of naturalistic observation, but here the researcher joins in and becomes part of the group they are studying to get a deeper insight into their lives.
If it were research on animals, we would now not only be studying them in their natural habitat but be living alongside them as well!
Leon Festinger used this approach in a famous study into a religious cult that believed that the end of the world was about to occur. He joined the cult and studied how they reacted when the prophecy did not come true.
Participant observations can be either covert or overt. Covert is where the study is carried out “undercover.” The researcher’s real identity and purpose are kept concealed from the group being studied.
The researcher takes a false identity and role, usually posing as a genuine member of the group.
On the other hand, overt is where the researcher reveals his or her true identity and purpose to the group and asks permission to observe.
Limitations
- It can be difficult to get time/privacy for recording. For example, researchers can’t take notes openly with covert observations as this would blow their cover. This means they must wait until they are alone and rely on their memory. This is a problem as they may forget details and are unlikely to remember direct quotations.
- If the researcher becomes too involved, they may lose objectivity and become biased. There is always the danger that we will “see” what we expect (or want) to see. The researcher may selectively report information instead of noting everything they observe, thus reducing the validity of their data.
Recording of Data
With controlled/structured observation studies, an important decision the researcher has to make is how to classify and record the data. Usually, this will involve a method of sampling.
In most coding systems, codes or ratings are made either per behavioral event or per specified time interval (Bakeman & Quera, 2011).
The three main sampling methods are:
- Event sampling. The observer decides in advance what types of behavior (events) she is interested in and records all occurrences. All other types of behavior are ignored. Event-based coding involves identifying and segmenting interactions into meaningful events rather than timed units; for example, parent-child interactions may be segmented into control or teaching events to code. Event recording allows counting event frequency and sequencing while also potentially capturing event duration through timed-event recording, which provides information on time spent on behaviors.
- Time (interval) sampling. The key feature of time sampling is that the interaction is divided into continuous fixed time intervals (e.g., every 15 seconds, 10 minutes every hour, 1 hour per day), and the observer codes the behaviors that occur within each interval period (see the sketch after this list). Interval recording involves dividing interactions into fixed time intervals (e.g., 6-15 seconds) and coding behaviors within each interval; it is common in microanalytic coding for sampling discrete behaviors in brief time samples across an interaction, and the time unit can range from seconds to minutes to whole interactions. Interval recording requires segmenting interactions based on timing rather than events (Bakeman & Quera, 2011).
- Instantaneous (target time) sampling. The observer decides in advance the pre-selected moments when observation will occur and records what is happening at that instant. Everything happening before or after is ignored.
- Instantaneous sampling provides snapshot coding at certain moments rather than summarizing behavior within full intervals. This allows quicker coding but may miss behaviors in between target times.
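To make the contrast concrete, here is a minimal sketch of how timed-event records could be converted into interval-sampled data. The behavior codes, timestamps, and interval length are invented for illustration.

```python
# Hypothetical timed-event records: (behavior_code, onset_seconds)
events = [("talk", 2.0), ("smile", 7.5), ("talk", 16.0), ("gaze", 31.2), ("smile", 33.0)]

INTERVAL = 15  # seconds per sampling interval

# Interval (time) sampling: mark which behaviors occurred in each fixed interval.
n_intervals = int(max(t for _, t in events) // INTERVAL) + 1
interval_codes = [set() for _ in range(n_intervals)]
for code, onset in events:
    interval_codes[int(onset // INTERVAL)].add(code)

for i, codes in enumerate(interval_codes):
    print(f"interval {i} ({i * INTERVAL}-{(i + 1) * INTERVAL}s): {sorted(codes)}")
```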
Coding Systems
The coding system should focus on behaviors, patterns, individual characteristics, or relationship qualities that are relevant to the theory guiding the study (Wampler & Harper, 2014).
Codes vary in how much inference is required, from concrete observable behaviors like frequency of eye contact to more abstract concepts like degree of rapport between a therapist and client (Hill & Lambert, 2004). More inference may reduce reliability.
Coding schemes can vary in their level of detail or granularity. Micro-level schemes capture fine-grained behaviors, such as specific facial movements, while macro-level schemes might code broader behavioral states or interactions. The appropriate level of granularity depends on the research questions and the practical constraints of the study.
Another important consideration is the concreteness of the codes. Some schemes use physically based codes that are directly observable (e.g., “eyes closed”), while others use more socially based codes that require some level of inference (e.g., “showing empathy”). While physically based codes may be easier to apply consistently, socially based codes often capture more meaningful behavioral constructs.
Most coding schemes strive to create sets of codes that are mutually exclusive and exhaustive (ME&E). This means that for any given set of codes, only one code can apply at a time (mutual exclusivity), and there is always an applicable code (exhaustiveness). This property simplifies both the coding process and subsequent data analysis.
For example, a simple ME&E set for coding infant state might include: 1) Quiet alert, 2) Crying, 3) Fussy, 4) REM sleep, and 5) Deep sleep. At any given moment, an infant would be in one and only one of these states.
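A minimal sketch of how a mutually exclusive and exhaustive code set could be enforced in practice, using the infant-state example above; the validation logic is illustrative rather than a standard tool.

```python
from enum import Enum

class InfantState(Enum):  # ME&E code set: exactly one state applies at any moment
    QUIET_ALERT = 1
    CRYING = 2
    FUSSY = 3
    REM_SLEEP = 4
    DEEP_SLEEP = 5

def code_moment(label: str) -> InfantState:
    """Map an observer's label to exactly one code; raising an error on an unknown
    label flags a violation of exhaustiveness (there must always be an applicable code)."""
    try:
        return InfantState[label.upper().replace(" ", "_")]
    except KeyError:
        raise ValueError(f"No applicable code for {label!r}; the scheme is not exhaustive.")

print(code_moment("quiet alert"))  # InfantState.QUIET_ALERT
```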
Macroanalytic coding systems
Macroanalytic coding systems involve rating or summarizing behaviors using larger coding units and broader categories that reflect patterns across longer periods of interaction rather than coding small or discrete behavioral acts.
Macroanalytic coding systems focus on capturing overarching themes, global qualities, or general patterns of behavior rather than specific, discrete actions.
For example, a macroanalytic coding system may rate the overall degree of therapist warmth or level of client engagement globally for an entire therapy session, requiring the coders to summarize and infer these constructs across the interaction rather than coding smaller behavioral units.
These systems require observers to make more inferences, which takes more time, but they can better capture contextual factors, stability over time, and the interdependent nature of behaviors (Carlson & Grotevant, 1987).
Examples of Macroanalytic Coding Systems:
- Emotional Availability Scales (EAS): This system assesses the quality of emotional connection between caregivers and children across dimensions like sensitivity, structuring, non-intrusiveness, and non-hostility.
- Classroom Assessment Scoring System (CLASS): Evaluates the quality of teacher-student interactions in classrooms across domains like emotional support, classroom organization, and instructional support.
Microanalytic coding systems
Microanalytic coding systems involve rating behaviors using smaller, more discrete coding units and categories.
These systems focus on capturing specific, discrete behaviors or events as they occur moment-to-moment. Behaviors are often coded second-by-second or in very short time intervals.
For example, a microanalytic system may code each instance of eye contact or head nodding during a therapy session. These systems code specific, molecular behaviors as they occur moment-to-moment rather than summarizing actions over longer periods.
Microanalytic systems require less inference from coders and allow for analysis of behavioral contingencies and sequential interactions between therapist and client. However, they are more time-consuming and expensive to implement than macroanalytic approaches.
Examples of Microanalytic Coding Systems:
- Facial Action Coding System (FACS): Codes minute facial muscle movements to analyze emotional expressions.
- Specific Affect Coding System (SPAFF): Used in marital interaction research to code specific emotional behaviors.
- Noldus Observer XT: A software system that allows for detailed coding of behaviors in real-time or from video recordings.
Mesoanalytic coding systems
Mesoanalytic coding systems attempt to balance macro- and micro-analytic approaches.
In contrast to macroanalytic systems that summarize behaviors in larger chunks, mesoanalytic systems use medium-sized coding units that target more specific behaviors or interaction sequences (Bakeman & Quera, 2017).
For example, a mesoanalytic system may code each instance of a particular type of therapist statement or client emotional expression. However, mesoanalytic systems still use larger units than microanalytic approaches that code every speech onset and offset.
The goal of balancing specificity and feasibility makes mesoanalytic systems well-suited for many research questions (Morris et al., 2014). Mesoanalytic codes can preserve some sequential information while remaining efficient enough for studies with adequate but limited resources.
For instance, a mesoanalytic couple interaction coding system could target key behavior patterns like validation sequences without coding turn-by-turn speech.
In this way, mesoanalytic coding allows reasonable reliability and specificity without requiring extensive training or observation. The mid-level focus offers a pragmatic compromise between depth and breadth in analyzing interactions.
Examples of Mesoanalytic Coding Systems:
- Feeding Scale for Mother-Infant Interaction: Assesses feeding interactions in 5-minute episodes, coding specific behaviors and overall qualities.
- Couples Interaction Rating System (CIRS): Codes specific behaviors and rates overall qualities in segments of couple interactions.
- Teaching Styles Rating Scale: Combines frequency counts of specific teacher behaviors with global ratings of teaching style in classroom segments.
Preventing Coder Drift
Coder drift is a measurement error caused by gradual shifts in how observations are rated against the operational definitions, especially when behavioral codes are not clearly specified.
This type of error creeps in when coders fail to regularly review which precise observations do and do not constitute the behaviors being measured.
Preventing drift refers to taking active steps to maintain consistency and minimize changes or deviations in how coders rate or evaluate behaviors over time. Specifically, some key ways to prevent coder drift include:
- Operationalize codes: It is essential that code definitions unambiguously distinguish what interactions represent instances of each coded behavior.
- Ongoing training: Returning to those operational definitions through ongoing training serves to recalibrate coder interpretations and reinforce accurate recognition. Having regular “check-in” sessions where coders practice coding the same interactions allows monitoring that they continue applying codes reliably without gradual shifts in interpretation.
- Using reference videos: Having coders periodically code the same “gold standard” reference videos anchors their judgments and recalibrates them against the original training. Without periodic anchoring to the original specifications, coder decisions tend to drift from initial measurement reliability.
- Assessing inter-rater reliability: Statistically tracking whether coders maintain high levels of agreement over the course of a study, not just at the start, flags any declines that indicate drift. Sustaining inter-rater agreement requires mitigating this common tendency for observer judgment to change during intensive, long-term coding tasks.
- Recalibrating through discussion: Holding meetings where coders openly discuss disagreements helps explore why judgments may be shifting over time and restores consensus on how codes are applied.
- Adjusting unclear codes: If reliability issues persist, revisiting and refining ambiguous code definitions or anchors can eliminate inconsistencies arising from coder confusion.
Essentially, the goal of preventing coder drift is maintaining standardization and minimizing unintentional biases that may slowly alter how observational data gets rated over periods of extensive coding.
Through the upkeep of skills, continuing calibration to benchmarks, and monitoring consistency, researchers can notice and correct for any creeping changes in coder decision-making over time.
Reducing Observer Bias
Observational research is prone to observer biases resulting from coders’ subjective perspectives shaping the interpretation of complex interactions (Burghardt et al., 2012). When coding, personal expectations may unconsciously influence judgments. However, rigorous methods exist to reduce such bias.
Coding Manual
A detailed coding manual minimizes subjectivity by clearly defining what behaviors and interaction dynamics observers should code (Bakeman & Quera, 2011).
High-quality manuals have strong theoretical and empirical grounding, laying out explicit coding procedures and providing rich behavioral examples to anchor code definitions (Lindahl, 2001).
Clear delineation of the frequency, intensity, duration, and type of behaviors constituting each code facilitates reliable judgments and reduces ambiguity for coders. Without clarity on how codes translate into observable interaction, application risks becoming inconsistent across raters.
Coder Training
Competent coders require both interpersonal perceptiveness and scientific rigor (Wampler & Harper, 2014). Training thoroughly reviews the theoretical basis for coded constructs and teaches the coding system itself.
Multiple “gold standard” criterion videos demonstrate code ranges that trainees independently apply. Coders then meet weekly to establish reliability of 80% or higher agreement both among themselves and with master criterion coding (Hill & Lambert, 2004).
Ongoing training manages coder drift over time. Revisions to unclear codes may also improve reliability. Both careful selection and investment in rigorous training increase quality control.
Blind Methods
To prevent bias, coders should remain unaware of specific study predictions and of participant details that could influence their coding (Burghardt et al., 2012). Using separate teams to collect and to code the data helps maintain this blinding.
In addition, scheduling procedures can prevent coders from rating data collected from participants with whom they have had personal contact. Maintaining coder independence and blinding enhances objectivity.
Data Analysis Approaches
Data analysis in behavioral observation aims to transform raw observational data into quantifiable measures that can be statistically analyzed.
It’s important to note that the choice of analysis approach is not arbitrary but should be guided by the research questions, study design, and nature of the data collected.
Interval data (where behavior is recorded at fixed time points), event data (where the occurrence of behaviors is noted as they happen), and timed-event data (where both the occurrence and duration of behaviors are recorded) may require different analytical approaches.
Similarly, the level of measurement (categorical, ordinal, or continuous) will influence the choice of statistical tests.
Researchers typically start with simple descriptive statistics to get a feel for their data before moving on to more complex analyses. This stepwise approach allows for a thorough understanding of the data and can often reveal unexpected patterns or relationships that merit further investigation.
Simple descriptive statistics
Descriptive statistics give an overall picture of behavior patterns and are often the first step in analysis.
- Frequency counts tell us how often a particular behavior occurs, while rates express this frequency in relation to time (e.g., occurrences per minute).
- Duration measures how long behaviors last, offering insight into their persistence or intensity.
- Probability calculations indicate the likelihood of a behavior occurring under certain conditions, and relative frequency or duration statistics show the proportional occurrence of different behaviors within a session or across the study.
These simple statistics form the foundation of behavioral analysis, providing researchers with a broad picture of behavioral patterns.
They can reveal which behaviors are most common, how long they typically last, and how they might vary across different conditions or subjects.
For instance, in a study of classroom behavior, these statistics might show how often students raise their hands, how long they typically stay focused on a task, or what proportion of time is spent on different activities.
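A minimal sketch, using invented timed-event data, of how these descriptive statistics (frequency, rate, duration, and relative duration) might be computed:

```python
# Hypothetical timed-event data: (behavior, onset_sec, offset_sec) for one 10-minute session.
records = [
    ("hand_raise", 30, 34), ("on_task", 0, 180), ("hand_raise", 200, 203),
    ("on_task", 210, 420), ("off_task", 420, 500), ("on_task", 500, 600),
]
SESSION_SECS = 600

behaviors = sorted({b for b, _, _ in records})
for b in behaviors:
    spans = [(on, off) for code, on, off in records if code == b]
    frequency = len(spans)                          # how often the behavior occurred
    rate_per_min = frequency / (SESSION_SECS / 60)  # occurrences per minute
    duration = sum(off - on for on, off in spans)   # total seconds spent in the behavior
    rel_duration = duration / SESSION_SECS          # proportion of the session
    print(f"{b:10s} freq={frequency} rate={rate_per_min:.2f}/min "
          f"duration={duration}s relative={rel_duration:.2%}")
```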
Contingency analyses
Contingency analyses help identify if certain behaviors tend to occur together or in sequence.
- Contingency tables, also known as cross-tabulations, display the co-occurrence of two or more behaviors, allowing researchers to see if certain behaviors tend to happen together.
- Odds ratios provide a measure of the strength of association between behaviors, indicating how much more likely one behavior is to occur in the presence of another.
- Adjusted residuals in these tables can reveal whether the observed co-occurrences are significantly different from what would be expected by chance.
For example, in a study of parent-child interactions, contingency analyses might reveal whether a parent’s praise is more likely to follow a child’s successful completion of a task, or whether a child’s tantrum is more likely to occur after a parent’s refusal of a request.
These analyses can uncover important patterns in social interactions, learning processes, or behavioral chains.
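A minimal sketch of these contingency analyses on an invented 2×2 table of co-occurrences, assuming NumPy and SciPy are available; the counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = parent behavior (praise / no praise),
# columns = preceding child behavior (task success / no success).
table = np.array([[40, 10],
                  [20, 30]])

chi2, p, dof, expected = chi2_contingency(table)

# Odds ratio for the 2x2 table: how much more likely praise is after success.
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])

# Adjusted (standardized) residuals: which cells depart from chance expectation.
n = table.sum()
row_p = table.sum(axis=1, keepdims=True) / n
col_p = table.sum(axis=0, keepdims=True) / n
adj_resid = (table - expected) / np.sqrt(expected * (1 - row_p) * (1 - col_p))

print(f"chi2={chi2:.2f}, p={p:.4f}, odds ratio={odds_ratio:.2f}")
print("adjusted residuals:\n", adj_resid.round(2))
```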
Sequential analyses
Sequential analyses are crucial for understanding processes and temporal relationships between behaviors.
- Lag sequential analysis looks at the likelihood of one behavior following another within a specified number of events or time units.
- Time-window sequential analysis examines whether a target behavior occurs within a defined time frame after a given behavior.
These methods are particularly valuable for understanding processes that unfold over time, such as conversation patterns, problem-solving strategies, or the development of social skills.
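A minimal sketch of lag-1 sequential analysis on an invented code sequence, counting how often each behavior immediately follows another and converting the counts into a conditional probability:

```python
from collections import Counter

# Hypothetical sequence of event codes from one observation session.
sequence = ["question", "answer", "praise", "question", "answer", "question",
            "no_answer", "prompt", "answer", "praise"]

# Lag-1 sequential analysis: tally transitions from each code to the next.
transitions = Counter(zip(sequence, sequence[1:]))

def lag1_probability(given, target):
    """Conditional probability that `target` immediately follows `given`."""
    from_given = sum(c for (a, _), c in transitions.items() if a == given)
    return transitions[(given, target)] / from_given if from_given else 0.0

print(transitions)
print("P(praise | answer) =", lag1_probability("answer", "praise"))
```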
Observer agreement
Since human observers often code behaviors, it’s important to check reliability. This is typically done through measures of observer agreement.
- Cohen’s kappa is commonly used for categorical data, providing a measure of agreement between observers that accounts for chance agreement.
- The intraclass correlation coefficient (ICC) is used for continuous data or ratings.
Good observer agreement is crucial for the validity of the study, as it demonstrates that the observed behaviors are consistently identified and coded across different observers or time points.
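A minimal sketch computing Cohen's kappa for two hypothetical coders directly from its definition (observed versus chance agreement), without relying on a statistics package:

```python
from collections import Counter

# Hypothetical categorical codes assigned by two observers to the same 12 intervals.
coder_a = ["on", "on", "off", "on", "off", "off", "on", "on", "off", "on", "on", "off"]
coder_b = ["on", "on", "off", "off", "off", "off", "on", "on", "on", "on", "on", "off"]

n = len(coder_a)
p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Chance agreement: product of each coder's marginal proportions, summed over categories.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
categories = set(coder_a) | set(coder_b)
p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"observed agreement={p_observed:.2f}, chance={p_chance:.2f}, kappa={kappa:.2f}")
```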
Advanced statistical approaches
As researchers delve deeper into their data, they often employ more advanced statistical techniques.
- Analysis of variance (ANOVA) can be used to compare behavior frequencies or durations across different groups or conditions (a minimal sketch follows this list). For instance, an ANOVA might reveal differences in the frequency of aggressive behaviors between children from different socioeconomic backgrounds or in different school settings.
- Multilevel modeling is particularly useful in behavioral observation studies where data is nested – for example, behaviors within individuals, individuals within groups, or observations across multiple time points. This approach allows researchers to account for dependencies in the data and to examine how behaviors might be influenced by factors at different levels (e.g., individual characteristics, group dynamics, and situational factors).
- Time series analysis is another powerful tool, especially for studies that involve continuous observation over extended periods. This method can reveal trends, cycles, or patterns in behavior over time, which might not be apparent from simpler analyses. For instance, in a study of animal behavior, time series analysis might uncover daily or seasonal patterns in feeding, mating, or territorial behaviors.
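A minimal sketch of a one-way ANOVA comparing behavior frequencies across three hypothetical groups, assuming SciPy is available; a multilevel model on nested data would typically be fit with a dedicated package rather than computed by hand.

```python
from scipy.stats import f_oneway

# Hypothetical counts of aggressive behaviors per child, grouped by school setting.
setting_a = [3, 5, 4, 6, 2, 5]
setting_b = [7, 8, 6, 9, 7, 6]
setting_c = [4, 3, 5, 4, 6, 3]

# One-way ANOVA: do mean behavior frequencies differ across the three settings?
f_stat, p_value = f_oneway(setting_a, setting_b, setting_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```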
Representation techniques
Representation techniques help organize and visualize data:
- Code-unit grid: Represents the data as a matrix with behaviors as rows and time units as columns (see the sketch after this list). This format facilitates many types of analyses and allows for easy visualization of behavioral patterns.
- Sequential Data Interchange Standard (SDIS): A standardized data format that helps ensure consistency in data representation across studies and facilitates the use of specialized analysis software.
- Indeed, the complexity of behavioral observation data often necessitates the use of specialized software tools. Programs like GSEQ, Observer, and INTERACT are designed specifically for the analysis of observational data and can perform many of the analyses described above efficiently and accurately.
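To illustrate the code-unit grid mentioned above, here is a minimal sketch representing invented data as a behaviors-by-time-units matrix, assuming NumPy; specialized programs such as GSEQ use their own file formats, so this is only a conceptual illustration.

```python
import numpy as np

behaviors = ["gaze", "smile", "vocalize"]
n_time_units = 8  # e.g., eight 15-second intervals

# Code-unit grid: rows are behavior codes, columns are time units;
# a 1 means the behavior was coded as present in that unit.
grid = np.zeros((len(behaviors), n_time_units), dtype=int)
grid[0, [0, 1, 2, 5]] = 1   # gaze in units 0-2 and 5
grid[1, [2, 3]] = 1         # smile in units 2-3
grid[2, [3, 4, 5, 6]] = 1   # vocalize in units 3-6

# Quick visualization of the behavioral pattern across time.
for name, row in zip(behaviors, grid):
    print(f"{name:9s}", "".join("█" if v else "·" for v in row))

# Row sums give how many units each behavior occupied (its relative duration).
print(dict(zip(behaviors, grid.sum(axis=1))))
```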
References
Bakeman, R., & Quera, V. (2017). Sequential analysis and observational methods for the behavioral sciences. Cambridge University Press.
Burghardt, G. M., Bartmess-LeVasseur, J. N., Browning, S. A., Morrison, K. E., Stec, C. L., Zachau, C. E., & Freeberg, T. M. (2012). Minimizing observer bias in behavioral studies: A review and recommendations. Ethology, 118(6), 511-517.
Hill, C. E., & Lambert, M. J. (2004). Methodological issues in studying psychotherapy processes and outcomes. In M. J. Lambert (Ed.), Bergin and Garfield’s handbook of psychotherapy and behavior change (5th ed., pp. 84–135). Wiley.
Lindahl, K. M. (2001). Methodological issues in family observational research. In P. K. Kerig & K. M. Lindahl (Eds.), Family observational coding systems: Resources for systemic research (pp. 23–32). Lawrence Erlbaum Associates.
Mehl, M. R., Robbins, M. L., & Deters, F. G. (2012). Naturalistic observation of health-relevant social processes: The electronically activated recorder methodology in psychosomatics. Psychosomatic Medicine, 74(4), 410–417.
Morris, A. S., Robinson, L. R., & Eisenberg, N. (2014). Applying a multimethod perspective to the study of developmental psychology. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (2nd ed., pp. 103–123). Cambridge University Press.
Smith, J. A., Maxwell, S. D., & Johnson, G. (2014). The microstructure of everyday life: Analyzing the complex choreography of daily routines through the automatic capture and processing of wearable sensor data. In B. K. Wiederhold & G. Riva (Eds.), Annual Review of Cybertherapy and Telemedicine 2014: Positive Change with Technology (Vol. 199, pp. 62-64). IOS Press.
Traniello, J. F., & Bakker, T. C. (2015). The integrative study of behavioral interactions across the sciences. In T. K. Shackelford & R. D. Hansen (Eds.), The evolution of sexuality (pp. 119-147). Springer.
Wampler, K. S., & Harper, A. (2014). Observational methods in couple and family assessment. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (2nd ed., pp. 490–502). Cambridge University Press.