Embedding the Mind: Mapping Psychological Constructs from Language

Author

Hubert Plisiecki

Ideas Research Institute

Project Description

Just as neuroscience represents brain activity in high-dimensional state spaces, Natural Language Processing (NLP) represents language in embedding spaces that encode semantic structure. By modeling how words co-occur and relate, NLP captures the geometry of meaning itself. By analyzing how individuals use language, we can approximate how the concepts they express are internally structured and how they vary across psychological traits or groups.

This insight motivated the creation of the Supervised Semantic Differential (SSD), a method that reconstructs interpretable semantic gradients from open-ended text and statistically tests whether differences in meaning align with psychological variables or distinguish groups, while explaining these shifts in a mixed-methods manner.
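To make the idea of a semantic gradient concrete, here is a minimal toy sketch in plain Python. It is not the SSD algorithm and does not use the ssdiff API. It only illustrates the general notion of a direction in embedding space that separates two groups of texts, which can then be interpreted by ranking words along it. The two-dimensional word vectors, the function names, and the averaging scheme are all assumptions made for illustration, and the statistical testing step is left out entirely.

```python
import math

# Hypothetical 2-dimensional word vectors, invented for illustration only.
# A real analysis would use pretrained embeddings with hundreds of dimensions.
TOY_VECTORS = {
    "calm": (1.0, 0.1),
    "safe": (0.9, 0.2),
    "tense": (-0.8, 0.3),
    "afraid": (-1.0, 0.1),
    "day": (0.0, 1.0),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dims))


def doc_vector(text):
    """Average the vectors of the known words in a text (None if there are none)."""
    known = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    return mean_vector(known) if known else None


def group_direction(texts_a, texts_b):
    """Unit vector pointing from group B's mean text vector towards group A's."""
    docs_a = [v for v in map(doc_vector, texts_a) if v is not None]
    docs_b = [v for v in map(doc_vector, texts_b) if v is not None]
    mean_a, mean_b = mean_vector(docs_a), mean_vector(docs_b)
    diff = tuple(a - b for a, b in zip(mean_a, mean_b))
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two groups have identical mean vectors")
    return tuple(x / norm for x in diff)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_words(direction):
    """Vocabulary sorted from most aligned with the direction to least aligned."""
    return sorted(
        TOY_VECTORS,
        key=lambda w: cosine(TOY_VECTORS[w], direction),
        reverse=True,
    )
```

Calling `group_direction` on two small lists of texts and passing the result to `rank_words` yields a word ordering that can be read as a rough description of what distinguishes the groups. The mixed-methods interpretation described above starts from a comparable ordering but rests on a proper statistical model.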

Owing to its versatility and sensitivity in small samples, SSD has been applied to questions such as how the concepts of self and time differ across personality disorders, how people conceptualize their own country as a function of collective narcissism, and how people talk about AI depending on whether they have Narcissistic Personality Disorder.

What you will do:

During Brainhack, you will use SSD to map differences in a construct of your choice, with the aim of producing your own first-authored paper. Depending on your expertise and dataset, I will guide you through the analytical pipeline and help you draft methods and results sections suitable for a computational psychology preprint.

If you have no prior experience with NLP or Python, you will be able to use a local app that implements the analysis without coding. If you are experienced, you can work directly with the PyPI package ssdiff, extend the framework to new statistical models (e.g., mixed-effects regression), or generate synthetic datasets with LLMs to stress-test the method.

The goal is to advance NLP-based computational psychology by producing rigorous, preprint-worthy analyses in parallel, and to have fun doing it.

What you will need:

Before joining a team, identify a text dataset that includes structured metadata linked to each entry. The metadata should correspond to a psychological or group-level variable of interest. This may include diaries with wellbeing scores, online comments paired with clinical diagnoses, open-ended survey responses with personality measures, political speeches labeled by affiliation, or any similar combination of text and meaningful metadata. In short, you need text plus a related variable, continuous or categorical.

The dataset should contain at least 100 entries and be written in a single language, unless you plan to extend the method cross-linguistically. Familiarity with the theoretical literature on your construct will increase the likelihood that your hackathon analysis can become a preprint-ready paper. If you cannot find a suitable dataset, I can provide options. Just contact me beforehand at hplisiecki@gmail.com.
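The dataset requirements above can be checked before the hackathon with a short script. The sketch below uses only the Python standard library and is not part of ssdiff. The column names `text` and `score` are placeholders for whatever your own file uses, and the function names are hypothetical.

```python
import csv
import io

MIN_ENTRIES = 100  # minimum dataset size stated in this description


def check_dataset(rows, text_col="text", var_col="score"):
    """Return a list of problems found; an empty list means the basic checks pass."""
    problems = []
    # An entry is usable only if it has both non-empty text and a linked variable.
    usable = [
        r for r in rows
        if (r.get(text_col) or "").strip() and (r.get(var_col) or "").strip()
    ]
    if len(usable) < MIN_ENTRIES:
        problems.append(
            f"only {len(usable)} usable entries, need at least {MIN_ENTRIES}"
        )
    if len(usable) < len(rows):
        problems.append(
            f"{len(rows) - len(usable)} rows lack text or the linked variable"
        )
    return problems


def load_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per entry."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```

To check a file on disk, read its contents as a string, pass it to `load_rows`, and hand the result to `check_dataset` together with your own column names. The script does not verify that all texts are in a single language, so that still needs a manual look.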

Project requirements

Participants should bring or identify a text dataset that contains at least 100 entries and includes structured metadata associated with each text (e.g., psychological scores, group labels, demographic variables, or other meaningful measures). The texts should be written in the same language unless you intend to explore a cross-linguistic extension of the method.

You should have a clear research question or construct of interest that you would like to investigate. Familiarity with the theoretical literature related to your chosen construct is strongly recommended.

No prior experience with NLP or programming is required, but you should be willing to engage with computational methods and statistical interpretation. If you have coding experience (Python), you may work directly with the ssdiff package or even extend it; if not, you can use a local application that implements the pipeline without coding.

Participants are expected to actively collaborate within their team and be motivated to produce preprint-ready computational analyses.

Programming languages used in this project

Python

Who are we looking for?

What profession can participants be?

The project is open to participants from diverse backgrounds, including neuroinformatics, psychology, cognitive science, computational social science, data science, linguistics, psychiatry, digital humanities, and related fields. Both early-stage students and more advanced researchers are welcome. No prior experience in NLP is required, although an interest in computational approaches to studying the mind is essential.

How many can join?

The project can accommodate 8 to 10 active participants, so that each team receives sufficient guidance and we can realistically produce preprint-ready analyses by the end of the hackathon.

What can you gain from participating?

You will learn how to apply novel NLP tools for computational psychology, and you will have the chance to develop your own research paper based on the data you bring.

Key resources

  1. Main SSD Paper: https://osf.io/preprints/psyarxiv/gvrsb_v1
  2. A paper that applies SSD to the study of personality disorders: https://osf.io/preprints/psyarxiv/r8y6b_v1
  3. ssdiff GitHub repository: https://github.com/hplisiecki/Supervised-Semantic-Differential