Psychedelic Data Science
Advent of Open Source - Day 18/24: A fun project analyzing vocabulary richness in psychedelic trip reports.
(See my intro post)
Note: I'm running out of awesome projects and want to save the best for last, so today is just something fun.
While preparing for this advent calendar, I browsed through my 382 GitHub repositories and rediscovered this time capsule from 8 years ago. It explores the language of psychedelic experiences through data science - with an interesting hypothesis: do psychedelic trip reports use a richer vocabulary than reports about other substances?
Origin Story
Back in 2015, when I was still “young” (before back pain and two-day hangovers became a thing), I was learning data science and natural language processing. I had an amusing hypothesis: surely people describing their psychedelic experiences would use richer vocabulary than those writing about stimulants - I mean, who hasn’t pondered the linguistic complexity of “everything is connected” versus “I cleaned my entire apartment at 4 AM”? Erowid.org, with its thousands of detailed first-person narratives across different substances, provided the perfect dataset to test this theory.
Technical Highlights
- Natural language processing of experience reports
- Vocabulary richness analysis across substance categories
- TF-IDF vectorization for text analysis
- Support Vector Machine classification of experience types
- Web scraping with BeautifulSoup
- K-means clustering for discovering common themes
- Word cloud generation to visualize vocabulary differences
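To make the TF-IDF item above concrete, here is a minimal pure-Python sketch of the idea. It is not the project's code: the `tokenize` and `tfidf` helpers and the three sample sentences are made up for illustration, and the weighting is the plain textbook form (relative term frequency times log of inverse document frequency), whereas libraries such as scikit-learn add smoothing and normalization on top.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Weight = relative term frequency * log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up stand-ins for real reports.
reports = [
    "everything is connected and the colors are breathing",
    "i cleaned my entire apartment and then cleaned it again",
    "the walls are breathing and everything is alive",
]
for weights in tfidf(reports):
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print([term for term, _ in top])
```

Words that show up in every report (like "and") get a weight of zero, while words concentrated in one report float to the top. That is what makes the vectors useful as input for a classifier or for clustering.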
Impact
- A learning project that helped me understand:
  - How different experiences shape language use
  - Processing subjective experience narratives
  - Document classification techniques
  - The challenges of quantifying vocabulary richness
- 10 GitHub stars
Challenges and Solutions
- Implementing respectful web scraping
- Controlling for report length and education level
- Creating meaningful metrics for vocabulary richness
- Visualizing linguistic patterns across categories
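The report-length problem above is easy to show in code. The simplest richness metric, the type-token ratio (unique words divided by total words), shrinks as a text gets longer, so long reports look "poorer" than short ones. One common workaround is a moving-average type-token ratio computed over fixed-size windows. The sketch below illustrates that technique under my own assumptions (the helper names and the default window size are hypothetical); it is not necessarily the metric the project used, and it does nothing about education level.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Unique words divided by total words; drops as texts get longer."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def moving_average_ttr(tokens, window=50):
    """Average the type-token ratio over every window of `window` tokens.

    Each window has the same size, so reports of different lengths become
    comparable. Texts no longer than the window fall back to plain TTR.
    """
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Same four words, once and repeated 25 times (made-up example text).
short_report = tokenize("the walls are breathing")
long_report = short_report * 25

print(type_token_ratio(short_report), type_token_ratio(long_report))
print(moving_average_ttr(short_report, window=4),
      moving_average_ttr(long_report, window=4))
```

Both texts use exactly the same vocabulary, yet plain TTR scores the longer one far lower. The windowed version gives them the same score, which is the behavior you want when comparing reports of very different lengths.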
Lessons Learned
- Putting things on GitHub makes it less likely that forgotten projects get lost
- Early data science projects often reveal our initial fascinations
- Web scraping requires both technical skill and ethical consideration
- Text analysis tools have evolved dramatically since 2015
- Sometimes the most interesting projects are the ones you almost forgot about
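On the scraping lesson above: one way to approach the ethical side is to check a site's robots.txt and pause between requests. Here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot name, and the `polite_urls` helper are all hypothetical, and the actual fetching and BeautifulSoup parsing are left out. It only shows the "should I fetch this, and how long should I wait" part, not how the original project did it.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not copied from any real site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

USER_AGENT = "trip-report-research-bot"  # hypothetical bot name


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(parser, urls, default_delay=2.0, sleep=time.sleep):
    """Yield only the URLs the rules allow, pausing between them."""
    # Use the site's requested crawl delay if it states one.
    delay = parser.crawl_delay(USER_AGENT) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(USER_AGENT, url):
            continue  # Skip anything the site asks bots not to fetch.
        if not first:
            sleep(delay)
        first = False
        yield url
```

In a real scraper you would point the parser at the live file with `set_url()` and `read()` instead of a string, and fetch each yielded URL before moving on. The `sleep` parameter is only there so the pause can be swapped out in tests.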
Want to explore this intersection of psychedelics, language, and data science? Check out the project on GitHub!
#OpenSource #Python #DataScience #NLP #MachineLearning