An open‑source interface for preprocessing, anonymizing, and securing multi‑source training data for humanoid robots
Author
Bartosz Kostarczyk
University of Warsaw
Project Description
CogniCap Studio — Open-Source Anonymization & Preprocessing Interface for Humanoid Robot Training Data
Abstract
CogniCap Studio will be an open-source, locally run application giving researchers and data collectors a visual interface to review, trim, and anonymize raw multimodal footage before it leaves their device — ensuring that real-world physical skill data can be collected ethically and shared safely for humanoid robot training.
Hypothesis
The primary bottleneck in embodied AI is not compute — it is the lack of clean, privacy-compliant, real-world sensorimotor data. We hypothesize that by removing the technical barrier to safe data collection and anonymization, CogniCap Studio can accelerate the creation of physical AI datasets the same way early annotation tools accelerated computer vision research.
The Problem
Teaching robots physical skills requires recording real humans in real environments — workshops, kitchens, clinics. This footage inevitably captures faces, documents, and other identifying information. Without accessible tooling to handle this, valuable training data goes uncollected or unusable due to privacy risk.
The Solution
During the hackathon, a working prototype of CogniCap Studio will be built around three modules:
Ingest & Review — load stereo video footage and optional sensor streams (accelerometer CSV), preview them on a frame-accurate timeline with variable playback speed, and trim unwanted scenes.
Smart Anonymization — automatic face detection on import via MediaPipe. For other sensitive objects, the operator draws one bounding box; Meta’s SAM2 then tracks and masks it automatically across the entire clip.
Secure Export — cleaned footage exported in robotics-compatible formats (MP4 + CSV or HDF5), ready for downstream processing and training pipelines.
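The masking step of Smart Anonymization can be sketched independently of the detection models. Below is a minimal Python helper that pixelates a rectangular region of a frame; in the actual pipeline the bounding box would come from MediaPipe face detection or SAM2 mask tracking, which are assumed upstream here, not implemented (the function name and signature are illustrative, not project code):

```python
import numpy as np

def pixelate_region(frame: np.ndarray, box: tuple, block: int = 16) -> np.ndarray:
    """Irreversibly pixelate a rectangular region of a video frame.

    frame: H x W x 3 uint8 image; box: (x, y, w, h) in pixel coordinates.
    Each block-sized tile inside the box is replaced by its mean color,
    so the original pixels cannot be recovered from the exported footage.
    """
    x, y, w, h = box
    out = frame.copy()
    for ty in range(y, y + h, block):
        for tx in range(x, x + w, block):
            y2, x2 = min(ty + block, y + h), min(tx + block, x + w)
            tile = out[ty:y2, tx:x2]
            out[ty:y2, tx:x2] = tile.mean(axis=(0, 1), keepdims=True).astype(out.dtype)
    return out
```

Tile-averaging is a deliberately destructive choice: unlike a light Gaussian blur, which can sometimes be partially inverted, the averaged tiles discard the information needed to reconstruct a face.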
Equipment
Dual-webcam stereo rig and a standard laptop running the full pipeline locally, with pre-recorded footage as the primary demo asset.
Why It Matters
Sensorimotor data — how humans see and physically interact with the world — is the missing training resource for the next generation of robots. CogniCap Studio will provide the secure, privacy-first client layer that makes collecting this data at scale realistic for anyone, without requiring data engineering expertise or cloud infrastructure.
Project requirements
Requirements for Collaborators
This project sits at the intersection of computer vision, privacy engineering, and robotics data infrastructure. No prior robotics experience is needed — the right mindset matters more than a specific background.
Education: Open to students of any level, primarily CS, AI/ML, Physics, or related fields. Non-technical backgrounds welcome if the skillset fits.
Language: Polish (primary working language). For Polish speakers, English at B1+ is sufficient for reading documentation and technical resources; non-Polish speakers need English at B2+ to communicate effectively with the team.
Must-have:
- Comfort using AI-assisted coding tools (Cursor, GitHub Copilot, Claude Code, or similar) — this is how we will move fast
- Strong analytical thinking and a “digger” mindset — ability to break down an unfamiliar problem, research it, and push through until it works
- Ability to own and understand the code you produce with AI help, not just paste and pray
Valuable experience (not required but a big plus):
- Python (OpenCV, FastAPI, MediaPipe, NumPy)
- React or any frontend framework
- Video processing (FFmpeg)
- ML model integration or working with pretrained models
- Data pipelines or file format handling (HDF5, CSV)
Mindset above all:
We are building a working demo in 48 hours. The ideal collaborator is someone who doesn’t panic when something breaks, knows how to ask the right question to an AI tool, and can stay in control of what the codebase is actually doing.
Programming languages used in this project
Python and JavaScript (React)
What can you gain from participating?
Hands-on experience building a real, deployable open-source tool — not a toy project.
Specifically:
- Practical integration of state-of-the-art vision models (SAM2, MediaPipe) into a production-like pipeline
- Experience with multimodal data (synchronized video + sensor streams) — a niche but increasingly valuable skill in the AI job market
- Full-stack development under time pressure: FastAPI backend + React frontend
- Direct exposure to the embodied AI and humanoid robotics ecosystem — one of the fastest-growing fields in tech
- A concrete, GitHub-publishable project with real-world applicability
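To make the multimodal point concrete: the core synchronization problem is aligning sensor samples (e.g. accelerometer rows from a CSV) to video frames by timestamp. A minimal sketch, assuming both streams carry ascending timestamps on a shared clock (the function name and arrays are illustrative, not part of the project):

```python
import numpy as np

def nearest_sensor_index(frame_ts: np.ndarray, sensor_ts: np.ndarray) -> np.ndarray:
    """For each video-frame timestamp, return the index of the nearest
    sensor sample. Both arrays must be sorted ascending on the same clock."""
    idx = np.searchsorted(sensor_ts, frame_ts)   # insertion points
    idx = np.clip(idx, 1, len(sensor_ts) - 1)    # keep idx - 1 in range
    left, right = sensor_ts[idx - 1], sensor_ts[idx]
    idx -= (frame_ts - left) < (right - frame_ts)  # snap to the closer side
    return idx
```

In a real pipeline the video timestamps would come from the container's presentation timestamps (e.g. via FFmpeg) rather than assumed arrays, and clock drift between independently recording devices would need its own correction step.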
Key resources
- Humanoid Everyday https://arxiv.org/abs/2510.08807
- SAM 3: Segment Anything with Concepts https://arxiv.org/abs/2511.16719
