Robot Self-Awareness

What, exactly, stays the same when a robot learns to do something new?

Humans do not relearn their bodies from scratch every time the goal changes. We may learn a new skill, adapt to a new environment, or recover from a mistake, but some internal sense of the body carries forward. This project asks whether something similar can emerge inside a learned robot controller.

In this work, we trained a single robot across multiple behaviors and then looked inside the neural policy to test whether a stable, reusable internal subnetwork emerged—something closer to a persistent sense of the robot’s own body than a one-off task controller.

Visualization of the persistent self across learned robot behaviors

1 · Why This Matters / The Problem

Most robot learning systems are judged by what they can do: walk, turn, jump, manipulate, recover. But there is a deeper question underneath that performance: when a robot learns several behaviors over time, does it only store task-specific control tricks, or does it also build a more persistent internal understanding of its own body?

That distinction matters for both science and engineering. Scientifically, it gives a concrete way to study whether something self-like can emerge inside a learning system without explicitly programming it in. From an engineering perspective, it points toward controllers that reuse stable body knowledge while only rewriting the parts that need to change for a new objective.

In other words: to learn continuously, a robot should not have to rewrite what it is every time it changes what it is doing.

2 · Main Results

We trained a simulated quadruped in a continual-learning curriculum across three distinct behaviors: walk, wiggle, and bob. We then compared its internal neural structure against a control policy trained on a constant task.

The core result was that continual learning produced a stable, self-like subnetwork: a subset of the controller that remained markedly more persistent across behavior changes, while the surrounding regions reorganized substantially to implement the current behavior.

Across seeds, the continual-learning agent showed a mean self-versus-task separation of about 16.9 percentage points, while the constant-task baseline showed much weaker separation and far smaller stable subnetworks.
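
To make the "percentage points" framing concrete, here is a minimal sketch of how a separation score of this kind could be computed from per-seed stability scores. The function name and exact definition are illustrative assumptions, not the project's actual metric.

```python
import numpy as np

def self_task_separation(self_stability, task_stability):
    """Illustrative separation metric (hypothetical, not the paper's exact one).

    self_stability: per-seed stability of the candidate self-like region, in [0, 1].
    task_stability: per-seed stability of the task-specific remainder, in [0, 1].
    Returns the mean gap in percentage points.
    """
    gap = np.asarray(self_stability) - np.asarray(task_stability)
    return 100.0 * gap.mean()
```

A large positive value means the candidate self region is systematically more stable than the rest of the network across seeds.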

Just as importantly, this was not an artifact of one lucky run or a single visualization. The same pattern held across cycles, across seeds, and across architectural variations, with the strongest self-like structure emerging early in the network and then stabilizing over training.

Quantitative comparison showing persistent self-like subnetwork under continual learning

3 · How We Approached It

Train one policy across multiple behaviors. Rather than building separate controllers, we trained a single robot policy through a repeating curriculum of distinct behaviors. That setup pressures the network to retain whatever body-related structure is useful across tasks.
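
The curriculum structure can be sketched as a simple loop. All names here (`train_phase`, `n_cycles`) are illustrative assumptions; the point is that one policy object persists across every behavior phase.

```python
# Minimal sketch of a repeating behavior curriculum (hypothetical API).
BEHAVIORS = ["walk", "wiggle", "bob"]

def train_curriculum(policy, train_phase, n_cycles=3):
    """Train a single policy through repeated cycles of distinct behaviors.

    Because the same policy is reused in every phase, the network is under
    pressure to retain whatever body-related structure transfers across tasks.
    """
    history = []
    for cycle in range(n_cycles):
        for behavior in BEHAVIORS:
            reward = train_phase(policy, behavior)  # one RL training phase
            history.append((cycle, behavior, reward))
    return history
```
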

Probe the policy internally, not just behaviorally. Reward curves can tell you whether the robot improves. They do not tell you what the network is actually representing. We therefore analyzed hidden-layer activations across shared reference states to see which parts of the policy stayed stable and which parts reorganized.
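
A standard way to do this kind of internal probing in PyTorch is a forward hook that records hidden activations on a fixed batch of reference states; the sketch below shows the pattern (the specific layer selection is an assumption, not the project's exact instrumentation).

```python
import torch
import torch.nn as nn

def capture_activations(policy, reference_states):
    """Record hidden-layer activations on a fixed batch of reference states.

    Reusing the same reference states for every checkpoint and behavior makes
    the resulting activation patterns directly comparable across phases.
    """
    activations = {}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach()
        return hook

    # Hook every Linear layer (illustrative choice of which layers to probe).
    for name, module in policy.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        policy(reference_states)

    for h in hooks:
        h.remove()  # always detach hooks so later forwards are unaffected
    return activations
```

Comparing these activation snapshots across behaviors, rather than reward curves, is what distinguishes which parts of the policy stay stable and which reorganize.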

Search for persistent internal structure. We grouped neurons by co-activation structure, matched them across behaviors, and measured persistence over time. That let us separate the candidate self-like region from the more task-sensitive remainder of the network.
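
The steps above can be sketched as follows, using plain k-means on neuron co-activation rows and a pair-agreement persistence score as stand-ins for the actual clustering and matching procedures (all details here are illustrative assumptions).

```python
import numpy as np

def coactivation_clusters(acts, n_clusters=4, seed=0):
    """Group neurons by correlation of their activations across states.

    acts: (n_states, n_neurons) activation matrix for one behavior phase.
    Returns one cluster label per neuron. Plain k-means on correlation rows
    is a stand-in for whatever clustering the real pipeline uses.
    """
    corr = np.corrcoef(acts.T)  # neuron-by-neuron co-activation matrix
    rng = np.random.default_rng(seed)
    centers = corr[rng.choice(len(corr), n_clusters, replace=False)]
    for _ in range(20):  # fixed number of k-means iterations
        dists = ((corr[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = corr[labels == k].mean(0)
    return labels

def persistence(labels_a, labels_b):
    """Fraction of neuron pairs whose 'same cluster or not' status agrees
    between two phases: a crude Rand-index-style stability score that is
    invariant to how clusters happen to be numbered."""
    same_a = labels_a[:, None] == labels_a[None, :]
    same_b = labels_b[:, None] == labels_b[None, :]
    return (same_a == same_b).mean()
```

Neurons whose cluster membership persists across behaviors and cycles are the candidates for the self-like region; the rest form the task-sensitive remainder.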

Overview of the robot training and analysis pipeline

4 · Why We Think the Result Is Interesting

The point is not that the robot is “self-aware” in the human sense. The interesting result is narrower and, to us, more useful: under continual learning, a standard deep RL policy can spontaneously organize into a stable internal core alongside more behavior-specific components.

That suggests there may be a practical way to identify reusable body-related structure inside black-box controllers—without hand-designing a separate self-model. If that holds more broadly, it could matter for continual adaptation, transfer, debugging, and monitoring whether fine-tuning is preserving the right internal structure versus overwriting it.

More broadly, we like this project because it sits at an intersection we care a lot about: reinforcement learning, robotics, and interpretability. It is not just about making the robot move. It is about understanding what the controller has actually learned.

Robustness and persistence of self-like subnetworks across cycles

5 · Under the Hood

We built a broad stack around this project: RL training design, reproducible experiment infrastructure, checkpointing and resume tools, rollout/video tooling, and a full analysis pipeline for comparing internal network structure across runs, phases, and behaviors.

A big accelerator was Isaac-based vectorized simulation, which made it possible to iterate quickly and run large batches of experiments with far better reproducibility than a slower single-environment loop would allow.

On the engineering side, the stack is Python, PyTorch, Isaac simulation tooling, experiment configs, Bash/tmux, Git, and a substantial amount of analysis infrastructure for turning raw checkpoints into interpretable results and publication-ready figures.

6 · Acknowledgements

This research was conducted in the Creative Machines Lab, Columbia University, with guidance from Professor Hod Lipson and Judah Goldfeder.

Questions or ideas? aj3337@columbia.edu