Khurram Javed

I make systems that learn in real time from experience. Currently, I work in a small team led by John Carmack. Previously, I developed efficient reinforcement learning algorithms with Richard S. Sutton. I also participated in the 55th International Mathematical Olympiad (Honorable Mention) and the XXVI Asian Pacific Mathematical Olympiad (Bronze Medal).

My research is driven by the big world hypothesis (Javed & Sutton, 2024) [PDF, Talk]: the idea that no matter how large and complex our agents become, they will always be small compared to the world they interact with. Some consequences of the big world hypothesis are that no amount of prior learning is sufficient, that continual learning is the only way to maintain strong performance, and that computationally efficient learning algorithms are essential.

Selected Papers, Talks, and Articles

The Current Crop of AI Startups is not Prepared for Big Worlds

[article]

Khurram Javed

SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning

Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton

SwiftTD is a TD learning algorithm that can learn to assign credit to important signals. An algorithm like SwiftTD will be a key ingredient for robust real-time learning from rich and noisy sensory streams.
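To make the setting concrete, here is a minimal sketch of the linear TD(λ) update that SwiftTD builds on. It is an illustration only, not the full SwiftTD algorithm: the per-feature step sizes alpha are held fixed here, whereas adapting them online so that important signals receive larger updates is SwiftTD's contribution.

```python
import numpy as np

def td_lambda_step(w, z, alpha, x, x_next, r, gamma, lam):
    """One step of linear TD(lambda) with per-feature step sizes.

    w         : weight vector of the linear value function
    z         : eligibility trace vector
    alpha     : per-feature step sizes (fixed here; SwiftTD adapts
                these online, which is the simplification made)
    x, x_next : feature vectors for the current and next observation
    r         : reward received on the transition
    gamma     : discount factor
    lam       : trace-decay parameter
    """
    delta = r + gamma * np.dot(w, x_next) - np.dot(w, x)  # TD error
    z = gamma * lam * z + x                               # accumulate traces
    w = w + alpha * delta * z                             # per-feature update
    return w, z
```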

The Big World Hypothesis and its Ramifications for AI

[article]

Khurram Javed, Richard S. Sutton

Scalable Real-time Recurrent Learning using Columnar-constructive Networks

[paper]

Khurram Javed, Haseeb Shah, Richard S. Sutton, Martha White

An algorithm for backprop-free recurrent learning. Learning an effective agent state is essential for learning in big worlds, and some variant of efficient recurrent learning that does not require backpropagation through time is essential for learning that state. This paper shows that backprop-free recurrent learning is possible.
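As a hint of why this is feasible, below is a minimal sketch (not the columnar-constructive algorithm from the paper; the names are illustrative) of real-time recurrent learning for a single recurrent unit. With one unit, the RTRL sensitivities cost about as much to carry forward as the weights themselves, so a gradient is available at every step without backpropagating through time.

```python
import numpy as np

class RTRLUnit:
    """A single recurrent unit trained with real-time recurrent
    learning (RTRL). For one unit the sensitivities are cheap to
    carry forward, so no backpropagation through time is needed."""

    def __init__(self, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.w_x = rng.normal(scale=0.1, size=n_inputs)  # input weights
        self.w_h = 0.0                                   # recurrent weight
        self.h = 0.0                                     # hidden state
        # Sensitivities of h w.r.t. each weight, carried forward in time.
        self.dh_dwx = np.zeros(n_inputs)
        self.dh_dwh = 0.0

    def step(self, x):
        """Advance the state one step and update the sensitivities."""
        a = self.w_h * self.h + self.w_x @ x
        h_new = np.tanh(a)
        g = 1.0 - h_new ** 2                  # tanh'(a)
        # Recursive sensitivity updates: the heart of RTRL.
        self.dh_dwh = g * (self.h + self.w_h * self.dh_dwh)
        self.dh_dwx = g * (x + self.w_h * self.dh_dwx)
        self.h = h_new
        return h_new

    def update(self, grad_h, lr=0.01):
        """Apply a gradient of the loss w.r.t. the current state."""
        self.w_h -= lr * grad_h * self.dh_dwh
        self.w_x -= lr * grad_h * self.dh_dwx
```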

Meta-learning Representations for Continual Learning

[paper]

Khurram Javed, Martha White

Learning online from a stream requires representations that are suitable for online updating. This paper shows that such representations exist. The method used to learn them here, gradient-based meta-learning with the gradient computed by backpropagation, scales poorly, but alternatives that scale better exist.
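For illustration, here is a minimal PyTorch sketch (all names hypothetical) of this kind of objective: an inner loop updates a linear head online, one example at a time, and an outer loop trains the encoder so that its representation makes those online updates effective. Backpropagating through the entire inner loop is what makes the approach scale poorly.

```python
import torch

def meta_step(encoder, head_w, trajectory, test_x, test_y,
              meta_opt, inner_lr=0.01):
    """One meta-training step: online inner loop, meta outer loop."""
    w = head_w.clone()                        # fast weights for the head
    for x, y in trajectory:                   # online inner loop
        loss = (encoder(x) @ w - y).pow(2).mean()
        (g,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - inner_lr * g                  # differentiable update

    # Meta-loss: how well the online-updated head does on held-out data.
    meta_loss = (encoder(test_x) @ w - test_y).pow(2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()       # backprop through the whole inner loop
    meta_opt.step()
    return meta_loss.item()

# Usage sketch:
#   encoder = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.ReLU(),
#                                 torch.nn.Linear(32, 8))
#   head_w = torch.zeros(8, requires_grad=True)
#   meta_opt = torch.optim.Adam(list(encoder.parameters()) + [head_w], lr=1e-3)
```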