I am a passionate AI researcher with a strong background in computer vision and robotics, currently pursuing my PhD in Electrical and Computer Engineering at the University of Southern California. My research interests include deep reinforcement learning, machine learning, and robotics, and I strongly believe in the power of AI to improve the quality of life of people around the world.
May 2024 - August 2025
Redmond, WA
Conducted applied research on autonomous vehicles as part of the applied research team.
May 2025 - August 2025
May 2024 - December 2024
Sept 2021 - Present
Los Angeles, CA
RESL is a research lab at the University of Southern California, headed by Prof. Gaurav Sukhatme and focused on robotics and AI research.
Sept 2021 - Present
May 2021 - August 2021
Los Angeles, CA
This company investigates the application of AI in the field of sales.
May 2021 - August 2021
May 2020 - May 2021
Los Angeles, CA
SSLL is a research lab at the University of Southern California, headed by Prof. Rahul Jain and focused on theoretical reinforcement learning.
May 2020 - May 2021
November 2019 - October 2020
Los Angeles, CA
DRCL is a research lab at the University of Southern California, headed by Prof. Quan Nguyen and focused on robotic control.
November 2019 - October 2020
June 2016 - July 2019
Bangalore, India
Fidelity Investments is a renowned financial institution that specializes in investment management, retirement planning, portfolio guidance, brokerage, benefits outsourcing, and many other financial products and services.
July 2017 - July 2019
June 2016 - August 2016
May 2014 - June 2015
Mangalore, India
Laboratory of Applied Biology, Kuppers Biotech Unit.
May 2014 - June 2015
Currently, large generalized policies are trained to predict controls or trajectories using diffusion models, which have the desirable property of learning multimodal action distributions. However, generalizability comes at a cost, namely larger model size and slower inference. This is especially problematic for robotic tasks that require high control frequency. Further, there is a known trade-off between performance and action horizon for Diffusion Policy (DP), a popular model for generating trajectories: fewer diffusion queries accumulate greater trajectory tracking errors. For these reasons, it is common practice to run these models at high inference frequency, subject to robot computational constraints. To address these limitations, we propose Latent Weight Diffusion (LWD), a method that uses diffusion and a world model to generate closed-loop policies (weights for neural policies) for robotic tasks, rather than generating trajectories. Learning the behavior distribution in parameter space rather than trajectory space offers two key advantages: longer action horizons (fewer diffusion queries) with robustness to perturbations while retaining high performance, and a lower inference compute cost. To this end, we show that LWD has higher success rates than DP when the action horizon is longer and when stochastic perturbations exist in the environment. Furthermore, LWD achieves multitask performance comparable to DP while requiring roughly 1/45th of the inference-time FLOPS per step.
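A minimal sketch of the core idea of diffusing over policy parameters rather than trajectories is shown below. The MLP denoiser, parameter dimension, noise schedule, and training loop are illustrative assumptions, not the actual LWD architecture.

```python
# Sketch: DDPM-style diffusion over flattened policy parameter vectors.
# All sizes and the denoiser architecture are illustrative assumptions.
import torch
import torch.nn as nn

PARAM_DIM = 4096          # assumed size of a flattened policy weight vector
T = 1000                  # number of diffusion steps

betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(  # predicts the noise added to the parameter vector
    nn.Linear(PARAM_DIM + 1, 1024), nn.SiLU(),
    nn.Linear(1024, 1024), nn.SiLU(),
    nn.Linear(1024, PARAM_DIM),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def train_step(policy_params: torch.Tensor) -> float:
    """policy_params: (B, PARAM_DIM) batch of flattened expert policy weights."""
    B = policy_params.shape[0]
    t = torch.randint(0, T, (B,))
    noise = torch.randn_like(policy_params)
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    noisy = a_bar.sqrt() * policy_params + (1 - a_bar).sqrt() * noise
    # condition the denoiser on the (normalized) timestep
    inp = torch.cat([noisy, t.float().unsqueeze(-1) / T], dim=-1)
    loss = ((denoiser(inp) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At inference time the reverse diffusion process yields a fresh weight vector that is loaded into a small policy network and run closed-loop, so no further diffusion queries are needed within the action horizon.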
We propose the use of latent space generative world models to address the covariate shift problem in autonomous driving. A world model is a neural network capable of predicting an agent’s next state given past states and actions. By leveraging a world model during training, the driving policy effectively mitigates covariate shift without requiring an excessive amount of training data. During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations, so that at runtime it can recover from perturbations outside the training distribution. Additionally, we introduce a novel transformer-based perception encoder that employs multi-view cross-attention and a learned scene query. We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing in the CARLA simulator, as well as showing the ability to handle perturbations in both CARLA and NVIDIA’s DRIVE Sim.
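The sketch below illustrates the latent world model idea in its simplest form: a recurrent dynamics model that predicts the next latent state from the current latent and action, trained to match the encoded real next state. The GRU cell, dimensions, and loss are illustrative assumptions and not the architecture used in the paper.

```python
# Sketch of a latent-space world model: one-step prediction of the next
# latent state given the current latent and action. Sizes are assumptions.
import torch
import torch.nn as nn

LATENT, ACTION = 256, 2

class LatentDynamics(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = nn.GRUCell(ACTION, LATENT)

    def forward(self, latent, action):
        # predict the next latent state from (latent, action)
        return self.cell(action, latent)

dynamics = LatentDynamics()
mse = nn.MSELoss()

def world_model_loss(latents, actions):
    """latents: (T+1, B, LATENT) encoded observations; actions: (T, B, ACTION)."""
    loss, z = 0.0, latents[0]
    for t in range(actions.shape[0]):
        z = dynamics(z, actions[t])           # imagined rollout in latent space
        loss = loss + mse(z, latents[t + 1])  # match the encoded real next state
    return loss / actions.shape[0]
```

During policy training, imagined rollouts through such a model can be pulled back toward states seen in human demonstrations, which is how the covariate shift mitigation described above is realized.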
We present a protocol using machine learning (ML) to simultaneously optimize the quantum error-correcting code space and the corresponding recovery map in the framework of continuous-time quantum error correction. Given a Hilbert space and a noise process – potentially correlated across both space and time – the protocol identifies the optimal recovery strategy, measured by the average logical state fidelity. This approach enables the discovery of recovery schemes tailored to arbitrary device-level noise.
Quantum error correction remains a practical challenge in quantum computation, particularly in measuring high-weight stabilizers and decoding the error syndrome to find recovery operators. We propose a technique to actively maintain a quantum stabilizer code state in the codespace even under the influence of decoherence. Our protocol uses continuous measurements of operators from the stabilizer algebra to perform Hamiltonian corrections. The measurement operators and the correction strengths are provided by a reinforcement learning agent. We process the measurement data by first applying an exponential averaging filter and then stacking the previous measurement outcomes before sending them to the agent, which returns the correction strengths and the subsequent measurement operators. We demonstrate that this protocol can evolve any unknown quantum state into a stabilizer code state and maintain it within the codespace. This technique is particularly useful since it scales to higher-dimensional quantum stabilizer codes.
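The measurement preprocessing described above is simple to sketch: an exponential averaging filter over the noisy continuous measurement record, with the last few filtered outcomes stacked into the RL agent's observation. The filter constant and stack depth below are illustrative assumptions.

```python
# Sketch of the observation pipeline: exponential averaging of continuous
# measurement records, stacked over time for the RL agent. Constants assumed.
import numpy as np
from collections import deque

ALPHA = 0.1   # exponential averaging constant (assumed)
K = 8         # number of past filtered outcomes stacked (assumed)

class MeasurementFilter:
    def __init__(self, n_ops):
        self.ema = np.zeros(n_ops)
        self.history = deque([np.zeros(n_ops)] * K, maxlen=K)

    def update(self, raw_record):
        # raw_record: noisy continuous measurement of the chosen stabilizer operators
        self.ema = ALPHA * raw_record + (1 - ALPHA) * self.ema
        self.history.append(self.ema.copy())
        # stacked observation handed to the RL agent, which then returns the
        # Hamiltonian correction strengths and the next measurement operators
        return np.concatenate(self.history)
```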
We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than commonly used networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency, and give the user the choice of a network architecture that satisfies their computational constraints.
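The sketch below shows the hypernetwork idea in a heavily simplified, non-graph form: an embedding of the target architecture is mapped to the weights of a small policy MLP, which is then evaluated functionally. The sizes and the single-hidden-layer target are assumptions; HyperPPO itself uses graph hypernetworks over many architectures simultaneously.

```python
# Simplified hypernetwork sketch (not the actual HyperPPO graph hypernetwork).
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS, ACT, HID, EMB = 8, 2, 16, 32

class HyperNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.arch_embed = nn.Embedding(4, EMB)   # 4 candidate architecture IDs (assumed)
        n_params = HID * OBS + HID + ACT * HID + ACT
        self.head = nn.Linear(EMB, n_params)

    def forward(self, arch_id, obs):
        # arch_id: scalar LongTensor selecting one candidate architecture
        w = self.head(self.arch_embed(arch_id))
        i = 0
        w1 = w[i:i + HID * OBS].view(HID, OBS); i += HID * OBS
        b1 = w[i:i + HID]; i += HID
        w2 = w[i:i + ACT * HID].view(ACT, HID); i += ACT * HID
        b2 = w[i:i + ACT]
        # evaluate the generated policy MLP with the predicted weights
        return F.linear(torch.tanh(F.linear(obs, w1, b1)), w2, b2)
```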
In this work, we propose using diffusion models to distill a dataset of policies into a single generative model over policy parameters. We show that our method achieves a compression ratio of 13x while recovering 98% of the original rewards and 89% of the original coverage. Furthermore, the conditioning mechanism of diffusion models allows for flexibly selecting and sequencing behaviors using language.
We leverage graph hypernetworks to learn graph hyperpolicies trained with off-policy reinforcement learning, resulting in networks that are two orders of magnitude smaller than commonly used networks yet encode policies comparable to those encoded by much larger networks trained on the same task.
In this work, we focus on the combination of analytical and learning-based techniques to help researchers solve challenging robot locomotion problems. Specifically, we explore the combination of curricular trajectory optimization (CTO) and deep reinforcement learning (RL) for quadruped hurdling tasks.
We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations. We study the performance of different model architectures in a series of tasks that require the agent to recognize sounds and execute instructions given in natural language. Finally, we train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
We present RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. Both the policy and value functions are represented with randomized networks. We also give finite time guarantees on the performance of the algorithm.
This paper evaluates the use of DDPG to solve the problem of risk-aware portfolio construction. Simulations are performed on a portfolio of twenty stocks, and both rate of return and the Sortino ratio are evaluated as measures of portfolio performance. Results are presented that demonstrate the effectiveness of DDPG for risk-aware portfolio construction.
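For reference, the Sortino ratio penalizes only downside deviation, unlike the Sharpe ratio. A minimal sketch is below; the target return and the way the ratio is folded into the DDPG reward are assumptions for illustration.

```python
# Sketch of the Sortino ratio as a risk-aware performance measure.
import numpy as np

def sortino_ratio(returns, target=0.0):
    """returns: array of periodic portfolio returns; target: minimum acceptable return."""
    excess = returns - target
    downside = excess[excess < 0]
    # only negative deviations from the target contribute to the risk term
    downside_dev = np.sqrt(np.mean(downside ** 2)) if downside.size else 1e-8
    return excess.mean() / downside_dev
```

In a risk-aware RL setup, the periodic reward can be derived from this ratio over a rolling window of portfolio returns rather than from raw rate of return alone.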
Microalgae are an alternative source of renewable energy to overcome the energy crisis caused by the exhaustion of fuel reserves. Algal biofuel technology demands a cost-effective strategy for net profitable productivity. Inconsistent illumination intensities hinder microalgal growth, and the light-utilizing efficiency of the cells is critical: light scarcity leads to low production, while high intensities cause photo-inhibition. We report the effective use of LEDs of different band wavelengths on the growth of microalgae in a closed, controlled environment to generate biomass and lipid yields. Among the different intensities and wavelengths tested, a 500 lx blue-red combination gave maximum biomass in terms of cell density. Red LED light at 220 lx doubled the lipid dry weight from 30% (w/w) under white light to 60% (w/w). A thin-layer lipid chromatogram demonstrated a dense and prominent spot of triacylglycerols in the cultures grown under red light at 220 lx. The FTIR profile indicates that exposure to different wavelengths did not alter the functional groups or change the chemical composition of the extracted lipids, ensuring the quality of the product. We reiterate that a combination of red and blue LEDs is favoured over white light illumination for generation of biomass. In addition, we report the exciting finding that exposure to red LEDs after biomass generation leads to enhanced lipid production. This simple process doubled the lipid content harvested in a 20-day culture period.
Apply a BERT sentence transformer to encode the abstracts of hundreds of papers, then compute the cosine similarity between each encoding and the encodings of topic definitions to rank and tag the papers.
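A minimal sketch of this pipeline with the sentence-transformers library is below; the specific model name, topic strings, and abstracts are placeholder assumptions.

```python
# Sketch: encode abstracts and topic definitions, rank papers per topic by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed BERT-based sentence encoder

abstracts = ["...abstract of paper 1...", "...abstract of paper 2..."]
topics = {
    "reinforcement learning": "Papers on agents learning behaviors from reward signals.",
    "computer vision": "Papers on image and video understanding.",
}

abstract_emb = model.encode(abstracts, convert_to_tensor=True)
for topic, definition in topics.items():
    topic_emb = model.encode(definition, convert_to_tensor=True)
    scores = util.cos_sim(topic_emb, abstract_emb)[0]      # similarity to every abstract
    ranking = scores.argsort(descending=True)              # most relevant papers first
    print(topic, [(int(i), float(scores[i])) for i in ranking])
```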
As a part of the Autonomous Vehicle lab, I worked on navigation, path planning and simulation of an autonomous car to take part in IGVC 2021. I used Gazebo to build an accurate simulation of the track and implemented path-finding algorithms such as A*.
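For illustration, a minimal grid-based A* of the kind used for such path finding is sketched below; the 4-connected grid and Manhattan heuristic are assumptions, whereas the actual planner ran on a costmap of the simulated track.

```python
# Minimal A* sketch on an occupancy grid (0 = free, 1 = obstacle).
import heapq

def astar(grid, start, goal):
    """start/goal: (row, col) cells; returns a list of cells or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, [start])]
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None
```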
As a part of my directed research with the Hardware Accelerated Learning group, I experimented with various multi-agent reinforcement learning algorithms. The goal of this project is to understand the state-of-the-art RL algorithms that work well in both competitive and cooperative environments.
Use reinforcement learning and transfer learning to create robust AI agents that generalize to a variety of open-world self-driving simulations. After training an agent for a self-driving car simulation using imitation learning and reinforcement learning, the learnt policy was used as a pretrained network for an agent in another self-driving simulation. The pretrained model showed faster learning in the new simulation.
Use a conditional Generative Adversarial Network to generate images of spectrograms of speech signals. Using CycleGANs, we apply style transfer to spectrograms of speech signals to embed emotion in them. The generated spectrogram is reconstructed back to speech using the Griffin-Lim algorithm.
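The reconstruction step is sketched below with librosa's Griffin-Lim implementation, which estimates phase for a magnitude spectrogram so it can be inverted back to a waveform. The file names and STFT parameters are illustrative assumptions.

```python
# Sketch: spectrogram extraction and Griffin-Lim reconstruction back to speech.
import librosa
import soundfile as sf

y, sr = librosa.load("input_speech.wav", sr=22050)       # assumed input file
mag = abs(librosa.stft(y, n_fft=1024, hop_length=256))   # magnitude spectrogram

# ...the CycleGAN would modify `mag` here to embed the target emotion...

recon = librosa.griffinlim(mag, n_iter=60, hop_length=256, n_fft=1024)
sf.write("reconstructed_speech.wav", recon, sr)
```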
Use a Siamese Convolutional Neural Network to classify whether two fashion items are compatible with each other, then use the predicted pair-wise similarity scores to determine whether a whole outfit is compatible. The models were built with TensorFlow 2.0 and trained on AWS p3.2xlarge instances (Tesla V100 GPUs).
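A minimal Keras sketch of the Siamese setup is below: a shared CNN encoder embeds each item image and a small head scores the pair's compatibility. Layer sizes and the input shape are illustrative assumptions.

```python
# Sketch: Siamese CNN for pairwise fashion-item compatibility (TensorFlow/Keras).
import tensorflow as tf
from tensorflow.keras import layers, Model

def make_encoder():
    return tf.keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(128, 128, 3)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
    ])

encoder = make_encoder()                       # shared weights for both branches
item_a = layers.Input(shape=(128, 128, 3))
item_b = layers.Input(shape=(128, 128, 3))
emb_a, emb_b = encoder(item_a), encoder(item_b)

diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
score = layers.Dense(1, activation="sigmoid")(diff)   # pairwise compatibility score

siamese = Model(inputs=[item_a, item_b], outputs=score)
siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Outfit-level compatibility can then be estimated, for example, by aggregating the pairwise scores over all item pairs in the outfit.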
Implement a Gated Recurrent Unit based neural network to classify the language spoken from MFCC features extracted from speech audio. A streaming model classifies the language being spoken in real time; using this streaming model, we can analyse the probability of misclassification at every instant of speech.
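The streaming behavior can be sketched as below: MFCC frames are fed one at a time to a GRU whose hidden state is carried across frames, producing a language probability at every instant. Feature and model sizes and the file name are illustrative assumptions.

```python
# Sketch: streaming language classification over MFCC frames with a GRU.
import librosa
import torch
import torch.nn as nn

N_MFCC, HIDDEN, N_LANGS = 13, 64, 5

gru = nn.GRU(N_MFCC, HIDDEN, batch_first=True)
head = nn.Linear(HIDDEN, N_LANGS)

y, sr = librosa.load("utterance.wav", sr=16000)            # assumed input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T   # (frames, N_MFCC)

hidden = None
with torch.no_grad():
    for frame in torch.tensor(mfcc, dtype=torch.float32):
        out, hidden = gru(frame.view(1, 1, N_MFCC), hidden)  # one frame at a time
        probs = torch.softmax(head(out[0, -1]), dim=-1)      # per-instant language probabilities
```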
Undergraduate thesis: an sEMG-signal-controlled speech production aid for speech-challenged individuals using machine learning. The signals were collected, filtered, pre-processed and then fed to a classifier that would predict the hand action performed. The action would then be translated to speech.
I was part of a three-member team that built a machine-learning-driven emotion detector using variations in speech signals. Using MFCC feature extraction and PCA on other features, we built a classifier.
2021-Present Ph.D. in Electrical and Computer Engineering | CGPA: 3.94 out of 4
2019-2021 M.S. in Electrical and Computer Engineering | CGPA: 3.94 out of 4
B.Tech. in Electrical and Electronics Engineering | CGPA: 8.17 out of 10
Reviewer for Conferences.
Organizer for the USC Robotics Seminar.
Awarded a 1-year Fellowship for my PhD.
Winner of the image classification contest by Deep Cognition. link
EE541 - A Computational Introduction to Deep Learning
EE641 - Deep Learning Systems
CSCI567 - Machine Learning
Delivered a company-wide talk on SOTA applied Deep Reinforcement Learning. pdf
Presented a talk on generative models for robotics.