Hi, I am Shashank

Shashank Hegde

AI PhD Researcher at the Robotic Embedded Systems Lab (RESL).

I am a passionate AI researcher with a strong background in computer vision and robotics, currently pursuing my PhD in Electrical and Computer Engineering at the University of Southern California. My research interests include deep reinforcement learning, machine learning, and robotics, and I am a strong believer in the power of AI to improve the quality of life of people around the world.

Experiences

NVIDIA

May 2024 - August 2025

Redmond, WA

Applied research on autonomous vehicles with NVIDIA's applied research team.

Applied Scientist Intern

May 2025 - August 2025

Responsibilities:
  • Trained a large-scale Llama-based transformer for self-driving vehicles using world-model-based imitation learning, reducing front collisions and improving policy safety.
  • Developed generative models to produce bird's-eye-view (BEV) visualizations, revealing latent policy representations.
Deep Learning Scientist Intern

May 2024 - December 2024

Responsibilities:
  • Trained a video diffusion model to generate RGB frames and BEVs for self-driving vehicle scenarios.
  • Utilized automatic mixed precision (AMP) to bring down project costs by 30% (see the sketch below).
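
For context, a minimal sketch of what enabling AMP looks like in a PyTorch training step. This is an illustrative toy loop, not the actual project code; the model, data, and sizes are placeholders.

```python
# Illustrative AMP training step in PyTorch; model and data are placeholders.
import torch

model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 gradients don't underflow

for step in range(100):
    x = torch.randn(32, 512, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(x), y)  # mixed-precision forward
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, skips the step on overflow
    scaler.update()
```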

Robotic Embedded Systems Lab (RESL)

Sept 2021 - Present

Los Angeles, CA

RESL is a research lab at the University of Southern California, headed by Prof. Gaurav Sukhatme and focused on robotics and AI research.

PhD Researcher

Sept 2021 - Present

Responsibilities:
  • Develop sample-efficient learning methods for quadruped hurdling tasks on SLURM clusters, using Sample Factory for distributed learning with reduced policy lag.
  • Experiment with audio-based communication between agents for multi-agent reinforcement learning.
  • Create high-performing small neural networks on AWS for robotic control, to satisfy on-device compute and latency constraints.

SalesDNA (stealth-mode startup)

May 2021 - August 2021

Los Angeles, CA

SalesDNA investigates the application of AI in the field of sales.

Data Scientist

May 2021 - August 2021

Responsibilities:
  • Built data pipelines for data collection, cleaning, and modeling, and used real-time Markov modeling of a sales process.
  • Applied model-free reinforcement learning algorithms to learn AI strategies on this sales simulation.

SSLL

May 2020 - May 2021

Los Angeles, CA

SSLL is a research lab at the University of Southern California, headed by Prof. Rahul Jain and focused on theoretical reinforcement learning.

Research Assistant

May 2020 - May 2021

Responsibilities:
  • Built scalable reinforcement learning policies using function approximators with fewer trainable parameters.
  • Studied and applied state-of-the-art imitation learning techniques to self-driving vehicles, experimenting in high-fidelity simulators such as CARLA.

Dynamic Robotics and Control Lab (DRCL)

November 2019 - October 2020

Los Angeles, CA

DRCL is a research lab at the University of Southern California, headed by Prof. Quan Nguyen and focused on robotic control.

Research Assistant

November 2019 - October 2020

Responsibilities:
  • Simulated and controlled a quadruped Mini Cheetah robot in PyBullet and Gazebo using stochastic control with policy-gradient-based agents, and tested the RL controller on the physical robot after integration with ROS.
  • Experimented with different action spaces such as impedance, torque, and force control, and used hybrid learning methods with model predictive control to speed up learning, with RLlib for distributed training.

Fidelity Investments

June 2016 - July 2019

Bangalore, India

Fidelity Investments is a renowned financial institution that specializes in investment management, retirement planning, portfolio guidance, brokerage, benefits outsourcing, and many other financial products and services.

Data Scientist

July 2017 - July 2019

Responsibilities:
  • Developed applications based on supervised machine learning for trade order selection and efficient execution.
  • Researched reinforcement learning and its application to portfolio construction in equity trading; a Gym simulation was built from real trading data using TensorFlow (a skeleton of such an environment is sketched after this list).
  • Worked with the equity trading team to develop and support the Java- and Python-based trading engine, gaining experience with Java Spring Boot, Python Flask, SQL, Splunk, AWS, and many other developer tools.
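
For illustration, a skeleton of the kind of Gym trading environment described above, using the classic Gym API (obs, reward, done, info). The observation, action, and reward definitions are simplified stand-ins, not the actual Fidelity simulation.

```python
# Hypothetical sketch of a Gym trading environment; everything is simplified.
import gym
import numpy as np
from gym import spaces

class TradingEnv(gym.Env):
    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)
        self.t, self.position = 0, 0

    def reset(self):
        self.t, self.position = 0, 0
        return self._obs()

    def step(self, action):
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        self.t += 1
        # Reward: current position times the price change over the step.
        reward = self.position * (self.prices[self.t] - self.prices[self.t - 1])
        done = self.t >= len(self.prices) - 1
        return self._obs(), float(reward), done, {}

    def _obs(self):
        return np.array([self.prices[self.t], self.position], dtype=np.float32)
```
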
Software Engineer Intern

June 2016 - August 2016

Responsibilities:
  • Worked with the fixed-income research team to build a complete end-to-end application using .NET and Excel VBA, gaining experience with Windows Presentation Foundation (WPF) for building thick clients.

Mangalore University

May 2014 - June 2015

Mangalore, India

Laboratory of Applied Biology, Kuppers Biotech Unit.

Research Intern

May 2014 - June 2015

Responsibilities:
  • Predicted growth trends of algae after studying the effect of light on enhanced algal biofuel production, using linear regression on the collected time-series data.

Select Publications

Currently, large generalized policies are trained to predict controls or trajectories using diffusion models, which have the desirable property of learning multimodal action distributions. However, generalizability comes with a cost, namely, larger model size and slower inference. This is especially an issue for robotic tasks that require high control frequency. Further, there is a known trade-off between performance and action horizon for Diffusion Policy (DP), a popular model for generating trajectories: fewer diffusion queries accumulate greater trajectory tracking errors. For these reasons, it is common practice to run these models at high inference frequency, subject to robot computational constraints. To address these limitations, we propose Latent Weight Diffusion (LWD), a method that uses diffusion and a world model to generate closed-loop policies (weights for neural policies) for robotic tasks, rather than generating trajectories. Learning the behavior distribution through parameter space over trajectory space offers two key advantages: longer action horizons (fewer diffusion queries) and robustness to perturbations while retaining high performance, as well as a lower inference compute cost. To this end, we show that LWD has higher success rates than DP when the action horizon is longer and when stochastic perturbations exist in the environment. Furthermore, LWD achieves multitask performance comparable to DP while requiring just ~1/45th of the inference-time FLOPS per step.

We propose the use of latent space generative world models to address the covariate shift problem in autonomous driving. A world model is a neural network capable of predicting an agent’s next state given past states and actions. By leveraging a world model during training, the driving policy effectively mitigates covariate shift without requiring an excessive amount of training data. During end-to-end training, our policy learns how to recover from errors by aligning with states observed in human demonstrations, so that at runtime it can recover from perturbations outside the training distribution. Additionally, we introduce a novel transformer-based perception encoder that employs multi-view cross-attention and a learned scene query. We present qualitative and quantitative results, demonstrating significant improvements upon prior state of the art in closed-loop testing in the CARLA simulator, as well as showing the ability to handle perturbations in both CARLA and NVIDIA’s DRIVE Sim.

Optimizing continuous-time quantum error correction for arbitrary noise
arXiv 2025

We present a protocol using machine learning (ML) to simultaneously optimize the quantum error-correcting code space and the corresponding recovery map in the framework of continuous-time quantum error correction. Given a Hilbert space and a noise process – potentially correlated across both space and time – the protocol identifies the optimal recovery strategy, measured by the average logical state fidelity. This approach enables the discovery of recovery schemes tailored to arbitrary device-level noise.

Active steering into quantum stabilizer codespace with reinforcement learning

Quantum error correction remains a practical challenge in quantum computation, especially in measuring high-weight stabilizers and decoding the error syndrome to find recovery operators. We propose a technique to actively maintain a quantum stabilizer code state in the codespace even under the influence of decoherence. Our protocol uses continuous measurements of operators from the stabilizer algebra to perform Hamiltonian corrections. The measurement operators and the correction strengths are provided by a reinforcement learning agent. We process the measurement data by first applying an exponential averaging filter and then stacking the previous measurement outcomes before sending them to a reinforcement learning agent. The agent then provides correction strengths and the subsequent measurement operators. We demonstrate that this protocol can evolve any unknown quantum state into a stabilizer code state, and also maintain it within the codespace. This technique is particularly useful since it is scalable to higher dimensional quantum stabilizer codes.

HyperPPO: A scalable method for finding small policies for robotic control

We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints.

Generating Behaviorally Diverse Policies with Latent Diffusion Models

In this work, we propose using diffusion models to distill a dataset of policies into a single generative model over policy parameters. We show that our method achieves a compression ratio of 13x while recovering 98% of the original rewards and 89% of the original coverage. Furthermore, the conditioning mechanism of diffusion models allows for flexibly selecting and sequencing behaviors using language.

Efficiently Learning Small Policies for Locomotion and Manipulation

We leverage graph hypernetworks to learn graph hyperpolicies trained with off-policy reinforcement learning, resulting in networks that are two orders of magnitude smaller than commonly used networks yet encode policies comparable to those encoded by much larger networks trained on the same task.

Guided Learning of Robust Hurdling Policies with Curricular Trajectory Optimization

In this work, we focus on the combination of analytical and learning-based techniques to help researchers solve challenging robot locomotion problems. Specifically, we explore the combination of curricular trajectory optimization (CTO) and deep reinforcement learning (RL) for quadruped hurdling tasks.

Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems

We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations. We study the performance of different model architectures in a series of tasks that require the agent to recognize sounds and execute instructions given in natural language. Finally, we train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.

Randomized Policy Learning for Continuous State and Action MDPs
arXiv 2020

We present RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. Both the policy and value functions are represented with randomized networks. We also give finite time guarantees on the performance of the algorithm.

Risk aware portfolio construction using deep deterministic policy gradients

This paper evaluates the use of DDPG to solve the problem of risk-aware portfolio construction. Simulations are done on a portfolio of twenty stocks, and the use of both rate of return and Sortino ratio as measures of portfolio performance is evaluated. Results demonstrate the effectiveness of DDPG for risk-aware portfolio construction.
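
For reference, the Sortino ratio penalizes only downside volatility, unlike the Sharpe ratio. A minimal per-period NumPy sketch follows (no annualization; the return values are illustrative):

```python
# Per-period Sortino ratio: mean excess return over downside deviation.
import numpy as np

def sortino_ratio(returns, target=0.0):
    downside = np.minimum(returns - target, 0.0)    # only below-target moves count
    downside_dev = np.sqrt(np.mean(downside ** 2))  # downside deviation
    return (np.mean(returns) - target) / downside_dev

returns = np.array([0.02, -0.01, 0.03, -0.02, 0.01])
print(sortino_ratio(returns))  # penalizes losses, ignores upside volatility
```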

Use of light emitting diodes (LEDs) for enhanced lipid production in micro-algae based biofuels

Microalgae are an alternative source of renewable energy to overcome the energy crises caused by exhaustion of fuel reserves. Algal biofuel technology demands a cost-effective strategy for net profitable productivity. Inconsistent illumination intensities hinder microalgal growth, so the light-utilizing efficiency of the cells is critical: light scarcity leads to low production, while high intensities cause photo-inhibition. We report effective usage of LEDs of different band wavelengths on the growth of microalgae in a closed, controlled environment to generate biomass and lipid yields. Among the different intensities and wavelengths tested, 500 lx of a blue-red combination gave maximum biomass in terms of cell density, and red LED light at 220 lx doubled the lipid dry weight from 30% (w/w) under white light to 60% (w/w). A thin-layer lipid chromatogram demonstrated a dense and prominent spot of triacylglycerols in the cultures grown under red light at 220 lx. The FTIR profile indicates that different wavelength exposures did not alter the functional groups or change the chemical composition of the extracted lipids, ensuring the quality of the product. We reiterate that a combination of red and blue LEDs is favoured over white light illumination for generation of biomass. In addition, we report the exciting finding that exposure to LEDs of red wavelength after biomass generation leads to enhanced lipid production; this simple process doubled the lipid content harvested over a 20-day culture period.

Projects

Automatic paper tagging
Individual Researcher September 2023 - October 2023

Applied a BERT sentence transformer to encode the abstracts of hundreds of papers, then ranked and tagged them by the cosine similarity between each abstract's embedding and those of topic definitions (a minimal sketch follows below).
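
A minimal sketch of this pipeline using the sentence-transformers library; the model name, topics, and abstracts below are placeholders, not the ones used in the project:

```python
# Embed abstracts and topic definitions, then rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

topics = {
    "reinforcement learning": "Papers on agents learning from reward signals.",
    "diffusion models": "Papers on generative models trained by denoising.",
}
abstracts = [
    "We propose an on-policy reinforcement learning algorithm ...",
    "We train a denoising diffusion model over policy parameters ...",
]

topic_emb = model.encode(list(topics.values()), convert_to_tensor=True)
paper_emb = model.encode(abstracts, convert_to_tensor=True)

scores = util.cos_sim(paper_emb, topic_emb)  # shape: (n_papers, n_topics)
for abstract, row in zip(abstracts, scores):
    best = int(row.argmax())
    print(list(topics)[best], "<-", abstract[:48])
```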

Autonomous Vehicle Navigation
Team member August 2019 - May 2021

As a part of the Autonomous Vehicle lab, I worked on navigation, path planning, and simulation of an autonomous car to take part in IGVC 2021. I used Gazebo to build an accurate simulation of the track and implemented path-finding algorithms such as A* (sketched below).
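
A minimal A* planner on a 4-connected occupancy grid, illustrating the kind of algorithm involved; this is a toy sketch, not the lab's code:

```python
# Minimal A* search on a 4-connected grid with a Manhattan-distance heuristic.
import heapq

def astar(grid, start, goal):
    """grid[r][c] == 1 means blocked; returns a list of cells or None."""
    def h(cell):  # Manhattan distance: admissible on a 4-connected grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from, g = {}, {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:  # reconstruct the path by walking parents backwards
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]]:
                continue
            if cost + 1 < g.get(nxt, float("inf")):
                g[nxt] = cost + 1
                came_from[nxt] = cur
                heapq.heappush(open_heap, (g[nxt] + h(nxt), g[nxt], nxt))
    return None

print(astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0)))
```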

Competitive and Co-operative Multi Agent Reinforcement Learning
Individual Researcher Jun 2020 - August 2020

As a part of my directed research with the Hardware Accelerated Learning group, I experimented with various multi-agent reinforcement learning algorithms. The goal of this project was to understand state-of-the-art RL algorithms that work well in both competitive and cooperative environments.

Torque Transfer
Team Lead August 2020 - December 2020

Used reinforcement learning and transfer learning to create robust AI agents that generalize across a variety of open-world self-driving simulations. After training an agent in one self-driving car simulation using imitation learning and reinforcement learning, the learned policy was used as a pretrained network for an agent in another self-driving simulation, where it showed faster learning.

Emotion Transfer on speech using spectrogram images
Team Lead August 2020 - December 2020

Used conditional generative adversarial networks on spectrograms of speech signals: CycleGAN-based style transfer embeds emotion into a spectrogram, which is then reconstructed back to speech using the Griffin-Lim algorithm (see the sketch below).
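
A minimal sketch of the spectrogram round trip using librosa, with the GAN step elided; the file names and STFT parameters are illustrative, not the project's settings:

```python
# Speech -> magnitude spectrogram -> (style transfer) -> Griffin-Lim -> audio.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=22050)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))  # magnitude spectrogram

# ... the trained CycleGAN generator would transform S here ...

y_rec = librosa.griffinlim(S, n_iter=60, hop_length=256)  # iterative phase recovery
sf.write("reconstructed.wav", y_rec, sr)
```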

Fashion compatibility prediction
Team Lead Jan 2020 - May 2020

Used a Siamese convolutional neural network to classify whether two fashion items are compatible with each other, then aggregated the predicted pair-wise similarity scores to judge whether a full outfit is compatible. The models were built in TensorFlow 2.0 and trained on AWS p3.2xlarge instances (Tesla V100 GPUs); a minimal sketch of the architecture follows.
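
A minimal Siamese compatibility scorer in TensorFlow 2 / Keras; the backbone, input sizes, and head are simplified stand-ins for the project's actual models:

```python
# Two items pass through one shared encoder; a small head scores compatibility.
import tensorflow as tf

backbone = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64),
])  # shared weights: both items use the same encoder

item_a = tf.keras.Input(shape=(128, 128, 3))
item_b = tf.keras.Input(shape=(128, 128, 3))
emb_a, emb_b = backbone(item_a), backbone(item_b)

diff = tf.keras.layers.subtract([emb_a, emb_b])
merged = tf.keras.layers.concatenate([emb_a, emb_b, diff])
prob = tf.keras.layers.Dense(1, activation="sigmoid")(merged)  # P(compatible)

model = tf.keras.Model([item_a, item_b], prob)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```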

Spoken Language classifier
Individual Researcher Jan 2020 - May 2020

Implemented a gated recurrent unit (GRU) based neural network to classify MFCC features extracted from speech audio. A streaming model classifies the language being spoken in real time, which lets us analyse the probability of misclassification at every instant of speech (sketched below).
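
A minimal sketch of the streaming setup in TensorFlow / Keras with librosa MFCCs; the language count, shapes, and file name are illustrative, and the model shown is untrained:

```python
# MFCC frames -> GRU -> per-frame language probabilities.
import numpy as np
import librosa
import tensorflow as tf

N_MFCC, N_LANGUAGES = 13, 5

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, N_MFCC)),      # variable-length audio
    tf.keras.layers.GRU(128, return_sequences=True),  # keep every time step
    tf.keras.layers.Dense(N_LANGUAGES, activation="softmax"),
])

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T  # (frames, 13)
probs = model(mfcc[np.newaxis])  # (1, frames, 5): a prediction at every instant
```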

Prosthetic Voice (Thesis)
Team member August 2016 - May 2017

Undergraduate thesis: an sEMG-signal-controlled speech-production aid for speech-challenged individuals, built using machine learning. The signals were collected, filtered, and pre-processed, then fed to a classifier that predicted the hand action performed; the action was then translated to speech.

Emotion Detection
Team member August 2015 - December 2015

I was part of a three-member team that built a machine-learning-driven emotion detector using variations in speech signals, combining MFCC feature extraction with PCA on other features to build a classifier.

Education

Ph.D. in Electrical and Computer Engineering, University of Southern California
CGPA: 3.94 out of 4
M.S. in Electrical and Computer Engineering, University of Southern California
CGPA: 3.94 out of 4
B.Tech. in Electrical and Electronics Engineering
CGPA: 8.17 out of 10

Accomplishments and Service

Organizer
USC Robotics Seminar August 2021 - Present

Organizer for the USC Robotics Seminar.

USC Annenberg Fellow
USC August 2021 - July 2022

Awarded a one-year fellowship for my PhD.

Masters Student Honors Program
USC August 2019 - May 2021

Certificate for outstanding academic and research achievements.

The Data Open

Finalist in the SoCal round of the Data Open hackathon organized by Citadel.

Soda bottle classification contest

Winner of an image classification contest by Deep Cognition.

Teaching Assistant
USC August 2022 - December 2022

EE541 - A Computational Introduction to Deep Learning
EE641 - Deep Learning Systems
CSCI567 - Machine Learning

Delivered a company-wide talk on state-of-the-art applied deep reinforcement learning.

Presented a talk on generative models for robotics.