Simulators play a key role as testbeds for robotics control algorithms, but deciding when to use them versus collecting real-world data is often treated as an art. We have developed a framework for efficient reinforcement learning (RL) in a scenario where multiple simulators of a target task, each with a different level of fidelity, are available. Our framework limits the number of samples used in each successively higher-fidelity (and higher-cost) simulator by allowing the learning agent to choose to run trajectories in the lowest-fidelity simulator that will still provide it with useful information. The approach transfers state-action Q-values from lower-fidelity models as heuristics for the “Knows What It Knows” family of Rmax-style RL algorithms, and is applicable over a wide range of dynamics and reward representations. We prove bounds on the framework’s sample complexity and demonstrate the approach empirically on a remote-controlled car with multiple simulators. The framework enables RL algorithms to find near-optimal policies in a physical robot domain with fewer expensive real-world samples than previous transfer approaches or learning without simulators.
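The core transfer mechanism described above, using Q-values learned in a lower-fidelity simulator (plus an optimism bonus covering the fidelity gap) to initialize an Rmax-style learner at the next level, can be illustrated on a toy problem. The sketch below makes strong simplifying assumptions not taken from the paper: a deterministic 5-state chain MDP, a shared state-action space across fidelity levels, state-action pairs marked "known" after a single visit, and a fidelity-gap bonus chosen as (reward gap)/(1 − γ). All names and the specific MDP are illustrative, not the authors' implementation.

```python
import numpy as np

GAMMA = 0.9

def value_iteration(P, R, iters=300):
    """Exact planning for a deterministic tabular MDP.
    P[s, a] gives the next state, R[s, a] the reward."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + GAMMA * Q.max(axis=1)[P]
    return Q

def plan_with_heuristic(P, R, known, H, iters=300):
    """Optimistic planning: known (s, a) pairs use the learned model;
    unknown pairs keep the transferred heuristic value H[s, a]."""
    Q = H.copy()
    for _ in range(iters):
        Q = np.where(known, R + GAMMA * Q.max(axis=1)[P], H)
    return Q

def rmax_samples(P, R, H, max_steps=1000):
    """Run an Rmax-style learner and count the real transitions taken
    before its greedy policy is optimal and fully 'known'."""
    pi_opt = value_iteration(P, R).argmax(axis=1)
    known = np.zeros(R.shape, dtype=bool)
    s, samples = 0, 0
    for _ in range(max_steps):
        Q = plan_with_heuristic(P, R, known, H)
        pi = Q.argmax(axis=1)
        if np.array_equal(pi, pi_opt) and known[np.arange(len(pi)), pi].all():
            return samples
        a = pi[s]
        known[s, a] = True  # deterministic MDP: one visit suffices (m = 1)
        samples += 1
        s = P[s, a]
    return samples

# Target task: 5-state chain; action 1 moves right, action 0 moves left.
# Reward 1.0 for the transitions at the right end of the chain.
n = 5
P = np.array([[max(s - 1, 0), min(s + 1, n - 1)] for s in range(n)])
R = np.zeros((n, 2))
R[n - 2, 1] = R[n - 1, 1] = 1.0

# Lower-fidelity simulator: same dynamics, rewards underestimated at 0.8.
R_lo = 0.8 * R
Q_lo = value_iteration(P, R_lo)

# Transferred heuristic: lower-fidelity Q-values plus a fidelity-gap
# bonus that keeps them optimistic: (reward gap) / (1 - gamma).
H_transfer = Q_lo + 0.2 / (1 - GAMMA)
# Baseline: uninformed Rmax optimism, Vmax = Rmax / (1 - gamma).
H_rmax = np.full((n, 2), 1.0 / (1 - GAMMA))

print(rmax_samples(P, R, H_transfer), rmax_samples(P, R, H_rmax))
```

Even in this toy problem, the transferred heuristic steers exploration toward the actions the lower-fidelity model already ranked highly, so the learner reaches the optimal policy with fewer real transitions than under blanket Rmax optimism, while the fidelity-gap bonus preserves the optimism that the Rmax family's guarantees rely on.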