On the sample complexity of reinforcement learning, by