diff --git a/README.md b/README.md
index 36a9393c4600ef08fe46722e4023657336875983..66cc5000948c3a2a009ee9fcbc2a9bbadb9eabdc 100644
--- a/README.md
+++ b/README.md
@@ -57,59 +57,52 @@ print("Score: ", np.sum(reward)) # Print the total reward accumulated
 This is the docstring for the `stirap-v0` environment:
 
-    Description:
-        A potential with three wells is used to confine a quantum system:
+    Description:
+        A potential with three wells is used to confine a quantum system:
 
-        v(x) = 0.5 * self.trap_strength * (x - xl)**2 * x ** 2 * (x - xr)**2
+        v(x) = 0.5 * self.trap_strength * (x - xl)**2 * x ** 2 * (x - xr)**2
 
-        The dynamics is described by the 1D Schrödinger equation.
+        The dynamics is described by the 1D Schrödinger equation.
 
-        The system is initially in the ground state of the left well. The goal is to move as much of the
-        probability density to the right well by the end of the dynamics.
-
-        The agent can move the left and right wells left independently by an amount Delta at each timestep.
-
-    Source:
-
-    Observation:
-
-        If full_observation=False:
-            Type: Box(4)
-            Num    Observation              Min    Max
-            0      Left population          0      1
-            1      Right population         0      1
-            2      Left well position       -2     +2
-            3      Right well position      -2     +2
-
-        If full_observation=True:
-            Type: Box(2 * n + 2)
-            where n is the number of space points.
-
-            (re(psi), im(psi), left_well_pos, right_well_pos)
-
-    Actions:
-        Type: Discrete(9)
-        Num    Action [ Left,  Right ]
-        0      [-Delta, -Delta],
-        1      [ 0.   , -Delta],
-        2      [ Delta, -Delta],
-        3      [-Delta,  0.   ],
-        4      [ 0.   ,  0.   ],
-        5      [ Delta,  0.   ],
-        6      [-Delta,  Delta],
-        7      [ 0.   ,  Delta],
-        8      [ Delta,  Delta]
-
-    Reward:
-        Reward at each time step is proportional to the population in the right well times t
-        Reward -10 is given if the episode terminates before hand
-
-    Starting State:
-        The system is initially in the ground state of the left well.
-
-    Episode Termination:
-        * After the amount of time defined by env.totaltime (and the timesteps defined by env.timesteps)
-        * If the agent moves the traps out of range (or swaps them)
+        The system is initially in the ground state of the left well. The goal is to move as much of the
+        probability density as possible to the right well by the end of the dynamics.
 
-    Solved Requirements:
-        Handmade solution scores 120. Try and beat it
\ No newline at end of file
+        The agent can move the left and right wells independently by an amount Delta at each timestep.
+
+    Source:
+
+    Observation:
+
+        If full_observation=False:
+            Type: Box(4)
+            Num    Observation              Min    Max
+            0      Left population          0      1
+            1      Right population         0      1
+            2      Left well position       -2     +2
+            3      Right well position      -2     +2
+
+        If full_observation=True:
+            Type: Box(2 * n + 2)
+            where n is the number of space points.
+
+            (re(psi), im(psi), left_well_pos, right_well_pos)
+
+    Actions:
+        Type: Box(2)
+        Num    Action                       Min    Max
+        0      Move the left well by        -1     +1
+        1      Move the right well by       -1     +1
+
+    Reward:
+        Reward at each timestep is proportional to the population in the right well multiplied by the time t.
+        A reward of -10 is given if the episode terminates early.
+
+    Starting State:
+        The system is initially in the ground state of the left well.
+
+    Episode Termination:
+        * After the amount of time defined by env.totaltime (and the number of timesteps defined by env.timesteps)
+        * If the agent moves the traps out of range (or swaps them)
+
+    Solved Requirements:
+        A handmade solution scores 120. Try to beat it!
\ No newline at end of file
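
The three-well potential quoted in the docstring is easy to inspect numerically. Below is a minimal sketch; the values of `trap_strength`, `xl`, and `xr` are illustrative assumptions, not the environment's actual defaults:

```python
import numpy as np

# Illustrative values only -- the environment's actual trap_strength and
# well positions xl, xr may differ.
trap_strength = 1.0
xl, xr = -1.0, 1.0

def v(x):
    # Three-well potential from the docstring:
    # v(x) = 0.5 * trap_strength * (x - xl)**2 * x**2 * (x - xr)**2
    return 0.5 * trap_strength * (x - xl) ** 2 * x ** 2 * (x - xr) ** 2

x = np.linspace(-2.0, 2.0, 401)
profile = v(x)

# The potential vanishes at the three well centres xl, 0, xr,
# and is positive on the barriers in between.
print(v(np.array([xl, 0.0, xr])))  # -> [0. 0. 0.]
```

The factored polynomial form guarantees minima at exactly `xl`, `0`, and `xr`, which is why moving `xl`/`xr` (the two components of the Box(2) action) deforms the left and right wells while the central well stays pinned at the origin.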