diff --git a/README.md b/README.md
index 36a9393c4600ef08fe46722e4023657336875983..66cc5000948c3a2a009ee9fcbc2a9bbadb9eabdc 100644
--- a/README.md
+++ b/README.md
@@ -57,59 +57,52 @@ print("Score: ", np.sum(reward)) # Print the total reward accumulated
 
 This is the docstring for the `stirap-v0` environment:
 
-    Description:
-        A potential with three wells is used to confine a quantum system:
+        Description:
+            A potential with three wells is used to confine a quantum system:
 
-            v(x) = 0.5 * self.trap_strength * (x - xl)**2 *  x ** 2 * (x - xr)**2
+                v(x) = 0.5 * self.trap_strength * (x - xl)**2 * x**2 * (x - xr)**2
 
-        The dynamics is described by the 1D Schrödinger equation.
+            The dynamics is described by the 1D Schrödinger equation.
 
-        The system is initially in the ground state of the left well. The goal is to move as much of the 
-        probability density to the right well by the end of the dynamics.
-        
-        The agent can move the left and right wells left independently by an amount Delta at each timestep.
-
-    Source:
-            
-    Observation: 
-
-        If full_observation=False:
-            Type: Box(4)
-            Num	Observation                 Min         Max
-            0	Left population               0           1
-            1	Right population              0           1
-            2	Left well position            -2          +2
-            3	Right well position           -2          +2
-            
-        If full_observation=True:
-            Type: Box(2 * n + 2)
-            where n is the number of space points.
-
-            (re(psi), im(psi), left_well_pos, right_well_pos)
-            
-    Actions:
-            Type: Discrete(9)
-            Num	Action  [ Left,  Right ]
-            0   [-Delta, -Delta],
-            1   [ 0.   , -Delta],
-            2   [ Delta, -Delta],
-            3   [-Delta,  0.   ],
-            4   [ 0.   ,  0.   ],
-            5   [ Delta,  0.   ],
-            6   [-Delta,  Delta],
-            7   [ 0.   ,  Delta],
-            8   [ Delta,  Delta]
-            
-    Reward:
-            Reward at each time step is proportional to the population in the right well times t
-            Reward -10 is given if the episode terminates before hand
-
-    Starting State:
-            The system is initially in the ground state of the left well. 
-
-    Episode Termination:
-            * After the amount of time defined by env.totaltime (and the timesteps defined by env.timesteps)
-            * If the agent moves the traps out of range (or swaps them)
+            The system is initially in the ground state of the left well. The goal is to move as much
+            of the probability density as possible to the right well by the end of the dynamics.
             
-    Solved Requirements:
-            Handmade solution scores 120. Try and beat it
\ No newline at end of file
+            The agent can move the left and right wells independently by a continuous amount at each timestep.
+
+        Source:
+                
+        Observation: 
+    
+            If full_observation=False:
+                Type: Box(4)
+                Num	Observation                 Min         Max
+                0	Left population               0           1
+                1	Right population              0           1
+                2	Left well position            -2          +2
+                3	Right well position           -2          +2
+                
+            If full_observation=True:
+                Type: Box(2 * n + 2)
+                where n is the number of space points.
+
+                (re(psi), im(psi), left_well_pos, right_well_pos)
+                
+        Actions:
+                Type: Box(2)
+                Num	Action                      Min         Max
+                0   Left well displacement        -1         +1
+                1   Right well displacement       -1         +1
+                
+        Reward:
+                The reward at each timestep is proportional to the population in the right well, multiplied by t.
+                A reward of -10 is given if the episode terminates early.
+
+        Starting State:
+                The system is initially in the ground state of the left well. 
+
+        Episode Termination:
+                * After the total time defined by env.totaltime has elapsed (over the number of timesteps defined by env.timesteps)
+                * If the agent moves the wells out of range (or swaps them)
+                
+        Solved Requirements:
+                A handmade solution scores 120. Try to beat it.
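+
+For reference, the triple-well potential quoted above is easy to plot. The snippet below is a minimal numpy illustration; the values of xl, xr, and trap_strength are placeholders, not the environment's defaults:
+
+```python
+import numpy as np
+
+# Placeholder parameters for illustration only (not the env defaults).
+xl, xr = -1.0, 1.0
+trap_strength = 1.0
+
+def v(x):
+    # Sixth-order polynomial with three minima (wells) at x = xl, 0, and xr.
+    return 0.5 * trap_strength * (x - xl)**2 * x**2 * (x - xr)**2
+
+x = np.linspace(-2.0, 2.0, 201)
+print(float(v(x).min()))  # ~0 at the well minima
+```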
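+
+Since the action space is the continuous Box(2) described above, env.step takes a length-2 array. Here is a minimal random-policy rollout, assuming the classic gym API; the module name gym_stirap is a guess for whatever package registers stirap-v0:
+
+```python
+import gym
+import gym_stirap  # assumed name of the package that registers stirap-v0
+
+env = gym.make("stirap-v0", full_observation=False)
+obs = env.reset()  # Box(4): left/right populations and well positions
+
+total_reward = 0.0
+done = False
+while not done:
+    # Continuous Box(2) action: displacement of each well, each in [-1, +1].
+    action = env.action_space.sample()
+    obs, reward, done, info = env.step(action)
+    total_reward += reward
+
+print("Score:", total_reward)  # the handmade solution scores 120
+```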