Will devise an initial method for evaluating the Q-value of a state-action pair.
First will make it work for this case with tiling, then abstract the structure away to allow for any internal representation, as long as it supports two operations: spitting out a "point prediction" and "moving towards the desired Q-value."
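
Rough sketch of what that abstraction might look like, in Python. All names here (`QEstimator`, `TiledQ`, `predict`, `update`) and all parameters (number of tilings, tiles per dimension, the [0, 1) state range, the step size) are assumptions for illustration, not the actual design. The tiled version just sums one weight per tiling and nudges those weights towards the target.

```python
from abc import ABC, abstractmethod
import math


class QEstimator(ABC):
    """Hypothetical interface: any internal representation is allowed as
    long as it gives a point prediction and can move towards a target."""

    @abstractmethod
    def predict(self, state, action):
        """Return the current point prediction of Q(state, action)."""

    @abstractmethod
    def update(self, state, action, target):
        """Move the prediction for (state, action) towards target."""


class TiledQ(QEstimator):
    """Assumed tiling-based representation: several offset grids over the
    state space, one weight per (tiling, action, tile)."""

    def __init__(self, n_tilings=4, tiles_per_dim=8, low=0.0, high=1.0,
                 step_size=0.5):
        self.n_tilings = n_tilings
        self.low = low
        self.width = (high - low) / tiles_per_dim
        # Split the step across tilings so one update closes a
        # step_size fraction of the gap to the target.
        self.alpha = step_size / n_tilings
        self.weights = {}  # sparse table, missing entries count as 0.0

    def _active_tiles(self, state, action):
        keys = []
        for t in range(self.n_tilings):
            # Each tiling is shifted by a fraction of one tile width.
            offset = t * self.width / self.n_tilings
            coords = tuple(
                math.floor((x - self.low + offset) / self.width)
                for x in state
            )
            keys.append((t, action, coords))
        return keys

    def predict(self, state, action):
        keys = self._active_tiles(state, action)
        return sum(self.weights.get(k, 0.0) for k in keys)

    def update(self, state, action, target):
        keys = self._active_tiles(state, action)
        error = target - sum(self.weights.get(k, 0.0) for k in keys)
        for k in keys:
            self.weights[k] = self.weights.get(k, 0.0) + self.alpha * error


q = TiledQ()
q.update((0.3, 0.7), action=0, target=1.0)
print(q.predict((0.3, 0.7), 0))
```

With the assumed default step size of 0.5, one update moves the prediction halfway from 0 to the target. Anything else that implements `predict` and `update` could be swapped in without the caller noticing.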