This repository has been archived by the owner on May 21, 2022. It is now read-only.

Policy initialization #1

Open · jhlq opened this issue Jul 31, 2016 · 6 comments


jhlq commented Jul 31, 2016

Require a way to conveniently provide initial knowledge for a policy by hand.

For example, say we are tasked with choosing a sequence of cells on a hexagonal grid, and we know it is certainly never correct to take the first pick right at the grid edges. With a getter+setter we could both view the policy's edge probabilities and set them to zero.

Is such functionality in line with the intended directions?
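Roughly, I imagine something like this sketch (every name here is hypothetical; none of it exists in Reinforce.jl):

```julia
# Hypothetical sketch only: a tabular policy over a hex grid,
# with one prior probability per cell.
struct HexGridPolicy
    probs::Matrix{Float64}   # prior probability of picking each cell first
end

HexGridPolicy(rows, cols) = HexGridPolicy(fill(1 / (rows * cols), rows, cols))

# Getter: inspect the current prior for a cell.
getprob(p::HexGridPolicy, i, j) = p.probs[i, j]

# Setter: overwrite a prior with domain knowledge, then renormalize
# so the table stays a probability distribution.
function setprob!(p::HexGridPolicy, i, j, v)
    p.probs[i, j] = v
    p.probs ./= sum(p.probs)
    return p
end

# Seed the policy: the first pick is never correct at the grid edges.
policy = HexGridPolicy(5, 5)
for i in 1:5, j in 1:5
    if i == 1 || i == 5 || j == 1 || j == 5
        setprob!(policy, i, j, 0.0)
    end
end
```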

tbreloff (Member) commented Jul 31, 2016

I would say that I haven't settled on a policy API yet... I've been a little more focused on the environments. If you have time, could you write out a little example code of how you see initializing policies? Looking forward to what you come up with.



jhlq commented Jul 31, 2016

The getters are straightforward: just query the policy as usual. The setters are a form of supervised learning, so it would make sense to save every set value as a training example. Then we can have a basic implementation, and if a user builds up a large library of samples they can easily plug their favorite supervised-learning library into the setter system.
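A minimal sketch of that setter idea (again, all names here are hypothetical):

```julia
# Hypothetical sketch: `setvalue!` both applies the user's knowledge and
# records it as a supervised training example for later use.
struct Example
    state::Any          # e.g. a grid cell
    target::Float64     # the value the user asserted
end

examples = Example[]    # the growing library of supervised samples

function setvalue!(examples, policy, state, target)
    policy[state] = target                    # basic implementation: a table write
    push!(examples, Example(state, target))   # remember it as training data
    return policy
end

# Usage: a Dict stands in for the policy's value table.
policy = Dict{Tuple{Int,Int},Float64}()
setvalue!(examples, policy, (1, 1), 0.0)      # "edge cell (1,1) is never correct"

# Later, `examples` can be handed to any supervised-learning library to fit
# a model with model(state) ≈ target over all recorded examples.
```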

tbreloff (Member) commented Jul 31, 2016

I think that, without sample code, I'll have a hard time understanding what a "getter/setter" is. Do you mean a lookup table for states and actions? If so, my interest lies much more in RL through function approximation, so I don't have much need for table-lookup APIs (though it could certainly be supported if others want that).
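For concreteness, the distinction I have in mind (both snippets are illustrative only, not package code):

```julia
using LinearAlgebra: dot

# 1. Table lookup: an explicit value stored per (state, action) pair.
Q_table = Dict{Tuple{Int,Int},Float64}()   # (state, action) => value
Q_table[(3, 1)] = 0.5                      # set one entry by hand

# 2. Function approximation: a parameterized function of state features,
#    which generalizes across states instead of enumerating them.
w = randn(4)                               # learned weights
features(s) = [1.0, s, s^2, s^3]           # hand-picked basis functions
Q_approx(s, w) = dot(features(s), w)       # value estimate for state s
```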



jhlq commented Jul 31, 2016

Let's say our child is practicing math and we have prepared a challenging problem. The getter would be asking what they think the answer is, and the setter is telling them the answer.

tbreloff (Member) commented Jul 31, 2016

So that's not really reinforcement learning. You should check out our effort in JuliaML if you're interested in more general machine learning. In RL there are no "answers", only rewards.



jhlq commented Jul 31, 2016

Yes, as mentioned this is supervised, and people would be able to plug in their favorite ML library.

Connecting the two is the goal: schools don't let students work entirely on their own, and neither do teachers lead them through every single problem. A mix allows the AI to explore on its own, with intermittent interventions from more knowledgeable intelligences (see the sketch below).

Reinforcement learning is key for robust AI, and just as mixing a metal with trace elements can create strong alloys, adding specks of supervision can significantly hasten progress.
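As a toy illustration of the mix (everything here is made up for the example, not an existing API):

```julia
# Toy example of RL exploration with intermittent supervised interventions:
# a 3-armed bandit where a "teacher" occasionally corrects one arm's value.
value_est = zeros(3)             # estimated value per arm
pulls = zeros(Int, 3)
true_reward(a) = a == 2 ? 1.0 : 0.0

teacher_knows = Dict(1 => 0.0)   # supervised knowledge: arm 1 is worthless

for step in 1:1000
    # Autonomous exploration: epsilon-greedy action selection.
    a = rand() < 0.1 ? rand(1:3) : argmax(value_est)
    r = true_reward(a) + 0.1 * randn()
    pulls[a] += 1
    value_est[a] += (r - value_est[a]) / pulls[a]   # incremental-mean RL update

    # Intermittent intervention: every 100 steps the teacher overwrites
    # the values it knows, exactly like the setter described above.
    if step % 100 == 0
        for (arm, v) in teacher_knows
            value_est[arm] = v
        end
    end
end
```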
