Reinforcement Learning: Implementing Monte Carlo Learning

patrck_jjh 2021. 5. 18. 22:30

GridWorld implementation

import random
import numpy as np

class GridWorld():
    def __init__(self):
        self.x=0
        self.y=0
    
    def step(self, a):
        # Action 0: left, action 1: up, action 2: right, action 3: down
        if a==0:
            self.move_left()
        elif a==1:
            self.move_up()
        elif a==2:
            self.move_right()
        elif a==3:
            self.move_down()

        reward = -1  # The reward is fixed at -1 for every step
        done = self.is_done()
        return (self.x, self.y), reward, done

    def move_right(self):
        self.y += 1  
        if self.y > 3:
            self.y = 3
      
    def move_left(self):
        self.y -= 1
        if self.y < 0:
            self.y = 0
      
    def move_up(self):
        self.x -= 1
        if self.x < 0:
            self.x = 0
  
    def move_down(self):
        self.x += 1
        if self.x > 3:
            self.x = 3

    def is_done(self):
        if self.x == 3 and self.y == 3:
            return True
        else :
            return False

    def get_state(self):
        return (self.x, self.y)
      
    def reset(self):
        self.x = 0
        self.y = 0
        return (self.x, self.y)

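y is the column index (left/right moves) and x is the row index (up/down moves).

__init__: the agent starts at (0,0).

is_done: returns True once the agent reaches (3,3), which ends the episode.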
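As a quick check of the conventions above, here is a minimal sketch that restates the movement rules as standalone functions. The names `ACTIONS`, `move`, `is_terminal`, and `walk` are hypothetical helpers introduced only for illustration; they are not part of the class in this post.

```python
# Hypothetical helper names; only the movement rules come from the post.
# Action index -> (dx, dy): x is the row (up/down), y is the column (left/right).
ACTIONS = {
    0: (0, -1),  # left
    1: (-1, 0),  # up
    2: (0, 1),   # right
    3: (1, 0),   # down
}
SIZE = 4


def move(state, action):
    """Return the next (x, y), clamped so the agent stays on the 4x4 grid."""
    x, y = state
    dx, dy = ACTIONS[action]
    x = min(max(x + dx, 0), SIZE - 1)
    y = min(max(y + dy, 0), SIZE - 1)
    return (x, y)


def is_terminal(state):
    """The episode ends only at the bottom-right corner (3, 3)."""
    return state == (SIZE - 1, SIZE - 1)


def walk(actions, start=(0, 0)):
    """Apply a sequence of actions and return every visited state."""
    states = [start]
    for a in actions:
        states.append(move(states[-1], a))
    return states


# Moving left or up from (0, 0) hits a wall, so the position does not change.
print(move((0, 0), 0))  # (0, 0)
print(move((0, 0), 1))  # (0, 0)

# Three moves right, then three moves down, reach the goal.
path = walk([2, 2, 2, 3, 3, 3])
print(path[-1], is_terminal(path[-1]))  # (3, 3) True
```

Clamping with min and max behaves the same as the if checks in the move_* methods: a move into a wall leaves the position unchanged but still costs a step.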

Agent implementation

class Agent():
    def __init__(self):
        pass        

    def select_action(self):
        coin = random.random()
        if coin < 0.25:
            action = 0
        elif coin < 0.5:
            action = 1
        elif coin < 0.75:
            action = 2
        else:
            action = 3
        return action

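coin is a random number between 0 and 1, and the thresholds split that range into four equal parts, so each of the four actions (left, up, right, down) is chosen with probability 1/4.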
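To see that the thresholds really give a uniform policy, the sketch below pulls the mapping out into a function that takes the coin as an argument and counts how often each action comes up. `action_from_coin` is a hypothetical name, and the seed and sample size are arbitrary choices for the check.

```python
import random
from collections import Counter


def action_from_coin(coin):
    """Map a number in [0, 1) to an action index using the same thresholds."""
    if coin < 0.25:
        return 0  # left
    elif coin < 0.5:
        return 1  # up
    elif coin < 0.75:
        return 2  # right
    return 3      # down


# Each quarter of [0, 1) maps to one action.
print([action_from_coin(c) for c in (0.1, 0.3, 0.6, 0.9)])  # [0, 1, 2, 3]

# Empirical check with a fixed (arbitrary) seed: frequencies should be near 0.25.
rng = random.Random(0)
n = 100_000
counts = Counter(action_from_coin(rng.random()) for _ in range(n))
freqs = {a: counts[a] / n for a in range(4)}
print(freqs)
```

For this uniform case, `random.randrange(4)` would produce the same distribution in one call; the explicit thresholds are easier to adapt if the action probabilities should later differ.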

Main function

def main():
    env = GridWorld()
    agent = Agent()
    data = [[0,0,0,0],[0,0,0,0],[0,0,0,0],[0,0,0,0]]
    gamma = 1.0
    reward = -1
    alpha = 0.001

    for k in range(50000):
        done = False
        history = []

        while not done:
            action = agent.select_action()
            (x,y), reward, done = env.step(action)
            history.append((x,y,reward))
        env.reset()

        cum_reward = 0
        for transition in history[::-1]:
            x, y, reward = transition
            data[x][y] = data[x][y] + alpha*(cum_reward-data[x][y])
            cum_reward = reward + gamma*cum_reward  
            
    for row in data:
        print(row)

if __name__ == "__main__":
    main()

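env and agent are instances of GridWorld and Agent, respectively.

data is a 4x4 table holding the value estimate for each grid cell, initialized to 0. Each of the 50,000 episodes follows the random policy until the goal is reached, then walks the recorded history backwards and moves each visited cell's estimate toward the return observed after it, with step size alpha.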
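The backward pass in main() is easier to follow on a tiny example. The sketch below isolates that inner loop in a hypothetical function `mc_update` and applies it to a hand-written three-step history. The step size of 0.5 is chosen only to keep the numbers readable; it is not the 0.001 used in main().

```python
def mc_update(values, history, alpha, gamma):
    """Update `values` in place from one episode, walking the history backwards.

    Each history entry is (x, y, reward), where (x, y) is the state reached by
    a step and reward is the reward received for that step, as in main().
    """
    cum_reward = 0.0
    for x, y, reward in reversed(history):
        # Move the estimate for (x, y) a fraction alpha toward the return
        # collected after reaching (x, y).
        values[x][y] += alpha * (cum_reward - values[x][y])
        # Then fold this step's reward into the running return.
        cum_reward = reward + gamma * cum_reward
    return cum_reward


values = [[0.0] * 4 for _ in range(4)]

# Hand-written end of an episode: the agent reaches (3,1), then (3,2),
# then the goal (3,3). Every step costs -1.
history = [(3, 1, -1), (3, 2, -1), (3, 3, -1)]
total = mc_update(values, history, alpha=0.5, gamma=1.0)

print(values[3][3])  # 0.0  (goal state: no reward follows it)
print(values[3][2])  # -0.5 (return after reaching (3,2) is -1)
print(values[3][1])  # -1.0 (return after reaching (3,1) is -2)
print(total)         # -3.0
```

Because each cell is updated before the step's reward is added, a cell's target is the return collected after arriving there. That is why the goal cell (3,3) stays at 0 in the printed table.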

Classes are used a lot when implementing a concept. My understanding of classes had been shaky, but working through this material gave me a feel for how they work and how they are put to use.

Reference: 바닥부터 배우는 강화학습 (노승은, 영진닷컴)


패트릭의 개발노트
