Previous: , Up: Assembler Programs   [Contents][Index]


5.15 Example of Working with an Assembler Program

In the example of working with an assembler program, an agent controlled by the sample program has to find a path to the gold in a labyrinth and then to an exit from the labyrinth. As in sample program samples/labyr2.c described in Example of Using the Actor API, the image of the labyrinth is encoded in the C source program text using a subset of ASCII characters that look like pseudographics. You can change the image to test the behavior of the agent for various configurations of the labyrinth.

To learn the configuration of the labyrinth, the agent visits the labyrinth the number of times defined by the macro NVISIT. A visit is considered finished when the agent moves to a cell where the labyrinth exit is located.

Each labyrinth cell is a rectangle 4x3. Maximum zero-based indices of a column and a row of labyrinth cell are defined by the macros MAX_X and MAX_Y. Coordinates of the entry cell of the labyrinth relative to its upper left corner are defined by the macros ENTRY_X and ENTRY_Y. Characters ‘*’ at cell corners denote a cell with the gold. Characters ‘#’ at cell corners denote a labyrinth exit.

Space characters denote allowed movement paths, and other characters denote places to which movement is impossible. If a cell is reachable from the labyrinth entry, then it must have two spaces in the middle. Every reachable cell must also have two spaces or two non-spaces at the top and at the bottom (in the middle), which indicates whether movement is allowed to an adjacent cell located at the north and/or the south. When modifying the image of the labyrinth, make sure that cells are properly aligned and connected, and there are no paths outside of the labyrinth, otherwise an assertion failure will occur.

Although the sample program uses a picture of labyrinth encoded in its source text, in other respects this program works much like as sample program samples/maze.c described in Example of Working in Large-scale Mode. That is, in the default mode of program operation, the agent does not know its precise location in the labyrinth and receives only limited amount of information about current location. However, as opposed to sample program samples/maze.c, an agent controlled by the program described in this section receives either an indication whether current cell is an exit from the labyrinth or, if the current cell is not an exit from the labyrinth, a union of the following pieces of information:

In the default mode of program operation, the number of tracked environment states is defined by the macro NSTATE. This number is about 3.5 times greater than in sample program samples/maze.c, which along with much greater amount of information about current condition of visiting the labyrinth received by the agent considerably decreases the probability of infinite looping during program execution. Otherwise, such infinite loopings would occur once in a while for some reason.

After its invocation, the sample program prints the number of tracked environment states. Then, after every visit of the labyrinth, the sample program prints an index number of the visit, the number of times the agent found the gold since the beginning of program execution, the length of all traversed visiting paths, and the average amount of gold found per one move in the labyrinth. In the default mode of program operation for the default configuration of the labyrinth, after starting the program, the agent visits the labyrinth a number of times without taking the gold, until at some point the agent usually “understands” how to take the gold, and then it finds the gold at almost every visit of the labyrinth.

At the last visit of the labyrinth, the sample program switches the terminal to full-screen mode and shows the picture of the labyrinth. After a user presses SPACE, the sample program shows movements of the agent in the labyrinth. An indication whether the agent took the gold and current length of visiting path are printed. After the agent finds an exit from the labyrinth, the user can press Q to turn off full-screen terminal mode and quit the sample program with dumping to file prg_disasm in the current directory a learned assembler program according to which the agent operates.

A random seed can be specified as the first program argument. If the random seed is non-negative, then the agent will operate normally. If the random seed is negative, then the agent will move in the labyrinth completely randomly. You could compare the program output for these two modes of program execution.

The special mode of operation of the sample program, as opposed to the default mode, is the use of a profile assembler program that helps the agent to do its job more efficiently. The name of a file with the profile assembler program should be specified as the second program argument. A single node of the model is assembled with the QSMM_ASM_TEMPLATE flag set, so when disassembling the node at the end of execution of the sample program, a disassembled program will be the input assembler program, in which profile probabilities are replaced with learned probabilities.

The source text of the profile assembler program is provided in the file samples/prg_maze in the package distribution and is also given below. That assembler program is intended for use with labyrinths that have up to 10 cells in width and height. It actually provides tracking a precise location of the agent in the labyrinth, which is why the agent will find the gold at almost all visits of the labyrinth. The final visiting path shown in full-screen terminal mode at the end of execution of the sample program does not have vacillations that take place when the profile assembler program is not used.

        .data

ls_a    probeq  3

        .code


reloc   macro   l_repeat, new_row, new_col

l_g0    def     g0_r##new_row##_c##new_col
l_g1    def     g1_r##new_row##_c##new_col

        joe     0, l_g0
        joe     1, l_g0
        joe     2, l_g0
        joe     3, l_g0
        joe     4, l_g0
        joe     5, l_g0
        joe     6, l_g0
        joe     7, l_g0
        joe     8, l_g0
        joe     9, l_g0
        joe     10, l_g0
        joe     11, l_g0
        joe     12, l_g0
        joe     13, l_g0
        joe     14, l_g0
        joe     15, l_g0
        joe     32, l_g1
        joe     33, l_g1
        joe     34, l_g1
        joe     35, l_g1
        joe     36, l_g1
        joe     37, l_g1
        joe     38, l_g1
        joe     39, l_g1
        joe     40, l_g1
        joe     41, l_g1
        joe     42, l_g1
        joe     43, l_g1
        joe     44, l_g1
        joe     45, l_g1
        joe     46, l_g1
        joe     47, l_g1
        jmp     l_repeat

        end     macro


state   macro   is_gf, row, col,
                row_north, row_south, col_west, col_east

repeat  def     u##__UNIQUE__
m_north def     u##__UNIQUE__
m_east  def     u##__UNIQUE__
m_south def     u##__UNIQUE__

g##is_gf##_r##row##_c##col:
repeat: stt

        casels  ls_a,
                m_north,
                m_east,
                m_south

        mv      west
        reloc   repeat, row, col_west

m_south:
        mv      south
        reloc   repeat, row_south, col

m_east: mv      east
        reloc   repeat, row, col_east

m_north:
        mv      north
        reloc   repeat, row_north, col

        end     macro


strow   macro   is_gf, row, row_north, row_south

        state   is_gf, row, 0, row_north, row_south, 9, 1
        state   is_gf, row, 1, row_north, row_south, 0, 2
        state   is_gf, row, 2, row_north, row_south, 1, 3
        state   is_gf, row, 3, row_north, row_south, 2, 4
        state   is_gf, row, 4, row_north, row_south, 3, 5
        state   is_gf, row, 5, row_north, row_south, 4, 6
        state   is_gf, row, 6, row_north, row_south, 5, 7
        state   is_gf, row, 7, row_north, row_south, 6, 8
        state   is_gf, row, 8, row_north, row_south, 7, 9
        state   is_gf, row, 9, row_north, row_south, 8, 0

        end     macro


strows  macro   is_gf

        strow   is_gf, 0, 9, 1
        strow   is_gf, 1, 0, 2
        strow   is_gf, 2, 1, 3
        strow   is_gf, 3, 2, 4
        strow   is_gf, 4, 3, 5
        strow   is_gf, 5, 4, 6
        strow   is_gf, 6, 5, 7
        strow   is_gf, 7, 6, 8
        strow   is_gf, 8, 7, 9
        strow   is_gf, 9, 8, 0

        end     macro


        strows  0
        strows  1

The source code of the sample program is provided in the file samples/maze_asm.c in the package distribution and is also given below. The command make builds the sample program if the QSMM package is configured by the configure script to use the Curses library. See the file INSTALL in the root of the package distribution for information on the configure script.

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#if defined(HAVE_CURSES_H)
#   include <curses.h>
#elif defined(HAVE_NCURSES_CURSES_H)
#   include <ncurses/curses.h>
#endif

#include <qsmm/qsmm.h>

#define NVISIT  400
#define NSTATE  512
#define MAX_X   8
#define MAX_Y   8
#define ENTRY_X 1
#define ENTRY_Y 8


#define ERREXIT(fmt, ...)                                                 \
    do {                                                                  \
        fprintf(stderr,(fmt), ## __VA_ARGS__);                            \
        fprintf(stderr,"\n");                                             \
        goto Exit;                                                        \
    }                                                                     \
    while (0)


#define CHK_FAIL(func, ...)                                               \
    do {                                                                  \
        int rc=func(__VA_ARGS__);                                         \
        if (rc<0) ERREXIT( #func ": %s", qsmm_err_str(rc));               \
    }                                                                     \
    while (0)


enum direct_e {
    DIRECT_NORTH=0,
    DIRECT_EAST =1,
    DIRECT_SOUTH=2,
    DIRECT_WEST =3
};


static char is_gold_found;
static int visit, n_gold_found=0, path_len=0;
static qsmm_prg_t prg_asm=0;
static qsmm_msglist_t msglist=0;


static unsigned char get_percept(const char **picture_pp, int xx, int yy) {
    unsigned char result=0;
    int col=xx*3+1, row=yy*2+1;
    assert(picture_pp[row][col]==' ' && picture_pp[row][col+1]==' ');
    assert((picture_pp[row-1][col]==' ' &&
            picture_pp[row-1][col+1]==' ') ||
           (picture_pp[row-1][col]!=' ' &&
            picture_pp[row-1][col+1]!=' '));
    assert((picture_pp[row+1][col]==' ' &&
            picture_pp[row+1][col+1]==' ') ||
           (picture_pp[row+1][col]!=' ' &&
            picture_pp[row+1][col+1]!=' '));
    if (picture_pp[row-1][col]==' ' && picture_pp[row-1][col+1]==' ')
        result|=1 << DIRECT_NORTH;
    if (picture_pp[row][col+2]==' ') result|=1 << DIRECT_EAST;
    if (picture_pp[row+1][col]==' ' && picture_pp[row+1][col+1]==' ')
        result|=1 << DIRECT_SOUTH;
    if (picture_pp[row][col-1]==' ') result|=1 << DIRECT_WEST;
    return result;
}


static int opaque_maze(enum direct_e direct,
                       unsigned char *percept_p) {
    static const char *picture[]={
        // 0  1  2  3  4  5  6  7  8
        "x-----+--x--#..#--x--------+",
        "|     |  |        |        |",  // 0
        "+--+  |  |  #--#  |  +  +  |",
        "|     |  |     |  |  |  |  |",  // 1
        "|  +--+  |  +--+  +--+  *  *",
        "|        |     |  |     |  |",  // 2
        "x--+  +--x-----+  x--+  *--*",
        "|     |  |        |        |",  // 3
        "|  +--+  +  +  +  |  +--+  |",
        "|           |  |  |     |  |",  // 4
        "|  +--+  +--+  |  |  +--+  |",
        "|  |     |     |  |     |  |",  // 5
        "x--+  +--x--+  +--x-----+  |",
        "|     |  |        |        |",  // 6
        "+--+  +  |  +--+  +--+  +--+",
        "|     |  |     |           |",  // 7
        "|  +--+  |  +--+  +--+  +--+",
        "|        |     |  |        |",  // 8
        "+--+..+--x-----+--x--------+"
    };
    static char is_gf=0;
    static int path_len, xx=-1, yy=-1;
    char ss[128];
    unsigned char percept;
    int col, row, result=0;
    if (xx<0 || yy<0) {
        if (visit==NVISIT-1) {
            if (!initscr()) exit(2);
            noecho();
            for (row=0; row<=(MAX_Y+1)*2+1; row++)
                mvaddstr(row,0,picture[row]);
            row=(MAX_Y+1)*2+3;
            mvaddstr(row,0,"Press [Space] to start moving");
            while (getch()!=' ') ;
            sprintf(ss,"%64s","");
            mvaddstr(row,0,ss);
        }
        xx=ENTRY_X;
        yy=ENTRY_Y;
        path_len=0;
    }
    assert(xx>=0 && xx<=MAX_X);
    assert(yy>=0 && yy<=MAX_Y);
    if (get_percept(picture,xx,yy) & (1 << direct)) {
        col=xx*3+1;
        row=yy*2+1;
        switch (direct) {
            case DIRECT_NORTH: yy--; break;
            case DIRECT_EAST:  xx++; break;
            case DIRECT_SOUTH: yy++; break;
            case DIRECT_WEST:  xx--; break;
            default: assert(0);
        }
        percept=get_percept(picture,xx,yy);
        path_len++;
        if (visit==NVISIT-1) mvaddstr(row,col,"  ");
        col=xx*3+1;
        row=yy*2+1;
        if (visit==NVISIT-1) {
            int picture_w=strlen(picture[0]);
            mvaddstr(row,col,"[]");
            sprintf(ss," Gold found: %d",is_gf);
            mvaddstr(1,picture_w+2,ss);
            sprintf(ss,"Path length: %d",path_len);
            mvaddstr(3,picture_w+2,ss);
            move((MAX_Y+1)*2+3,0);
            refresh();
            usleep(125000);
        }
        if (picture[row-1][col-1]=='*' && picture[row-1][col+2]=='*' &&
            picture[row+1][col-1]=='*' && picture[row+1][col+2]=='*') {
            if (!is_gf) {
                is_gf=1;
                result=1;
            }
        }
        else if (picture[row-1][col-1]=='#' && picture[row-1][col+2]=='#' &&
                 picture[row+1][col-1]=='#' && picture[row+1][col+2]=='#') {
            is_gf=0;
            result=2;
            xx=-1;
            yy=-1;
            if (visit==NVISIT-1) {
                row=(MAX_Y+1)*2+3;
                mvaddstr(row,0,"Press [Q] to exit");
                while (1) {
                    int key=getch();
                    if (key=='q' || key=='Q') break;
                }
                endwin();
            }
        }
    }
    else {
        percept=get_percept(picture,xx,yy)+16;
        result=3;
    }
    if (is_gf) percept+=32;
    if (percept_p) *percept_p=percept;
    return result;
}


static QSMM_INSTR_META_CLASS(mv) {
    const char *ccp;
    int rc;
    enum direct_e direct=0;
    if (QSMM_HAS_INSTR_CLASS(qsmm_evt))
        qsmm_get_eh_instr_param(qsmm,sizeof(direct),&direct);
    switch (qsmm_evt) {
        case QSMM_EVT_INSTR_CLASS_INIT:
            switch (direct) {
                case DIRECT_NORTH: ccp="north"; break;
                case DIRECT_EAST:  ccp="east";  break;
                case DIRECT_SOUTH: ccp="south"; break;
                case DIRECT_WEST:  ccp="west";  break;
                default: assert(0);
            }
            qsmm_set_eh_instr_param_str_f(qsmm,"%s",ccp);
            qsmm_set_eh_noutcome(qsmm,64);
            break;
        case QSMM_EVT_ACTIVATE: {
            unsigned char percept=0;
            rc=opaque_maze(direct,&percept);
            qsmm_time_delta(qsmm,1);
            switch (rc) {
                case 0:
                case 3:
                    break;
                case 1:
                    is_gold_found=1;
                    n_gold_found++;
                    break;
                case 2:
                    if (is_gold_found) qsmm_spur_delta(qsmm,1,1);
                    qsmm_return_to_caller_node(qsmm);
                    break;
                default:
                    assert(0);
            }
            if (rc!=2) {
                if (rc!=3) path_len++;
                qsmm_set_instr_outcome(qsmm,percept);
            }
            break;
        }
    }
    return 0;
}


static QSMM_INSTR_CLASS_SET(walker) {
    switch (qsmm_evt) {
        case QSMM_EVT_ENT_INIT: {
            int nstate;
            enum direct_e direct;
            direct=DIRECT_NORTH, QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            direct=DIRECT_EAST,  QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            direct=DIRECT_SOUTH, QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            direct=DIRECT_WEST,  QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            if (prg_asm)
                nstate=qsmm_get_prg_nstate(qsmm, __FUNCTION__,
                                           QSMM_ASM_TEMPLATE, prg_asm, 0);
            else nstate=NSTATE;
            printf("nstate = %d\n",nstate);
            qsmm_set_nstate_max(qsmm,__FUNCTION__,nstate);
            QSMM_NODE_CREATE(0);
            break;
        }
        case QSMM_EVT_ENGINE_INIT:
            if (prg_asm) qsmm_node_asm(qsmm, 0, QSMM_ASM_TEMPLATE,
                                       prg_asm, msglist);
            break;
        case QSMM_EVT_NODE_ENTER: {
            unsigned char percept=0;
            is_gold_found=0;
            opaque_maze(DIRECT_NORTH,&percept);
            break;
        }
    }
    return 0;
}


int main(int argc, char **argv) {
    const char *ccp;
    int seed=0, exit_code=1;
    qsmm_t qsmm=0;
    qsmm_prg_t prg_disasm=0;
    FILE *file_disasm_p=0;
    struct qsmm_desc_s desc;
    struct qsmm_disasm_desc_s disasm_desc;
    if (argc>2) {
        CHK_FAIL(qsmm_msglist_create,&msglist);
        CHK_FAIL(qsmm_parse_asm_source_file, argv[2],
                 QSMM_PARSE_ASM_PREPROCESS, 0, 0, msglist, &prg_asm);
    }
    memset(&desc,0,sizeof(desc));
    desc.dont_use_instr_class_weights=1;
    desc.is_large_env=1;
    desc.is_large_opt=1;
    desc.nspur=2;
    desc.stack_sz_max=1;
    desc.profile_pool_env_sz=2;
    desc.profile_pool_opt_sz=2;
    desc.compat=1;
    desc.sparse_fill_max=0.2;
    CHK_FAIL(qsmm_create,&desc,&qsmm);
    QSMM_REG_INSTR_META_CLASS(qsmm,mv,0);
    QSMM_REG_INSTR_CLASS_SET(qsmm,walker,0);
    qsmm_engine_create(qsmm);
    if (argc>1 && (seed=atoi(argv[1]))<0) {
        qsmm_set_random(qsmm,1);
        seed=-seed;
    }
    qsmm_rng_seed(qsmm_get_rng(qsmm),seed);
    for (visit=0; visit<NVISIT; visit++) {
        qsmm_node_call_default(qsmm,0,0);
        printf("visit %d: ngf=%d, path_len=%d, ngf/path_len=%.8f\n",
               visit+1, n_gold_found, path_len,
               (double) n_gold_found/path_len);
    }
    memset(&disasm_desc,0,sizeof(disasm_desc));
    disasm_desc.use_stt_start=1;
    disasm_desc.use_stt_state=1;
    disasm_desc.use_stt_lookup=1;
    disasm_desc.use_stt_abort=1;
    disasm_desc.use_choice=1;
    disasm_desc.use_abort_1=1;
    disasm_desc.fq_goto_min=1;
    disasm_desc.fq_action_min=1;
    if (argc<3) {
        disasm_desc.prob_goto_min=0.005;
        disasm_desc.prob_action_min=0.005;
    }
    qsmm_node_disasm(qsmm,0,&disasm_desc,&prg_disasm);
    if (!(file_disasm_p=fopen(ccp="prg_disasm","w")))
        ERREXIT("%s: failed to open the file for writing",ccp);
    CHK_FAIL(qsmm_prg_dump,prg_disasm,0,file_disasm_p);
    exit_code=0;

Exit:
    qsmm_prg_destroy(prg_disasm);
    qsmm_prg_destroy(prg_asm);
    qsmm_destroy(qsmm);
    if (msglist) {
        qsmm_msglist_dump(msglist,0,"BUFFER",0,stderr);
        qsmm_msglist_destroy(msglist);
    }
    if (file_disasm_p) fclose(file_disasm_p);
    return exit_code;
}

Here is given sample program output.

$ ./maze-asm -1
nstate = 512
visit 1: ngf=0, path_len=94, ngf/path_len=0.00000000
visit 2: ngf=0, path_len=800, ngf/path_len=0.00000000
...
visit 201: ngf=37, path_len=179608, ngf/path_len=0.00020600
visit 202: ngf=38, path_len=180284, ngf/path_len=0.00021078
...
visit 399: ngf=89, path_len=402678, ngf/path_len=0.00022102
visit 400: ngf=90, path_len=405574, ngf/path_len=0.00022191
$ ./maze-asm 1
nstate = 512
visit 1: ngf=0, path_len=94, ngf/path_len=0.00000000
visit 2: ngf=0, path_len=800, ngf/path_len=0.00000000
...
visit 200: ngf=175, path_len=98092, ngf/path_len=0.00178404
visit 201: ngf=176, path_len=98488, ngf/path_len=0.00178702
...
visit 399: ngf=373, path_len=180780, ngf/path_len=0.00206328
visit 400: ngf=374, path_len=181180, ngf/path_len=0.00206425
$ ./maze-asm -1 prg_maze
nstate = 200
visit 1: ngf=1, path_len=3086, ngf/path_len=0.00032404
visit 2: ngf=1, path_len=3360, ngf/path_len=0.00029762
...
visit 200: ngf=43, path_len=192884, ngf/path_len=0.00022293
visit 201: ngf=43, path_len=193488, ngf/path_len=0.00022224
...
visit 399: ngf=84, path_len=381448, ngf/path_len=0.00022021
visit 400: ngf=84, path_len=382074, ngf/path_len=0.00021985
$ ./maze-asm 1 prg_maze
nstate = 200
visit 1: ngf=1, path_len=3086, ngf/path_len=0.00032404
visit 2: ngf=2, path_len=3138, ngf/path_len=0.00063735
...
visit 200: ngf=200, path_len=13434, ngf/path_len=0.01488760
visit 201: ngf=201, path_len=13486, ngf/path_len=0.01490435
...
visit 399: ngf=399, path_len=23782, ngf/path_len=0.01677739
visit 400: ngf=400, path_len=23834, ngf/path_len=0.01678275

As it can be seen from the above output, the average amount of gold found per one move in the labyrinth in the normal mode of operation of the sample program with random seed 1 and without using the profile assembler program is about 10 times greater than in the completely random mode of operation. An average amount of gold found in the normal mode of operation with random seed 1 and the use of the profile assembler program is about 80 times greater than in the completely random mode of operation.


Previous: , Up: Assembler Programs   [Contents][Index]