Previous: , Up: Assembler Programs   [Contents][Index]


5.14 Example of Working with an Assembler Program

In the example of working with an assembler program, an agent controlled by the sample program has to find a path to the gold in a labyrinth and then to an exit from the labyrinth. As in the sample program samples/labyr2.c described in Example of Using the Actor API, the C source code contains the picture of this labyrinth encoded using a subset of ASCII characters resembling pseudographics. You can change that picture to test agent behavior for various labyrinth configurations.

To learn a labyrinth configuration, the agent visits the labyrinth the number of times defined by the macro NVISIT. Moving the agent to a cell designated as a labyrinth exit finishes a labyrinth visit.

The picture of a labyrinth cell is in Figure 2.8. See the description of a labyrinth cell and the macros MAX_X, MAX_Y, ENTRY_X and ENTRY_Y around that figure.

The agent does not know its precise location in the labyrinth in the default mode of program operation—the agent receives a limited amount of information about a current location. This information includes an indication whether a current cell is an exit from the labyrinth and, if the current cell is not an exit from the labyrinth, the union of the following pieces of information:

The macro NSTATE specifies the number of tracked environment states in the default mode of program operation. Defining that macro to a lesser value may result in infinite looping.

Just after its invocation, the sample program prints the number of tracked environment states. After every labyrinth visit, the sample program prints the following information:

  1. The ordinal number of a labyrinth visit.
  2. The number of times the agent found the gold since the beginning of program execution.
  3. The length of last visiting path.
  4. The length of all traversed visiting paths.
  5. The average amount of gold found per one move in the labyrinth.

In the default mode of program operation, the agent visits the labyrinth a number of times without taking the gold until at some point the agent usually “understands” how to take the gold and then finds the gold at almost every labyrinth visit.

At last labyrinth visit, the sample program switches the terminal to full-screen mode and shows a labyrinth picture. After pressing SPACE, the sample program shows movements of the agent in the labyrinth and prints the current length of visiting path and an indication whether or not the agent took the gold. After the agent finds an exit from the labyrinth, pressing Q turns off full-screen terminal mode and quits the sample program.

You can specify a random seed by the first program argument. If the random seed is non-negative, the agent operates adaptively (normally). If the random seed is negative, the agent operates randomly. You can compare agent behavior and program output for these two modes of program execution.

The special mode of program operation is the use of a profile assembler program helping the agent to do its job more efficiently. The second program argument turns on that special mode and specifies the name of a file with the profile assembler program.

Note: when using a custom profile assembler program, you may need to increase the sizes of the pools of probabilities lists in normal form, that is, increase values assigned to desc.profile_pool_env_sz and desc.profile_pool_opt_sz in the function main.

The file samples/maze.asm in the package distribution contains the source text of a profile assembler program for use with labyrinths that have up to 10 cells in width and height. This assembler program actually provides tracking a precise location of the agent in the labyrinth—the agent finds the gold at almost all labyrinth visits. The final visiting path shown in full-screen terminal mode does not have vacillations taking place in the default mode of program operation.

The content of that file is below.

        ; RELOCate.
        ;
        ; Analyze the outcome of an `mv' instruction for performing the
        ; next move.
        ;
        ; l_repeat - jump label for repeating an attempt to perform the
        ;            move if the `mv' instruction moved the
        ;            agent to an obstacle.
        ;
        ; new_row  - new row index for the agent if the movement
        ;            was successful.

        ; new_col  - new column index for the agent if the movement
        ;            was successful.

reloc   macro   l_repeat, new_row, new_col

l_g0    def     g0_r##new_row##_c##new_col
l_g1    def     g1_r##new_row##_c##new_col

        joe     0, l_g0
        joe     1, l_g0
        joe     2, l_g0
        joe     3, l_g0
        joe     4, l_g0
        joe     5, l_g0
        joe     6, l_g0
        joe     7, l_g0
        joe     8, l_g0
        joe     9, l_g0
        joe     10, l_g0
        joe     11, l_g0
        joe     12, l_g0
        joe     13, l_g0
        joe     14, l_g0
        joe     15, l_g0
        joe     32, l_g1
        joe     33, l_g1
        joe     34, l_g1
        joe     35, l_g1
        joe     36, l_g1
        joe     37, l_g1
        joe     38, l_g1
        joe     39, l_g1
        joe     40, l_g1
        joe     41, l_g1
        joe     42, l_g1
        joe     43, l_g1
        joe     44, l_g1
        joe     45, l_g1
        joe     46, l_g1
        joe     47, l_g1
        jmp     l_repeat

        end     macro


        ; Select a movement direction and analyze a movement outcome.
        ;
        ; is_gf     - indication whether the agent has found gold:
        ;             1 = has found;
        ;             0 = not yet found.
        ;
        ; row       - current row index for the agent.
        ; col       - current column index for the agent.
        ;
        ; row_north - new row index for the agent if it moves one step in
        ;             the north direction.
        ;
        ; row_south - new row index for the agent if it moves one step in
        ;             the south direction.
        ;
        ; col_west  - new column index for the agent if it moves one step
        ;             in the west direction.
        ;
        ; col_east  - new column index for the agent if it moves one step
        ;             in the east direction.

state   macro   is_gf, row, col,
                row_north, row_south, col_west, col_east

repeat  def     u##__UNIQUE__
m_north def     u##__UNIQUE__
m_east  def     u##__UNIQUE__
m_south def     u##__UNIQUE__

g##is_gf##_r##row##_c##col:
repeat: stt

        choice
        case    0.25, m_north
        case    0.25, m_east
        case    0.25, m_south
        end     choice

        mv      west
        reloc   repeat, row, col_west

m_south:
        mv      south
        reloc   repeat, row_south, col

m_east: mv      east
        reloc   repeat, row, col_east

m_north:
        mv      north
        reloc   repeat, row_north, col

        end     macro


        ; STates for a ROW.
        ;
        ; Movement selection states for a specific agent location row.
        ;
        ; is_gf     - indication whether the agent has found gold:
        ;             1 = has found;
        ;             0 = not yet found.
        ;
        ; row       - current row index for the agent.
        ;
        ; row_north - new row index for the agent if it moves one step in
        ;             the north direction.
        ;
        ; row_south - new row index for the agent if it moves one step in
        ;             the south direction.


strow   macro   is_gf, row, row_north, row_south

        state   is_gf, row, 0, row_north, row_south, 9, 1
        state   is_gf, row, 1, row_north, row_south, 0, 2
        state   is_gf, row, 2, row_north, row_south, 1, 3
        state   is_gf, row, 3, row_north, row_south, 2, 4
        state   is_gf, row, 4, row_north, row_south, 3, 5
        state   is_gf, row, 5, row_north, row_south, 4, 6
        state   is_gf, row, 6, row_north, row_south, 5, 7
        state   is_gf, row, 7, row_north, row_south, 6, 8
        state   is_gf, row, 8, row_north, row_south, 7, 9
        state   is_gf, row, 9, row_north, row_south, 8, 0

        end     macro


        ; STates for all ROWS.
        ;
        ; Movement selection states for all agent location rows.
        ;
        ; is_gf - indication whether the agent has found gold:
        ;         1 = has found;
        ;         0 = not yet found.

strows  macro   is_gf

        strow   is_gf, 0, 9, 1
        strow   is_gf, 1, 0, 2
        strow   is_gf, 2, 1, 3
        strow   is_gf, 3, 2, 4
        strow   is_gf, 4, 3, 5
        strow   is_gf, 5, 4, 6
        strow   is_gf, 6, 5, 7
        strow   is_gf, 7, 6, 8
        strow   is_gf, 8, 7, 9
        strow   is_gf, 9, 8, 0

        end     macro


        ; Generate movement selection states for the case when the agent
        ; has not yet found gold and the case when the agent has
        ; found the gold.

        strows  0
        strows  1

The file samples/maze_asm.c in the package distribution provides the source code of the sample program. A copy of that source code is below. The command make builds the sample program if the configure script has configured QSMM to use the ncurses library. See the file INSTALL in the root of the package distribution for information on the configure script.

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#if defined(HAVE_CURSES_H)
#   include <curses.h>
#elif defined(HAVE_NCURSES_CURSES_H)
#   include <ncurses/curses.h>
#endif

#include <qsmm/qsmm.h>

#define NVISIT  200
#define NSTATE  512
#define MAX_X   8
#define MAX_Y   8
#define ENTRY_X 1
#define ENTRY_Y 8


#define CHK_FAIL(func, ...)                                               \
    do {                                                                  \
        int rc=func(__VA_ARGS__);                                         \
        if (rc<0) ERREXIT( #func ": %s", qsmm_err_str(rc));               \
    }                                                                     \
    while (0)


#define ERREXIT(fmt, ...)                                                 \
    do {                                                                  \
        fprintf(stderr,fmt "\n", ## __VA_ARGS__);                         \
        goto Exit;                                                        \
    }                                                                     \
    while (0)


enum direct_e {
    DIRECT_NORTH=0,  // move one step up
    DIRECT_EAST =1,  // move one step right
    DIRECT_SOUTH=2,  // move one step down
    DIRECT_WEST =3   // move one step left
};


static char is_gold_found;
static int path_len_pass, n_gold_found=0, path_len_total=0;
static qsmm_prg_t prg_asm=0;
static qsmm_msglist_t msglist=0;


// Get the bitmask of allowed movement directions.

static unsigned char
get_percept(
    const char **picture_pp,
    int xx,
    int yy
) {
    unsigned char result=0;
    const int col=xx*3+1, row=yy*2+1;
    assert(picture_pp[row][col]==' ' && picture_pp[row][col+1]==' ');
    assert((picture_pp[row-1][col]==' ' &&
            picture_pp[row-1][col+1]==' ') ||
           (picture_pp[row-1][col]!=' ' &&
            picture_pp[row-1][col+1]!=' '));
    assert((picture_pp[row+1][col]==' ' &&
            picture_pp[row+1][col+1]==' ') ||
           (picture_pp[row+1][col]!=' ' &&
            picture_pp[row+1][col+1]!=' '));
    if (picture_pp[row-1][col]==' ' && picture_pp[row-1][col+1]==' ')
        result|=1 << DIRECT_NORTH;
    if (picture_pp[row][col+2]==' ') result|=1 << DIRECT_EAST;
    if (picture_pp[row+1][col]==' ' && picture_pp[row+1][col+1]==' ')
        result|=1 << DIRECT_SOUTH;
    if (picture_pp[row][col-1]==' ') result|=1 << DIRECT_WEST;
    return result;
}


// Make a movement in the labyrinth in a specified direction possibly
// showing the movement on the screen.
// Returns: 3 = movement resulted in exiting the labyrinth;
//          2 = movement resulted in taking the gold;
//          1 = movement made successfully;
//          0 = cannot make the movement because a new location cell of the
//              agent is not empty.

static int
opaque_maze(
    enum direct_e direct,
    unsigned char *percept_p
) {
    static const char *picture[]={
        // 0  1  2  3  4  5  6  7  8
        "+-----+--+--#..#--+--------+",
        "|     |  |        |        |",  // 0
        "+--+  |  |  #--#  |  +  +  |",
        "|     |  |     |  |  |  |  |",  // 1
        "|  +--+  |  +--+  +--+  *  *",
        "|        |     |  |     |  |",  // 2
        "+--+  +--+-----+  +--+  *--*",
        "|     |  |        |        |",  // 3
        "|  +--+  +  +  +  |  +--+  |",
        "|           |  |  |     |  |",  // 4
        "|  +--+  +--+  |  |  +--+  |",
        "|  |     |     |  |     |  |",  // 5
        "+--+  +--+--+  +--+-----+  |",
        "|     |  |        |        |",  // 6
        "+--+  +  |  +--+  +--+  +--+",
        "|     |  |     |           |",  // 7
        "|  +--+  |  +--+  +--+  +--+",
        "|        |     |  |        |",  // 8
        "+--+..+--+-----+--+--------+"
    };
    static char is_gf=0;
    static int path_len, visit=0, xx=-1, yy=-1;
    char ss[128];
    int col, row, result=1;
    if (xx<0 || yy<0) {
        if (visit==NVISIT-1) {
            if (!initscr()) exit(2);
            noecho();
            for (row=0; row<=(MAX_Y+1)*2+1; row++)
                mvaddstr(row,0,picture[row]);
            row=(MAX_Y+1)*2+3;
            mvaddstr(row,0,"Press [Space] to start moving");
            while (getch()!=' ') ;
            sprintf(ss,"%64s","");
            mvaddstr(row,0,ss);
        }
        xx=ENTRY_X;
        yy=ENTRY_Y;
        path_len=0;
    }
    assert(xx>=0 && xx<=MAX_X);
    assert(yy>=0 && yy<=MAX_Y);
    unsigned char percept;
    if (get_percept(picture,xx,yy) & (1 << direct)) {
        col=xx*3+1;
        row=yy*2+1;
        switch (direct) {
            case DIRECT_NORTH: yy--; break;
            case DIRECT_EAST:  xx++; break;
            case DIRECT_SOUTH: yy++; break;
            case DIRECT_WEST:  xx--; break;
            default: assert(0);
        }
        percept=get_percept(picture,xx,yy);
        path_len++;
        if (visit==NVISIT-1) mvaddstr(row,col,"  ");
        col=xx*3+1;
        row=yy*2+1;
        if (visit==NVISIT-1) {
            int picture_w=strlen(picture[0]);
            mvaddstr(row,col,"[]");
            sprintf(ss," Gold found: %d",is_gf);
            mvaddstr(1,picture_w+2,ss);
            sprintf(ss,"Path length: %d",path_len);
            mvaddstr(3,picture_w+2,ss);
            move((MAX_Y+1)*2+3,0);
            refresh();
            usleep(125000);
        }
        if (picture[row-1][col-1]=='*' && picture[row-1][col+2]=='*' &&
            picture[row+1][col-1]=='*' && picture[row+1][col+2]=='*') {
            if (!is_gf) {
                is_gf=1;
                result=2;
            }
        }
        else if (picture[row-1][col-1]=='#' && picture[row-1][col+2]=='#' &&
                 picture[row+1][col-1]=='#' && picture[row+1][col+2]=='#') {
            is_gf=0;
            result=3;
            xx=-1;
            yy=-1;
            if (visit==NVISIT-1) {
                row=(MAX_Y+1)*2+3;
                mvaddstr(row,0,"Press [Q] to exit");
                while (1) {
                    int key=getch();
                    if (key=='q' || key=='Q') break;
                }
                endwin();
            }
            else visit++;
        }
    }
    else {
        percept=get_percept(picture,xx,yy)+16;
        result=0;
    }
    if (is_gf) percept+=32;
    if (percept_p) *percept_p=percept;
    return result;
}


static QSMM_INSTR_META_CLASS(mv) {
    enum direct_e direct=0;
    if (QSMM_HAS_INSTR_CLASS(qsmm_evt))
        qsmm_get_eh_instr_param(qsmm,sizeof(direct),&direct);
    switch (qsmm_evt) {
        case QSMM_EVT_INSTR_CLASS_INIT: {
            const char *ccp;
            switch (direct) {
                case DIRECT_NORTH: ccp="north"; break;
                case DIRECT_EAST:  ccp="east";  break;
                case DIRECT_SOUTH: ccp="south"; break;
                case DIRECT_WEST:  ccp="west";  break;
                default: assert(0);
            }
            qsmm_set_eh_instr_param_str_f(qsmm,"%s",ccp);
            qsmm_set_eh_noutcome(qsmm,64);
            break;
        }
        case QSMM_EVT_ACTIVATE: {
            unsigned char percept=0;
            const int rc=opaque_maze(direct,&percept);
            qsmm_time_delta(qsmm,1);
            switch (rc) {
                case 0:
                case 1:
                    break;
                case 2:
                    is_gold_found=1;
                    n_gold_found++;
                    break;
                case 3:
                    if (is_gold_found) qsmm_spur_delta(qsmm,1,1);
                    qsmm_return_to_caller_node(qsmm);
                    break;
                default:
                    assert(0);
            }
            if (rc!=3) {
                if (rc) {
                    path_len_pass++;
                    path_len_total++;
                }
                qsmm_set_instr_outcome(qsmm,percept);
            }
            break;
        }
    }
    return 0;
}


static QSMM_INSTR_CLASS_SET(walker) {
    switch (qsmm_evt) {
        case QSMM_EVT_ENT_INIT: {
            enum direct_e direct;
            direct=DIRECT_NORTH, QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            direct=DIRECT_EAST,  QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            direct=DIRECT_SOUTH, QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            direct=DIRECT_WEST,  QSMM_REG_INSTR_CLASS_PARAM(mv,direct);
            qsmm_sig_t nstate=0;
            if (prg_asm)
                qsmm_get_prg_nstate_v2(qsmm, __FUNCTION__, 0,
                                       QSMM_EXCEPT_ALL, prg_asm, 0,
                                       &nstate);
            else nstate=NSTATE;
            printf("nstate %d\n",nstate);
            qsmm_set_nstate_max(qsmm,__FUNCTION__,nstate);
            QSMM_NODE_CREATE(0);
            break;
        }
        case QSMM_EVT_ENGINE_INIT:
            if (prg_asm) qsmm_node_asm(qsmm,0,0,prg_asm,msglist);
            break;
        case QSMM_EVT_NODE_ENTER: {
            unsigned char percept=0;
            is_gold_found=0;
            opaque_maze(DIRECT_NORTH,&percept);
            break;
        }
    }
    return 0;
}


// Find a path to gold in the labyrinth and then to its exit possibly
// using a node probability profile specified by an assembler program.

int
main(
    int argc,
    char **argv
) {
    int seed=0, exit_code=1;
    qsmm_t qsmm=0;
    qsmm_msglist_t msglist_dump=0;
    if (argc>2) {
        CHK_FAIL(qsmm_msglist_create,&msglist);
        CHK_FAIL(qsmm_parse_asm_source_file, argv[2],
                 QSMM_PARSE_ASM_PREPROCESS, 0, 0, msglist, &prg_asm);
    }
    struct qsmm_desc_s desc;
    memset(&desc,0,sizeof(desc));
    desc.dont_use_instr_class_weights=1;
    desc.is_large_env=1;
    desc.is_large_opt=1;
    desc.nspur=2;
    desc.stack_sz_max=1;
    desc.profile_pool_env_sz=2;
    desc.profile_pool_opt_sz=2;
    desc.compat=1;
    desc.sparse_fill_max=0.2;
    CHK_FAIL(qsmm_create,&desc,&qsmm);
    QSMM_REG_INSTR_META_CLASS(qsmm,mv,0);
    QSMM_REG_INSTR_CLASS_SET(qsmm,walker,0);
    qsmm_engine_create(qsmm);
    if (argc>1 && (seed=atoi(argv[1]))<0) {
        qsmm_set_random(qsmm,1);
        seed=-seed;
    }
    qsmm_rng_seed(qsmm_get_rng(qsmm),seed);
    for (int visit=0; visit<NVISIT; visit++) {
        path_len_pass=0;
        qsmm_node_call_default(qsmm,0,0);
        printf(
    "visit %d: ngf %d, path_pass %d, path_total %d, ngf/path_total %.8f\n",
               visit+1, n_gold_found, path_len_pass, path_len_total,
               (double) n_gold_found/path_len_total);
    }
    exit_code=0;

Exit:
    qsmm_prg_destroy(prg_asm);
    qsmm_destroy(qsmm);
    if (msglist) {
        msglist_dump=msglist;
        msglist=0;
        CHK_FAIL(qsmm_msglist_dump,msglist_dump,0,"BUFFER",0,stderr);
    }
    qsmm_msglist_destroy(msglist_dump);
    return exit_code;
}

Sample program output is below.

$ ./maze-asm -1
nstate 512
visit 1: ngf 0, path_pass 94, path_total 94, ngf/path_total 0.00000000
visit 2: ngf 0, path_pass 706, path_total 800, ngf/path_total 0.00000000
...
visit 101: ngf 19, path_pass 640, path_total 87374, ngf/path_total 0.00021746
visit 102: ngf 19, path_pass 962, path_total 88336, ngf/path_total 0.00021509
...
visit 199: ngf 37, path_pass 90, path_total 178714, ngf/path_total 0.00020703
visit 200: ngf 37, path_pass 726, path_total 179440, ngf/path_total 0.00020620
$ ./maze-asm 1
nstate 512
visit 1: ngf 0, path_pass 632, path_total 632, ngf/path_total 0.00000000
visit 2: ngf 1, path_pass 2712, path_total 3344, ngf/path_total 0.00029904
...
visit 101: ngf 84, path_pass 570, path_total 115834, ngf/path_total 0.00072518
visit 102: ngf 85, path_pass 290, path_total 116124, ngf/path_total 0.00073198
...
visit 199: ngf 182, path_pass 370, path_total 150890, ngf/path_total 0.00120618
visit 200: ngf 183, path_pass 360, path_total 151250, ngf/path_total 0.00120992

The above pieces of output show that the average amount of gold found per one move in the labyrinth in the adaptive mode of program operation with random seed 1 and without using the profile assembler program is about 6 times greater than in the random mode of program operation.

$ ./maze-asm -1 maze.asm 
nstate 200
visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404
visit 2: ngf 1, path_pass 274, path_total 3360, ngf/path_total 0.00029762
...
visit 100: ngf 17, path_pass 430, path_total 91868, ngf/path_total 0.00018505
visit 101: ngf 17, path_pass 768, path_total 92636, ngf/path_total 0.00018351
...
visit 199: ngf 43, path_pass 132, path_total 192766, ngf/path_total 0.00022307
visit 200: ngf 43, path_pass 118, path_total 192884, ngf/path_total 0.00022293
./maze-asm 1 maze.asm
nstate 200
visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404
visit 2: ngf 2, path_pass 52, path_total 3138, ngf/path_total 0.00063735
...
visit 100: ngf 100, path_pass 52, path_total 8234, ngf/path_total 0.01214477
visit 101: ngf 101, path_pass 52, path_total 8286, ngf/path_total 0.01218923
...
visit 199: ngf 199, path_pass 52, path_total 13382, ngf/path_total 0.01487072
visit 200: ngf 200, path_pass 52, path_total 13434, ngf/path_total 0.01488760

The above pieces of output show that the average amount of gold found per one move in the labyrinth in the adaptive mode of program operation with random seed 1 and the use of the profile assembler program is about 67 times greater than in the random mode of program operation.


Previous: , Up: Assembler Programs   [Contents][Index]