Example of Working with an Assembler Program (QSMM Programmer Manual)

Previous: Using the Assembler Preprocessor, Up: Assembler Programs [Contents][Index]

5.13 Example of Working with an Assembler Program ¶

In the example of working with an assembler program, an agent controlled by the sample program has to find a path to the gold in a labyrinth and then to an exit from the labyrinth. As in the sample program labyr2.c described in Example of Using the Actor API, the C source code contains the picture of a labyrinth encoded using a subset of ASCII characters resembling pseudographics. You can change that picture to test agent behavior for various labyrinth configurations.

To learn a labyrinth configuration, the agent visits the labyrinth the number of times defined by the macro NVISIT. Moving the agent to a cell designated as a labyrinth exit finishes a labyrinth visit.

The picture of a labyrinth cell is in Figure 2.8. See the description of a labyrinth cell and the macros MAX_X, MAX_Y, ENTRY_X and ENTRY_Y around that figure.

The agent does not know its precise location in the labyrinth in default mode of program operation—the agent receives a limited amount of information about a current location. This information includes an indication whether a current cell is an exit from the labyrinth and, if the current cell is not an exit from the labyrinth, the union of the following pieces of information:

an indication whether the last move was successful or unsuccessful because of an obstacle;
a bitmask of four bits indicating ability to move in the north, east, south, or west direction from a current cell; an obstacle prevents a move;
an indication whether the agent has taken the gold during a current labyrinth visit.

The macro NSTATE specifies the number of tracked environment states in default mode of program operation. The sample program prints that number just after its invocation. Defining that macro to a lesser value may result in infinite looping.

After every labyrinth visit, the sample program prints the following information:

The ordinal number of a labyrinth visit.
The number of times the agent found the gold since the beginning of program execution.
The length of the last visiting path.
The length of all traversed visiting paths.
The average amount of gold found per one move in the labyrinth.

In default mode of program operation, the agent visits the labyrinth a number of times without taking the gold until at some point the agent usually “understands” how to take the gold and then finds the gold at almost every labyrinth visit.

At the last labyrinth visit, the sample program switches the terminal to full-screen mode and shows a labyrinth picture. After pressing SPACE, the sample program shows the movements of the agent in the labyrinth and prints current length of visiting path and an indication whether or not the agent took the gold. After the agent finds an exit from the labyrinth, pressing Q turns off full-screen terminal mode and quits the sample program.

You can specify a random seed by the first program argument. If the random seed is non-negative, the agent operates adaptively (normally). If the random seed is negative, the agent operates randomly. You can compare agent behavior and program output for the two modes of program operation.

Additional mode of program operation is the use of a profile assembler program helping the agent to do its job more efficiently. The second program argument turns on the additional mode by passing the name of a file with the profile assembler program.

The file samples/maze.asm in the package distribution installed to $prefix/share/qsmm/samples/maze.asm contains the source text of a profile assembler program for use with labyrinths that have up to 10 cells in width and height. The assembler program actually provides tracking a precise location of the agent in the labyrinth—the agent finds the gold at almost all labyrinth visits. The final visiting path shown in full-screen terminal mode does not have vacillations taking place in default mode of program operation.

The content of that file is below.

        ; RELOCate.
        ;
        ; Analyze the outcome of an `mv' instruction for performing the
        ; next move.

        ;
        ; l_repeat - jump label for repeating an attempt to perform the
        ;            move if the `mv' instruction moved the
        ;            agent to an obstacle.
        ;
        ; new_row  - new row index for the agent if the movement
        ;            was successful.

        ; new_col  - new column index for the agent if the movement
        ;            was successful.

reloc   macro   l_repeat, new_row, new_col

l_g0    def     g0_r##new_row##_c##new_col
l_g1    def     g1_r##new_row##_c##new_col

        joe     0, l_g0
        joe     1, l_g0
        joe     2, l_g0
        joe     3, l_g0
        joe     4, l_g0
        joe     5, l_g0
        joe     6, l_g0
        joe     7, l_g0
        joe     8, l_g0
        joe     9, l_g0
        joe     10, l_g0
        joe     11, l_g0
        joe     12, l_g0
        joe     13, l_g0
        joe     14, l_g0
        joe     15, l_g0
        joe     32, l_g1
        joe     33, l_g1
        joe     34, l_g1
        joe     35, l_g1
        joe     36, l_g1
        joe     37, l_g1
        joe     38, l_g1
        joe     39, l_g1
        joe     40, l_g1
        joe     41, l_g1
        joe     42, l_g1
        joe     43, l_g1
        joe     44, l_g1
        joe     45, l_g1
        joe     46, l_g1
        joe     47, l_g1
        jmp     l_repeat

        end     macro

        ; Select a movement direction and analyze a movement outcome.
        ;
        ; is_gf     - indication whether the agent has found gold:
        ;             1 = has found;
        ;             0 = not yet found.

        ;
        ; row       - current row index for the agent.
        ; col       - current column index for the agent.
        ;
        ; row_north - new row index for the agent if it moves one step in
        ;             the north direction.
        ;
        ; row_south - new row index for the agent if it moves one step in
        ;             the south direction.
        ;
        ; col_west  - new column index for the agent if it moves one step
        ;             in the west direction.
        ;
        ; col_east  - new column index for the agent if it moves one step
        ;             in the east direction.

state   macro   is_gf, row, col,
                row_north, row_south, col_west, col_east

repeat  def     u##__UNIQUE__
m_north def     u##__UNIQUE__
m_east  def     u##__UNIQUE__
m_south def     u##__UNIQUE__

g##is_gf##_r##row##_c##col:
repeat: stt

        choice
        case    0.25, m_north
        case    0.25, m_east
        case    0.25, m_south
        end     choice

        mv      west
        reloc   repeat, row, col_west

m_south:
        mv      south
        reloc   repeat, row_south, col

m_east: mv      east
        reloc   repeat, row, col_east

m_north:
        mv      north
        reloc   repeat, row_north, col

        end     macro


        ; STates for a ROW.
        ;
        ; Movement selection states for a specific agent location row.
        ;
        ; is_gf     - indication whether the agent has found gold:
        ;             1 = has found;
        ;             0 = not yet found.
        ;
        ; row       - current row index for the agent.
        ;
        ; row_north - new row index for the agent if it moves one step in
        ;             the north direction.
        ;
        ; row_south - new row index for the agent if it moves one step in
        ;             the south direction.


strow   macro   is_gf, row, row_north, row_south

        state   is_gf, row, 0, row_north, row_south, 9, 1
        state   is_gf, row, 1, row_north, row_south, 0, 2
        state   is_gf, row, 2, row_north, row_south, 1, 3
        state   is_gf, row, 3, row_north, row_south, 2, 4
        state   is_gf, row, 4, row_north, row_south, 3, 5
        state   is_gf, row, 5, row_north, row_south, 4, 6
        state   is_gf, row, 6, row_north, row_south, 5, 7
        state   is_gf, row, 7, row_north, row_south, 6, 8
        state   is_gf, row, 8, row_north, row_south, 7, 9
        state   is_gf, row, 9, row_north, row_south, 8, 0

        end     macro


        ; STates for all ROWS.
        ;
        ; Movement selection states for all agent location rows.
        ;
        ; is_gf - indication whether the agent has found gold:
        ;         1 = has found;
        ;         0 = not yet found.

strows  macro   is_gf

        strow   is_gf, 0, 9, 1
        strow   is_gf, 1, 0, 2
        strow   is_gf, 2, 1, 3
        strow   is_gf, 3, 2, 4
        strow   is_gf, 4, 3, 5
        strow   is_gf, 5, 4, 6
        strow   is_gf, 6, 5, 7
        strow   is_gf, 7, 6, 8
        strow   is_gf, 8, 7, 9
        strow   is_gf, 9, 8, 0

        end     macro


        ; Generate movement selection states for the case when the agent
        ; has not yet found gold and the case when the agent has
        ; found the gold.

        strows  0
        strows  1

The file samples/maze_asm.c in the package distribution installed to $prefix/share/qsmm/samples/maze_asm.c contains the source code of the sample program. A copy of that source code is below. The command make builds the sample program if the configure script has configured QSMM to use the ncurses library. The command make install installs the binary of the sample program and names the installed binary as ‘qsmm-example-maze-asm’ if the configure script has configured QSMM to install example programs. See the file INSTALL in the root of the package distribution for information on the configure script.

#include <config.h>

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#if defined(HAVE_CURSES_H)
#   include <curses.h>
#elif defined(HAVE_NCURSES_CURSES_H)
#   include <ncurses/curses.h>
#endif


#include <qsmm/qsmm.h>

#define NVISIT  200
#define NSTATE  512
#define MAX_X   8
#define MAX_Y   8
#define ENTRY_X 1
#define ENTRY_Y 8


#define CHK_FAIL(func, ...)                                               \
    do {                                                                  \
        const int rc=func(__VA_ARGS__);                                   \
        if (rc<0) ERREXIT( #func ": %s", qsmm_err_str(rc));               \
    }                                                                     \
    while (0)


#define ERREXIT(fmt, ...)                                                 \
    do {                                                                  \
        fprintf(stderr,fmt "\n", ## __VA_ARGS__);                         \
        goto Exit;                                                        \
    }                                                                     \
    while (0)


enum direct_e {
    DIRECT_NORTH=0,  // move one step up
    DIRECT_EAST =1,  // move one step right
    DIRECT_SOUTH=2,  // move one step down
    DIRECT_WEST =3,  // move one step left
    DIRECT_COUNT     // number of movement directions
};


static char is_gold_found;
static int path_len_pass, n_gold_found=0, path_len_total=0;
static qsmm_prg_t prg_asm=0;
static qsmm_msglist_t msglist=0;


// Get the bitmask of allowed movement directions.

static unsigned char
get_percept(
    const char **picture_pp,
    int xx,
    int yy
) {
    unsigned char result=0;
    const int col=xx*3+1, row=yy*2+1;
    assert(picture_pp[row][col]==' ' && picture_pp[row][col+1]==' ');
    assert((picture_pp[row-1][col]==' ' &&
            picture_pp[row-1][col+1]==' ') ||
           (picture_pp[row-1][col]!=' ' &&
            picture_pp[row-1][col+1]!=' '));
    assert((picture_pp[row+1][col]==' ' &&
            picture_pp[row+1][col+1]==' ') ||
           (picture_pp[row+1][col]!=' ' &&
            picture_pp[row+1][col+1]!=' '));
    if (picture_pp[row-1][col]==' ' && picture_pp[row-1][col+1]==' ')
        result|=1 << DIRECT_NORTH;
    if (picture_pp[row][col+2]==' ') result|=1 << DIRECT_EAST;
    if (picture_pp[row+1][col]==' ' && picture_pp[row+1][col+1]==' ')
        result|=1 << DIRECT_SOUTH;
    if (picture_pp[row][col-1]==' ') result|=1 << DIRECT_WEST;
    return result;
}


// Make a movement in the labyrinth in a specified direction possibly
// showing the movement on the screen.
// Returns: 3 = movement resulted in exiting the labyrinth;
//          2 = movement resulted in taking the gold;
//          1 = movement made successfully;
//          0 = cannot make the movement because a new location cell of the
//              agent is not empty.

static int
opaque_maze(
    enum direct_e direct,
    unsigned char *percept_p
) {
    static const char *picture[]={
        // 0  1  2  3  4  5  6  7  8
        "+-----+--+--#..#--+--------+",
        "|     |  |        |        |",  // 0
        "+--+  |  |  #--#  |  +  +  |",
        "|     |  |     |  |  |  |  |",  // 1
        "|  +--+  |  +--+  +--+  *  *",
        "|        |     |  |     |  |",  // 2
        "+--+  +--+-----+  +--+  *--*",
        "|     |  |        |        |",  // 3
        "|  +--+  +  +  +  |  +--+  |",
        "|           |  |  |     |  |",  // 4
        "|  +--+  +--+  |  |  +--+  |",
        "|  |     |     |  |     |  |",  // 5
        "+--+  +--+--+  +--+-----+  |",
        "|     |  |        |        |",  // 6
        "+--+  |  |  +--+  +--+  +--+",
        "|     |  |     |           |",  // 7
        "|  +--+  |  +--+  +--+  +--+",
        "|        |     |  |        |",  // 8
        "+--+..+--+-----+--+--------+"
    };
    static char is_gf=0;
    static int path_len, visit=0, xx=-1, yy=-1;
    char ss[128];
    int col, row, result=1;
    if (xx<0 || yy<0) {
        if (visit==NVISIT-1) {
            if (!initscr()) exit(2);
            noecho();
            for (row=0; row<=(MAX_Y+1)*2+1; row++)
                mvaddstr(row,0,picture[row]);
            row=(MAX_Y+1)*2+3;
            mvaddstr(row,0,"Press [Space] to start moving");
            while (getch()!=' ') ;
            sprintf(ss,"%64s","");
            mvaddstr(row,0,ss);
        }
        xx=ENTRY_X;
        yy=ENTRY_Y;
        path_len=0;
    }
    assert(xx>=0 && xx<=MAX_X);
    assert(yy>=0 && yy<=MAX_Y);
    unsigned char percept;
    if (get_percept(picture,xx,yy) & (1 << direct)) {
        col=xx*3+1;
        row=yy*2+1;
        switch (direct) {
            case DIRECT_NORTH: yy--; break;
            case DIRECT_EAST:  xx++; break;
            case DIRECT_SOUTH: yy++; break;
            case DIRECT_WEST:  xx--; break;
            case DIRECT_COUNT:       assert(0);
        }
        percept=get_percept(picture,xx,yy);
        path_len++;
        if (visit==NVISIT-1) mvaddstr(row,col,"  ");
        col=xx*3+1;
        row=yy*2+1;
        if (visit==NVISIT-1) {
            const int picture_w=strlen(picture[0]);
            mvaddstr(row,col,"[]");
            sprintf(ss," Gold found: %d",is_gf);
            mvaddstr(1,picture_w+2,ss);
            sprintf(ss,"Path length: %d",path_len);
            mvaddstr(3,picture_w+2,ss);
            move((MAX_Y+1)*2+3,0);
            refresh();
            usleep(125000);
        }
        if (picture[row-1][col-1]=='*' && picture[row-1][col+2]=='*' &&
            picture[row+1][col-1]=='*' && picture[row+1][col+2]=='*') {
            if (!is_gf) {
                is_gf=1;
                result=2;
            }
        }
        else if (picture[row-1][col-1]=='#' && picture[row-1][col+2]=='#' &&
                 picture[row+1][col-1]=='#' && picture[row+1][col+2]=='#') {
            is_gf=0;
            result=3;
            xx=-1;
            yy=-1;
            if (visit==NVISIT-1) {
                row=(MAX_Y+1)*2+3;
                mvaddstr(row,0,"Press [Q] to exit");
                while (1) {
                    int key=getch();
                    if (key=='q' || key=='Q') break;
                }
                endwin();
            }
            else visit++;
        }
    }
    else {
        percept=get_percept(picture,xx,yy)+16;
        result=0;
    }
    if (is_gf) percept+=32;
    if (percept_p) *percept_p=percept;
    return result;
}


static QSMM_INSTR_META_CLASS(mv) {
    int result=-1;
    enum direct_e direct=0;
    if (QSMM_HAS_INSTR_CLASS(mehcall->evt))
        qsmm_get_mehcall_instr_param_bin(mehcall,sizeof(direct),0,&direct);
    switch (mehcall->evt) {
        case QSMM_EVT_INSTR_CLASS_INIT: {
            const char *ccp=0;
            switch (direct) {
                case DIRECT_NORTH: ccp="north"; break;
                case DIRECT_EAST:  ccp="east";  break;
                case DIRECT_SOUTH: ccp="south"; break;
                case DIRECT_WEST:  ccp="west";  break;
                case DIRECT_COUNT: assert(0);
            }
            assert(ccp);
            qsmm_set_mehcall_instr_param_f(mehcall,"%s",ccp);
            qsmm_set_mehcall_noutcome(mehcall,64);
            break;
        }
        case QSMM_EVT_ACTIVATE: {
            unsigned char percept=0;
            const int rc=opaque_maze(direct,&percept);
            const qsmm_actpair_t actpair=qsmm_get_actpair(qsmm);
            const qsmm_actor_t actor_env=
                qsmm_get_actpair_actor(actpair,QSMM_ENGINE_ENV),
                actor_iee=qsmm_get_actpair_actor(actpair,QSMM_ENGINE_IEE);
            CHK_FAIL(qsmm_actor_time_delta_v2,actor_env,0,0,1);
            CHK_FAIL(qsmm_actor_time_delta_v2,actor_iee,0,0,1);
            switch (rc) {
                case 0:
                case 1:
                    break;
                case 2:
                    is_gold_found=1;
                    n_gold_found++;
                    break;
                case 3:
                    if (is_gold_found) {
                        CHK_FAIL(qsmm_actor_spur_delta,actor_env,1,1);
                        CHK_FAIL(qsmm_actor_spur_delta,actor_iee,0,1);
                    }
                    qsmm_set_mehcall_return(mehcall,1);
                    break;
                default:
                    assert(0);
            }
            if (rc!=3) {
                if (rc) {
                    path_len_pass++;
                    path_len_total++;
                }
                qsmm_set_mehcall_outcome(mehcall,percept);
            }
            break;
        }
    }
    result=0;

Exit:
    return result;
}

// Find a path to gold in the labyrinth and then to its exit possibly
// using a node probability profile specified by an assembler program.

int
main(
    int argc,
    char **argv
) {

    int seed=0, exit_code=1;
    qsmm_t qsmm=0;
    qsmm_msglist_t msglist_dump=0;
    if (argc>2) {
        CHK_FAIL(qsmm_msglist_create,&msglist);
        CHK_FAIL(qsmm_parse_asm_source_file, argv[2],
                 QSMM_PARSE_ASM_PREPROCESS, 0, 0, msglist, &prg_asm);
    }
    struct qsmm_desc_s desc;
    memset(&desc,0,sizeof(desc));
    desc.dont_use_instr_class_weights=1;
    desc.stack_sz_max=1;
    desc.compat=3;
    desc.nnode_max=1;
    struct qsmm_engine_desc_s *const engine_desc_p=desc.engine_desc;
    engine_desc_p[QSMM_ENGINE_ENV].nspur=2;
    engine_desc_p[QSMM_ENGINE_ENV].ntime=1;
    engine_desc_p[QSMM_ENGINE_IEE].nspur=1;
    engine_desc_p[QSMM_ENGINE_IEE].ntime=1;
    struct qsmm_actor_large_desc_s large_desc;
    memset(&large_desc,0,sizeof(large_desc));
    large_desc.tree_arity=2;
    engine_desc_p[QSMM_ENGINE_ENV].large_desc_p=&large_desc;
    engine_desc_p[QSMM_ENGINE_IEE].large_desc_p=&large_desc;
    CHK_FAIL(qsmm_create,&desc,&qsmm);
    QSMM_REG_INSTR_META_CLASS_PARAM(qsmm,mv,0);
    qsmm_reg_instr_class_set(qsmm,"walker",0,0);
    for (enum direct_e direct=0; direct<DIRECT_COUNT; direct++)
        qsmm_reg_instr_class_v2(qsmm, "mv", "walker", 0,
                                sizeof(direct), &direct, 0);
    if (!prg_asm) {
        printf("nstate %d\n",NSTATE);
        qsmm_set_nstate_max(qsmm,"walker",NSTATE);
    }
    qsmm_sig_t node=0;
    qsmm_node_create_v2(qsmm,"walker",0,&node);
    qsmm_engine_create(qsmm);
    if (prg_asm) qsmm_node_asm(qsmm,node,QSMM_EXCEPT_ALL,prg_asm,msglist);
    const qsmm_actor_t actor_env=
        qsmm_get_actpair_actor(qsmm_get_actpair(qsmm),QSMM_ENGINE_ENV);
    CHK_FAIL(qsmm_set_actor_auto_spur_type,actor_env,0);
    CHK_FAIL(qsmm_set_actor_spur_weight,actor_env,0,1);
    CHK_FAIL(qsmm_set_actor_spur_weight,actor_env,1,1);
    if (argc>1 && (seed=atoi(argv[1]))<0) {
        qsmm_set_random(qsmm,1);
        seed=-seed;
    }
    qsmm_rng_seed(qsmm_get_rng(qsmm),seed);
    for (int visit=0; visit<NVISIT; visit++) {
        unsigned char percept=0;
        is_gold_found=0;
        path_len_pass=0;
        opaque_maze(DIRECT_NORTH,&percept);
        qsmm_node_call_default(qsmm,node,0);
        printf(
    "visit %d: ngf %d, path_pass %d, path_total %d, ngf/path_total %.8f\n",
               visit+1, n_gold_found, path_len_pass, path_len_total,
               (double) n_gold_found/path_len_total);
    }
    exit_code=0;

Exit:
    qsmm_prg_destroy(prg_asm);
    qsmm_destroy(qsmm);
    if (msglist) {
        msglist_dump=msglist;
        msglist=0;
        CHK_FAIL(qsmm_msglist_dump,msglist_dump,0,"BUFFER",0,stderr);
    }
    qsmm_msglist_destroy(msglist_dump);
    return exit_code;
}

Sample program output is below.

$ qsmm-example-maze-asm -1
nstate 512
visit 1: ngf 0, path_pass 94, path_total 94, ngf/path_total 0.00000000
visit 2: ngf 0, path_pass 706, path_total 800, ngf/path_total 0.00000000
...
visit 101: ngf 19, path_pass 640, path_total 87374, ngf/path_total 0.00021746
visit 102: ngf 19, path_pass 962, path_total 88336, ngf/path_total 0.00021509
...
visit 199: ngf 37, path_pass 90, path_total 178714, ngf/path_total 0.00020703
visit 200: ngf 37, path_pass 726, path_total 179440, ngf/path_total 0.00020620

$ qsmm-example-maze-asm 1
nstate 512
visit 1: ngf 0, path_pass 632, path_total 632, ngf/path_total 0.00000000
visit 2: ngf 1, path_pass 2252, path_total 2884, ngf/path_total 0.00034674
...
visit 101: ngf 88, path_pass 266, path_total 51398, ngf/path_total 0.00171213
visit 102: ngf 89, path_pass 266, path_total 51664, ngf/path_total 0.00172267
...
visit 199: ngf 186, path_pass 468, path_total 82666, ngf/path_total 0.00225002
visit 200: ngf 187, path_pass 338, path_total 83004, ngf/path_total 0.00225290

The above pieces of output show that the average amount of gold found per one move in the labyrinth in adaptive mode of program operation with random seed 1 without using the profile assembler program is about 11 times greater than in random mode of program operation.

$ qsmm-example-maze-asm -1 "$prefix/share/qsmm/samples/maze.asm"
visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404
visit 2: ngf 1, path_pass 274, path_total 3360, ngf/path_total 0.00029762
...
visit 101: ngf 17, path_pass 768, path_total 92636, ngf/path_total 0.00018351
visit 102: ngf 17, path_pass 898, path_total 93534, ngf/path_total 0.00018175
...
visit 199: ngf 43, path_pass 132, path_total 192766, ngf/path_total 0.00022307
visit 200: ngf 43, path_pass 118, path_total 192884, ngf/path_total 0.00022293

$ qsmm-example-maze-asm 1 "$prefix/share/qsmm/samples/maze.asm"
visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404
visit 2: ngf 2, path_pass 52, path_total 3138, ngf/path_total 0.00063735
...
visit 101: ngf 101, path_pass 52, path_total 8286, ngf/path_total 0.01218923
visit 102: ngf 102, path_pass 52, path_total 8338, ngf/path_total 0.01223315
...
visit 199: ngf 199, path_pass 52, path_total 13382, ngf/path_total 0.01487072
visit 200: ngf 200, path_pass 52, path_total 13434, ngf/path_total 0.01488760

The above pieces of output show that the average amount of gold found per one move in the labyrinth in adaptive mode of program operation with random seed 1 and the use of the profile assembler program is about 67 times greater than in random mode of program operation.