Previous: Using the Assembler Preprocessor, Up: Assembler Programs [Contents][Index]
In the example of working with an assembler program, an agent controlled by the sample program has to find a path to the gold in a labyrinth and then to an exit from the labyrinth. As in the sample program samples/labyr2.c described in Example of Using the Actor API, the C source code contains the picture of this labyrinth encoded using a subset of ASCII characters resembling pseudographics. You can change that picture to test agent behavior for various labyrinth configurations.
To learn a labyrinth configuration, the agent visits the labyrinth the number of times defined by the macro NVISIT
.
Moving the agent to a cell designated as a labyrinth exit finishes a labyrinth visit.
The picture of a labyrinth cell is in Figure 2.8.
See the description of a labyrinth cell and the macros MAX_X
, MAX_Y
, ENTRY_X
and ENTRY_Y
around that figure.
The agent does not know its precise location in the labyrinth in the default mode of program operation—the agent receives a limited amount of information about a current location. This information includes an indication whether a current cell is an exit from the labyrinth and, if the current cell is not an exit from the labyrinth, the union of the following pieces of information:
The macro NSTATE
specifies the number of tracked environment states in the default mode of program operation.
Defining that macro to a lesser value may result in infinite looping.
Just after its invocation, the sample program prints the number of tracked environment states. After every labyrinth visit, the sample program prints the following information:
In the default mode of program operation, the agent visits the labyrinth a number of times without taking the gold until at some point the agent usually “understands” how to take the gold and then finds the gold at almost every labyrinth visit.
At last labyrinth visit, the sample program switches the terminal to full-screen mode and shows a labyrinth picture. After pressing SPACE, the sample program shows movements of the agent in the labyrinth and prints the current length of visiting path and an indication whether or not the agent took the gold. After the agent finds an exit from the labyrinth, pressing Q turns off full-screen terminal mode and quits the sample program.
You can specify a random seed by the first program argument. If the random seed is non-negative, the agent operates adaptively (normally). If the random seed is negative, the agent operates randomly. You can compare agent behavior and program output for these two modes of program execution.
The special mode of program operation is the use of a profile assembler program helping the agent to do its job more efficiently. The second program argument turns on that special mode and specifies the name of a file with the profile assembler program.
Note: when using a custom profile assembler program, you may need to increase the sizes of the pools of probabilities lists in normal form, that is, increase values assigned to
desc.profile_pool_env_sz
anddesc.profile_pool_opt_sz
in the functionmain
.
The file samples/maze.asm in the package distribution contains the source text of a profile assembler program for use with labyrinths that have up to 10 cells in width and height. This assembler program actually provides tracking a precise location of the agent in the labyrinth—the agent finds the gold at almost all labyrinth visits. The final visiting path shown in full-screen terminal mode does not have vacillations taking place in the default mode of program operation.
The content of that file is below.
; RELOCate. ; ; Analyze the outcome of an `mv' instruction for performing the ; next move.
; ; l_repeat - jump label for repeating an attempt to perform the ; move if the `mv' instruction moved the ; agent to an obstacle. ; ; new_row - new row index for the agent if the movement ; was successful. ; new_col - new column index for the agent if the movement ; was successful. reloc macro l_repeat, new_row, new_col l_g0 def g0_r##new_row##_c##new_col l_g1 def g1_r##new_row##_c##new_col joe 0, l_g0 joe 1, l_g0 joe 2, l_g0 joe 3, l_g0 joe 4, l_g0 joe 5, l_g0 joe 6, l_g0 joe 7, l_g0 joe 8, l_g0 joe 9, l_g0 joe 10, l_g0 joe 11, l_g0 joe 12, l_g0 joe 13, l_g0 joe 14, l_g0 joe 15, l_g0 joe 32, l_g1 joe 33, l_g1 joe 34, l_g1 joe 35, l_g1 joe 36, l_g1 joe 37, l_g1 joe 38, l_g1 joe 39, l_g1 joe 40, l_g1 joe 41, l_g1 joe 42, l_g1 joe 43, l_g1 joe 44, l_g1 joe 45, l_g1 joe 46, l_g1 joe 47, l_g1 jmp l_repeat end macro ; Select a movement direction and analyze a movement outcome. ; ; is_gf - indication whether the agent has found gold: ; 1 = has found; ; 0 = not yet found. ; ; row - current row index for the agent. ; col - current column index for the agent. ; ; row_north - new row index for the agent if it moves one step in ; the north direction. ; ; row_south - new row index for the agent if it moves one step in ; the south direction. ; ; col_west - new column index for the agent if it moves one step ; in the west direction. ; ; col_east - new column index for the agent if it moves one step ; in the east direction. state macro is_gf, row, col, row_north, row_south, col_west, col_east repeat def u##__UNIQUE__ m_north def u##__UNIQUE__ m_east def u##__UNIQUE__ m_south def u##__UNIQUE__ g##is_gf##_r##row##_c##col: repeat: stt choice case 0.25, m_north case 0.25, m_east case 0.25, m_south end choice mv west reloc repeat, row, col_west m_south: mv south reloc repeat, row_south, col m_east: mv east reloc repeat, row, col_east m_north: mv north reloc repeat, row_north, col end macro ; STates for a ROW. ; ; Movement selection states for a specific agent location row. ; ; is_gf - indication whether the agent has found gold: ; 1 = has found; ; 0 = not yet found. ; ; row - current row index for the agent. ; ; row_north - new row index for the agent if it moves one step in ; the north direction. ; ; row_south - new row index for the agent if it moves one step in ; the south direction. strow macro is_gf, row, row_north, row_south state is_gf, row, 0, row_north, row_south, 9, 1 state is_gf, row, 1, row_north, row_south, 0, 2 state is_gf, row, 2, row_north, row_south, 1, 3 state is_gf, row, 3, row_north, row_south, 2, 4 state is_gf, row, 4, row_north, row_south, 3, 5 state is_gf, row, 5, row_north, row_south, 4, 6 state is_gf, row, 6, row_north, row_south, 5, 7 state is_gf, row, 7, row_north, row_south, 6, 8 state is_gf, row, 8, row_north, row_south, 7, 9 state is_gf, row, 9, row_north, row_south, 8, 0 end macro ; STates for all ROWS. ; ; Movement selection states for all agent location rows. ; ; is_gf - indication whether the agent has found gold: ; 1 = has found; ; 0 = not yet found. strows macro is_gf strow is_gf, 0, 9, 1 strow is_gf, 1, 0, 2 strow is_gf, 2, 1, 3 strow is_gf, 3, 2, 4 strow is_gf, 4, 3, 5 strow is_gf, 5, 4, 6 strow is_gf, 6, 5, 7 strow is_gf, 7, 6, 8 strow is_gf, 8, 7, 9 strow is_gf, 9, 8, 0 end macro ; Generate movement selection states for the case when the agent ; has not yet found gold and the case when the agent has ; found the gold. strows 0 strows 1
The file samples/maze_asm.c in the package distribution provides the source code of the sample program.
A copy of that source code is below.
The command make
builds the sample program if the configure script has configured QSMM to use the ncurses
library.
See the file INSTALL in the root of the package distribution for information on the configure script.
#include <assert.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #if defined(HAVE_CURSES_H) # include <curses.h> #elif defined(HAVE_NCURSES_CURSES_H) # include <ncurses/curses.h> #endif #include <qsmm/qsmm.h> #define NVISIT 200 #define NSTATE 512 #define MAX_X 8 #define MAX_Y 8 #define ENTRY_X 1 #define ENTRY_Y 8 #define CHK_FAIL(func, ...) \ do { \ int rc=func(__VA_ARGS__); \ if (rc<0) ERREXIT( #func ": %s", qsmm_err_str(rc)); \ } \ while (0) #define ERREXIT(fmt, ...) \ do { \ fprintf(stderr,fmt "\n", ## __VA_ARGS__); \ goto Exit; \ } \ while (0) enum direct_e { DIRECT_NORTH=0, // move one step up DIRECT_EAST =1, // move one step right DIRECT_SOUTH=2, // move one step down DIRECT_WEST =3 // move one step left }; static char is_gold_found; static int path_len_pass, n_gold_found=0, path_len_total=0; static qsmm_prg_t prg_asm=0; static qsmm_msglist_t msglist=0; // Get the bitmask of allowed movement directions. static unsigned char get_percept( const char **picture_pp, int xx, int yy ) { unsigned char result=0; const int col=xx*3+1, row=yy*2+1; assert(picture_pp[row][col]==' ' && picture_pp[row][col+1]==' '); assert((picture_pp[row-1][col]==' ' && picture_pp[row-1][col+1]==' ') || (picture_pp[row-1][col]!=' ' && picture_pp[row-1][col+1]!=' ')); assert((picture_pp[row+1][col]==' ' && picture_pp[row+1][col+1]==' ') || (picture_pp[row+1][col]!=' ' && picture_pp[row+1][col+1]!=' ')); if (picture_pp[row-1][col]==' ' && picture_pp[row-1][col+1]==' ') result|=1 << DIRECT_NORTH; if (picture_pp[row][col+2]==' ') result|=1 << DIRECT_EAST; if (picture_pp[row+1][col]==' ' && picture_pp[row+1][col+1]==' ') result|=1 << DIRECT_SOUTH; if (picture_pp[row][col-1]==' ') result|=1 << DIRECT_WEST; return result; } // Make a movement in the labyrinth in a specified direction possibly // showing the movement on the screen.
// Returns: 3 = movement resulted in exiting the labyrinth; // 2 = movement resulted in taking the gold; // 1 = movement made successfully; // 0 = cannot make the movement because a new location cell of the // agent is not empty.
static int opaque_maze( enum direct_e direct, unsigned char *percept_p ) { static const char *picture[]={ // 0 1 2 3 4 5 6 7 8 "+-----+--+--#..#--+--------+", "| | | | |", // 0 "+--+ | | #--# | + + |", "| | | | | | | |", // 1 "| +--+ | +--+ +--+ * *", "| | | | | |", // 2 "+--+ +--+-----+ +--+ *--*", "| | | | |", // 3 "| +--+ + + + | +--+ |", "| | | | | |", // 4 "| +--+ +--+ | | +--+ |", "| | | | | | |", // 5 "+--+ +--+--+ +--+-----+ |", "| | | | |", // 6 "+--+ + | +--+ +--+ +--+", "| | | | |", // 7 "| +--+ | +--+ +--+ +--+", "| | | | |", // 8 "+--+..+--+-----+--+--------+" }; static char is_gf=0; static int path_len, visit=0, xx=-1, yy=-1; char ss[128]; int col, row, result=1; if (xx<0 || yy<0) { if (visit==NVISIT-1) { if (!initscr()) exit(2); noecho(); for (row=0; row<=(MAX_Y+1)*2+1; row++) mvaddstr(row,0,picture[row]); row=(MAX_Y+1)*2+3; mvaddstr(row,0,"Press [Space] to start moving"); while (getch()!=' ') ; sprintf(ss,"%64s",""); mvaddstr(row,0,ss); } xx=ENTRY_X; yy=ENTRY_Y; path_len=0; } assert(xx>=0 && xx<=MAX_X); assert(yy>=0 && yy<=MAX_Y); unsigned char percept; if (get_percept(picture,xx,yy) & (1 << direct)) { col=xx*3+1; row=yy*2+1; switch (direct) { case DIRECT_NORTH: yy--; break; case DIRECT_EAST: xx++; break; case DIRECT_SOUTH: yy++; break; case DIRECT_WEST: xx--; break; default: assert(0); } percept=get_percept(picture,xx,yy); path_len++; if (visit==NVISIT-1) mvaddstr(row,col," "); col=xx*3+1; row=yy*2+1; if (visit==NVISIT-1) { int picture_w=strlen(picture[0]); mvaddstr(row,col,"[]"); sprintf(ss," Gold found: %d",is_gf); mvaddstr(1,picture_w+2,ss); sprintf(ss,"Path length: %d",path_len); mvaddstr(3,picture_w+2,ss); move((MAX_Y+1)*2+3,0); refresh(); usleep(125000); } if (picture[row-1][col-1]=='*' && picture[row-1][col+2]=='*' && picture[row+1][col-1]=='*' && picture[row+1][col+2]=='*') { if (!is_gf) { is_gf=1; result=2; } } else if (picture[row-1][col-1]=='#' && picture[row-1][col+2]=='#' && picture[row+1][col-1]=='#' && picture[row+1][col+2]=='#') { is_gf=0; result=3; xx=-1; yy=-1; if (visit==NVISIT-1) { row=(MAX_Y+1)*2+3; mvaddstr(row,0,"Press [Q] to exit"); while (1) { int key=getch(); if (key=='q' || key=='Q') break; } endwin(); } else visit++; } } else { percept=get_percept(picture,xx,yy)+16; result=0; } if (is_gf) percept+=32; if (percept_p) *percept_p=percept; return result; } static QSMM_INSTR_META_CLASS(mv) { enum direct_e direct=0; if (QSMM_HAS_INSTR_CLASS(qsmm_evt)) qsmm_get_eh_instr_param(qsmm,sizeof(direct),&direct); switch (qsmm_evt) { case QSMM_EVT_INSTR_CLASS_INIT: { const char *ccp; switch (direct) { case DIRECT_NORTH: ccp="north"; break; case DIRECT_EAST: ccp="east"; break; case DIRECT_SOUTH: ccp="south"; break; case DIRECT_WEST: ccp="west"; break; default: assert(0); } qsmm_set_eh_instr_param_str_f(qsmm,"%s",ccp); qsmm_set_eh_noutcome(qsmm,64); break; } case QSMM_EVT_ACTIVATE: { unsigned char percept=0; const int rc=opaque_maze(direct,&percept); qsmm_time_delta(qsmm,1); switch (rc) { case 0: case 1: break; case 2: is_gold_found=1; n_gold_found++; break; case 3: if (is_gold_found) qsmm_spur_delta(qsmm,1,1); qsmm_return_to_caller_node(qsmm); break; default: assert(0); } if (rc!=3) { if (rc) { path_len_pass++; path_len_total++; } qsmm_set_instr_outcome(qsmm,percept); } break; } } return 0; } static QSMM_INSTR_CLASS_SET(walker) { switch (qsmm_evt) { case QSMM_EVT_ENT_INIT: { enum direct_e direct; direct=DIRECT_NORTH, QSMM_REG_INSTR_CLASS_PARAM(mv,direct); direct=DIRECT_EAST, QSMM_REG_INSTR_CLASS_PARAM(mv,direct); direct=DIRECT_SOUTH, QSMM_REG_INSTR_CLASS_PARAM(mv,direct); direct=DIRECT_WEST, QSMM_REG_INSTR_CLASS_PARAM(mv,direct); qsmm_sig_t nstate=0; if (prg_asm) qsmm_get_prg_nstate_v2(qsmm, __FUNCTION__, 0, QSMM_EXCEPT_ALL, prg_asm, 0, &nstate); else nstate=NSTATE; printf("nstate %d\n",nstate); qsmm_set_nstate_max(qsmm,__FUNCTION__,nstate); QSMM_NODE_CREATE(0); break; } case QSMM_EVT_ENGINE_INIT: if (prg_asm) qsmm_node_asm(qsmm,0,0,prg_asm,msglist); break; case QSMM_EVT_NODE_ENTER: { unsigned char percept=0; is_gold_found=0; opaque_maze(DIRECT_NORTH,&percept); break; } } return 0; } // Find a path to gold in the labyrinth and then to its exit possibly // using a node probability profile specified by an assembler program. int main( int argc, char **argv ) { int seed=0, exit_code=1; qsmm_t qsmm=0; qsmm_msglist_t msglist_dump=0; if (argc>2) { CHK_FAIL(qsmm_msglist_create,&msglist); CHK_FAIL(qsmm_parse_asm_source_file, argv[2], QSMM_PARSE_ASM_PREPROCESS, 0, 0, msglist, &prg_asm); } struct qsmm_desc_s desc; memset(&desc,0,sizeof(desc)); desc.dont_use_instr_class_weights=1; desc.is_large_env=1; desc.is_large_opt=1; desc.nspur=2; desc.stack_sz_max=1; desc.profile_pool_env_sz=2; desc.profile_pool_opt_sz=2; desc.compat=1; desc.sparse_fill_max=0.2; CHK_FAIL(qsmm_create,&desc,&qsmm); QSMM_REG_INSTR_META_CLASS(qsmm,mv,0); QSMM_REG_INSTR_CLASS_SET(qsmm,walker,0); qsmm_engine_create(qsmm); if (argc>1 && (seed=atoi(argv[1]))<0) { qsmm_set_random(qsmm,1); seed=-seed; } qsmm_rng_seed(qsmm_get_rng(qsmm),seed); for (int visit=0; visit<NVISIT; visit++) { path_len_pass=0; qsmm_node_call_default(qsmm,0,0); printf( "visit %d: ngf %d, path_pass %d, path_total %d, ngf/path_total %.8f\n", visit+1, n_gold_found, path_len_pass, path_len_total, (double) n_gold_found/path_len_total); } exit_code=0; Exit: qsmm_prg_destroy(prg_asm); qsmm_destroy(qsmm); if (msglist) { msglist_dump=msglist; msglist=0; CHK_FAIL(qsmm_msglist_dump,msglist_dump,0,"BUFFER",0,stderr); } qsmm_msglist_destroy(msglist_dump); return exit_code; }
Sample program output is below.
$ ./maze-asm -1 nstate 512 visit 1: ngf 0, path_pass 94, path_total 94, ngf/path_total 0.00000000 visit 2: ngf 0, path_pass 706, path_total 800, ngf/path_total 0.00000000 ... visit 101: ngf 19, path_pass 640, path_total 87374, ngf/path_total 0.00021746 visit 102: ngf 19, path_pass 962, path_total 88336, ngf/path_total 0.00021509 ... visit 199: ngf 37, path_pass 90, path_total 178714, ngf/path_total 0.00020703 visit 200: ngf 37, path_pass 726, path_total 179440, ngf/path_total 0.00020620
$ ./maze-asm 1 nstate 512 visit 1: ngf 0, path_pass 632, path_total 632, ngf/path_total 0.00000000 visit 2: ngf 1, path_pass 2712, path_total 3344, ngf/path_total 0.00029904 ... visit 101: ngf 84, path_pass 570, path_total 115834, ngf/path_total 0.00072518 visit 102: ngf 85, path_pass 290, path_total 116124, ngf/path_total 0.00073198 ... visit 199: ngf 182, path_pass 370, path_total 150890, ngf/path_total 0.00120618 visit 200: ngf 183, path_pass 360, path_total 151250, ngf/path_total 0.00120992
The above pieces of output show that the average amount of gold found per one move in the labyrinth in the adaptive mode of program operation with random seed 1 and without using the profile assembler program is about 6 times greater than in the random mode of program operation.
$ ./maze-asm -1 maze.asm nstate 200 visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404 visit 2: ngf 1, path_pass 274, path_total 3360, ngf/path_total 0.00029762 ... visit 100: ngf 17, path_pass 430, path_total 91868, ngf/path_total 0.00018505 visit 101: ngf 17, path_pass 768, path_total 92636, ngf/path_total 0.00018351 ... visit 199: ngf 43, path_pass 132, path_total 192766, ngf/path_total 0.00022307 visit 200: ngf 43, path_pass 118, path_total 192884, ngf/path_total 0.00022293
./maze-asm 1 maze.asm nstate 200 visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404 visit 2: ngf 2, path_pass 52, path_total 3138, ngf/path_total 0.00063735 ... visit 100: ngf 100, path_pass 52, path_total 8234, ngf/path_total 0.01214477 visit 101: ngf 101, path_pass 52, path_total 8286, ngf/path_total 0.01218923 ... visit 199: ngf 199, path_pass 52, path_total 13382, ngf/path_total 0.01487072 visit 200: ngf 200, path_pass 52, path_total 13434, ngf/path_total 0.01488760
The above pieces of output show that the average amount of gold found per one move in the labyrinth in the adaptive mode of program operation with random seed 1 and the use of the profile assembler program is about 67 times greater than in the random mode of program operation.
Previous: Using the Assembler Preprocessor, Up: Assembler Programs [Contents][Index]