In the example of working with an assembler program, an agent controlled by the sample program has to find a path to the gold in a labyrinth and then to an exit from the labyrinth. As in the sample program labyr2.c described in Example of Using the Actor API, the C source code contains the picture of a labyrinth encoded using a subset of ASCII characters resembling pseudographics. You can change that picture to test agent behavior for various labyrinth configurations.
To learn a labyrinth configuration, the agent visits the labyrinth the number of times defined by the macro NVISIT.
Moving the agent to a cell designated as a labyrinth exit finishes a labyrinth visit.
The picture of a labyrinth cell is in Figure 2.8.
See the description of a labyrinth cell and the macros MAX_X, MAX_Y, ENTRY_X and ENTRY_Y around that figure.
The agent does not know its precise location in the labyrinth in default mode of program operation—the agent receives a limited amount of information about a current location. This information includes an indication whether a current cell is an exit from the labyrinth and, if the current cell is not an exit from the labyrinth, the union of the following pieces of information:
The macro NSTATE specifies the number of tracked environment states in default mode of program operation.
The sample program prints that number just after its invocation.
Defining that macro to a lesser value may result in infinite looping.
After every labyrinth visit, the sample program prints the following information:
In default mode of program operation, the agent visits the labyrinth a number of times without taking the gold until at some point the agent usually “understands” how to take the gold and then finds the gold at almost every labyrinth visit.
At the last labyrinth visit, the sample program switches the terminal to full-screen mode and shows a labyrinth picture. After pressing SPACE, the sample program shows the movements of the agent in the labyrinth and prints current length of visiting path and an indication whether or not the agent took the gold. After the agent finds an exit from the labyrinth, pressing Q turns off full-screen terminal mode and quits the sample program.
You can specify a random seed by the first program argument. If the random seed is non-negative, the agent operates adaptively (normally). If the random seed is negative, the agent operates randomly. You can compare agent behavior and program output for the two modes of program operation.
Additional mode of program operation is the use of a profile assembler program helping the agent to do its job more efficiently. The second program argument turns on the additional mode by passing the name of a file with the profile assembler program.
The file samples/maze.asm in the package distribution installed to $prefix/share/qsmm/samples/maze.asm contains the source text of a profile assembler program for use with labyrinths that have up to 10 cells in width and height. The assembler program actually provides tracking a precise location of the agent in the labyrinth—the agent finds the gold at almost all labyrinth visits. The final visiting path shown in full-screen terminal mode does not have vacillations taking place in default mode of program operation.
The content of that file is below.
; RELOCate.
;
; Analyze the outcome of an `mv' instruction for performing the
; next move.
;
; l_repeat - jump label for repeating an attempt to perform the
; move if the `mv' instruction moved the
; agent to an obstacle.
;
; new_row - new row index for the agent if the movement
; was successful.
; new_col - new column index for the agent if the movement
; was successful.
reloc macro l_repeat, new_row, new_col
l_g0 def g0_r##new_row##_c##new_col
l_g1 def g1_r##new_row##_c##new_col
joe 0, l_g0
joe 1, l_g0
joe 2, l_g0
joe 3, l_g0
joe 4, l_g0
joe 5, l_g0
joe 6, l_g0
joe 7, l_g0
joe 8, l_g0
joe 9, l_g0
joe 10, l_g0
joe 11, l_g0
joe 12, l_g0
joe 13, l_g0
joe 14, l_g0
joe 15, l_g0
joe 32, l_g1
joe 33, l_g1
joe 34, l_g1
joe 35, l_g1
joe 36, l_g1
joe 37, l_g1
joe 38, l_g1
joe 39, l_g1
joe 40, l_g1
joe 41, l_g1
joe 42, l_g1
joe 43, l_g1
joe 44, l_g1
joe 45, l_g1
joe 46, l_g1
joe 47, l_g1
jmp l_repeat
end macro
; Select a movement direction and analyze a movement outcome.
;
; is_gf - indication whether the agent has found gold:
; 1 = has found;
; 0 = not yet found.
;
; row - current row index for the agent.
; col - current column index for the agent.
;
; row_north - new row index for the agent if it moves one step in
; the north direction.
;
; row_south - new row index for the agent if it moves one step in
; the south direction.
;
; col_west - new column index for the agent if it moves one step
; in the west direction.
;
; col_east - new column index for the agent if it moves one step
; in the east direction.
state macro is_gf, row, col,
row_north, row_south, col_west, col_east
repeat def u##__UNIQUE__
m_north def u##__UNIQUE__
m_east def u##__UNIQUE__
m_south def u##__UNIQUE__
g##is_gf##_r##row##_c##col:
repeat: stt
choice
case 0.25, m_north
case 0.25, m_east
case 0.25, m_south
end choice
mv west
reloc repeat, row, col_west
m_south:
mv south
reloc repeat, row_south, col
m_east: mv east
reloc repeat, row, col_east
m_north:
mv north
reloc repeat, row_north, col
end macro
; STates for a ROW.
;
; Movement selection states for a specific agent location row.
;
; is_gf - indication whether the agent has found gold:
; 1 = has found;
; 0 = not yet found.
;
; row - current row index for the agent.
;
; row_north - new row index for the agent if it moves one step in
; the north direction.
;
; row_south - new row index for the agent if it moves one step in
; the south direction.
strow macro is_gf, row, row_north, row_south
state is_gf, row, 0, row_north, row_south, 9, 1
state is_gf, row, 1, row_north, row_south, 0, 2
state is_gf, row, 2, row_north, row_south, 1, 3
state is_gf, row, 3, row_north, row_south, 2, 4
state is_gf, row, 4, row_north, row_south, 3, 5
state is_gf, row, 5, row_north, row_south, 4, 6
state is_gf, row, 6, row_north, row_south, 5, 7
state is_gf, row, 7, row_north, row_south, 6, 8
state is_gf, row, 8, row_north, row_south, 7, 9
state is_gf, row, 9, row_north, row_south, 8, 0
end macro
; STates for all ROWS.
;
; Movement selection states for all agent location rows.
;
; is_gf - indication whether the agent has found gold:
; 1 = has found;
; 0 = not yet found.
strows macro is_gf
strow is_gf, 0, 9, 1
strow is_gf, 1, 0, 2
strow is_gf, 2, 1, 3
strow is_gf, 3, 2, 4
strow is_gf, 4, 3, 5
strow is_gf, 5, 4, 6
strow is_gf, 6, 5, 7
strow is_gf, 7, 6, 8
strow is_gf, 8, 7, 9
strow is_gf, 9, 8, 0
end macro
; Generate movement selection states for the case when the agent
; has not yet found gold and the case when the agent has
; found the gold.
strows 0
strows 1
The file samples/maze_asm.c in the package distribution installed to $prefix/share/qsmm/samples/maze_asm.c contains the source code of the sample program.
A copy of that source code is below.
The command make builds the sample program if the configure script has configured QSMM to use the ncurses library.
The command make install installs the binary of the sample program and names the installed binary as ‘qsmm-example-maze-asm’ if the configure script has configured QSMM to install example programs.
See the file INSTALL in the root of the package distribution for information on the configure script.
#include <config.h> #include <assert.h> #include <stdlib.h> #include <string.h> #include <unistd.h>
#if defined(HAVE_CURSES_H) # include <curses.h> #elif defined(HAVE_NCURSES_CURSES_H) # include <ncurses/curses.h> #endif
#include <qsmm/qsmm.h>
#define NVISIT 200
#define NSTATE 512
#define MAX_X 8
#define MAX_Y 8
#define ENTRY_X 1
#define ENTRY_Y 8
#define CHK_FAIL(func, ...) \
do { \
const int rc=func(__VA_ARGS__); \
if (rc<0) ERREXIT( #func ": %s", qsmm_err_str(rc)); \
} \
while (0)
#define ERREXIT(fmt, ...) \
do { \
fprintf(stderr,fmt "\n", ## __VA_ARGS__); \
goto Exit; \
} \
while (0)
enum direct_e {
DIRECT_NORTH=0, // move one step up
DIRECT_EAST =1, // move one step right
DIRECT_SOUTH=2, // move one step down
DIRECT_WEST =3, // move one step left
DIRECT_COUNT // number of movement directions
};
static char is_gold_found;
static int path_len_pass, n_gold_found=0, path_len_total=0;
static qsmm_prg_t prg_asm=0;
static qsmm_msglist_t msglist=0;
// Get the bitmask of allowed movement directions.
static unsigned char
get_percept(
const char **picture_pp,
int xx,
int yy
) {
unsigned char result=0;
const int col=xx*3+1, row=yy*2+1;
assert(picture_pp[row][col]==' ' && picture_pp[row][col+1]==' ');
assert((picture_pp[row-1][col]==' ' &&
picture_pp[row-1][col+1]==' ') ||
(picture_pp[row-1][col]!=' ' &&
picture_pp[row-1][col+1]!=' '));
assert((picture_pp[row+1][col]==' ' &&
picture_pp[row+1][col+1]==' ') ||
(picture_pp[row+1][col]!=' ' &&
picture_pp[row+1][col+1]!=' '));
if (picture_pp[row-1][col]==' ' && picture_pp[row-1][col+1]==' ')
result|=1 << DIRECT_NORTH;
if (picture_pp[row][col+2]==' ') result|=1 << DIRECT_EAST;
if (picture_pp[row+1][col]==' ' && picture_pp[row+1][col+1]==' ')
result|=1 << DIRECT_SOUTH;
if (picture_pp[row][col-1]==' ') result|=1 << DIRECT_WEST;
return result;
}
// Make a movement in the labyrinth in a specified direction possibly
// showing the movement on the screen.
// Returns: 3 = movement resulted in exiting the labyrinth;
// 2 = movement resulted in taking the gold;
// 1 = movement made successfully;
// 0 = cannot make the movement because a new location cell of the
// agent is not empty.
static int
opaque_maze(
enum direct_e direct,
unsigned char *percept_p
) {
static const char *picture[]={
// 0 1 2 3 4 5 6 7 8
"+-----+--+--#..#--+--------+",
"| | | | |", // 0
"+--+ | | #--# | + + |",
"| | | | | | | |", // 1
"| +--+ | +--+ +--+ * *",
"| | | | | |", // 2
"+--+ +--+-----+ +--+ *--*",
"| | | | |", // 3
"| +--+ + + + | +--+ |",
"| | | | | |", // 4
"| +--+ +--+ | | +--+ |",
"| | | | | | |", // 5
"+--+ +--+--+ +--+-----+ |",
"| | | | |", // 6
"+--+ | | +--+ +--+ +--+",
"| | | | |", // 7
"| +--+ | +--+ +--+ +--+",
"| | | | |", // 8
"+--+..+--+-----+--+--------+"
};
static char is_gf=0;
static int path_len, visit=0, xx=-1, yy=-1;
char ss[128];
int col, row, result=1;
if (xx<0 || yy<0) {
if (visit==NVISIT-1) {
if (!initscr()) exit(2);
noecho();
for (row=0; row<=(MAX_Y+1)*2+1; row++)
mvaddstr(row,0,picture[row]);
row=(MAX_Y+1)*2+3;
mvaddstr(row,0,"Press [Space] to start moving");
while (getch()!=' ') ;
sprintf(ss,"%64s","");
mvaddstr(row,0,ss);
}
xx=ENTRY_X;
yy=ENTRY_Y;
path_len=0;
}
assert(xx>=0 && xx<=MAX_X);
assert(yy>=0 && yy<=MAX_Y);
unsigned char percept;
if (get_percept(picture,xx,yy) & (1 << direct)) {
col=xx*3+1;
row=yy*2+1;
switch (direct) {
case DIRECT_NORTH: yy--; break;
case DIRECT_EAST: xx++; break;
case DIRECT_SOUTH: yy++; break;
case DIRECT_WEST: xx--; break;
case DIRECT_COUNT: assert(0);
}
percept=get_percept(picture,xx,yy);
path_len++;
if (visit==NVISIT-1) mvaddstr(row,col," ");
col=xx*3+1;
row=yy*2+1;
if (visit==NVISIT-1) {
const int picture_w=strlen(picture[0]);
mvaddstr(row,col,"[]");
sprintf(ss," Gold found: %d",is_gf);
mvaddstr(1,picture_w+2,ss);
sprintf(ss,"Path length: %d",path_len);
mvaddstr(3,picture_w+2,ss);
move((MAX_Y+1)*2+3,0);
refresh();
usleep(125000);
}
if (picture[row-1][col-1]=='*' && picture[row-1][col+2]=='*' &&
picture[row+1][col-1]=='*' && picture[row+1][col+2]=='*') {
if (!is_gf) {
is_gf=1;
result=2;
}
}
else if (picture[row-1][col-1]=='#' && picture[row-1][col+2]=='#' &&
picture[row+1][col-1]=='#' && picture[row+1][col+2]=='#') {
is_gf=0;
result=3;
xx=-1;
yy=-1;
if (visit==NVISIT-1) {
row=(MAX_Y+1)*2+3;
mvaddstr(row,0,"Press [Q] to exit");
while (1) {
int key=getch();
if (key=='q' || key=='Q') break;
}
endwin();
}
else visit++;
}
}
else {
percept=get_percept(picture,xx,yy)+16;
result=0;
}
if (is_gf) percept+=32;
if (percept_p) *percept_p=percept;
return result;
}
static QSMM_INSTR_META_CLASS(mv) {
int result=-1;
enum direct_e direct=0;
if (QSMM_HAS_INSTR_CLASS(mehcall->evt))
qsmm_get_mehcall_instr_param_bin(mehcall,sizeof(direct),0,&direct);
switch (mehcall->evt) {
case QSMM_EVT_INSTR_CLASS_INIT: {
const char *ccp=0;
switch (direct) {
case DIRECT_NORTH: ccp="north"; break;
case DIRECT_EAST: ccp="east"; break;
case DIRECT_SOUTH: ccp="south"; break;
case DIRECT_WEST: ccp="west"; break;
case DIRECT_COUNT: assert(0);
}
assert(ccp);
qsmm_set_mehcall_instr_param_f(mehcall,"%s",ccp);
qsmm_set_mehcall_noutcome(mehcall,64);
break;
}
case QSMM_EVT_ACTIVATE: {
unsigned char percept=0;
const int rc=opaque_maze(direct,&percept);
const qsmm_actpair_t actpair=qsmm_get_actpair(qsmm);
const qsmm_actor_t actor_env=
qsmm_get_actpair_actor(actpair,QSMM_ENGINE_ENV),
actor_iee=qsmm_get_actpair_actor(actpair,QSMM_ENGINE_IEE);
CHK_FAIL(qsmm_actor_time_delta_v2,actor_env,0,0,1);
CHK_FAIL(qsmm_actor_time_delta_v2,actor_iee,0,0,1);
switch (rc) {
case 0:
case 1:
break;
case 2:
is_gold_found=1;
n_gold_found++;
break;
case 3:
if (is_gold_found) {
CHK_FAIL(qsmm_actor_spur_delta,actor_env,1,1);
CHK_FAIL(qsmm_actor_spur_delta,actor_iee,0,1);
}
qsmm_set_mehcall_return(mehcall,1);
break;
default:
assert(0);
}
if (rc!=3) {
if (rc) {
path_len_pass++;
path_len_total++;
}
qsmm_set_mehcall_outcome(mehcall,percept);
}
break;
}
}
result=0;
Exit:
return result;
}
// Find a path to gold in the labyrinth and then to its exit possibly
// using a node probability profile specified by an assembler program.
int
main(
int argc,
char **argv
) {
int seed=0, exit_code=1;
qsmm_t qsmm=0;
qsmm_msglist_t msglist_dump=0;
if (argc>2) {
CHK_FAIL(qsmm_msglist_create,&msglist);
CHK_FAIL(qsmm_parse_asm_source_file, argv[2],
QSMM_PARSE_ASM_PREPROCESS, 0, 0, msglist, &prg_asm);
}
struct qsmm_desc_s desc;
memset(&desc,0,sizeof(desc));
desc.dont_use_instr_class_weights=1;
desc.stack_sz_max=1;
desc.compat=3;
desc.nnode_max=1;
struct qsmm_engine_desc_s *const engine_desc_p=desc.engine_desc;
engine_desc_p[QSMM_ENGINE_ENV].nspur=2;
engine_desc_p[QSMM_ENGINE_ENV].ntime=1;
engine_desc_p[QSMM_ENGINE_IEE].nspur=1;
engine_desc_p[QSMM_ENGINE_IEE].ntime=1;
struct qsmm_actor_large_desc_s large_desc;
memset(&large_desc,0,sizeof(large_desc));
large_desc.tree_arity=2;
engine_desc_p[QSMM_ENGINE_ENV].large_desc_p=&large_desc;
engine_desc_p[QSMM_ENGINE_IEE].large_desc_p=&large_desc;
CHK_FAIL(qsmm_create,&desc,&qsmm);
QSMM_REG_INSTR_META_CLASS_PARAM(qsmm,mv,0);
qsmm_reg_instr_class_set(qsmm,"walker",0,0);
for (enum direct_e direct=0; direct<DIRECT_COUNT; direct++)
qsmm_reg_instr_class_v2(qsmm, "mv", "walker", 0,
sizeof(direct), &direct, 0);
if (!prg_asm) {
printf("nstate %d\n",NSTATE);
qsmm_set_nstate_max(qsmm,"walker",NSTATE);
}
qsmm_sig_t node=0;
qsmm_node_create_v2(qsmm,"walker",0,&node);
qsmm_engine_create(qsmm);
if (prg_asm) qsmm_node_asm(qsmm,node,QSMM_EXCEPT_ALL,prg_asm,msglist);
const qsmm_actor_t actor_env=
qsmm_get_actpair_actor(qsmm_get_actpair(qsmm),QSMM_ENGINE_ENV);
CHK_FAIL(qsmm_set_actor_auto_spur_type,actor_env,0);
CHK_FAIL(qsmm_set_actor_spur_weight,actor_env,0,1);
CHK_FAIL(qsmm_set_actor_spur_weight,actor_env,1,1);
if (argc>1 && (seed=atoi(argv[1]))<0) {
qsmm_set_random(qsmm,1);
seed=-seed;
}
qsmm_rng_seed(qsmm_get_rng(qsmm),seed);
for (int visit=0; visit<NVISIT; visit++) {
unsigned char percept=0;
is_gold_found=0;
path_len_pass=0;
opaque_maze(DIRECT_NORTH,&percept);
qsmm_node_call_default(qsmm,node,0);
printf(
"visit %d: ngf %d, path_pass %d, path_total %d, ngf/path_total %.8f\n",
visit+1, n_gold_found, path_len_pass, path_len_total,
(double) n_gold_found/path_len_total);
}
exit_code=0;
Exit:
qsmm_prg_destroy(prg_asm);
qsmm_destroy(qsmm);
if (msglist) {
msglist_dump=msglist;
msglist=0;
CHK_FAIL(qsmm_msglist_dump,msglist_dump,0,"BUFFER",0,stderr);
}
qsmm_msglist_destroy(msglist_dump);
return exit_code;
}
Sample program output is below.
$ qsmm-example-maze-asm -1 nstate 512 visit 1: ngf 0, path_pass 94, path_total 94, ngf/path_total 0.00000000 visit 2: ngf 0, path_pass 706, path_total 800, ngf/path_total 0.00000000 ... visit 101: ngf 19, path_pass 640, path_total 87374, ngf/path_total 0.00021746 visit 102: ngf 19, path_pass 962, path_total 88336, ngf/path_total 0.00021509 ... visit 199: ngf 37, path_pass 90, path_total 178714, ngf/path_total 0.00020703 visit 200: ngf 37, path_pass 726, path_total 179440, ngf/path_total 0.00020620
$ qsmm-example-maze-asm 1 nstate 512 visit 1: ngf 0, path_pass 632, path_total 632, ngf/path_total 0.00000000 visit 2: ngf 1, path_pass 2252, path_total 2884, ngf/path_total 0.00034674 ... visit 101: ngf 88, path_pass 266, path_total 51398, ngf/path_total 0.00171213 visit 102: ngf 89, path_pass 266, path_total 51664, ngf/path_total 0.00172267 ... visit 199: ngf 186, path_pass 468, path_total 82666, ngf/path_total 0.00225002 visit 200: ngf 187, path_pass 338, path_total 83004, ngf/path_total 0.00225290
The above pieces of output show that the average amount of gold found per one move in the labyrinth in adaptive mode of program operation with random seed 1 without using the profile assembler program is about 11 times greater than in random mode of program operation.
$ qsmm-example-maze-asm -1 "$prefix/share/qsmm/samples/maze.asm" visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404 visit 2: ngf 1, path_pass 274, path_total 3360, ngf/path_total 0.00029762 ... visit 101: ngf 17, path_pass 768, path_total 92636, ngf/path_total 0.00018351 visit 102: ngf 17, path_pass 898, path_total 93534, ngf/path_total 0.00018175 ... visit 199: ngf 43, path_pass 132, path_total 192766, ngf/path_total 0.00022307 visit 200: ngf 43, path_pass 118, path_total 192884, ngf/path_total 0.00022293
$ qsmm-example-maze-asm 1 "$prefix/share/qsmm/samples/maze.asm" visit 1: ngf 1, path_pass 3086, path_total 3086, ngf/path_total 0.00032404 visit 2: ngf 2, path_pass 52, path_total 3138, ngf/path_total 0.00063735 ... visit 101: ngf 101, path_pass 52, path_total 8286, ngf/path_total 0.01218923 visit 102: ngf 102, path_pass 52, path_total 8338, ngf/path_total 0.01223315 ... visit 199: ngf 199, path_pass 52, path_total 13382, ngf/path_total 0.01487072 visit 200: ngf 200, path_pass 52, path_total 13434, ngf/path_total 0.01488760
The above pieces of output show that the average amount of gold found per one move in the labyrinth in adaptive mode of program operation with random seed 1 and the use of the profile assembler program is about 67 times greater than in random mode of program operation.