Blog of (former?) MySQL Entomologist: When kill flag is checked for SELECT? Part I

The manual describes this briefly:

In SELECT, ORDER BY and GROUP BY loops, the flag is checked after reading a block of rows. If the kill flag is set, the statement is aborted.
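
To make the manual's wording concrete, here is a minimal, self-contained sketch (not MySQL code) of a row-reading loop that polls a kill flag once per block of rows. The names Session, scan_rows and kBlockSize are hypothetical, and modelling the flag as a std::atomic<bool> is an assumption made for the illustration, not a description of how 5.5 declares it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-connection state (THD in the server).
struct Session {
  std::atomic<bool> killed{false};
};

constexpr std::size_t kBlockSize = 4;  // rows read between flag checks (assumed)

// Reads up to total_rows rows and returns how many were processed before
// the statement finished or was aborted. kill_after simulates another
// connection issuing KILL once that many rows have been read.
std::size_t scan_rows(Session &s, std::size_t total_rows, std::size_t kill_after) {
  std::size_t done = 0;
  while (done < total_rows) {
    if (s.killed.load())       // checked once per block, not per row
      break;                   // the statement is aborted
    for (std::size_t i = 0; i < kBlockSize && done < total_rows; ++i) {
      ++done;                  // "read" one row
      if (done == kill_after)
        s.killed.store(true);  // KILL arrives in the middle of a block
    }
  }
  return done;
}
```

With a block size of 4, a kill that arrives while row 2 is being read takes effect only after row 4, which is one reason an aborted statement may not stop instantly.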

A complete, correct and useful answer is more complex, though. Here is a correct answer, but not a very useful one: the kill flag is checked in the following functions related to SELECT statement processing:

make_join_statistics()
best_extension_by_limited_search()
find_best()
sub_select_cache()
evaluate_join_record()
flush_cached_records()
end_write()
end_update()
end_unique_update()
end_write_group()
remove_dup_with_compare()
remove_dup_with_hash_index()

Continue reading if you are interested in the process of getting the correct answer and would like to know how to eventually make it complete and useful as well.

Let's just use grep to search for thd->killed usage in the source code (version 5.5.30) that implements SELECT, and then check the functions that contain these lines:

[openxs@chief mysql-5.5]$ grep -n 'thd->kill' sql/sql_select.cc
3136: DBUG_RETURN(join->thd->killed || get_best_combination(join));
5463: if (thd->killed) // Abort
5602: if (thd->killed)
11604: if (join->thd->killed)          // If aborted by user
11807: if (join->thd->killed)                  // Aborted by user
12052:    if (join->thd->killed)
12915: if (join->thd->killed)                  // Aborted by user
12968: if (join->thd->killed)                  // Aborted by user
13048: if (join->thd->killed)                  // Aborted by user
13095: if (join->thd->killed)
14394:    if (thd->killed)
14523:    if (thd->killed)

Line 3136 is in the make_join_statistics() function; that is, the flag is also checked during query optimization:

3122   /* Find an optimal join order of the non-constant tables. */
3123   if (join->const_tables != join->tables)
3124   {
3125     optimize_keyuse(join, keyuse_array);
3126     if (choose_plan(join, all_table_map & ~join->const_table_map))
3127       goto error;
3128   }
3129   else
3130   {
3131     memcpy((uchar*) join->best_positions,(uchar*) join->positions,
3132            sizeof(POSITION)*join->const_tables);
3133     join->best_read=1.0;
3134   }
3135   /* Generate an execution plan from the found optimal join order. */
3136   DBUG_RETURN(join->thd->killed || get_best_combination(join));
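
The return statement on line 3136 relies on short-circuit evaluation of ||: when the kill flag is non-zero, get_best_combination() is never called and the function reports failure. A minimal sketch of that idiom follows; FakeThd, FakeJoin, build_plan and finish_optimization are hypothetical stand-ins, not the real server types or functions.

```cpp
#include <cassert>

// Hypothetical stand-ins; in the server these roles are played by THD and JOIN.
struct FakeThd  { volatile int killed = 0; };
struct FakeJoin { FakeThd *thd = nullptr; int plans_built = 0; };

// Plays the role of get_best_combination(): returns true on error.
bool build_plan(FakeJoin *join) {
  ++join->plans_built;
  return false;  // success
}

// Same shape as the return statement on line 3136: true means "error".
// If killed is non-zero, build_plan() is skipped because of short-circuiting.
bool finish_optimization(FakeJoin *join) {
  return join->thd->killed || build_plan(join);
}
```

Called with the flag set, finish_optimization() returns true without building a plan, so a kill that arrives during optimization is reported as an ordinary failure to the caller.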

Line 5463 is in the best_extension_by_limited_search() function, so again the flag is checked during query optimization:

5451 static bool
5452 best_extension_by_limited_search(JOIN *join,
5453                                  table_map remaining_tables,
5454                                  uint idx,
5455                                  double record_count,
5456                                  double read_time,
5457                                  uint search_depth,
5458                                  uint prune_level)
5459 {
5460   DBUG_ENTER("best_extension_by_limited_search");
5461
5462   THD *thd= join->thd;
5463   if (thd->killed) // Abort
5464     DBUG_RETURN(TRUE);
5465
5466   DBUG_EXECUTE("opt", print_plan(join, idx, read_time, record_count, idx,
5467                                  "SOFAR:"););
5468
5469   /*
5470     'join' is a partial plan with lower cost than the best plan so far,
5471     so continue expanding it further with the tables in 'remaining_tables'.
5472   */
5473   JOIN_TAB *s;

Line 5602 is in the find_best() function:

5596 static bool
5597 find_best(JOIN *join,table_map rest_tables,uint idx,double record_count,
5598           double read_time)
5599 {
5600   DBUG_ENTER("find_best");
5601   THD *thd= join->thd;
5602   if (thd->killed)
5603     DBUG_RETURN(TRUE);
5604   if (!rest_tables)
5605   {
5606     DBUG_PRINT("best",("read_time: %g record_count: %g",read_time,
5607                        record_count));
5608
5609     read_time+=record_count/(double) TIME_FOR_COMPARE;
...

So, again, it is related to finding the optimal join order at the query optimization stage. None of these cases is mentioned in the manual.
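
Checking the flag at the top of a recursive search function means that a kill can interrupt even a long join-order search: every level returns TRUE and the recursion unwinds. Here is a simplified sketch of that behavior, assuming a toy exhaustive search over table orderings. SearchCtx, search and the kill trigger are hypothetical and have nothing to do with the real cost model.

```cpp
#include <cassert>
#include <cstdint>

struct SearchCtx {
  bool killed = false;
  unsigned long calls = 0;         // how many times search() was entered
  unsigned long kill_at_call = 0;  // simulate KILL arriving at this call (0 = never)
};

// Enumerates all orders of the tables in rest_tables (a bitmap).
// Returns true if the search was aborted, mirroring the TRUE-on-abort
// convention of the listings above.
bool search(SearchCtx &ctx, std::uint32_t rest_tables) {
  ++ctx.calls;
  if (ctx.calls == ctx.kill_at_call)
    ctx.killed = true;             // stands in for another thread setting the flag
  if (ctx.killed)
    return true;                   // abort: the caller sees true and unwinds
  if (!rest_tables)
    return false;                  // a complete join order was reached
  for (std::uint32_t bit = 1; bit != 0 && bit <= rest_tables; bit <<= 1) {
    if ((rest_tables & bit) && search(ctx, rest_tables & ~bit))
      return true;                 // propagate the abort upwards
  }
  return false;
}
```

For three tables the full search enters the function 16 times; with the kill simulated at the fifth call, no further calls are made after that point.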

Line 11604 is finally related to query execution (reading rows). It is in the sub_select_cache() function:

enum_nested_loop_state
sub_select_cache(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
{
  enum_nested_loop_state rc;

  if (end_of_records)
  {
    rc= flush_cached_records(join,join_tab,FALSE);
    if (rc == NESTED_LOOP_OK || rc == NESTED_LOOP_NO_MORE_ROWS)
      rc= sub_select(join,join_tab,end_of_records);
    return rc;
  }
  if (join->thd->killed)        // If aborted by user
  {
    join->thd->send_kill_message();
    return NESTED_LOOP_KILLED;                   /* purecov: inspected */
  }
...

Line 11807 is at the beginning of the evaluate_join_record() function:

11794 static enum_nested_loop_state
11795 evaluate_join_record(JOIN *join, JOIN_TAB *join_tab,
11796                      int error)
11797 {
11798   bool not_used_in_distinct=join_tab->not_used_in_distinct;
11799   ha_rows found_records=join->found_records;
11800   COND *select_cond= join_tab->select_cond;
11801   bool select_cond_result= TRUE;
11802
11803   if (error > 0 || (join->thd->is_error())) // Fatal error
11804     return NESTED_LOOP_ERROR;
11805   if (error < 0)
11806     return NESTED_LOOP_NO_MORE_ROWS;
11807   if (join->thd->killed) // Aborted by user
11808   {
11809     join->thd->send_kill_message();
11810     return NESTED_LOOP_KILLED; /* purecov: inspected */
11811   }
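
In the execution-stage functions the check has a common shape: send the kill message, then return NESTED_LOOP_KILLED so that callers stop reading rows. The following rough sketch shows how such a status code stops a row loop. It uses a reduced, hypothetical enum and hypothetical Conn, evaluate_row and read_all helpers; the real enum_nested_loop_state has more values (the listings above also use NESTED_LOOP_ERROR, for instance).

```cpp
#include <cassert>

// Reduced, hypothetical version of the status enum; names are illustrative.
enum LoopState { LOOP_OK, LOOP_NO_MORE_ROWS, LOOP_KILLED };

struct Conn {
  bool killed = false;
  bool kill_message_sent = false;
  int rows_evaluated = 0;
  void send_kill_message() { kill_message_sent = true; }
};

// Per-row step: checks the flag before doing any work on the row.
LoopState evaluate_row(Conn &c) {
  if (c.killed) {                  // aborted by user
    c.send_kill_message();
    return LOOP_KILLED;
  }
  ++c.rows_evaluated;
  return LOOP_OK;
}

// Driver loop: stops at the first non-OK state and hands it to its caller.
// kill_before_row simulates KILL arriving just before that row (-1 = never).
LoopState read_all(Conn &c, int total_rows, int kill_before_row) {
  for (int row = 0; row < total_rows; ++row) {
    if (row == kill_before_row)
      c.killed = true;
    LoopState rc = evaluate_row(c);
    if (rc != LOOP_OK)
      return rc;
  }
  return LOOP_NO_MORE_ROWS;
}
```

Because every layer returns the state it received, the kill is noticed at the next row and no later rows are evaluated.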

Line 12052 is (finally) in the record-reading loop of the flush_cached_records() function (which you saw called above, on line 11599, in sub_select_cache()):

12036   /* read through all records */
12037   if ((error=join_init_read_record(join_tab)))
12038   {
12039     reset_cache_write(&join_tab->cache);
12040     return error < 0 ? NESTED_LOOP_NO_MORE_ROWS: NESTED_LOOP_ERROR;
12041   }
12042
12043   for (JOIN_TAB *tmp=join->join_tab; tmp != join_tab ; tmp++)
12044   {
12045     tmp->status=tmp->table->status;
12046     tmp->table->status=0;
12047   }
12048
12049   info= &join_tab->read_record;
12050   do
12051   {
12052     if (join->thd->killed)
12053     {
12054       join->thd->send_kill_message();
12055       return NESTED_LOOP_KILLED; // Aborted by user /* purecov: inspected */
12056     }
...
12093   } while (!(error=info->read_record(info)));

Line 12915 is in the end_write() function:

12908 static enum_nested_loop_state
12909 end_write(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
12910           bool end_of_records)
12911 {
12912   TABLE *table=join->tmp_table;
12913   DBUG_ENTER("end_write");
12914
12915   if (join->thd->killed) // Aborted by user
12916   {
12917     join->thd->send_kill_message();
12918     DBUG_RETURN(NESTED_LOOP_KILLED); /* purecov: inspected */
12919   }

Line 12968 is in the end_update() function:

12957 static enum_nested_loop_state
12958 end_update(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
12959            bool end_of_records)
12960 {
12961   TABLE *table=join->tmp_table;
12962   ORDER *group;
12963   int error;
12964   DBUG_ENTER("end_update");
12965
12966   if (end_of_records)
12967     DBUG_RETURN(NESTED_LOOP_OK);
12968   if (join->thd->killed) // Aborted by user
12969   {
12970     join->thd->send_kill_message();
12971     DBUG_RETURN(NESTED_LOOP_KILLED); /* purecov: inspected */
12972   }

Line 13048 is in the end_unique_update() function (it's getting boring, isn't it? A lot of similar-looking code in similarly named functions):

static enum_nested_loop_state
end_unique_update(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
                  bool end_of_records)
{
  TABLE *table=join->tmp_table;
  int     error;
  DBUG_ENTER("end_unique_update");

  if (end_of_records)
    DBUG_RETURN(NESTED_LOOP_OK);
  if (join->thd->killed)            // Aborted by user
  {
    join->thd->send_kill_message();
    DBUG_RETURN(NESTED_LOOP_KILLED);             /* purecov: inspected */
  }

Line 13095 is in something similar as well, the end_write_group() function (note that the comment is placed differently, though):

static enum_nested_loop_state
end_write_group(JOIN *join, JOIN_TAB *join_tab __attribute__((unused)),
                bool end_of_records)
{
  TABLE *table=join->tmp_table;
  int     idx= -1;
  DBUG_ENTER("end_write_group");

  if (join->thd->killed)
  {                     // Aborted by user
    join->thd->send_kill_message();
    DBUG_RETURN(NESTED_LOOP_KILLED);             /* purecov: inspected */
  }

We are almost done. Line 14394 is finally closely related to something the manual describes, a "GROUP BY loop"; it is in the remove_dup_with_compare() function:

14377 static int remove_dup_with_compare(THD *thd, TABLE *table, Field **first_field,
14378                                    ulong offset, Item *having)
14379 {
14380   handler *file=table->file;
14381   char *org_record,*new_record;
14382   uchar *record;
14383   int error;
14384   ulong reclength= table->s->reclength-offset;
14385   DBUG_ENTER("remove_dup_with_compare");
14386
14387   org_record=(char*) (record=table->record[0])+offset;
14388   new_record=(char*) table->record[1]+offset;
14389
14390   file->ha_rnd_init(1);
14391   error=file->rnd_next(record);
14392   for (;;)
14393   {
14394     if (thd->killed)
14395     {
14396       thd->send_kill_message();
14397       error=0;
14398       goto err;
14399     }
...
14446         file->position(record); // Remember position
14447       }
14448     }
14449     if (!found)
14450       break; // End of file
14451     /* Restart search on next row */
14452     error=file->restart_rnd_next(record,file->ref);
14453   }
14454
14455   file->extra(HA_EXTRA_NO_CACHE);
14456   DBUG_RETURN(0);
14457 err:
14458   file->extra(HA_EXTRA_NO_CACHE);
14459   if (error)
14460     file->print_error(error,MYF(0));
14461   DBUG_RETURN(1);
14462 }
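
Unlike the nested-loop functions, this one does not simply return when the flag is set: it jumps to the err label so that file->extra(HA_EXTRA_NO_CACHE) still runs, and it sets error to 0 so that no handler error is printed on top of the kill message. Here is a minimal sketch of the same "single cleanup path" idea. The Cursor, CleanupGuard and dedup names are hypothetical, and using a scope guard instead of goto is a stylistic assumption for the example, not how the server is written.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical cursor; cleanup_calls counts how often cleanup ran.
struct Cursor {
  int cleanup_calls = 0;
  void cleanup() { ++cleanup_calls; }  // plays the role of file->extra(...)
};

// Runs Cursor::cleanup() on every exit path of the enclosing scope.
struct CleanupGuard {
  Cursor &c;
  explicit CleanupGuard(Cursor &cur) : c(cur) {}
  ~CleanupGuard() { c.cleanup(); }
};

// Removes duplicates from rows in place. Returns 0 on success and 1 if
// aborted, matching the 0/1 convention of the listing above. The flag is
// polled once per row; kill_at simulates KILL arriving at that row index.
int dedup(Cursor &cur, std::vector<int> &rows, bool &killed, std::size_t kill_at) {
  CleanupGuard guard(cur);
  std::set<int> seen;
  std::vector<int> kept;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (i == kill_at)
      killed = true;
    if (killed)
      return 1;                        // aborted; the guard still cleans up
    if (seen.insert(rows[i]).second)   // true only for the first occurrence
      kept.push_back(rows[i]);
  }
  rows.swap(kept);
  return 0;
}
```

Whether the loop finishes or is aborted, cleanup runs exactly once, which is the property the err label provides in the original function.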

Finally, line 14523 is in the loop in the remove_dup_with_hash_index() function:

14518   file->ha_rnd_init(1);
14519   key_pos=key_buffer;
14520   for (;;)
14521   {
14522     uchar *org_key_pos;
14523     if (thd->killed)
14524     {
14525       thd->send_kill_message();
14526       error=0;
14527       goto err;
14528     }
...

That's all great, but how are these functions related to each other? The MySQL Internals manual will help us get the general picture of how SELECT is processed:

handle_select()
   mysql_select()
     JOIN::prepare()
       setup_fields()
     JOIN::optimize()            /* optimizer is from here ... */
       optimize_cond()
       opt_sum_query()
       make_join_statistics()
         get_quick_record_count()
         choose_plan()
           /* Find the best way to access tables */
           /* as specified by the user.          */
           optimize_straight_join()
             best_access_path()
           /* Find a (sub-)optimal plan among all or subset */
           /* of all possible query plans where the user    */
           /* controls the exhaustiveness of the search.   */
           greedy_search()
             best_extension_by_limited_search()
               best_access_path()
           /* Perform an exhaustive search for an optimal plan */
           find_best()
       make_join_select()        /* ... to here */
     JOIN::exec()

In the diagram above, the optimizer part runs from JOIN::optimize() to make_join_select(), and the functions in it where the kill flag is checked are make_join_statistics(), best_extension_by_limited_search() and find_best().

It would be nice to describe JOIN::exec() in a similar way. I plan to do this in the second part of this post. To be continued...

2 comments:

sbester, February 17, 2013 at 3:50 PM
for years i wanted to get rid of code like this:
if (thd->killed)
....

and make it into a function. that would make it easier to inject fake kills into the server code (there have been many bugs in the past due to wrong kill handling).

Valerii Kravchuk, February 18, 2013 at 9:43 AM
I see in some places we just immediately return when thd->killed is set, while in others we go to some error label at the end of the same function to do cleanup. Not sure if simple single function may help.

Saturday, February 16, 2013
