実践ハイパフォーマンスMySQLを参考に情報収集スクリプトを書いてみました

実践ハイパフォーマンスMySQLを参考にMySQLのメトリック取得のスクリプトを作成しました。 シェルスクリプトの勉強のため、余計な機能をいくつかつけました。

日に日に継ぎ足ししているため、すごく長くなってきました。。。

As I read High Perfomance MySQL, there is a exmple script getting mysql status metrics. Also for my study shellscript, I created the script and added as much useful functionality as I could.

取得スクリプト

getoptsを使ってみたかったため、なんか長いです。

引っかかったところが、getoptsでフラグを立てて、フラグの判定で その処理を実行したかったのですが、オプションがないときにFLAGの変数に値がないと [ ]内の判定で以下のエラーが発生し、回避策が のように二重で囲むと良いらしいです。

[: !==: 単項演算子が予期されます

参考 https://bm-server.net/2015/08/07/379/

#!/bin/bash

##
INTERVAL=3
BASE_OUTPUT_DIR=/tmp/mysql_metric_data
##

SCRIPT_PID=$$
SCRIPT_NAME=`basename $0`

function usage() {
cat <<_EOT_
Usage:
  ${SCRIPT_NAME} -user -password [ -host -time -report-time -zip ] &

Description
 -u : MySQL user
 -p : MySQL password
 -h : MySQL Server Hostname
 -t : Script loop time
 -r : Repoting elapsed time while script executing, but it cannot set under ${INTERVAL} time.
 -z : zip results file after script completed.

 * This scprit should be executed as background job!

_EOT_
}


while getopts u:p:h:t:r:z OPT
do
  case $OPT in
   u ) MYSQL_USER="$OPTARG"
        ;;
   p ) MYSQL_PWD="$OPTARG"
        ;;
    h ) FLAG_H="TRUE" ; MYSQL_HOST="$OPTARG"
        ;;
    t ) FLAG_T="TRUE" ; SAMPLING_LOOP_TIME="$OPTARG"
        ;;
    r ) FLAG_R="TRUE" ; REPORT_TIME="$OPTARG"
        ;;
    z ) FLAG_Z="TRUE"
        ;;
   \? ) echo "[NOTICE] ${SCRIPT_NAME} is not correctly excuted."
     usage
     exit 1
  esac
done

if [[ ${FLAG_H} != 'TRUE' ]] ; then
 echo "[NOTICE] When -h option value is not given, it set 'localhost'"
 MYSQL_HOST="localhost"
fi

if [[ ${FLAG_T} != 'TRUE' ]] ; then
 echo "[NOTICE] When -t option value is not given, it set '86400'"
 SAMPLING_LOOP_TIME="86400"
fi

for (( i=1; i <= $#; i++  ))
do
 eval echo "[NOTICE] argv[$i] = \$$i"
done

export MYSQL_PWD
MYSQL_EXE_CMD="mysql -u ${MYSQL_USER} -h ${MYSQL_HOST}"
MYSQL_ADMIN_CMD="mysqladmin -u${MYSQL_USER} -h${MYSQL_HOST}"
SAMPLING_REPEAT_COUNT=`expr $SAMPLING_LOOP_TIME / $INTERVAL`
SAMPLING_COUNTER=0 # Initiaize
SCRIPT_OUTPUT_LOWEST_DIR_NAME=`date "+%Y%m%d-%H%M%S"`
SCRIPT_OUTPUT_EACH_DIR=${BASE_OUTPUT_DIR}/${SCRIPT_OUTPUT_LOWEST_DIR_NAME}
RUNFILE=/var/run/${SCRIPT_NAME}-${SCRIPT_PID}.pid


if  [ ! -e ${SCRIPT_OUTPUT_EACH_DIR} ]; then
 mkdir -p ${SCRIPT_OUTPUT_EACH_DIR}
 echo "[NOTICE] Created ${SCRIPT_OUTPUT_EACH_DIR}. Result data will write here."
fi

$MYSQL_ADMIN_CMD ping 1>/dev/null 2>/dev/null
if [ $? -eq 1 ]; then
  echo "[ERROR] Cannot connect to ${MYSQL_HOST}."
  echo "[ERROR] Script aborted."
  exit 1
else
  echo "[NOTICE] Connecting to ${MYSQL_HOST} successed."
fi

# Below executing only one time
$MYSQL_EXE_CMD -e 'SHOW DATABASES\G' >> ${SCRIPT_OUTPUT_EACH_DIR}/SHOW_DATABASE.txt
$MYSQL_EXE_CMD -e 'SELECT * FROM mysql.user\G' >> ${SCRIPT_OUTPUT_EACH_DIR}/mysql.user.txt
$MYSQL_EXE_CMD -e 'SHOW GLOBAL VARIABLES\G' >> ${SCRIPT_OUTPUT_EACH_DIR}/SHOW_GLOBAL_VARIABLES.txt
 

touch ${RUNFILE}
echo "[NOTICE] ${RUNFILE} is created. Loop to get metrics will start."
echo "[NOTICE] When you want to stop process ${SCRIPT_PID}, delete ${RUNFILE} file."
echo "[NOTICE] DO NOT try to kill process directly."

while [ -e ${RUNFILE} ];
do
 file=$(date +%F_%I)
 sleep=$(date +%s.%N | awk "{print $INTERVAL - (\$1 % $INTERVAL)}")
 sleep $sleep
 ts="$(date +"TS %s.%N %F %T")"

 echo "$ts"            >> ${SCRIPT_OUTPUT_EACH_DIR}/TIMESTAMP.txt
 echo "$ts"            >> ${SCRIPT_OUTPUT_EACH_DIR}/SHOW_GLOBAL_STATUS.txt
 $MYSQL_EXE_CMD -e 'SHOW GLOBAL STATUS'     >> ${SCRIPT_OUTPUT_EACH_DIR}/SHOW_GLOBAL_STATUS.txt &
 echo "$ts"            >> ${SCRIPT_OUTPUT_EACH_DIR}/innodbstatus
 $MYSQL_EXE_CMD -e 'SHOW ENGINE INNODB STATUS\G'   >> ${SCRIPT_OUTPUT_EACH_DIR}/innodbstatus &
 echo "$ts"            >> ${SCRIPT_OUTPUT_EACH_DIR}/processlist
 $MYSQL_EXE_CMD -e 'SHOW FULL PROCESSLIST\G'    >> ${SCRIPT_OUTPUT_EACH_DIR}/processlist &

 if [ ${SAMPLING_COUNTER} -eq 0 ]; then
  $MYSQL_ADMIN_CMD -r -i $INTERVAL ext     >> ${SCRIPT_OUTPUT_EACH_DIR}/MYSQL_ADMIN_extended-status.txt &
  MYSQL_ADMIN_CMD_INTERVAL_PID=`pgrep mysqladmin`
  echo "[NOTICE] mysqladmin -r -i $INTERVAL ext (${MYSQL_ADMIN_CMD_INTERVAL_PID} pid) process started as backgourd job."
 fi

 SAMPLING_COUNTER=`expr ${SAMPLING_COUNTER} + 1`
 if [ ${SAMPLING_COUNTER} -gt ${SAMPLING_REPEAT_COUNT} ]; then
  echo "[NOTICE] ${SAMPLING_LOOP_TIME} secouds passed."
 
  if [[ ${FLAG_Z} = 'TRUE' ]] ; then
   echo "[NOTICE] Result files is going to be zipped because -z option is set."
   zip -r ${BASE_OUTPUT_DIR}/${SCRIPT_OUTPUT_LOWEST_DIR_NAME}.zip ${SCRIPT_OUTPUT_EACH_DIR} 1>/dev/null 2>/dev/null
   ZIP_FLAG=`echo $?`

   if [[ ${ZIP_FLAG} -eq 1 ]]; then
    echo "[ERROR] zip failed."
   else
    echo "[NOTICE] zip succeded."
   fi
  fi

  rm -f ${RUNFILE}
  echo "[NOTICE] ${RUNFILE} removed by script."
  kill -9 ${MYSQL_ADMIN_CMD_INTERVAL_PID}
  echo "[NOTICE] ${MYSQL_ADMIN_CMD_INTERVAL_PID} backgrouod process killed."
  echo "[NOTICE] ${SCRIPT_NAME} is going to end..."
 
 exit 0
 fi

 if [[ ${FLAG_R} = 'TRUE' ]] ; then
  if [ `expr ${INTERVAL} \* ${SAMPLING_COUNTER} % ${REPORT_TIME}` -eq 0 ]; then
  echo "[NOTICE] `expr ${INTERVAL} \* ${SAMPLING_COUNTER}` seconds passed. Scripts keeps on working!"
  fi
 fi

done

echo ""
echo "[NOTICE] Script Completed !"
echo "[NOTICE] Existing because $RUNFILE does not exist."
kill -9 ${MYSQL_ADMIN_CMD_INTERVAL_PID}
echo "[NOTICE] mysqladmin -r -i $INTERVAL ext (${MYSQL_ADMIN_CMD_INTERVAL_PID} pid) backgrouod process killed."
exit 0

#EOF

整形スクリプト

上のスクリプトで収集した結果でSHOW GLOBAL STATUSの結果を整形するスクリプトです。 整形したいメトリックを簡単に変更できるように別ファイルに記載して それを読み込んで実行するようにしました。

Here analyzing scrpit from above script result. Metrics that is analyzed possibly vary and I want to separate what metrics analyze. So script reads from external file.

こんな感じで、現在の値として整形したい場合はメトリックの頭に"-"の記号を付けておくことが必要です。

Com_insert
-Connections
#!/bin/bash
OUTPUT_DIR=/tmp/mysql_metric_data/analyze
CMDNAME=`basename $0`

function usage() {
cat <<_EOT_
Usage:
  ${CMDNAME} -f -m

Description
  -f    : FULLPATH filename that contains raw data.
  -m    : FULLPATH filename that contains metric names.
          Marking "-" first letter of metric name tells that it culculate as current values.

# Metric file example
value1  # accumulative value.
-value2 # current value.

_EOT_
}


while getopts f:m: OPT
do
    case $OPT in
        f   ) RAWDATA_FILE="${OPTARG}"
                ;;
        m   ) METRICS_FILE="${OPTARG}"
                ;;
        \?) usage
                exit 1
                ;;
    esac
done

if [ $# -ne 4 ]; then
    echo "[ERROR] Scirpt is not collectly executed!"
    echo ""
    usage
    exit 1
fi

if [ ! -e ${OUTPUT_DIR} ]; then
    mkdir -p ${OUTPUT_DIR}
    echo [NOTICE] Created ${OUTPUT_DIR} because it does not exist.
fi

# yyyy-mm-dd HH:MM:SS
echo "LABEL" >> ${OUTPUT_DIR}/1_TIMESTAMP.csv
echo "DATE_TIME" >> ${OUTPUT_DIR}/1_TIMESTAMP.csv
awk '
/^TS/ {
  printf "%s %s\n", $3, $4
}' ${RAWDATA_FILE} >> ${OUTPUT_DIR}/1_TIMESTAMP.csv

# Elapsed time
ARRAY_UTC_TIMESTAMP=( `awk '/^TS/{ print $2 }' ${RAWDATA_FILE} | cut -d'.' -f1` )
COUNT_END_0=`expr ${#ARRAY_UTC_TIMESTAMP[*]} - 1`
echo "," >> ${OUTPUT_DIR}/2_Elapsed_Time.csv
echo ",Elapsed_Time" >> ${OUTPUT_DIR}/2_Elapsed_Time.csv

for array_key_num in `seq 0 1 ${COUNT_END_0}`
do

    if [[ FLAG -eq 0 ]]; then
        UTC_START=${ARRAY_UTC_TIMESTAMP[${array_key_num}]} 
        echo ,0
        FLAG=1
    else
        echo ,`expr ${ARRAY_UTC_TIMESTAMP[${array_key_num}]} - ${UTC_START}`
    fi 

done >> ${OUTPUT_DIR}/2_Elapsed_Time.csv

# Processing metrics from metric file
ARRAY_ALL_METRICS=( `cat ${METRICS_FILE} | sed -e 's/^-//g' -e '/^$/d' -e '/^#/d'` )
ARRAY_CURRENT_VALUE=( `awk '/^-.*/' ${METRICS_FILE} | sed -e 's/^-//g' -e '/^$/d' -e '/^#/d'` )
echo "[NOTICE] ${ARRAY_CURRENT_VALUE[@]}"
echo "[NOTICE] will be calculated as current value."

for METRICS in ${ARRAY_ALL_METRICS[@]}
do
    regexp="^${METRICS}$"

    if [[ ${ARRAY_CURRENT_VALUE[@]} =~ ${METRICS} ]] ; then
        awk -v METRICS=${METRICS} -v regexp="${regexp}" '
        $1 ~ regexp {
            if ( FLAG==0 ) {
                print ",Current value"
                FLAG++
                print ","METRICS
                printf ",%s\n" , $2
                }
            else {
                printf ",%s\n" , $2
                }
        }' ${RAWDATA_FILE} >> ${OUTPUT_DIR}/${METRICS}.csv
    else    
        awk -v METRICS=${METRICS} -v regexp="${regexp}" '
        $1 ~ regexp {
            if ( FLAG==0 ) {
                START_VALUE=$2
                FLAG++
                print ",Accumulative value"
                print ","METRICS
                printf ",%s\n" , START_VALUE
                }   
            else {
                ACCUMULATIVE_VALUE=$2-START_VALUE
                printf ",%s\n" , ACCUMULATIVE_VALUE
                }
        }' ${RAWDATA_FILE} >> ${OUTPUT_DIR}/${METRICS}.csv
    fi
done

# Query Cache hit ratio
if [[ ${ARRAY_ALL_METRICS[@]} =~ Qcache_hits ]] ; then
    if  [[ ${ARRAY_ALL_METRICS[@]} =~ Com_select ]] ; then

        echo "[NOTICE] Query Cache hit ratio is calculating..."

        ARRAY_Qcache_hits=( `awk '/^Qcache_hits/{ print $2 }' ${RAWDATA_FILE}` )
        ARRAY_Com_select=( `awk '/^Com_select/{ print $2 }' ${RAWDATA_FILE}` )
        COUNT_END_1=`expr ${#ARRAY_Qcache_hits[*]} - 1`
        
        echo "," >> ${OUTPUT_DIR}/_Query_cache_hit_ratio.csv
        echo ",Query_cache_hit_ratio" >> ${OUTPUT_DIR}/_Query_cache_hit_ratio.csv
        
        for array_key_num in `seq 0 1 ${COUNT_END_1}`
        do

            # To calculate after decimal point, use awk
            echo ${ARRAY_Qcache_hits[${array_key_num}]} ${ARRAY_Com_select[${array_key_num}]} |\
            awk '{ print ","$1 / ( $1 + $2 ) }'

        done >> ${OUTPUT_DIR}/_Query_cache_hit_ratio.csv

    fi
fi

# InnoDB buffer hit ratio
if [[ ${ARRAY_ALL_METRICS[@]} =~ Innodb_buffer_pool_read_requests ]] ; then
    if  [[ ${ARRAY_ALL_METRICS[@]} =~ Innodb_buffer_pool_reads ]] ; then

        echo "[NOTICE] InnoDB buffer hit ratio is calculating..."

        ARRAY_Innodb_buffer_pool_read_requests=( `awk '/^Innodb_buffer_pool_read_requests/{ print $2 }' ${RAWDATA_FILE}`)
        ARRAY_Innodb_buffer_pool_reads=( `awk '/^Innodb_buffer_pool_reads/{ print $2 }' ${RAWDATA_FILE}` )
        COUNT_END_2=`expr ${#ARRAY_Innodb_buffer_pool_read_requests[*]} - 1`

        echo "," >> ${OUTPUT_DIR}/_InnoDB_buffer_pool_hit_ratio.csv
        echo ",InnoDB_buffer_pool_hit_ratio" >> ${OUTPUT_DIR}/_InnoDB_buffer_pool_hit_ratio.csv

        for array_key_num in `seq 0 1 ${COUNT_END_2}`
        do

            # To calculate after decimal point, use awk
            echo ${ARRAY_Innodb_buffer_pool_read_requests[${array_key_num}]} ${ARRAY_Innodb_buffer_pool_reads[${array_key_num}]} |\
            awk '{ print ","$1 / ( $1 + $2 ) }'

        done >> ${OUTPUT_DIR}/_InnoDB_buffer_pool_hit_ratio.csv

    fi
fi

# Combining result files
ARRAY_FULLPATH=( `find ${OUTPUT_DIR} -maxdepth 1 -type f` )
for i in ${ARRAY_FULLPATH[@]}
do 
    echo $i; 
done | sort -t / -k 3,3 | xargs paste >> ${OUTPUT_DIR}/0_METRIC_ANALYZED.csv

#EOF