{"id":506,"date":"2023-05-19T08:05:13","date_gmt":"2023-05-19T08:05:13","guid":{"rendered":"https:\/\/www.aixperts.co.uk\/?p=506"},"modified":"2023-10-12T14:59:38","modified_gmt":"2023-10-12T14:59:38","slug":"capacity-and-performance-check-script","status":"publish","type":"post","link":"https:\/\/www.aixperts.co.uk\/?p=506","title":{"rendered":"Capacity and performance check script"},"content":{"rendered":"\n<p>Another little script I wrote to check capacity aspects of an AIX LPAR. I call it a capacity check because most of the checks are based on cumulative counters, scaled to a 90-day period. Some of this is based on Earl Jew&#8217;s excellent vmstat presentation to the IBM POWER VUG.<\/p>\n\n\n\n<p>The script checks memory and I\/O buffer overflow counters as well as LPAR SRAD spreading.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/bin\/ksh93\n\n# Performance recommendation tool\n#\n# Copyright Henrik Morsing, 2022\n#\n# Initial version 1.0\n# 09-11-2022    Henrik Morsing  1.1     Added more informative output\n#                                       and correct when to alert (6 digits, not 5)\n\n# Set a reference to current days up\n\nref=\"$(uptime | grep days | awk '{ print $3 }')\"\n\n#\n\n# If less than a day or two, exit, less than twenty, warn\n\nif &#091;&#091; \"${ref}\" == \"\" ]]\nthen\n   echo \"System uptime too low.\"\n   exit 1\nelif &#091;&#091; \"${ref}\" -lt 20 ]]\nthen\n   echo \"System uptime too low to give accurate results.\"\nfi\n\necho\necho \"Starting System Performance Analyser v1.0\"\necho\necho \"System Name: $(uname -n) - System Uptime Days: ${ref}\"\necho\necho \"Please bear in mind, as stats used are accumulated over time,\"\necho \"they can be a view of the past and issues may already have been rectified.\"\necho\necho\n\n#####################\n# MEMORY\n#####################\n\necho \"\\t *** MEMORY CHECKS ***\"\necho\necho \"Add more memory to rectify these\"\necho\n\n# Start by checking some memory variables\n# Read paging space page 
outs, revolutions of the clock hand, free frame waits\n\nvmstat -s | grep -E 'paging space page outs|revolutions of the clock hand|free frame waits' | awk '{ print $1 } ' | tr '\\n' ' ' | read page_outs revolutions frame_waits\n\n# First, convert to 90 day reference\n# (multiply before dividing so integer division does not truncate small counters to zero)\npage_outs_90=$(( ${page_outs}*90\/${ref} ))\nrevolutions_90=$(( ${revolutions}*90\/${ref} ))\nframe_waits_90=$(( ${frame_waits}*90\/${ref} ))\n\n# echo ${page_outs_90}\n# echo ${revolutions_90}\n# echo ${frame_waits_90}\n\n# Then, find number of digits\npage_outs_digits=${#page_outs_90}\nframe_waits_digits=${#frame_waits_90}\n\n# echo \"${page_outs_digits}\"\n# echo \"${frame_waits_digits}\"\n\n# Check on numbers and warn as needed\n\nif &#091;&#091; ${page_outs_digits} -gt 7 || ${revolutions} -gt $(( ${ref}*100 )) || ${frame_waits_digits} -gt 6 ]]\nthen\n   echo \"You are extremely memory constrained:\"\n   &#091;&#091; ${page_outs_digits} -gt 7 ]] &amp;&amp; echo \"- \\033&#091;1;31m'paging space page outs' is extremely high:\\033&#091;m ${page_outs} -&gt; ${page_outs_90} per 90 days (${page_outs_digits} digits)\"\n   &#091;&#091; ${revolutions} -gt $(( ${ref}*100 )) ]] &amp;&amp; echo \"- \\033&#091;1;31m'revolutions of the clock hand' is extremely high:\\033&#091;m ${revolutions} -&gt; ${revolutions_90} per 90 days\"\n   &#091;&#091; ${frame_waits_digits} -gt 6 ]] &amp;&amp; echo \"- \\033&#091;1;31m'free frame waits' is extremely high:\\033&#091;m ${frame_waits} -&gt; ${frame_waits_90} per 90 days (${frame_waits_digits} digits)\"\n\nelif &#091;&#091; ${page_outs_digits} -gt 6 || ${revolutions} -gt $(( ${ref}*10 )) || ${frame_waits_digits} -gt 5 ]]\nthen\n   echo \"You are very memory constrained:\"\n   &#091;&#091; ${page_outs_digits} -gt 6 ]] &amp;&amp; echo \"- \\033&#091;1;33m'paging space page outs' is very high:\\033&#091;m ${page_outs} -&gt; ${page_outs_90} per 90 days (${page_outs_digits} digits)\"\n   &#091;&#091; ${revolutions} -gt $(( ${ref}*10 )) ]] &amp;&amp; echo \"- 
\\033&#091;1;33m'revolutions of the clock hand' is very high:\\033&#091;m ${revolutions} -&gt; ${revolutions_90} per 90 days\"\n   &#091;&#091; ${frame_waits_digits} -gt 5 ]] &amp;&amp; echo \"- \\033&#091;1;33m'free frame waits' is very high:\\033&#091;m ${frame_waits} -&gt; ${frame_waits_90} per 90 days (${frame_waits_digits} digits)\"\n\nelif &#091;&#091; ${page_outs_digits} -gt 5 || ${revolutions} -gt ${ref} || ${frame_waits_digits} -gt 4 ]]\nthen\n   echo \"You could benefit from adding more memory:\"\n   &#091;&#091; ${page_outs_digits} -gt 5 ]] &amp;&amp; echo \"- 'paging space page outs' is high: ${page_outs} -&gt; ${page_outs_90} per 90 days  (${page_outs_digits} digits)\"\n   &#091;&#091; ${revolutions} -gt ${ref} ]] &amp;&amp; echo \"- 'revolutions of the clock hand' is high: ${revolutions} -&gt; ${revolutions_90} per 90 days\"\n   &#091;&#091; ${frame_waits_digits} -gt 4 ]] &amp;&amp; echo \"- 'free frame waits' is high: ${frame_waits} -&gt; ${frame_waits_90} per 90 days (${frame_waits_digits} digits)\"\nfi\n\n\n#####################\n# PROCESSOR\n#####################\n\necho\necho \"\\t *** PROCESSOR CHECKS ***\"\necho\n\n# Checking for LPAR SRAD spreading\n\nnum_srads=\"$(lssrad -a | grep -v SRAD | wc -l)\"\nvCPUs_online=\"$(lparstat -i | grep 'Online Virtual CPUs' | awk '{ print $NF }').0\"\nvCPUs_max=\"$(lparstat -i | grep \"Maximum Virtual CPUs\" | awk '{ print $NF }')\"\nEntitlement=\"$(lparstat -i | grep \"Entitled Capacity\" | grep -v \"Pool\" | awk '{ print $NF }')\"\n\nif &#091;&#091; ${num_srads} -gt \"2\" ]]\nthen\n        echo \"LPAR is spread across multiple SRADs (${num_srads}). If memory (2TB?) 
and max processor allocations (less than 15 vCPUs, currently ${vCPUs_max}) suggest it can be contained within one SRAD, powering the LPAR off and on again might align it correctly.\"\nfi\n\necho\nprintf \"*** Checking spreading factor ***\"\n\n# Spreading factor: online virtual CPUs per unit of entitled capacity.\n# NOTE: 'spreading' was never assigned in the original script; this ratio is an assumption.\nspreading=$(( ${vCPUs_online}\/${Entitlement} ))\n\nif &#091;&#091; ${vCPUs_online} -gt \"1\" ]]\nthen\n   if &#091;&#091; ${spreading} -gt 2 ]]\n   then\n      echo \"\\t&#091;\\033&#091;1;33mWARNING\\033&#091;m]\"\n      echo \"Number of virtual processors is high compared to entitlement.\"\n   else\n      echo \"\\t&#091;\\033&#091;1;32mOK\\033&#091;m]\"\n   fi\nfi\n\n\n#####################\n# I\/O\n#####################\n\n# Starting from the top, VGs first\n\necho\necho \"\\t *** I\/O CHECKS ***\"\necho\n\nfor volgroup in $(lsvg -o)\ndo\n\n   printf \"*** Checking ${volgroup} ***\"\n   msg=false\n\n   ##################\n   # Checking pbufs #\n   ##################\n\n   # Count blocked I\/Os with no pbuf\n   pervg_blocked_io_count=$(\/usr\/sbin\/lvmo -v ${volgroup} -o pervg_blocked_io_count)\n\n   # Reference to 90 days (multiply first to avoid integer truncation)\n   pbio_90=$(( ${pervg_blocked_io_count}*90\/${ref} ))\n\n   # Find number of digits\n   pbio_digits=${#pbio_90}\n\n   # Recommendation based on number of digits\n   if &#091;&#091; ${pbio_digits} -gt 6 ]]\n   then\n      url=true\n      echo \"\\t&#091;\\033&#091;1;33mWARNING\\033&#091;m]\"\n\n      # Calculate recommended pv_pbuf_count for VG\n      pbuf_curr=$(lvmo -v ${volgroup} -o pv_pbuf_count)\n      pbuf_vg=$(( ${pbuf_curr}+16384 ))\n\n      echo \"Volume group ${volgroup} is extremely low on pbufs\"\n      echo \"- \\033&#091;1;31m'pending disk I\/Os blocked with no pbuf' is extremely high:\\033&#091;m ${pervg_blocked_io_count} -&gt; ${pbio_90} per 90 days (${pbio_digits} digits). 
Increase 'pv_pbuf_count' to ${pbuf_vg}.\\n\"\n   else\n      echo \"\\t&#091;\\033&#091;1;32mOK\\033&#091;m]\"\n   fi\ndone\n\n\n   ###################\n   # Checking psbufs #\n   ###################\n\n   # Count blocked paging space I\/O with no psbuf\n\n   vmstat -v | grep -E 'paging space I\/Os blocked with no psbuf|external pager filesystem I\/Os blocked with no fsbuf' | awk '{ print $1 } ' | tr '\\n' ' ' | read psbuf fsbuf\n\n   # Reference to 90 days (multiply first to avoid integer truncation)\n   psio_90=$(( ${psbuf}*90\/${ref} ))\n\n   # Any psbufs blocked is bad; warn at two or more digits (10+ per 90 days)\n   if &#091;&#091; ${#psio_90} -gt 1 ]]\n   then\n      url=true\n      printf \"&#091;\\033&#091;1;33mWARNING\\033&#091;m] \"\n      echo \"\\033&#091;1;31mpsbufs is above 10\\033&#091;m, indicating severe memory restriction causing excessive paging. If you cannot add memory, alleviate by adding parallel paging spaces.\"\n   fi\n\n\n   ###################\n   # Checking fsbufs #\n   ###################\n   echo\n   # Count blocked external pager filesystem I\/O with no fsbuf\n\n   # Reference to 90 days (multiply first to avoid integer truncation)\n   fsio_90=$(( ${fsbuf}*90\/${ref} ))\n\n   # Any fsbufs blocked is bad; warn at three or more digits (100+ per 90 days)\n   if &#091;&#091; ${#fsio_90} -gt 2 ]]\n   then\n      url=true\n      printf \"&#091;\\033&#091;1;33mWARNING\\033&#091;m] \"\n      echo \"\\033&#091;1;31mfsbufs is above 100\\033&#091;m, indicating filesystem I\/O overload. Increase j2_dynamicBufferPreallocation with ioo to fix this. 
Start by doubling the value.\"\n      echo \"Also consider splitting into smaller file systems.\"\n   fi\n\n   &#091;&#091; \"${url}\" == \"true\" ]] &amp;&amp; echo \"Info on I\/O buffers: https:\/\/www.ibm.com\/support\/pages\/blocked-ios-due-buffers-shortage\"\n   # Reset so the fcs link below is only printed for adapter warnings\n   url=false\n\n   ###################\n   # Fibre Adapters  #\n   ###################\n\nadapters=$(lsdev -Ccadapter | grep fcs | awk '{ print $1 }')\n\n# Check No Command Resource Count (Update num_cmd_elems)\n\n   for adapter in ${adapters}\n   do\n      ncrc=$(fcstat -D ${adapter} | grep \"No Command Resource Count\" | awk '{ print $NF }')\n\n      # Reference to 90 days (multiply first to avoid integer truncation)\n      ncrc_90=$(( ${ncrc}*90\/${ref} ))\n\n      # Not sure how many is bad, let's start with 6 digits\n\n      if &#091;&#091; ${#ncrc_90} -gt 6 ]]\n      then\n         url=true\n         printf \"&#091;\\033&#091;1;33mWARNING\\033&#091;m] \"\n         echo \"- \\033&#091;1;31mNo Command Resource Count for adapter ${adapter} is extremely high:\\033&#091;m ${ncrc} -&gt; ${ncrc_90} per 90 days (${#ncrc_90} digits)\"\n         echo \"Increase num_cmd_elems on ${adapter} to fix, but not higher than num_cmd_elems on the VIO physical adapter.\"\n      elif &#091;&#091; ${#ncrc_90} -gt 5 ]]\n      then\n         url=true\n         printf \"&#091;\\033&#091;1;33mWARNING\\033&#091;m] \"\n         echo \"- \\033&#091;1;31mNo Command Resource Count for adapter ${adapter} is very high:\\033&#091;m ${ncrc} -&gt; ${ncrc_90} per 90 days (${#ncrc_90} digits)\"\n         echo \"Increase num_cmd_elems on ${adapter} to fix, but not higher than num_cmd_elems on the VIO physical adapter.\"\n      fi\n   done\n\n   &#091;&#091; \"${url}\" == \"true\" ]] &amp;&amp; echo \"Info on fcs buffers: https:\/\/www.ibm.com\/support\/pages\/no-command-resource-count-and-high-water-mark-active-and-pending-commands\"\n   url=false\n\n   echo\n\n\n# Check High water mark of active\/pending commands (Update num_cmd_elems)\n\n   for adapter in ${adapters}\n   do\n      hwmac=$(fcstat -D 
${adapter} | grep -p \"FC SCSI Adapter Driver Queue\" | grep \"High water mark  of active commands\" | awk '{ print $NF }')\n      hwmpc=$(fcstat -D ${adapter} | grep -p \"FC SCSI Adapter Driver Queue\" | grep \"High water mark of pending commands\" | awk '{ print $NF }')\n\n      # High water marks are peak values, not cumulative counters,\n      # so they are compared as-is rather than scaled to 90 days\n\n      hwm_summ=$(( ${hwmac} + ${hwmpc} ))\n\n      # We need the current num_cmd_elems setting for this adapter\n\n      nce=$(lsattr -El ${adapter} -a num_cmd_elems -F value)\n\n      if &#091;&#091; ${hwm_summ} -gt ${nce} ]]\n      then\n         url=true\n         printf \"&#091;\\033&#091;1;33mWARNING\\033&#091;m] \"\n         echo \"- \\033&#091;1;31mHigh water mark for active\/pending commands for adapter ${adapter} is higher than num_cmd_elems:\\033&#091;m ${hwm_summ} vs. ${nce}\"\n         echo \"Increase num_cmd_elems on ${adapter} to fix, but not higher than num_cmd_elems on the VIO physical adapter.\"\n      fi\n   done\n\n   # Link to helpful web page.\n   echo\n   &#091;&#091; \"${url}\" == \"true\" ]] &amp;&amp; echo \"Info on fcs buffers: https:\/\/www.ibm.com\/support\/pages\/no-command-resource-count-and-high-water-mark-active-and-pending-commands\"\n   url=false\n   echo\n\n\n# Check No DMA Resource Count (Update max_xfer_size)\n\n   for adapter in ${adapters}\n   do\n      nodma=$(fcstat -D ${adapter} | grep \"No DMA Resource Count\" | awk '{ print $NF }')\n\n      # Reference to 90 days (multiply first to avoid integer truncation)\n      nodma_90=$(( ${nodma}*90\/${ref} ))\n\n      if &#091;&#091; ${#nodma_90} -gt 3 ]]\n      then\n         url=true\n         printf \"&#091;\\033&#091;1;33mWARNING\\033&#091;m] \"\n         echo \"- \\033&#091;1;31mNo DMA Resource Count for adapter ${adapter} is higher than 3 digits per 90 days:\\033&#091;m ${nodma_90}\"\n         echo \"Increase max_xfer_size on ${adapter} to fix, but not higher than max_xfer_size on the VIO physical adapter.\"\n      fi\n   done\n\n   # Link to helpful web 
page.\n   echo\n   &#091;&#091; \"${url}\" == \"true\" ]] &amp;&amp; echo \"Info on fcs buffers: https:\/\/www.ibm.com\/support\/pages\/no-command-resource-count-and-high-water-mark-active-and-pending-commands\"\n   url=false\n\necho\nexit 0\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Another little script I wrote to check capacity aspects of an AIX LPAR. I call it capacity checks as it is basing most of the checks on counters and averaging out over 90 days. Some of this is based on &hellip; <a href=\"https:\/\/www.aixperts.co.uk\/?p=506\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,147,146,148],"tags":[172,170,171],"class_list":["post-506","post","type-post","status-publish","format-standard","hentry","category-aix","category-ibm-power","category-performance-tuning","category-scripting","tag-memory","tag-srad","tag-vmstat"],"_links":{"self":[{"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/506","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=506"}],"version-history":[{"count":4,"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/506\/revisions"}],"predecessor-version":[{"id":535,"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/506\/revisions\/535"}],"wp:attachment":[{"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=506"}],"wp:term":[{"
taxonomy":"category","embeddable":true,"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=506"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.aixperts.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=506"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}