Wednesday, October 28, 2009

Asus M51S, Fedora 11 and CPU temperature control

After installing Fedora 11 on my Asus notebook I ran into a problem I hadn't seen before: CPU core overheating. Whenever I run something very resource-intensive, like dozens of Java processes (compiling, server-side computing, database communication and so on) or a modern game such as "Open Arena" or "Yo Frankie!", the computer shuts itself down after 20-30 minutes of work, as soon as one of the two cores reaches the maximum temperature of 90°C. That's very healthy for the CPU but very annoying for me.

I'm sure the problem isn't in Fedora itself but has to do with the CPU architecture. Fedora 10 on this computer was i386, while Fedora 11 is x86_64. I couldn't find a clear explanation of this behaviour on the internet, but my guess is that applications built with x86_64 instructions drive the CPU at "full throttle", so it heats up very fast and the internal fans have no time to cool it down.
So the logical conclusion is that I need a way to cool the CPU down. The only option I have on my Asus M51S notebook is to lower the CPU frequency dynamically, which gives the fans time to cool the CPU.

OK. First of all I searched the internet for possible solutions and found nothing ready-made. But I did find the tool cpuspeed, which does precisely what I want. It lives in the package of the same name, which among other things contains the init script /etc/rc.d/init.d/cpuspeed. "This is it!" I thought. But it wasn't.
Strangely enough, this script doesn't use the temperature management functionality of the cpuspeed tool!

After a couple of tests I was sure the tool works fine, but only when /etc/rc.d/init.d/cpuspeed has been started as well. I didn't fully work out what that script does (its logic is quite complicated), but it turns on the CPU frequency governor, and that was enough for my purposes.
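
To see whether a governor really is active after the cpuspeed service starts, something like the following sketch may help. It assumes the kernel exposes the usual cpufreq files under /sys/devices/system/cpu; the function name show_governors and its optional directory argument are my own illustration, not part of the cpuspeed package.

```shell
#!/bin/bash
# Print the active cpufreq governor for every core, one "cpuN: name" line each.
# The optional first argument is an alternative base directory (handy for
# trying the function out); by default the standard sysfs location is used.
show_governors() {
    local base="${1:-/sys/devices/system/cpu}"
    local f cpu
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue   # skip cores without cpufreq support
        cpu="${f#"$base"/}"       # strip the leading base directory
        cpu="${cpu%%/*}"          # keep only the "cpuN" component
        echo "$cpu: $(cat "$f")"
    done
}
```

Calling show_governors with no arguments reads the real sysfs tree. If it prints nothing, no cpufreq driver is loaded for any core.
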
So I decided it was better and simpler to write a separate init script than to change the existing one. I called the new script /etc/rc.d/init.d/cputempd. Its content is below:
#!/bin/bash

# the following is the LSB init header see
# http://www.linux-foundation.org/spec//booksets/LSB-Core-generic/LSB-Core-generic.html#INITSCRCOMCONV
#
### BEGIN INIT INFO
# Provides: cputempd
# Should-Start:
# Default-Start: 1 2 3 4 5
# Short-Description: processor temperature monitor
# Description: This program monitors the system's idle percentage and reduces or raises the
#              CPU cores' clock speeds accordingly to minimize power usage when idle and
#              maximize performance when needed. By default the program counts time used by
#              nice()d programs and time used waiting for IO as idle time.
#              The program may also optionally be configured to reduce the CPU cores' clock
#              speeds if the temperature gets too high or minimize their speeds if the
#              computer's AC adapter is disconnected.
#              By default this program will manage every CPU core found in the system.
### END INIT INFO

# the following is the chkconfig init header
#
# processname: cputempd
# description: This program monitors the system's idle percentage and reduces or raises the
#              CPU cores' clock speeds accordingly to minimize power usage when idle and
#              maximize performance when needed. By default the program counts time used by
#              nice()d programs and time used waiting for IO as idle time.
#              The program may also optionally be configured to reduce the CPU cores' clock
#              speeds if the temperature gets too high or minimize their speeds if the
#              computer's AC adapter is disconnected.
#              By default this program will manage every CPU core found in the system.
# chkconfig: 12345 06 99

. /etc/init.d/functions

TEMPERATURE_FILE="/proc/acpi/thermal_zone/THRM/temperature"
TEMPERATURE_MAX=80
AUX_OPTS=
# pull in sysconfig settings
[ -f /etc/sysconfig/cputempd ] && . /etc/sysconfig/cputempd

RETVAL=0
prog="CPU Temperature Monitor"
execfile="/usr/sbin/cpuspeed"
lockfile="/var/lock/subsys/cputempd"

start() {
        echo -n $"Starting $prog: "
        daemon $execfile -d -t "$TEMPERATURE_FILE" $TEMPERATURE_MAX $AUX_OPTS
        RETVAL=$?
        [ $RETVAL -eq 0 ] && touch $lockfile
        echo
}

stop() {
        echo -n $"Stopping $prog: "
        killproc $execfile
        RETVAL=$?
        [ $RETVAL -eq 0 ] && rm -f $lockfile
        echo 
}

#
#       See how we were called.
#
case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  reload|restart)
        stop
        # start() sets RETVAL itself; re-reading $? here would only
        # capture the exit status of its final echo
        start
        ;;
  condrestart)
        if [ -f $lockfile ]; then
            stop
            start
        fi
        ;;
  status)
        status $execfile
        RETVAL=$?
        ;;
  *)
        echo $"Usage: $0 {condrestart|start|stop|restart|reload|status}"
        exit 1
esac

exit $RETVAL
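
Because the script sources /etc/sysconfig/cputempd when that file exists, the defaults can be changed without touching the init script. A minimal sketch of such a file is below; the lower threshold of 75 is only an illustrative value, and the thermal zone name (THRM on my notebook) is an assumption that may differ on other machines.

```shell
# /etc/sysconfig/cputempd
# Values set here replace the defaults at the top of the init script.
# They are illustrative, not recommendations.

# Thermal zone file to watch; the zone name (THRM here) varies between machines.
TEMPERATURE_FILE="/proc/acpi/thermal_zone/THRM/temperature"

# Treat 75 degrees C as the maximum instead of the script default of 80.
TEMPERATURE_MAX=75

# Extra cpuspeed command-line options; left empty in this sketch.
AUX_OPTS=
```

After editing the file, "sudo service cputempd restart" makes the daemon pick up the new values.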

Once the script was written, I enabled and started it together with its "brother":
sudo chkconfig cpuspeed on
sudo chkconfig cputempd on
sudo service cpuspeed start
sudo service cputempd start
... I can work and play with no fear of CPU overheating.
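
To check that the temperature really stays under the limit while something heavy is running, the thermal zone file can be read directly. The sketch below assumes the file has the usual "temperature: NN C" layout; the helper names read_temp_c and temp_below are hypothetical, made up for this example.

```shell
#!/bin/bash
# Extract the numeric temperature from an ACPI thermal zone file whose
# content looks like "temperature:             54 C".
read_temp_c() {
    awk '/^temperature:/ { print $2 }' "$1"
}

# Succeed (exit status 0) when the reading in file $1 is below limit $2.
temp_below() {
    local t
    t=$(read_temp_c "$1") || return 2
    [ -n "$t" ] && [ "$t" -lt "$2" ]
}
```

For example, "temp_below /proc/acpi/thermal_zone/THRM/temperature 80 && echo OK" prints OK while the core is cooler than the threshold used in the init script.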
