Archive for the ‘code’ Category

lazy function loading

December 8th, 2009

Even though bash is not my favorite programming language, I end up writing a bit of code in it. It's just super practical to have something in bash if you can. I mentioned in the past how it's a good idea to avoid unnecessary cpu/io while initializing the shell by doing deferred aliasing. That solves the alias problem, but I also have a bunch of bash code in the form of functions. So I was thinking the same principle could be applied again.

Let's use findpkgs as an example here. The function is defined in a separate file and the file is source'd. But this means that every time I start a new shell session the whole function has to be read and loaded into memory. That adds up if there are a number of these files. networktest, for instance, defines four functions and is considerably longer.
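
That is, the baseline approach is simply to source every file up front from .bashrc (the ~/.myshell paths match the examples below):

. ~/.myshell/findpkgs.sh
. ~/.myshell/networktest.sh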

So let's "compile to bytecode" again:

findpkgs ()
{
    [ -f ~/.myshell/findpkgs.sh ] && . ~/.myshell/findpkgs.sh;
    findpkgs "$@"
}

When the function is defined this way, the script hasn't actually been sourced yet (and that's precisely the idea), but it will be the minute we call the function. Sourcing the script naturally rebinds the name findpkgs, and then we call the function again with the given arguments, but this time they reach the actual function.
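
You can watch the rebinding happen with type (a hypothetical session; assumes ~/.myshell/findpkgs.sh exists):

$ type findpkgs    # before the first call: the stub above
$ findpkgs bash    # first call sources the script, then recurses
$ type findpkgs    # now bound to the real implementation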

Okay, so that was easy. But what if you have a bunch of functions loaded like that? It's gonna be kinda messy copy-pasting that boilerplate code over and over. So let's not write that code, let's generate it:

lazyimport() {
	# generate code for lazy import of a function
	local function="$1";shift;
	local script="$1";shift;

	local dec="$function() {
		[ -f ~/.myshell/$script.sh ] && . ~/.myshell/$script.sh
		$function \"\$@\"
	}"
	eval "$dec"
}

Don't worry, it's the same thing as before, just wrapped in quotes. And now we may import all the functions we want into the namespace by calling this import function with the name of the function we want to "byte compile" and the script where it is found:

## findpkgs.sh

lazyimport findpkgs findpkgs

## networktest.sh

lazyimport havenet networktest
lazyimport haveinet networktest
lazyimport wifi networktest
lazyimport wifiscan networktest

## servalive.sh

lazyimport servalive servalive

So let's imagine a hypothetical execution. It goes like this:

  1. Start a new bash shell.
  2. Source import.sh where all this code is.
  3. On each call to lazyimport a function declaration is generated, and eval'd. The function we want is now bound to its name in the shell.
  4. On the first call to the function, the generated code for the function is executed, which actually sources the script, which rebinds the name of the function to the actual code that belongs to it.
  5. The function is executed with arguments.
  6. On subsequent executions the function is already "compiled", i.e. bound to its proper code.

So what happens, you may wonder, in cases like the above with networktest, where several mock functions are generated, all of which will source the same script? Nothing bad: whichever of them is called first will source the script and overwrite all the bindings, remember? It only takes one call to any one of the functions and all of them become rebound to the real thing. So all is well. :)
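
This is again easy to verify in a hypothetical session; one call rebinds the whole group:

$ type wifi     # the one-line stub
$ havenet       # sources networktest.sh, rebinding all four functions
$ type wifi     # the real function, no further sourcing needed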

peek: monitor files for changes

December 1st, 2009

It seems to me that we have pretty good tools for managing files that aren't changing. We have file managers that display all the pertinent details, they'll detect the file type, they'll even show a preview if the content is an image or a video.

But what about files that are changing? Files get transferred all the time, but network connections are not always reliable. Have you ever been in the middle of a transfer wondering if it just stopped dead, or if it's crawling along so slowly it's almost too slow to notice? Or how about downloading something where the host doesn't transmit the size of the file, so you're just downloading without knowing how much there is left?

These things happen, not every day, but from time to time they do. And it's annoying. A file manager doesn't really do a great job of showing you what's happening. Of course you can stare at the directory and reload the display to see if the file size is changing. (Or the time stamp? But it's not very convenient to check that against the current time to measure how long it's been since the last change.) Or maybe the file manager displays those updates dynamically? It's still somewhat lacking.
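
A crude approximation with stock tools (GNU ls assumed) is to redraw a time-sorted listing every second, but then you're still doing the math yourself:

watch -n1 'ls -lht --time-style=full-iso | head'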

Basically, what you want to know is:

  1. Is the file being written to right now?
  2. How long since the last modification?

And you want those on a second-by-second basis, ideally. Something like this perhaps?

[screenshot: peek monitoring a directory]

Here you have the files in this directory sorted by modification time (mtime). One file is actively being written to: at the last sampling the mtime was 0 seconds ago. Sampling happens every second, so in that interval 133kb were written to the file and the mtime was updated. The other file has not been changed for the last 7 minutes.
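
Mocked up as plain text (the filenames are made up), the display looks roughly like this, with the line for the active file printed in bold:

mtime>   size>   delta>   name>
  7min    92mb      0 b   ubuntu.iso
  0sec    14mb   +133kb   fedora.iso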

The nice thing about this display is that whether you run the monitor while the file is being transferred or start it after it's already finished, you see what is happening, and if nothing is, you see when the last action took place.

#!/usr/bin/env python
#
# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU Public License, version 3.
#
# <desc> Watch directory for changes to files being written </desc>

import os
import sys
import time


class Formatter(object):
    size_units = [' b', 'kb', 'mb', 'gb', 'tb', 'pb', 'eb', 'zb', 'yb']
    time_units = ['sec', 'min', 'hou', 'day', 'mon', 'yea']

    @classmethod
    def simplify_time(cls, tm):
        unit = 0
        if tm > 59:
            unit += 1
            tm = float(tm) / 60
            if tm > 59:
                unit += 1
                tm = float(tm) / 60
                if tm > 23:
                    unit += 1
                    tm = float(tm) / 24
                    if tm > 29:
                        unit += 1
                        tm = float(tm) / 30
                        if tm > 11:
                            unit += 1
                            tm = float(tm) / 12
        return int(round(tm)), cls.time_units[unit]

    @classmethod
    def simplify_filesize(cls, size):
        unit = 0
        while size > 1023:
            unit += 1
            size = float(size) / 1024
        return int(round(size)), cls.size_units[unit]

    @classmethod
    def mtime(cls, reftime, mtime):
        delta = int(reftime - mtime)
        tm, unit = cls.simplify_time(delta)
        delta_s = "%s%s" % (tm, unit)
        return delta_s

    @classmethod
    def filesize(cls, size):
        size, unit = cls.simplify_filesize(size)
        size_s = "%s%s" % (size, unit)
        return size_s

    @classmethod
    def filesizedelta(cls, size):
        size, unit = cls.simplify_filesize(size)
        sign = size > 0 and "+" or ""
        size_s = "%s%s%s" % (sign, size, unit)
        return size_s

    @classmethod
    def bold(cls, s):
        """Display in bold"""
        term = os.environ.get("TERM")
        if term and term != "dumb":
            return "\033[1m%s\033[0m" % s
        return s

class File(object):
    sample_limit = 60  # don't hold more than x samples

    def __init__(self, file):
        self.file = file
        self.mtimes = []

    def get_name(self):
        return self.file

    def get_last_mtime(self):
        tm, sz = self.mtimes[-1]
        return tm

    def get_last_size(self):
        tm, sz = self.mtimes[-1]
        return sz

    def get_last_delta(self):
        size_last = self.get_last_size()
        try:
            mtime_beforelast, size_beforelast = self.mtimes[-2]
            return size_last - size_beforelast
        except IndexError:
            return 0

    def prune_samples(self):
        """Remove samples older than x samples back"""
        if len(self.mtimes) % self.sample_limit == 0:
            self.mtimes = self.mtimes[-self.sample_limit:]

    def sample(self, mtime, size):
        """Sample file status"""
        # Don't keep too many samples
        self.prune_samples()
        # Update time and size
        self.mtimes.append((mtime, size))

class Directory(object):
    def __init__(self, path):
        self.path = path
        self.files = {}

    def prune_files(self):
        """Remove indexed files no longer on disk (by deletion/rename)"""
        # iterate over a copy, since we delete entries as we go
        for f in list(self.files.values()):
            name = f.get_name()
            file = os.path.join(self.path, name)
            if not os.path.exists(file):
                del self.files[name]

    def scan_files(self):
        # remove duds first
        self.prune_files()
        # find items, grab only files
        items = os.listdir(self.path)
        items = filter(lambda f: os.path.isfile(os.path.join(self.path, f)),
                       items)
        # stat files, building/updating index
        for f in items:
            st = os.stat(os.path.join(self.path, f))
            if not self.files.get(f):
                self.files[f] = File(f)
            self.files[f].sample(st.st_mtime, st.st_size)

    def display_line(self, name, time_now, tm, size, sizedelta):
        time_fmt = Formatter.mtime(time_now, tm)
        size_fmt = Formatter.filesize(size)
        sizedelta_fmt = Formatter.filesizedelta(sizedelta)
        line = "%6.6s   %5.5s   %6.6s   %s" % (time_fmt, size_fmt,
                                               sizedelta_fmt, name)
        if time_now - tm < 6:
            line = Formatter.bold(line)
        return line

    def sort_by_name(self, files):
        return sorted(files, key=lambda x: x.get_name())

    def sort_by_mtime(self, files):
        return sorted(files,
                      key=lambda x: (x.get_last_mtime(), x.get_name()))

    def display(self):
        time_now = time.time()
        files = self.sort_by_mtime(self.files.values())
        print("\nmtime>   size>   delta>   name>")
        for f in files:
            line = self.display_line(f.get_name(),
                                     time_now, f.get_last_mtime(),
                                     f.get_last_size(), f.get_last_delta())
            print(line)


def main(path):
    directory = Directory(path)
    while True:
        try:
            directory.scan_files()
            directory.display()
            time.sleep(1)
        except KeyboardInterrupt:
            print("\rUser terminated")
            return


if __name__ == '__main__':
    try:
        path = sys.argv[1]
    except IndexError:
        print("Usage:  %s /path" % sys.argv[0])
        sys.exit(1)
    main(path)
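
Save it as, say, peek.py (the filename is up to you), make it executable and point it at a directory to watch:

$ chmod +x peek.py
$ ./peek.py ~/downloads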

findpkgs: Find packages for application

October 14th, 2009

Every distribution has a package manager and a whole lot of work goes into maintaining packages and correctly resolving their dependencies. This is a descriptive kind of dependency tracking.

The other day I had the idea of using a more "evidence based" method. Given a linked binary, you can find out what libraries it uses with ldd. (This, however, will not account for any dynamic linking that happens at runtime.) More interestingly, perhaps, given a running process, you can figure out which files it is using to run. lsof will tell you, and failing that, /proc/pid/maps has that information too.

Such a list of files can then be fed to the package manager to find the packages which own them.
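
The core of the pipeline is short. A rough sketch for a dpkg-based system (assumes the process id is in $pid):

# files the process has mapped -> packages that own them
awk '{print $6}' "/proc/$pid/maps" | grep '^/' | sort -u \
	| xargs -r dpkg -S 2>/dev/null | awk -F: '{print $1}' | sort -u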

For instance, which package owns init (on an Ubuntu system)?

$ findpkgs 1
upstart

What's needed to run ls (on a Gentoo system)?

$ findpkgs ls
sys-apps/acl
sys-apps/attr
sys-apps/coreutils
sys-libs/glibc

What about a Python application like iotop?

$ findpkgs `pgrep iotop`
dev-lang/python
dev-libs/openssl
sys-libs/glibc
sys-libs/ncurses
sys-libs/zlib

The step that queries the package manager for the owner of a file tries to figure out which package manager is used on the system, in this order:

  1. paludis
  2. qfile
  3. equery
  4. dpkg
  5. rpm

To be honest I'm not really sure how useful this is, I just put it together once I figured out it could be done. It *can* answer the question: which packages are required to run this application? (Or to be more precise: to achieve this specific runtime state of the application.) So if you write an app, send it to a friend and he can't make it run, you could use findpkgs to get a list of the packages he needs to install (provided he's on the same distro and all that).

# Author: Martin Matusiak <numerodix@gmail.com>
# Licensed under the GNU Public License, version 3
#
# <desc> Find packages by binary or process pid </desc>
#
# <usage>
# source this file in bash, then run `findpkgs`
# </usage>


function _findpkgfor() {
	local file="$1";shift;

	if which paludis &>/dev/null; then
		paludis -o "$file" 2>/dev/null | grep '::installed' \
			| sed "s/::installed//g" | tr -d ' '
	elif which qfile &>/dev/null; then
		qfile "$file" 2>/dev/null | awk '{print $1}'
	elif which equery &>/dev/null; then
		equery belongs "$file" 2>/dev/null | awk '{print $1}'
	elif which dpkg &>/dev/null; then
		dpkg -S "$file" 2>/dev/null | awk '{print $1}' | tr -d ':'
	elif which rpm &>/dev/null; then
		rpm -qf "$file" 2>/dev/null | grep -v "not owned"
	else
		echo "No known package manager found"
	fi
}

function findpkgs() {
	local arg="$1";shift;

	if [ ! "$arg" ]; then
		echo "Usage:  findpkgs [ pid | /path/to/binary ]"
		return
	fi

	local pid=
	local arg_new=
	local bin=
	if echo "$arg" | grep "^[0-9]*$" &>/dev/null; then
		pid="$arg"
	else
		arg_new=$(which "$arg" 2>/dev/null)
		[ "$arg_new" ] && arg="$arg_new"
		if ! echo "$arg" | grep '^/' &>/dev/null; then
			echo "Can't find absolute path (or not a binary) for: $arg" >&2
			return
		fi
		arg=$(readlink -f "$arg")
		if ! file "$arg" | grep 'ELF' &>/dev/null; then
			echo "Not a binary: $arg" >&2
			return
		fi
		bin="$arg"
	fi


	local fst=
	local fst_new=
	local files=
	if [ "$pid" ]; then
		fst=$(ps aux \
					| sed "s/^[^ ]* *//g" \
					| grep "^$pid " \
					| awk '{print $10}' \
					| tr -d ':')
		fst_new=$(which "$fst" 2>/dev/null)
		[ "$fst_new" ] && fst="$fst_new"
		if ! echo "$fst" | grep '^/' &>/dev/null; then
			echo "Can't find absolute path for: $fst" >&2
			unset fst
		fi

		if which lsof &>/dev/null; then
			files=$(lsof \
						| sed "s/^[^ ]* *//g" \
						| grep "^$pid " \
						| awk '{print $8}' \
						| grep '^/' \
						| sort \
						| uniq)
		else
			files=$(cat "/proc/$pid/maps" \
						| awk '{print $6}' \
						| grep '^/' \
						| sort \
						| uniq)
		fi

		files="$fst $files"
		for file in `echo $files`; do
			_findpkgfor "$file"
		done | sort | uniq

	elif [ "$bin" ]; then
		files=$(ldd "$bin" \
					| awk '{print $3}' \
					| grep '^/' \
					| sort \
					| uniq)
		files="$bin $files"
		for file in `echo $files`; do
			_findpkgfor "$file"
		done | sort | uniq
	fi
}

networktest: improved network detection

June 13th, 2009

As a follow up to the network perimeter test I have expanded the code a bit. It now also shows the interface names to help explain what's what, and it tries to match the gateway to the ip addresses found. The strategy, however, has changed somewhat. At first the goal was to find all the networks and proceed from there. I decided this was not really the best approach, given that a misconfigured network connection could contain, say, a gateway not on any network. It therefore seems more sensible to display the information read from route and ifconfig as is than to try to infer too much from it.

Here for instance the loopback ip is on a network that is not known, but a working ip nonetheless.

The probing strategy now also includes nmap probes (if available) to have some redundancy in the process (say, for instance, outbound icmp is blocked by the firewall). And the code has been made more portable, so on platforms other than linux (where the linux networking tools aren't present) there is a graceful degradation of features.
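
The kind of probe meant here, as a sketch (assumes the gateway address is in $gw): a TCP ping can get an answer even when icmp echo is filtered.

# host discovery via TCP SYN probes to common ports, no icmp echo needed
nmap -sn -PS80,443 "$gw"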

[screenshot: havenet output]

Naturally, much networking happens over wireless these days, so there's also a wifi command that displays the status of all the wireless interfaces. This is again nothing more than what iwconfig reveals, but in a considerably more human-readable form, I would argue.

[screenshot: wifi output]

Then there is wifiscan, which not surprisingly scans for access points. The output is a considerably more space-efficient and usable counterpart to what iwlist prints.

[screenshot: wifiscan output]

One thing to keep in mind about these detection commands is that many of the system tools being used offer less information (or none) to unprivileged users, so running these commands as root may produce fuller output.

kill obsolete dotfiles

May 23rd, 2009

So my portable dotfiles are working out really well. There is only one fly in the ointment left. When a file has changed, it gets overwritten with the newer version; that's fine. But when a file has been renamed or removed, it will stick around in ~, creating the false impression that it's supposed to be there.

This is not a huge problem, but it does get to be tiresome when you're moving around directory hierarchies. I recently started using emacs more seriously and I expect a lot of stuff will eventually pile up in .emacs.d. Meanwhile, obsolete files and directories will clutter the hierarchy and possibly interfere with the system.

What can we do about it? The archive that holds the dotfiles is a sufficient record of what is actually in the dotfiles at any given moment. We can diff that with the files found on the system locally and pinpoint the ones that have been made obsolete.
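
At its core this is a set difference between two sorted lists. With the archive listing in $pulled and the local listing in $found, as in the function below, the obsolete files are exactly:

# lines only in the second file = present locally, absent from the archive
comm -13 "$pulled" "$found"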

For example, I initially had an ~/.emacs file, but then I moved it to ~/.emacs.d/init.el. So ~/.emacs is obsolete. But when I sync my dotfiles on a machine with an older version of my dotfiles, it will still have ~/.emacs around.

Not anymore. Now this happens:

[screenshot: dotfiles sync detecting obsolete files]

Files directly in ~ have to be listed explicitly, because we don't know anything about ~ as a whole. But files in known subdirs of ~, like ~/.emacs.d, are detected automatically.

killobsolete() {
	# files directly in ~ formerly used, now obsolete
	# subdirs of ~ checked automatically
	local suspected=".emacs"

	# init tempfiles we need to run diff on
	local found="/tmp/.myshell_found"
	local pulled="/tmp/.myshell_pulled"

	# detect files found locally
	for i in $(gunzip -c cfg.tar.gz | tar -tf -); do
		echo $(dirname "$i");
	done | grep -v '^.$' | sort | uniq | xargs find | sort | uniq | \
		grep -v ".myshell/.*_local*" > "$found"

	# detect suspected files
	for f in $(find $suspected 2>/dev/null); do
		echo "$f" >> "$found";
	done

	# sort found list
	cat "$found" | sort | uniq > "$found.x" ; mv "$found.x" "$found"

	# list files pulled from upstream
	for i in $(gunzip -c cfg.tar.gz | tar -tf -); do
		echo "$i";
		echo $(dirname "$i");
	done | grep -v '^.$' | sed "s|\/$||g" | sort | uniq > "$pulled"

	# list obsolete files
	local num=$(diff "$pulled" "$found" | grep '>' | cut -b3- | wc -l)
	if [ $num -gt 0 ]; then
		echo -e "${cyellow} ++ files found to be obsolete${creset}";
		diff "$pulled" "$found" | grep '>' | cut -b3-;

		# prompt for deletion
		echo -e "${cyellow} ++ kill obsolete? [yN]${creset}"
		read ans
		if [[ "$ans" == "y" ]]; then
			for i in $(diff "$pulled" "$found" | grep '>' | cut -b3-); do
				if [ -d "$i" ]; then
					rm -rf "$i"
				else
					rm -f "$i"
				fi
			done
		fi
	fi

	# dispose of tempfiles
	rm -f "$pulled" "$found"
}