Sunday, 23 May 2010

Visualize perforce change log with gource

Having seen amazing videos of gource I was keen to vizualize the changelog of the codebase I work on. The only trouble was that gource does not understand perforce directly. Fortunately, gource can read text files where each line is a change to one file in the format "timestamp|author|filename|action|colour".

Perforce logs produced by `p4 describe -s` command have the following format:
Change 106 by max@max_perforce on 2008/10/20 16:17:31
mt makefiles

Affected files ...

... //depot/infra/main/src/infra/Makefile#3 edit
... //depot/infra/main/src/infra/Makefile.mt#3 delete

Which should be converted for gource to:
1224515851|max|M|//depot/infra/main/src/infra/Makefile|
1224515851|max|D|//depot/infra/main/src/infra/Makefile.mt|
It is a two-stage process to convert perforce logs to gource logs. The first stage is to extract perforce logs into a file, because it takes a while and you may like to make several passes over it. It can be done with a couple of lines of bash:
[max@truth etc]$ p4 changes -m 1 | awk '{ print $2 }'
17146
[max@truth etc]$ for ((i=1; i<17146; ++i)); do p4 describe -s $i >> p4.log; done
The second stage is to filter and convert perforce changelog into gsource log. Here is how to convert changes only in //depot/infra/main with the python script I wrote:
[max@truth etc]$ ./p4-gource.py -p //depot/infra/main -o main.gource p4.log
Now gource can be used to vizualize the changelog:
[max@truth etc]$ gource --highlight-all-users main.gource
Enjoy!

Download p4-gource.py.

Tuesday, 23 March 2010

0 vs NULL in C++

Just thought I'd drop a line.

Consider this snippet:
1 #include <stddef.h> // NULL macro
2 void foo(int);
3 void foo(void*);
4 int main()
5 {
6     foo(0);
7     foo(NULL);
8 }
Compiling it gives an error on line 7:
[max@truth test]$ g++ -Wall -o test test.cc
test.cc: In function ‘int main()’:
test.cc:7: error: call of overloaded ‘foo(NULL)’ is ambiguous
test.cc:2: note: candidates are: void foo(int)
test.cc:3: note:                 void foo(void*)
Why is the error? Is NULL different from 0 in C++? Let's check it out:
[max@truth test]$ g++ -E -dD -Wall test.cc | grep NULL
#undef NULL
#define NULL __null
#undef __need_NULL
So, NULL is defined as a gcc keyword __null. The difference between __null and integer constant 0 is that 0 converts to int more easily than to a pointer, hence foo(0) calls foo(int). Whereas __null is not an integer constant and converts to int and a pointer equally easy, so that the compiler can not figure the best overload of foo() from a set of viable functions foo(int) and void(void*).

Using NULL can be helpful in C++ with function overloading, when 0 can silently convert to int where conversion to a pointer is expected.

Thursday, 3 December 2009

Profiling function calls with Systemtap on Fedora 12

Linux now has Systemtap, a system-wide tool to observe and investigate the inner workings of software stacks. Being similar to DTrace, Systemtap can dynamically instrument the Linux kernel and applications.

Having read an excellent introduction Linux introspection and SystemTap, I tried using it to dynamically instrument an application to profile function call times on Fedora 12. Dynamic instrumentation requires debuginfo to be available for the application being instrumented.

The application that was dynamically instrumented is /bin/ls. It was invoked as `ls -Rl ~/src > /dev/null`.

Here is the output of the profile script:

[max@truth ~]$ uname -a
Linux truth 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

[max@truth ~]$ stap -V
SystemTap translator/driver (version 1.0/0.143 non-git sources)
Copyright (C) 2005-2009 Red Hat, Inc. and others
This is free software; see the source for copying conditions.

[max@truth ~]$ sudo stap --vp 01 ~/src/systemtap-scripts/profile.stp /bin/ls
Pass 2: analyzed script: 421 probe(s), 4 function(s), 3 embed(s), 3 global(s) in 20usr/0sys/19real ms.
Collecting data for /bin/ls
process 10505 started
process 10505 terminated
978009 calls 11411514046nsec gnu_mbswidth
978009 calls 5095937872nsec mbsnwidth
553455 calls 6516854084nsec xstrcoll_name
553455 calls 2973420055nsec xstrcoll
432234 calls 2443338583nsec human_readable
279432 calls 5578451810nsec format_user_or_group_width
279431 calls 1454525884nsec getuser
279431 calls 1476571224nsec umaxtostr
279431 calls 1462740182nsec getgroup
279430 calls 5592908987nsec format_user_or_group
205189 calls 1096534601nsec xmalloc
165892 calls 2214172043nsec xmemdup
165890 calls 3276158518nsec xstrdup
152831 calls 3470895161nsec quote_name
152831 calls 2164873563nsec quotearg_buffer
152831 calls 857089762nsec quotearg_buffer_restyled
139744 calls 4347714683nsec print_name_with_quoting
139716 calls 25976606795nsec gobble_file
139716 calls 4942869834nsec format_user_width
139716 calls 2251024688nsec file_has_acl
139715 calls 727046826nsec strmode
139715 calls 3132965701nsec align_nstrftime
139715 calls 27707055961nsec print_long_format
139715 calls 1972675668nsec nstrftime
139715 calls 4871097460nsec format_group
139715 calls 4933490513nsec format_user
139715 calls 1627706138nsec filemodestring
139715 calls 818129922nsec strftime_case_
131267 calls 10444585148nsec mpsort_with_tmp
26174 calls 144849501nsec free_pending_ent
26174 calls 500319229nsec hash_find_entry
26174 calls 939925939nsec queue_directory
26174 calls 139966373nsec dev_ino_hash
26172 calls 140919473nsec last_component
16519 calls 87194194nsec dev_ino_compare
13088 calls 9420945286nsec sort_files
13088 calls 1642026249nsec extract_dirs_from_files
13088 calls 9303928062nsec mpsort
13088 calls 82016194nsec clear_files
13087 calls 67338468933nsec print_dir
13087 calls 411821913nsec hash_delete
13087 calls 308501198nsec hash_insert
13087 calls 68924698nsec dev_ino_free
13086 calls 287384316nsec mfile_name_concat
13086 calls 68382189nsec base_len
13086 calls 396034744nsec file_name_concat
13056 calls 28224393588nsec print_current_files
29 calls 323169nsec areadlink_with_size
4 calls 156509nsec xrealloc
2 calls 27356nsec close_stream
2 calls 50274nsec clone_quoting_options
1 calls 41537nsec close_stdout
1 calls 12722nsec xstrtoul
1 calls 6598nsec set_char_quoting
1 calls 12513nsec set_program_name
1 calls 7107nsec check_tuning
1 calls 7215nsec hard_locale
1 calls 7616nsec hash_free
1 calls 9458nsec hash_get_n_entries
1 calls 32489nsec hash_initialize
1 calls 6820nsec get_quoting_style
1 calls 7110nsec gettime
1 calls 9946nsec argmatch
1 calls 6972nsec compute_bucket_size
1 calls 10856nsec human_options
1 calls 20279nsec __xargmatch_internal
It shows the number of calls and the total time spent in a function. The total time function is inclusive, that is, it includes the time a function spent calling other functions.

Here is the script:

[max@truth ~]$ cat ~/src/systemtap-scripts/profile.stp
global gpid = -1
global times, entry

probe begin {
printf("Collecting data for %s\n", @1)
}

probe process(@1).begin {
gpid = pid()
printf("process %u started\n", gpid)
}

probe process(@1).end {
if(gpid != pid())
next

printf("process %u terminated\n", gpid)

foreach(fn in times-)
printf("%6u calls %12unsec %s\n", @count(times[fn]), @sum(times[fn]), fn)
exit()
}

probe process(@1).function("*").call {
now = gettimeofday_ns()
entry[probefunc()] = now
}

probe process(@1).function("*").return {
now = gettimeofday_ns()
time = now - entry[probefunc()]
times[probefunc()] <<< time
}

It looks quite simple.

Systemtap appears to be quite a useful tool, will try to use it more.

Friday, 20 November 2009

Android and iPhone ping times

Did a quick experiment just for fun.

Compared ping times in a local 802.11g wireless network from a wired desktop (running Fedora 12 (Linux kerner 2.6.31) with Intel E8400 and Marvell 88E8056 Gigabit Ethernet adapter) connected through 100Mbit/s ethernet to a router to the following wireless devices:

  • iPhone 1st Gen, version 2.2 (5G77) (a freedom-friendly build).

  • Android Dep Phone 1 (HTC Dream), firmware 1.6 (Linux kernel 2.6.29).

  • An Intel Centrino notebook running Fedora 11 (Linux kernel 2.6.27) with Pentuim M 1.86GHz running at 798MHz and Intel PRO/Wireless 2200BG Network adapter wireless network adapter.

  • THOMSON ST585 Speedtouch wireless router and ADSL modem, software release 6.2.29.2.

The command I ran was `sudo ping -f -c 1000 <host>`. This command sends 1000 ICMP echo requests to the host with no delay beetween requests (-f stands for flood). ICMP requests are normally handled by the kernel without a roundtrip to a user-land process (might not be true for Symbian OS, which is a joke anyway):

The results follow sorted by the total ping time in the worst-to-best order:

--- iphone ping statistics ---
1000 packets transmitted, 948 received, 5% packet loss, time 3453ms
rtt min/avg/max/mdev = 2.106/2.813/81.745/3.936 ms, pipe 6, ipg/ewma 3.457/2.497 ms

--- android ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 2480ms
rtt min/avg/max/mdev = 2.117/2.433/16.472/0.779 ms, ipg/ewma 2.483/2.395 ms

--- centrino-notebook ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 1351ms
rtt min/avg/max/mdev = 1.138/1.335/9.912/0.581 ms, ipg/ewma 1.352/1.208 ms

--- wireless-router ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 664ms
rtt min/avg/max/mdev = 0.572/0.633/4.329/0.128 ms, ipg/ewma 0.664/0.640 ms

What is interesting about iPhone is that its ping packet loss was normally around ~11% in this test, the 5% above was the least packet loss rate I observed. Another interesting point is that when iPhone screen is off it does not respond to ping at all. It does respond when the screen is on but still locked.

The results are probably of little use, just killed some time for fun formatting this blog in emacs html mode.

Saturday, 14 November 2009

Using Linux ps utility to display the command line of a process

Sometimes it is useful to see the full command line of a process when using Linux ps utility. Linux ps utility obeys COLUMNS environment variable or --columns command line option to limit the line size of the output. The default values are normally too small and cause ps to truncate the command line:
[max@truth ~]$ ps -u $USER -fH
UID PID PPID C STIME TTY TIME CMD
max 7421 7385 0 17:04 ? 00:00:00 gnome-session
max 7527 7421 0 17:04 ? 00:00:02 metacity
max 7539 7421 0 17:04 ? 00:00:00 gnome-panel
max 7540 7421 0 17:04 ? 00:00:00 nautilus
max 7541 7421 0 17:04 ? 00:00:00 /usr/libexec/gdu-notification-
max 7543 7421 0 17:04 ? 00:00:00 /usr/bin/seapplet
max 7544 7421 0 17:04 ? 00:00:00 gnome-power-manager
max 7545 7421 0 17:04 ? 00:00:00 gpk-update-icon
max 7550 7421 0 17:04 ? 00:00:00 bluetooth-applet
max 7551 7421 0 17:04 ? 00:00:00 gnome-volume-control-applet
max 7552 7421 0 17:04 ? 00:00:00 nm-applet --sm-disable
max 7559 7421 0 17:04 ? 00:00:00 kerneloops-applet
max 7561 7421 0 17:04 ? 00:00:00 python /usr/share/system-confi
max 8127 1 0 17:25 ? 00:00:01 gcalctool
max 7942 1 0 17:12 ? 00:00:00 /bin/sh /usr/lib64/firefox-3.5.4
max 7954 7942 2 17:12 ? 00:01:34 /usr/lib64/firefox-3.5.4/firef
max 7774 1 0 17:06 ? 00:00:09 emacs
max 7821 7774 0 17:06 pts/0 00:00:00 /bin/bash --noediting -i
max 8600 7821 0 18:15 pts/0 00:00:00 ps -u max -fH
max 7740 1 0 17:05 ? 00:00:00 gnome-screensaver
max 7730 1 2 17:05 ? 00:01:30 rhythmbox
max 7728 1 0 17:05 ? 00:00:00 /usr/libexec/gvfsd-burn --spawne
max 7719 1 0 17:05 ? 00:00:00 /usr/libexec/notification-area-a
max 7717 1 0 17:05 ? 00:00:02 /usr/libexec/clock-applet --oaf-
max 7715 1 0 17:05 ? 00:00:08 /usr/libexec/multiload-applet-2
max 7711 1 0 17:05 ? 00:00:00 /usr/libexec/cpufreq-applet --oa
max 7708 1 0 17:05 ? 00:00:00 /usr/libexec/wnck-applet --oaf-a
max 7698 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gphoto2-volume
max 7693 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gdu-volume-mon
max 7691 1 0 17:04 ? 00:00:00 /usr/libexec/bonobo-activation-s
max 7687 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd-trash --spawn
max 7685 1 0 17:04 ? 00:00:00 /usr/libexec/gconf-im-settings-d
max 7572 1 0 17:04 ? 00:00:00 /usr/libexec/im-settings-daemon
max 7569 1 0 17:04 ? 00:00:00 /usr/libexec/notification-daemon
max 7535 1 0 17:04 ? 00:00:00 /usr/libexec//gvfs-fuse-daemon /
max 7529 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd
max 7519 1 0 17:04 ? 00:00:00 /usr/libexec/gnome-settings-daem
max 7516 1 0 17:04 ? 00:00:00 /usr/libexec/gconfd-2
max 7433 1 0 17:04 ? 00:00:00 dbus-launch --sh-syntax --exit-w
max 7432 1 0 17:04 ? 00:00:00 /bin/dbus-daemon --fork --print-
max 7405 1 0 17:04 ? 00:00:00 /usr/bin/gnome-keyring-daemon --
max 7320 1 1 17:04 ? 00:01:07 /usr/bin/pulseaudio --start --lo
max 7323 7320 0 17:04 ? 00:00:00 /usr/libexec/pulse/gconf-helpe
The maximum command line length of a process can be read from ARG_MAX sysconf variable
[max@truth ~]$ getconf ARG_MAX
2621440
On my system it is 2.5Mb. The first idea that comes to mind to make ps show the full command line is to pass ARG_MAX (+ some constant for other ps output fields) as the line limit to ps.
[max@truth ~]$ ps -u $USER -fH --columns=`getconf ARG_MAX`
Fix bigness error.
UID PID PPID C STIME TTY TIME CMD
Fix bigness error.
max 7421 7385 0 17:04 ? 00:00:00 gnome-session
Fix bigness error.
max 7527 7421 0 17:04 ? 00:00:02 metacity
Fix bigness error.
...
However, ps does not seem to like the columns value. A quick binary search for the maximum of --columns value reveals:
[max@truth ~]$ ps -u $USER -fH --columns=131073
Fix bigness error.
UID PID PPID C STIME TTY TIME CMD
Fix bigness error.
max 7421 7385 0 17:04 ? 00:00:00 gnome-session
Fix bigness error.
max 7527 7421 0 17:04 ? 00:00:02 metacity
Fix bigness error.
...

[max@truth ~]$ ps -u $USER -fH --columns=131072
UID PID PPID C STIME TTY TIME CMD
max 7421 7385 0 17:04 ? 00:00:00 gnome-session
max 7527 7421 0 17:04 ? 00:00:02 metacity
max 7539 7421 0 17:04 ? 00:00:00 gnome-panel
max 7540 7421 0 17:04 ? 00:00:00 nautilus
max 7541 7421 0 17:04 ? 00:00:00 /usr/libexec/gdu-notification-daemon
max 7543 7421 0 17:04 ? 00:00:00 /usr/bin/seapplet
max 7544 7421 0 17:04 ? 00:00:00 gnome-power-manager
max 7545 7421 0 17:04 ? 00:00:00 gpk-update-icon
max 7550 7421 0 17:04 ? 00:00:00 bluetooth-applet
max 7551 7421 0 17:04 ? 00:00:00 gnome-volume-control-applet
max 7552 7421 0 17:04 ? 00:00:00 nm-applet --sm-disable
max 7559 7421 0 17:04 ? 00:00:00 kerneloops-applet
max 7561 7421 0 17:04 ? 00:00:00 python /usr/share/system-config-printer/applet.py
max 8127 1 0 17:25 ? 00:00:01 gcalctool
max 7942 1 0 17:12 ? 00:00:00 /bin/sh /usr/lib64/firefox-3.5.4/run-mozilla.sh /usr/lib64/firefox-3.5.4/firefox
max 7954 7942 2 17:12 ? 00:01:36 /usr/lib64/firefox-3.5.4/firefox
max 7774 1 0 17:06 ? 00:00:10 emacs
max 7821 7774 0 17:06 pts/0 00:00:00 /bin/bash --noediting -i
max 8629 7821 0 18:21 pts/0 00:00:00 ps -u max -fH --columns=131072
max 7740 1 0 17:05 ? 00:00:00 gnome-screensaver
max 7730 1 2 17:05 ? 00:01:37 rhythmbox
max 7728 1 0 17:05 ? 00:00:00 /usr/libexec/gvfsd-burn --spawner :1.7 /org/gtk/gvfs/exec_spaw/1
max 7719 1 0 17:05 ? 00:00:00 /usr/libexec/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationAreaApplet_Factory --oaf-ior-fd=27
max 7717 1 0 17:05 ? 00:00:02 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=30
max 7715 1 0 17:05 ? 00:00:09 /usr/libexec/multiload-applet-2 --oaf-activate-iid=OAFIID:GNOME_MultiLoadApplet_Factory --oaf-ior-fd=21
max 7711 1 0 17:05 ? 00:00:00 /usr/libexec/cpufreq-applet --oaf-activate-iid=OAFIID:GNOME_CPUFreqApplet_Factory --oaf-ior-fd=24
max 7708 1 0 17:05 ? 00:00:00 /usr/libexec/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior-fd=18
max 7698 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gphoto2-volume-monitor
max 7693 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gdu-volume-monitor
max 7691 1 0 17:04 ? 00:00:00 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=19
max 7687 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd-trash --spawner :1.7 /org/gtk/gvfs/exec_spaw/0
max 7685 1 0 17:04 ? 00:00:00 /usr/libexec/gconf-im-settings-daemon
max 7572 1 0 17:04 ? 00:00:00 /usr/libexec/im-settings-daemon
max 7569 1 0 17:04 ? 00:00:00 /usr/libexec/notification-daemon
max 7535 1 0 17:04 ? 00:00:00 /usr/libexec//gvfs-fuse-daemon /home/max/.gvfs
max 7529 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd
max 7519 1 0 17:04 ? 00:00:00 /usr/libexec/gnome-settings-daemon
max 7516 1 0 17:04 ? 00:00:00 /usr/libexec/gconfd-2
max 7433 1 0 17:04 ? 00:00:00 dbus-launch --sh-syntax --exit-with-session
max 7432 1 0 17:04 ? 00:00:00 /bin/dbus-daemon --fork --print-pid 9 --print-address 11 --session
max 7405 1 0 17:04 ? 00:00:00 /usr/bin/gnome-keyring-daemon --daemonize --login
max 7320 1 1 17:04 ? 00:01:13 /usr/bin/pulseaudio --start --log-target=syslog
max 7323 7320 0 17:04 ? 00:00:00 /usr/libexec/pulse/gconf-helper
It looks like there is a hardcoded maximum of the line length in Linux Fedora 11 ps utility equal to 128Kb (0x20000 in hex). It would be nice to remove that limit.

p.s.: The lines of this post get truncated when viewed as a web-page using Firefox or Chromium. To view untruncated use an RSS reader, such as Google Reader.

p.p.s.: And yes, I should have updated my Firefox to 3.5.5 which is available on Fedora 11 updates repository now.

Thursday, 12 November 2009

Benchmarking function call overhead in C++


I've caught some cold and been staying at home. Just for fun, decided to compare the price of calls to:
  • a regular function from the same non-position-independent-code (non-PIC) executable object
  • a regular function from a PIC shared library object
  • a virtual function from the same non-PIC executable object
  • a virtual function from a PIC shared library object
Here is the source code of the benchmark:
[max@truth test]$ cat a.h
#pragma once

#include <memory>

namespace max {

void funA(long*); // same non-pic executable, different .o
void funB(long*); // another pic shared library

struct X
{
virtual ~X() {}
virtual void fun(long*) = 0;
};

std::auto_ptr<X> createA(); // same non-pic executable, different .o
std::auto_ptr<X> createB(); // another pic shared library

}

[max@truth test]$ cat a.cc
#include "a.h"

void max::funA(long* x) { ++*x; }

namespace {

struct ImpOfX : max::X
{
void fun(long* x) { ++*x; }
};

}

std::auto_ptr<max::X> max::createA()
{
return std::auto_ptr<max::X>(new ImpOfX);
}

[max@truth test]$ cat b.cc
#include "a.h"

void max::funB(long* x) { ++*x; }

namespace {

struct ImpOfX2 : max::X
{
void fun(long* x) { ++*x; }
};

}

std::auto_ptr<max::X> max::createB()
{
return std::auto_ptr<max::X>(new ImpOfX2);
}

[max@truth test]$ cat main.cc
#include "a.h"
#include <stdio.h>
#include <time.h>

using namespace max;

namespace {

typedef unsigned long long nsec_t;

nsec_t now()
{
timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec * nsec_t(1000000000) + ts.tv_nsec;
}

long benchmark(long n, X& a, X& b, double times[4])
{
long sum[4] = {};

nsec_t start, stop;
start = now();
for(int i = n; i--;) {
funA(sum + 0);
}
stop = now();
times[0] = stop - start;
start = stop;

for(int i = n; i--;) {
funB(sum + 1);
}
stop = now();
times[1] = stop - start;
start = stop;

for(int i = n; i--;) {
a.fun(sum + 2);
}
stop = now();
times[2] = stop - start;
start = stop;

for(int i = n; i--;) {
b.fun(sum + 3);
}
stop = now();
times[3] = stop - start;
start = stop;

return sum[0] + sum[1] + sum[2] + sum[3];
}

}

int main()
{
std::auto_ptr<X> a(createA());
std::auto_ptr<X> b(createB());

double times[4];
// warm-up CPU caches and branch predictors
long sum = benchmark(100, *a, *b, times);
// benchmark
long const N = 1e9;
sum += benchmark(N, *a, *b, times);

printf(
"%lu calls to: \n"
" a regular function from the same executable: %.3lfnsec/call\n"
" a regular function from a shared library: %.3lfnsec/call\n"
" a virtual function from the same executable: %.3lfnsec/call\n"
" a virtual function from a shared library: %.3lfnsec/call\n"
, N
, times[0] / N
, times[1] / N
, times[2] / N
, times[3] / N
);
}
To get nanosecond-resolution times clock_gettime() function is used.

This is how it is compiled:
[max@truth test]$ g++ -m64 -Wall -Wextra -march=native -O3 -c main.cc
[max@truth test]$ g++ -m64 -Wall -Wextra -march=native -O3 -c a.cc
[max@truth test]$ g++ -m64 -Wall -Wextra -march=native -O3 -fpic -c b.cc
[max@truth test]$ g++ -m64 -Wl,-rpath,'$ORIGIN' -Wl,-z,now -shared -o libb.so b.o
[max@truth test]$ g++ -m64 -Wl,-rpath,'$ORIGIN' -Wl,-z,now -o test main.o a.o libb.so -lrt
When linking -Wl,-z,now linker option disables lazy symbol binding. Lazy symbol binding resolves calls to functions from PIC objects on the first call. Disabling lazy binding makes the dynamic linker resolve all symbols prior to executing any application pre, rather than on the first access.

The other linker option -Wl,-rpath,'$ORIGIN' tells the dynamic linker to search for the shared libraries first in the same folder where the executable is, so that there is no need to fiddle with LD_LIBRARY_PATH variable or /etc/ld.so.conf.

And here are the results of the benchmark:
[max@truth test]$ sudo chrt -f 99 /usr/bin/time -f "%E elapsed, %c context switches" ./test
1000000000 calls to:
a regular function from the same executable: 1.999nsec/call
a regular function from a shared library: 2.665nsec/call
a virtual function from the same executable: 2.332nsec/call
a virtual function from a shared library: 2.332nsec/call
0:09.32 elapsed, 0 context switches
The test is invoked as a FIFO realtime class process with priority 99 (highest on my system). FIFO realtime class processes run until they voluntarily block themselves, time quantums don't apply.

In the above invocation GNU /usr/bin/time utility is used to report the number of context switches during the run to make sure that the process has executed uninterrupted. This is to eliminate scheduling noise from the timings completely.

The test was invoked on Intel E8400 CPU running at full 3GHz speed.

Interpretation of the results:
  • Calls to regular functions in non-PIC object files are the cheapest.
  • Calls to virtual member functions are more expensive. An interesting fact here is that calls to virtual functions always get dispatched through the virtual table. This means that the procedure linking tables (PLT) used for regular functions in PIC object files is bypassed entirely. There is no difference whether a virtual function resides in PIC or non-PIC object, they get called the same way through the virtual table.
  • Calls to regular functions in PIC objects are most expensive. This is because such calls in fact call code in the PLT which (when the function has been resolved) jumps to the actual function.
Hm..., interesting!

Sunday, 26 July 2009

How to build an anonymous peer-to-peer network

In BitTorrent the ip address of the host sharing data with you is available through the TCP protocol which is used for the data transfer. This may reveal the identity of the owner of the host. If privacy is desired there must be a way to make the source ip address unavailable.

Here is a design sketch for a data transfer protocol where the ip address of the data sender is not revealed.

The the frames of the TCP protocol, which is used for transferring data in BitTorrent, carry the ip addresses of the source and the destination of the data. The source ip address is needed for the control part of the protocol to send back acknowledgements. If this back path is eliminated it should be possible to not have to provide the ip address of the source.

UDP protocol, on the other hand, allows to send datagrams with an arbitrary source ip address. Thus, the sender of the data can hide its true ip address by altering the source ip address of the datagram.

The control part of such data transfer protocol should go through Tor network. Tor network provides anonymity but has very limited bandwidth and thus can not be used for the actual data transfer. This is where UDP can be used.

First, the host joins a tracker through Tor network.

The host then finds seeders through the tracker. Using Tor, the host requests seeders to send data to its real ip address. The seeders receive the request and send data using UDP. Which allows to hide the true ip address of the source.

This may work quite well I think. Comments are more then welcome.