tag:blogger.com,1999:blog-80848200234438938022024-02-08T10:29:10.004+00:00random thoughtsMaxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.comBlogger7125tag:blogger.com,1999:blog-8084820023443893802.post-47939757017620855882010-05-23T21:38:00.008+01:002022-11-27T05:35:50.357+00:00Visualize perforce change log with gource<div>Having seen amazing videos of <a href="https://gource.io/">gource</a> I wanted to vizualize the changelog of the codebase I work on. The only trouble was that gource does not understand perforce change log format directly. Fortunately, gource can read text files where each line is a change to one file in the format "timestamp|author|filename|action|colour".<br />
<br />
Perforce logs produced by `p4 describe -s` command have the following format:<br />
</div><div><pre>Change 106 by max@max_perforce on 2008/10/20 16:17:31
mt makefiles
Affected files ...
... //depot/infra/main/src/infra/Makefile#3 edit
... //depot/infra/main/src/infra/Makefile.mt#3 delete
</pre></div><div>Which should be converted for gource to:<br />
</div><div><pre>1224515851|max|M|//depot/infra/main/src/infra/Makefile|
1224515851|max|D|//depot/infra/main/src/infra/Makefile.mt|
</pre></div><div>It is a two-stage process to convert perforce logs to gource logs. The first stage is to extract perforce logs into a file, because it takes a while and you may like to make several passes over it. It can be done with a couple of lines of bash:<br />
</div><div><pre>$ p4_head=$(p4 changes -m 1 | { read -a words && echo ${words[1]}; })
$ { for((i = 0; ++i <= p4_head;)); do p4 describe -s $i; done } > p4.log
</pre></div><div>The second stage is to filter and convert perforce changelog into gsource log. Here is how to convert changes only in //depot/infra/main with the python script I wrote:<br />
</div><div><pre>$ ./p4-gource.py -p //depot/infra/main -o main.gource p4.log
</pre></div><div>Now gource can be used to vizualize the changelog:<br />
</div><div><pre>$ gource --highlight-all-users main.gource
</pre></div><div>Enjoy!<br />
<br /><a href="https://github.com/max0x7ba/p4-gource" target="_blank">Source code</a></div>Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com14tag:blogger.com,1999:blog-8084820023443893802.post-32709096053023977222010-03-23T21:36:00.000+00:002010-03-23T21:36:16.669+00:000 vs NULL in C++<div>Just thought I'd drop a line.<br />
<br />
Consider this snippet:<br />
</div><div><pre>1 #include <stddef.h> // NULL macro
2 void foo(int);
3 void foo(void*);
4 int main()
5 {
6 foo(0);
7 foo(NULL);
8 }
</pre></div><div>Compiling it gives an error on line 7:<br />
</div><div><pre>[max@truth test]$ g++ -Wall -o test test.cc
test.cc: In function ‘int main()’:
test.cc:7: error: call of overloaded ‘foo(NULL)’ is ambiguous
test.cc:2: note: candidates are: void foo(int)
test.cc:3: note: void foo(void*)
</pre></div><div>Why is the error? Is NULL different from 0 in C++? Let's check it out:<br />
</div><div><pre>[max@truth test]$ g++ -E -dD -Wall test.cc | grep NULL
#undef NULL
#define NULL __null
#undef __need_NULL
</pre></div><div>So, NULL is defined as a gcc keyword __null. The difference between __null and integer constant 0 is that 0 converts to int more easily than to a pointer, hence foo(0) calls foo(int). Whereas __null is not an integer constant and converts to int and a pointer equally easy, so that the compiler can not figure the best overload of foo() from a set of viable functions foo(int) and void(void*).<br />
<br />
Using NULL can be helpful in C++ with function overloading, when 0 can silently convert to int where conversion to a pointer is expected.<br />
</div>Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com0tag:blogger.com,1999:blog-8084820023443893802.post-15113936290281327192009-12-03T21:52:00.000+00:002009-12-03T21:54:50.098+00:00Profiling function calls with Systemtap on Fedora 12<div>Linux now has Systemtap, a system-wide tool to observe and investigate the inner workings of software stacks. Being similar to DTrace, Systemtap can dynamically instrument the Linux kernel and applications.<br /><br />Having read an excellent introduction <a href="http://www.ibm.com/developerworks/linux/library/l-systemtap/index.html">Linux introspection and SystemTap</a>, I tried using it to dynamically instrument an application to profile function call times on Fedora 12. Dynamic instrumentation requires debuginfo to be available for the application being instrumented.<br /><br />The application that was dynamically instrumented is /bin/ls. It was invoked as `ls -Rl ~/src > /dev/null`.<br /><br />Here is the output of the profile script:</div><div><pre><br />[max@truth ~]$ uname -a<br />Linux truth 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux<br /><br />[max@truth ~]$ stap -V<br />SystemTap translator/driver (version 1.0/0.143 non-git sources)<br />Copyright (C) 2005-2009 Red Hat, Inc. and others<br />This is free software; see the source for copying conditions.<br /><br />[max@truth ~]$ sudo stap --vp 01 ~/src/systemtap-scripts/profile.stp /bin/ls<br />Pass 2: analyzed script: 421 probe(s), 4 function(s), 3 embed(s), 3 global(s) in 20usr/0sys/19real ms.<br />Collecting data for /bin/ls<br />process 10505 started<br />process 10505 terminated<br />978009 calls 11411514046nsec gnu_mbswidth<br />978009 calls 5095937872nsec mbsnwidth<br />553455 calls 6516854084nsec xstrcoll_name<br />553455 calls 2973420055nsec xstrcoll<br />432234 calls 2443338583nsec human_readable<br />279432 calls 5578451810nsec format_user_or_group_width<br />279431 calls 1454525884nsec getuser<br />279431 calls 1476571224nsec umaxtostr<br />279431 calls 1462740182nsec getgroup<br />279430 calls 5592908987nsec format_user_or_group<br />205189 calls 1096534601nsec xmalloc<br />165892 calls 2214172043nsec xmemdup<br />165890 calls 3276158518nsec xstrdup<br />152831 calls 3470895161nsec quote_name<br />152831 calls 2164873563nsec quotearg_buffer<br />152831 calls 857089762nsec quotearg_buffer_restyled<br />139744 calls 4347714683nsec print_name_with_quoting<br />139716 calls 25976606795nsec gobble_file<br />139716 calls 4942869834nsec format_user_width<br />139716 calls 2251024688nsec file_has_acl<br />139715 calls 727046826nsec strmode<br />139715 calls 3132965701nsec align_nstrftime<br />139715 calls 27707055961nsec print_long_format<br />139715 calls 1972675668nsec nstrftime<br />139715 calls 4871097460nsec format_group<br />139715 calls 4933490513nsec format_user<br />139715 calls 1627706138nsec filemodestring<br />139715 calls 818129922nsec strftime_case_<br />131267 calls 10444585148nsec mpsort_with_tmp<br /> 26174 calls 144849501nsec free_pending_ent<br /> 26174 calls 500319229nsec hash_find_entry<br /> 26174 calls 939925939nsec queue_directory<br /> 26174 calls 139966373nsec dev_ino_hash<br /> 26172 calls 140919473nsec last_component<br /> 16519 calls 87194194nsec dev_ino_compare<br /> 13088 calls 9420945286nsec sort_files<br /> 13088 calls 1642026249nsec extract_dirs_from_files<br /> 13088 calls 9303928062nsec mpsort<br /> 13088 calls 82016194nsec clear_files<br /> 13087 calls 67338468933nsec print_dir<br /> 13087 calls 411821913nsec hash_delete<br /> 13087 calls 308501198nsec hash_insert<br /> 13087 calls 68924698nsec dev_ino_free<br /> 13086 calls 287384316nsec mfile_name_concat<br /> 13086 calls 68382189nsec base_len<br /> 13086 calls 396034744nsec file_name_concat<br /> 13056 calls 28224393588nsec print_current_files<br /> 29 calls 323169nsec areadlink_with_size<br /> 4 calls 156509nsec xrealloc<br /> 2 calls 27356nsec close_stream<br /> 2 calls 50274nsec clone_quoting_options<br /> 1 calls 41537nsec close_stdout<br /> 1 calls 12722nsec xstrtoul<br /> 1 calls 6598nsec set_char_quoting<br /> 1 calls 12513nsec set_program_name<br /> 1 calls 7107nsec check_tuning<br /> 1 calls 7215nsec hard_locale<br /> 1 calls 7616nsec hash_free<br /> 1 calls 9458nsec hash_get_n_entries<br /> 1 calls 32489nsec hash_initialize<br /> 1 calls 6820nsec get_quoting_style<br /> 1 calls 7110nsec gettime<br /> 1 calls 9946nsec argmatch<br /> 1 calls 6972nsec compute_bucket_size<br /> 1 calls 10856nsec human_options<br /> 1 calls 20279nsec __xargmatch_internal<br /></pre></div><div>It shows the number of calls and the total time spent in a function. The total time function is inclusive, that is, it includes the time a function spent calling other functions.<br /><br />Here is the script:</div><div><pre><br />[max@truth ~]$ cat ~/src/systemtap-scripts/profile.stp<br />global gpid = -1<br />global times, entry<br /><br />probe begin {<br /> printf("Collecting data for %s\n", @1)<br />}<br /><br />probe process(@1).begin {<br /> gpid = pid()<br /> printf("process %u started\n", gpid)<br />}<br /><br />probe process(@1).end {<br /> if(gpid != pid())<br /> next<br /><br /> printf("process %u terminated\n", gpid)<br /><br /> foreach(fn in times-)<br /> printf("%6u calls %12unsec %s\n", @count(times[fn]), @sum(times[fn]), fn)<br /> exit()<br />}<br /><br />probe process(@1).function("*").call {<br /> now = gettimeofday_ns()<br /> entry[probefunc()] = now<br />}<br /><br />probe process(@1).function("*").return {<br /> now = gettimeofday_ns()<br /> time = now - entry[probefunc()]<br /> times[probefunc()] <<< time<br />}<br /></pre></div><div><br />It looks quite simple.<br /><br />Systemtap appears to be quite a useful tool, will try to use it more.<br /></div>Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com4tag:blogger.com,1999:blog-8084820023443893802.post-78522308265161221792009-11-20T21:35:00.000+00:002009-11-20T21:47:02.027+00:00Android and iPhone ping times<div>Did a quick experiment just for fun.<br /><br />Compared ping times in a local 802.11g wireless network from a wired desktop (running Fedora 12 (Linux kerner 2.6.31) with Intel E8400 and Marvell 88E8056 Gigabit Ethernet adapter) connected through 100Mbit/s ethernet to a router to the following wireless devices:<br /><ul><br /><li>iPhone 1st Gen, version 2.2 (5G77) (a freedom-friendly build).</li><br /><li>Android Dep Phone 1 (HTC Dream), firmware 1.6 (Linux kernel 2.6.29).</li><br /><li>An Intel Centrino notebook running Fedora 11 (Linux kernel 2.6.27) with Pentuim M 1.86GHz running at 798MHz and Intel PRO/Wireless 2200BG Network adapter wireless network adapter.</li><br /><li>THOMSON ST585 Speedtouch wireless router and ADSL modem, software release 6.2.29.2.</li></ul><br />The command I ran was `sudo ping -f -c 1000 <host>`. This command sends 1000 ICMP echo requests to the host with no delay beetween requests (-f stands for flood). ICMP requests are normally handled by the kernel without a roundtrip to a user-land process (might not be true for Symbian OS, which is a joke anyway):<br /><br />The results follow sorted by the total ping time in the worst-to-best order:</div><div><pre><br />--- iphone ping statistics ---<br />1000 packets transmitted, 948 received, 5% packet loss, time 3453ms<br />rtt min/avg/max/mdev = 2.106/2.813/81.745/3.936 ms, pipe 6, ipg/ewma 3.457/2.497 ms<br /><br />--- android ping statistics ---<br />1000 packets transmitted, 1000 received, 0% packet loss, time 2480ms<br />rtt min/avg/max/mdev = 2.117/2.433/16.472/0.779 ms, ipg/ewma 2.483/2.395 ms<br /><br />--- centrino-notebook ping statistics ---<br />1000 packets transmitted, 1000 received, 0% packet loss, time 1351ms<br />rtt min/avg/max/mdev = 1.138/1.335/9.912/0.581 ms, ipg/ewma 1.352/1.208 ms<br /><br />--- wireless-router ping statistics ---<br />1000 packets transmitted, 1000 received, 0% packet loss, time 664ms<br />rtt min/avg/max/mdev = 0.572/0.633/4.329/0.128 ms, ipg/ewma 0.664/0.640 ms</pre></div><div><br />What is interesting about iPhone is that its ping packet loss was normally around ~11% in this test, the 5% above was the least packet loss rate I observed. Another interesting point is that when iPhone screen is off it does not respond to ping at all. It does respond when the screen is on but still locked.<br /><br />The results are probably of little use, just killed some time for fun formatting this blog in emacs html mode.</div>Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com0tag:blogger.com,1999:blog-8084820023443893802.post-52057609889242348862009-11-14T20:18:00.000+00:002009-11-14T20:20:07.417+00:00Using Linux ps utility to display the command line of a process<div>Sometimes it is useful to see the full command line of a process when using Linux ps utility. Linux ps utility obeys COLUMNS environment variable or --columns command line option to limit the line size of the output. The default values are normally too small and cause ps to truncate the command line:</div><div><pre>[max@truth ~]$ ps -u $USER -fH<br />UID PID PPID C STIME TTY TIME CMD<br />max 7421 7385 0 17:04 ? 00:00:00 gnome-session<br />max 7527 7421 0 17:04 ? 00:00:02 metacity<br />max 7539 7421 0 17:04 ? 00:00:00 gnome-panel<br />max 7540 7421 0 17:04 ? 00:00:00 nautilus<br />max 7541 7421 0 17:04 ? 00:00:00 /usr/libexec/gdu-notification-<br />max 7543 7421 0 17:04 ? 00:00:00 /usr/bin/seapplet<br />max 7544 7421 0 17:04 ? 00:00:00 gnome-power-manager<br />max 7545 7421 0 17:04 ? 00:00:00 gpk-update-icon<br />max 7550 7421 0 17:04 ? 00:00:00 bluetooth-applet<br />max 7551 7421 0 17:04 ? 00:00:00 gnome-volume-control-applet<br />max 7552 7421 0 17:04 ? 00:00:00 nm-applet --sm-disable<br />max 7559 7421 0 17:04 ? 00:00:00 kerneloops-applet<br />max 7561 7421 0 17:04 ? 00:00:00 python /usr/share/system-confi<br />max 8127 1 0 17:25 ? 00:00:01 gcalctool<br />max 7942 1 0 17:12 ? 00:00:00 /bin/sh /usr/lib64/firefox-3.5.4<br />max 7954 7942 2 17:12 ? 00:01:34 /usr/lib64/firefox-3.5.4/firef<br />max 7774 1 0 17:06 ? 00:00:09 emacs<br />max 7821 7774 0 17:06 pts/0 00:00:00 /bin/bash --noediting -i<br />max 8600 7821 0 18:15 pts/0 00:00:00 ps -u max -fH<br />max 7740 1 0 17:05 ? 00:00:00 gnome-screensaver<br />max 7730 1 2 17:05 ? 00:01:30 rhythmbox<br />max 7728 1 0 17:05 ? 00:00:00 /usr/libexec/gvfsd-burn --spawne<br />max 7719 1 0 17:05 ? 00:00:00 /usr/libexec/notification-area-a<br />max 7717 1 0 17:05 ? 00:00:02 /usr/libexec/clock-applet --oaf-<br />max 7715 1 0 17:05 ? 00:00:08 /usr/libexec/multiload-applet-2<br />max 7711 1 0 17:05 ? 00:00:00 /usr/libexec/cpufreq-applet --oa<br />max 7708 1 0 17:05 ? 00:00:00 /usr/libexec/wnck-applet --oaf-a<br />max 7698 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gphoto2-volume<br />max 7693 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gdu-volume-mon<br />max 7691 1 0 17:04 ? 00:00:00 /usr/libexec/bonobo-activation-s<br />max 7687 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd-trash --spawn<br />max 7685 1 0 17:04 ? 00:00:00 /usr/libexec/gconf-im-settings-d<br />max 7572 1 0 17:04 ? 00:00:00 /usr/libexec/im-settings-daemon<br />max 7569 1 0 17:04 ? 00:00:00 /usr/libexec/notification-daemon<br />max 7535 1 0 17:04 ? 00:00:00 /usr/libexec//gvfs-fuse-daemon /<br />max 7529 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd<br />max 7519 1 0 17:04 ? 00:00:00 /usr/libexec/gnome-settings-daem<br />max 7516 1 0 17:04 ? 00:00:00 /usr/libexec/gconfd-2<br />max 7433 1 0 17:04 ? 00:00:00 dbus-launch --sh-syntax --exit-w<br />max 7432 1 0 17:04 ? 00:00:00 /bin/dbus-daemon --fork --print-<br />max 7405 1 0 17:04 ? 00:00:00 /usr/bin/gnome-keyring-daemon --<br />max 7320 1 1 17:04 ? 00:01:07 /usr/bin/pulseaudio --start --lo<br />max 7323 7320 0 17:04 ? 00:00:00 /usr/libexec/pulse/gconf-helpe</pre></div><div>The maximum command line length of a process can be read from ARG_MAX sysconf variable</div><div><pre>[max@truth ~]$ getconf ARG_MAX<br />2621440</pre></div><div>On my system it is 2.5Mb. The first idea that comes to mind to make ps show the full command line is to pass ARG_MAX (+ some constant for other ps output fields) as the line limit to ps.</div><div><pre>[max@truth ~]$ ps -u $USER -fH --columns=`getconf ARG_MAX`<br />Fix bigness error.<br />UID PID PPID C STIME TTY TIME CMD<br />Fix bigness error.<br />max 7421 7385 0 17:04 ? 00:00:00 gnome-session<br />Fix bigness error.<br />max 7527 7421 0 17:04 ? 00:00:02 metacity<br />Fix bigness error.<br />...</pre></div><div>However, ps does not seem to like the columns value. A quick binary search for the maximum of --columns value reveals:</div><div><pre>[max@truth ~]$ ps -u $USER -fH --columns=131073<br />Fix bigness error.<br />UID PID PPID C STIME TTY TIME CMD<br />Fix bigness error.<br />max 7421 7385 0 17:04 ? 00:00:00 gnome-session<br />Fix bigness error.<br />max 7527 7421 0 17:04 ? 00:00:02 metacity<br />Fix bigness error.<br />...<br /><br />[max@truth ~]$ ps -u $USER -fH --columns=131072<br />UID PID PPID C STIME TTY TIME CMD<br />max 7421 7385 0 17:04 ? 00:00:00 gnome-session<br />max 7527 7421 0 17:04 ? 00:00:02 metacity<br />max 7539 7421 0 17:04 ? 00:00:00 gnome-panel<br />max 7540 7421 0 17:04 ? 00:00:00 nautilus<br />max 7541 7421 0 17:04 ? 00:00:00 /usr/libexec/gdu-notification-daemon<br />max 7543 7421 0 17:04 ? 00:00:00 /usr/bin/seapplet<br />max 7544 7421 0 17:04 ? 00:00:00 gnome-power-manager<br />max 7545 7421 0 17:04 ? 00:00:00 gpk-update-icon<br />max 7550 7421 0 17:04 ? 00:00:00 bluetooth-applet<br />max 7551 7421 0 17:04 ? 00:00:00 gnome-volume-control-applet<br />max 7552 7421 0 17:04 ? 00:00:00 nm-applet --sm-disable<br />max 7559 7421 0 17:04 ? 00:00:00 kerneloops-applet<br />max 7561 7421 0 17:04 ? 00:00:00 python /usr/share/system-config-printer/applet.py<br />max 8127 1 0 17:25 ? 00:00:01 gcalctool<br />max 7942 1 0 17:12 ? 00:00:00 /bin/sh /usr/lib64/firefox-3.5.4/run-mozilla.sh /usr/lib64/firefox-3.5.4/firefox<br />max 7954 7942 2 17:12 ? 00:01:36 /usr/lib64/firefox-3.5.4/firefox<br />max 7774 1 0 17:06 ? 00:00:10 emacs<br />max 7821 7774 0 17:06 pts/0 00:00:00 /bin/bash --noediting -i<br />max 8629 7821 0 18:21 pts/0 00:00:00 ps -u max -fH --columns=131072<br />max 7740 1 0 17:05 ? 00:00:00 gnome-screensaver<br />max 7730 1 2 17:05 ? 00:01:37 rhythmbox<br />max 7728 1 0 17:05 ? 00:00:00 /usr/libexec/gvfsd-burn --spawner :1.7 /org/gtk/gvfs/exec_spaw/1<br />max 7719 1 0 17:05 ? 00:00:00 /usr/libexec/notification-area-applet --oaf-activate-iid=OAFIID:GNOME_NotificationAreaApplet_Factory --oaf-ior-fd=27<br />max 7717 1 0 17:05 ? 00:00:02 /usr/libexec/clock-applet --oaf-activate-iid=OAFIID:GNOME_ClockApplet_Factory --oaf-ior-fd=30<br />max 7715 1 0 17:05 ? 00:00:09 /usr/libexec/multiload-applet-2 --oaf-activate-iid=OAFIID:GNOME_MultiLoadApplet_Factory --oaf-ior-fd=21<br />max 7711 1 0 17:05 ? 00:00:00 /usr/libexec/cpufreq-applet --oaf-activate-iid=OAFIID:GNOME_CPUFreqApplet_Factory --oaf-ior-fd=24<br />max 7708 1 0 17:05 ? 00:00:00 /usr/libexec/wnck-applet --oaf-activate-iid=OAFIID:GNOME_Wncklet_Factory --oaf-ior-fd=18<br />max 7698 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gphoto2-volume-monitor<br />max 7693 1 0 17:04 ? 00:00:00 /usr/libexec/gvfs-gdu-volume-monitor<br />max 7691 1 0 17:04 ? 00:00:00 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=19<br />max 7687 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd-trash --spawner :1.7 /org/gtk/gvfs/exec_spaw/0<br />max 7685 1 0 17:04 ? 00:00:00 /usr/libexec/gconf-im-settings-daemon<br />max 7572 1 0 17:04 ? 00:00:00 /usr/libexec/im-settings-daemon<br />max 7569 1 0 17:04 ? 00:00:00 /usr/libexec/notification-daemon<br />max 7535 1 0 17:04 ? 00:00:00 /usr/libexec//gvfs-fuse-daemon /home/max/.gvfs<br />max 7529 1 0 17:04 ? 00:00:00 /usr/libexec/gvfsd<br />max 7519 1 0 17:04 ? 00:00:00 /usr/libexec/gnome-settings-daemon<br />max 7516 1 0 17:04 ? 00:00:00 /usr/libexec/gconfd-2<br />max 7433 1 0 17:04 ? 00:00:00 dbus-launch --sh-syntax --exit-with-session<br />max 7432 1 0 17:04 ? 00:00:00 /bin/dbus-daemon --fork --print-pid 9 --print-address 11 --session<br />max 7405 1 0 17:04 ? 00:00:00 /usr/bin/gnome-keyring-daemon --daemonize --login<br />max 7320 1 1 17:04 ? 00:01:13 /usr/bin/pulseaudio --start --log-target=syslog<br />max 7323 7320 0 17:04 ? 00:00:00 /usr/libexec/pulse/gconf-helper</pre></div><div>It looks like there is a hardcoded maximum of the line length in Linux Fedora 11 ps utility equal to 128Kb (0x20000 in hex). It would be nice to remove that limit.</div><br /><div>p.s.: The lines of this post get truncated when viewed as a web-page using Firefox or Chromium. To view untruncated use an RSS reader, such as Google Reader.</div><br /><div>p.p.s.: And yes, I should have updated my Firefox to 3.5.5 which is available on Fedora 11 updates repository now.</div>Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com2tag:blogger.com,1999:blog-8084820023443893802.post-38574060268976733702009-11-12T20:06:00.000+00:002009-11-12T22:43:20.245+00:00Benchmarking function call overhead in C++<div><br />I've caught some cold and been staying at home. Just for fun, decided to compare the price of calls to:<br /><ul><li>a regular function from the same non-position-independent-code (non-PIC) executable object</li><li>a regular function from a PIC shared library object</li><li>a virtual function from the same non-PIC executable object</li><li>a virtual function from a PIC shared library object</li></ul>Here is the source code of the benchmark:<br /></div><div><pre>[max@truth test]$ cat a.h<br />#pragma once<br /><br />#include <memory><br /><br />namespace max {<br /><br />void funA(long*); // same non-pic executable, different .o<br />void funB(long*); // another pic shared library<br /><br />struct X<br />{<br /> virtual ~X() {}<br /> virtual void fun(long*) = 0;<br />};<br /><br />std::auto_ptr<X> createA(); // same non-pic executable, different .o<br />std::auto_ptr<X> createB(); // another pic shared library<br /><br />}<br /><br />[max@truth test]$ cat a.cc<br />#include "a.h"<br /><br />void max::funA(long* x) { ++*x; }<br /><br />namespace {<br /><br />struct ImpOfX : max::X<br />{<br /> void fun(long* x) { ++*x; }<br />};<br /><br />}<br /><br />std::auto_ptr<max::X> max::createA()<br />{<br /> return std::auto_ptr<max::X>(new ImpOfX);<br />}<br /><br />[max@truth test]$ cat b.cc<br />#include "a.h"<br /><br />void max::funB(long* x) { ++*x; }<br /><br />namespace {<br /><br />struct ImpOfX2 : max::X<br />{<br /> void fun(long* x) { ++*x; }<br />};<br /><br />}<br /><br />std::auto_ptr<max::X> max::createB()<br />{<br /> return std::auto_ptr<max::X>(new ImpOfX2);<br />}<br /><br />[max@truth test]$ cat main.cc<br />#include "a.h"<br />#include <stdio.h><br />#include <time.h><br /><br />using namespace max;<br /><br />namespace {<br /><br />typedef unsigned long long nsec_t;<br /><br />nsec_t now()<br />{<br /> timespec ts;<br /> clock_gettime(CLOCK_MONOTONIC, &ts);<br /> return ts.tv_sec * nsec_t(1000000000) + ts.tv_nsec;<br />}<br /><br />long benchmark(long n, X& a, X& b, double times[4])<br />{<br /> long sum[4] = {};<br /><br /> nsec_t start, stop;<br /> start = now();<br /> for(int i = n; i--;) {<br /> funA(sum + 0);<br /> }<br /> stop = now();<br /> times[0] = stop - start;<br /> start = stop;<br /><br /> for(int i = n; i--;) {<br /> funB(sum + 1);<br /> }<br /> stop = now();<br /> times[1] = stop - start;<br /> start = stop;<br /><br /> for(int i = n; i--;) {<br /> a.fun(sum + 2);<br /> }<br /> stop = now();<br /> times[2] = stop - start;<br /> start = stop;<br /><br /> for(int i = n; i--;) {<br /> b.fun(sum + 3);<br /> }<br /> stop = now();<br /> times[3] = stop - start;<br /> start = stop;<br /><br /> return sum[0] + sum[1] + sum[2] + sum[3];<br />}<br /><br />}<br /><br />int main()<br />{<br /> std::auto_ptr<X> a(createA());<br /> std::auto_ptr<X> b(createB());<br /><br /> double times[4];<br /> // warm-up CPU caches and branch predictors<br /> long sum = benchmark(100, *a, *b, times);<br /> // benchmark<br /> long const N = 1e9;<br /> sum += benchmark(N, *a, *b, times);<br /><br /> printf(<br /> "%lu calls to: \n"<br /> " a regular function from the same executable: %.3lfnsec/call\n"<br /> " a regular function from a shared library: %.3lfnsec/call\n"<br /> " a virtual function from the same executable: %.3lfnsec/call\n"<br /> " a virtual function from a shared library: %.3lfnsec/call\n"<br /> , N<br /> , times[0] / N<br /> , times[1] / N<br /> , times[2] / N<br /> , times[3] / N<br /> );<br />}</pre></div><div>To get nanosecond-resolution times clock_gettime() function is used.<br /><br />This is how it is compiled:</div><div><pre>[max@truth test]$ g++ -m64 -Wall -Wextra -march=native -O3 -c main.cc<br />[max@truth test]$ g++ -m64 -Wall -Wextra -march=native -O3 -c a.cc<br />[max@truth test]$ g++ -m64 -Wall -Wextra -march=native -O3 -fpic -c b.cc<br />[max@truth test]$ g++ -m64 -Wl,-rpath,'$ORIGIN' -Wl,-z,now -shared -o libb.so b.o<br />[max@truth test]$ g++ -m64 -Wl,-rpath,'$ORIGIN' -Wl,-z,now -o test main.o a.o libb.so -lrt</pre></div><div>When linking -Wl,-z,now linker option disables lazy symbol binding. Lazy symbol binding resolves calls to functions from PIC objects on the first call. Disabling lazy binding makes the dynamic linker resolve all symbols prior to executing any application pre, rather than on the first access.<br /><br />The other linker option -Wl,-rpath,'$ORIGIN' tells the dynamic linker to search for the shared libraries first in the same folder where the executable is, so that there is no need to fiddle with LD_LIBRARY_PATH variable or /etc/ld.so.conf.<br /><br />And here are the results of the benchmark:</div><div><pre>[max@truth test]$ sudo chrt -f 99 /usr/bin/time -f "%E elapsed, %c context switches" ./test<br />1000000000 calls to:<br /> a regular function from the same executable: 1.999nsec/call<br /> a regular function from a shared library: 2.665nsec/call<br /> a virtual function from the same executable: 2.332nsec/call<br /> a virtual function from a shared library: 2.332nsec/call<br />0:09.32 elapsed, 0 context switches</pre></div><div>The test is invoked as a FIFO realtime class process with priority 99 (highest on my system). FIFO realtime class processes run until they voluntarily block themselves, time quantums don't apply.<br /><br />In the above invocation GNU /usr/bin/time utility is used to report the number of context switches during the run to make sure that the process has executed uninterrupted. This is to eliminate scheduling noise from the timings completely.<br /><br />The test was invoked on Intel E8400 CPU running at full 3GHz speed.<br /><br />Interpretation of the results:<ul><li>Calls to regular functions in non-PIC object files are the cheapest.</li><li>Calls to virtual member functions are more expensive. An interesting fact here is that calls to virtual functions always get dispatched through the virtual table. This means that the procedure linking tables (PLT) used for regular functions in PIC object files is bypassed entirely. There is no difference whether a virtual function resides in PIC or non-PIC object, they get called the same way through the virtual table.</li><li>Calls to regular functions in PIC objects are most expensive. This is because such calls in fact call code in the PLT which (when the function has been resolved) jumps to the actual function.</li></ul>Hm..., interesting!</div>Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com3tag:blogger.com,1999:blog-8084820023443893802.post-25270584016000334202009-07-26T22:07:00.000+01:002009-07-26T22:50:12.011+01:00How to build an anonymous peer-to-peer networkIn BitTorrent the ip address of the host sharing data with you is available through the TCP protocol which is used for the data transfer. This may reveal the identity of the owner of the host. If privacy is desired there must be a way to make the source ip address unavailable.<br /><br />Here is a design sketch for a data transfer protocol where the ip address of the data sender is not revealed.<br /><br />The the frames of the TCP protocol, which is used for transferring data in BitTorrent, carry the ip addresses of the source and the destination of the data. The source ip address is needed for the control part of the protocol to send back acknowledgements. If this back path is eliminated it should be possible to not have to provide the ip address of the source.<br /><br />UDP protocol, on the other hand, allows to send datagrams with an arbitrary source ip address. Thus, the sender of the data can hide its true ip address by altering the source ip address of the datagram.<br /><br />The control part of such data transfer protocol should go through Tor network. Tor network provides anonymity but has very limited bandwidth and thus can not be used for the actual data transfer. This is where UDP can be used.<br /><br />First, the host joins a tracker through Tor network.<br /><br />The host then finds seeders through the tracker. Using Tor, the host requests seeders to send data to its real ip address. The seeders receive the request and send data using UDP. Which allows to hide the true ip address of the source.<br /><br />This may work quite well I think. Comments are more then welcome.Maxim Yegorushkinhttp://www.blogger.com/profile/13335665761944189051noreply@blogger.com0