this is an archived version of the old doink.ch blog (only selected posts from the period 2006-2008 have been kept)

how to mount a remote filesystem on ubuntu

requirements:

  1. you need to be able to connect to the remote machine with ssh
  2. you need sudo privileges on your local machine

steps for user joe with a remote machine moria:

sudo apt-get install sshfs fuse-utils
sudo adduser joe fuse
sudo modprobe fuse
sudo gedit /etc/modules # put ‘fuse’ on a new line
mkdir moria-home
sshfs joe@moria:/home/joe moria-home

after this, you may browse the remote filesystem just as if it were on your local machine, including using everyone’s favourite file browser, opening files in an editor or media player or any other program, none of which even knows it’s actually dealing with remote files. very nice!

how to mount an ftp-account like a filesystem on ubuntu

in addition to the post on mounting a remote filesystem with ssh, i’ll try to explain how to do the same thing for an ftp-account.

requirements:

  1. an FTP account on a remote machine
  2. sudo privileges on your local machine

steps for user joe with remote machine moria:

sudo apt-get install sshfs fuse-utils
sudo adduser joe fuse
sudo modprobe fuse
sudo gedit /etc/modules # put ‘fuse’ on a new line

then head over to http://curlftpfs.sourceforge.net/, download, compile and install curlftpfs - or get the deb package that checkinstall made for me and install it.

mkdir moria-ftp
curlftpfs ftp://joes-ftp-login:joes-ftp-password@ftp.moria.bla moria-ftp

it’s really nice to be able to just drag and drop stuff onto your FTP account just as if it were local!

gpg explained

when i first came into contact with gpg keys, i struggled a bit to understand the concept behind them. why do i need two keys, and which one do i use for what, in which order?

now that i’ve grasped it, i’ll try to explain the idea with a real-world analogy. imagine you want to send a confidential note to another person, but you can only interact with that person over a public network. in order to guarantee that no one else can read the note, you want to put it into a safe and send the safe to that other person. however, you will also need to send the number-code required to open that safe, because the addressee obviously needs to be able to unlock it in order to read the message. but sending both the number-code and the safe over a public network is of course dangerous, because if someone intercepted both of these items, he could unlock the safe and see your secret.

to overcome this problem, people have come up with an alternative strategy. this strategy requires that the addressee (!) owns a safe and a corresponding number-code. now, if you want to send your secret to the addressee, you ask him to send you his safe. sending an empty safe over a public network is of course not dangerous, because no secret is stored in it yet. once you receive his safe, you put your message in it and lock it. sending this safe with the secret inside back to the addressee is not dangerous either, because no one except the addressee could possibly know the number-code for his safe. this number-code is never sent over a public network. in fact, the person who owns it will never give it away to anyone else, and least of all send it over a public network. once the safe arrives at the addressee’s place, he may unlock it using his number-code and see the secret note. that way, it is guaranteed that no one else could have seen it, even if he intercepted every single message on the public network.

what has been described as a ‘safe’ in this analogy is of course gpg’s ‘public key’, and what i called the ‘number-code’ is gpg’s ‘private key’. so if you want to send someone a secret message, you ask him for his public key, you encrypt the message with this public key, and send the result to the addressee. he will be able to decrypt it using his private key. the risk involved with someone intercepting the first message with the public key is of course zero, because the public key is open for everyone to see, anyway. the risk involved with someone intercepting the second, encrypted message is also zero, because no one could possibly have acquired the addressee’s private key (the only way to decrypt the message), since it never left the hands of the addressee. got it?
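
to make the safe / number-code idea a bit more concrete, here is a toy sketch in python. it is textbook RSA with tiny primes and is only meant to show which half of the key pair does what; it is not how gpg works internally (real keys are vastly larger and gpg adds padding and other machinery). the function names and the primes 61 and 53 are made up for this illustration.

```python
from math import gcd


def make_keypair(p, q, e=17):
    """Build a toy RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    d = pow(e, -1, phi)   # modular inverse, needs python 3.8 or newer
    public_key = (e, n)   # the 'safe': may travel over the public network
    private_key = (d, n)  # the 'number-code': never leaves the addressee
    return public_key, private_key


def encrypt(message, public_key):
    """Lock a number (smaller than n) into the addressee's safe."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext, private_key):
    """Open the safe again with the number-code."""
    d, n = private_key
    return pow(ciphertext, d, n)


# the addressee creates both halves and hands out only the public one
public_key, private_key = make_keypair(61, 53)

secret = 65
locked = encrypt(secret, public_key)    # anyone can do this step
assert locked != secret
assert decrypt(locked, private_key) == secret  # only the addressee can do this
```

the sender only ever touches public_key; private_key stays with the addressee, which is the whole point of the analogy.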

part of the reason why i struggled a bit with the concept is imho the bad terminology. calling both the ‘public key’ and the ‘private key’ a ‘key’ is silly, if you ask me, because any real-life analogy involving two keys is bound to fail :) besides, the concept of a key being ‘public’ is nonsense, imho: the very idea of a key is that it is private! therefore, i think that calling those two numbers the ‘safe’ and the ‘number-code’ would have been smarter. oh well, it’s too late for that now :[

screencasting on linux

i’ve invested quite some time to figure out a reasonable way to make screencasts of my ubuntu desktop. here’s what i’ve been looking for:

i’ve spent a lot of time getting used to (or rather: fighting with) xvidcap. my conclusion: don’t use it. IMHO xvidcap is a nice tool, but it has a few deficiencies which need to be urgently resolved before i would call it ‘usable’:

so does that mean i have to give up plans for screencasting from linux? googling the internet, i’ve found a few other tools. there is wink, which isn’t opensource. then there’s istanbul, which has not worked for me so far (i tried it more than once already). but finally, i found a tool that works as expected, and that was a big relief for me: it’s called recordmydesktop, and i would recommend it above all other screen recording tools i’ve seen so far. it’s in the ubuntu feisty repo, so just do sudo apt-get install gtk-recordmydesktop, and you should be all set. the tool recorded my session with audio out-of-the-box, without any problems. it will by default capture your screen and record the audio first, and as soon as you stop the recording, it will perform the encoding to ogg/theora. that’s just what i wanted, and it seems to work just fine up till now. i’ve still seen a few dropped frames, but that’s not a big issue for me at this moment.

here are a few other tricks i learned along the way:

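record an mp3 from the command line:

arecord -f cd -t raw | lame -x - test.mp3

convert an ogg/theora video to mp4:

mencoder somevideo.ogg -ovc lavc -oac mp3lame -lavcopts vcodec=mpeg4 -o somevideo.mp4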

other links:
RecordingScreencasts: the Ubuntu ScreencastingTeam explains how they make their videos
ScreencastHowto: a screencast howto i wrote last year on the Ubuntu Wiki (using vnc2swf to produce a flash video for your online viewing pleasure)

unix/linux oneliners

this is my growing list of useful unix/linux oneliners.

remove the unicode BOM from a file:

perl -pi -w -e 's/\x{EF}\x{BB}\x{BF}//g' somefile.jsp

remove the unicode BOM from a number of files:

perl -pi -w -e 's/\x{EF}\x{BB}\x{BF}//g' *.jsp
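
if perl is not at hand, the same idea can be sketched in python. note one difference, which is a deliberate choice here: this version only removes a BOM at the very start of the data, while the perl regex above removes the byte sequence wherever it shows up. the function names are made up for this sketch.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark (EF BB BF)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_file(path):
    """Rewrite the file at path in place, but only if it starts with a BOM."""
    with open(path, "rb") as handle:
        data = handle.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        with open(path, "wb") as handle:
            handle.write(cleaned)
```

for a bunch of files, something like `for path in glob.glob("*.jsp"): strip_bom_in_file(path)` (after `import glob`) would do the job.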

replace in files (e.g. in all JSP-files):

find . -name '*.jsp' -print0 | xargs -0 perl -pi -w -e 's/SOMETHING/SOMETHINGELSE/g;'

record an mp3 from the command line:

arecord -f cd -t raw | lame -x - test.mp3

convert an ogg/theora video to mp4:

mencoder somevideo.ogg -ovc lavc -oac mp3lame -lavcopts vcodec=mpeg4 -o somevideo.mp4

…to be continued…

Ἅρειος Ποτήρ

here’s your chance to learn ancient greek with an easy and fun text. the translator maintains a special homepage dedicated to the book, where he provides convenient notes to the first two chapters, and a comprehensive vocabulary. since his pages are not really printer-friendly, and he uses some weird font which you might not want to bother to install, i prepared PDFs from the notes and the vocabulary: notes to chapter 1, notes to chapter 2 and the vocabulary. have fun :)

latex

i maintain some sort of a love-hate relationship with latex. we’re almost like a married couple: we can hardly get along on some days, but we couldn’t imagine living without each other anymore, either.

on the one hand, latex is so much superior to WYSIWYG word processors that it almost hurts me physically when i see my friends suffer from microsoft word woes when writing their university papers. honestly, ms word is a moody snob that will give you random crap on any given day; not because of a specific reason, but just because it feels like it. we all know that software tends to get sassy when it has “its days” once a month, but ms word seems to have “its days” just about any freaking day i have ever used it. here’s a quick summary why i would never go back to WYSIWYG word processing:

on the other hand, some aspects of using latex make me want to bang my head against the wall, or at least torture someone uninvolved who happens to be around. for example, could someone please explain the rationale behind this?


\bibliography{vgs,gr} % works
\bibliography{vgs, gr} % does not work

i’m not kidding, a single whitespace at this position really breaks a document.

the main problems i have with latex all stem from the same cause: (la)tex is just way too antiquated. in fact, it is so old, it makes a stegosaurus look like it lived yesterday. consider that tex was started in the early eighties. the early eighties? heck, i was born in the early eighties! no wonder some parts of it are fossil.

alright, here comes the unordered list of my biggest gripes with latex:

drum roll, flash, bang - enter xetex
the good news is that some of the problems i just described have been solved by the awesome jonathan kew, who came up with xetex. that software is an extension of tex (usable with latex) that solves 3 of my main problems: unicode support, easy image import (JPG, PNG etc.), and easy custom TTF font support. xetex was originally designed for macOS, but luckily, it also runs on linux. better yet, it has already been included in tex-live, so it has found its way into debian and consequently into ubuntu, which means that a quick ‘sudo apt-get install texlive-xetex’ solves all these problems in one fell swoop on my ubuntu system. i’ve been using xetex for about half a year now, and i’m so happy with it that there’s no way i would go back to using plain latex again. i’m actually tempted to look at xetex as a kind of latex3 (the real latex3 seems to be dead, AFAICS).

the right setup
in order to work efficiently with latex (xetex), you should put some thought into your working environment. first of all, it is a good idea to work on linux or another unix-like system where you have a decent console. personally, i recommend using an SVN repository to store the sources for easy version tracking. if you work on different documents, i recommend creating a folder for every document, and a ‘common’ folder that holds stuff like bibliography files that you use for all documents. then, you write a Makefile for each document which will compile your sources, do any preprocessing, fetch files from the ‘common’ dir etc. in my case, i use a small preprocessor in python which basically does some regex-replacements in my source files (for example, i use a shorthand expression for italicizing words: “engl. /hound/ and lat. /canis/” should be converted into “engl. \textsl{hound} and lat. \textsl{canis}”, etc.). also, you can have the Makefile call a pdf-viewer, so that a quick “make” on the console will compile your document and immediately show you the result. some people use fancy latex-editors, but i prefer to use a simple editor with latex syntax highlighting, and use the console for everything else. of course, tastes differ, but this is what works for me. if someone is interested, i could explain my setup in some more detail, and share my preprocessor, Makefiles etc., so let me know if that’s the case.
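
to give an idea of what such a regex-replacement can look like, here is a minimal sketch for the /word/ shorthand. it is not the actual preprocessor, just an illustration under a few assumptions: the shorthand covers a single run of text without spaces or slashes, and slashes that touch other word characters (as in a/b/c or file paths) are left alone. the names are made up.

```python
import re

# /word/ -> \textsl{word}
# the opening slash must not follow a word character or another slash,
# and the closing slash must not be followed by one, so "and/or",
# "a/b/c" and "/usr/bin/" are not touched.
ITALIC_SHORTHAND = re.compile(r"(?<![\w/])/([^/\s]+)/(?![\w/])")


def expand_shorthand(text):
    """Replace every /word/ shorthand in text with \\textsl{word}."""
    # a function is used as replacement so that the backslash in
    # "\textsl" is not interpreted as an escape sequence by re.sub
    return ITALIC_SHORTHAND.sub(
        lambda match: "\\textsl{" + match.group(1) + "}", text
    )


print(expand_shorthand("engl. /hound/ and lat. /canis/"))
```

a Makefile rule could then pipe every source file through a script like this before handing the result to xelatex.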

freaking long post, i hope it is of use to someone :)

further hints:

programming languages and languages

i have just read the book ‘eine kleine geschichte der sprache’ by steven roger fischer (2nd edition 2004; english original from 1999). it is a nice little book that offers a very general overview, aimed at lay readers, of the origin of language, of linguistics, the most important language families, language typology etc. of course the author only scratches the surface of the individual topics, but that is in the nature of such a book, which targets a broad audience without specialist training. the part on the germanic languages, which i can judge thanks to my training, does contain a few inaccuracies and errors (e.g. old norse cannot be called the ‘original germanic language’, p. 132), but nothing too serious.

however, and this is the reason for my blog post, there is one point on which i do not agree with fischer at all: namely that he keeps mentioning programming languages in the same breath as natural languages, thereby disregarding the fundamental differences between the two. to put it bluntly: programming languages are not languages at all. strictly speaking, they have no place in an introductory linguistics book on the history of language.

two points may suffice to make this clear:

in my view, calling programming languages ‘languages’ is merely a metaphor that came about because of a few superficial properties (e.g. that both have a syntax). but this should only be taken to mean that c++, assembler, fortran and java are something similar to languages, and by no means that they actually are languages. statements to the effect that computers can “speak” to each other, that they “use” programming languages, and that this supposedly works much like communication between humans and animals (all to be found on p. 223) are totally misleading and show a fundamental lack of understanding of what programming languages are and how they work.

get the full command of a process on solaris

the commands ps and top are commonly used on unix/linux to show information about the processes currently running on a system. however, if the command that started a specific process is very long, it will be chopped off at the end, so that you will only see the meaningless first part of it. this can be annoying sometimes, when you want to know in detail which command was used to start some process. so how do you get the full command?

first, i found out that using ps -ef will show a bit more of the command, but still not everything. so i looked further and finally found a solution that does what i want (only on solaris, though): use /usr/ucb/ps -auxww. you’ll probably want to use grep to display only a specific process, so the command would look like this:

/usr/ucb/ps -auxww | grep tomcat

thanks to robert maldon for blogging this!

environmental protection

my contribution to environmental protection is not to have children.

i really do mean that seriously, by the way. the problem is not, as one can once again read in the media these days, that people leave their tv on stand-by instead of switching it off completely, and never properly unplug their mobile phone charger overnight. no, the problem is rather that 6.7 billion people are simply too many for this planet. as humanity as a whole, we use too much energy not because each individual consumes too much, but because there are simply too many of us. these people all want to eat, shop, heat/cool their homes, drive cars, consume media, fly off on holiday, etc. for that they need lots of energy, produce waste, and accept a lot of further environmental damage (deforestation of the rainforest, oil spills, leaking uranium-contaminated water, the ozone hole, melting glaciers etc.). every person who lives the way we western europeans do therefore necessarily has a strongly negative ecological balance (an average western european is said to use 125 kilowatt-hours per day, according to professor mackay). so it is of no use when individuals tinker a little with their personal balance… a few kilowatts saved here, a few co2 molecules stopped there… on balance that does not achieve much. in my view, a significant improvement of the global ecological balance can only be achieved in one way: by reducing the world population!

in this spirit, my appeal to humanity is: “do your bit - get sterilised!”. it’s quite simple: half as many people, half as much pollution!

Q.E.D.

and by the way: even if people keep having children anyway, i am optimistic that nature will find a way to reduce the human population back to a sustainable level. nature is really good at that: at restoring equilibria. so the population explosion of humankind is likely, one way or another, to go down in the history of the planet as an episode that in hindsight will be considered “easily corrected”.

p.s.: also worth mentioning in this context is vhemt - the Voluntary Human Extinction Movement.

against redundancy in academic papers

many consider it good practice to write academic papers with a threefold structure, consisting of an introduction, a main part, and a conclusion. in addition, you are often expected to include an abstract with a summary of the content at the beginning of the paper.

i think that this structure is in fact often a bad idea, because it leads to a lot of redundancy. the result is a paper like this:

Abstract: In this paper it will be shown that chickens lay eggs.
Introduction: We are going to show that chickens lay eggs.
Main part: Chickens lay eggs.
Conclusion: We have shown that chickens lay eggs.

come on, i’m not stupid, there’s no need to tell me the same thing four (!) times. if you don’t have anything to say in the introduction or the conclusion, then skip it, but don’t just repeat what you have said elsewhere pro forma, sometimes even reusing the exact same wording multiple times! i don’t want to read that, and the author(s) probably don’t want to write it, so why not just concentrate on the relevant parts? looking at the “paper” above, isn’t in fact everything relevant included in the main part?

make your work reproducible

a thing i’ve learned during the last few years working in the department of computational linguistics is that you should always aim to make your work as reproducible as possible. there’s a number of reasons for this:

the result of this is that people (possibly you) will spend large amounts of time solving, over and over again, problems which have already been solved before. therefore, you should try to make your work reproducible. that means:

my geany color scheme

my custom color scheme for geany. check out this on how to set it up. the color scheme is a work in progress and will be updated in the future.

# Based on a scheme by John M. Gabriele

[colors]
# fg
dark_gray=0x393939
# bg
light_gray=0xE9E9E9

red=0x952424
lighter_red=0xC65E5E
brown=0x986C1E
magenta=0xAC3C90
green=0x528649
dark_yellow=0xACA61E
aqua=0x17978D
blue=0x4852AC
violet=0x7C46B8
orange=0xD4903A
black=0x000000
light_blue=0xb0d2c1

# for selection bg
middle_gray=0xA0A0A0
middle_gray2=0xB9B9B9

# for curr line bg
light_mid_gray=0xDEDEDE
light_yellow=0xFFFFF9

[colorscheme]
default_fg=dark_gray
default_bg=light_gray
operator=dark_gray
preprocessor=red

number=red
character=aqua,bold
string_1=green
string_2=aqua

comment_1=middle_gray
comment_2=middle_gray

class=dark_yellow,bold
definition=orange,bold

identifier_1=dark_gray
identifier_2=dark_gray
identifier_3=dark_gray

attribute=lighter_red

function_1=blue,bold
function_2=violet,bold

word_1=blue,bold
word_2=aqua,bold

notice=brown;red,bold

selection_bg=light_blue
curr_line_bg=light_mid_gray

brace_good=dark_gray;middle_gray
brace_bad=red;middle_gray,bold

margin_linenumber=light_gray;middle_gray
margin_folding=violet;dark_gray

whitespace=middle_gray2
indent_guide=middle_gray2
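
to see how the named colors and the style entries hang together, here is a small python sketch that resolves a style value into hex codes. it rests on an assumption: that a value reads as foreground;background, optionally followed by ,bold. that is only how the entries above appear to be laid out, not something taken from the geany documentation. the excerpt and the function name are made up for this illustration.

```python
import configparser

# a short excerpt of the scheme above, enough to resolve a few entries
SCHEME_EXCERPT = """
[colors]
dark_gray=0x393939
light_gray=0xE9E9E9
red=0x952424
brown=0x986C1E
aqua=0x17978D

[colorscheme]
default_fg=dark_gray
character=aqua,bold
notice=brown;red,bold
"""


def resolve_style(value, colors):
    """Split 'fg;bg,bold' (assumed layout) and look up the color names."""
    spec, _, flags = value.partition(",")
    names = spec.split(";")
    # names without an entry in [colors] are passed through unchanged
    resolved = [colors.get(name, name) for name in names]
    return resolved, flags == "bold"


parser = configparser.ConfigParser()
parser.read_string(SCHEME_EXCERPT)
colors = dict(parser["colors"])

for key, value in parser["colorscheme"].items():
    print(key, resolve_style(value, colors))
```

this makes it easy to check that every name used in [colorscheme] is actually defined in [colors] before loading the scheme into the editor.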

Ad-cardinals and ad-adjectives

According to the guidelines of the Stuttgart-Tübingen Tagset (STTS) for tagging German text, rund, gut or knapp in the following examples would have to be tagged as adverbs (p. 57f):

rund zwei Kilo Staub (roughly two kilos of dust)
gut zwei Drittel der Teigmasse (a good two thirds of the dough)
knapp vierzig Personen (just under forty people)

Can anyone explain to me what is supposed to be “adverbial” about this? In my opinion, these modifiers clearly relate, both syntactically and semantically, to the quantifier (cardinal number). The syntactic cohesion can be shown by sentence transformations and a substitution test: Ich liege in rund zwei Kilo Staub. Rund zwei Kilo Staub sollten reichen. Sie sollten reichen. This shows that rund zwei Kilo Staub is a noun phrase. The semantic connection between rund and zwei is evident as well: it is the quantity zwei that is specified more closely by rund, because in fact there may be only 1.9 or even 2.1 kilos. A connection to a verb, as suggested by the term “ad-verb”, is in my opinion justified neither syntactically nor semantically. So shouldn’t they rather be called “ad-cardinals”?

Incidentally, the people behind STTS also require tagging as an adverb in the following, perhaps more debatable case:

es sind weit mehr als 100 Gäste (there are far more than 100 guests)

But isn’t it also the case here that weit mehr belongs more closely to als 100 Gäste than to the verb? In my view, weit mehr als 100 Gäste is in any case a more sensible unit than es sind weit mehr.

Modifying particles on adjectives are also hard to classify by part of speech, as in the following cases, where the STTS guidelines likewise require classification as an adverb:

es war ganz dunkel (it was completely dark)
die Pfanne ist sehr heiss (the pan is very hot) [this case does not appear in exactly this form in the guidelines, LT]

So, “ad-adjectives”? ;D

“stop the numbers game”

thank you very much, mr. parnas, it was about time someone said that. (read the full text in this blog post).

ps: check out scigen, the automatic computer science paper generator.