Added Varnish, just works.
October 17th, 2008
Until today I used Lighttpd to distribute the load across multiple FastCGI backends. This has worked “ok” so far, but I’m really impressed by the performance gained by putting Varnish 2.0 (a high-performance HTTP reverse proxy) in front of the web cluster. It is also much easier than fixing the FastCGI setup every time it breaks again.
However, Varnish 2.0 was released just 2 days ago, so there are still some issues. The random director (directors distribute the load across multiple backend webservers) seems to be broken at the moment, but round-robin works fine and should do for a while. As a workaround for the random director’s .weight option, you can simply add a backend more than once to the round-robin director.
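To illustrate why listing a backend more than once approximates a weight, here is a minimal Python sketch of plain round-robin over a rotation list. It only models the idea and is not Varnish code; the backend names `web1` to `web3` are made up for the example.

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical backend names. Each list entry is one slot in the rotation,
# so listing "web1" twice gives it twice the share of "web2" or "web3".
rotation = ["web1", "web2", "web1", "web3"]


def pick_backends(rotation, n):
    """Return the first n backends chosen by plain round-robin."""
    return list(islice(cycle(rotation), n))


chosen = pick_backends(rotation, 8)
share = Counter(chosen)
print(share)
```

Spreading the duplicate entries through the list, rather than putting them next to each other, keeps the heavier backend from receiving two consecutive requests.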
There was also a problem with the varnishstat Munin plugin at Munin Exchange, so I uploaded a fixed version here and also added some installation instructions.
Wondering why Varnish does not seem to cache anything?
For me (and, I think, for most people) the default Varnish configuration is a bit too restrictive when it comes to cookies. Requests that contain a “Cookie” request header are never cached. As soon as the site sets a cookie, static files like images, scripts and styles are no longer cached either, because the browser sends the cookie along with every request, even for static files. In most situations (almost every site that requires a login, or uses AdSense or similar, and sets a cookie) this renders Varnish absolutely useless. A better approach is to let the dynamic pages always set a cookie, so that a “Set-Cookie” response header is created every time. Varnish does not cache responses that carry a “Set-Cookie” header either, so we no longer need to care about the “Cookie” request header. Knowing this, all we need to do is make Varnish ignore the cookies for static files, and this is easy:
# in vcl_recv
if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    unset req.http.Cookie;
}
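The effect of this rule can be modelled in a few lines of Python. This is a simplified sketch of the behaviour described in this post (no caching when the request carries “Cookie” or the response carries “Set-Cookie”), not of Varnish internals; the function names are invented for the example and header names are treated as case-sensitive for brevity.

```python
import re

# Same pattern as the VCL rule: URLs ending in a static file extension.
STATIC_URL = re.compile(r"\.(png|gif|jpg|swf|css|js)$")


def strip_cookie_for_static(url, request_headers):
    """Return a copy of the request headers, without Cookie for static files."""
    cleaned = dict(request_headers)
    if STATIC_URL.search(url):
        cleaned.pop("Cookie", None)
    return cleaned


def is_cacheable(url, request_headers, response_headers):
    """Simplified model: cache only if neither Cookie nor Set-Cookie is present."""
    request_headers = strip_cookie_for_static(url, request_headers)
    return "Cookie" not in request_headers and "Set-Cookie" not in response_headers


static_hit = is_cacheable("/img/logo.png", {"Cookie": "sid=1"}, {})
dynamic_hit = is_cacheable("/index.php", {"Cookie": "sid=1"}, {"Set-Cookie": "sid=1"})
versioned_hit = is_cacheable("/style.css?v=2", {"Cookie": "sid=1"}, {})
```

Note that the pattern is anchored with `$`, so a static URL with a query string such as `/style.css?v=2` does not match and keeps its cookie.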
This should work for 99% of all somewhat modern websites where users can log in, which is why I think the documentation should mention it more clearly. Also, don’t forget to set etag.use-inode = "disable" in lighttpd.conf so that the ETags stay in sync when using multiple backend servers. Anyhow, Varnish is great, so have fun! :)
The Chrome Buzz – Nothing new about this.
September 6th, 2008
If you are one of those people who only now complain about Chrome and privacy, you should never have used Google Search in the first place. Or do you still think that Google does not know which search results you click on after it has recorded your search terms? Google already knows your web history (at least when you search and/or the website carries Google ads), and with Chrome they know even more about what you visit and when, without you even searching. So why does Google create a browser?
- Combine your internet habits with already collected data (e.g. from Google Search, AdSense and of course the tracking cookies) so that they can show you better ads to make more money
- Collect even more data for better ads
- Make Google even more present on your desktop and thereby make you use more of their products so they can show you more ads (and collect even more information again) to make more money
- Promote Google as a brand to make you use more of their products (ok, this seems to backfire slowly but surely)
What else should Google do but make more money? And what did Google not create Chrome for?
- Know who you are – even though they (already) do
- Piss off the competition (at least not primarily)
- Steal your private data to give it to third parties
Mails, dates, searches, ads clicked, websites visited, conversations had etc. – with Chrome, Google will know more, in quality and quantity, of the stuff it already knows. Nothing new about this for ten years: we are still responsible for our data ourselves. I still admire Google’s business :-).
Chrome – Pointless leecher?
September 3rd, 2008
I think you have already read enough about the new Google browser elsewhere, so here are just my two cents:

As you may know, I’m sometimes limited to an ISDN internet connection (64 kbit/s). This is why I simply cannot use Chrome: it downloads (or uploads) permanently. I have been running it for about two hours now and nothing has changed. It also shows the network usage of the browser component as “Not available”.
I somewhat expected that a Google browser would transfer various (…) data in the background, but this is tough ;-). I have no idea what data is transferred here; for now the problem is simply that the transfers leave my internet connection unusable.
Narrowband flat rate for modem and ISDN – or maybe not?
September 1st, 2008
It would have been too good to be true. I was almost happy to be able to use an affordable “narrowband” flat rate again, until, after one month of use, I ran into the following nicety on the sign-up page of the Arcor Internet by Call flatrate:
Registration cancelled
Dear customer, thank you for your interest in the Arcor-Internet by Call flatrate.
Unfortunately, you have come to our attention in the past because of your far above-average flat-rate usage. Please understand that we can therefore no longer offer you this tariff.
We therefore recommend that you use our inexpensive (!) basic tariffs. [...]
Your Arcor team
Seriously: if it is not a flat rate, it should not be sold as one. Of course I admire Arcor’s commitment (if it is really meant seriously) to involuntarily DSL-forgotten people like me on the one hand, no question. On the other hand, implemented like this it simply is not a flat rate and therefore nothing more than yet another dubious deception (and I will not even start on the automatic disconnect every 2 hours). I am actually considering moving my well-paid DSL line in Aachen to another provider, one that does not lock me out of the internet at will at one location while happily pocketing my fees at the other. Well, thanks a lot.
How many people in this oh-so-progressive country still have to crawl along with 64k ISDN or even a 56k modem, cannot work properly on the internet, and have to do without every website with larger image or video content (or go insane after the umpteenth dropped connection during the 2-3 hour download of the required 50 MB graphics driver), and on top of that pay an estimated 12 times the price for this at least 12 times slower connection?
Nice that T-Com has decided, after almost a decade, to resume the “rural” DSL rollout (thanks to subsidies). Still, I do not believe that this village, this half of a street, or rather even this block of houses will ever be connected to the DSL network.
Yeah, yeah, who cares if a handful of people who depend on the internet for work or school will never have the same opportunities as those living 100 meters away all around them, thanks to medieval technology and a totally well-thought-out company policy. After all, there is a 64k ISDN flat rate for around 90€ a month (for comparison: 250x faster 16000k DSL is available from just over 20€). So what remains: wanting internet and not getting it, or paying dearly for the last piece of junk. Great company policy, refusing to offer affordable ISDN tariffs for the sole reason of getting as many people as possible onto the DSL train – the rest simply get shat on. It is not as if anyone would still voluntarily buy the most expensive possible “I would just like a tiny little bit” internet anyway.
A short digression: how should one picture this – what was he called again – narrowband surfer?
The average narrowband surfer can get up from his chair after every click on a website and leisurely walk about 5 laps around it without missing anything.
The average narrowband surfer does not use up-to-date antivirus software. Not because he is stupid or too careless – no, because updating the virus definition file over the internet would take 10 minutes, and he either has better things to do in that time or uninstalls the antivirus in despair after the 100th update.
The average narrowband surfer develops the reflex of clicking away, without much hesitation, any window that opens by itself on the desktop, since it might start an update.
The average narrowband surfer does not use Mozilla Firefox, unless he knows exactly which levers to pull to disable all updates, so that he does not have to ask himself once a week whether the ISDN card is broken again.
The average narrowband surfer has seen hardly anything of the internet so far if he uses Microsoft Windows, unless he knows how to disable automatic updates and error reporting. Only the really hard core leave the PC unattended for 1-2 hours every Wednesday.
The average narrowband surfer has never watched a YouTube video at home. He has, however, tried 100 times.
The average narrowband surfer needs as long to look at the latest party photos as he spent at the party itself.
The average narrowband surfer does not play online, unless he owns both a collection of internet-capable games from the days before broadband and a friend who shares the same game collection and the same fate.
The average narrowband surfer literally always has the last word when chatting.
Oh well, then I will just go rioting in the streets out of frustration and to pass the time – after all, I am Germany! I mean, I would make calls on the GSM network, get myself a UMTS modem or buy an internet-capable cable connection … if any of that existed here, right. Or I read through the piles of DSL advertising that flutter into the house every day and enjoy how pretty they look, even though the sender must have a screw loose. Or I go to the T-Punkt again and let them push lines on me like “What, you do not have DSL yet? I would apply for that, otherwise it is far too slow and no fun at all! How about T-DSL 2000?”. And when nothing else helps, I am happy to chat with the nice telemarketers again: “Hello, XY-Internet here! I have a great offer for you, since I have seen that you do not have a DSL connection yet. So that you too can surf the internet fast, we have lowered our prices and have the perfect DSL package for you!” – oh, what fun I have had with that.
Phew, that had to be said once again. And that should do for another year, until I have cheerfully kicked the next computer out onto the street ;-). With that in mind…
Narrowband surfing seriously harms you and others around you.
‘llo again
August 26th, 2008
It seems that I’m finally getting better. I’m still wondering which nasty disease is responsible for the blackout, but I’m looking forward to returning to normal soon :).
Having a break.
July 21st, 2008
No more news? ;) It seems that the last few months were a little too much for me and I stumbled into an unavoidable time-out. I’ll get back to business when I’m better :). Happy holidays!
Apache Mina + DNS
May 31st, 2008
I’m currently implementing DNS on top of Mina 2. If you are interested, feel free to take a look at the makeshift page.
Maybe I’ll build a DNS-based load balancing solution from this at some point, featuring a nice web backend for editing and routines for node failure detection (maybe Rhino based?). When a server goes down, the DNS servers will notice this and deactivate the corresponding nodes until they are back online. In combination with a good clustered filesystem this could be the base for a highly available cluster on cheap Linux boxes without huge configuration needs (ok, a ~10 second downtime of single nodes still needs to be tolerated). I think I’ll look for a domain company supporting real NS entries soon :-).
Update:
Ok, I got most of the features that I personally need ready. You can now define a datasource for DNS records; for now this can be either “Memory” (records created at runtime and fully cached) or “Database” (a JDBC layer). With the database datasource it is possible to update records on the fly, while the caching of “Memory” still applies as long as records do not change (I think I’ll create a quick PHP backend for this). You could also run status daemons in the background that automatically deactivate unreachable hosts (UPDATE dns SET active='0' WHERE …).
You can use every database that is supported by JDBC (e.g. MySQL, PGSQL, MSSQL, Java DB, SQLite etc.), which I think is a big plus that I have never seen before in a DNS server. Of course you can also implement your own custom datasources (maybe LDAP), and I am thinking about adding a simple zone file datasource so you can easily use the DNS server as a BIND replacement.
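The status daemon idea from the update could look roughly like the following Python sketch. Only the table name `dns` and the `active` column come from the statement quoted in this post; the `id`, `host` and `port` columns, the TCP connect check and the function names are assumptions made for the example, and SQLite stands in for whatever JDBC-supported database is actually used.

```python
import socket
import sqlite3


def is_reachable(host, port, timeout=2.0):
    """Try a TCP connect; treat any socket error as 'host is down'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def update_active_flags(conn, check=is_reachable):
    """Set dns.active to '1' or '0' for every record, based on the check."""
    rows = conn.execute("SELECT id, host, port FROM dns").fetchall()
    for record_id, host, port in rows:
        active = "1" if check(host, port) else "0"
        conn.execute("UPDATE dns SET active = ? WHERE id = ?", (active, record_id))
    conn.commit()


# Demo with an in-memory database and a fake check, so no network is needed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dns (id INTEGER PRIMARY KEY, host TEXT, port INTEGER, active TEXT)"
)
conn.executemany(
    "INSERT INTO dns (host, port, active) VALUES (?, ?, '1')",
    [("10.0.0.1", 80), ("10.0.0.2", 80)],
)
update_active_flags(conn, check=lambda host, port: host != "10.0.0.2")
state = dict(conn.execute("SELECT host, active FROM dns"))
print(state)
```

Because the same pass also sets `active` back to '1', a node is re-enabled as soon as it answers again, which matches the "until they are back online" behaviour described above. A real daemon would call `update_active_flags` in a loop with a sleep between passes.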
Ways to code PHP
May 29th, 2008
Prologue: In the last few years, I wrote PHP code mostly in Notepad. At some point I felt a strong desire for syntax highlighting and a structured source tree, and I am still exploring the possibilities.
As the first real need was syntax highlighting, I ended up using SciTE first. It is pretty much an extended Notepad that can highlight PHP code via a syntax highlight definition that ships with it. However, SciTE has too many different half-working highlight definitions and does not seem to like wide pages, so scrolling to a far-right position in the middle of a line is somewhat painful. Maybe they have fixed this already, but the last version I used still had the problem. There is also no source tree for a project folder.
Afterwards I read about TextMate, a much praised editor for the Mac, and I liked the screenshots/screencasts (my first from the Ruby “A blog in 15 minutes” buzz). It comes with a source tree view and pretty syntax highlighting. Unfortunately it is a commercial, Mac-only product, and if you own a Mac, you almost certainly know about it already.
Not much later, E – Texteditor, which claimed to be TextMate for Windows, appeared and I gave it a try. It was not hard to set up with a really nice look, and it was a lot of fun to use. However, there is no PHP documentation available directly from the editor (ok, it’s not an IDE, it’s an editor). The first releases had some bugs and some still seem to be left, especially when quickly closing and reopening E, but I can recommend this editor to everyone who has $34.95, wants a very pretty editor and currently misses syntax highlighting and a source view. Imho E is the best Notepad replacement for editing PHP code available on Windows, and it can also do much more.
On Linux, gedit is an excellent tool that also ships with GNOME. It is basically the same as E but does not look as pretty. However, editing PHP files with it just works, and there has been some progress recently in making PHP highlighting more comfortable. You may need to tweak it a little to fit your needs, but because you are using Linux, I assume you know where to find more information on this :P.
Not long ago I also noticed Netbeans’ approach to PHP editing, and because I have been using Netbeans for a long time to write Java, trying it out was a must. At first glance it is just good old Netbeans as you know it from Java, but the Early Access for PHP edition features PHP only, including code completion and so on. Unfortunately, creating a new project from existing source folders is not yet trivial, and there are some settings that you will never need without appserver integration. However, it is possible to create a new project with the default settings, steal the “nbproject” directory from it, copy it to the root folder of your PHP app and edit “project.properties” and “project.xml” to fit your needs. For example:
C:\yourapp\nbproject\project.properties:
copy.src.files=false
copy.src.target=
include.path=\
${php.global.include.path}
index.file=index.php
source.encoding=UTF-8
src.dir=C:\\yourapp
url=http://localhost/yourapp_path_maybe
C:\yourapp\nbproject\project.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://www.netbeans.org/ns/project/1">
    <type>org.netbeans.modules.php.project</type>
    <configuration>
        <data xmlns="http://www.netbeans.org/ns/php-project/1">
            <name>YourappName</name>
        </data>
    </configuration>
</project>
Once that is done, opening the project (just the root folder of your app) presents you with your PHP application loaded inside Netbeans, with all your files in the source tree view. From now on you can edit your project easily and, if you set up a local webserver (take a look at Server2Go or Lighty2Go if you want a small local webserver/MySQL/PHP stack that just works), get live results. For serious PHP programming (yes, not everyone acknowledges this ^^) the free Netbeans for PHP is a nice tool. I am looking forward to the next releases. Until then I will use it for a while and hope that it will become simpler to use in the future, without all the bloat :).
There is also a PHP plugin available for Eclipse (it is also officially supported by Zend), but because I never was the typical Eclipse guy (don’t know, it is just not the way I like things to look / be done), I did not test it in much detail. As far as I have read, the Eclipse PDT will integrate somewhat deeper with PHP, featuring a debugger and easy integration with the Zend Framework. As I’m still a Notepad-era guy running his own framework code, this may not be exactly what I am looking for (remember, I also complained about the somewhat forced appserver integration in Netbeans, which is much the same thing). However, if you plan to use most of the features of a complete IDE supported by the Zend guys, Eclipse PDT is currently the way to go.
<?php die("So far…"); ?>
What about a “Cluster Filesystem”?
May 28th, 2008
I have been roaming the open source scene for some time now, looking for a cluster filesystem that fits my needs. I found out about Red Hat’s GFS and Apache’s Hadoop, which seems to be very similar to Google’s filesystem, and I read a lot about GlusterFS, which I did not attach much value to until today because there are no ready-made packages available and I was a bit suspicious of the name “GlusterFS” and the strange logo designs on its website. I also struggled with NFS for some time and had a look at MogileFS.
However, it turns out that GlusterFS is exactly what I need. Some thoughts about the different filesystems I read about:
Hadoop, which is written in Java, is made for very large files in heavy computing tasks. It is not made for small files, and development of a FUSE driver does not seem to be a high priority. It also comes with a grid computing engine and is, as said, created for processing large amounts of data (e.g. big search engines, or large datastores that do not change for a long time, such as videos) rather than for small shared datastores. Therefore it is not an option.
I used NFS not long ago (for about a year) to distribute a single data store to all of our servers, but as far as I can tell its performance is bad. There are so many absurdities in NFS that make it imho practically unusable in production environments. Our servers are located in different datacenters hosted by the provider of our choice, and every time there was even minimal data loss on the link between the datacenters, NFS ran amok, even when configured to use TCP. Another point was that I could not load balance the NFS server across different machines, because I did not want to use DRBD on the partitions supplied by the hoster’s default Debian installation, in order to keep the option of getting support after heavy failures. NFS works to some extent, but with more than 3 servers it starts to turn into a performance bottleneck and a general headache.
I am currently just using rsync to distribute the webserver root folder to all nodes. This may be the simplest way, but it is not meant to be a long-term solution because the update rate is bad and gets even worse with every node added to the cluster. It is also impossible to modify file contents or to add/remove files from anywhere other than the server: the nodes must operate read-only.
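The rsync approach amounts to a loop like the following Python sketch. The node names and paths are placeholders, and the exact options are an assumption rather than the setup actually used here; `-a`, `-z` and `--delete` are standard rsync options.

```python
import subprocess

# Hypothetical node names and paths, for illustration only.
NODES = ["node1.example.com", "node2.example.com"]
SOURCE = "/var/www/"
TARGET = "/var/www/"


def build_rsync_command(node, source=SOURCE, target=TARGET):
    """Build the rsync call that mirrors the web root to one node."""
    # -a: archive mode, -z: compress, --delete: drop files removed on the master
    return ["rsync", "-az", "--delete", source, f"{node}:{target}"]


def push_to_all(nodes, run=subprocess.run):
    """Push to every node in turn; one rsync run per node."""
    for node in nodes:
        run(build_rsync_command(node), check=True)
```

Calling `push_to_all(NODES)` runs one rsync per node, one after another, which shows why the update rate degrades as nodes are added. The `--delete` option also makes the one-way nature explicit: anything changed on a node is overwritten on the next push.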
GFS turns out to be more of a package of utilities for handling requests from different servers to a commonly shared storage. Because I need to build the infrastructure for our servers on top of default-configured cheap Linux boxes, this is not an option yet.
MogileFS is a filesystem living in userspace that is able to balance files between different storage nodes (e.g. 3 copies per file). Unfortunately it is written in Perl (does this perform?) and FUSE support seems to be an initially unintended byproduct at a very early stage. MogileFS is divided into different types of modules: the trackers, which hold the namespace inside a relational database, and the storage nodes. I am not convinced that MogileFS will scale and be easy to use as a general filesystem with FUSE, but the idea behind MogileFS as an automatically balancing storage network is absolutely great.
GlusterFS (GNU Cluster Filesystem) is completely distributed with no single point of failure and as easy to maintain as NFS, but it has some major benefits when it comes to webserver clusters. It is possible to set up some boxes as the underlying datastore servers, with automatic file replication done on the client or the server side. No special filesystem is required to run it, because it just sits on the already existing partitions (ext3 in my case) and exports directories in a similar manner to NFS. It also allows adding clients to the cluster without copying configuration files around, because the servers are able to send the required configuration to connecting clients. However, there are no official Debian packages available yet to automate the installation, and you will need a patched (and newer) FUSE kernel module (at least on Debian etch) to get it to work and to support distributed flock() calls (it will not configure the fuse client module against the default 2.5.bla FUSE in etch) [Update: Found some Debian packages released by the GlusterFS guys on their homepage - may be worth a try; Update: These packages seem to be too old, but the maintainer has been informed - use the source instead: rpmbuild it (rpmbuild -bb glusterfs.spec) and convert it to deb (alien glusterfs-1.3.9.rpm, dpkg -i glusterfs_1.3.9.deb) but note that you will also need a recent FUSE]. It is also possible to automatically replicate files of specific types across big clusters (e.g. all *.jpg files replicated to at least 3 datastores with the unify translator). I’m quite new to it and have only run it inside virtual boxes so far, but as far as I can tell it will scale easily by nature and provide a good infrastructure against data loss. I really wonder why GlusterFS has not got more attention yet.
There are other clustered filesystems available for such clusters, too, but most of them seem to be outdated or are still in an early stage. If you know about another filesystem for my needs, feel free to comment and I’ll give it a look. So far I will try out GlusterFS in a real environment very soon if I do not find something better.
(P.S: Yes, technically versed people will say that a cluster filesystem like I’m describing does not ship with a functional data store, but I will use this term because this is what most people expect from it, including me.)
Preparing The Switch
May 22nd, 2008
I’m about to switch to Linux now – definitely. I have been using Debian on our servers for a couple of years, and because I have done some heavy development in the last few months that would have been so much easier on some Linux derivative, I think that now is the time.
Like most people who sometimes need to switch between both worlds, I installed a fresh copy of the newest Linux distributions from time to time in the past, but I could not warm to it in the end because it was impossible to play games (yeah, I’m somewhat older now ;)) and to run some tools that are essential (at least for me). In the last few weeks I once more took notice of the ongoing efforts in the Linux scene to make Linux a first choice for desktops too, including the steadily progressing Wine Windows layer. So I’ll go for Ubuntu now, as it seems to be the most popular desktop Linux available and is a perfect match for my humble Debian experience. I know, Debian could do the job too, but it might be easier for me to count on the growing Ubuntu desktop community. Of course, my Windows OS will stay for testing and gaming :).
So then, I’ll finish backing up my stuff – wish me luck that I’ll go through it once and for all without killing my partition table :).
A few hours later: Guess what… ^^
The current result of my efforts is a corrupted, previously half-filled partition of 400 GB in total, because my computer just turned off for absolutely no reason while resizing / moving it, and the partitioning program I used lacks any kind of protection against this. Ok, I had backed up all the really important stuff, but this definitely dampened my initial enthusiasm. All attempts to continue with rescue disks etc. did not work out, so it will take a while until I have restored all the data. However, I have already located the old MFT and I hope that not too much damage was done, so that I can reanimate it with most of the files still intact. If the program is not too idiotic, it tried to move the files one by one while updating the previous MFT, so that there is at most 1 corrupted file – but I doubt that this piece of s…oftware was clever enough ;). The rescue process will run for 22 hours now, so if you do not hear from me for a while, you know what I am waiting for…cheers!
Another few hours later:
I managed to restore the partition table with an excellent tool named TestDisk. Vista seems to be somewhat damaged afterwards, but most of the contents seem to be still intact. I’ll head to my favorite computer parts store later and buy a bigger external disk so that I can kill the damaged partition entirely once all the data is moved. Nice job, TestDisk! For all of you considering buying a commercial partition management tool: buy it if you like, but never ever use it if you have no idea how to get your data back on failure! :)
I feel lucky!
Ok then, recovery worked out pretty well in the end, I have already backed up some more medium-important data, and chkdsk fixed the remaining issues. Call me crazy, but I’m currently retrying the resize and move process with a newer version of the partitioning app that screwed it up before :). The only change is that I’m running it on the same partition, marked as inactive, from within Vista instead of using the ugly DOS-look-on-restart interface ^^.
Q.E.D.
Guess what…again ^^. “Do you really want to remove XY from your computer? Yes/No”. But I got a fresh big external disk now. Think it’ll be a lot of fun to copy 200 GB via USB at effectively about 80GB/h ;).
Finally…
I am on Ubuntu now. However, my Vista partition is dead and I need to install a fresh one later. The hard part was to make my Intel fake RAID, which claimed to be a real hardware RAID, work somehow using the experimental dmraid package. So then, time to get some sleep.
Added Browserpark
Because Vista was killed at some point in the process, I decided to switch back to XP for my Windows work. One of the first things to do was to set up a fresh browser park for cross-browser development again, and as far as I can tell, I have all the major browsers aboard – IE 5, IE 5.5, IE 6, IE 7, Firefox 2, Firefox 3, Opera 9 and Safari 3. Unfortunately I could not get any version of IE running on Wine inside Ubuntu, d’oh - it just crashes.
