Not just a bug
August 25th, 2009
Today I moved our main MySQL database from one server to a fresh new one and, of course, had to configure the new MySQL daemon for heavy load and much more memory, like I always do.
Nothing exotic so far. However, MySQL (5.0, Debian Lenny) ships with a default InnoDB setup and empty InnoDB data files, and this is now the second time that the MySQL daemon silently disabled the InnoDB engine after I changed the configuration to something usable. The only way to solve this is to delete all the empty default InnoDB files from /var/lib/mysql and restart the database server. From then on, InnoDB shows up in “SHOW ENGINES” again.
But the real catch is this: when importing InnoDB tables, there is no error, no warning, no notice that InnoDB is disabled. All tables silently fall back to MyISAM, which kills every single foreign key and ignores the tweaked InnoDB settings (because there is no InnoDB engine to apply them to). What a mess.
As you can guess, I will have lots of fun over the next few days switching hundreds of tables back to InnoDB and recreating every single foreign key after tidying up the unreferenced mess. Hooray!
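Switching the tables back can at least be scripted. The following is a minimal sketch, not what was actually run on this server: it assumes the table list comes from a query such as SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES (available since MySQL 5.0), and the function name and sample rows are made up for illustration. It converts every non-InnoDB table, so tables that are MyISAM on purpose would have to be filtered out first, and the foreign keys still need to be recreated separately.

```python
def innodb_conversion_statements(rows, skip_schemas=("mysql", "information_schema")):
    """Build ALTER TABLE statements for all tables that are not InnoDB.

    rows: iterable of (schema, table, engine) tuples, e.g. the result of
          SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES
    """
    def quote(identifier):
        # Backtick-quote an identifier, doubling embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    statements = []
    for schema, table, engine in rows:
        if schema in skip_schemas:
            continue  # leave the system schemas alone
        if engine is None or engine.lower() == "innodb":
            continue  # views report NULL; InnoDB tables need no change
        statements.append(
            "ALTER TABLE %s.%s ENGINE=InnoDB;" % (quote(schema), quote(table))
        )
    return statements


# Hypothetical sample rows, only to show the shape of the input.
sample = [
    ("shop", "orders", "MyISAM"),
    ("shop", "users", "InnoDB"),
    ("mysql", "user", "MyISAM"),
]
for statement in innodb_conversion_statements(sample):
    print(statement)
```

Running the same query right after an import is also a cheap way to notice the silent fallback before it has spread to hundreds of tables.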
Burning Varnish at the stake
August 22nd, 2009
I have to revise everything I said about Varnish before. By now I have lost count of how many times I had to restart my instances in the last few weeks, and especially today. While packet loss keeps increasing inside the Hetzner network, Varnish is turning more and more into a real problem, randomly refusing to serve images with increasing frequency.
I used Varnish until today to cache images hosted on S3, but I am now so thoroughly disappointed (mostly because our users are now also unhappy with us just because of Varnish) that I decided to go back to good ol’ lighttpd with a proxy and caching solution I wrote in PHP (like I did before). The initial reason to switch to Varnish was that I was looking for an uncomplicated way of proxying the images to save bandwidth costs without having to touch the servers every week like before. Unfortunately, I ended up touching them twice a day just to restart Varnish. I don’t even dare to think about the traffic peaks this caused and all the hours I lost logging into several servers.
I guess this is a clear case of “if you don’t do everything yourself”. So I completely rewrote my own caching solution, made it completely independent of the backend systems and embedded a SQLite database in every instance, so that it can clean itself up once a day via a cron job and generate some basic statistics. -> “Minimal Image Proxy 2” was born and is already running on our image servers. It’s 3.2 KB zipped.
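To illustrate the idea of a self-cleaning cache index, here is a minimal sketch in Python. It is not the PHP code of Minimal Image Proxy 2; the table layout, function names and the age-based cleanup rule are assumptions made purely for illustration.

```python
import sqlite3
import time


def open_index(path=":memory:"):
    """Open (or create) the cache index; one row per cached file."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        " key TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " last_hit INTEGER NOT NULL,"
        " hits INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def record_hit(db, key, size, now=None):
    """Note that a cached file was served (insert it on first sight)."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "UPDATE cache SET last_hit = ?, hits = hits + 1 WHERE key = ?", (now, key)
    )
    if cur.rowcount == 0:
        db.execute(
            "INSERT INTO cache (key, size, last_hit, hits) VALUES (?, ?, ?, 1)",
            (key, size, now),
        )
    db.commit()


def cleanup(db, max_age_seconds, now=None):
    """Drop entries not served recently; returns the keys whose files can be deleted."""
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_seconds
    stale = [row[0] for row in db.execute(
        "SELECT key FROM cache WHERE last_hit < ? ORDER BY key", (cutoff,))]
    db.execute("DELETE FROM cache WHERE last_hit < ?", (cutoff,))
    db.commit()
    return stale


def stats(db):
    """Basic statistics: number of files, total bytes, total hits."""
    files, size, hits = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(size), 0), COALESCE(SUM(hits), 0) FROM cache"
    ).fetchone()
    return {"files": files, "bytes": size, "hits": hits}
```

A daily cron job would then call cleanup() with something like one week in seconds and remove the returned files from disk; because the index lives next to the cache, no backend system has to be involved.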
Sigh. I am really looking forward to finally getting some peace of mind and significantly lower traffic bills from Amazon. And it definitely feels really good again to replace troublemaking programs with a hand-crafted, beautiful piece of specialized software.
Still LB: You live and learn…
December 6th, 2008
It seems that DNS is not the cure I hoped for. I should have thought about this earlier; my insight comes a little late ;).
Explanation:
The replies that our own nameservers generate are reprocessed by the internet providers’ resolvers either way, and even with a TTL of 0 some ISPs seem to cache anyway. Because of this, it ultimately does not matter how small the timeouts are or which loadbalancing scheme our nameservers use -> the ISPs’ caches understand round-robin only and will in some cases cache for at least a minute or so anyway. Of course it is possible to distribute the load across multiple servers and to implement basic failover, and this is fine, but you cannot weight the different servers. For example, my provider filters out all duplicate A records, so a weighting like 5xA, 2xB, 1xC is not possible. The aux config value in MyDNS is also useless for the same reason: ISP caches don’t even know the aux values and do mindless round-robin on the unweighted data only.
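The effect of the duplicate filtering can be shown with a small simulation. This is only a sketch under simplifying assumptions (a resolver that rotates strictly round-robin over whatever records it holds); the function names are made up.

```python
from collections import Counter
from itertools import cycle, islice


def traffic_share(records, requests):
    """Share of requests per server when answers rotate round-robin over records."""
    hits = Counter(islice(cycle(records), requests))
    return {server: hits[server] / requests for server in sorted(hits)}


def dedupe(records):
    """What a cache that drops duplicate A records leaves behind (order kept)."""
    return list(dict.fromkeys(records))


# The intended weighting from the text: 5xA, 2xB, 1xC.
weighted = ["A"] * 5 + ["B"] * 2 + ["C"]

print(traffic_share(weighted, 8000))          # what our nameserver intends
print(traffic_share(dedupe(weighted), 9000))  # what is left after the ISP cache
```

With the duplicates intact, A gets five eighths of the requests; once they are filtered out, every server gets the same third, no matter what the authoritative nameserver is configured to do.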
As a result of this I am trying LVS now, but I am pretty sure that our hosting company will block the response packets, which is why I did not try it before. I decided to go for LVS with tunneling and quickly set up a test environment. Currently I am not sure whether I am doing something wrong or whether the theory is correct that response packets (which are sent on behalf of other physical machines) are blocked by the routers/switch ports.
If all this does not work, I am forced, against my will and better judgment, back into the HTTP proxy/FastCGI backend SPOF stone-age cave. Depressing.
So…given the following facts….does anybody know another good hosting company where customer wishes have at least a minimal chance to come true? :-(
- HTTP proxies become uneconomical when it comes to more than X TB traffic/month per machine
- FastCGI backends suck because you are permanently fixing them (and this includes point 1)
- I have no money for a double pack of highly available hardware loadbalancers
- Even if I had, I have no idea where to locate them for small money (mostly because of traffic and the need to place all real servers in the same network…)
- DNS is not the cure either (see above)
- Moreover I have no money to rent a nice cluster at Hosteurope or similar :-)
- At Hetzner it seems to be impossible to get some sort of virtual IP or routing exception set up, so you cannot do anything even if you have the knowledge
- Apart from this Hetzner seems to be the best/keen hosting company available atm
- A fortiori I have no money for a leased dedicated line…and of course no location for this ;-)
Heh, it’s enough to drive you up the wall. Or am I just expecting too much for too little money? It could be so simple :-).
DNS Loadbalancing & Failover?
November 25th, 2008
For now I’m looking for an easy way to set up DNS loadbalancing with basic failover capabilities. The first thing you find on big G is MyDNS, an ageing DNS server that serves DNS records from a MySQL database. Setup is easily done the Debian way, adding records with phpMyAdmin takes its time but is no big problem, and setting up a secondary NS is no big deal either when you simply use MySQL replication. The only thing I don’t understand is the stupid weighting algorithm. It would have been so much better to implement something that is not unpredictable. Figuring this out looks like it will be fun, hah. Anyway, I have already set up a primary/secondary DNS with the most important records for a bunch of zones, so I now need a domain hosting company that lets me define my own NS entries. We will see, and I will continue on this as soon as I have learned my lessons :-).
Update: Lesson 1: in-addr.arpa SOAs seem to require some kind of mask -> 0/7.111.133.213.in-addr.arpa. works but I have no idea what this is all about. I guess: 0-255 for fixed bits in the last byte? Or is it a real netmask? Heh? ^^
Update: Lesson 2: Ok, a big IN-ADDR.ARPA. SOA suffices and holds all the reverse records. So I don’t need to think about the mask anymore :-).
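For reference, a reverse record name is just the address with its bytes reversed plus the in-addr.arpa suffix, which is why one big IN-ADDR.ARPA zone can hold all of them. A small sketch using Python's standard ipaddress module follows; the helper names are made up, and 192.0.2.7 is a documentation address, not one of our servers. (The slash notation from Lesson 1 looks like the classless delegation naming of RFC 2317, but that is only a guess.)

```python
import ipaddress


def ptr_name(ip):
    """Name under which the PTR record for an IPv4 address is looked up."""
    return ipaddress.ip_address(ip).reverse_pointer


def belongs_to_zone(ip, zone="in-addr.arpa"):
    """True if the PTR name of ip falls inside the given reverse zone."""
    name = ptr_name(ip)
    return name == zone or name.endswith("." + zone)


print(ptr_name("192.0.2.7"))  # 7.2.0.192.in-addr.arpa
print(belongs_to_zone("192.0.2.7"))
print(belongs_to_zone("192.0.2.7", "2.0.192.in-addr.arpa"))
```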
Update: Lesson 3: MyDNS comes with an administration interface written in PHP. It’s broken in conjunction with PHP 5.2.0, but changing only two lines of code fixes this (change $this to something else on lines 2484 and 2485). Seems to be quite nice.
Another DNS server capable of using MySQL as a backend is PowerDNS, but this seems to be just too much for my needs. There is even commercial support available, which definitely turns me off. => If MyDNS does not work out, I’ll try it.
And yeah, you are right, I implemented the DNS protocol on my own on top of Apache Mina not long ago exactly for this task, but as there does not seem to be any interest in either AsyncFCGI or AsyncDNS, there is currently no motivation left to carry on the work. However, it has been some fun and you are still free to become interested ;-).
Happy coding!
Update: Ok, I have tried out both MyDNS and PowerDNS now. Both of them come with everything a good DNS server needs. In detail: PowerDNS comes with much more advanced stuff like multiple backends, master/slave configuration etc. and has nice documentation covering every aspect of the server. However, it is more complicated than MyDNS because of this. I like the administration interface shipped with MyDNS because it is really easy to understand, while PowerAdmin for PowerDNS will get frustrating over time. However, neither server really satisfies me, so I may continue my own stuff or at least create a modern admin UI for one of them when I find the time. Stay tuned! :-)
Added Varnish, just works.
October 17th, 2008
Until today I used Lighttpd to distribute the load to multiple FastCGI backends. This worked “ok” until now, but I’m really impressed by the performance gained by adding Varnish 2.0 (a high performance HTTP reverse proxy) in front of the web cluster. Additionally, this is much easier than always fixing the FastCGI stuff when it is broken again.
However, Varnish 2.0 was released just 2 days ago, so there are still some issues. The random director (directors are used to distribute the load to multiple backend webservers) currently seems to be broken, but round-robin is working fine and should do for a while. As a workaround for the random director’s .weight option, you can just add a backend more than once to the round-robin director.
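When a backend is listed several times, the order of the entries decides whether its extra requests arrive back to back or spread out. The sketch below computes one such cycle with a "smooth" weighted round-robin scheme; this is an illustration only, not something Varnish does itself, and the backend names and function name are made up. Weights are assumed to be positive integers.

```python
def spread_backends(weights):
    """Return one round-robin cycle in which each backend appears `weight`
    times, with the repeats spread out instead of listed back to back."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order = []
    for _ in range(total):
        best = None
        for name, weight in weights.items():
            current[name] += weight  # every backend earns its weight per step
            if best is None or current[name] > current[best]:
                best = name
        current[best] -= total       # the chosen backend pays for being picked
        order.append(best)
    return order


print(spread_backends({"web1": 2, "web2": 1}))  # ['web1', 'web2', 'web1']
```

The resulting list is the order in which the backends would be added to the round-robin director.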
There was also a problem with the varnishstat Munin plugin at Munin Exchange, so I uploaded a fixed version here and also added some installation instructions.
Do you wonder why Varnish seems not to cache anything?
For me (in fact, I think, for most people) the default Varnish configuration is a bit too restrictive when it comes to cookies. Requests that contain a “Cookie” request header will never be cached. As soon as the site sets a cookie, static files like images, scripts and styles are no longer cached either, because the cookie is sent along with every client request, even for static files. In most situations (for almost every site that requires a login or uses AdSense or similar and sets a cookie) this will render Varnish absolutely useless. A better approach is to let the dynamic pages always set a cookie, so that a “Set-Cookie” response header is created every time. Varnish will not cache a response when a “Set-Cookie” response header is present either, so we don’t need to care about the “Cookie” request header anymore. Knowing this, all we need to do is make Varnish ignore the cookies for static files, and this is easy:
# in vcl_recv
if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    unset req.http.Cookie;
}
This should work for 99% of all somewhat modern websites where users can log in, which is why I think the documentation should mention it more clearly. Also don’t forget to set etag.use-inode = "disable" in lighttpd.conf to keep the ETags in sync when using multiple backend servers. Anyhow, Varnish is great, so have fun! :)
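One thing to keep in mind with the pattern above: req.url includes the query string, and the match is case-sensitive. The following Python sketch applies the same regular expression to a few made-up URLs to show which requests would get their cookie stripped; the function name is hypothetical.

```python
import re

# Same pattern as in the vcl_recv rule above.
STATIC_FILE = re.compile(r"\.(png|gif|jpg|swf|css|js)$")


def cookie_is_stripped(url):
    """True if the rule would remove the Cookie header for this URL."""
    return STATIC_FILE.search(url) is not None


for url in ["/img/logo.png", "/css/site.css?v=3", "/IMG/LOGO.PNG", "/index.php"]:
    print(url, cookie_is_stripped(url))
```

Only the first URL matches. Versioned assets like /css/site.css?v=3 or upper-case extensions keep their cookie and stay uncached unless the pattern is extended accordingly.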
What about a “Cluster Filesystem”?
May 28th, 2008
I have been roaming the open source scene for some time now, looking for a cluster filesystem that fits my needs. I found out about Red Hat’s GFS and Apache’s Hadoop, which seems to be very similar to Google’s Filesystem, and I read a lot about GlusterFS, which I did not attach much value to until today because there are no ready-made packages available and I was a bit suspicious of the name “GlusterFS” and the strange logo designs on its website. I also struggled with NFS for some time and had a look at MogileFS.
However, it turns out that GlusterFS is exactly what I need. Some thoughts about the different filesystems I read about:
Hadoop, which is written in Java, is made for very large files inside heavy computing tasks. It is not made for small files, and development of a FUSE driver does not seem to be a high priority. It also comes with a grid computing engine and is, as said, created for processing large amounts of data (e.g. big search engines, or large datastores that do not change for a long time, such as videos) instead of small shared datastores. Therefore it is not an option.
I used NFS not long ago (for about a year) to distribute a single data store to all of our servers, but as far as I can tell it stands out for its bad performance. There are so many absurdities inside NFS that make it imho practically unusable in production environments. Our servers are located in different datacenters hosted by the provider of our choice, but every time there was even minimal data loss between the datacenters, NFS ran amok, even when configured to use TCP. Another point was that I was not able to load balance the NFS server across different machines, because I did not want to use DRBD on the partitions supplied by the hoster’s default Debian installation, in order to keep the possibility of getting support on heavy failures. NFS works to some extent, but used with more than 3 servers it starts to mutate into a performance bottleneck and a general headache.
I am currently just using rsync to distribute the webserver root folder to all nodes. This may be the simplest way, but it’s not meant to be a long-term solution because the update rate is bad and gets even worse with every node added to the cluster. There is also no possibility to modify file contents or to add/remove files from anywhere other than the server: the nodes must operate read-only.
GFS turns out to be more of a package of utilities for handling requests from different servers to a commonly shared storage. Because I need to build the infrastructure for our servers on top of default-configured cheap Linux boxes, this is not an option yet.
MogileFS is a filesystem living in userspace that is able to balance files between different storage nodes (e.g. 3 copies per file). Unfortunately it is written in Perl (does this perform?) and FUSE support seems to be an initially unintended byproduct in a very early stage. MogileFS is divided into different types of modules: the trackers, which hold the namespace inside a relational database, and the storage nodes. I am not convinced that MogileFS will scale and be easy to use as a general filesystem with FUSE, but the idea behind MogileFS as an automatically balancing storage network is absolutely great.
GlusterFS (GNU Cluster Filesystem) is completely distributed with no single point of failure and as easy to maintain as NFS, but has some major benefits when it comes to webserver clusters. It is possible to set up some boxes as the underlying datastore servers with automatic file replication done on the client or server side. No special filesystem is required to run it because it just sits on the already existing (ext3 in my case) partitions and exports directories in a similar manner to NFS. It also allows adding clients to the cluster without the need to copy configuration files around, because the servers are able to submit the required configuration to connecting clients. However, there are no official Debian packages available yet to automate the installation tasks, and you will need a patched (and newer) FUSE kernel module (at least on Debian etch) to get it to work and to support distributed flock() calls (it will not configure the fuse client module against the default 2.5.bla FUSE in etch) [Update: Found some Debian packages released by the GlusterFS guys on their homepage - may be worth a try; Update: These packages seem to be too old, but the maintainer has been informed - use the source instead: rpmbuild it (rpmbuild -bb glusterfs.spec) and convert it to deb (alien glusterfs-1.3.9.rpm, dpkg -i glusterfs_1.3.9.deb) but note that you will also need a recent FUSE]. There are also possibilities to automatically replicate files of specific types to big clusters (e.g. all *.jpg files replicated to at least 3 datastores with the unify translator). I’m quite new to it and have only run it inside virtual boxes so far, but as far as I can tell it will scale easily by nature and provide a good infrastructure against data loss. I really wonder why GlusterFS has not got more attention yet.
There are other clustered filesystems available for such clusters, too, but most of them seem to be outdated or are still in an early stage. If you know about another filesystem for my needs, feel free to comment and I’ll give it a look. For now, I will try out GlusterFS in a real environment very soon if I do not find something better.
(P.S: Yes, technically versed people will say that a cluster filesystem like I’m describing does not ship with a functional data store, but I will use this term because this is what most people expect from it, including me.)
