E-mail servers, database servers, file servers, and web servers struggle to cope
with LATENCY when accessing millions of small files under heavy concurrent load.
Low latency is a key factor in a good user experience.
With the Debian GNU/Linux 6.x kernel tuning described here, combined with
the hints from the previous article about XenServer I/O latency, filesystem
tuning, multipath configuration, and data storage, we reduced virtual block
device latency in our environment by a factor of eighty-four.
Thousands of concurrent synchronous writes of random small files, on
filesystems holding millions of them, while honoring write barriers for
POSIX compliance, stress the limits of the data storage, the filesystems,
kernel I/O, virtualization I/O, and the hardware data paths.
These are our findings and Linux kernel tunings on a Cyrus IMAP e-mail
server grid that has been in heavy-load production since November 2011.
First of all, do not blindly apply these settings to your servers.
Read the whole bibliography to understand what you are about to tune.
Understand what each parameter is doing.
Understand your server workload profile pattern.
Understand your hardware and storage performance and behaviour.
Understand your Fiber Channel SAN or iSCSI data path and network segment behaviour.
OUR environment has servers using Fiber Channel HBA multipath, connected
to an FC-disk WAFL data storage with very low latency and lots of ECC
NVRAM for cache. From a latency standpoint it almost behaves like a giant
SSD.
This is important: if you are going to “flush” your servers' kernel
buffers as fast as they can, your data storage and data path must be able
to cope with such high IOPS and throughput. You could even get
out-of-memory errors caused by high-latency storage hardware. The author
of the linked article went in the opposite direction in his tuning because
of HIS hardware environment.
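One simple way to see whether dirty pages are piling up faster than the storage can drain them is to watch the Dirty and Writeback counters in /proc/meminfo while the test load is running; a minimal sketch, assuming a standard procfs:
# print dirty and in-flight writeback memory once per second
while true; do grep -E '^(Dirty|Writeback):' /proc/meminfo; sleep 1; done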
Read the documentation for YOUR deployed kernel version. The links below are for the latest kernel.org version.
http://www.kernel.org/doc/Documentation/block/queue-sysfs.txt
http://www.kernel.org/doc/Documentation/block/deadline-iosched.txt
http://www.kernel.org/doc/Documentation/sysctl/vm.txt
http://www.nextre.it/oracledocs/ioscheduler_01.html
Also, the net result is only as fast as the slowest link in your application, hardware, and network stacks.
Carefully configure your application stack for low latency and performance.
In other articles we covered filesystem tuning, data storage LUN
configuration, and some aspects of Cyrus IMAP configuration. More
performance tuning topics will follow in future articles.
Carefully watch the iostat output during the tests, looking for bottlenecks.
Read the previous article's bibliography and test results about XenServer I/O latency.
Learn how to correlate the output columns with the kernel parameters
and your data storage behaviour. Read the iostat manpage.
iostat -dmxthN 1 /dev/xvd*
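A rough guide to the extended-output columns that matter most here, as printed by the sysstat iostat shipped with Debian 6 (column names can vary slightly between sysstat versions):
# avgqu-sz  average number of requests queued at the device
# await     average time (ms) a request spends waiting in queue plus being serviced
# svctm     average service time (ms) at the device itself
# %util     percentage of elapsed time the device had I/O requests in flight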
Configuration not persistent across reboots
Use these during your test, tuning and evaluation phase.
Logged as root:
sysctl vm.swappiness=10
sysctl vm.dirty_background_ratio=1
sysctl vm.dirty_expire_centisecs=500
sysctl vm.dirty_ratio=15
sysctl vm.dirty_writeback_centisecs=100
cat /sys/block/xvdb/queue/nr_requests
echo "4" > /sys/block/xvdb/queue/nr_requests
cat /sys/block/xvdb/queue/nr_requests
cat /sys/block/xvdb/queue/scheduler
echo deadline > /sys/block/xvdb/queue/scheduler
cat /sys/block/xvdb/queue/scheduler
cat /sys/block/xvdb/queue/iosched/front_merges
echo 1 >/sys/block/xvdb/queue/iosched/front_merges
cat /sys/block/xvdb/queue/iosched/front_merges
cat /sys/block/xvdb/queue/iosched/fifo_batch
echo 1 >/sys/block/xvdb/queue/iosched/fifo_batch
cat /sys/block/xvdb/queue/iosched/fifo_batch
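If the virtual machine has several virtual block devices, a loop such as this sketch (assuming they are all named /dev/xvd*) saves repeating the echo commands by hand; set the scheduler first so the deadline tunables exist under iosched/:
for q in /sys/block/xvd*/queue; do
    echo deadline > "$q/scheduler"        # switch to the deadline elevator
    echo 4 > "$q/nr_requests"             # shrink the request queue
    echo 1 > "$q/iosched/front_merges"    # keep front merges enabled
    echo 1 > "$q/iosched/fifo_batch"      # one request per batch for lowest latency
done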
A very aggressive configuration (watch your %sys during the test phase):
sysctl vm.dirty_expire_centisecs=50
sysctl vm.dirty_writeback_centisecs=10
sysctl vm.dirty_background_ratio=0
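To keep an eye on the %sys cost of such frequent writeback wakeups, plain vmstat is enough; the sy column is system CPU time and wa is I/O wait:
vmstat 1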
Debian GNU/Linux configuration persistent between reboots
You will need the “sysfsutils” package installed.
Append to your /etc/sysfs.conf
#AFM 20120523
block/xvdb/queue/nr_requests = 4
block/xvdb/queue/scheduler = deadline
block/xvdb/queue/iosched/front_merges = 1
block/xvdb/queue/iosched/fifo_batch = 1
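sysfsutils applies /etc/sysfs.conf at boot through its init script; to apply the new values right away (assuming the standard Debian init script path), run:
/etc/init.d/sysfsutils restart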
Append to your /etc/sysctl.conf
#AFM 20120523
vm.swappiness = 10
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 500
vm.dirty_ratio = 15
vm.dirty_writeback_centisecs = 100
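To load the new sysctl values without rebooting:
sysctl -p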
Some default values for the deadline I/O scheduler
For Debian GNU/Linux 6.x Squeeze, kernel 2.6.32
cat /sys/block/xvdb/queue/iosched/fifo_batch
16
cat /sys/block/xvdb/queue/iosched/front_merges
1
cat /sys/block/xvdb/queue/iosched/read_expire
500
cat /sys/block/xvdb/queue/iosched/write_expire
5000
cat /sys/block/xvdb/queue/iosched/writes_starved
2
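For reference, what these deadline knobs control, per the deadline-iosched.txt document linked above:
# fifo_batch      requests dispatched per batch; smaller favours latency over throughput
# front_merges    whether front merges are attempted (1 = enabled)
# read_expire     deadline in milliseconds after which a read is considered expired
# write_expire    deadline in milliseconds after which a write is considered expired
# writes_starved  how many read batches may be dispatched before servicing writes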
Bibliography
kswapd0, ksoftirqd0
http://forums.citrix.com/message.jspa?messageID=1511105 (kernel tuning for small files)
http://askubuntu.com/questions/7858/why-is-ksoftirqd-0-process-using-all-my-cpu
http://askubuntu.com/questions/73639/what-are-kswapd0-kworker-numnum-ksoftirqd-num
http://www.mail-archive.com/rhelv5-list@redhat.com/msg05345.html
http://lkml.indiana.edu/hypermail/linux/kernel/0506.0/1150.html
http://grokbase.com/t/centos.org/centos/2011/05/centos-kswapd-taking-100-cpu-with-no-swap-on-system/07g622yntfakij4lsruvrqxyjeom
http://answers.softpicks.net/answers/topic/High-CPU-load-with-kswapd-and-heavy-disk-I-O-1747240-1.htm
https://bugzilla.redhat.com/show_bug.cgi?id=115438
http://www.spinics.net/lists/linux-mm/msg16292.html
swappiness
http://feedblog.org/2006/09/27/stupid-linux-swap-tricks-with-swappiness/
http://www.linuxvox.com/2009/10/what-is-the-linux-kernel-parameter-vm-swappiness/
https://help.ubuntu.com/community/SwapFaq
http://kerneltrap.org/node/3000
http://lists.centos.org/pipermail/centos/2011-May/111433.html
https://bugzilla.redhat.com/show_bug.cgi?id=135312
https://bugs.launchpad.net/ubuntu/+bug/721896
https://bugzilla.kernel.org/show_bug.cgi?id=12309
http://us.generation-nt.com/kswapd-causing-giant-load-help-179939921.html
http://www.linuxquestions.org/questions/linux-server-73/100-swap-usage-with-loads-of-free-memory-849452/
https://www.centos.org/modules/newbb/viewtopic.php?topic_id=28555
https://bugzilla.redhat.com/show_bug.cgi?id=437202
http://kerneltrap.org/mailarchive/linux-kernel/2007/9/24/294490
http://www.linuxforums.org/forum/kernel/65380-what-does-kswapd0-do.html
Linux kernel tuning for low latency block devices and small files
http://www.cyrius.com/debian/nslu2/linux-on-flash.html
http://www.kernel.org/doc/Documentation/sysctl/vm.txt
http://serverfault.com/questions/126413/limit-linux-background-flush-dirty-pages
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
http://www.redhat.com/archives/linux-lvm/2004-February/msg00115.html
http://www.fccps.cz/download/adv/frr/hdd/hdd.html#tuning-intro
http://www.serverphorums.com/read.php?12,115991
http://www.r71.nl/kb/technical/102-tuning-journaling-file-systems
http://support.citrix.com/article/CTX127065
http://forums.citrix.com/thread.jspa?messageID=1487300
http://knol.google.com/k/linux-performance-tuning-and-measurement
http://kerneltrap.org/node/4462
http://www.mjmwired.net/kernel/Documentation/block/barrier.txt
http://en.wikipedia.org/wiki/Slab_allocation
http://www.ibm.com/developerworks/linux/library/l-linux-slab-allocator/
http://www.win.tue.nl/~aeb/linux/lk/lk-9.html
http://www.fccps.cz/download/adv/frr/hdd/hdd.html
http://makarevitch.org/rant/raid/
http://www.softpanorama.org/Admin/performance_monitoring.shtml
http://www.linuxinsight.com/proc_sys_vm_vfs_cache_pressure.html
http://rudd-o.com/en/linux-and-free-software/tales-from-responsivenessland-why-linux-feels-slow-and-how-to-fix-that
https://bbs.archlinux.org/viewtopic.php?id=40111
http://duopetalflower.blogspot.com/2009/11/tuning-ubuntu-910-karmic-for-speed-and.html
http://www.pythian.com/news/247/basic-io-monitoring-on-linux/
http://www.informit.com/articles/article.aspx?p=481867
http://bhavin.directi.com/iostat-and-disk-utilization-monitoring-nirvana/
http://www.xaprb.com/blog/2009/08/23/how-to-find-per-process-io-statistics-on-linux/
http://www.faqs.org/docs/linux_admin/buffer-cache.html
http://docs.vmd.citrix.com/XenServer/5.0.0/1.0/en_gb/reference.html#disk_qos
http://docs.vmd.citrix.com/XenServer/5.0.0/1.0/en_gb/reference.html#disk_scheduler
Linux kernel tuning for low latency
http://quantlabs.net/blog/2011/09/quant-development-choosing-and-configuring-linux-for-low-latency/
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2/html/Realtime_Tuning_Guide/chap-Realtime_Tuning_Guide-General_System_Tuning.html
http://www.redhat.com/solutions/financial/trade_performance.html
http://www.fizyka.umk.pl/~jkob/prace-mag/cfs-tuning.pdf
http://www.ibm.com/developerworks/linux/library/l-cfs/
http://www.softpanorama.org/Commercial_linuxes/performance_tuning.shtml
http://lse.sourceforge.net/io/aio.html (libaio)
http://www.redhat.com/f/pdf/summit/RedHatEnterprisePerfTuning.pdf
http://doc.opensuse.org/products/opensuse/openSUSE/opensuse-tuning/cha.tuning.network.html
http://www.psc.edu/networking/projects/tcptune
http://www.redbooks.ibm.com/redpapers/abstracts/redp4285.html
http://www.linuxvox.com/2009/11/what-is-the-linux-kernel-parameter-tcp_low_latency/
http://hackingnasdaq.blogspot.com/2010/01/myth-of-procsysnetipv4tcplowlatency.html
block devices
https://computing.llnl.gov/linux/ucrl-id-144213.html
http://dsstos.blogspot.com/2009/07/map-disk-block-devices-on-linux-host.html
ftp://ftp.ddnsupport.com/pub/da66ba4d/DCS9550_FC_LINUX_FAQ_v1.6.pdf
http://www.puschitz.com/TuningLinuxForOracle.shtml#IOScheduler
http://linux.web.cern.ch/linux/scientific5/docs/rhel/Online_Storage_Reconfiguration_Guide.pdf
http://www.monperrus.net/martin/scheduler+queue+size+and+resilience+to+heavy+IO
http://yoshinorimatsunobu.blogspot.com/2009/04/linux-io-scheduler-queue-size-and.html
http://talk.maemo.org/showthread.php?t=69973
http://www.redhat.com/magazine/008jun05/features/schedulers/
http://www.softpanorama.org/Commercial_linuxes/Performance_tuning/disk_subsystem_tuning.shtml
http://www.ibm.com/developerworks/wikis/display/LinuxP/Performance+Tuning
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Oracle_Tuning_Guide/RHELTuningandOptimizationforOracleV11.pdf
http://www.redhat.com/f/pdf/summit/RedHatEnterprisePerfTuning.pdf
http://www.redbooks.ibm.com/redpapers/pdfs/redp4285.pdf
Search the web for these whitepapers:
- Red Hat Enterprise Linux 5 IO Tuning Guide
- Performance Tuning Whitepaper for Red Hat Enterprise Linux 5.2
http://www.open-mag.com/features/Vol_98/QLogic/QLogic.htm
http://filedownloads.qlogic.com/files/ms/74234/R2_QLogic_Edits_SC_VMM2008_PRO_Solution_Brief.pdf
http://searchstorage.techtarget.com/tip/QLogics-2GB-s-switches-missed-the-kudos-they-deserved
http://www.softpanorama.org/Commercial_linuxes/Performance_tuning/troubleshooting_linux_performance_issues.shtml
http://www.petroskoutoupis.com/lib/PetrosKoutoupis_EN32009.pdf
http://wikis.sun.com/download/attachments/216507788/03-ST6000-Host-Attach-Linux-RalfWerner.pdf
http://kernelnewbies.org/Linux_2_6_32#head-efc263362c0c9f40eb83bb2d5a224057ebf7a59f
http://support.sas.com/resources/papers/proceedings11/72480_RHEL6_Tuning_Tips.pdf
http://www.ufsdump.org/papers/io-tuning.pdf
http://www.mjmwired.net/kernel/Documentation/block/deadline-iosched.txt
http://kerneltrap.org/node/431
http://docs.redhat.com/docs/pt-BR/Red_Hat_Enterprise_Linux/index.html
http://support.sas.com/resources/papers/proceedings10/FAQforStorageConfiguration.pdf
http://support.sas.com/rnd/papers/sgf07/sgf2007-iosubsystem.pdf
http://filedownloads.qlogic.com/files/driver/35491/README_qla2xxx2-6.htm
http://publib.boulder.ibm.com/infocenter/storwize/ic/index.jsp?topic=%2Fcom.ibm.storwize.v7000.doc%2Fsvc_linsetqdepth_1dcv4w.html
http://virtualizationinformation.com/docs/PerformanceTuningVI3withNote.pdf
http://www.redhat.com/f/pdf/RHEL4_EQL_ORA10G_v3.pdf (benchmarking comparison: iSCSI vs. Fiber Channel)
http://phx.corporate-ir.net/phoenix.zhtml?c=85695&p=irol-newsArticle&ID=211206&highlight= (QLA 2300 specifications)
http://www.datasheetarchive.com/qlogic%202300-datasheet.html (more QLA 2300 specifications)
http://www.nextre.it/oracledocs/ioscheduler_01.html
http://talk.maemo.org/showthread.php?t=70073
http://www.citrix.nl/site/resources/dynamic/partnerDocs/XSandNetAppstoragebestpractices_7.15.10.pdf (Citrix XenServer and NetApp Storage Best Practices)
http://docs.vmd.citrix.com/XenServer/5.5.0/1.0/en_gb/reference.htm
http://www.virtualistic.nl/archives/673 (The complete Q&A from the Citrix XEN Masterclass webinar)