8 memslap - Load testing and benchmarking tool for memcached
\ **memslap**\ is a load generation and benchmarking tool for memcached(1)
servers. It generates a configurable workload: threads, concurrency, connections,
run time, overwrite rate, miss rate, key size, value size, get/set proportion,
expected throughput, and so on. It also supports data
verification, expire-time verification, UDP, the binary protocol, the facebook test,
the replication test, multi-get, and reconnection.
Memslap manages network connections with libevent, as memcached does.
Each memslap thread is bound to a CPU core, the threads do not
communicate with each other, and each thread holds several socket
connections. Each connection keeps its own key size distribution,
value size distribution, and command distribution.
41 You can specify servers via the \ **--servers**\ option or via the
42 environment variable \ ``MEMCACHED_SERVERS``\ .
Memslap was developed for the following purposes:
53 Manages network connections with libevent asynchronously.
57 Set both TCP and UDP up to use non-blocking IO.
Improves parallelism: higher performance in multi-threaded environments.
65 Improves time efficiency: faster processing speed.
Generates keys and values more efficiently; key size distribution and value size distribution are configurable.
73 Supports get, multi-get, and set commands; command distribution is configurable.
77 Supports controllable miss rate and overwrite rate.
81 Supports data and expire-time verification.
85 Supports dumping statistic information periodically.
89 Supports thousands of TCP connections.
93 Supports binary protocol.
97 Supports facebook test (set with TCP and multi-get with UDP) and replication test.
107 Effective implementation of network.
108 ====================================
For memslap, both TCP and UDP use non-blocking network IO. All
network events are managed by libevent, as in memcached, and the network
module of memslap is similar to that of memcached. Libevent lets
memslap handle network traffic efficiently.
117 Effective implementation of multi-threads and concurrency
118 =========================================================
Memslap implements multi-threading much as memcached does.
Memslap creates one or more self-governed threads;
each thread is bound to one CPU core if the system supports setting CPU
affinity.
In addition, each thread has its own libevent instance to manage network events;
each thread has one or more self-governed concurrencies; and each
concurrency has one or more socket connections. Concurrencies do not
communicate with each other, even when they are in the same thread.
131 Memslap can create thousands of socket connections, and each
132 concurrency has tens of socket connections. Each concurrency randomly or
133 sequentially selects one socket connection from its socket connection pool
134 to run, so memslap can ensure each concurrency handles one
135 socket connection at any given time. Users can specify the number of
136 concurrency and socket connections of each concurrency according to their
140 Effective implementation of generating key and value
141 ====================================================
To improve time efficiency and space efficiency,
memslap creates a random character table with 10M characters. All the
suffixes of keys and values are generated from this random character table.

Memslap uses the offset in the character table and the length
of the string to identify a string, which saves a lot of memory.
Each key contains two parts, a prefix and a suffix. The prefix is a
uint64_t (8 bytes). To verify previously set data,
memslap needs to ensure that each key is unique, so it uses the prefix to identify
a key. The prefix cannot include illegal characters such as ‘\r’, ‘\n’,
‘\0’ and ‘ ’; memslap has an algorithm to ensure that.
Memslap doesn’t generate all the objects (key-value pairs) at
the beginning. It only generates enough objects to fill the task window
(default 10K objects) of each concurrency. Each object has the following
basic information: key prefix, key suffix offset in the character table, key
length, value offset in the character table, and value length.
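
The offset-and-length scheme described above can be sketched in a few lines of Python. This is only an illustration, not memslap's code: the table is shrunk to 10,000 characters, and the names ``CHAR_TABLE``, ``ObjectInfo``, ``key_of`` and ``value_of`` are hypothetical. Rendering the prefix as 16 hex digits is also an assumption; memslap uses its own algorithm to keep illegal characters out of the prefix.

```python
import random
import string
from dataclasses import dataclass

TABLE_SIZE = 10_000  # memslap uses 10M characters; kept small for this sketch
PREFIX_LEN = 16      # assumption: the 64-bit prefix is rendered as 16 hex digits

_rng = random.Random(42)  # fixed seed so the sketch is repeatable
CHAR_TABLE = "".join(
    _rng.choices(string.ascii_letters + string.digits, k=TABLE_SIZE)
)


@dataclass
class ObjectInfo:
    """Per-object bookkeeping: no key or value bytes are stored, only positions."""
    key_prefix: int    # unique 64-bit number identifying the key
    key_offset: int    # where the key suffix starts in CHAR_TABLE
    key_len: int       # total key length, prefix included
    value_offset: int  # where the value starts in CHAR_TABLE
    value_len: int


def key_of(obj):
    """Rebuild the key from its prefix and a slice of the shared table."""
    suffix_len = obj.key_len - PREFIX_LEN
    suffix = CHAR_TABLE[obj.key_offset:obj.key_offset + suffix_len]
    return f"{obj.key_prefix:016x}" + suffix


def value_of(obj):
    """Rebuild the value as a slice of the shared table."""
    return CHAR_TABLE[obj.value_offset:obj.value_offset + obj.value_len]


obj = ObjectInfo(key_prefix=1, key_offset=100, key_len=64,
                 value_offset=500, value_len=1024)
print(len(key_of(obj)), len(value_of(obj)))
```

Each object costs a handful of integers regardless of how large its value is, which is the memory saving the text refers to.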
While running, each concurrency sequentially or randomly selects an
object from the window to perform a set or get operation. At the same
time, each concurrency kicks objects out of its window and adds new object
168 Simple but useful task scheduling
169 =================================
Memslap uses libevent to schedule all the concurrencies of its
threads, and each concurrency schedules tasks based on its local task
window. Memslap assumes that if every concurrency keeps the same
key distribution, value distribution, and command distribution, then
memslap as a whole keeps those distributions too.
Each task window holds many objects, and each object stores its basic
information, such as key, value, and expire time. At any time,
the objects in the window keep the same fixed key and value
distribution. If an object is overwritten, its value is
updated. Memslap verifies the data or expire-time according to
the object information stored in the task window.
184 Libevent selects which concurrency to handle based on a specific network
185 event. Then the concurrency selects which command (get or set) to operate
186 based on the command distribution. If it needs to kick out an old object and
187 add a new object, in order to keep the same key and value distribution, the
188 new object must have the same key length and value length.
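
The replacement rule above (a new object inherits the key length and value length of the object it replaces) can be sketched as follows. This is a minimal illustration with hypothetical names; evicting the oldest object first is an assumption, since the text only says objects are kicked out of the window.

```python
from collections import Counter, deque


def replace_oldest(window, new_prefix):
    """Evict the oldest object and add one with the same key and value lengths."""
    _old_prefix, key_len, value_len = window.popleft()
    window.append((new_prefix, key_len, value_len))


def size_distribution(window):
    """Count how many objects share each (key length, value length) pair."""
    return Counter((key_len, value_len) for _prefix, key_len, value_len in window)


# Each object is (key prefix, key length, value length).
window = deque([(0, 64, 1024), (1, 128, 2048), (2, 64, 1024)])

before = size_distribution(window)
replace_oldest(window, new_prefix=3)
after = size_distribution(window)

print(before == after)  # the size distribution is unchanged by the swap
```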
If the memcached server has two cache layers (memory and SSD), running
memslap with different window sizes yields different cache
miss rates. If memslap adds enough objects to the windows at
the beginning and the memcached cache cannot hold all of the initialized
objects, memslap will get some objects from the second
cache layer, which means the first cache layer misses. The user can therefore
specify the window size to get the expected miss rate of the first cache
layer.
Useful implementation of multi-servers, UDP, TCP, multi-get and binary protocol
================================================================================
Because each thread is self-governed, memslap can assign
different threads to handle different memcached servers. This is one of
the ways in which memslap supports multiple servers. The only
limitation is that the number of servers cannot be greater than the number
of threads. The other way to support multiple servers is the replication
test, in which each concurrency has one socket connection to each memcached
server. In this mode, memslap can set objects on one
memcached server and get them from the other servers.
By default, memslap does single gets. If the user specifies the
multi-get option, memslap collects enough get commands, then
packs and sends them together.
Memslap supports both the ASCII protocol and the binary protocol,
but it uses the ASCII protocol by default.
Memslap uses TCP by default, but it also
supports UDP. Because UDP is unreliable, packets may be dropped or arrive
out of order. Memslap creates a memory buffer to handle
these problems: it tries to read all the response data of
one command from the server and reorders the response data. If some packets
are lost, a timeout ensures that incomplete responses are
discarded and the next command is sent.
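
The buffering and reordering described above can be sketched in Python. The 8-byte frame header (request ID, sequence number, total datagrams, reserved, all 16-bit big-endian) follows the memcached UDP protocol; the ``UdpReassembler`` class and its method names are hypothetical and are not memslap's actual buffer implementation.

```python
import struct

# memcached UDP frame header: request id, sequence number,
# total datagrams in the message, reserved (four 16-bit big-endian fields).
HEADER = struct.Struct("!HHHH")


class UdpReassembler:
    """Buffer datagrams per request and emit the payload once complete."""

    def __init__(self):
        self._pending = {}  # request id -> {sequence number: payload}

    def feed(self, datagram):
        """Return the full response once every fragment has arrived, else None."""
        request_id, seq, total, _reserved = HEADER.unpack_from(datagram)
        parts = self._pending.setdefault(request_id, {})
        parts[seq] = datagram[HEADER.size:]
        if len(parts) < total:
            return None
        del self._pending[request_id]
        # Join fragments in sequence order, whatever order they arrived in.
        return b"".join(parts[i] for i in range(total))

    def discard(self, request_id):
        """Drop a partial response, for example after a timeout."""
        self._pending.pop(request_id, None)


reassembler = UdpReassembler()
first = HEADER.pack(7, 0, 2, 0) + b"VALUE "
second = HEADER.pack(7, 1, 2, 0) + b"END\r\n"

print(reassembler.feed(second))  # fragment 1 arrives first: nothing to emit yet
print(reassembler.feed(first))   # fragment 0 completes the response
```

Calling ``discard`` when a timer fires corresponds to the timeout path in the text: the incomplete response is dropped so the next command can be sent.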
234 Below are some usage samples:
237 memslap -s 127.0.0.1:11211 -S 5s
241 memslap -s 127.0.0.1:11211 -t 2m -v 0.2 -e 0.05 -b
245 memslap -s 127.0.0.1:11211 -F config -t 2m -w 40k -S 20s -o 0.2
249 memslap -s 127.0.0.1:11211 -F config -t 2m -T 4 -c 128 -d 20 -P 40k
253 memslap -s 127.0.0.1:11211 -F config -t 2m -d 50 -a -n 40
257 memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m
261 memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m -p 2
The user must specify at least one server to run memslap. The
rest of the parameters have default values, as shown below:
Thread number = 1                    Concurrency = 16
Run time = 600 seconds               Configuration file = NULL
Key size = 64                        Value size = 1024
Get/set = 9:1                        Window size = 10k
Execute number = 0                   Single get = true
Multi-get = false                    Number of sockets of each concurrency = 1
Reconnect = false                    Data verification = false
Expire-time verification = false     ASCII protocol = true
Binary protocol = false              Dumping statistic information
Overwrite proportion = 0%            UDP = false
TCP = true                           Limit throughput = false
Facebook test = false                Replication test = false
294 Key size, value size and command distribution.
295 ==============================================
All the distributions are read from the configuration file specified by the user
with the “--cfg_cmd” option. If the user does not specify a configuration file,
memslap will run with the default distribution (key size = 64,
value size = 1024, get/set = 9:1). For information on how to edit the
configuration file, refer to the “Configuration File” section.
304 The minimum key size is 16 bytes; the maximum key size is 250 bytes. The
305 precision of proportion is 0.001. The proportion of distribution will be
306 rounded to 3 decimal places.
The minimum value size is 1 byte; the maximum value size is 1M bytes. The
309 precision of proportion is 0.001. The proportion of distribution will be
310 rounded to 3 decimal places.
311 Currently, memslap only supports set and get commands. And it
312 supports 100% set and 100% get. For 100% get, it will preset some objects to
316 Multi-thread and concurrency
317 ============================
The high performance of memslap comes from its scheduling of
threads and concurrencies, so it is important to specify a proper
number of each. The default number of threads is 1; the default number of
concurrencies is 16. The user can use “--threads” and “--concurrency” to
specify these values.
If the system supports setting CPU affinity and the number of threads
specified by the user is greater than 1, memslap will try to
bind each thread to a different CPU core. So to get the best
performance from memslap, it is better to set the number of
threads equal to the number of CPU cores. The number of threads
can also be less than or greater than the number of CPU cores. Because
of an implementation limitation, the number of concurrencies should be
a multiple of the number of threads.
335 1. For 8 CPU cores system
339 --threads=2 --concurrency=128
341 --threads=8 --concurrency=128
343 --threads=8 --concurrency=256
345 --threads=12 --concurrency=144
347 2. For 16 CPU cores system
351 --threads=8 --concurrency=128
353 --threads=16 --concurrency=256
355 --threads=16 --concurrency=512
357 --threads=24 --concurrency=288
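
The thread and concurrency pairs listed above all satisfy the multiple-of-threads rule. A small Python check makes the arithmetic explicit; ``concurrency_per_thread`` is a hypothetical helper, not a memslap function.

```python
def concurrency_per_thread(threads, concurrency):
    """Return how many concurrencies each thread gets, or raise if uneven."""
    if threads < 1:
        raise ValueError("at least one thread is required")
    if concurrency % threads != 0:
        raise ValueError("concurrency should be a multiple of threads")
    return concurrency // threads


# Pairs taken from the 8-core and 16-core examples above.
examples = [(2, 128), (8, 128), (8, 256), (12, 144),
            (16, 256), (16, 512), (24, 288)]

for threads, concurrency in examples:
    per_thread = concurrency_per_thread(threads, concurrency)
    print(f"--threads={threads} --concurrency={concurrency}: "
          f"{per_thread} concurrencies per thread")
```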
Memslap performs very well when used to test the performance of memcached
servers. Most of the time, the bottleneck is the network or the server. If
for some reason the user wants to limit the performance of memslap, there
are two ways to do this:

Decrease the number of threads and concurrencies.

Use the “--tps” option that memslap provides to limit the throughput. This
option allows the user to get the expected throughput. For example, if the
maximum throughput is 50 kops/s for a specific configuration, you can specify
any throughput equal to or less than that maximum using the “--tps” option.
380 Most of the time, the user does not need to specify the window size. The
381 default window size is 10k. For Schooner Memcached, the user can specify
382 different window sizes to get different cache miss rates based on the test
383 case. Memslap supports cache miss rate between 0% and 100%.
384 If you use this utility to test the performance of Schooner Memcached, you
385 can specify a proper window size to get the expected cache miss rate. The
386 formula for calculating window size is as follows:
388 Assume that the key size is 128 bytes, and the value size is 2048 bytes, and
391 1. Small cache cache_size=1M, 100% cache miss (all data get from SSD).
396 (1). cache miss rate 0%
400 (2). cache miss rate 5%
406 (1). cache miss rate 0%
416 The formula for calculating window size for cache miss rate 0%:
418 cache_size / concurrency / (key_size + value_size) \* 0.5
420 The formula for calculating window size for cache miss rate 5%:
422 cache_size / concurrency / (key_size + value_size) \* 0.7
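
The two formulas above can be evaluated with a short Python helper. The key size (128 bytes) and value size (2048 bytes) come from the text; the cache size of 1 GiB and the concurrency of 128 are assumptions chosen only to make the example concrete, and ``window_size`` is a hypothetical name.

```python
def window_size(cache_size, concurrency, key_size, value_size, factor):
    """cache_size / concurrency / (key_size + value_size) * factor, rounded down."""
    return int(cache_size / concurrency / (key_size + value_size) * factor)


CACHE_SIZE = 1 << 30   # assumption: 1 GiB cache
CONCURRENCY = 128      # assumption: 128 concurrencies
KEY_SIZE = 128         # from the text
VALUE_SIZE = 2048      # from the text

# Factor 0.5 targets a 0% cache miss rate, factor 0.7 targets 5%.
win_miss_0 = window_size(CACHE_SIZE, CONCURRENCY, KEY_SIZE, VALUE_SIZE, 0.5)
win_miss_5 = window_size(CACHE_SIZE, CONCURRENCY, KEY_SIZE, VALUE_SIZE, 0.7)

print(win_miss_0)  # 1927
print(win_miss_5)  # 2698
```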
429 Memslap supports both data verification and expire-time
430 verification. The user can use "--verify=" or "-v" to specify the proportion
431 of data verification. In theory, it supports 100% data verification. The
432 user can use "--exp_verify=" or "-e" to specify the proportion of
433 expire-time verification. In theory, it supports 100% expire-time
434 verification. Specify the "--verbose" options to get more detailed error
For example, --exp_verify=0.01 --verify=0.1 means that 1% of the objects
are set with an expire-time and 10% of the objects gotten will be verified. If the
439 objects are gotten, memslap will verify the expire-time and
443 multi-servers and multi-clients
444 ===============================
Memslap supports multiple servers through its self-governed threads.
The one limitation is that the number of servers cannot be greater than the
number of threads. Memslap assigns at least one thread to each
server. The user can use the "--servers=" or "-s" option to specify
multiple servers.
455 --servers=10.1.1.1:11211,10.1.1.2:11212,10.1.1.3:11213 --threads=6 --concurrency=36
The above command means that there are 6 threads, each with 6
concurrencies, and that threads 0 and 3 handle server 0 (10.1.1.1); threads 1
and 4 handle server 1 (10.1.1.2); and threads 2 and 5 handle server 2
(10.1.1.3).
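
The thread-to-server assignment in this example is consistent with a simple round-robin rule (thread index modulo server count). The Python sketch below reproduces the mapping under that assumption; it is an illustration that matches the example, not a statement of how memslap is implemented, and ``server_for_thread`` is a hypothetical name.

```python
def server_for_thread(thread_index, server_count):
    """Assumed round-robin rule: thread i handles server i mod server_count."""
    return thread_index % server_count


servers = ["10.1.1.1:11211", "10.1.1.2:11212", "10.1.1.3:11213"]
threads = 6
concurrency = 36

assignment = {
    t: servers[server_for_thread(t, len(servers))] for t in range(threads)
}

for thread, server in assignment.items():
    print(f"thread {thread} -> {server}")
print(f"{concurrency // threads} concurrencies per thread")
```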
All the threads and concurrencies in memslap are self-governed.

So is memslap itself. The user can start several
memslap instances, and can run memslap on different client
machines to communicate with the same memcached server at the same time. It is
recommended that the user start the memslap instances on different
machines with the same configuration.
471 Run with execute number mode or time mode
472 =========================================
By default, memslap runs in time mode. The default run time
is 10 minutes. When the time is up, memslap will exit. Do not
477 specify both execute number mode and time mode at the same time; just
482 --time=30s (It means the test will run 30 seconds.)
484 --execute_number=100000 (It means that after running 100000 commands, the test will exit.)
487 Dump statistic information periodically.
488 ========================================
491 The user can use "--stat_freq=" or "-S" to specify the frequency.
497 Memslap will dump the statistics of the commands (get and set) at the frequency of every 20
500 For more information on the format of dumping statistic information, refer to “Format of Output” section.
The user can use "--division=" or "-d" to specify the number of keys per multi-get.
By default, memslap does single gets with TCP. Memslap also supports data
verification and expire-time verification for multi-get.
Memslap supports multi-get with both TCP and UDP. Because
the ASCII protocol and the binary protocol are implemented differently,
multi-get behaves differently in each. For the ASCII protocol,
memslap sends a single “multi-get” command to the server. For the
binary protocol, memslap sends several single get commands
together as a “multi-get” to the server.
Memslap supports both UDP and TCP. For TCP,
memslap by default does not reconnect to the memcached server if socket
connections are lost. If all the socket connections are lost or the memcached
server crashes, memslap will exit. If the user specifies the “--reconnect”
option, memslap reconnects socket connections when they are lost.
529 User can use “--udp” to enable the UDP feature, but UDP comes with some
532 UDP cannot set data more than 1400 bytes.
534 UDP is not supported by the binary protocol because the binary protocol of
535 memcached does not support that.
537 UDP doesn’t support reconnection.
544 Set data with TCP and multi-get with UDP. Specify the following options:
546 "--facebook --division=50"
548 If you want to create thousands of TCP connections, specify the
550 "--conn_sock=" option.
552 For example: --facebook --division=50 --conn_sock=200
The above command means that memslap will run the facebook test:
each concurrency has 200 TCP socket connections and one UDP socket.

Memslap sets objects over the TCP sockets and multi-gets 50
objects at a time over the UDP socket.
If you specify "--division=50", the key size must be less than 25 bytes
because the UDP packet size is 1400 bytes.
568 For replication test, the user must specify at least two memcached servers.
The user can use the “--rep_write=” option to enable this feature.

--servers=10.1.1.1:11211,10.1.1.2:11212 --rep_write=2
The above command means that there are 2 replication memcached servers.
Memslap will set objects to both server 0 and server 1, get
objects previously set to server 0 from server 1, and get objects
previously set to server 1 from server 0. If server 0 crashes,
memslap will only get objects from server 1. If server 0 comes
back to life, memslap will reconnect to server 0. If both
server 0 and server 1 crash, memslap will exit.
584 Supports thousands of TCP connections
585 =====================================
588 Start memslap with "--conn_sock=" or "-n" to enable this
589 feature. Make sure that your system can support opening thousands of files
590 and creating thousands of sockets. However, this feature does not support
591 reconnection if sockets disconnect.
595 --threads=8 --concurrency=128 --conn_sock=128
The above command means that memslap starts 8 threads, each
thread has 16 concurrencies, each concurrency has 128 TCP socket
connections, and the total number of TCP socket connections is 128 \* 128 =
16384.
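
The connection count in this example is plain arithmetic and can be checked with a few lines of Python. The helper name ``connection_plan`` is hypothetical.

```python
def connection_plan(threads, concurrency, conn_sock):
    """Return (concurrencies per thread, total TCP socket connections)."""
    per_thread = concurrency // threads
    total_connections = concurrency * conn_sock
    return per_thread, total_connections


# --threads=8 --concurrency=128 --conn_sock=128
per_thread, total_connections = connection_plan(8, 128, 128)
print(per_thread)         # 16
print(total_connections)  # 16384
```

Each connection needs a file descriptor, so the open-file limit of the system has to be above the total before a run like this can start.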
603 Supports binary protocol
604 ========================
607 Start memslap with "--binary" or "-B" options to enable this
608 feature. It supports all the above features except UDP, because the latest
609 memcached 1.3.3 does not implement binary UDP protocol.
Because memcached 1.3.3 doesn't implement the binary UDP protocol,
memslap does not support UDP with the binary protocol. In addition, memcached
1.3.3 does not support multi-get in the binary protocol. If you specify the
"--division=50" option, memslap just sends 50 get
commands together as a “multi-get” to the server.
627 This section describes the format of the configuration file. By default
628 when no configuration file is specified memslap reads the default
629 one located at ~/.memslap.cnf.
631 Below is a sample configuration file:
636 ***************************************************************************
637 #comments should start with '#'
639 #start_len end_len proportion
641 #key length range from start_len to end_len
642 #start_len must be equal to or greater than 16
643 #end_len must be equal to or less than 250
#start_len must be equal to or less than end_len
645 #memslap will generate keys according to the key range
646 #proportion: indicates keys generated from one range accounts for the total
649 #example1: key range 16~100 accounts for 80%
650 # key range 101~200 accounts for 10%
651 # key range 201~250 accounts for 10%
652 # total should be 1 (0.8+0.1+0.1 = 1)
658 #example2: all keys length are 128 bytes
664 #start_len end_len proportion
666 #value length range from start_len to end_len
667 #start_len must be equal to or greater than 1
668 #end_len must be equal to or less than 1M
#start_len must be equal to or less than end_len
670 #memslap will generate values according to the value range
#proportion: indicates values generated from one range accounts for the total generated values
674 #example1: value range 1~1000 accounts for 80%
675 # value range 1001~10000 accounts for 10%
676 # value range 10001~100000 accounts for 10%
677 # total should be 1 (0.8+0.1+0.1 = 1)
683 #example2: all value length are 128 bytes
689 #cmd_type cmd_proportion
691 #currently memslap only supports get and set command.
697 #example: set command accounts for 50%
698 # get command accounts for 50%
699 # total should be 1 (0.5+0.5 = 1)
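
The rules stated in the sample comments (length bounds, start_len no greater than end_len, proportions summing to 1) can be checked with a short Python sketch. The function name ``validate_ranges`` is hypothetical, treating 1M as 1048576 bytes is an assumption, and this is not the parser memslap uses.

```python
def validate_ranges(ranges, min_len, max_len):
    """Check (start_len, end_len, proportion) rows against the documented rules."""
    total = 0.0
    for start_len, end_len, proportion in ranges:
        if not (min_len <= start_len <= end_len <= max_len):
            raise ValueError(f"bad range {start_len}~{end_len}")
        total += round(proportion, 3)  # proportions have 0.001 precision
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"proportions sum to {total}, expected 1")
    return True


# example1 ranges from the sample comments
key_ranges = [(16, 100, 0.8), (101, 200, 0.1), (201, 250, 0.1)]
value_ranges = [(1, 1000, 0.8), (1001, 10000, 0.1), (10001, 100000, 0.1)]

keys_ok = validate_ranges(key_ranges, min_len=16, max_len=250)
values_ok = validate_ranges(value_ranges, min_len=1, max_len=1024 * 1024)
print(keys_ok, values_ok)
```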
715 At the beginning, memslap displays some configuration information as follows:
718 servers : 127.0.0.1:11211
738 set proportion: set_prop=0.10
742 get proportion: get_prop=0.90
753 The servers used by memslap.
759 The number of threads memslap runs with.
765 The number of concurrencies memslap runs with.
771 How long to run memslap.
777 The task window size of each concurrency.
783 The proportion of set command.
789 The proportion of get command.
793 The output of dynamic statistics is something like this:
798 ---------------------------------------------------------------------------------------------------------------------------------
800 Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us)
801 Avg(us) Std_dev Geo_dist
802 Period 5 345826 69165 65.3 0 27 2198 203
804 Global 20 1257935 62896 71.8 0 26 3791 224
809 Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us)
810 Avg(us) Std_dev Geo_dist
811 Period 5 38425 7685 7.3 0 42 628 240
813 Global 20 139780 6989 8.0 0 37 3790 253
818 Type Time(s) Ops TPS(ops/s) Net(M/s) Get_miss Min(us) Max(us)
819 Avg(us) Std_dev Geo_dist
820 Period 5 384252 76850 72.5 0 27 2198 207
822 Global 20 1397720 69886 79.7 0 26 3791 227
824 ---------------------------------------------------------------------------------------------------------------------------------
835 Statistics information of get command
841 Statistics information of set command
847 Statistics information of both get and set command
853 Result within a period
871 Throughput, operations/second
883 How many objects can’t be gotten
889 The minimum response time
895 The maximum response time
901 The average response time
907 Standard deviation of response time
913 Geometric distribution based on natural exponential function
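
In the sample rows shown earlier, the TPS column is consistent with Ops divided by Time(s), rounded down. The Python check below uses only the figures from that sample; the truncating division is an inference from those numbers, not a documented rule, and the names are hypothetical.

```python
# (row type, time in seconds, ops, reported TPS) copied from the sample output
samples = [
    ("Period", 5, 345826, 69165),
    ("Global", 20, 1257935, 62896),
    ("Period", 5, 38425, 7685),
    ("Global", 20, 139780, 6989),
    ("Period", 5, 384252, 76850),
    ("Global", 20, 1397720, 69886),
]


def tps(ops, seconds):
    """Operations per second, rounded down to a whole number."""
    return ops // seconds


mismatches = [row for row in samples if tps(row[2], row[1]) != row[3]]
print(mismatches)  # an empty list means every sample row agrees
```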
917 At the end, memslap will output something like this:
922 ---------------------------------------------------------------------------------------------------------------------------------
923 Get Statistics (1257956 events)
931 8: 484890 459823 12543 824
934 Set Statistics (139782 events)
942 8: 50784 65574 2064 167
945 Total Statistics (1397738 events)
953 8: 535674 525397 14607 991
963 written_bytes: 242516030
964 read_bytes: 1003702556
965 object_bytes: 152086080
970 Run time: 20.0s Ops: 1397754 TPS: 69817 Net_rate: 59.4M/s
971 ---------------------------------------------------------------------------------------------------------------------------------
982 Get statistics of response time
988 Set statistics of response time
994 Both get and set statistics of response time
1000 The accumulated and minimum response time
1006 The accumulated and maximum response time
1012 The accumulated and average response time
1018 Standard deviation of response time
1024 Geometric distribution based on logarithm 2
1030 Total get commands done
1036 Total set commands done
1042 How many objects can’t be gotten from server
How many objects needed verification but could not be gotten

How many objects have an inconsistent value
1060 How many objects are expired but we get them
1066 How many objects are unexpired but we can’t get them
How many UDP packets arrived out of order

How many UDP packets were lost

How many times a UDP timeout happened
1120 Throughput, operations/second
1126 The average rate of network
1138 List one or more servers to connect. Servers count must be less than
1139 threads count. e.g.: --servers=localhost:1234,localhost:11211
1142 Number of threads to startup, better equal to CPU numbers. Default 8.
1145 Number of concurrency to simulate with load. Default 128.
1148 Number of TCP socks per concurrency. Default 1.
1150 -x, --execute_number=
1151 Number of operations(get and set) to execute for the
1152 given test. Default 1000000.
1155 How long the test to run, suffix: s-seconds, m-minutes, h-hours,
1156 d-days e.g.: --time=2h.
1159 Load the configure file to get command,key and value distribution list.
1162 Task window size of each concurrency, suffix: K, M e.g.: --win_size=10k.
1166 Fixed length of value.
The proportion of data verification, e.g.: --verify=0.01
1172 Number of keys to multi-get once. Default 1, means single get.
Frequency of dumping statistic information. suffix: s-seconds,
m-minutes, e.g.: --stat_freq=10s.
1179 The proportion of objects with expire time, e.g.: --exp_verify=0.01.
1180 Default no object with expire time
1183 The proportion of objects need overwrite, e.g.: --overwrite=0.01.
1184 Default never overwrite object.
1187 Reconnect support, when connection is closed it will be reconnected.
1190 UDP support, default memslap uses TCP, TCP port and UDP port of
1191 server must be same.
1194 Whether it enables facebook test feature, set with TCP and multi-get with UDP.
1197 Whether it enables binary protocol. Default with ASCII protocol.
1200 Expected throughput, suffix: K, e.g.: --tps=10k.
1203 The first nth servers can write data, e.g.: --rep_write=2.
1206 Whether it outputs detailed information when verification fails.
1209 Display this message and then exit.
1212 Display the version of the application and then exit.
1220 memslap -s 127.0.0.1:11211 -S 5s
1222 memslap -s 127.0.0.1:11211 -t 2m -v 0.2 -e 0.05 -b
1224 memslap -s 127.0.0.1:11211 -F config -t 2m -w 40k -S 20s -o 0.2
1226 memslap -s 127.0.0.1:11211 -F config -t 2m -T 4 -c 128 -d 20 -P 40k
1228 memslap -s 127.0.0.1:11211 -F config -t 2m -d 50 -a -n 40
1230 memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m
1232 memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m -p 2
1240 To find out more information please check:
1241 `http://launchpad.org/libmemcached <http://launchpad.org/libmemcached>`_
Mingqiang Zhuang <mingqiangzhuang@hengtiansoft.com> (Schooner Technology)
1250 Brian Aker, <brian@tangent.org>
1258 memcached(1) libmemcached(3)