memslap - Load testing and benchmarking tool for memcached

B<memslap> is a load generation and benchmark tool for memcached(1)
servers. It generates a configurable workload: the number of threads,
concurrencies, and connections, the run time, overwrite rate, miss rate,
key size, value size, get/set proportion, expected throughput, and so on.
It also supports data verification, expire-time verification, UDP, the
binary protocol, the facebook test, the replication test, multi-get, and
reconnection.

Memslap manages network connections with libevent, as memcached does. Each
memslap thread is bound to a CPU core, the threads do not communicate with
each other, and each thread holds several socket connections. Each
connection keeps its own key size distribution, value size distribution,
and command distribution.

You can specify servers via the B<--servers> option or via the
environment variable C<MEMCACHED_SERVERS>.
Memslap was developed for the following purposes:

=item Manages network connections with libevent asynchronously.

=item Sets both TCP and UDP up to use non-blocking IO.

=item Improves parallelism: higher performance in multi-threaded environments.

=item Improves time efficiency: faster processing speed.

=item Generates keys and values more efficiently; key size distribution and value size distribution are configurable.

=item Supports get, multi-get, and set commands; command distribution is configurable.

=item Supports controllable miss rate and overwrite rate.

=item Supports data and expire-time verification.

=item Supports dumping statistics periodically.

=item Supports thousands of TCP connections.

=item Supports binary protocol.

=item Supports facebook test (set with TCP and multi-get with UDP) and replication test.
=head2 Effective implementation of network

For memslap, both TCP and UDP use non-blocking network IO. All network
events are managed by libevent, as in memcached, and the network module of
memslap is similar to memcached's. Libevent lets memslap handle network
events very efficiently.
=head2 Effective implementation of multi-threads and concurrency

Memslap implements multi-threading in a way similar to memcached. Memslap
creates one or more self-governed threads; each thread is bound to one CPU
core if the system supports setting CPU affinity.

In addition, each thread has its own libevent instance to manage network
events; each thread has one or more self-governed concurrencies; and each
concurrency has one or more socket connections. Concurrencies do not
communicate with each other, even when they are in the same thread.

Memslap can create thousands of socket connections, and each concurrency
can have tens of socket connections. Each concurrency randomly or
sequentially selects one socket connection from its socket connection pool
to run, so each concurrency handles one socket connection at any given
time. Users can specify the number of concurrencies and the number of
socket connections per concurrency according to their
=head2 Effective implementation of generating key and value

To improve time and space efficiency, memslap creates a random character
table of 10M characters. All key suffixes and values are generated from
this table. Memslap identifies a string by its offset in the character
table and its length, which saves a great deal of memory. Each key has two
parts, a prefix and a suffix. The prefix is a uint64_t (8 bytes). To verify
data that was set earlier, memslap must ensure that each key is unique, so
it uses the prefix to identify a key. The prefix cannot include illegal
characters such as '\r', '\n', '\0', and ' ', and memslap has an algorithm
to ensure that.

Memslap does not generate all the objects (key-value pairs) at the
beginning. It only generates enough objects to fill the task window
(default 10K objects) of each concurrency. Each object carries the
following basic information: key prefix, key suffix offset in the character
table, key length, value offset in the character table, and value length.
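
The offset-and-length scheme described above can be sketched in Python as
follows. This is only an illustration, not memslap's C code: the table size,
the C<ObjectInfo> fields, the helper names, and the choice to render the
prefix as 16 hexadecimal digits (one simple way to avoid illegal characters)
are all assumptions made for the example.

```python
import random
import string
from dataclasses import dataclass

TABLE_SIZE = 1 << 16  # assumption: small table for the sketch (memslap uses 10M)
PREFIX_LEN = 16       # assumption: prefix rendered as 16 hex digits

_rng = random.Random(42)
CHAR_TABLE = "".join(
    _rng.choices(string.ascii_letters + string.digits, k=TABLE_SIZE)
)


@dataclass
class ObjectInfo:
    key_prefix: int    # unique 64-bit number that identifies the key
    key_offset: int    # offset of the key suffix in CHAR_TABLE
    key_len: int       # total key length, prefix included
    value_offset: int  # offset of the value in CHAR_TABLE
    value_len: int


def new_object(prefix, key_len, value_len):
    """Describe an object by offsets and lengths only; no strings are stored."""
    if key_len < PREFIX_LEN:
        raise ValueError("key_len must be at least PREFIX_LEN")
    return ObjectInfo(
        prefix,
        _rng.randrange(TABLE_SIZE - key_len),
        key_len,
        _rng.randrange(TABLE_SIZE - value_len),
        value_len,
    )


def build_key(obj):
    """Materialize the key: hex prefix plus a slice of the character table."""
    suffix_len = obj.key_len - PREFIX_LEN
    suffix = CHAR_TABLE[obj.key_offset:obj.key_offset + suffix_len]
    return format(obj.key_prefix, "016x") + suffix


def build_value(obj):
    """Materialize the value as a slice of the character table."""
    return CHAR_TABLE[obj.value_offset:obj.value_offset + obj.value_len]
```

Each object costs a few integers no matter how long its key and value are,
which is the memory saving the text refers to.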
During a run, each concurrency sequentially or randomly selects an object
from the window for a set or get operation. At the same time, each
concurrency kicks objects out of its window and adds new objects to it.
=head2 Simple but useful task scheduling

Memslap uses libevent to schedule all the concurrencies of its threads, and
each concurrency schedules tasks based on its local task window. Memslap
assumes that if each concurrency keeps the same key distribution, value
distribution, and command distribution, then memslap as a whole keeps the
same distributions when viewed from outside. Each task window holds many
objects, and each object stores its basic information, such as key, value,
and expire time. At any time, the objects in the window keep the same,
fixed key and value distribution. If an object is overwritten, its value is
updated. Memslap verifies data and expire times against the object
information stored in the task window.

Libevent selects which concurrency to handle based on a specific network
event. The concurrency then selects which command (get or set) to run based
on the command distribution. If it needs to kick out an old object and add
a new one, the new object must have the same key length and value length so
that the key and value distribution stays the same.
If the memcached server has two cache layers (memory and SSD), running
memslap with different window sizes yields different cache miss rates. If
memslap adds enough objects to the windows at the beginning and the
memcached cache cannot store all of the initialized objects, memslap will
get some objects from the second cache layer, which means the first cache
layer misses. The user can therefore specify the window size to get the
expected miss rate of the first cache layer.
=head2 Useful implementation of multi-servers, UDP, TCP, multi-get and binary protocol

Because each thread is self-governed, memslap can assign different threads
to handle different memcached servers. This is just one of the ways in
which memslap supports multiple servers. The only limitation is that the
number of servers cannot be greater than the number of threads. The other
way to support multiple servers is the replication test, in which each
concurrency has one socket connection to each memcached server. In this
mode, memslap can set some objects on one memcached server and get these
objects from the other servers.

By default, memslap does single gets. If the user specifies the multi-get
option, memslap collects enough get commands, packs them, and sends them
together.

Memslap supports both the ASCII protocol and the binary protocol, but it
runs on the ASCII protocol by default. Memslap runs on TCP by default, but
it also supports UDP. Because UDP is unreliable, dropped and out-of-order
packets may occur. Memslap creates a memory buffer to handle these
problems: it tries to read all the response data of one command from the
server and reorders the response data. If some packets are lost, a timeout
ensures that incomplete responses are discarded and the next command is
sent.
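
The reordering step can be illustrated with a short Python sketch. It relies
on the 8-byte frame header that memcached puts on every UDP datagram
(request ID, sequence number, total datagram count, and a reserved field,
each 16 bits in network byte order). The function names and the
dictionary-based buffer are assumptions for the example; memslap's own
buffer and timeout handling are written in C and are not reproduced here.

```python
import struct

# memcached UDP frame header: request ID, sequence number,
# number of datagrams in the message, reserved (all 16-bit, big-endian).
HEADER = struct.Struct("!HHHH")


def frame(request_id, seq, total, payload):
    """Build one datagram; used here only to create sample input."""
    return HEADER.pack(request_id, seq, total, 0) + payload


def reassemble(datagrams, request_id):
    """Return the payload in order, or None while datagrams are missing."""
    parts = {}
    total = None
    for dgram in datagrams:
        rid, seq, count, _reserved = HEADER.unpack_from(dgram)
        if rid != request_id:
            continue  # response to a different command
        total = count
        parts[seq] = dgram[HEADER.size:]
    if total is None or any(i not in parts for i in range(total)):
        return None  # incomplete: the caller would wait or time out
    return b"".join(parts[i] for i in range(total))
```

A caller would discard the buffered parts once its timeout expires and move
on to the next command, as the paragraph above describes.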
Below are some usage samples:

=item memslap -s 127.0.0.1:11211 -S 5s

=item memslap -s 127.0.0.1:11211 -t 2m -v 0.2 -e 0.05 -b

=item memslap -s 127.0.0.1:11211 -F config -t 2m -w 40k -S 20s -o 0.2

=item memslap -s 127.0.0.1:11211 -F config -t 2m -T 4 -c 128 -d 20 -P 40k

=item memslap -s 127.0.0.1:11211 -F config -t 2m -d 50 -a -n 40

=item memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m

=item memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m -p 2
The user must specify at least one server to run memslap. The remaining
parameters have default values, as shown below:

    Thread number = 1                   Concurrency = 16
    Run time = 600 seconds              Configuration file = NULL
    Key size = 64                       Value size = 1024
    Get/set = 9:1                       Window size = 10k
    Execute number = 0                  Single get = true
    Multi-get = false                   Number of sockets of each concurrency = 1
    Reconnect = false                   Data verification = false
    Expire-time verification = false    ASCII protocol = true
    Binary protocol = false             Dumping statistic information
    Overwrite proportion = 0%           UDP = false
    TCP = true                          Limit throughput = false
    Facebook test = false               Replication test = false
=head2 Key size, value size and command distribution

All distributions are read from the configuration file that the user
specifies with the "--cfg_cmd" option. If the user does not specify a
configuration file, memslap runs with the default distribution (key size =
64, value size = 1024, get/set = 9:1). For information on how to edit the
configuration file, refer to the "Configuration File" section.

The minimum key size is 16 bytes; the maximum key size is 250 bytes. The
precision of a proportion is 0.001; proportions are rounded to 3 decimal
places.

The minimum value size is 1 byte; the maximum value size is 1M bytes. The
precision of a proportion is 0.001; proportions are rounded to 3 decimal
places. Currently, memslap only supports the set and get commands,
including 100% set and 100% get. For 100% get, it will preset some objects to
=head2 Multi-thread and concurrency

The high performance of memslap comes from its scheduling of threads and
concurrencies, so it is important to choose their numbers properly. The
default number of threads is 1; the default number of concurrencies is 16.
The user can use "--threads" and "--concurrency" to specify these values.

If the system supports setting CPU affinity and the number of threads
specified by the user is greater than 1, memslap will try to bind each
thread to a different CPU core. To get the best performance from memslap,
set the number of threads equal to the number of CPU cores. The number of
threads can also be less than or greater than the number of CPU cores.
Because of a limitation of the implementation, the number of concurrencies
should be a multiple of the number of threads.
1. For an 8 CPU core system

    --threads=2 --concurrency=128

    --threads=8 --concurrency=128

    --threads=8 --concurrency=256

    --threads=12 --concurrency=144

2. For a 16 CPU core system

    --threads=8 --concurrency=128

    --threads=16 --concurrency=256

    --threads=16 --concurrency=512

    --threads=24 --concurrency=288
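
The relationship between the two options in the examples above can be
checked with a small Python helper. The function name is hypothetical, and
rejecting a non-multiple is an assumption for the sketch: the text only says
the concurrency count relates to the thread count as a multiple and does not
say what memslap does otherwise.

```python
def concurrency_per_thread(threads, concurrency):
    """Return how many concurrencies each thread gets.

    Assumption for this sketch: a concurrency count that is not a
    multiple of the thread count is treated as an error.
    """
    if threads < 1 or concurrency < 1:
        raise ValueError("threads and concurrency must be positive")
    if concurrency % threads != 0:
        raise ValueError("concurrency should be a multiple of threads")
    return concurrency // threads


# The sample settings listed above:
for threads, concurrency in [(2, 128), (8, 128), (8, 256), (12, 144),
                             (16, 256), (16, 512), (24, 288)]:
    print(threads, concurrency, concurrency_per_thread(threads, concurrency))
```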
Memslap performs very well when used to test the performance of memcached
servers. Most of the time, the bottleneck is the network or the server. If
for some reason the user wants to limit the performance of memslap, there
are two ways to do this:

Decrease the number of threads and concurrencies.

Use the "--tps" option that memslap provides to limit the throughput. This
option lets the user get the expected throughput. For example, if the
maximum throughput is 50 kops/s for a specific configuration, you can
specify a throughput equal to or less than that maximum using the "--tps"
option.
Most of the time, the user does not need to specify the window size. The
default window size is 10k. For Schooner Memcached, the user can specify
different window sizes to get different cache miss rates based on the test
case. Memslap supports cache miss rates between 0% and 100%. If you use
this utility to test the performance of Schooner Memcached, you can specify
a proper window size to get the expected cache miss rate. The formula for
calculating the window size is as follows:

Assume that the key size is 128 bytes, and the value size is 2048 bytes, and

1. Small cache cache_size=1M, 100% cache miss (all data get from SSD).

(1). cache miss rate 0%

(2). cache miss rate 5%

(1). cache miss rate 0%

The formula for calculating window size for cache miss rate 0%:

    cache_size / concurrency / (key_size + value_size) * 0.5

The formula for calculating window size for cache miss rate 5%:

    cache_size / concurrency / (key_size + value_size) * 0.7
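
The two formulas can be evaluated with a few lines of Python. The key and
value sizes below come from the text; the cache size of 4 GiB and the
concurrency of 128 are assumptions picked for illustration, and the function
and constant names are hypothetical.

```python
FACTOR_MISS_0 = 0.5  # factor from the formula for a 0% cache miss rate
FACTOR_MISS_5 = 0.7  # factor from the formula for a 5% cache miss rate


def window_size(cache_size, concurrency, key_size, value_size, factor):
    """cache_size / concurrency / (key_size + value_size) * factor."""
    return int(cache_size / concurrency / (key_size + value_size) * factor)


cache_size = 4 * 1024 ** 3  # assumption: 4 GiB cache
concurrency = 128           # assumption: 128 concurrencies

print(window_size(cache_size, concurrency, 128, 2048, FACTOR_MISS_0))  # 7710
print(window_size(cache_size, concurrency, 128, 2048, FACTOR_MISS_5))  # 10794
```

Under these assumptions the results would be passed to memslap as roughly
"--win_size=8k" and "--win_size=11k".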
Memslap supports both data verification and expire-time verification. The
user can use "--verify=" or "-v" to specify the proportion of data
verification; in theory, 100% data verification is supported. The user can
use "--exp_verify=" or "-e" to specify the proportion of expire-time
verification; in theory, 100% expire-time verification is supported.
Specify the "--verbose" option to get more detailed error

For example, "--exp_verify=0.01 --verify=0.1" means that 1% of the objects
are set with an expire time and 10% of the objects gotten will be verified.
If the objects are gotten, memslap will verify the expire-time and
=head2 Multi-servers and multi-clients

Memslap supports multiple servers by means of self-governed threads. The
limitation is that the number of servers cannot be greater than the number
of threads, because memslap assigns at least one thread to each server. The
user can use the "--servers=" or "-s" option to specify

    --servers=10.1.1.1:11211,10.1.1.2:11212,10.1.1.3:11213 --threads=6 --concurrency=36

The above command means that there are 6 threads, each with 6
concurrencies; threads 0 and 3 handle server 0 (10.1.1.1), threads 1 and 4
handle server 1 (10.1.1.2), and threads 2 and 5 handle server 2 (10.1.1.3).
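
The thread-to-server mapping in this example is consistent with a simple
round-robin assignment, sketched below in Python. The round-robin rule is
inferred from the example only, and the function name is hypothetical;
memslap's actual assignment code may differ.

```python
def assign_servers(servers, threads):
    """Map each thread index to a server, round-robin (an assumption)."""
    if not servers:
        raise ValueError("at least one server is required")
    if len(servers) > threads:
        raise ValueError("number of servers cannot exceed number of threads")
    return {t: servers[t % len(servers)] for t in range(threads)}


servers = ["10.1.1.1:11211", "10.1.1.2:11212", "10.1.1.3:11213"]
for thread, server in assign_servers(servers, 6).items():
    print(f"thread {thread} -> {server}")
```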
All the threads and concurrencies in memslap are self-governed, and so is
memslap itself. The user can start several memslap instances, and can run
memslap on different client machines to communicate with the same memcached
server at the same time. It is recommended that the user start the memslap
instances on the different machines with the same configuration.
=head2 Run with execute number mode or time mode

By default, memslap runs in time mode, with a default run time of 10
minutes. When the time is up, memslap exits. Do not specify both execute
number mode and time mode at the same time; just

    --time=30s (The test will run for 30 seconds.)

    --execute_number=100000 (The test will exit after running 100000 commands.)
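
The suffix notation used by "--time" (s for seconds, m for minutes, h for
hours, d for days, as listed in the option summary) can be illustrated with
a short Python parser. This is a sketch rather than memslap's own parsing
code; the function name is hypothetical, and treating a bare number as
seconds is an assumption.

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_time(text):
    """Convert a value such as '30s' or '2h' to seconds."""
    text = text.strip()
    if not text:
        raise ValueError("empty time value")
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # assumption: a bare number is taken as seconds


print(parse_time("30s"))  # 30
print(parse_time("2m"))   # 120
print(parse_time("2h"))   # 7200
```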
=head2 Dump statistic information periodically

The user can use "--stat_freq=" or "-S" to specify the frequency.

Memslap will dump the statistics of the commands (get and set) at the frequency of every 20

For more information on the format of the dumped statistics, refer to the
"Format of Output" section.
The user can use "--division=" or "-d" to specify the number of keys per
multi-get. By default, memslap does single gets over TCP. Memslap also
supports data verification and expire-time verification for multi-get.

Memslap supports multi-get over both TCP and UDP. Because the ASCII
protocol and the binary protocol are implemented differently, multi-get
behaves differently in each. For the ASCII protocol, memslap sends one
"multi-get" command to the server at a time. For the binary protocol,
memslap sends several single get commands together as a "multi-get" to the
server.
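
For the ASCII protocol, a multi-get is a single "get" line that carries
several keys separated by spaces. The Python sketch below builds such
commands from a list of keys, grouping them by the "--division" value. The
function names are hypothetical and the code only illustrates the wire
format; it is not memslap's implementation.

```python
def ascii_multi_get(keys):
    """Build one ASCII-protocol get command for several keys."""
    return b"get " + b" ".join(k.encode("ascii") for k in keys) + b"\r\n"


def batch_commands(keys, division):
    """Group keys into multi-get commands of at most `division` keys."""
    return [ascii_multi_get(keys[i:i + division])
            for i in range(0, len(keys), division)]


for command in batch_commands(["k1", "k2", "k3", "k4", "k5"], 2):
    print(command)
```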
Memslap supports both UDP and TCP. For TCP, memslap does not by default
reconnect to the memcached server if socket connections are lost. If all
the socket connections are lost or the memcached server crashes, memslap
will exit. If the user specifies the "--reconnect" option, memslap will
reconnect lost socket connections.

The user can use "--udp" to enable the UDP feature, but UDP comes with some

UDP cannot set data larger than 1400 bytes.

UDP is not supported with the binary protocol, because the binary protocol
of memcached does not support it.

UDP does not support reconnection.
Set data with TCP and multi-get with UDP. Specify the following options:

    "--facebook --division=50"

If you want to create thousands of TCP connections, specify the
"--conn_sock=" option.

For example: --facebook --division=50 --conn_sock=200

The above command means that memslap will run the facebook test, with each
concurrency having 200 TCP socket connections and one UDP socket.

Memslap sets objects with the TCP sockets and multi-gets 50 objects at a
time with the UDP socket.

If you specify "--division=50", the key size must be less than 25 bytes,
because the UDP packet size is 1400 bytes.
=head2 Replication test

For the replication test, the user must specify at least two memcached
servers. The user can use the "--rep_write=" option to enable this feature.

    --servers=10.1.1.1:11211,10.1.1.2:11212 --rep_write=2

The above command means that there are 2 replication memcached servers.
Memslap will set objects on both server 0 and server 1, get objects
previously set on server 0 from server 1, and get objects previously set on
server 1 from server 0. If server 0 crashes, memslap will only get objects
from server 1. If server 0 comes back to life, memslap will reconnect to
server 0. If both server 0 and server 1 crash, memslap will exit.
=head2 Supports thousands of TCP connections

Start memslap with "--conn_sock=" or "-n" to enable this feature. Make sure
that your system can support opening thousands of files and creating
thousands of sockets. However, this feature does not support reconnection
if sockets disconnect.

    --threads=8 --concurrency=128 --conn_sock=128

The above command means that memslap starts 8 threads, each thread has 16
concurrencies, each concurrency has 128 TCP socket connections, and the
total number of TCP socket connections is 128 * 128 = 16384.
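
Before starting such a run, the connection count can be compared with the
open-file limit of the client process. The Python sketch below does this
with the standard C<resource> module (Unix only). The function names and the
reserve of 64 descriptors for non-socket files are assumptions for the
example, not values taken from memslap.

```python
import resource  # Unix only


def total_tcp_connections(concurrency, conn_sock):
    """Total TCP sockets: one pool of conn_sock sockets per concurrency."""
    return concurrency * conn_sock


def fits_fd_limit(connections, soft_limit=None, reserve=64):
    """Check the socket count against the soft RLIMIT_NOFILE limit.

    `reserve` leaves room for stdio, config files, and so on (assumption).
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return connections + reserve <= soft_limit


needed = total_tcp_connections(concurrency=128, conn_sock=128)
print(needed, fits_fd_limit(needed))
```

If the check fails, raising the limit (for example with the shell's ulimit
command) before launching memslap is the usual remedy.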
=head2 Supports binary protocol

Start memslap with the "--binary" or "-B" option to enable this feature. It
supports all the above features except UDP, because the latest memcached
1.3.3 does not implement the binary UDP protocol.

Since memcached 1.3.3 does not implement the binary UDP protocol, memslap
does not support UDP with the binary protocol. In addition, memcached 1.3.3
does not support multi-get. If you specify the "--division=50" option,
memslap just sends 50 get commands together as a "multi-get" to the server.
=head1 Configuration file

This section describes the format of the configuration file. By default,
when no configuration file is specified, memslap reads the default one
located at ~/.memslap.cnf.

Below is a sample configuration file:

***************************************************************************
#comments should start with '#'
#start_len end_len proportion
#key length range from start_len to end_len
#start_len must be equal to or greater than 16
#end_len must be equal to or less than 250
#start_len must be equal to or less than end_len
#memslap will generate keys according to the key range
#proportion: indicates keys generated from one range accounts for the total
#generated keys
#example1: key range 16~100 accounts for 80%
#          key range 101~200 accounts for 10%
#          key range 201~250 accounts for 10%
#          total should be 1 (0.8+0.1+0.1 = 1)
#example2: all keys length are 128 bytes
#start_len end_len proportion
#value length range from start_len to end_len
#start_len must be equal to or greater than 1
#end_len must be equal to or less than 1M
#start_len must be equal to or less than end_len
#memslap will generate values according to the value range
#proportion: indicates values generated from one range accounts for the
#total generated values
#example1: value range 1~1000 accounts for 80%
#          value range 1001~10000 accounts for 10%
#          value range 10001~100000 accounts for 10%
#          total should be 1 (0.8+0.1+0.1 = 1)
#example2: all value length are 128 bytes
#cmd_type cmd_proportion
#currently memslap only supports get and set command.
#example: set command accounts for 50%
#         get command accounts for 50%
#         total should be 1 (0.5+0.5 = 1)
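
The rules stated in the comments above (length bounds, start_len not larger
than end_len, proportions adding up to 1) can be checked with a short Python
sketch. It assumes each data line has the form "start_len end_len
proportion" as the comments describe; the function name is hypothetical, and
section keywords or other parts of the real file format are not handled.

```python
def parse_ranges(lines, min_len, max_len):
    """Parse 'start_len end_len proportion' lines and validate them."""
    ranges = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        start_text, end_text, prop_text = line.split()
        start, end = int(start_text), int(end_text)
        proportion = round(float(prop_text), 3)  # precision is 0.001
        if not (min_len <= start <= end <= max_len):
            raise ValueError(f"bad length range: {line}")
        ranges.append((start, end, proportion))
    total = sum(p for _, _, p in ranges)
    if abs(total - 1.0) >= 0.0005:
        raise ValueError(f"proportions add up to {total}, expected 1")
    return ranges


key_lines = ["#example1", "16 100 0.8", "101 200 0.1", "201 250 0.1"]
print(parse_ranges(key_lines, min_len=16, max_len=250))
```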
=head1 Format of output

At the beginning, memslap displays some configuration information as follows:

=item servers : 127.0.0.1:11211

=item threads count: 1

=item concurrency: 16

=item windows size: 10k

=item set proportion: set_prop=0.10

=item get proportion: get_prop=0.90

=item servers : "servers"

The servers used by memslap.

The number of threads memslap runs with.

The number of concurrencies memslap runs with.

How long to run memslap.

The task window size of each concurrency.

The proportion of set command.

The proportion of get command.
The output of dynamic statistics is something like this:

    ---------------------------------------------------------------------------------------------------------------------------------

    Type    Time(s)  Ops      TPS(ops/s)  Net(M/s)  Get_miss  Min(us)  Max(us)  Avg(us)  Std_dev  Geo_dist
    Period  5        345826   69165       65.3      0         27       2198     203
    Global  20       1257935  62896       71.8      0         26       3791     224

    Type    Time(s)  Ops      TPS(ops/s)  Net(M/s)  Get_miss  Min(us)  Max(us)  Avg(us)  Std_dev  Geo_dist
    Period  5        38425    7685        7.3       0         42       628      240
    Global  20       139780   6989        8.0       0         37       3790     253

    Type    Time(s)  Ops      TPS(ops/s)  Net(M/s)  Get_miss  Min(us)  Max(us)  Avg(us)  Std_dev  Geo_dist
    Period  5        384252   76850       72.5      0         27       2198     207
    Global  20       1397720  69886       79.7      0         26       3791     227

    ---------------------------------------------------------------------------------------------------------------------------------
Statistics information of the get command

Statistics information of the set command

=item Total Statistics

Statistics information of both the get and set commands

Result within a period

Throughput, operations/second

How many objects could not be gotten

The minimum response time

The maximum response time

The average response time

Standard deviation of response time

Geometric distribution based on natural exponential function
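
The response-time columns can be illustrated with a short Python sketch that
summarizes a list of response times in microseconds. Two points are
assumptions, since the text does not spell them out: the standard deviation
is computed over the whole population, and the "Geo_dist" column is read
here as a geometric mean obtained through the natural exponential function,
exp(mean(ln t)). The function name is hypothetical.

```python
import math


def summarize(times_us):
    """Summarize positive response times given in microseconds."""
    n = len(times_us)
    avg = sum(times_us) / n
    variance = sum((t - avg) ** 2 for t in times_us) / n  # population variance
    geo_mean = math.exp(sum(math.log(t) for t in times_us) / n)
    return {
        "min": min(times_us),
        "max": max(times_us),
        "avg": avg,
        "std_dev": math.sqrt(variance),
        "geo_mean": geo_mean,
    }


print(summarize([27, 150, 203, 224, 2198]))
```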
At the end, memslap will output something like this:

    ---------------------------------------------------------------------------------------------------------------------------------
    Get Statistics (1257956 events)
    8: 484890 459823 12543 824
    Set Statistics (139782 events)
    8: 50784 65574 2064 167
    Total Statistics (1397738 events)
    8: 535674 525397 14607 991
    written_bytes: 242516030
    read_bytes: 1003702556
    object_bytes: 152086080
    Run time: 20.0s Ops: 1397754 TPS: 69817 Net_rate: 59.4M/s
    ---------------------------------------------------------------------------------------------------------------------------------
Get statistics of response time

Set statistics of response time

=item Total Statistics

Both get and set statistics of response time

The accumulated and minimum response time

The accumulated and maximum response time

The accumulated and average response time

Standard deviation of response time

Geometric distribution based on logarithm 2

Total get commands done

Total set commands done

How many objects could not be gotten from the server

How many objects needed verification but could not be gotten

How many objects had an inconsistent value

How many objects were expired but were still gotten

=item unexpired_unget

How many objects were unexpired but could not be gotten

=item packet_disorder

How many UDP packets arrived out of order

How many UDP packets were lost

How many times a UDP timeout happened

Throughput, operations/second

The average rate of network
List one or more servers to connect. Servers count must not be greater than
threads count. e.g.: --servers=localhost:1234,localhost:11211

Number of threads to startup, better equal to CPU numbers. Default 8.

Number of concurrency to simulate with load. Default 128.

Number of TCP socks per concurrency. Default 1.

-x, --execute_number=
Number of operations (get and set) to execute for the
given test. Default 1000000.

How long the test runs, suffix: s-seconds, m-minutes, h-hours,
d-days e.g.: --time=2h.

Load the configuration file to get the command, key, and value distribution list.

Task window size of each concurrency, suffix: K, M e.g.: --win_size=10k.

Fixed length of value.

The proportion of data verification, e.g.: --verify=0.01

Number of keys to multi-get once. Default 1, means single get.

Frequency of dumping statistic information. suffix: s-seconds,
m-minutes, e.g.: --stat_freq=10s.

The proportion of objects with expire time, e.g.: --exp_verify=0.01.
Default no object with expire time

The proportion of objects that need overwriting, e.g.: --overwrite=0.01.
Default never overwrite object.

Reconnect support, when connection is closed it will be reconnected.

UDP support, default memslap uses TCP, TCP port and UDP port of

Whether it enables facebook test feature, set with TCP and multi-get with UDP.

Whether it enables binary protocol. Default with ASCII protocol.

Expected throughput, suffix: K, e.g.: --tps=10k.

The first nth servers can write data, e.g.: --rep_write=2.

Whether it outputs detailed information when verification fails.

Display this message and then exit.

Display the version of the application and then exit.
    memslap -s 127.0.0.1:11211 -S 5s

    memslap -s 127.0.0.1:11211 -t 2m -v 0.2 -e 0.05 -b

    memslap -s 127.0.0.1:11211 -F config -t 2m -w 40k -S 20s -o 0.2

    memslap -s 127.0.0.1:11211 -F config -t 2m -T 4 -c 128 -d 20 -P 40k

    memslap -s 127.0.0.1:11211 -F config -t 2m -d 50 -a -n 40

    memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m

    memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m -p 2
To find out more information please check:
L<http://launchpad.org/libmemcached>

Mingqiang Zhuang E<lt>mingqiangzhuang@hengtiansoft.comE<gt> (Schooner Technology)
Brian Aker, E<lt>brian@tangent.orgE<gt>

memcached(1) libmemcached(3)