-***********
-
-
-\ **memslap**\ is a load generation and benchmark tool for memcached(1)
-servers. It generates configurable workload such as threads, concurrencies, connections,
-run time, overwrite, miss rate, key size, value size, get/set proportion,
-expected throughput, and so on. Furthermore, it also supports data
-verification, expire-time verification, UDP, binary protocol, facebook test,
-replication test, multi-get and reconnection, etc.
-
-Memslap manages network connections like memcached with
-libevent. Each thread of memslap is bound with a CPU core, all
-the threads don't communicate with each other, and there are several socket
-connections in each thread. Each connection keeps key size distribution,
-value size distribution, and command distribution by itself.
-
-You can specify servers via the \ **--servers**\ option or via the
-environment variable \ ``MEMCACHED_SERVERS``\ .
-
-
-********
-FEATURES
-********
-
-
-Memslap is developed to for the following purposes:
-
-
-Manages network connections with libevent asynchronously.
-
-
-
-Set both TCP and UDP up to use non-blocking IO.
-
-
-
-Improves parallelism: higher performance in multi-threads environments.
-
-
-
-Improves time efficiency: faster processing speed.
-
-
-
-Generates key and value more efficiently; key size distribution and value size distribution are configurable.
-
-
-
-Supports get, multi-get, and set commands; command distribution is configurable.
-
-
-
-Supports controllable miss rate and overwrite rate.
-
-
-
-Supports data and expire-time verification.
-
-
-
-Supports dumping statistic information periodically.
-
-
-
-Supports thousands of TCP connections.
-
-
-
-Supports binary protocol.
-
-
-
-Supports facebook test (set with TCP and multi-get with UDP) and replication test.
-
-
-
-
-*******
-DETAILS
-*******
-
-
-Effective implementation of network.
-====================================
-
-
-For memslap, both TCP and UDP use non-blocking network IO. All
-the network events are managed by libevent as memcached. The network module
-of memslap is similar to memcached. Libevent can ensure
-memslap can handle network very efficiently.
-
-
-Effective implementation of multi-threads and concurrency
-=========================================================
-
-
-Memslap has the similar implementation of multi-threads to
-memcached. Memslap creates one or more self-governed threads;
-each thread is bound with one CPU core if the system supports setting CPU
-core affinity.
-
-In addition, each thread has a libevent to manage the events of the network;
-each thread has one or more self-governed concurrencies; and each
-concurrency has one or more socket connections. All the concurrencies don’t
-communicate with each other even though they are in the same thread.
-
-Memslap can create thousands of socket connections, and each
-concurrency has tens of socket connections. Each concurrency randomly or
-sequentially selects one socket connection from its socket connection pool
-to run, so memslap can ensure each concurrency handles one
-socket connection at any given time. Users can specify the number of
-concurrency and socket connections of each concurrency according to their
-expected workload.
-
-
-Effective implementation of generating key and value
-====================================================
-
-
-In order to improve time efficiency and space efficiency,
-memslap creates a random characters table with 10M characters. All the
-suffixes of keys and values are generated from this random characters table.
-
-Memslap uses the offset in the character table and the length
-of the string to identify a string. It can save much memory.
-Each key contains two parts, a prefix and a suffix. The prefix is an
-uint64_t, 8 bytes. In order to verify the data set before,
-memslap need to ensure each key is unique, so it uses the prefix to identify
-a key. The prefix cannot include illegal characters, such as ‘\r’, ‘\n’,
-‘\0’ and ‘ ‘. And memslap has an algorithm to ensure that.
-
-Memslap doesn’t generate all the objects (key-value pairs) at
-the beginning. It only generates enough objects to fill the task window
-(default 10K objects) of each concurrency. Each object has the following
-basic information, key prefix, key suffix offset in the character table, key
-length, value offset in the character table, and value length.
-
-In the work process, each concurrency sequentially or randomly selects an
-object from the window to do set operation or get operation. At the same
-time, each concurrency kicks objects out of its window and adds new object
-into it.
-
-
-Simple but useful task scheduling
-=================================
-
-
-Memslap uses libevent to schedule all the concurrencies of
-threads, and each concurrency schedules tasks based on the local task
-window. Memslap assumes that if each concurrency keeps the same
-key distribution, value distribution and commands distribution, from
-outside, memslap keeps all the distribution as a whole.
-Each task window includes a lot of objects, each object stores its basic
-information, such as key, value, expire time, and so on. At any time, all
-the objects in the window keep the same and fixed key and value
-distribution. If an object is overwritten, the value of the object will be
-updated. Memslap verifies the data or expire-time according to
-the object information stored in the task window.
-
-Libevent selects which concurrency to handle based on a specific network
-event. Then the concurrency selects which command (get or set) to operate
-based on the command distribution. If it needs to kick out an old object and
-add a new object, in order to keep the same key and value distribution, the
-new object must have the same key length and value length.
-
-If memcached server has two cache layers (memory and SSD), running
-memslap with different window sizes can get different cache
-miss rates. If memslap adds enough objects into the windows at
-the beginning, and the cache of memcached cannot store all the objects
-initialized, then memslap will get some objects from the second
-cache layer. It causes the first cache layer to miss. So the user can
-specify the window size to get the expected miss rate of the first cache
-layer.
-
-
-Useful implementation of multi-servers , UDP, TCP, multi-get and binary protocol
-================================================================================
-
-
-Because each thread is self-governed, memslap can assign
-different threads to handle different memcached servers. This is just one of
-the ways in which memslap supports multiple servers. The only
-limitation is that the number of servers cannot be greater than the number
-of threads. The other way to support multiple servers is for replication
-test. Each concurrency has one socket connection to each memcached server.
-For the implementation, memslap can set some objects to one
-memcached server, and get these objects from the other servers.
-
-By default, Memslap does single get. If the user specifies
-multi-get option, memslap will collect enough get commands and
-pack and send the commands together.
-
-Memslap supports both the ASCII protocol and binary protocol,
-but it runs on the ASCII protocol by default.
-Memslap by default runs on the TCP protocol, but it also
-supports UDP. Because UDP is unreliable, dropped packages and out-of-order
-packages may occur. Memslap creates a memory buffer to handle
-these problems. Memslap tries to read all the response data of
-one command from the server and reorders the response data. If some packages
-get lost, the waiting timeout mechanism can ensure half-baked packages will
-be discarded and the next command will be sent.
-
-
-
-*****
-USAGE
-*****
-
-
-Below are some usage samples:
-
-
-memslap -s 127.0.0.1:11211 -S 5s
-
-
-
-memslap -s 127.0.0.1:11211 -t 2m -v 0.2 -e 0.05 -b
-
-
-
-memslap -s 127.0.0.1:11211 -F config -t 2m -w 40k -S 20s -o 0.2
-
-
-
-memslap -s 127.0.0.1:11211 -F config -t 2m -T 4 -c 128 -d 20 -P 40k
-
-
-
-memslap -s 127.0.0.1:11211 -F config -t 2m -d 50 -a -n 40
-
-
-
-memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m
-
-
-
-memslap -s 127.0.0.1:11211,127.0.0.1:11212 -F config -t 2m -p 2
-
-
-
-The user must specify one server at least to run memslap. The
-rest of the parameters have default values, as shown below:
-
-Thread number = 1 Concurrency = 16
-
-Run time = 600 seconds Configuration file = NULL
-
-Key size = 64 Value size = 1024
-
-Get/set = 9:1 Window size = 10k
-
-Execute number = 0 Single get = true
-
-Multi-get = false Number of sockets of each concurrency = 1
-
-Reconnect = false Data verification = false
-
-Expire-time verification = false ASCII protocol = true
-
-Binary protocol = false Dumping statistic information
-
-periodically = false
-
-Overwrite proportion = 0% UDP = false
-
-TCP = true Limit throughput = false
-
-Facebook test = false Replication test = false
-
-Key size, value size and command distribution.
-==============================================
-
-
-All the distributions are read from the configuration file specified by user
-with “—cfg_cmd” option. If the user does not specify a configuration file,
-memslap will run with the default distribution (key size = 64,
-value size = 1024, get/set = 9:1). For information on how to edit the
-configuration file, refer to the “Configuration File” section.
-
-The minimum key size is 16 bytes; the maximum key size is 250 bytes. The
-precision of proportion is 0.001. The proportion of distribution will be
-rounded to 3 decimal places.
-
-The minimum value size is 1 bytes; the maximum value size is 1M bytes. The
-precision of proportion is 0.001. The proportion of distribution will be
-rounded to 3 decimal places.
-Currently, memslap only supports set and get commands. And it
-supports 100% set and 100% get. For 100% get, it will preset some objects to
-the server.
-
-
-Multi-thread and concurrency
-============================
-
-
-The high performance of memslap benefits from the special
-schedule of thread and concurrency. It’s important to specify the proper
-number of them. The default number of threads is 1; the default number of
-concurrency is 16. The user can use “—threads” and “--concurrency” to
-specify these variables.
-
-If the system supports setting CPU affinity and the number of threads
-specified by the user is greater than 1, memslap will try to
-bind each thread to a different CPU core. So if you want to get the best
-performance memslap, it is better to specify the number of
-thread equal to the number of CPU cores. The number of threads specified by
-the user can also be less or greater than the number of CPU cores. Because
-of the limitation of implementation, the number of concurrencies could be
-the multiple of the number of threads.
-
-1. For 8 CPU cores system
-
-For example:
-
---threads=2 --concurrency=128
-
---threads=8 --concurrency=128
-
---threads=8 --concurrency=256
-
---threads=12 --concurrency=144
-
-2. For 16 CPU cores system
-
-For example:
-
---threads=8 --concurrency=128
-
---threads=16 --concurrency=256
-
---threads=16 --concurrency=512
-
---threads=24 --concurrency=288
-
-The memslap performs very well, when
-used to test the performance of memcached servers.
-Most of the time, the bottleneck is the network or
-the server. If for some reason the user wants to
-limit the performance of memslap, there
-are two ways to do this:
-
-Decrease the number of threads and concurrencies.
-Use the option “--tps” that memslap
-provides to limit the throughput. This option allows
-the user to get the expected throughput. For
-example, assume that the maximum throughput is 50
-kops/s for a specific configuration, you can specify
-the throughput equal to or less than the maximum
-throughput using “--tps” option.
-
-
-Window size
-===========
-
-
-Most of the time, the user does not need to specify the window size. The
-default window size is 10k. For Schooner Memcached, the user can specify
-different window sizes to get different cache miss rates based on the test
-case. Memslap supports cache miss rate between 0% and 100%.
-If you use this utility to test the performance of Schooner Memcached, you
-can specify a proper window size to get the expected cache miss rate. The
-formula for calculating window size is as follows:
-
-Assume that the key size is 128 bytes, and the value size is 2048 bytes, and
-concurrency=128.
-
-1. Small cache cache_size=1M, 100% cache miss (all data get from SSD).
-win_size=10k
-
-2. cache_size=4G
-
-(1). cache miss rate 0%
-
-win_size=8k
-
-(2). cache miss rate 5%
-
-win_size=11k
-
-3. cache_size=16G
-
-(1). cache miss rate 0%
-
-win_size=32k
-
-(2). cache miss
-
-rate 5%
-
-win_size=46k
-
-The formula for calculating window size for cache miss rate 0%:
-
-cache_size / concurrency / (key_size + value_size) \* 0.5
-
-The formula for calculating window size for cache miss rate 5%:
-
-cache_size / concurrency / (key_size + value_size) \* 0.7
-
-
-Verification
-============
-
-
-Memslap supports both data verification and expire-time
-verification. The user can use "--verify=" or "-v" to specify the proportion
-of data verification. In theory, it supports 100% data verification. The
-user can use "--exp_verify=" or "-e" to specify the proportion of
-expire-time verification. In theory, it supports 100% expire-time
-verification. Specify the "--verbose" options to get more detailed error
-information.
-
-For example: --exp_verify=0.01 –verify=0.1 , it means that 1% of the objects
-set with expire-time, 10% of the objects gotten will be verified. If the
-objects are gotten, memslap will verify the expire-time and
-value.
-
-
-multi-servers and multi-clients
-===============================
-
-
-Memslap supports multi-servers based on self-governed thread.
-There is a limitation that the number of servers cannot be greater than the
-number of threads. Memslap assigns one thread to handle one
-server at least. The user can use the "--servers=" or "-s" option to specify
-multi-servers.
-
-For example:
-
---servers=10.1.1.1:11211,10.1.1.2:11212,10.1.1.3:11213 --threads=6 --concurrency=36
-
-The above command means that there are 6 threads, with each thread having 6
-concurrencies and that threads 0 and 3 handle server 0 (10.1.1.1); threads 1
-and 4 handle server 1 (10.1.1.2); and thread 2 and 5 handle server 2
-(10.1.1.3).
-
-All the threads and concurrencies in memslap are self-governed.
-
-So is memslap. The user can start up several
-memslap instances. The user can run memslap on different client
-machines to communicate with the same memcached server at the same. It is
-recommended that the user start different memslap on different
-machines using the same configuration.
-
-
-Run with execute number mode or time mode
-=========================================
-
-
-The default memslap runs with time mode. The default run time
-is 10 minutes. If it times out, memslap will exit. Do not
-specify both execute number mode and time mode at the same time; just
-specify one instead.
-
-For example:
-
---time=30s (It means the test will run 30 seconds.)
-
---execute_number=100000 (It means that after running 100000 commands, the test will exit.)
-
-
-Dump statistic information periodically.
-========================================
-
-
-The user can use "--stat_freq=" or "-S" to specify the frequency.
-
-For example:
-
---stat_freq=20s
-
-Memslap will dump the statistics of the commands (get and set) at the frequency of every 20
-seconds.
-
-For more information on the format of dumping statistic information, refer to “Format of Output” section.
-
-
-Multi-get
-=========
-
-
-The user can use "--division=" or "-d" to specify multi-get keys count.
-Memslap by default does single get with TCP. Memslap also supports data
-verification and expire-time verification for multi-get.
-
-Memslap supports multi-get with both TCP and UDP. Because of
-the different implementation of the ASCII protocol and binary protocol,
-there are some differences between the two. For the ASCII protocol,
-memslap sends one “multi-get” to the server once. For the
-binary protocol, memslap sends several single get commands
-together as “multi-get” to the server.
-
-
-UDP and TCP
-===========
-
-
-Memslap supports both UDP and TCP. For TCP,
-memslap does not reconnect the memcached server if socket connections are
-lost. If all the socket connections are lost or memcached server crashes,
-memslap will exit. If the user specifies the “--reconnect”
-option when socket connections are lost, it will reconnect them.
-
-User can use “--udp” to enable the UDP feature, but UDP comes with some
-limitations:
-
-UDP cannot set data more than 1400 bytes.
-
-UDP is not supported by the binary protocol because the binary protocol of
-memcached does not support that.