
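On a recent project we used .NET Core 2.1 with Redis for caching. A colleague wrote the caching wrapper on top of the StackExchange.Redis driver, version 2.0.571. I had long heard that this driver has a timeout bug under concurrent load. Once development was nearly finished, I ran a load test that simply simulated 30 users continuously calling one query endpoint that uses the cache. Even under that light load, timeout exceptions appeared: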

Timeout performing GET my_141 (5000ms), inst: 30, qu: 0, qs: 20, in: 20320, serverEndpoint: 172.16.3.119:6379, mgr: 10 of 10 available, clientName: s-119, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=120,Free=32747,Min=1,Max=32767), v: 2.0.571.20511(Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts))

This is followed by the stack trace…

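This caused me a lot of grief. After reading many articles, I arrived at the following.

Solutions

1. Replace the driver and stop using it (see the article ".NET Core Redis driver recommendations: why not use StackExchange.Redis").

2. Make all Redis operations asynchronous, and call ThreadPool.SetMinThreads(200, 200).


I used the second approach to solve the problem. Switching drivers could bring pitfalls of its own, and it would also have cost time.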


Brief analysis of the cause

The exception message above contains this fragment:

IOCP: (Busy=0,Free=1000,Min=1,Max=1000),

WORKER: (Busy=120,Free=32747,Min=1,Max=32767),

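This means 120 WORKER threads were busy. Min, which the .NET documentation describes as "the minimum number of worker threads that the thread pool creates on demand", was only 1. Once the number of busy threads exceeds Min, the pool adds new threads only gradually. The available worker threads therefore could not keep up with the demand from threads blocked in Redis Get calls, and some of those Get calls stayed blocked until they timed out.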
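You can read the same kind of counters inside your own process, because the BCL exposes them through `ThreadPool.GetMaxThreads`, `ThreadPool.GetAvailableThreads` and `ThreadPool.GetMinThreads`. The sketch below formats them in the same shape as the exception text. `PoolStats` is a hypothetical helper name. Computing Busy as Max minus available is an assumption about how such figures are derived, not a quote of StackExchange.Redis's source.

```csharp
using System.Threading;

public static class PoolStats
{
    // Builds an "IOCP: (...), WORKER: (...)" line from the BCL thread-pool counters.
    // Assumption: Busy = Max - Available, computed separately for worker and IOCP threads.
    public static string Describe()
    {
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIo);
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);

        return $"IOCP: (Busy={maxIo - freeIo},Free={freeIo},Min={minIo},Max={maxIo}), " +
               $"WORKER: (Busy={maxWorker - freeWorker},Free={freeWorker},Min={minWorker},Max={maxWorker})";
    }
}
```

Logging `PoolStats.Describe()` next to your cache calls during a load test shows whether the busy worker count climbs past Min before the timeouts begin.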

So we raised the minimum number of worker threads (workerThreads) to 200, and that solved the problem.


The value 200 is my rough estimate for the production server. An unsuitable value can cause performance problems of its own.
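Here is a minimal sketch of applying the setting at startup without accidentally lowering a value the host has already raised. `ThreadPoolTuning.EnsureMinThreads` is a hypothetical helper. The 200 in the text is the author's estimate, so the numbers you pass in should come from your own load testing.

```csharp
using System;
using System.Threading;

public static class ThreadPoolTuning
{
    // Raises the pool's minimum worker and IOCP thread counts to at least the
    // requested values and never lowers an existing higher setting.
    // Returns SetMinThreads' result: false means the runtime rejected the values.
    public static bool EnsureMinThreads(int minWorker, int minIocp)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIocp);
        return ThreadPool.SetMinThreads(
            Math.Max(currentWorker, minWorker),
            Math.Max(currentIocp, minIocp));
    }
}
```

For example, you could call `ThreadPoolTuning.EnsureMinThreads(200, 200);` early in `Program.Main`. `SetMinThreads` returns `false` when a value is rejected (for instance, when it exceeds the current maximum), so the return value is worth logging.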



StackExchange.Redis timeout issues

Recently, one of our company's projects logged a large number of Redis timeout errors under heavy request load:

Timeout performing SET XXX, inst: 27, mgr: ProcessReadQueue, err: never, queue: 3, qu: 0, qs: 3, qc: 0, wr: 0, wq: 0, in: 15, ar: 1, clientName: XXX, serverEndpoint: 192.168.x.x:6379, keyHashSlot: 944, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=4,Free=4091,Min=4,Max=4095) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)

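We searched online for solutions. Some posts said the memory configured on the Redis server was too small. Others said to lengthen the client's connect timeout and sync timeout. None of this helped. All we established was that the problem was on the client side and not on the server. Searching further, we found that

StackExchange.Redis has a timeout bug. In the end we found csredis, which solved the problem, and replaced StackExchange.Redis with it.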


DevOps: StackExchange.Redis connections to Redis time out

A colleague reported a large number of Redis connection timeout errors. Redis holds our Session data and a lot of config cache data, so any Redis failure would seriously affect the production service. We had to debug it right away.

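First we used redis-cli to confirm that the service was still running. Next we ran Redis Benchmark to check the server's responsiveness, and the numbers showed nothing abnormal. Connecting with Redis Desktop Manager also retrieved data normally. We therefore concluded that the Redis server itself was most likely fine.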

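Then we reviewed how the colleague was using Redis. The timeouts were not happening across the board: only reads of a few specific keys failed. On closer inspection, every key that triggered a timeout error was large, so we suspected that the data volume was the cause.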
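A back-of-the-envelope estimate shows why value size matters. The sketch below computes only the raw wire time for a value at a given link speed. `TransferEstimate` is a hypothetical helper, and the 100 Mbps link and 12.5 MB value used below are illustrative assumptions. Real requests also pay for latency, protocol overhead and serialization on both ends.

```csharp
public static class TransferEstimate
{
    // Lower bound on the time (ms) to move `bytes` over a link of
    // `megabitsPerSecond` (1 Mbps = 1,000,000 bits/s). Ignores latency,
    // TCP/RESP overhead and client/server serialization, which all add time.
    public static double WireTimeMs(long bytes, double megabitsPerSecond)
    {
        double bitsPerSecond = megabitsPerSecond * 1_000_000;
        return bytes * 8 / bitsPerSecond * 1000;
    }
}
```

`TransferEstimate.WireTimeMs(12_500_000, 100)` returns 1000. A 12.5 MB value on a 100 Mbps link would therefore spend a 1000 ms sync timeout on the wire alone, before Redis or the client does any other work.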

Error message

2017-06-28 18:04:56,025 [17] [ERROR] [RedisObjectStore`1] 
Inner exception number - 0
Timeout performing HGETALL faqscache:******.portal.bll.appdatamanager+cachedata, inst: 1, queue: 18, qu: 0, qs: 18, qc: 0, wr: 0, wq: 0, in: 0, ar: 0, clientName: TestAPP01, serverEndpoint: 127.0.0.1:8188, keyHashSlot: 3414, IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=6,Free=8185,Min=2,Max=8191) (Please take a look at this article for some common client-side issues that can cause timeouts: http://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server)
at StackExchange.Redis.RedisDatabase.HashGetAll(RedisKey key, CommandFlags flags)
at ******.Portal.BLL.Redis.RedisObjectStore`1.Get(String key)
Main exception & its inner exception Log end

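Solution

Relax the data sync timeout by increasing syncTimeout. The default is 1000 ms in the 1.x releases; the exception from 2.0.571 quoted earlier reports 5000ms.

1. Original Redis connection code: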
```csharp
using System;
using System.Linq;
using StackExchange.Redis;

public static class RedisConnectionFactory
{
    private static readonly Lazy<ConnectionMultiplexer> Connection;
    public static IServer RedisServer;

    static RedisConnectionFactory()
    {
        var connectionString = "127.0.0.1:6379,127.0.0.1:6380,password=password";
        var options = ConfigurationOptions.Parse(connectionString);
        Connection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
        // EndPoints.First() requires System.Linq; touching GetConnection here connects eagerly.
        RedisServer = GetConnection.GetServer(options.EndPoints.First());
    }

    public static ConnectionMultiplexer GetConnection => Connection.Value;
    public static IDatabase RedisDB => GetConnection.GetDatabase();
}
```
2. After the change:

```csharp
using System;
using System.Linq;
using StackExchange.Redis;

public static class RedisConnectionFactory
{
    private static readonly Lazy<ConnectionMultiplexer> Connection;
    public static IServer RedisServer;

    static RedisConnectionFactory()
    {
        // syncTimeout=3000 allows synchronous operations 3000 ms instead of the default.
        var connectionString = "127.0.0.1:6379,127.0.0.1:6380,password=password,syncTimeout=3000";
        var options = ConfigurationOptions.Parse(connectionString);
        Connection = new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(options));
        RedisServer = GetConnection.GetServer(options.EndPoints.First());
    }

    public static ConnectionMultiplexer GetConnection => Connection.Value;
    public static IDatabase RedisDB => GetConnection.GetDatabase();
}
```

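Takeaways

Relaxing the sync timeout did resolve the timeout problem. I still recommended that my colleague shrink the Redis data at the source, for two reasons: smaller values mean less network I/O, and they also speed up Redis responses.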
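One way to shrink values at the source is to compress them before writing. The text does not prescribe this technique; it is offered as one option. The sketch below uses the BCL's `GZipStream`, and `PayloadCompression` is a hypothetical helper. The CPU cost of compressing should be weighed against the bytes saved. The resulting `byte[]` can be stored as a Redis string value, because StackExchange.Redis's `RedisValue` converts implicitly from `byte[]`.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PayloadCompression
{
    // GZip-compresses a UTF-8 string before it is written to the cache.
    public static byte[] Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(raw, 0, raw.Length);
        } // disposing the GZipStream flushes the final compressed block into `output`
        return output.ToArray();
    }

    // Reverses Compress after the bytes are read back from the cache.
    public static string Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

Repetitive payloads such as serialized lists of similar objects usually compress well. Already-compressed data (images, archives) gains little, so measure before adopting this.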


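StackExchange.Redis performance tuning

Many people run into timeouts with synchronous Redis calls. After switching to async calls the errors become far rarer, yet logging before and after each call may still reveal that Redis commands are very slow.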
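The "log before and after the call" measurement can be wrapped in a small helper. `CallTimer.TimeAsync` is hypothetical. The elapsed time runs until the awaiting code resumes, so it includes any delay in scheduling the continuation, not just the Redis round trip. That is exactly the part that grows when async completions are held up.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CallTimer
{
    // Awaits `operation` and reports the elapsed milliseconds, measured from just
    // before the call until the await resumes (also reported if the call throws).
    public static async Task<T> TimeAsync<T>(Func<Task<T>> operation, Action<long> report)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            return await operation().ConfigureAwait(false);
        }
        finally
        {
            report(sw.ElapsedMilliseconds);
        }
    }
}
```

Usage with an `IDatabase db` might look like `var value = await CallTimer.TimeAsync(() => db.StringGetAsync("aaa"), ms => Console.WriteLine($"GET took {ms} ms"));`.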

PS: all code below runs in Bash on Windows, with StackExchange.Redis version 1.2.6.

Let's start by quickly reproducing and fixing the problem. Run the following code:

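```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class Program
{
    public static async Task Main(string[] args)
    {
        // Keep the pool minimum small so starvation is easy to reproduce.
        ThreadPool.SetMinThreads(8, 8);
        using (var connection = await ConnectionMultiplexer.ConnectAsync("localhost"))
        {
            // Exists in 1.2.x (the version used here); marked obsolete in 2.x.
            connection.PreserveAsyncOrder = false;

            var db = connection.GetDatabase(0);
            var sw = Stopwatch.StartNew();

            // 10 concurrent work items: each does a synchronous GET,
            // then keeps its pool thread busy for another second.
            await Task.WhenAll(Enumerable.Range(0, 10)
                .Select(_ => Task.Run(() =>
                {
                    db.StringGet("aaa");

                    Thread.Sleep(1000);
                })));

            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}
```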

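Running it throws StackExchange.Redis.RedisTimeoutException. Why? There are simply not enough worker threads, so the synchronous wait times out. See the source code for details.

What if we change ThreadPool.SetMinThreads(8, 8) above to ThreadPool.SetMinThreads(100, 100)? Does the exception go away?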
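The mechanism can be reproduced without Redis. The simulation below is a sketch that rests on one assumption: the response to a blocking call is delivered by work that itself needs a free thread-pool thread. `StarvationSimulation` is a hypothetical name, and the code does not follow StackExchange.Redis's actual code path.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationSimulation
{
    // Each caller occupies a pool thread and blocks synchronously (like a sync
    // Redis call) until a "response callback" signals it. The callback is queued
    // to the same thread pool, so it runs only once a pool thread is free.
    // Returns how many callers gave up after waitTimeoutMs.
    public static int CountTimeouts(int callers, int waitTimeoutMs)
    {
        int timeouts = 0;
        Task[] work = Enumerable.Range(0, callers).Select(i => Task.Run(() =>
        {
            // Not disposed on purpose: the callback may still fire after a timeout.
            var response = new ManualResetEventSlim(false);
            ThreadPool.QueueUserWorkItem(state => response.Set());
            if (!response.Wait(waitTimeoutMs))
            {
                Interlocked.Increment(ref timeouts);
            }
        })).ToArray();

        Task.WaitAll(work);
        return timeouts;
    }
}
```

To compare, call `ThreadPool.SetMinThreads(8, 8)` before `StarvationSimulation.CountTimeouts(50, 1000)`, then repeat with `SetMinThreads(100, 100)`. The exact counts depend on core count and runtime version.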

Now for the slowdown of async calls. First run the following code:

```csharp
public static async Task Main(string[] args)
{
    var tcs = new TaskCompletionSource<bool>();
    var sw = Stopwatch.StartNew();

    Console.WriteLine($"Main1: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");

    var task = Task.Run(() =>
    {
        Thread.Sleep(10);
        Console.WriteLine($"Run1: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
        tcs.TrySetResult(true);
        Console.WriteLine($"Run2: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
        Thread.Sleep(10000);
    });

    var a = tcs.Task.ContinueWith(_ => { Console.WriteLine($"a: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}"); });
    var b = tcs.Task.ContinueWith(_ => { Console.WriteLine($"b: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}"); });
    var c = tcs.Task.ContinueWith(_ => { Console.WriteLine($"c: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}"); });

    await tcs.Task;
    Console.WriteLine($"Main2: {sw.ElapsedMilliseconds}, ThreadId: {Environment.CurrentManagedThreadId}");
    Thread.Sleep(100);
    await Task.Delay(
```
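The demo above prints thread IDs to show where each continuation, and the code after `await tcs.Task`, actually runs. The BCL-only sketch below isolates the underlying behavior. A continuation registered with `TaskContinuationOptions.ExecuteSynchronously` runs inline inside `TrySetResult` and blocks the completing thread. If the `TaskCompletionSource` is created with `TaskCreationOptions.RunContinuationsAsynchronously`, the continuation is queued to the thread pool instead. The same flag also applies to `await` continuations. `ContinuationPlacement` is a hypothetical helper. Applying this to a Redis client, where the completing thread would be the one processing responses, is an inference and is not demonstrated here.

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuationPlacement
{
    // Measures how long TrySetResult blocks the completing thread when a slow
    // continuation (sleeping continuationMs) is attached to the TCS's task.
    public static long TrySetResultCostMs(TaskCreationOptions options, int continuationMs)
    {
        var tcs = new TaskCompletionSource<bool>(options);

        // ExecuteSynchronously requests inline execution on the completing thread;
        // RunContinuationsAsynchronously on the TCS overrides that request.
        Task continuation = tcs.Task.ContinueWith(
            t => Thread.Sleep(continuationMs),
            TaskContinuationOptions.ExecuteSynchronously);

        var sw = Stopwatch.StartNew();
        tcs.TrySetResult(true);
        long cost = sw.ElapsedMilliseconds;

        continuation.Wait(); // let the continuation finish before returning
        return cost;
    }
}
```

With a 200 ms continuation, `TrySetResultCostMs(TaskCreationOptions.None, 200)` should report roughly 200 ms. `TrySetResultCostMs(TaskCreationOptions.RunContinuationsAsynchronously, 200)` should report close to 0.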


 