C# · 12月 21, 2021

c# – 优化LINQ例程

我运行一个构建系统.数据方面,简化的描述是我有配置,每个配置都有0..n Builds.
现在构建生成工件,其中一些存储在服务器上.我正在做的是编写一种规则,它将每个配置构建产生的所有字节数相加,并检查它们是否过多.

目前的例程代码如下:

private void CalculateExtendedDiskUsage(IEnumerable<Configuration> allConfigurations){ var sw = new Stopwatch(); sw.Start(); // Lets take only confs that have been updated within last 7 days var items = allConfigurations.AsParallel().Where(x => x.artifact_cleanup_type != null && x.build_cleanup_type != null && x.updated_date > DateTime.UtcNow.AddDays(-7) ).ToList(); using (var ctx = new LocalEntities()) { Debug.WriteLine("Context: " + sw.Elapsed); var allBuilds = ctx.Builds; var ruleResult = new List<Notification>(); foreach (var configuration in items) { // all builds for current configuration var configurationBuilds = allBuilds.Where(x => x.configuration_id == configuration.configuration_id) .OrderByDescending(z => z.build_date); Debug.WriteLine("Filter conf builds: " + sw.Elapsed); // Since I don't kNow which builds/artifacts have been cleaned up,calculate it manually if (configuration.build_cleanup_count != null) { var buildCleanupCount = "30"; // default if (configuration.build_cleanup_type.Equals("ReserveBuildsByDays")) { var buildLastCleanupDate = DateTime.UtcNow.AddDays(-int.Parse(buildCleanupCount)); configurationBuilds = configurationBuilds.Where(x => x.build_date > buildLastCleanupDate) .OrderByDescending(z => z.build_date); } if (configuration.build_cleanup_type.Equals("ReserveBuildsByCount")) { var buildLastCleanupCount = int.Parse(buildCleanupCount); configurationBuilds = configurationBuilds.Take(buildLastCleanupCount).OrderByDescending(z => z.build_date); } } if (configuration.artifact_cleanup_count != null) { // skipped,similar to prevIoUs block } Debug.WriteLine("Done cleanup: " + sw.Elapsed); const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB // Sum all disc usage per configuration var confdiscSizePerConfiguration = configurationBuilds .GroupBy(c => new {c.configuration_id}) .Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration)) .Select(groupedBuilds => new { configurationId = groupedBuilds.FirstOrDefault().configuration_id,configurationPath = groupedBuilds.FirstOrDefault().configuration_path,Total = groupedBuilds.Sum(c => c.artifact_dir_size),Average = groupedBuilds.Average(c => c.artifact_dir_size) }).ToList(); Debug.WriteLine("Done db query: " + sw.Elapsed); ruleResult.AddRange(confdiscSizePerConfiguration.Select(iter => new Notification { ConfigurationId = iter.configurationId,CreatedDate = DateTime.UtcNow,RuleType = (int) RulesEnum.TooMuchDisc,ConfigrationPath = iter.configurationPath })); Debug.WriteLine("Finished loop: " + sw.Elapsed); } // find owners and insert… }}

这正是我想要的,但我在想,如果我能做得更快. Currenly我看到:

Context: 00:00:00.0609067// first roundFilter conf builds: 00:00:00.0636291Done cleanup: 00:00:00.0644505Done db query: 00:00:00.3050122Finished loop: 00:00:00.3062711// avg roundFilter conf builds: 00:00:00.0001707Done cleanup: 00:00:00.0006343Done db query: 00:00:00.0760567Finished loop: 00:00:00.0773370

由.ToList()生成的sql looks very messy.(WHERE中使用的所有东西都包含在DB中的索引)

我正在测试200个配置,因此这加起来为00:00:18.6326722.我总共有大约8k项需要每天处理(所以整个例程需要10多分钟才能完成).

我一直在谷歌搜索这个互联网,在我看来,Entitiy框架并不是很好的并行处理.知道我仍然决定尝试这个async / await approch(第一次尝试了,很抱歉任何废话).

基本上,如果我将所有处理移出范围,例如:

foreach (var configuration in items) { var confdiscSizePerConfiguration = await GetData(configuration,allBuilds); ruleResult.AddRange(confdiscSizePerConfiguration.Select(iter => new Notification { … skiped }

和:

private async Task

出于某种原因,这会导致200个项目的执行时间从18 – > 13秒无论如何,从我的理解,因为我等待每个.ToListAsync(),它仍然按顺序处理,这是正确的吗?

因此,当我用Parallel.ForEach(items,async configuration =>)替换foreach(项目中的var配置)时,“无法并行处理”声明开始出现.执行此更改会导致:

A second operation started on this context before a prevIoUs
asynchronous operation completed. Use ‘await’ to ensure that any
asynchronous operations have completed before calling another method
on this context. Any instance members are not guaranteed to be thread
safe.

起初我有点困惑,因为我几乎在编译器允许的每个地方等待,但可能数据被快速播种.

我试图通过减少贪婪并将新的ParallelOptions {MaxDegreeOfParallelism = 4}添加到该并行循环来克服这一点,农民的假设是默认连接池大小为100,我想要使用的全部是4,应该是充足的.但它仍然失败.

我也尝试在GetData方法中创建新的DbContexts,但它仍然失败.如果我没记错的话(现在不能测试),我得到了

Underlying connection Failed to open

有什么可能使这个例程变得更快?最佳答案在并行之前,优化查询本身是值得的.以下是一些可能会改善您的时间的建议:

1)使用GroupBy时使用Key.这可能会解决复杂的问题.嵌套SQL查询就像这样指示Linq使用GROUP BY中定义的相同键而不是创建子选择.

var confdiscSizePerConfiguration = configurationBuilds .GroupBy(c => new { ConfigurationId = c.configuration_id,ConfigurationPath = c.configuration_path}) .Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration)) .Select(groupedBuilds => new { configurationId = groupedBuilds.Key.ConfigurationId,configurationPath = groupedBuilds.Key.ConfigurationPath,Average = groupedBuilds.Average(c => c.artifact_dir_size) }) .ToList();

2)好像你被N 1问题咬了.简单来说 – 您执行一个SQL查询以获取所有配置,并执行另外一个SQL查询以获取构建信息.总共会有大约8k个小查询,其中2个更大的查询就足够了.如果使用的内存不是约束,则获取内存中的所有构建数据并使用ToLookup优化快速查找.

var allBuilds = ctx.Builds.ToLookup(x=>x.configuration_id);

稍后您可以通过以下方式查

var configurationBuilds = allBuilds[configuration.configuration_id].OrderByDescending(z => z.build_date);

3)您在configurationBuilds上多次执行OrderBy.过滤不会影响记录顺序,因此您可以安全地删除对OrderBy的额外调用:

…configurationBuilds = configurationBuilds.Where(x => x.build_date > buildLastCleanupDate);…configurationBuilds = configurationBuilds.Take(buildLastCleanupCount);…

4)没有必要进行GroupBy,因为已经针对单个配置过滤了构建.

更新:

我更进了一步,创建了一个代码,可以通过单个请求检索与提供的代码相同的结果.它应该更高性能并使用更少的内存.

private void CalculateExtendedDiskUsage(){ using (var ctx = new LocalEntities()) { var ruleResult = ctx.Configurations .Where(x => x.build_cleanup_count != null && ( (x.build_cleanup_type == “ReserveBuildsByDays” && ctx.Builds.Where(y => y.configuration_id == x.configuration_id).Where(y => y.build_date > buildLastCleanupDate).Sum(y => y.artifact_dir_size) > maxDiscAllocationPerConfiguration) || (x.build_cleanup_type == “ReserveBuildsByCount” && ctx.Builds.Where(y => y.configuration_id == x.configuration_id).OrderByDescending(y => y.build_date).Take(buildCleanupCount).Sum(y => y.artifact_dir_size) > maxDiscAllocationPerConfiguration) ) ) .Select(x => new Notification { ConfigurationId = x.configuration_id,ConfigrationPath = x.configuration_path CreatedDate = DateTime.UtcNow,RuleType = (int)RulesEnum.TooMuchDisc,}) .ToList(); }}