CVPR2023 医学图像论文大全

以下为 CVPR2023 医学图像相关论文,共计 41 篇。博主从论文库中浏览题目进行人工筛选,可能有所遗漏,如有遗漏欢迎在评论区指出补充。论文链接来自 arXiv 和 thecvf,题目和摘要翻译来自 DeepL,由于是机器翻译,可能存在不准确之处。


Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation

基于动态图增强对比学习的胸部X射线报告生成

Automatic radiology reporting has great clinical potential to relieve radiologists from heavy workloads and improve diagnosis interpretation. Recently, researchers have enhanced data-driven neural networks with medical knowledge graphs to eliminate the severe visual and textual bias in this task. The structures of such graphs are exploited by using the clinical dependencies formed by the disease topic tags via general knowledge and usually do not update during the training process. Consequently, the fixed graphs can not guarantee the most appropriate scope of knowledge and limit the effectiveness. To address the limitation, we propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning, named DCL. In detail, the fundamental structure of our graph is pre-constructed from general knowledge. Then we explore specific knowledge extracted from the retrieved reports to add additional nodes or redefine their relations in a bottom-up manner. Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation. Finally, this paper introduces Image-Report Contrastive and Image-Report Matching losses to better represent visual features and textual information. Evaluated on IU-Xray and MIMIC-CXR datasets, our DCL outperforms previous state-of-the-art models on these two benchmarks.

自动化放射学报告具有极大的临床潜力,可以减轻放射科医生的工作负担,提高诊断解读能力。最近,研究人员通过使用医学知识图谱增强数据驱动的神经网络,以消除这一任务中的严重视觉和文本偏差。这些图谱的结构利用通过通用知识形成的疾病主题标签的临床依赖关系,并且通常在训练过程中不进行更新。因此,固定的图谱无法保证最合适的知识范围,限制了效果的发挥。为了解决这个问题,我们提出了一种具有动态结构和节点的知识图谱,以便通过对比学习促进医学报告生成,命名为DCL。具体而言,我们的图谱的基本结构是从通用知识预先构建的。然后,我们探索从检索的报告中提取的特定知识,以自下而上的方式添加附加节点或重新定义它们之间的关系。在将每个图像特征输入解码器模块进行报告生成之前,将其与其自身更新后的图谱进行整合。最后,本文引入了图像-报告对比和图像-报告匹配损失,以更好地表示视觉特征和文本信息。通过在IU-Xray和MIMIC-CXR数据集上进行评估,我们的DCL在这两个基准测试中胜过了先前的最先进模型。
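
摘要中提到的图像-报告对比(Image-Report Contrastive)损失属于常见的 InfoNCE / CLIP 式对称对比目标。下面是博主补充的一个概念性示意(并非论文官方实现,特征维度、温度系数等均为假设),用于说明这类损失的一般计算方式:

```python
import torch
import torch.nn.functional as F

def image_report_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """对称的 InfoNCE 对比损失:第 i 张图像与第 i 份报告互为正样本。
    img_emb, txt_emb: [B, D],假设已由图像/文本编码器输出。"""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature           # [B, B] 相似度矩阵
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)            # 图像 -> 报告方向
    loss_t2i = F.cross_entropy(logits.t(), targets)        # 报告 -> 图像方向
    return (loss_i2t + loss_t2i) / 2

# 用随机特征演示一次前向计算
img = torch.randn(8, 256)
txt = torch.randn(8, 256)
print(image_report_contrastive_loss(img, txt))
```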


Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification

面向弱监督病理全切片图像分类的基于变分信息瓶颈的任务特定微调

While Multiple Instance Learning (MIL) has shown promising results in digital Pathology Whole Slide Image (WSI) classification, such a paradigm still faces performance and generalization problems due to challenges in high computational costs on Gigapixel WSIs and limited sample size for model training. To deal with the computation problem, most MIL methods utilize a frozen pretrained model from ImageNet to obtain representations first. This process may lose essential information owing to the large domain gap and hinder the generalization of model due to the lack of image-level training-time augmentations. Though Self-supervised Learning (SSL) proposes viable representation learning schemes, the improvement of the downstream task still needs to be further explored in the conversion from the task-agnostic features of SSL to the task-specifics under the partial label supervised learning. To alleviate the dilemma of computation cost and performance, we propose an efficient WSI fine-tuning framework motivated by the Information Bottleneck theory. The theory enables the framework to find the minimal sufficient statistics of WSI, thus supporting us to fine-tune the backbone into a task-specific representation only depending on WSI-level weak labels. The WSI-MIL problem is further analyzed to theoretically deduce our fine-tuning method. Our framework is evaluated on five pathology WSI datasets on various WSI heads. The experimental results of our fine-tuned representations show significant improvements in both accuracy and generalization compared with previous works. Source code will be available at this https URL.

虽然多实例学习(MIL)在数字病理学全切片图像(WSI)分类中显示出了有希望的结果,但由于十亿像素级 WSI 的高计算成本和模型训练样本数量有限等挑战,这种范式仍然面临性能和泛化问题。为了解决计算问题,大多数MIL方法首先利用来自ImageNet的冻结预训练模型来获取表示。这个过程可能会由于较大的域差距而丢失关键信息,并且由于缺乏图像级训练时的数据增强,会限制模型的泛化能力。尽管自监督学习(SSL)提出了可行的表示学习方案,但在从SSL的任务不可知特征到部分标签监督学习下的任务特定特征的转换中,下游任务的改进仍需要进一步探索。为了缓解计算成本和性能的困境,我们提出了一个受信息瓶颈理论启发的高效WSI微调框架。该理论使得框架能够找到WSI的最小充分统计量,从而支持我们仅依赖WSI级弱标签将骨干网络微调为任务特定表示。我们进一步分析了WSI-MIL问题,从理论上推导了我们的微调方法。我们的框架在五个病理WSI数据集和多种WSI分类头上进行了评估。与先前的工作相比,我们微调得到的表示在准确性和泛化性方面都显示出了显著的改进。源代码将在此https网址上提供。


Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation

正交标注有利于极少量监督的医学图像分割

Recent trends in semi-supervised learning have significantly boosted the performance of 3D semi-supervised medical image segmentation. Compared with 2D images, 3D medical volumes involve information from different directions, e.g., transverse, sagittal, and coronal planes, so as to naturally provide complementary views. These complementary views and the intrinsic similarity among adjacent 3D slices inspire us to develop a novel annotation way and its corresponding semi-supervised model for effective segmentation. Specifically, we firstly propose the orthogonal annotation by only labeling two orthogonal slices in a labeled volume, which significantly relieves the burden of annotation. Then, we perform registration to obtain the initial pseudo labels for sparsely labeled volumes. Subsequently, by introducing unlabeled volumes, we propose a dual-network paradigm named Dense-Sparse Co-training (DeSCO) that exploits dense pseudo labels in early stage and sparse labels in later stage and meanwhile forces consistent output of two networks. Experimental results on three benchmark datasets validated our effectiveness in performance and efficiency in annotation. For example, with only 10 annotated slices, our method reaches a Dice up to 86.93% on KiTS19 dataset.

近期半监督学习在三维半监督医学图像分割方面取得了显著的性能提升。与二维图像相比,三维医学体积涉及来自不同方向的信息,例如横断面、矢状面和冠状面,从而自然地提供了互补的视角。这些互补的视角和相邻三维切片之间的内在相似性激发了我们开发一种新的注释方式及其相应的半监督模型,用于有效的分割。具体而言,我们首先提出正交注释,仅对标注的体积中的两个正交切片进行标记,从而显著减轻了注释的负担。然后,我们进行配准以获取稀疏标注体积的初始伪标签。随后,通过引入未标记的体积,我们提出了一种名为Dense-Sparse Co-training(DeSCO)的双网络范式,在早期利用密集伪标签,在后期利用稀疏标签,并同时强制两个网络的输出保持一致。在三个基准数据集上的实验结果验证了我们在性能和注释效率方面的有效性。例如,在KiTS19数据集上,仅使用10个标注切片,我们的方法的Dice指标可以达到86.93%。


Rethinking Few-Shot Medical Segmentation: A Vector Quantization View

基于矢量量化视角的少样本医学分割再思考

The existing few-shot medical segmentation networks share the same practice that the more prototypes, the better performance. This phenomenon can be theoretically interpreted in Vector Quantization (VQ) view: the more prototypes, the more clusters are separated from pixel-wise feature points distributed over the full space. However, as we further think about few-shot segmentation with this perspective, it is found that the clusterization of feature points and the adaptation to unseen tasks have not received enough attention. Motivated by the observation, we propose a learning VQ mechanism consisting of grid format VQ (GFVQ), self-organized VQ (SOVQ), and residual oriented VQ (ROVQ). To be specific, GFVQ generates the prototype matrix by averaging square grids over the spatial extent which uniformly quantizes the local details; SOVQ adaptively assigns the feature points to different local classes and creates a new representation space where the learnable local prototypes are updated with a global view; ROVQ introduces residual information to fine-tune the aforementioned learned local prototypes without re-training, which benefits the generalization performance for the irrelevance to the training task. We empirically show that our VQ framework yields the state-of-the-art performance over abdomen, cardiac, and prostate MRI datasets and expect this work will provoke a rethink of the current few-shot medical segmentation model design. Our code will soon be publicly available.

现有的少样本医学分割网络都遵循一个共同的做法,即原型越多,性能越好。从向量量化(VQ)的视角来看,这种现象可以在理论上解释:原型越多,聚类就越能将散布在整个空间上的像素特征点分离开来。然而,当我们进一步思考基于少样本的分割时,发现特征点的聚类和对未见任务的适应性并没有得到足够的关注。在这一观察的基础上,我们提出了一种学习向量量化机制,包括网格格式向量量化(GFVQ)、自组织向量量化(SOVQ)和残差导向向量量化(ROVQ)。具体而言,GFVQ通过对空间范围内的方形网格求平均来生成原型矩阵,从而均匀量化局部细节;SOVQ将特征点自适应地分配给不同的局部类别,并创建一个新的表示空间,在这个空间中,可学习的局部原型通过全局视角进行更新;ROVQ引入残差信息,对上述学习得到的局部原型进行微调,无需重新训练,从而提高了对训练任务无关的泛化性能。我们在腹部、心脏和前列腺MRI数据集上进行了实证实验,结果显示我们的向量量化框架在性能上达到了最先进的水平,我们希望这项工作能够引发对当前少样本医学分割模型设计的重新思考。我们的代码将很快公开。
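
摘要中 GFVQ "对空间范围内的方形网格求平均来生成原型矩阵"的做法,在实现上通常可以用自适应平均池化近似。下面是博主补充的示意片段(并非论文原实现,网格数、特征维度为假设值,且省略了掩码加权等细节):

```python
import torch
import torch.nn.functional as F

def grid_prototypes(feat, grid=4):
    """feat: [B, C, H, W] 支持图像的特征图。
    将空间维度划分为 grid x grid 个方格,每格求平均得到一个局部原型。
    返回 [B, grid*grid, C] 的原型矩阵。"""
    pooled = F.adaptive_avg_pool2d(feat, grid)         # [B, C, grid, grid]
    return pooled.flatten(2).transpose(1, 2)           # [B, grid*grid, C]

feat = torch.randn(2, 64, 32, 32)
print(grid_prototypes(feat).shape)   # torch.Size([2, 16, 64])
```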


Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

在多样性病理学数据集上对自监督学习进行基准测试

Computational pathology can lead to saving human lives, but models are annotation hungry and pathology images are notoriously expensive to annotate. Self-supervised learning has shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data, to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently out-performs ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show leads to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.

计算病理学可以拯救人类生命,但模型对注释数据有很高的需求,而病理图像的注释费用昂贵。自监督学习已被证明是利用无标签数据的有效方法,将其应用于病理学可以极大地提升下游任务的效果。然而,目前还没有原则性的研究来比较各种自监督学习方法,并讨论如何将其适配到病理学领域。为了解决这个问题,我们进行了迄今为止规模最大的病理图像自监督预训练研究。我们使用了4种代表性的自监督学习方法,并应用于多样化的下游任务。我们得出的结论是,在标准的自监督学习设置中,如线性评估和微调评估以及低标签情况下,与ImageNet预训练相比,大规模领域对齐的病理学预训练始终表现更好。此外,我们提出了一组针对病理学的领域特定技术,实验证明它们可以提升性能。最后,我们首次将自监督学习应用于具有挑战性的细胞核实例分割任务,并在不同的设置下展示了大幅且一致的性能提升。


Label-Free Liver Tumor Segmentation

无标注肝脏肿瘤分割

We demonstrate that AI models can accurately segment liver tumors without the need for manual annotation by using synthetic tumors in CT scans. Our synthetic tumors have two intriguing advantages: (I) realistic in shape and texture, which even medical professionals can confuse with real tumors; (II) effective for training AI models, which can perform liver tumor segmentation similarly to the model trained on real tumors -- this result is exciting because no existing work, using synthetic tumors only, has thus far reached a similar or even close performance to real tumors. This result also implies that manual efforts for annotating tumors voxel by voxel (which took years to create) can be significantly reduced in the future. Moreover, our synthetic tumors can automatically generate many examples of small (or even tiny) synthetic tumors and have the potential to improve the success rate of detecting small liver tumors, which is critical for detecting the early stages of cancer. In addition to enriching the training data, our synthesizing strategy also enables us to rigorously assess the AI robustness.

我们证明了通过使用CT扫描中的合成肿瘤,AI模型可以准确地分割肝脏肿瘤,而无需手动注释。我们的合成肿瘤具有两个引人注目的优势:(一)在形状和纹理上非常逼真,即使是医疗专业人员也可能将其误认为是真实肿瘤;(二)对于训练AI模型非常有效,能够像在真实肿瘤上训练的模型一样执行肝脏肿瘤分割,这个结果令人兴奋,因为迄今为止,仅使用合成肿瘤的现有研究尚未达到类似或接近真实肿瘤的性能。这个结果还意味着将来可以显著减少逐体素手动标注肿瘤的工作量(这需要多年时间来完成)。此外,我们的合成肿瘤可以自动生成许多小(甚至微小)合成肿瘤的示例,并有潜力提高检测小肝脏肿瘤的成功率,这对于早期癌症的检测至关重要。除了丰富训练数据,我们的合成策略还使我们能够对AI的稳健性进行严格评估。


Topology-Guided Multi-Class Cell Context Generation for Digital Pathology

面向数字病理学的拓扑引导多类细胞上下文生成

In digital pathology, the spatial context of cells is important for cell classification, cancer diagnosis and prognosis. To model such complex cell context, however, is challenging. Cells form different mixtures, lineages, clusters and holes. To model such structural patterns in a learnable fashion, we introduce several mathematical tools from spatial statistics and topological data analysis. We incorporate such structural descriptors into a deep generative model as both conditional inputs and a differentiable loss. This way, we are able to generate high quality multi-class cell layouts for the first time. We show that the topology-rich cell layouts can be used for data augmentation and improve the performance of downstream tasks such as cell classification.

在数字病理学中,细胞的空间上下文对于细胞分类、癌症诊断和预后非常重要。然而,要对这种复杂的细胞上下文进行建模是具有挑战性的。细胞会形成不同的混合、谱系、簇和空洞结构。为了以可学习的方式对这种结构模式进行建模,我们从空间统计和拓扑数据分析中引入了一些数学工具。我们将这种结构描述符纳入一个深度生成模型,作为条件输入和可微分的损失。通过这种方式,我们第一次能够生成高质量的多类细胞布局。我们表明,富含拓扑结构的细胞布局可用于数据增强,并改善下游任务的性能,如细胞分类。


Learning Federated Visual Prompt in Null Space for MRI Reconstruction

在零空间中学习用于MRI重建的联邦视觉提示

Federated Magnetic Resonance Imaging (MRI) reconstruction enables multiple hospitals to collaborate distributedly without aggregating local data, thereby protecting patient privacy. However, the data heterogeneity caused by different MRI protocols, insufficient local training data, and limited communication bandwidth inevitably impair global model convergence and updating. In this paper, we propose a new algorithm, FedPR, to learn federated visual prompts in the null space of global prompt for MRI reconstruction. FedPR is a new federated paradigm that adopts a powerful pre-trained model while only learning and communicating the prompts with few learnable parameters, thereby significantly reducing communication costs and achieving competitive performance on limited local data. Moreover, to deal with catastrophic forgetting caused by data heterogeneity, FedPR also updates efficient federated visual prompts that project the local prompts into an approximate null space of the global prompt, thereby suppressing the interference of gradients on the server performance. Extensive experiments on federated MRI show that FedPR significantly outperforms state-of-the-art FL algorithms with <6% of communication costs when given the limited amount of local training data.

联邦磁共振成像(MRI)重建使得多个医院能够在分布式环境下协作,而无需汇总本地数据,从而保护患者隐私。然而,不同MRI协议导致的数据异质性、不充足的本地训练数据以及有限的通信带宽,不可避免地影响了全局模型的收敛和更新。在本文中,我们提出了一种新的算法FedPR,用于在全局提示的零空间中学习面向MRI重建的联邦视觉提示。FedPR是一种新的联邦学习范式,采用了一个强大的预训练模型,只学习和通信少量可学习参数的提示,从而大大减少了通信成本,并在有限的本地数据上取得了竞争性的性能。此外,为了处理由于数据异质性引起的灾难性遗忘,FedPR还更新了高效的联邦视觉提示,将本地提示投影到全局提示的近似零空间中,从而抑制了对服务器性能的梯度干扰。在联邦MRI的广泛实验中,FedPR在给定有限的本地训练数据时,以不到6%的通信成本显著优于现有的FL算法。
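
"将本地提示投影到全局提示的近似零空间"这一思路,可以用 SVD 取小奇异值对应方向来示意。下面是博主补充的概念性片段(并非 FedPR 官方实现,energy 阈值、矩阵维度等均为假设):

```python
import torch

def project_to_null_space(global_prompt, local_update, energy=0.95):
    """把本地提示更新投影到全局提示的近似零空间(示意)。
    global_prompt: [n, d] 全局提示;local_update: [n, d] 本地更新量。
    保留累计奇异值占比超过 energy 的方向作为"主空间",其余方向视为近似零空间。"""
    U, S, Vh = torch.linalg.svd(global_prompt, full_matrices=True)  # Vh: [d, d]
    ratio = torch.cumsum(S, dim=0) / S.sum()
    k = int((ratio < energy).sum()) + 1          # 主空间维数
    V_null = Vh[k:].t()                          # [d, d-k] 近似零空间的基
    P = V_null @ V_null.t()                      # 零空间投影矩阵
    return local_update @ P

g = torch.randn(10, 64)
delta = torch.randn(10, 64)
proj = project_to_null_space(g, delta)
# 投影后,本地更新在全局提示主方向上的分量明显小于投影前
print((g @ delta.t()).abs().mean().item(), (g @ proj.t()).abs().mean().item())
```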


Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning

基于异构图表示学习的组织病理学整张切片图像分析

Graph-based methods have been extensively applied to whole slide histopathology image (WSI) analysis due to the advantage of modeling the spatial relationships among different entities. However, most of the existing methods focus on modeling WSIs with homogeneous graphs (e.g., with homogeneous node type). Despite their successes, these works are incapable of mining the complex structural relations between biological entities (e.g., the diverse interaction among different cell types) in the WSI. We propose a novel heterogeneous graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis. Specifically, we formulate the WSI as a heterogeneous graph with "nucleus-type" attribute to each node and a semantic similarity attribute to each edge. We then present a new heterogeneous-graph edge attribute transformer (HEAT) to take advantage of the edge and node heterogeneity during message aggregating. Further, we design a new pseudo-label-based semantic-consistent pooling mechanism to obtain graph-level features, which can mitigate the over-parameterization issue of conventional cluster-based pooling. Additionally, observing the limitations of existing association-based localization methods, we propose a causal-driven approach attributing the contribution of each node to improve the interpretability of our framework. Extensive experiments on three public TCGA benchmark datasets demonstrate that our framework outperforms the state-of-the-art methods with considerable margins on various tasks. Our codes are available at github.com/HKU-MedAI/WS .

由于在建模不同实体之间的空间关系方面具有优势,基于图的方法已广泛应用于全切片组织病理学图像(WSI)分析中。然而,大多数现有方法都集中在使用同质图(例如,具有同质节点类型)对WSI进行建模。尽管这些方法取得了成功,但它们无法挖掘WSI中生物实体之间的复杂结构关系(例如,不同细胞类型之间的多样化相互作用)。我们提出了一种新颖的基于异质图的框架,以利用不同类型细胞核之间的相互关系进行WSI分析。具体而言,我们将WSI表示为一个异质图,每个节点带有"细胞核类型"属性,每条边带有语义相似性属性。然后,我们提出了一种新的异质图边属性Transformer(HEAT),以在信息聚合过程中利用边和节点的异质性。此外,我们设计了一种基于伪标签的语义一致汇聚机制,用于获取图级特征,可以缓解传统基于聚类的汇聚方法的过参数化问题。此外,鉴于现有基于关联的定位方法的局限性,我们提出了一种因果驱动的方法,对每个节点的贡献进行归因,以提高我们框架的可解释性。对三个公共TCGA基准数据集的广泛实验表明,我们的框架在各种任务上的性能优于现有方法,并取得了相当大的提升。我们的代码可在 github.com/HKU-MedAI/WS 找到。


AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images

AMIGO:用于千兆像素图像表示学习的具有共享上下文处理的稀疏多模态图转换器

Processing giga-pixel whole slide histopathology images (WSI) is a computationally expensive task. Multiple instance learning (MIL) has become the conventional approach to process WSIs, in which these images are split into smaller patches for further processing. However, MIL-based techniques ignore explicit information about the individual cells within a patch. In this paper, by defining the novel concept of shared-context processing, we designed a multi-modal Graph Transformer (AMIGO) that uses the cellular graph within the tissue to provide a single representation for a patient while taking advantage of the hierarchical structure of the tissue, enabling a dynamic focus between cell-level and tissue-level information. We benchmarked the performance of our model against multiple state-of-the-art methods in survival prediction and showed that ours can significantly outperform all of them including hierarchical Vision Transformer (ViT). More importantly, we show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data. Finally, in two different cancer datasets, we demonstrated that our model was able to stratify the patients into low-risk and high-risk groups while other state-of-the-art methods failed to achieve this goal. We also publish a large dataset of immunohistochemistry images (InUIT) containing 1,600 tissue microarray (TMA) cores from 188 patients along with their survival information, making it one of the largest publicly available datasets in this context.

处理千兆像素的全玻片组织病理学图像(WSI)是一项计算成本很高的任务。多实例学习(MIL)已经成为处理WSIs的传统方法,在这种方法中,这些图像被分割成更小的斑块,以便进一步处理。然而,基于MIL的技术忽略了关于斑块内单个细胞的明确信息。在本文中,通过定义共享上下文处理的新概念,我们设计了一个多模态图Transformer(AMIGO),它利用组织内的细胞图为病人提供一个单一的表示,同时利用组织的分层结构,使细胞级和组织级信息之间能够动态聚焦。我们将我们的模型的性能与多个最先进的生存预测方法进行了比较,结果表明,我们的模型可以明显地超过所有的方法,包括层次化的视觉Transformer(ViT)。更重要的是,我们表明我们的模型对信息缺失有很强的鲁棒性,以至于它可以在低至20%的数据中实现同样的性能。最后,在两个不同的癌症数据集中,我们证明了我们的模型能够将病人分层为低风险和高风险组,而其他最先进的方法未能实现这一目标。我们还公布了一个大型的免疫组化图像数据集(InUIT),包含188名患者的1600个组织微阵列(TMA)核心以及他们的生存信息,使其成为这方面最大的公开数据集之一。


DoNet: Deep De-overlapping Network for Cytology Instance Segmentation

DoNet:用于细胞学实例分割的深度去重叠网络

Cell instance segmentation in cytology images has significant importance for biology analysis and cancer screening, while remains challenging due to 1) the extensive overlapping translucent cell clusters that cause the ambiguous boundaries, and 2) the confusion of mimics and debris as nuclei. In this work, we proposed a De-overlapping Network (DoNet) in a decompose-and-recombined strategy. A Dual-path Region Segmentation Module (DRM) explicitly decomposes the cell clusters into intersection and complement regions, followed by a Semantic Consistency-guided Recombination Module (CRM) for integration. To further introduce the containment relationship of the nucleus in the cytoplasm, we design a Mask-guided Region Proposal Strategy (MRP) that integrates the cell attention maps for inner-cell instance prediction. We validate the proposed approach on ISBI2014 and CPS datasets. Experiments show that our proposed DoNet significantly outperforms other state-of-the-art (SOTA) cell instance segmentation methods. The code is available at this https URL.

细胞学图像中的细胞实例分割对生物学分析和癌症筛查具有重要意义,但由于1)大量重叠的半透明细胞群导致边界模糊,以及2)模仿物和碎片与细胞核的混淆,因此仍然具有挑战性。在这项工作中,我们以"分解-重组"的策略提出了一个去重叠网络(DoNet)。一个双路径区域分割模块(DRM)明确地将细胞簇分解为交集区域和补集区域,然后由语义一致性指导的重组模块(CRM)进行整合。为了进一步引入细胞质中细胞核的包含关系,我们设计了一个掩码引导的区域提议策略(MRP),该策略整合了细胞注意力图,用于细胞内实例预测。我们在ISBI2014和CPS数据集上验证了提出的方法。实验表明,我们提出的DoNet明显优于其他最先进的(SOTA)细胞实例分割方法。该代码可在此https网址上获得。


MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation

MCF:半监督医学图像分割的相互校正框架

Semi-supervised learning is a promising method for medical image segmentation under limited annotation. However, the model cognitive bias impairs the segmentation performance, especially for edge regions. Furthermore, current mainstream semi-supervised medical image segmentation (SSMIS) methods lack designs to handle model bias. The neural network has a strong learning ability, but the cognitive bias will gradually deepen during the training, and it is difficult to correct itself. We propose a novel mutual correction framework (MCF) to explore network bias correction and improve the performance of SSMIS. Inspired by the plain contrast idea, MCF introduces two different subnets to explore and utilize the discrepancies between subnets to correct cognitive bias of the model. More concretely, a contrastive difference review (CDR) module is proposed to find out inconsistent prediction regions and perform a review training. Additionally, a dynamic competitive pseudo-label generation (DC-PLG) module is proposed to evaluate the performance of subnets in real-time, dynamically selecting more reliable pseudo-labels. Experimental results on two medical image databases with different modalities (CT and MRI) show that our method achieves superior performance compared to several state-of-the-art methods. The code will be available at github.com/WYC-321/MCF.

半监督学习是一种在有限标注下进行医学图像分割的有前途的方法。然而,模型的认知偏差损害了分割性能,特别是对于边缘区域。此外,目前主流的半监督医学图像分割(SSMIS)方法缺乏处理模型偏差的设计。神经网络具有很强的学习能力,但在训练过程中认知偏差会逐渐加深,而且很难自我纠正。我们提出了一个新颖的相互修正框架(MCF)来探索网络偏差的修正,提高SSMIS的性能。受朴素对比思想的启发,MCF引入了两个不同的子网,探索并利用子网之间的差异来纠正模型的认知偏差。更具体地说,提出了一个对比性差异审查(CDR)模块来找出不一致的预测区域并进行审查训练。此外,还提出了一个动态竞争性伪标签生成(DC-PLG)模块来实时评估子网的性能,动态地选择更可靠的伪标签。在两个不同模态(CT和MRI)的医学图像数据库上的实验结果表明,与几个最先进的方法相比,我们的方法取得了卓越的性能。该代码将在 github.com/WYC-321/MCF 发布。
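
CDR 模块中"找出两个子网预测不一致的区域"这一步,核心就是比较两路预测的类别图。下面是博主补充的最小示意(并非论文原实现):

```python
import torch

def inconsistency_mask(logits_a, logits_b):
    """对比两个子网的预测,找出不一致的像素区域(示意)。
    logits_a, logits_b: [B, C, H, W];返回 [B, H, W] 的 0/1 掩码,
    1 表示两个子网预测类别不同、需要重点复审的区域。"""
    pred_a = logits_a.argmax(dim=1)
    pred_b = logits_b.argmax(dim=1)
    return (pred_a != pred_b).float()

a = torch.randn(1, 4, 64, 64)
b = torch.randn(1, 4, 64, 64)
mask = inconsistency_mask(a, b)
print("不一致像素占比:", mask.mean().item())
```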


METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens

METransformer:基于多个可学习专家令牌的Transformer的放射学报告生成

In clinical scenarios, multi-specialist consultation could significantly benefit the diagnosis, especially for intricate cases. This inspires us to explore a "multi-expert joint diagnosis" mechanism to upgrade the existing "single expert" framework commonly seen in the current literature. To this end, we propose METransformer, a method to realize this idea with a transformer-based backbone. The key design of our method is the introduction of multiple learnable "expert" tokens into both the transformer encoder and decoder. In the encoder, each expert token interacts with both vision tokens and other expert tokens to learn to attend different image regions for image representation. These expert tokens are encouraged to capture complementary information by an orthogonal loss that minimizes their overlap. In the decoder, each attended expert token guides the cross-attention between input words and visual tokens, thus influencing the generated report. A metrics-based expert voting strategy is further developed to generate the final report. By the multi-experts concept, our model enjoys the merits of an ensemble-based approach but through a manner that is computationally more efficient and supports more sophisticated interactions among experts. Experimental results demonstrate the promising performance of our proposed model on two widely used benchmarks. Last but not least, the framework-level innovation makes our work ready to incorporate advances on existing "single-expert" models to further improve its performance.

在临床场景中,多专家会诊可以大大有利于诊断,特别是对于复杂的病例。这启发我们探索一种"多专家联合诊断"机制,以升级目前文献中常见的"单一专家"框架。为此,我们提出了METransformer,一种用基于Transformer的主干来实现这一想法的方法。我们方法的关键设计是在Transformer编码器和解码器中引入多个可学习的"专家"令牌。在编码器中,每个专家令牌与视觉令牌和其他专家令牌交互,学习关注不同的图像区域以进行图像表示。我们通过一个最小化专家令牌之间重叠的正交损失,鼓励这些专家令牌捕捉互补的信息。在解码器中,每个被关注的专家令牌引导输入词和视觉令牌之间的交叉注意力,从而影响生成的报告。我们进一步开发了一个基于指标的专家投票策略来生成最终报告。通过多专家的概念,我们的模型享有集成方法的优点,但其方式在计算上更高效,并支持专家之间更复杂的交互。实验结果表明,我们提出的模型在两个广泛使用的基准上表现良好。最后,框架层面的创新使我们的工作可以吸收现有"单一专家"模型的进展,以进一步提高其性能。
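
摘要中"通过最小化专家令牌重叠的正交损失来鼓励互补信息"的思想,可以用惩罚专家令牌两两之间的余弦相似度来示意。以下为博主补充的示意实现(并非论文原实现,专家数量、维度为假设值):

```python
import torch
import torch.nn.functional as F

def expert_orthogonal_loss(expert_tokens):
    """鼓励多个专家 token 互相正交、捕捉互补信息(示意)。
    expert_tokens: [B, K, D],K 为专家数。
    惩罚不同专家之间余弦相似度(非对角元素)的平方和。"""
    t = F.normalize(expert_tokens, dim=-1)
    sim = torch.bmm(t, t.transpose(1, 2))                  # [B, K, K]
    eye = torch.eye(sim.size(-1), device=sim.device).unsqueeze(0)
    off_diag = sim * (1 - eye)
    return (off_diag ** 2).sum(dim=(1, 2)).mean()

tokens = torch.randn(2, 6, 128, requires_grad=True)
loss = expert_orthogonal_loss(tokens)
loss.backward()
print(loss.item())
```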


PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training

PEFAT:基于伪损失估计和特征对抗训练的半监督医学图像分类

Pseudo-labeling approaches have been proven beneficial for semi-supervised learning (SSL) schemes in computer vision and medical imaging. Most works are dedicated to finding samples with high-confidence pseudo-labels from the perspective of model predicted probability. Whereas this way may lead to the inclusion of incorrectly pseudo-labeled data if the threshold is not carefully adjusted. In addition, low-confidence probability samples are frequently disregarded and not employed to their full potential. In this paper, we propose a novel Pseudo-loss Estimation and Feature Adversarial Training semi-supervised framework, termed as PEFAT, to boost the performance of multi-class and multi-label medical image classification from the point of loss distribution modeling and adversarial training. Specifically, we develop a trustworthy data selection scheme to split a high-quality pseudo-labeled set, inspired by the dividable pseudo-loss assumption that clean data tend to show lower loss while noise data is the opposite. Instead of directly discarding these samples with low-quality pseudo-labels, we present a novel regularization approach to learn discriminate information from them via injecting adversarial noises at the feature-level to smooth the decision boundary. Experimental results on three medical and two natural image benchmarks validate that our PEFAT can achieve a promising performance and surpass other state-of-the-art methods. The code is available at github.com/maxwell0027/

伪标签方法已被证明对计算机视觉和医学成像中的半监督学习(SSL)方案有益。大多数工作都致力于从模型预测概率的角度寻找具有高置信度的伪标签样本。而如果不仔细调整阈值,这种方式可能会导致包含不正确的伪标签数据。此外,低置信度的概率样本经常被忽略,没有被充分运用。在本文中,我们提出了一个新颖的伪损失估计和特征对抗训练的半监督框架,称为PEFAT,从损失分布建模和对抗训练的角度提升多类和多标签医学图像分类的性能。具体来说,我们开发了一个值得信赖的数据选择方案来划分出高质量的伪标签集,其灵感来自于伪损失可分的假设,即干净的数据往往显示较低的损失,而噪声数据则相反。我们没有直接丢弃这些具有低质量伪标签的样本,而是提出了一种新颖的正则化方法,通过在特征层面注入对抗性噪声来平滑决策边界,从中学习鉴别信息。在三个医学和两个自然图像基准上的实验结果验证了我们的PEFAT可以实现有希望的性能,并超过其他最先进的方法。该代码可在 github.com/maxwell0027/ 获取。
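
按"干净样本损失低、噪声样本损失高"的可分假设来划分可信伪标签,常见做法是对逐样本伪损失拟合双分量高斯混合模型。下面是博主补充的示意(并非 PEFAT 官方实现,阈值等为假设值):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_trustworthy(pseudo_losses, threshold=0.7):
    """用双分量 GMM 拟合每个无标注样本的伪损失,
    取属于低均值分量概率高的样本作为可信伪标签样本(示意)。
    pseudo_losses: 长度为 N 的一维数组;返回被选中样本的下标。"""
    losses = np.asarray(pseudo_losses).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    clean_comp = int(np.argmin(gmm.means_.ravel()))     # 均值较低的分量
    prob_clean = gmm.predict_proba(losses)[:, clean_comp]
    return np.where(prob_clean > threshold)[0]

losses = np.concatenate([np.random.rand(80) * 0.3, 1.0 + np.random.rand(20)])
print(len(select_trustworthy(losses)))   # 大部分低损失样本被选中
```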


Causally-Aware Intraoperative Imputation for Overall Survival Time Prediction

用于总体生存时间预测的因果感知术中补全

Previous research in the vision community has primarily focused on learning effective representations from visual patterns. However, this paper highlights the importance of high-level causal reasoning abilities. The paper presents a case study involving the challenging task of predicting Overall Survival (OS) time in primary liver cancers. Predicting OS time at an early stage is particularly difficult due to the lack of obvious image patterns that reflect OS. To address this challenge, the authors propose a causal inference system that utilizes intraoperative attributes and their correlations as intermediate supervision to bridge the gap between images and the final OS prediction. They construct a causal graph and train the images to estimate the intraoperative attributes for predicting OS. A novel model called Causally-aware Intraoperative Imputation Model (CAWIM) is introduced, which sequentially predicts each attribute using its parent nodes in the estimated causal graph. To determine the causal directions, the authors propose a splitting-voting mechanism. This mechanism votes for the direction of causality for each pair of adjacent nodes based on multiple predictions obtained through causal discovery from heterogeneity. The authors demonstrate the practicality and effectiveness of their method through promising results on a liver cancer dataset consisting of 361 patients with long-term observations.

视觉领域以前的研究主要集中在从视觉模式中学习有效的表征。然而,本文强调了高级因果推理能力的重要性。本文提出了一个案例研究,涉及预测原发性肝癌总体生存期(OS)的挑战性任务。由于缺乏反映OS的明显图像模式,在早期阶段预测OS时间特别困难。为了应对这一挑战,作者提出了一个因果推理系统,利用术中属性及其相关性作为中间监督,在图像和最终OS预测之间架起桥梁。他们构建了一个因果图,并训练图像来估计术中属性以预测OS。引入了一个新的模型,称为"因果感知术中补全模型"(CAWIM),该模型利用估计的因果图中的父节点来顺序预测每个属性。为了确定因果方向,作者提出了一个拆分-投票机制。该机制根据通过异质性的因果发现获得的多个预测,对每对相邻节点的因果关系方向进行投票。作者通过在一个由361名患者组成的长期观察的肝癌数据集上取得的可喜成果,证明了他们方法的实用性和有效性。


Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling

通过共享-特定特征建模实现缺失模态下的多模态学习

The missing modality issue is critical but non-trivial to be solved by multi-modal models. Current methods aiming to handle the missing modality problem in multi-modal tasks, either deal with missing modalities only during evaluation or train separate models to handle specific missing modality settings. In addition, these models are designed for specific tasks, so for example, classification models are not easily adapted to segmentation tasks and vice versa. In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method that is considerably simpler and more effective than competing approaches that address the issues above. ShaSpec is designed to take advantage of all available input modalities during training and evaluation by learning shared and specific features to better represent the input data. This is achieved from a strategy that relies on auxiliary tasks based on distribution alignment and domain classification, in addition to a residual feature fusion procedure. Also, the design simplicity of ShaSpec enables its easy adaptation to multiple tasks, such as classification and segmentation. Experiments are conducted on both medical image segmentation and computer vision classification, with results indicating that ShaSpec outperforms competing methods by a large margin. For instance, on BraTS2018, ShaSpec improves the SOTA by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour.

缺失模态问题很关键,但对多模态模型而言并不容易解决。目前旨在处理多模态任务中缺失模态问题的方法,要么只在评估期间处理缺失模态,要么训练单独的模型来处理特定的缺失模态设置。此外,这些模型是为特定任务设计的,例如,分类模型不容易适应分割任务,反之亦然。在本文中,我们提出了共享-特定特征建模(ShaSpec)方法,它比解决上述问题的其他方法要简单得多,也更有效。ShaSpec旨在通过学习共享特征和特定特征,在训练和评估期间利用所有可用的输入模态来更好地表示输入数据。这是通过一种依赖于基于分布对齐和领域分类的辅助任务的策略,以及一个残差特征融合过程来实现的。同时,ShaSpec的设计简单,使其能够轻松适应多种任务,如分类和分割。我们在医学图像分割和计算机视觉分类上都进行了实验,结果表明ShaSpec大幅超过了竞争方法。例如,在BraTS2018上,ShaSpec将增强肿瘤的SOTA提高了3%以上,肿瘤核心提高了5%,整个肿瘤提高了3%。


Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing

学习利用时间结构进行生物医学视觉语言处理

Self-supervised learning in vision-language processing (VLP) exploits semantic alignment between imaging and text modalities. Prior work in biomedical VLP has mostly relied on the alignment of single image and report pairs even though clinical notes commonly refer to prior images. This does not only introduce poor alignment between the modalities but also a missed opportunity to exploit rich self-supervision through existing temporal content in the data. In this work, we explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model. It is designed to be versatile to arising challenges such as pose variations and missing input images across time. The resulting model excels on downstream tasks both in single- and multi-image setups, achieving state-of-the-art (SOTA) performance on progression classification, phrase grounding, and report generation, whilst offering consistent improvements on disease classification and sentence-similarity tasks. We release a novel multi-modal temporal benchmark dataset, MS-CXR-T, to quantify the quality of vision-language representations in terms of temporal semantics. Our experimental results show the advantages of incorporating prior images and reports to make the most use of the data.

视觉语言处理(VLP)的自我监督学习利用了成像和文本模态之间的语义对齐。之前在生物医学VLP方面的工作大多依赖于单一图像和报告对的对齐,尽管临床笔记通常会提到之前的图像。这不仅造成了模态之间较差的对齐,而且还错过了通过数据中现有的时间内容来利用丰富的自我监督的机会。在这项工作中,我们在训练和微调期间明确考虑了先前的图像和报告。我们的方法被命名为BioViL-T,使用一个CNN-Transformer混合多图像编码器与一个文本模型联合训练。它被设计成能够应对各种挑战,如体位变化和不同时间点输入图像的缺失。由此产生的模型在单幅和多幅图像设置的下游任务中都表现出色,在进展分类、短语定位和报告生成方面取得了最先进的(SOTA)性能,同时在疾病分类和句子相似性任务方面提供了一致的改进。我们发布了一个新的多模态时序基准数据集MS-CXR-T,以量化视觉语言表示在时间语义方面的质量。我们的实验结果显示了纳入先前的图像和报告以充分利用数据的优势。


Geometric Visual Similarity Learning in 3D Medical Image Self-supervised Pre-training

三维医学图像自监督预训练中的几何视觉相似性学习

Learning inter-image similarity is crucial for 3D medical images self-supervised pre-training, due to their sharing of numerous same semantic regions. However, the lack of the semantic prior in metrics and the semantic-independent variation in 3D medical images make it challenging to get a reliable measurement for the inter-image similarity, hindering the learning of consistent representation for same semantics. We investigate the challenging problem of this task, i.e., learning a consistent representation between images for a clustering effect of same semantic features. We propose a novel visual similarity learning paradigm, Geometric Visual Similarity Learning, which embeds the prior of topological invariance into the measurement of the inter-image similarity for consistent representation of semantic regions. To drive this paradigm, we further construct a novel geometric matching head, the Z-matching head, to collaboratively learn the global and local similarity of semantic regions, guiding the efficient representation learning for different scale-level inter-image semantic features. Our experiments demonstrate that the pre-training with our learning of inter-image similarity yields more powerful inner-scene, inter-scene, and global-local transferring ability on four challenging 3D medical image tasks. Our codes and pre-trained models will be publicly available on this https URL.

学习图像间的相似性对于三维医学图像的自我监督预训练至关重要,因为它们共享许多相同的语义区域。然而,指标中语义先验的缺乏和三维医学图像中与语义无关的变化使得获得图像间相似性的可靠测量成为挑战,阻碍了对相同语义的一致表示的学习。我们研究了这项任务的挑战性问题,即为相同语义特征的聚类效应学习图像间的一致表示。我们提出了一个新的视觉相似性学习范式,即几何视觉相似性学习,它将拓扑不变性的先验嵌入到图像间相似性的测量中,以实现语义区域的一致表示。为了推动这一范式,我们进一步构建了一个新的几何匹配头,即Z-匹配头,以协同学习语义区域的全局和局部相似性,指导不同尺度层次的图像间语义特征的有效表示学习。我们的实验表明,通过我们对图像间相似性的学习进行预训练,在四个具有挑战性的三维医学图像任务上产生了更强大的内部场景、场景间和全局-局部转移能力。我们的代码和预训练模型将在这个https网址上公开提供。


Directional Connectivity-based Segmentation of Medical Images

基于方向连接的医学图像分割

Anatomical consistency in biomarker segmentation is crucial for many medical image analysis tasks. A promising paradigm for achieving anatomically consistent segmentation via deep networks is incorporating pixel connectivity, a basic concept in digital topology, to model inter-pixel relationships. However, previous works on connectivity modeling have ignored the rich channel-wise directional information in the latent space. In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. To this end, we propose a directional connectivity modeling scheme for segmentation that decouples, tracks, and utilizes the directional information across the network. Experiments on various public medical image segmentation benchmarks show the effectiveness of our model as compared to the state-of-the-art methods. Code is available at this https URL.

生物标志物分割中的解剖学一致性对于许多医学图像分析任务至关重要。通过深度网络实现解剖学一致性分割的一个有希望的范式是结合像素连通性,这是数字拓扑学的一个基本概念,用来建模像素间的关系。然而,以前关于连通性建模的工作忽略了潜在空间中丰富的通道方向性信息。在这项工作中,我们证明了从共享的潜在空间中有效地解耦出方向性子空间可以显著提高基于连通性的网络的特征表示。为此,我们提出了一个用于分割的方向性连通性建模方案,该方案对整个网络的方向性信息进行解耦、跟踪和利用。在各种公共医学图像分割基准上的实验表明,与最先进的方法相比,我们的模型是有效的。代码可在此https网址上找到。


DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting

DeGPR:用于多类细胞检测和计数的深度引导后验正则化

Multi-class cell detection and counting is an essential task for many pathological diagnoses. Manual counting is tedious and often leads to inter-observer variations among pathologists. While there exist multiple, general-purpose, deep learning-based object detection and counting methods, they may not readily transfer to detecting and counting cells in medical images, due to the limited data, presence of tiny overlapping objects, multiple cell types, severe class-imbalance, minute differences in size/shape of cells, etc. In response, we propose guided posterior regularization (DeGPR), which assists an object detector by guiding it to exploit discriminative features among cells. The features may be pathologist-provided or inferred directly from visual data. We validate our model on two publicly available datasets (CoNSeP and MoNuSAC), and on MuCeD, a novel dataset that we contribute. MuCeD consists of 55 biopsy images of the human duodenum for predicting celiac disease. We perform extensive experimentation with three object detection baselines on three datasets to show that DeGPR is model-agnostic, and consistently improves baselines obtaining up to 9% (absolute) mAP gains.

多类细胞检测和计数是许多病理诊断中的重要任务。手动计数繁琐且常常导致病理学家之间的观察者差异。虽然存在多个通用的基于深度学习的目标检测和计数方法,但由于数据有限、存在微小重叠对象、多个细胞类型、严重的类别不平衡、细胞大小/形状的微小差异等原因,它们可能无法直接应用于医学图像中的细胞检测和计数。为此,我们提出了引导后验正则化(DeGPR),通过引导目标检测器利用细胞之间的差异性特征来辅助其工作。这些特征可以由病理学家提供,也可以直接从视觉数据中推断得出。我们在两个公开可用的数据集(CoNSeP和MoNuSAC)以及我们贡献的新数据集MuCeD上验证了我们的模型。MuCeD包含55个用于预测乳糜泻的人十二指肠活检图像。我们对三个数据集上的三个目标检测基准模型进行了大量实验,结果表明DeGPR不依赖于特定模型,并且持续改进基线模型,获得高达9%(绝对值)的mAP提升。


OCELOT: Overlapped Cell on Tissue Dataset for Histopathology

OCELOT:面向组织病理学的组织上重叠细胞数据集

Cell detection is a fundamental task in computational pathology that can be used for extracting high-level medical information from whole-slide images. For accurate cell detection, pathologists often zoom out to understand the tissue-level structures and zoom in to classify cells based on their morphology and the surrounding context. However, there is a lack of efforts to reflect such behaviors by pathologists in the cell detection models, mainly due to the lack of datasets containing both cell and tissue annotations with overlapping regions. To overcome this limitation, we propose and publicly release OCELOT, a dataset purposely dedicated to the study of cell-tissue relationships for cell detection in histopathology. OCELOT provides overlapping cell and tissue annotations on images acquired from multiple organs. Within this setting, we also propose multi-task learning approaches that benefit from learning both cell and tissue tasks simultaneously. When compared against a model trained only for the cell detection task, our proposed approaches improve cell detection performance on 3 datasets: proposed OCELOT, public TIGER, and internal CARP datasets. On the OCELOT test set in particular, we show up to 6.79 improvement in F1-score. We believe the contributions of this paper, including the release of the OCELOT dataset at this https URL are a crucial starting point toward the important research direction of incorporating cell-tissue relationships in computation pathology.

细胞检测是计算病理学中的基本任务,可用于从全切片图像中提取高层次的医学信息。为了准确地进行细胞检测,病理学家通常会先缩小视野以了解组织层面的结构,再放大视野,根据细胞的形态及其周围上下文对细胞进行分类。然而,目前的细胞检测模型很少反映病理学家的这种行为,主要是因为缺乏同时包含重叠区域的细胞和组织标注的数据集。为了克服这个限制,我们提出并公开发布了OCELOT数据集,该数据集专门用于研究组织病理学细胞检测中细胞与组织之间的关系。OCELOT在从多个器官获取的图像上提供了重叠的细胞和组织标注。在这一设置下,我们还提出了同时学习细胞和组织任务从而互相受益的多任务学习方法。与仅针对细胞检测任务训练的模型相比,我们提出的方法在3个数据集(我们提出的OCELOT、公开的TIGER和内部的CARP)上改进了细胞检测性能。特别是在OCELOT测试集上,F1分数最高提升了6.79。我们相信本文的贡献,包括在此https URL上发布OCELOT数据集,是迈向将细胞-组织关系纳入计算病理学这一重要研究方向的关键起点。


Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization

魔鬼藏在查询中:推进用于现实世界医学图像分割和分布外定位的掩码Transformer

Real-world medical image segmentation has tremendous long-tailed complexity of objects, among which tail conditions correlate with relatively rare diseases and are clinically significant. A trustworthy medical AI algorithm should demonstrate its effectiveness on tail conditions to avoid clinically dangerous damage in these out-of-distribution (OOD) cases. In this paper, we adopt the concept of object queries in Mask Transformers to formulate semantic segmentation as a soft cluster assignment. The queries fit the feature-level cluster centers of inliers during training. Therefore, when performing inference on a medical image in real-world scenarios, the similarity between pixels and the queries detects and localizes OOD regions. We term this OOD localization as MaxQuery. Furthermore, the foregrounds of real-world medical images, whether OOD objects or inliers, are lesions. The difference between them is less than that between the foreground and background, possibly misleading the object queries to focus redundantly on the background. Thus, we propose a query-distribution (QD) loss to enforce clear boundaries between segmentation targets and other regions at the query level, improving the inlier segmentation and OOD indication. Our proposed framework is tested on two real-world segmentation tasks, i.e., segmentation of pancreatic and liver tumors, outperforming previous state-of-the-art algorithms by an average of 7.39% on AUROC, 14.69% on AUPR, and 13.79% on FPR95 for OOD localization. On the other hand, our framework improves the performance of inlier segmentation by an average of 5.27% DSC when compared with the leading baseline nnUNet.

现实世界的医学图像分割面对的对象具有巨大的长尾复杂性,其中尾部病况与相对罕见的疾病相关且具有临床意义。一个可信赖的医学AI算法应该在尾部病况上展示其有效性,以避免在这些分布外(OOD)情况下造成临床上的危险。在本文中,我们采用Mask Transformer中的对象查询概念,将语义分割形式化为软聚类分配。在训练过程中,查询拟合分布内样本(inlier)的特征级聚类中心。因此,在实际场景中对医学图像进行推理时,像素与查询之间的相似性可以检测和定位OOD区域。我们将这种OOD定位称为MaxQuery。此外,现实世界医学图像的前景,无论是OOD对象还是分布内对象,都是病变。它们之间的差异小于前景和背景之间的差异,可能会误导对象查询冗余地关注背景。因此,我们提出了一个查询分布(QD)损失,以在查询级别上强制分割目标和其他区域之间有明确的边界,从而改进分布内分割和OOD指示。我们提出的框架在两个现实世界的分割任务(胰腺和肝脏肿瘤分割)上进行了测试,在OOD定位方面平均比先前最先进算法的AUROC高7.39%,AUPR高14.69%,FPR95改善13.79%。另一方面,与领先的基线nnUNet相比,我们的框架将分布内分割性能平均提高了5.27% DSC。
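
"像素与查询的相似性用于检测 OOD 区域"这一思想,可以用像素特征与各查询的最大相似度取负作为 OOD 分数来示意。以下为博主补充的概念性片段(并非论文原实现,特征与查询的维度均为假设值):

```python
import torch
import torch.nn.functional as F

def maxquery_ood_score(pixel_feat, queries):
    """用像素特征与对象查询的最大相似度取负作为 OOD 分数(示意)。
    pixel_feat: [B, D, H, W];queries: [N, D]。
    分布内像素应与某个查询高度相似,OOD 区域则与所有查询都不相似。"""
    f = F.normalize(pixel_feat, dim=1)                    # [B, D, H, W]
    q = F.normalize(queries, dim=1)                       # [N, D]
    sim = torch.einsum('bdhw,nd->bnhw', f, q)             # [B, N, H, W]
    return -sim.max(dim=1).values                         # 分数越大越像 OOD

feat = torch.randn(1, 256, 32, 32)
queries = torch.randn(16, 256)
print(maxquery_ood_score(feat, queries).shape)   # torch.Size([1, 32, 32])
```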


MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery

MagicNet:通过Magic-Cube分割和恢复的半监督多器官分割

We propose a novel teacher-student model for semi-supervised multi-organ segmentation. In teacher-student model, data augmentation is usually adopted on unlabeled data to regularize the consistent training between teacher and student. We start from a key perspective that fixed relative locations and variable sizes of different organs can provide distribution information where a multi-organ CT scan is drawn. Thus, we treat the prior anatomy as a strong tool to guide the data augmentation and reduce the mismatch between labeled and unlabeled images for semi-supervised learning. More specifically, we propose a data augmentation strategy based on partition-and-recovery N³ cubes cross- and within- labeled and unlabeled images. Our strategy encourages unlabeled images to learn organ semantics in relative locations from the labeled images (cross-branch) and enhances the learning ability for small organs (within-branch). For within-branch, we further propose to refine the quality of pseudo labels by blending the learned representations from small cubes to incorporate local attributes. Our method is termed as MagicNet, since it treats the CT volume as a magic-cube and N³-cube partition-and-recovery process matches with the rule of playing a magic-cube. Extensive experiments on two public CT multi-organ datasets demonstrate the effectiveness of MagicNet, and noticeably outperforms state-of-the-art semi-supervised medical image segmentation approaches, with +7% DSC improvement on MACT dataset with 10% labeled images. Code is available at this https URL.

我们提出了一种新颖的师生模型用于半监督的多器官分割。在师生模型中,通常会对未标注数据进行数据增强,以正则化师生之间的一致性训练。我们从一个关键的角度出发:不同器官固定的相对位置和可变的大小,可以提供多器官CT扫描所来自的分布信息。因此,我们将先验解剖结构视为指导数据增强并减少标注和未标注图像之间不匹配的有力工具,用于半监督学习。具体而言,我们提出了一种基于 N³ 立方体分块与复原的数据增强策略,在标注与未标注图像之间以及各自内部进行。我们的策略鼓励未标注图像从标注图像中学习器官在相对位置上的语义(跨分支),并增强对小器官的学习能力(分支内)。对于分支内部分,我们进一步提出通过混合从小立方体学到的表示来融合局部属性,从而改进伪标签的质量。我们的方法被称为MagicNet,因为它将CT体数据视为一个魔方,而 N³ 立方体的分块与复原过程与玩魔方的规则相匹配。在两个公开的CT多器官数据集上进行的大量实验证明了MagicNet的有效性,其明显优于最先进的半监督医学图像分割方法,在有10%标注图像的MACT数据集上DSC提高了7%。代码可在此https URL上获得。
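
"N³ 立方体分块与复原"本质上是一组可逆的张量重排操作。下面是博主补充的示意实现(并非 MagicNet 官方实现,假设各空间维度能被 n 整除):

```python
import torch

def partition_cubes(vol, n=2):
    """把 [B, C, D, H, W] 的体数据切成 n^3 个小立方体,返回 [B, n^3, C, d, h, w]。"""
    B, C, D, H, W = vol.shape
    d, h, w = D // n, H // n, W // n
    v = vol.reshape(B, C, n, d, n, h, n, w)
    v = v.permute(0, 2, 4, 6, 1, 3, 5, 7)                 # [B, n, n, n, C, d, h, w]
    return v.reshape(B, n ** 3, C, d, h, w)

def recover_cubes(cubes, n=2):
    """partition_cubes 的逆操作,把小立方体拼回原体数据。"""
    B, _, C, d, h, w = cubes.shape
    v = cubes.reshape(B, n, n, n, C, d, h, w)
    v = v.permute(0, 4, 1, 5, 2, 6, 3, 7)                 # [B, C, n, d, n, h, n, w]
    return v.reshape(B, C, n * d, n * h, n * w)

vol = torch.randn(1, 1, 64, 64, 64)
cubes = partition_cubes(vol, n=2)
assert torch.equal(recover_cubes(cubes, n=2), vol)        # 分块-复原可逆
```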


KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation

KiUT:用于放射学报告生成的知识注入式U-Transformer

Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from the X-ray image, which could relieve radiologists from the heavy burden of report writing. Although various image caption methods have shown remarkable performance in the natural image field, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) to learn multi-level visual representation and adaptively distill the information with contextual and clinical knowledge for word prediction. In detail, a U-connection schema between the encoder and decoder is designed to model interactions between different modalities. And a symptom graph and an injected knowledge distiller are developed to assist the report generation. Experimentally, we outperform state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experimental results prove the advantages of our architecture and the complementary benefits of the injected knowledge.

放射学报告生成旨在从X射线图像中自动生成临床准确和连贯的段落,可以减轻放射科医生的报告编写负担。虽然各种图像字幕方法在自然图像领域表现出了显著的性能,但为医学图像生成准确的报告需要涉及视觉、语言和医学术语等多种模态的知识。我们提出了一种注入知识的U-Transformer(KiUT)模型,用于学习多层次的视觉表示,并以上下文和临床知识自适应地提取信息进行单词预测。具体而言,我们设计了编码器和解码器之间的U-连接结构,以建模不同模态之间的交互。同时,我们还开发了症状图和注入知识蒸馏器来辅助报告生成。在实验中,我们在两个广泛使用的基准数据集IU-Xray和MIMIC-CXR上表现优于最先进的方法。进一步的实验结果证明了我们的架构的优势以及注入知识的互补效益。


Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

基于视觉语言预训练的组织病理学图像的多实例零样本迁移

Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small to medium sized-images, neither of which are applicable to the emerging field of computational pathology where there are limited publicly available paired image-text datasets and each image can span up to 100,000 x 100,000 pixels in dimensions. In this paper we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models to gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model pretrained on over 33k histopathology image-caption pairs achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks. Our code is available at: github.com/mahmoodlab/M .

对比式视觉语言预训练已经成为一种强大的方法,用于训练新的语言感知图像编码器,或为现有的预训练模型增加零样本视觉识别能力。然而,现有的方法通常在大规模的图像-文本数据集上进行训练,并且设计用于处理仅涉及小到中等尺寸图像的下游任务,这两点都不适用于计算病理学这一新兴领域,因为在该领域中,公开可用的图像-文本配对数据集有限,每个图像的尺寸可以达到100,000 x 100,000像素。本文提出了MI-Zero,这是一个简单而直观的框架,用于将对比对齐的图像和文本模型的零样本迁移能力应用于十亿像素级组织病理学全切片图像,使得预训练编码器可以在不需要额外标签的情况下执行多个下游诊断任务。MI-Zero在多实例学习的框架下重新表述了零样本迁移,以克服对极大图像进行推理的计算挑战。我们使用了超过55万份病理报告和其他可用的领域内文本语料库来预训练我们的文本编码器。通过有效利用强大的预训练编码器,我们在超过33k组织病理图像-标题配对上预训练的最佳模型在三个不同的真实癌症分型任务上实现了70.2%的平均中位数零样本准确率。我们的代码可在以下网址找到: github.com/mahmoodlab/M
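
"在多实例学习框架下做零样本迁移",一个常见的聚合方式是对 patch 与各类别文本嵌入的相似度做 top-k 池化,得到切片级打分。以下为博主补充的示意(并非 MI-Zero 官方实现,top-k 取值、维度均为假设值):

```python
import torch
import torch.nn.functional as F

def zero_shot_slide_logits(patch_emb, class_text_emb, topk=5):
    """零样本 MIL 聚合(示意):先算每个 patch 与各类别文本提示的相似度,
    再对每个类别取相似度最高的 top-k 个 patch 求平均,作为切片级打分。
    patch_emb: [N, D] 一张 WSI 的 patch 特征;class_text_emb: [C, D]。"""
    p = F.normalize(patch_emb, dim=-1)
    t = F.normalize(class_text_emb, dim=-1)
    sim = p @ t.t()                                       # [N, C]
    k = min(topk, sim.size(0))
    topk_sim, _ = sim.topk(k, dim=0)                      # [k, C]
    return topk_sim.mean(dim=0)                           # [C] 切片级 logits

patches = torch.randn(1000, 512)     # 假设一张 WSI 有 1000 个 patch
prompts = torch.randn(3, 512)        # 假设 3 个癌症亚型的文本提示嵌入
print(zero_shot_slide_logits(patches, prompts))
```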


Hierarchical discriminative learning improves visual representations of biomedical microscopy

分层判别学习提高了生物医学显微镜的视觉表征

Learning high-quality, self-supervised, visual representations is essential to advance the role of computer vision in biomedical microscopy and clinical medicine. Previous work has focused on self-supervised representation learning (SSL) methods developed for instance discrimination and applied them directly to image patches, or fields-of-view, sampled from gigapixel whole-slide images (WSIs) used for cancer diagnosis. However, this strategy is limited because it (1) assumes patches from the same patient are independent, (2) neglects the patient-slide-patch hierarchy of clinical biomedical microscopy, and (3) requires strong data augmentations that can degrade downstream performance. Importantly, sampled patches from WSIs of a patient's tumor are a diverse set of image examples that capture the same underlying cancer diagnosis. This motivated HiDisc, a data-driven method that leverages the inherent patient-slide-patch hierarchy of clinical biomedical microscopy to define a hierarchical discriminative learning task that implicitly learns features of the underlying diagnosis. HiDisc uses a self-supervised contrastive learning framework in which positive patch pairs are defined based on a common ancestry in the data hierarchy, and a unified patch, slide, and patient discriminative learning objective is used for visual SSL. We benchmark HiDisc visual representations on two vision tasks using two biomedical microscopy datasets, and demonstrate that (1) HiDisc pretraining outperforms current state-of-the-art self-supervised pretraining methods for cancer diagnosis and genetic mutation prediction, and (2) HiDisc learns high-quality visual representations using natural patch diversity without strong data augmentations.

学习高质量的自我监督视觉表示对于推进计算机视觉在生物医学显微镜和临床医学中的作用至关重要。先前的研究集中在为实例鉴别开发的自我监督表示学习(SSL)方法,并直接应用于从用于癌症诊断的十亿像素全切片图像(WSI)中采样的图像补丁或视野。然而,这种策略存在局限性,因为它假设来自同一患者的补丁是独立的,忽略了临床生物医学显微镜的患者-切片-补丁层次结构,并且需要强大的数据增强,可能会降低下游性能。重要的是,来自患者肿瘤的WSI中的采样补丁是一组多样的图像示例,捕捉到相同的潜在癌症诊断。这促使我们提出HiDisc,这是一种数据驱动的方法,利用临床生物医学显微镜的患者-切片-补丁层次结构来定义一个层次鉴别学习任务,隐式地学习潜在诊断的特征。HiDisc使用自我监督对比学习框架,其中基于数据层次结构中的共同祖先定义正补丁对,并使用统一的补丁、切片和患者鉴别学习目标进行视觉自我监督学习。我们使用两个生物医学显微镜数据集对HiDisc的视觉表示进行基准测试,并证明(1)HiDisc的预训练在癌症诊断和基因突变预测方面优于当前最先进的自我监督预训练方法,以及(2)HiDisc使用自然的补丁多样性学习高质量的视觉表示,而无需强大的数据增强。


Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images

全切片病理图像上的介入式包级多实例学习

Multi-instance learning (MIL) is an effective paradigm for whole-slide pathological images (WSIs) classification to handle the gigapixel resolution and slide-level label. Prevailing MIL methods primarily focus on improving the feature extractor and aggregator. However, one deficiency of these methods is that the bag contextual prior may trick the model into capturing spurious correlations between bags and labels. This deficiency is a confounder that limits the performance of existing MIL methods. In this paper, we propose a novel scheme, Interventional Bag Multi-Instance Learning (IBMIL), to achieve deconfounded bag-level prediction. Unlike traditional likelihood-based strategies, the proposed scheme is based on the backdoor adjustment to achieve the interventional training, thus is capable of suppressing the bias caused by the bag contextual prior. Note that the principle of IBMIL is orthogonal to existing bag MIL methods. Therefore, IBMIL is able to bring consistent performance boosting to existing schemes, achieving new state-of-the-art performance. Code is available at github.com/HHHedo/IBMIL .

多实例学习(MIL)是一种用于全切片病理图像(WSI)分类的有效范式,可应对十亿像素分辨率和切片级标签。现有的MIL方法主要关注改进特征提取器和聚合器。然而,这些方法的一个缺陷是包的上下文先验可能会诱使模型捕捉到包和标签之间的虚假关联。这个缺陷是限制现有MIL方法性能的混淆因素。在本文中,我们提出了一种新颖的方案--介入式包级多实例学习(IBMIL),用于实现去混淆的包级预测。与传统的基于似然的策略不同,所提出的方案基于后门调整来实现干预式训练,从而能够抑制包上下文先验引起的偏差。值得注意的是,IBMIL的原理与现有的包级MIL方法是正交的。因此,IBMIL能够为现有方案带来一致的性能提升,实现新的最先进性能。代码可在 github.com/HHHedo/IBMIL 获取。


RankMix: Data Augmentation for Weakly Supervised Learning of Classifying Whole Slide Images With Diverse Sizes and Imbalanced Categories

RankMix:用于不同尺寸且类别不平衡的全切片图像分类弱监督学习的数据增强

Whole Slide Images (WSIs) are usually gigapixel in size and lack pixel-level annotations. The WSI datasets are also imbalanced in categories. These unique characteristics, significantly different from the ones in natural images, pose the challenge of classifying WSI images as a kind of weakly supervise learning problems. In this study, we propose, RankMix, a data augmentation method of mixing ranked features in a pair of WSIs. RankMix introduces the concepts of pseudo labeling and ranking in order to extract key WSI regions in contributing to the WSI classification task. A two-stage training is further proposed to boost stable training and model performance. To our knowledge, the study of weakly supervised learning from the perspective of data augmentation to deal with the WSI classification problem that suffers from lack of training data and imbalance of categories is relatively unexplored.

全切片图像(WSI)通常具有十亿像素级的分辨率,并且缺乏像素级别的注释。WSI数据集中的类别也存在不平衡的情况。这些与自然图像明显不同的特点给WSI图像的分类带来了挑战,使其成为一种弱监督学习问题。在本研究中,我们提出了一种名为RankMix的数据增强方法,用于混合一对WSI图像中的排序特征。RankMix引入了伪标签和排序的概念,以提取对WSI分类任务有贡献的关键WSI区域。我们进一步提出了两阶段训练来提升训练稳定性和模型性能。据我们所知,从数据增强的角度研究弱监督学习,以解决WSI分类问题中训练数据不足和类别不平衡的问题,尚未得到充分的探索。


AstroNet: When Astrocyte Meets Artificial Neural Network

AstroNet:当星形胶质细胞遇到人工神经网络

Network structure learning aims to optimize network architectures and make them more efficient without compromising performance. In this paper, we first study the astrocytes, a new mechanism to regulate connections in the classic M-P neuron. Then, with the astrocytes, we propose an AstroNet that can adaptively optimize neuron connections and therefore achieves structure learning to achieve higher accuracy and efficiency. AstroNet is based on our built Astrocyte-Neuron model, with a temporal regulation mechanism and a global connection mechanism, which is inspired by the bidirectional communication property of astrocytes. With the model, the proposed AstroNet uses a neural network (NN) for performing tasks, and an astrocyte network (AN) to continuously optimize the connections of NN, i.e., assigning weight to the neuron units in the NN adaptively. Experiments on the classification task demonstrate that our AstroNet can efficiently optimize the network structure while achieving state-of-the-art (SOTA) accuracy.

网络结构学习旨在优化网络架构,使其在不损失性能的情况下更加高效。本文首先研究了星形胶质细胞,这是一种在经典的M-P神经元中调节连接的新机制。然后,借助星形胶质细胞,我们提出了一种名为AstroNet的网络模型,可以自适应地优化神经元之间的连接,从而实现结构学习,达到更高的准确性和效率。AstroNet基于我们构建的星形胶质细胞-神经元模型,具有时态调节机制和全局连接机制,受到星形胶质细胞双向通信特性的启发。在该模型的基础上,AstroNet使用神经网络(NN)执行任务,并使用星形胶质细胞网络(AN)持续优化NN中的连接,即自适应地为神经元单元分配权重。分类任务的实验证明,我们的AstroNet可以在实现最先进的准确性的同时高效地优化网络结构。


Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation

用于半监督医学图像分割的双向复制粘贴

In semi-supervised medical image segmentation, there exist empirical mismatch problems between labeled and unlabeled data distribution. The knowledge learned from the labeled data may be largely discarded if treating labeled and unlabeled data separately or in an inconsistent manner. We propose a straightforward method for alleviating the problem - copy-pasting labeled and unlabeled data bidirectionally, in a simple Mean Teacher architecture. The method encourages unlabeled data to learn comprehensive common semantics from the labeled data in both inward and outward directions. More importantly, the consistent learning procedure for labeled and unlabeled data can largely reduce the empirical distribution gap. In detail, we copy-paste a random crop from a labeled image (foreground) onto an unlabeled image (background) and an unlabeled image (foreground) onto a labeled image (background), respectively. The two mixed images are fed into a Student network and supervised by the mixed supervisory signals of pseudo-labels and ground-truth. We reveal that the simple mechanism of copy-pasting bidirectionally between labeled and unlabeled data is good enough and the experiments show solid gains (e.g., over 21% Dice improvement on ACDC dataset with 5% labeled data) compared with other state-of-the-arts on various semi-supervised medical image segmentation datasets. Code is available at this https URL.

在半监督的医学图像分割中,存在着有标签和无标签数据分布的经验不匹配问题。如果将有标签的数据和无标签的数据分开处理或以不一致的方式处理,那么从有标签的数据中学习到的知识可能会被大部分抛弃。我们提出了一个直接的方法来缓解这个问题--在一个简单的Mean Teacher架构中,双向复制粘贴有标签的和无标签的数据。该方法鼓励未标注的数据从标注的数据中向内和向外学习全面的共同语义。更重要的是,标记数据和未标记数据的一致学习程序可以在很大程度上减少经验分布差距。详细来说,我们将已标记的图像(前景)的随机裁剪复制粘贴到未标记的图像(背景)上,并将未标记的图像(前景)复制粘贴到已标记的图像(背景)上。这两张混合图像被送入一个学生网络,并由伪标签和真实标签的混合监督信号进行监督。我们发现,在有标签和无标签的数据之间双向复制粘贴的简单机制已经足够好了,而且实验显示,在多个半监督医学图像分割数据集上,与其他最先进方法相比有了实实在在的收益(例如,在ACDC数据集上,5%的标签数据有超过21%的Dice改进)。代码可在此https URL获得。
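
双向复制粘贴的核心操作,可以用一个二值掩码在有标注图与无标注图之间做双向混合来示意。下面是博主补充的概念性片段(并非 BCP 官方实现;原文使用随机裁剪区域,这里为简化用中心区域掩码,监督信号可按同一掩码混合真实标签与伪标签):

```python
import torch

def center_mask(shape, ratio=0.5, device='cpu'):
    """生成中心区域为 1、其余为 0 的 3D 掩码(示意),shape=(D, H, W)。"""
    D, H, W = shape
    m = torch.zeros(shape, device=device)
    d, h, w = int(D * ratio), int(H * ratio), int(W * ratio)
    sd, sh, sw = (D - d) // 2, (H - h) // 2, (W - w) // 2
    m[sd:sd + d, sh:sh + h, sw:sw + w] = 1
    return m

def bidirectional_copy_paste(labeled, unlabeled, mask):
    """双向复制粘贴(示意):
    mixed_in  = 有标注图的中心块粘到无标注图上(labeled 作前景);
    mixed_out = 无标注图的中心块粘到有标注图上(unlabeled 作前景)。
    labeled, unlabeled: [B, C, D, H, W];mask: [D, H, W]。"""
    mixed_in = labeled * mask + unlabeled * (1 - mask)
    mixed_out = unlabeled * mask + labeled * (1 - mask)
    return mixed_in, mixed_out

lab = torch.randn(2, 1, 32, 64, 64)
unlab = torch.randn(2, 1, 32, 64, 64)
m = center_mask((32, 64, 64))
x_in, x_out = bidirectional_copy_paste(lab, unlab, m)
print(x_in.shape, x_out.shape)
```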


Pseudo-Label Guided Contrastive Learning for Semi-Supervised Medical Image Segmentation

半监督医学图像分割的伪标签引导对比学习

Although recent works in semi-supervised learning (SemiSL) have accomplished significant success in natural image segmentation, the task of learning discriminative representations from limited annotations has been an open problem in medical images. Contrastive Learning (CL) frameworks use the notion of similarity measure which is useful for classification problems, however, they fail to transfer these quality representations for accurate pixel-level segmentation. To this end, we propose a novel semi-supervised patch-based CL framework for medical image segmentation without using any explicit pretext task. We harness the power of both CL and SemiSL, where the pseudo-labels generated from SemiSL aid CL by providing additional guidance, whereas discriminative class information learned in CL leads to accurate multi-class segmentation. Additionally, we formulate a novel loss that synergistically encourages inter-class separability and intra-class compactness among the learned representations. A new inter-patch semantic disparity mapping using average patch entropy is employed for a guided sampling of positives and negatives in the proposed CL framework. Experimental analysis on three publicly available datasets of multiple modalities reveals the superiority of our proposed method as compared to the state-of-the-art methods. Code is available at: github.com/hritam-98/Pa .

尽管近期半监督学习(SemiSL)方面的工作在自然图像分割上取得了显著成功,但在医学图像中,如何从有限标注中学习判别性表示仍然是一个未解决的问题。对比学习(CL)框架使用的相似性度量对分类问题很有用,但难以把这些高质量表示迁移到精确的像素级分割上。为此,我们提出了一种新颖的基于图像块(patch)的半监督对比学习框架用于医学图像分割,且无需任何显式的前置任务(pretext task)。我们同时利用CL和SemiSL的优势:SemiSL生成的伪标签为CL提供额外引导,而CL学到的判别性类别信息则带来准确的多类分割。此外,我们设计了一种新的损失函数,协同地促进所学表示的类间可分性和类内紧凑性。在所提出的CL框架中,我们利用平均图像块熵构建了一种新的块间语义差异映射,用于引导正负样本的采样。在三个公开的多模态数据集上的实验分析表明,我们的方法优于现有的最先进方法。代码可在此处获取:github.com/hritam-98/Pa
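
摘要中"伪标签为对比学习提供引导"可以用一个有监督对比式的损失来说明:以伪标签划分正负样本,拉近同类patch嵌入、推远异类嵌入。以下仅为示意,温度参数与采样方式均为假设,并非论文中带熵引导采样的完整损失。

```python
# 伪标签引导的对比损失(SupCon 风格的简化示意)
import torch
import torch.nn.functional as F

def pseudo_label_contrastive(emb, pseudo, tau=0.1):
    """emb: (N,D) patch 嵌入;pseudo: (N,) 伪标签。"""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / tau                               # (N,N) 相似度
    mask_pos = (pseudo[:, None] == pseudo[None, :]).float()
    mask_pos.fill_diagonal_(0)                              # 正样本对中去掉自身
    logits = sim - 1e9 * torch.eye(len(emb))                # 自身不参与 softmax
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_cnt = mask_pos.sum(1).clamp(min=1)
    return -(mask_pos * log_prob).sum(1).div(pos_cnt).mean()

loss = pseudo_label_contrastive(torch.randn(32, 64), torch.randint(0, 4, (32,)))
```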



Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

两全其美:表格和图像数据的多模态对比学习

Medical datasets and especially biobanks, often contain extensive tabular data with rich clinical information in addition to images. In practice, clinicians typically have less data, both in terms of diversity and scale, but still wish to deploy deep learning solutions. Combined with increasing medical dataset sizes and expensive annotation costs, the necessity for unsupervised methods that can pretrain multimodally and predict unimodally has risen. To address these needs, we propose the first self-supervised contrastive learning framework that takes advantage of images and tabular data to train unimodal encoders. Our solution combines SimCLR and SCARF, two leading contrastive learning strategies, and is simple and effective. In our experiments, we demonstrate the strength of our framework by predicting risks of myocardial infarction and coronary artery disease (CAD) using cardiac MR images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore, we show the generalizability of our approach to natural images using the DVM car advertisement dataset. We take advantage of the high interpretability of tabular data and through attribution and ablation experiments find that morphometric tabular features, describing size and shape, have outsized importance during the contrastive learning process and improve the quality of the learned embeddings. Finally, we introduce a novel form of supervised contrastive learning, label as a feature (LaaF), by appending the ground truth label as a tabular feature during multimodal pretraining, outperforming all supervised contrastive baselines.

医学数据集(尤其是生物样本库)除图像之外,往往还包含带有丰富临床信息的大量表格数据。在实践中,临床医生手头的数据在多样性和规模上通常都更少,但仍希望部署深度学习解决方案。再加上医学数据集规模不断增长、标注成本高昂,人们越来越需要能够以多模态方式预训练、以单模态方式预测的无监督方法。为满足这些需求,我们提出了第一个利用图像和表格数据训练单模态编码器的自监督对比学习框架。我们的方案结合了SimCLR和SCARF这两种主流对比学习策略,简单而有效。在实验中,我们使用来自40,000名UK Biobank受试者的心脏MR图像和120个临床特征预测心肌梗死和冠状动脉疾病(CAD)风险,展示了框架的优势。此外,我们还用DVM汽车广告数据集证明了该方法在自然图像上的通用性。我们利用表格数据的高可解释性,通过归因和消融实验发现,描述大小和形状的形态学表格特征在对比学习过程中具有格外重要的作用,并能提高所学嵌入的质量。最后,我们提出了一种新的有监督对比学习形式,即标签作为特征(LaaF):在多模态预训练时把真实标签作为一个表格特征拼接进去,其表现优于所有有监督对比基线。
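
下面用一个极简示意说明图像-表格双编码器的多模态对比预训练以及LaaF的做法(把标签拼成一列表格特征)。编码器结构、特征维度等均为随意假设,InfoNCE写法采用常见的对称(CLIP式)形式,并非论文原始实现。

```python
# 图像-表格多模态对比预训练 + LaaF 的示意
import torch
import torch.nn as nn
import torch.nn.functional as F

img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # 假设的图像编码器
tab_enc = nn.Linear(121, 128)                    # 120 个临床特征 + 1 列标签(LaaF)

def clip_style_loss(zi, zt, tau=0.07):
    """对称 InfoNCE:同一受试者的图像嵌入与表格嵌入互为正样本。"""
    zi, zt = F.normalize(zi, dim=1), F.normalize(zt, dim=1)
    logits = zi @ zt.t() / tau
    target = torch.arange(len(zi))
    return (F.cross_entropy(logits, target) + F.cross_entropy(logits.t(), target)) / 2

imgs = torch.randn(8, 3, 32, 32)
tab  = torch.randn(8, 120)
lab  = torch.randint(0, 2, (8, 1)).float()       # LaaF:标签作为额外的表格特征
loss = clip_style_loss(img_enc(imgs), tab_enc(torch.cat([tab, lab], dim=1)))
```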


SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation

SDC-UDA:用于切片方向连续的跨模态医学图像分割的体积无监督域适应框架

Recent advances in deep learning-based medical image segmentation studies achieve nearly human-level performance in fully supervised manner. However, acquiring pixel-level expert annotations is extremely expensive and laborious in medical imaging fields. Unsupervised domain adaptation (UDA) can alleviate this problem, which makes it possible to use annotated data in one imaging modality to train a network that can successfully perform segmentation on target imaging modality with no labels. In this work, we propose SDC-UDA, a simple yet effective volumetric UDA framework for Slice-Direction Continuous cross-modality medical image segmentation which combines intra- and inter-slice self-attentive image translation, uncertainty-constrained pseudo-label refinement, and volumetric self-training. Our method is distinguished from previous methods on UDA for medical image segmentation in that it can obtain continuous segmentation in the slice direction, thereby ensuring higher accuracy and potential in clinical practice. We validate SDC-UDA with multiple publicly available cross-modality medical image segmentation datasets and achieve state-of-the-art segmentation performance, not to mention the superior slice-direction continuity of prediction compared to previous studies.

基于深度学习的医学图像分割研究的最新进展,在全监督条件下已达到接近人类水平的性能。然而,在医学影像领域,获取像素级专家标注极其昂贵且费力。无监督域适应(UDA)可以缓解这一问题:利用某一成像模态上的已标注数据训练网络,使其在没有标签的目标成像模态上也能成功完成分割。在这项工作中,我们提出了SDC-UDA,一个简单而有效的、面向切片方向连续的跨模态医学图像分割的体积UDA框架,它结合了切片内与切片间的自注意力图像转换、不确定性约束的伪标签细化以及体积自训练。与以往用于医学图像分割的UDA方法不同,我们的方法能够在切片方向上得到连续的分割结果,从而保证更高的精度以及在临床实践中的潜力。我们在多个公开的跨模态医学图像分割数据集上验证了SDC-UDA,取得了最先进的分割性能,而且与以往研究相比,预测在切片方向上的连续性明显更好。
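
摘要提到的"不确定性约束的伪标签细化",其核心思想可以用逐体素熵筛选来示意:只保留低熵(高置信)体素参与体积自训练。以下阈值与函数接口均为假设,仅示意这一步骤,并非完整的SDC-UDA流程。

```python
# 基于体素熵的伪标签细化示意
import torch

def refine_pseudo_labels(probs, entropy_thresh=0.5):
    """probs: (C,D,H,W) 目标模态体数据的逐体素类别概率。返回伪标签与有效体素掩码。"""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=0)   # (D,H,W) 逐体素熵
    pseudo  = probs.argmax(dim=0)                                  # (D,H,W) 伪标签
    keep    = entropy < entropy_thresh                             # 低不确定性体素才参与自训练
    return pseudo, keep

probs = torch.softmax(torch.randn(4, 16, 64, 64), dim=0)
pseudo, keep = refine_pseudo_labels(probs)
print(keep.float().mean())    # 参与自训练的体素比例
```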


Ambiguous Medical Image Segmentation using Diffusion Models

利用扩散模型进行歧义性医学图像分割

Collective insights from a group of experts have always proven to outperform an individual's best diagnostic for clinical tasks. For the task of medical image segmentation, existing research on AI-based alternatives focuses more on developing models that can imitate the best individual rather than harnessing the power of expert groups. In this paper, we introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights. Our proposed model generates a distribution of segmentation masks by leveraging the inherent stochastic sampling process of diffusion using only minimal additional learning. We demonstrate on three different medical image modalities- CT, ultrasound, and MRI that our model is capable of producing several possible variants while capturing the frequencies of their occurrences. Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks in terms of accuracy while preserving naturally occurring variation. We also propose a new metric to evaluate the diversity as well as the accuracy of segmentation predictions that aligns with the interest of clinical practice of collective insights.

事实证明,在临床任务中,一组专家的集体见解总是优于单个专家的最佳诊断。对于医学图像分割任务,现有基于人工智能的替代方案的研究更多集中在开发能模仿最佳个体的模型上,而不是利用专家群体的力量。在本文中,我们提出了一种基于单个扩散模型的方法,通过学习群体见解的分布来产生多个合理的输出。我们提出的模型利用扩散模型固有的随机采样过程,仅需极少的额外学习就能生成分割掩码的分布。我们在CT、超声和MRI三种不同的医学成像模态上证明,该模型能够产生若干种可能的变体,并同时刻画它们出现的频率。综合结果表明,我们提出的方法在准确率上优于现有最先进的歧义分割网络,同时保留了自然存在的变化。我们还提出了一个同时评估分割预测多样性与准确性的新指标,这与临床实践中重视集体见解的需求相一致。
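
为说明"利用扩散模型的随机采样得到多个合理分割、再统计各变体出现频率"的思路,下面给出一个高度简化的示意:反向扩散中的去噪网络此处用随机量代替,因此并非可用的实现,仅演示采样循环与频率统计的结构。

```python
# 多次随机采样得到分割掩码分布的结构示意(去噪网络为占位)
import torch

def sample_mask(image, steps=10):
    x = torch.randn_like(image)                 # 从纯噪声出发
    for t in range(steps, 0, -1):
        eps = torch.randn_like(x) * 0.1         # 假设的去噪网络输出(此处用随机量代替)
        x = x - eps                             # 简化的反向扩散一步
    return (x > 0).float()                      # 二值化得到一张分割掩码

image = torch.randn(1, 64, 64)
masks = torch.stack([sample_mask(image) for _ in range(8)])   # 8 个合理变体
pixel_freq = masks.mean(dim=0)                 # 每个像素被判为前景的频率
```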


Fair Federated Medical Image Segmentation via Client Contribution Estimation

基于客户贡献估计的公平联邦医学图像分割

How to ensure fairness is an important topic in federated learning (FL). Recent studies have investigated how to reward clients based on their contribution (collaboration fairness), and how to achieve uniformity of performance across clients (performance fairness). Despite achieving progress on either one, we argue that it is critical to consider them together, in order to engage and motivate more diverse clients joining FL to derive a high-quality global model. In this work, we propose a novel method to optimize both types of fairness simultaneously. Specifically, we propose to estimate client contribution in gradient and data space. In gradient space, we monitor the gradient direction differences of each client with respect to others. And in data space, we measure the prediction error on client data using an auxiliary model. Based on this contribution estimation, we propose a FL method, federated training via contribution estimation (FedCE), i.e., using estimation as global model aggregation weights. We have theoretically analyzed our method and empirically evaluated it on two real-world medical datasets. The effectiveness of our approach has been validated with significant performance improvements, better collaboration fairness, better performance fairness, and comprehensive analytical studies.

如何保证公平性是联邦学习(FL)中的一个重要课题。近期研究探讨了如何根据客户端的贡献给予回报(协作公平性),以及如何使各客户端的性能保持一致(性能公平性)。尽管在其中任一方面都已取得进展,我们认为必须将二者结合起来考虑,才能吸引并激励更多样化的客户端加入FL,从而得到高质量的全局模型。在这项工作中,我们提出了一种同时优化这两类公平性的新方法。具体来说,我们建议在梯度空间和数据空间中估计客户端贡献:在梯度空间中,我们监测每个客户端相对其他客户端的梯度方向差异;在数据空间中,我们用一个辅助模型度量在客户端数据上的预测误差。基于这种贡献估计,我们提出了一种FL方法,即基于贡献估计的联邦训练(FedCE),也就是把估计值用作全局模型聚合权重。我们对该方法进行了理论分析,并在两个真实医疗数据集上进行了实证评估。显著的性能提升、更好的协作公平性与性能公平性以及全面的分析研究验证了我们方法的有效性。
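
FedCE的核心是把梯度空间与数据空间两方面的贡献估计结合并归一化,作为全局聚合权重。下面是按这一思路写的示意代码,两个贡献分数的具体定义与组合方式均为假设,并非论文公式:

```python
# 贡献估计 + 加权聚合的示意
import torch
import torch.nn.functional as F

def contribution_weights(client_grads, client_errors):
    """client_grads: list[(P,)] 各客户端展平梯度;client_errors: list[float] 辅助模型的预测误差。"""
    grads = torch.stack(client_grads)                              # (K, P)
    scores = []
    for k in range(len(grads)):
        others = torch.cat([grads[:k], grads[k + 1:]]).mean(dim=0)
        grad_score = F.cosine_similarity(grads[k], others, dim=0).clamp(min=0)  # 梯度方向一致性
        data_score = 1.0 / (client_errors[k] + 1e-8)               # 误差越小贡献越大
        scores.append(grad_score * data_score)
    w = torch.stack(scores)
    return w / w.sum()                                             # 归一化后作为聚合权重

def aggregate(client_params, weights):
    """按贡献权重加权平均各客户端参数,得到全局模型参数。"""
    return (torch.stack(client_params) * weights[:, None]).sum(dim=0)

grads  = [torch.randn(100) for _ in range(3)]
errors = [0.2, 0.35, 0.5]
w = contribution_weights(grads, errors)
global_params = aggregate([torch.randn(100) for _ in range(3)], w)
```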


Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations

深度原型部件简化形态学肾结石识别,且对光度扰动具有有竞争力的鲁棒性

Identifying the type of kidney stones can allow urologists to determine their cause of formation, improving the prescription of appropriate treatments to diminish future relapses. Currently, the associated ex-vivo diagnosis (known as Morpho-constitutional Analysis, MCA) is time-consuming, expensive and requires a great deal of experience, as it requires a visual analysis component that is highly operator dependent. Recently, machine learning methods have been developed for in-vivo endoscopic stone recognition. Deep Learning (DL) based methods outperform non-DL methods in terms of accuracy but lack explainability. Despite this trade-off, when it comes to making high-stakes decisions, it's important to prioritize understandable Computer-Aided Diagnosis (CADx) that suggests a course of action based on reasonable evidence, rather than a model prescribing a course of action. In this proposal, we learn Prototypical Parts (PPs) per kidney stone subtype, which are used by the DL model to generate an output classification. Using PPs in the classification task enables case-based reasoning explanations for such output, thus making the model interpretable. In addition, we modify global visual characteristics to describe their relevance to the PPs and the sensitivity of our model's performance. With this, we provide explanations with additional information at the sample, class and model levels in contrast to previous works. Although our implementation's average accuracy is lower than state-of-the-art (SOTA) non-interpretable DL models by 1.5%, our models perform 2.8% better on perturbed images with a lower standard deviation, without adversarial training. Thus, learning PPs has the potential to create more robust DL models.

识别肾结石的类型可以让泌尿科医生确定其形成原因,从而更好地开具相应的治疗方案以减少日后复发。目前,相关的离体诊断(称为形态构成分析,Morpho-constitutional Analysis,MCA)耗时、昂贵且需要大量经验,因为其中的视觉分析环节高度依赖操作者。最近,机器学习方法已被用于体内内窥镜结石识别。基于深度学习(DL)的方法在准确率上优于非DL方法,但缺乏可解释性。尽管存在这种权衡,在做出高风险决策时,更应优先采用可理解的计算机辅助诊断(CADx),即基于合理证据给出行动建议,而不是由模型直接规定行动方案。在本工作中,我们为每种肾结石亚型学习原型部件(PPs),DL模型利用这些原型部件生成分类输出。在分类任务中使用PPs,可以对输出给出基于案例推理的解释,从而使模型具有可解释性。此外,我们通过修改全局视觉特征来刻画它们与PPs的相关性以及模型性能对它们的敏感度。与以往工作不同,我们由此在样本、类别和模型三个层面提供了带有附加信息的解释。尽管我们实现的平均准确率比最先进(SOTA)的不可解释DL模型低1.5%,但在未经对抗训练的情况下,我们的模型在扰动图像上的表现高出2.8%,且标准差更低。因此,学习PPs有潜力造就更鲁棒的DL模型。
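
原型部件(PPs)分类可以用ProtoPNet风格的简化写法来示意:特征图各空间位置与每个原型求距离,取最近距离换算成相似度,再用线性层分类,相似度向量即可作为基于案例的解释依据。以下为示意实现,结构与超参数均为假设,并非论文官方代码。

```python
# 原型部件分类头的简化示意(ProtoPNet 风格)
import torch
import torch.nn as nn

class ProtoPartHead(nn.Module):
    def __init__(self, feat_dim=64, n_protos=10, n_classes=5):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(n_protos, feat_dim))
        self.fc = nn.Linear(n_protos, n_classes, bias=False)

    def forward(self, feat):                                  # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        f = feat.permute(0, 2, 3, 1).reshape(b, h * w, c)     # (B, HW, C)
        d = torch.cdist(f, self.protos.unsqueeze(0).repeat(b, 1, 1))  # (B, HW, P)
        dmin = d.min(dim=1).values                            # 每个原型在图中的最近距离
        sim = torch.log((dmin + 1) / (dmin + 1e-4))           # 距离越近,相似度越高
        return self.fc(sim), sim                              # sim 即可作为解释依据

head = ProtoPartHead()
logits, evidence = head(torch.randn(2, 64, 7, 7))
```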


Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis

医学图像中连贯的基于概念的解释及其在皮肤病变诊断中的应用

Early detection of melanoma is crucial for preventing severe complications and increasing the chances of successful treatment. Existing deep learning approaches for melanoma skin lesion diagnosis are deemed black-box models, as they omit the rationale behind the model prediction, compromising the trustworthiness and acceptability of these diagnostic methods. Attempts to provide concept-based explanations are based on post-hoc approaches, which depend on an additional model to derive interpretations. In this paper, we propose an inherently interpretable framework to improve the interpretability of concept-based models by incorporating a hard attention mechanism and a coherence loss term to assure the visual coherence of concept activations by the concept encoder, without requiring the supervision of additional annotations. The proposed framework explains its decision in terms of human-interpretable concepts and their respective contribution to the final prediction, as well as a visual interpretation of the locations where the concept is present in the image. Experiments on skin image datasets demonstrate that our method outperforms existing black-box and concept-based models for skin lesion classification.

黑色素瘤的早期检测对于防止严重并发症、提高治疗成功率至关重要。现有用于黑色素瘤皮肤病变诊断的深度学习方法被视为黑箱模型,因为它们没有给出模型预测背后的依据,损害了这些诊断方法的可信度和可接受度。以往提供基于概念的解释的尝试多采用事后(post-hoc)方法,需要依赖额外的模型来推导解释。在本文中,我们提出了一个本质上可解释的框架,通过引入硬注意力机制和一个一致性损失项来提高基于概念模型的可解释性,保证概念编码器产生的概念激活在视觉上连贯,而无需额外标注的监督。所提出的框架用人类可理解的概念及其各自对最终预测的贡献来解释其决策,并以可视化方式指出概念在图像中出现的位置。在皮肤图像数据集上的实验表明,我们的方法在皮肤病变分类上优于现有的黑箱模型和基于概念的模型。
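
摘要中"用概念及其对最终预测的贡献来解释决策"可以用一个概念瓶颈式的分类头来示意:先给出每个概念的激活分数,再用线性层分类,线性权重乘概念分数即为各概念的贡献。以下为示意写法,未包含论文中的硬注意力与一致性损失,结构与维度均为假设。

```python
# 概念瓶颈式可解释分类头的示意
import torch
import torch.nn as nn

class ConceptBottleneckHead(nn.Module):
    def __init__(self, feat_dim=128, n_concepts=8, n_classes=2):
        super().__init__()
        self.concept_scorer = nn.Linear(feat_dim, n_concepts)
        self.classifier = nn.Linear(n_concepts, n_classes, bias=False)

    def forward(self, feat):
        concepts = torch.sigmoid(self.concept_scorer(feat))    # (B, K) 概念激活分数
        logits = self.classifier(concepts)                      # (B, n_classes)
        # 每个概念对各类别 logit 的贡献 = 概念分数 × 对应线性权重,可直接用于解释
        contrib = concepts.unsqueeze(2) * self.classifier.weight.t().unsqueeze(0)
        return logits, concepts, contrib

head = ConceptBottleneckHead()
logits, concepts, contrib = head(torch.randn(4, 128))
```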


Image Quality-aware Diagnosis via Meta-knowledge Co-embedding

基于元知识共嵌入的图像质量感知诊断

Medical images usually suffer from image degradation in clinical practice, leading to decreased performance of deep learning-based models. To resolve this problem, most previous works have focused on filtering out degradation-causing low-quality images while ignoring their potential value for models. Through effectively learning and leveraging the knowledge of degradations, models can better resist their adverse effects and avoid misdiagnosis. In this paper, we raise the problem of image quality-aware diagnosis, which aims to take advantage of low-quality images and image quality labels to achieve a more accurate and robust diagnosis. However, the diversity of degradations and superficially unrelated targets between image quality assessment and disease diagnosis makes it still quite challenging to effectively leverage quality labels to assist diagnosis. Thus, to tackle these issues, we propose a novel meta-knowledge co-embedding network, consisting of two subnets: Task Net and Meta Learner. Task Net constructs an explicit quality information utilization mechanism to enhance diagnosis via knowledge co-embedding features, while Meta Learner ensures the effectiveness and constrains the semantics of these features via meta-learning and joint-encoding masking. Superior performance on five datasets with four widely-used medical imaging modalities demonstrates the effectiveness and generalizability of our method.

在临床实践中,医学图像通常会出现质量退化,导致基于深度学习的模型性能下降。为了解决这个问题,以往大多数工作都专注于过滤掉导致退化的低质量图像,而忽视了它们对模型的潜在价值。通过有效地学习并利用与退化相关的知识,模型可以更好地抵御其不利影响并避免误诊。在本文中,我们提出了图像质量感知诊断问题,旨在利用低质量图像和图像质量标签实现更准确、更鲁棒的诊断。然而,退化类型的多样性,以及图像质量评估与疾病诊断这两个表面上互不相关的目标,使得有效利用质量标签来辅助诊断仍然相当具有挑战性。因此,为了解决这些问题,我们提出了一个新颖的元知识共嵌入网络,由两个子网组成:Task Net和Meta Learner。Task Net构建了显式的质量信息利用机制,通过知识共嵌入特征增强诊断;Meta Learner则通过元学习和联合编码掩码来保证这些特征的有效性并约束其语义。在涵盖四种常用医学成像模态的五个数据集上的优异表现证明了我们方法的有效性和泛化性。


RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction

RepMode:学习重参数化多样专家以进行亚细胞结构预测

In biological research, fluorescence staining is a key technique to reveal the locations and morphology of subcellular structures. However, it is slow, expensive, and harmful to cells. In this paper, we model it as a deep learning task termed subcellular structure prediction (SSP), aiming to predict the 3D fluorescent images of multiple subcellular structures from a 3D transmitted-light image. Unfortunately, due to the limitations of current biotechnology, each image is partially labeled in SSP. Besides, naturally, subcellular structures vary considerably in size, which causes the multi-scale issue of SSP. To overcome these challenges, we propose Re-parameterizing Mixture-of-Diverse-Experts (RepMode), a network that dynamically organizes its parameters with task-aware priors to handle specified single-label prediction tasks. In RepMode, the Mixture-of-Diverse-Experts (MoDE) block is designed to learn the generalized parameters for all tasks, and gating re-parameterization (GatRep) is performed to generate the specialized parameters for each task, by which RepMode can maintain a compact practical topology exactly like a plain network, and meanwhile achieves a powerful theoretical topology. Comprehensive experiments show that RepMode can achieve state-of-the-art overall performance in SSP.

在生物学研究中,荧光染色是揭示亚细胞结构位置和形态的关键技术。然而,它速度慢、成本高,且对细胞有害。在本文中,我们将其建模为一个深度学习任务,称为亚细胞结构预测(SSP),旨在从三维透射光图像预测多种亚细胞结构的三维荧光图像。不幸的是,受目前生物技术的限制,SSP中每张图像都只有部分标注。此外,亚细胞结构的尺寸天然差异很大,这带来了SSP的多尺度问题。为了克服这些挑战,我们提出了多样专家混合重参数化网络(RepMode),它利用任务感知先验动态组织其参数,以处理指定的单标签预测任务。在RepMode中,多样专家混合(MoDE)模块被设计用来学习所有任务共享的通用参数,门控重参数化(GatRep)则为每个任务生成专用参数;借助这一机制,RepMode可以保持与普通网络完全相同的紧凑实际拓扑,同时获得强大的理论拓扑。综合实验表明,RepMode可以在SSP上取得最先进的整体性能。
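
门控重参数化(GatRep)的思想可以概括为:按任务相关的门控权重把多个专家卷积核合并成一个等效卷积核,推理时只需一次普通卷积。下面是这一思想的极简示意,专家数、核大小与门控来源均为假设,并非论文官方实现。

```python
# 门控重参数化:多个专家核按权重合并为单个卷积核
import torch
import torch.nn.functional as F

def gated_reparam_conv(x, expert_kernels, gate):
    """x: (B,Cin,H,W);expert_kernels: (E,Cout,Cin,k,k);gate: (E,) 任务相关门控权重。"""
    merged = (gate[:, None, None, None, None] * expert_kernels).sum(dim=0)  # (Cout,Cin,k,k)
    return F.conv2d(x, merged, padding=expert_kernels.shape[-1] // 2)

x = torch.randn(2, 8, 32, 32)
experts = torch.randn(4, 16, 8, 3, 3)                     # 4 个专家卷积核
gate = torch.softmax(torch.randn(4), dim=0)               # 实际应由任务嵌入产生,此处随机代替
y = gated_reparam_conv(x, experts, gate)                  # (2, 16, 32, 32)
```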


Neuron Structure Modeling for Generalizable Remote Physiological Measurement

面向可泛化远程生理测量的神经元结构建模

Remote photoplethysmography (rPPG) technology has drawn increasing attention in recent years. It can extract Blood Volume Pulse (BVP) from facial videos, making many applications like health monitoring and emotional analysis more accessible. However, as the BVP signal is easily affected by environmental changes, existing methods struggle to generalize well for unseen domains. In this paper, we systematically address the domain shift problem in the rPPG measurement task. We show that most domain generalization methods do not work well in this problem, as domain labels are ambiguous in complicated environmental changes. In light of this, we propose a domain-label-free approach called NEuron STructure modeling (NEST). NEST improves the generalization capacity by maximizing the coverage of feature space during training, which reduces the chance for under-optimized feature activation during inference. Besides, NEST can also enrich and enhance domain invariant features across multi-domain. We create and benchmark a large-scale domain generalization protocol for the rPPG measurement task. Extensive experiments show that our approach outperforms the state-of-the-art methods on both cross-dataset and intra-dataset settings.

近年来,远程光电容积描记(rPPG)技术受到越来越多的关注。它可以从面部视频中提取血容量脉搏(BVP)信号,使健康监测和情绪分析等许多应用变得更加便捷。然而,由于BVP信号容易受到环境变化的影响,现有方法很难很好地泛化到未见过的域。在本文中,我们系统地研究了rPPG测量任务中的域偏移问题。我们发现,大多数域泛化方法在这个问题上效果不佳,因为在复杂的环境变化下,域标签本身是模糊的。有鉴于此,我们提出了一种无需域标签的方法,即神经元结构建模(NEST)。NEST通过在训练中最大化特征空间的覆盖范围来提高泛化能力,从而减少推理时出现欠优化特征激活的概率。此外,NEST还能在多个域之间丰富并增强域不变特征。我们为rPPG测量任务构建了一个大规模域泛化基准协议。大量实验表明,我们的方法在跨数据集和数据集内两种设置下都优于最先进的方法。


Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses

基于机械损失与循环一致性损失的活细胞无监督轮廓跟踪

Analyzing the dynamic changes of cellular morphology is important for understanding the various functions and characteristics of live cells, including stem cells and metastatic cancer cells. To this end, we need to track all points on the highly deformable cellular contour in every frame of live cell video. Local shapes and textures on the contour are not evident, and their motions are complex, often with expansion and contraction of local contour features. The prior arts for optical flow or deep point set tracking are unsuited due to the fluidity of cells, and previous deep contour tracking does not consider point correspondence. We propose the first deep learning-based tracking of cellular (or more generally viscoelastic materials) contours with point correspondence by fusing dense representation between two contours with cross attention. Since it is impractical to manually label dense tracking points on the contour, unsupervised learning comprised of the mechanical and cyclical consistency losses is proposed to train our contour tracker. The mechanical loss forcing the points to move perpendicular to the contour effectively helps out. For quantitative evaluation, we labeled sparse tracking points along the contour of live cells from two live cell datasets taken with phase contrast and confocal fluorescence microscopes. Our contour tracker quantitatively outperforms compared methods and produces qualitatively more favorable results. Our code and data are publicly available at this https URL

分析细胞形态的动态变化,对于理解包括干细胞和转移性癌细胞在内的活细胞的各种功能和特性非常重要。为此,我们需要在活细胞视频的每一帧中跟踪高度可变形的细胞轮廓上的所有点。轮廓上的局部形状和纹理并不明显,其运动也很复杂,局部轮廓特征经常发生扩张和收缩。由于细胞的流动性,现有的光流或深度点集跟踪方法并不适用,而以往的深度轮廓跟踪也没有考虑点的对应关系。我们提出了第一个基于深度学习、带点对应关系的细胞(或更一般的粘弹性材料)轮廓跟踪方法,通过交叉注意力融合两条轮廓之间的稠密表示来实现。由于在轮廓上手工标注稠密跟踪点并不现实,我们提出由机械损失和循环一致性损失组成的无监督学习来训练轮廓跟踪器,其中约束跟踪点沿轮廓法线方向移动的机械损失起到了有效作用。为了进行定量评估,我们在分别由相差显微镜和共聚焦荧光显微镜拍摄的两个活细胞数据集上,沿活细胞轮廓标注了稀疏的跟踪点。我们的轮廓跟踪器在定量指标上优于对比方法,定性结果也更好。我们的代码和数据可在此https URL公开获取。
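
摘要中的循环一致性损失和机械损失可以这样理解:前者要求轮廓点前向跟踪到下一帧、再反向跟踪回来后应回到原位;后者惩罚位移沿轮廓切向的分量,从而促使点沿法线方向移动。以下是这两个损失的极简示意,跟踪器用占位函数代替,具体公式是按摘要意思写的假设形式,并非论文实现。

```python
# 轮廓点跟踪的循环一致性损失与机械损失示意
import torch

def cycle_consistency_loss(points_t, track_forward, track_backward):
    """points_t: (N,2) 轮廓点坐标;track_*: 接收 (N,2)、返回 (N,2) 的跟踪函数。"""
    points_t1 = track_forward(points_t)           # 第 t 帧 -> 第 t+1 帧
    points_back = track_backward(points_t1)       # 第 t+1 帧 -> 第 t 帧
    return ((points_back - points_t) ** 2).sum(dim=1).mean()

def mechanical_loss(points_t, points_t1):
    """惩罚位移沿轮廓切向的分量,使跟踪点尽量沿法线方向移动(简化写法)。"""
    tangent = points_t.roll(-1, dims=0) - points_t.roll(1, dims=0)   # 相邻点差分近似切向
    tangent = tangent / (tangent.norm(dim=1, keepdim=True) + 1e-8)
    disp = points_t1 - points_t
    return ((disp * tangent).sum(dim=1) ** 2).mean()

pts = torch.rand(50, 2)
shift = torch.tensor([1.0, -0.5])
print(cycle_consistency_loss(pts, lambda p: p + shift, lambda p: p - shift))
print(mechanical_loss(pts, pts + shift))
```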
