Domain Adaptation with Latent Diffusion Models for Segmentation and Classification

In-Context Learning Unlocked for Diffusion Models: https://zhendong-wang.github.io/prompt-diffusion.github.io/, in-context visual/text prompts

Fast Adaptation with in-context learning to new inverse problems

Long Video Generation with Latent Diffusion Models via AutoPrompting

3D Latent Diffusion

NeRF type methods

3D Neural Field Generation using Triplane Diffusion (code available): Triplane Diffusion
Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (code available): Diffusion-NeRF
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization (code pending): Single Image

Video Diffusion

Latent Video Diffusion Models for High-Fidelity Long Video Generation: Video Diffusion
MagicVideo: Efficient Video Generation With Latent Diffusion Models: Video Diffusion 2
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models: Latent Video Diffusion, CVPR (combines a super-resolution DM with an LDM). This can be used together with a 2D image latent diffusion model (e.g. Stable Diffusion). The key idea is to adjust the latent code with a 3D temporal network.
Training:
1. First, use the 2D latent diffusion encoder to obtain a latent code for each frame (spatial step).
2. Then use the temporal network to adjust the latent code, and combine the result with the original code (a convex combination).
3. Repeat this a couple of times.
Inference.
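The convex combination in step 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `temporal_smooth` is a hypothetical stand-in for the learned temporal network (here a neighbour average across frames), and `alpha` plays the role of the mixing weight, which would be learned in a real model.

```python
from typing import List

Frame = List[float]  # flattened latent code of one frame


def temporal_smooth(frames: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for the temporal network:
    average each frame's code with its immediate neighbours."""
    n = len(frames)
    out = []
    for t in range(n):
        window = frames[max(0, t - 1):min(n, t + 2)]
        out.append([sum(vals) / len(window) for vals in zip(*window)])
    return out


def mix_spatial_temporal(spatial: List[Frame], alpha: float) -> List[Frame]:
    """Convex combination: alpha * spatial + (1 - alpha) * temporal."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    temporal = temporal_smooth(spatial)
    return [
        [alpha * s + (1.0 - alpha) * t for s, t in zip(s_frame, t_frame)]
        for s_frame, t_frame in zip(spatial, temporal)
    ]


if __name__ == "__main__":
    codes = [[0.0], [3.0], [6.0]]  # three frames, one latent value each
    print(mix_spatial_temporal(codes, alpha=0.5))
```

With `alpha = 1.0` the per-frame (spatial) codes pass through unchanged, which is why a pretrained image model can be kept as-is while only the temporal part is adjusted.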
Video Probabilistic Diffusion Models in Projected Latent Space
1. Use the triplane idea: three 2D latent codes (xy, xz, yz), so 2D diffusion can be used instead of 3D diffusion.
2. First use a video transformer to compress the video: $C \times H \times W \to C \times H' \times W'$.
3. Then use three small transformers to project the 3D latent into 2D planes, i.e. $z_h = f_{\theta}(u_h)$, $z_w = f_{\theta}(u_w)$, $z_c = f_{\theta}(u_c)$, reducing space complexity from $O(HWC)$ to $O(HW) + O(CW) + O(HC)$.
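To make the space saving in the list above concrete, here is a minimal sketch that projects a `C x H x W` volume onto three planes and counts the stored values. The mean along each axis is an assumption used purely for illustration; in the paper the projections are small learned transformers, and the function names here are hypothetical.

```python
from typing import List, Tuple

Plane = List[List[float]]
Volume = List[List[List[float]]]  # indexed as vol[c][h][w]


def triplane_project(vol: Volume) -> Tuple[Plane, Plane, Plane]:
    """Collapse one axis at a time by averaging (illustrative stand-in
    for the learned projections). Returns the HW, CW and CH planes."""
    C, H, W = len(vol), len(vol[0]), len(vol[0][0])
    plane_hw = [[sum(vol[c][h][w] for c in range(C)) / C for w in range(W)]
                for h in range(H)]
    plane_cw = [[sum(vol[c][h][w] for h in range(H)) / H for w in range(W)]
                for c in range(C)]
    plane_ch = [[sum(vol[c][h][w] for w in range(W)) / W for h in range(H)]
                for c in range(C)]
    return plane_hw, plane_cw, plane_ch


def num_values(plane: Plane) -> int:
    """Number of scalars stored in a 2D plane."""
    return len(plane) * len(plane[0])


if __name__ == "__main__":
    C, H, W = 8, 8, 8
    vol = [[[1.0] * W for _ in range(H)] for _ in range(C)]
    planes = triplane_project(vol)
    full = C * H * W                        # O(HWC)
    tri = sum(num_values(p) for p in planes)  # O(HW) + O(CW) + O(HC)
    print(full, tri)
```

The saving only appears once the dimensions are large enough: for very small volumes the three planes together can hold more values than the volume itself.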

Long Video Generation

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation (Coarse to Fine model). Coarse diffusion with a fine diffusion: https://arxiv.org/pdf/2303.12346.pdf
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation, https://arxiv.org/pdf/2303.08320.pdf, generates per-frame residuals because frames are highly correlated
Flexible Diffusion Modeling of Long Videos: https://arxiv.org/pdf/2205.11495.pdf, conditional generation
Video Diffusion Models https://arxiv.org/abs/2204.03458, 3D UNet
VIDM: Video Implicit Diffusion Models https://arxiv.org/pdf/2212.00235.pdf combines a motion generator and a content generator, with normalization (INR-like)
Video Diffusion Models with Local-Global Context Guidance https://arxiv.org/pdf/2306.02562.pdf Global context and local context
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation https://arxiv.org/pdf/2307.06940.pdf
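The residual idea noted for VideoFusion in the list above can be sketched as follows. This is a rough illustration of my reading of the decomposition, not the paper's implementation: each frame's noise is a shared base noise mixed with a per-frame residual, and `lam` (a hypothetical name) controls how much is shared while keeping unit variance.

```python
import math
import random
from typing import List


def decomposed_noise(num_frames: int, dim: int, lam: float,
                     rng: random.Random) -> List[List[float]]:
    """Per-frame noise = sqrt(lam) * base + sqrt(1 - lam) * residual,
    where `base` is shared by all frames and `residual` is drawn per frame."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    frames = []
    for _ in range(num_frames):
        residual = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        frames.append([
            math.sqrt(lam) * b + math.sqrt(1.0 - lam) * r
            for b, r in zip(base, residual)
        ])
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    for frame in decomposed_noise(num_frames=3, dim=4, lam=0.8, rng=rng):
        print([round(v, 3) for v in frame])
```

A larger `lam` makes the frames' noise more correlated, so the model only has to account for the small per-frame residual.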

Visual AutoPrompting

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation https://arxiv.org/pdf/2305.04651.pdf, uses GPT-3 to rewrite the prompt and edit the image
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models https://arxiv.org/pdf/2305.16807.pdf, image editing without optimization
Visual Instruction Inversion: Image Editing via Visual Prompting https://arxiv.org/pdf/2307.14331.pdf
Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models https://arxiv.org/abs/2209.06970

Test-time Adaptation

Prompting Diffusion Representations for Cross-Domain Semantic Segmentation: https://arxiv.org/pdf/2307.02138.pdf, uses prompts to improve the generalization ability of diffusion models

Inverse Problem

Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision https://arxiv.org/pdf/2306.11719.pdf

Other Related Works

MAGVIT: Masked Generative Video Transformer: https://arxiv.org/pdf/2212.05199.pdf
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation https://arxiv.org/pdf/2205.09853.pdf masked methods like MAE etc
Diffusion Models as Masked Autoencoders https://arxiv.org/abs/2304.03283
Diffusion Models Already Have a Semantic Latent Space https://arxiv.org/pdf/2210.10960.pdf
ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models https://arxiv.org/pdf/2302.07257.pdf
Visual Instruction Tuning: https://arxiv.org/pdf/2304.08485.pdf
Adversarial Discriminative Domain Adaptation https://arxiv.org/pdf/1702.05464.pdf
Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Imaging Inverse Problems https://arxiv.org/pdf/2308.14409.pdf

Sketch-to-image ideas: VAE -> shared feature with the degraded image (maybe try a latent diffusion embedding); LoRA; shared feature embedding/text; changing the model itself; ControlNet.

VPDM Architecture

Triplane Representation of Knee MRI image

Architecture of Diffusion NeRF

Results for Sparse-View Reconstruction
