Domain Adaptation with Latent Diffusion Models for Segmentation and Classification
- In-Context Learning Unlocked for Diffusion Models: https://zhendong-wang.github.io/prompt-diffusion.github.io/, in-context visual/text prompts
Fast Adaptation with in-context learning to new inverse problems
Long Video Generation with Latent Diffusion Models via AutoPrompting
3D Latent Diffusion
NeRF type methods
- 3D Neural Field Generation using Triplane Diffusion (code available): Triplane Diffusion
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction (code available): Diffusion-NeRF
- One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization (code pending): Single Image
Video Diffusion
- Latent Video Diffusion Models for High-Fidelity Long Video Generation: Video Diffusion
- MagicVideo: Efficient Video Generation With Latent Diffusion Models: Video Diffusion 2
- Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models: Latent Video Diffusion, Latent CVPR (combining an SR DM and an LDM). This can be used together with a 2D image latent diffusion model (e.g. Stable Diffusion). The key idea is to adjust the latent code with a 3D temporal network.
Training
- First use the 2D latent diffusion encoder to obtain a latent code for each frame (spatial step).
- Then use the temporal network to adjust the latent code, and combine the result with the original code (a convex combination).
- Repeat this a couple of times.
Inference
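The convex combination in the training notes above can be sketched in a few lines of NumPy. This is only an illustration of the mixing step, not the paper's implementation: `temporal_mix`, `mean_over_time`, and `alpha` are hypothetical names, and the toy frame average stands in for the learned temporal network.

```python
import numpy as np

def temporal_mix(z_spatial, temporal_fn, alpha):
    """Convex combination of per-frame latents and temporally adjusted latents.

    z_spatial:   array of shape (T, C, H, W), one latent code per frame.
    temporal_fn: hypothetical stand-in for the temporal network.
    alpha:       mixing weight in [0, 1]; alpha = 1 keeps the per-frame codes.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    z_temporal = temporal_fn(z_spatial)
    return alpha * z_spatial + (1.0 - alpha) * z_temporal

def mean_over_time(z):
    """Toy temporal operator (assumption): every frame becomes the frame average."""
    return np.broadcast_to(z.mean(axis=0, keepdims=True), z.shape)

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4, 16, 16))  # 8 frames of 4 x 16 x 16 latents
z_mixed = temporal_mix(z, mean_over_time, alpha=0.7)
```

With `alpha = 1` the temporal branch is ignored and the output equals the per-frame image latents, which is why such a layer can sit on top of a pretrained 2D model without disturbing it at the start of training.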
- Video Probabilistic Diffusion Models in Projected Latent Space
- Use the triplane idea: latent codes on the xy, xz, and yz planes, so 2D diffusion can be used instead of 3D diffusion.
- First use a video transformer to compress the video from C × H × W to C × H' × W'.
- Then use three small transformers to project the 3D latent into 2D, i.e. $z_h = f_{\theta}(u_h)$, $z_w = f_{\theta}(u_w)$, $z_c = f_{\theta}(u_c)$, reducing space complexity from $O(HWC)$ to $O(HW) + O(CW) + O(HC)$.
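To make the space saving of the projected latent space concrete, the sketch below collapses a C × H × W volume into three planes and counts elements. Mean pooling is an assumed stand-in for the small learned transformers, and `project_to_planes` and the plane names are hypothetical.

```python
import numpy as np

def project_to_planes(u):
    """Collapse a (C, H, W) latent volume into three 2D planes.

    Mean pooling along one axis is an assumption used here in place of
    the learned projections.
    """
    plane_hw = u.mean(axis=0)  # shape (H, W): C axis collapsed
    plane_cw = u.mean(axis=1)  # shape (C, W): H axis collapsed
    plane_ch = u.mean(axis=2)  # shape (C, H): W axis collapsed
    return plane_hw, plane_cw, plane_ch

C, H, W = 16, 32, 32
u = np.zeros((C, H, W))
planes = project_to_planes(u)

volume_size = u.size                      # C * H * W
plane_size = sum(p.size for p in planes)  # H*W + C*W + C*H
print(volume_size, plane_size)            # prints "16384 2048"
```

For these sizes the three planes hold 8 times fewer values than the full volume, and the gap widens as any of the three dimensions grows.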
Long Video Generation
- NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation (coarse-to-fine model): a coarse diffusion combined with a fine diffusion: https://arxiv.org/pdf/2303.12346.pdf
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation, https://arxiv.org/pdf/2303.08320.pdf, generates a residual because frames are highly correlated
- Flexible Diffusion Modeling of Long Videos: https://arxiv.org/pdf/2205.11495.pdf, conditional generation
- Video Diffusion Models https://arxiv.org/abs/2204.03458, 3D UNet
- VIDM: Video Implicit Diffusion Models https://arxiv.org/pdf/2212.00235.pdf, combines a motion generator and a content generator, with normalization (INR-like)
- Video Diffusion Models with Local-Global Context Guidance https://arxiv.org/pdf/2306.02562.pdf Global context and local context
- Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation https://arxiv.org/pdf/2307.06940.pdf
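One way to read the VideoFusion note above (a residual on top of highly correlated frames) is as noise that splits into a part shared by all frames and a per-frame residual. The sketch below illustrates that reading only; `decomposed_noise` and the mixing weight `lam` are hypothetical names, and the exact parameterization in the paper may differ.

```python
import numpy as np

def decomposed_noise(num_frames, frame_shape, lam, rng):
    """Per-frame noise = shared base noise + frame-specific residual noise.

    lam in [0, 1] sets how much variance is shared across frames; the
    weights sqrt(lam) and sqrt(1 - lam) keep each frame at unit variance.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    base = rng.standard_normal(frame_shape)                     # shared by all frames
    residual = rng.standard_normal((num_frames, *frame_shape))  # one per frame
    return np.sqrt(lam) * base[None] + np.sqrt(1.0 - lam) * residual

rng = np.random.default_rng(0)
eps = decomposed_noise(num_frames=8, frame_shape=(4, 32, 32), lam=0.8, rng=rng)

# Correlation between two frames should be close to lam.
corr = np.corrcoef(eps[0].ravel(), eps[1].ravel())[0, 1]
```

A larger `lam` means the frames share more of their noise, so a model only has to account for a small residual per frame.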
Visual AutoPrompting
- ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation https://arxiv.org/pdf/2305.04651.pdf, uses GPT-3 to change the prompt and edit the image
- Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models https://arxiv.org/pdf/2305.16807.pdf, image editing without optimization
- Visual Instruction Inversion: Image Editing via Visual Prompting https://arxiv.org/pdf/2307.14331.pdf
- Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models https://arxiv.org/abs/2209.06970
Test-time Adaptation
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation: https://arxiv.org/pdf/2307.02138.pdf, uses prompts to improve the generalization ability of diffusion models
Inverse Problem
- Diffusion with Forward Models: Solving Stochastic Inverse Problems Without Direct Supervision https://arxiv.org/pdf/2306.11719.pdf
Other Related Works
- MAGVIT: Masked Generative Video Transformer: https://arxiv.org/pdf/2212.05199.pdf
- MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation https://arxiv.org/pdf/2205.09853.pdf, a masked method similar to MAE
- Diffusion Models as Masked Autoencoders https://arxiv.org/abs/2304.03283
- Diffusion Models Already Have a Semantic Latent Space https://arxiv.org/pdf/2210.10960.pdf
- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models https://arxiv.org/pdf/2302.07257.pdf
- Visual Instruction Tuning: https://arxiv.org/pdf/2304.08485.pdf
- Adversarial Discriminative Domain Adaptation https://arxiv.org/pdf/1702.05464.pdf
- Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Imaging Inverse Problems https://arxiv.org/pdf/2308.14409.pdf
Ideas: sketch to image. VAE -> shared feature with the downgraded image (maybe try a latent diffusion embedding). Options: LoRA, shared feature embedding/text, changing the model itself, ControlNet.
VPDM Architecture
Triplane Representation of Knee MRI image
Architecture of Diffusion NeRF
Results for Sparse-View Reconstruction