PhD Position F/M - Computer Vision / Deep Learning: Video Generation



October 16, 2022


2022-05263 - PhD Position F/M - Computer Vision / Deep Learning: Video Generation

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position

The STARS research team combines advanced theory with cutting-edge practice, focusing on cognitive vision systems.

Team web site : https: //


The Ph.D. position

  • Starts October 2022.
  • The Inria STARS team is seeking a Ph.D. researcher with a strong background in computer vision, deep learning, and machine learning.

    The candidate is expected to conduct research related to generative adversarial networks (GANs), including the development of computer vision algorithms for image and video generation.

    Main activities

    Despite remarkable progress in generative models, a pretrained network can currently generate only the single subject or object it was trained on, within the single scenario covered by its training data.

    This Ph.D. thesis aims to bring video generation to the next level by proposing strategies that generalize the generation ability of generative models. The approach is to disentangle appearance and motion in the latent space, and to further disentangle motion into primary directions that apply to any subject in any setting. This carries the promise of supporting more complex settings that involve interactions between subjects and objects.
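
    As a rough illustration of what disentangling appearance and motion in a latent space can mean, the sketch below assumes a purely linear model: a frame's latent code is an appearance code plus a weighted sum of orthonormal motion directions. This linear form, the function names, and the dimensions are illustrative assumptions, not the method the thesis will adopt.

```python
import math
import random


def orthonormal_directions(num_directions, dim, seed=0):
    """Build orthonormal vectors with Gram-Schmidt (a hypothetical motion basis).

    Requires num_directions <= dim.
    """
    rng = random.Random(seed)
    basis = []
    while len(basis) < num_directions:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the directions found so far.
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:
            basis.append([x / norm for x in v])
    return basis


def compose_latent(appearance, directions, magnitudes):
    """Latent code for one frame: appearance code plus weighted motion directions."""
    z = list(appearance)
    for d, a in zip(directions, magnitudes):
        z = [zi + a * di for zi, di in zip(z, d)]
    return z


def recover_magnitudes(z, appearance, directions):
    """Read back each motion magnitude by projecting (z - appearance) onto the basis."""
    delta = [zi - ai for zi, ai in zip(z, appearance)]
    return [sum(x * y for x, y in zip(delta, d)) for d in directions]


# Assumed toy sizes: an 8-dimensional latent space with 3 motion directions.
directions = orthonormal_directions(num_directions=3, dim=8)
appearance = [0.5] * 8
magnitudes = [1.0, -2.0, 0.25]
z = compose_latent(appearance, directions, magnitudes)
recovered = recover_magnitudes(z, appearance, directions)
```

    Because the directions are orthonormal, each magnitude can be recovered independently of the others, which is the property that lets the same motion be reapplied to a different appearance code.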


    Generative adversarial networks (GANs) [1] have attracted increasing interest from academia and industry, due to their exceptional capacity to generate highly realistic images [2-7]. Videos are more complex data, because of the additional temporal dimension. While some research works have shown early results in video generation [8-11], many questions in the field remain open.

  • Model architecture

    The thesis will first investigate how to design the generator and discriminator architectures in generative models. We will explore traditional model architectures such as CNNs and RNNs, as well as Transformer-based generators. Our objective is to determine whether we can design a unified model architecture that generalizes over categories, such as human bodies and faces. We will study how to connect different architectures in order to create such a general system for cross-category generation.

  • 3D-aware generation

    Learning 3D-aware models from 2D data has become a popular research topic in image generation. In this thesis, we will go one step further in this direction and explore novel view synthesis in video generation. We intend to combine state-of-the-art novel view synthesis techniques with video generation, with the aim of creating 3D-aware video generation. Our idea is to explore implicit representations (e.g., NeRF), explicit representations (e.g., 3D representation), and hybrid (implicit-explicit) representations in video generation models. One objective will be to design an efficient and effective representation for novel view synthesis in video generation.

  • Generalizability

    Finally, we will aim to design a universal model that is able to generate videos across categories. Most current models focus on generating a single category (e.g., faces, sky). Currently, no model is able to generate complex multi-category videos (e.g., Kinetics-600). We plan to increase the complexity of video generative models and design a large-scale video GAN. The objective is to study whether big generative models are able to capture the distribution of complex video datasets and create semantically meaningful videos.

    Candidates must hold a Master's degree or equivalent in Computer Science or a closely related discipline by the start date.

    The candidate must be grounded in the basics of computer vision and have solid mathematical and programming skills, preferably in Python, OpenCV, and a deep learning framework such as PyTorch or TensorFlow.

    The candidate must be committed to scientific research and to producing strong publications.

  • Remuneration

    Gross salary per month: 2051€ (years 1 & 2) and 2158€ (year 3)

    General Information
  • Theme/Domain : Vision, perception and multimedia interpretation
  • Town/city : Sophia Antipolis
  • Inria Center : CRI Sophia Antipolis - Méditerranée
  • Starting date : 2022-10-01
  • Duration of contract : 3 years
  • Deadline to apply : 2022-10-16
  • Contacts
  • Inria Team : STARS
  • PhD Supervisor : Dantcheva Antitza
