Task Description
• Research, design, and implement computer vision and vision-language model (VLM) use cases tailored for MBOS.
• Train and fine-tune deep learning models, with a focus on multimodal fusion and efficient deployment on embedded platforms.
• Work closely with software, hardware, and product teams to integrate developed algorithms into the overall vehicle system.
• Build and maintain toolchains for fine-tuning and deploying LLMs/VLMs, manage training clusters, and ensure efficient inference on both server-side and embedded targets.
• Conduct experiments and evaluations, and transfer knowledge to other team members.
Qualifications
• Master's degree or above in Computer Science, Electrical Engineering, Robotics, or a related field.
• Proven hands-on experience in developing and deploying computer vision and/or VLM algorithms, preferably in the automotive or robotics domain.
• Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and classical computer vision libraries.
• Experience with fine-tuning and optimizing LLMs/VLMs, e.g., LoRA, retrieval-augmented generation (RAG), and prompt engineering.
• Familiarity with multimodal fusion techniques and the architectures of models such as Transformers and BERT.
• Solid grounding in both Natural Language Processing and Computer Vision; able to design and implement solutions that leverage both modalities.
• Experience with model compression, quantization, and deployment in resource-constrained environments.
• Familiarity with dataset collection, labeling, and evaluation for multimodal tasks.
• Strong programming skills in Python and C++.
• Experience with cloud services (e.g., Azure, AWS, Tencent Cloud) is a plus.
• Outstanding analytical and problem-solving skills.
• Technical leadership and the ability to make decisions based on technical facts.
• Strong sense of ownership and drive.
• Good communication skills and ability to work in a collaborative, cross-functional environment.
• Proficiency in written and spoken English.