Senior AI Engineer (f/m/d) - Foundation Model Training & Infrastructure for Power Grids

About the Role

Location
Germany
Bavaria
Erlangen

  • Country: Germany
  • State/Canton/District: Berlin
  • City: Berlin

Company
Siemens Energy Global GmbH & Co. KG
Organization
Grid Technologies
Business Unit
Digital Grid
Full-time / Part-time
Full-time
Experience Level
Experienced Professional
A Snapshot of Your Day 

We are seeking a highly skilled and driven Senior AI Engineer to join our team as a founding member, developing the critical data and AI infrastructure for training foundation models for power grid applications. You will be instrumental in building and optimizing the end-to-end systems, data pipelines, and training processes that will power our AI research. Working closely with research scientists, you will translate pioneering research into robust, scalable, and efficient implementations, enabling the rapid development and deployment of transformational AI solutions. This role requires deep hands-on expertise in distributed training, data engineering, and MLOps, as well as a proven track record of building scalable AI infrastructure.


How You’ll Make an Impact 
  • Design, build, and rigorously optimize everything vital for large-scale training and/or fine-tuning across different model architectures, from data loading to distributed training to inference, to improve MFU (Model FLOPs Utilization) on large compute clusters.
  • Collaborate closely and proactively with research scientists, translating research models and algorithms into high-performance, production-ready code and infrastructure.
  • Implement, integrate, and test the latest advancements from research publications or open-source code.
  • Relentlessly profile and resolve training performance bottlenecks, optimizing every layer of the training stack from data loading to model inference for speed and efficiency. 
  • Contribute to technology evaluations and selection of hardware, software, and cloud services that will define our AI infrastructure platform.
  • Use MLOps frameworks (MLflow, Weights & Biases, etc.) to implement best practices across the model lifecycle – development, training, validation, and monitoring – ensuring reproducibility, reliability, and continuous improvement.
  • Create thorough documentation for infrastructure, data pipelines, and training procedures, ensuring maintainability and knowledge transfer within the growing AI lab.
  • Stay at the forefront of advancements in large-scale training strategies and data engineering, proactively driving improvements and innovation in our workflows and infrastructure.
  • Act as a high-agency individual, demonstrating initiative, problem-solving, and a commitment to delivering robust and scalable solutions for rapid prototyping and turnaround.

What You Bring
  • Deep practical expertise with AI frameworks (PyTorch, JAX, PyTorch Lightning, etc.). Hands-on experience with large-scale multi-node GPU training and other optimization strategies for developing large foundation models across various model architectures. Ability to scale solutions involving large datasets and sophisticated models on distributed compute infrastructure.
  • Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.
  • Strong communication and partnership skills, with a collaborative approach to working with research scientists and other engineers.
  • Experience with MLOps best practices for model tracking, evaluation, and deployment.
  • Bachelor's or Master's degree or equivalent experience in Computer Science, Engineering, or a related technical field.
  • Long-term hands-on experience as a Data & AI Engineer or Machine Learning Engineer, specifically building and optimizing infrastructure for large-scale machine learning systems.

Bonus Points
  • Public GitHub profile demonstrating a track record of open-source contributions to relevant projects in data engineering or deep learning infrastructure is a BIG PLUS
  • Experience writing CUDA/Triton/CUTLASS kernels 
  • Experience with performance monitoring and profiling tools for distributed training and data pipelines

About the Team

Our Grid Technologies division enables a reliable, sustainable, and digital grid. The power grid is the backbone of the energy transition. Siemens Energy offers a leading portfolio and solutions in HVDC transmission, grid stabilization and storage, high-voltage switchgear and transformers, and digital grid technology.


Who is Siemens Energy? 

At Siemens Energy, we are more than just an energy technology company. With ~100,000 dedicated employees in more than 90 countries, we develop the energy systems of the future, ensuring that the growing energy demand of the global community is met reliably and sustainably. The technologies created in our research departments and factories drive the energy transition and provide the base for one-sixth of the world's electricity generation.

Our global team is committed to making sustainable, reliable, and affordable energy a reality by pushing the boundaries of what is possible. We uphold a 150-year legacy of innovation that encourages our search for people who will support our focus on decarbonization, new technologies, and energy transformation.      


Find out how you can make a difference at Siemens Energy: https://www.siemens-energy.com/employeevideo

Our Commitment to Diversity 

Lucky for us, we are not all the same. Through diversity we generate power. We run on inclusion and our combined creative energy is fueled by over 130 nationalities. Siemens Energy celebrates character – no matter what ethnic background, gender, age, religion, identity, or disability. We energize society, all of society, and we do not discriminate based on our differences. 

Rewards/Benefits 
  • In addition to an attractive remuneration package in line with the market, you can expect an attractive employer-financed company pension scheme
  • We also offer the opportunity to become a Siemens Energy shareholder
  • We offer our employees the opportunity to work flexibly and remotely, and our inspiring offices provide space for collaboration and creativity
  • The professional and personal development of our employees is very important to us. We provide them with opportunities to learn and develop in a self-determined way; various attractive programmes and learning materials are available for this purpose
  • To support the compatibility of family and work, we have a wide range of offers, e.g. flexible working time models, childcare places at many locations, and the possibility of trial part-time work or even a sabbatical
We value equal opportunities and welcome applications from people with disabilities.
https://jobs.siemens-energy.com/jobs