  • info@digitalxnode.com
  • GF 27, TDI Center, Near Jasola Apollo Metro Station 110025

AI Inference Junior Engineer

  • June 23, 2026
  • admin

AI Inference Junior Engineer | Delhi, Noida

We are seeking a highly motivated AI Inference Junior Engineer to support the deployment, optimization, and operation of AI models across modern GPU infrastructure. This role is ideal for early-career engineers passionate about Artificial Intelligence, Machine Learning Infrastructure, Cloud Computing, GPU Acceleration, and Large Language Model (LLM) serving.

As an AI Inference Junior Engineer, you will work alongside experienced AI, platform, and cloud engineers to deploy and manage production-grade AI models, optimize inference performance, support scalable serving environments, and contribute to AI platform development. You will gain hands-on experience with cutting-edge AI technologies, NVIDIA GPU environments, Kubernetes-based infrastructure, cloud-native platforms, and modern inference frameworks.

This position provides an excellent opportunity to build expertise in AI infrastructure, model serving, GPU optimization, and large-scale AI deployment while contributing to innovative AI-powered products and services.

Key Responsibilities

AI Model Deployment & Operations

  • Assist in deploying and managing Large Language Models (LLMs), multimodal models, vision models, speech models, and embedding models.
  • Support AI model serving and inference workflows across production and testing environments.
  • Participate in model versioning, deployment validation, and rollback procedures.
  • Assist in implementing scalable AI serving architectures.
  • Monitor model performance and help optimize inference efficiency.

GPU & Performance Optimization

  • Support GPU resource monitoring and utilization analysis.
  • Assist in optimizing inference latency, throughput, and memory usage.
  • Learn and implement model optimization techniques including quantization and caching strategies.
  • Help identify performance bottlenecks using monitoring and profiling tools.
  • Contribute to benchmarking AI workloads across different hardware environments.

Cloud & Infrastructure Engineering

  • Support deployment of containerized AI workloads using Kubernetes and Docker.
  • Assist in managing cloud-based AI infrastructure environments.
  • Participate in infrastructure monitoring, troubleshooting, and maintenance activities.
  • Help maintain scalable and reliable inference clusters.
  • Support automation and infrastructure improvement initiatives.

Inference Framework Support

Gain experience with modern inference frameworks such as:

  • vLLM
  • NVIDIA TensorRT-LLM
  • Triton Inference Server
  • TGI (Text Generation Inference)
  • Ollama
  • Ray Serve
  • SGLang
  • OpenAI-Compatible APIs

Platform Development

  • Assist in developing APIs and backend services supporting AI workloads.
  • Support authentication, usage tracking, monitoring, and platform integrations.
  • Collaborate with engineering teams to improve platform reliability and scalability.
  • Participate in testing and deployment activities for AI platform services.
  • Contribute to documentation and operational procedures.

Collaboration & Learning

  • Work closely with AI engineers, data scientists, cloud engineers, and platform teams.
  • Participate in code reviews, technical discussions, and knowledge-sharing sessions.
  • Stay updated with emerging AI technologies, LLM frameworks, and GPU innovations.
  • Continuously improve technical skills in AI infrastructure and cloud-native technologies.

Required Skills

Programming & Development

  • Strong foundation in Python programming.
  • Understanding of software engineering principles and coding best practices.
  • Familiarity with REST APIs and backend development concepts.
  • Basic understanding of version control systems such as Git.
  • Ability to write clean, maintainable, and testable code.

AI & Machine Learning

  • Understanding of Machine Learning fundamentals and AI model deployment concepts.
  • Familiarity with transformer architectures and Large Language Models (LLMs).
  • Exposure to:
    • PyTorch
    • Hugging Face Transformers
    • Embedding Models
    • RAG (Retrieval-Augmented Generation)
  • Interest in model optimization and inference performance.

Cloud & Infrastructure

  • Basic knowledge of Docker and containerization concepts.
  • Familiarity with Kubernetes fundamentals.
  • Understanding of Linux operating systems and command-line environments.
  • Exposure to AWS, Azure, GCP, or cloud computing concepts.
  • Basic knowledge of distributed systems and cloud-native architectures.

Databases & Backend Technologies

  • Understanding of:
    • PostgreSQL
    • MongoDB
    • Redis
  • Familiarity with API integrations and data management concepts.
  • Basic understanding of event-driven systems and microservices architecture.

Professional Skills

  • Strong analytical and problem-solving abilities.
  • Excellent communication and collaboration skills.
  • Ability to learn new technologies quickly.
  • Strong attention to detail and commitment to quality.
  • Self-motivated and eager to work in fast-paced technology environments.

Education

  • Bachelor’s degree in Computer Science, Artificial Intelligence, Data Science, Information Technology, Software Engineering, Electronics, or a related field.
  • B.Tech, B.E., BCA, B.Sc. (Computer Science/IT), or equivalent qualification.
  • Master’s degree in AI, Machine Learning, Computer Science, or related disciplines is an advantage but not mandatory.
Technology: Python, MongoDB, REST APIs, AWS or Google Cloud, LLM
Job Type: Full Time
Job Location: Noida, New Delhi
Work Mode: Onsite
Experience: 1 to 3 Years



DigitalXnode is one of the leading companies operating in the converged domain of Technology, Finance, and Consulting.

 


© 2026 DigitalXNode. All Rights Reserved. | Developed by ASMZ Intl
