USA

Bloomberg is building the world’s most trusted information network for finance. Our 9,000+ engineers are building new systems for the Bloomberg Terminal to solve complex, real-world problems. We trust our teams to choose the right technologies for the job, and, at Bloomberg, that is often Python. Our 4,000+ Python developers work on everything from financial analytics and data science to open source contributions.

Bloomberg

Senior Software Engineer - AI App Enablement & Observability

Dublin

Platform Engineering builds the core platforms, tooling, and paved roads that Bloomberg engineers rely on to ship reliable, secure, and high-performing systems at scale.

The AI App Enablement & Observability team accelerates how AI products are built across Bloomberg Industry Group. Our mission is to make AI systems reliable, performant, cost-efficient, and continuously improving through platform tooling, deep observability, and automated feedback loops.

We build developer-facing platforms and workflows that enable teams to experiment, deploy, and operate AI and agent-based systems with confidence. This includes LLM gateways, agent platforms, benchmarking systems, telemetry pipelines, and self-improving infrastructure that closes the loop between observability and action. We emphasise strong developer experience, intuitive APIs/SDKs, and end-to-end ownership.

What’s in it for you?

You will help define how Bloomberg Industry Group builds and operates AI systems at scale by working on platforms that:

  • Accelerate AI product development through reusable tooling and paved roads
  • Provide end-to-end observability across AI systems (models, agents, pipelines, applications)
  • Enable self-improving systems through telemetry-driven feedback loops
  • Optimise cost, performance, and reliability of AI workloads
  • Support both production AI systems and internal engineering agents

You’ll collaborate across AI product, infrastructure, and platform teams to deliver foundational systems.

Responsibilities

  • Platform & Enablement
    • Build and evolve AI platform tooling (e.g., developer workflows, benchmarking systems)
    • Design developer-friendly APIs, SDKs, and interfaces
    • Contribute to systems across the Model Development Lifecycle (experimentation, deployment, evaluation)
  • Observability & Telemetry
    • Build and operate observability platforms and telemetry pipelines (logs, metrics, traces, events)
    • Provide visibility into latency, token usage, cost, quality, drift, and reliability
    • Define instrumentation standards, schemas, and conventions
    • Implement distributed tracing using modern approaches (e.g., OpenTelemetry)
  • AI System Insights & Debugging
    • Enable end-to-end debugging of AI and agent workflows (model calls, tool usage, retrieval, orchestration)
    • Build benchmarking, regression detection, and performance analysis capabilities
    • Support observability for both production systems and internal engineering agents
  • Closed-loop Optimization & Automation
    • Develop systems that turn telemetry into action (automated experimentation, regression detection, alerting)
    • Build feedback loops that continuously improve model quality and system behavior
    • Enable self-healing and self-optimising workflows
  • Cost, Performance & Reliability
    • Build tooling for cost visibility, forecasting, and optimization
    • Define SLOs, alerting, and performance tuning practices
    • Improve reliability and scalability of AI infrastructure
  • Ownership & Collaboration
    • Own projects end-to-end (RFCs, architecture, implementation, rollout, production support)
    • Partner with AI teams to drive adoption of platform tooling and standards
    • Produce high-quality documentation and improve developer experience

Requirements

  • Demonstrated experience building production software or platform systems
  • Strong engineering fundamentals in distributed systems or backend platforms
  • Experience or strong interest in observability and debugging complex systems
  • Experience or strong interest in AI/ML systems, LLMs, or agent-based architectures
  • Strong ownership mindset and ability to drive ambiguous problems to production
  • Hands-on experience with modern agentic coding tools (e.g., Claude Code, Codex CLI, Cursor) and multi-model workflows
  • Working knowledge of agent architecture internals (context engineering, tool loops, sub-agent orchestration)

Nice-to-Haves

  • Experience with OpenTelemetry and modern observability ecosystems, including instrumentation, collectors, exporters, and tools like Prometheus, Grafana, and tracing/log systems
  • Experience designing and operating telemetry pipelines, including sampling, retention, cardinality, and cost tradeoffs, as well as integrating observability into CI/CD and developer workflows
  • Familiarity with AI/agent frameworks, including instrumentation of LLM calls, tool usage, workflows, and evaluation signals (quality metrics, benchmarking, regression detection)
  • Experience building cost monitoring, forecasting, and optimization systems for AI workloads
  • Familiarity with cloud and infrastructure tooling (e.g., AWS, Azure, Kubernetes, Terraform)
  • Experience with agentic infrastructure concepts such as MCP servers, hooks, skills, subagents, sandboxing, and persistent memory patterns
  • Active engagement with the agentic engineering frontier, including emerging patterns (e.g., harness vs. model, review debt, feedback loops)
  • Demonstrated agent-native development practices (iterating with agents using testing, verification, and feedback loops)
  • Strong security awareness for autonomous systems, including sandboxing, prompt injection risks, credential exposure, and guardrails