Staff Software Engineer, Systems Reliability

Johns Creek - Georgia

Date Posted: Dec. 04, 2018

Requisition ID: MAC13099


Job Overview:


Macy’s Tech is looking for a Staff Software Engineer to ensure that the applications for our Omni Channel Order Management and Supply Chain Platforms are highly available, reliable, secure and scalable. We manage millions of orders per day from Mobile Apps, more than 500 stores, and Call Centers.


We are looking for someone who has a strong application support mindset, a development background, and experience transforming legacy applications and systems into next-generation cloud-native platforms on GCP or Azure. This person will collaborate closely with our software engineering, architecture and operations teams on cloud-based application monitoring and deployment as we begin our journey of building a cloud platform. The activities include application design reviews, code reviews, writing code for enhancements, and championing the operational readiness of mission-critical applications before production launch. Perform other duties as assigned.


Essential Functions:


Own and lead the strategy for, and adoption of, cloud-based application monitoring and deployment.

Ensure services are designed with 24/7 availability and operational readiness.

Implement the scripts for proactive monitoring, alerting, trend analysis and self-healing systems.

Work with product engineering as needed to ensure high availability of services.    

Work with platform and stability & performance teams to influence framework or architectural changes based on learnings from production.

Review the topology, availability, capacity, caching, deployment, performance, stability and reliability of services.

Perform Root Cause Analysis and proactively prevent recurrence of issues through design, testing and implementation of software-based solutions.

Represent the application development team on production calls to track, manage, troubleshoot and fix production issues, both short term and long term.

Monitor production issue queue on a rotational basis and work with business teams to prioritize, analyze and manage them to closure.

Review system support documents and update production application service run books where needed.

Serve as a coach and mentor to more junior engineers, including delegating and managing tasks as appropriate.

Evaluate the applicability of leading-edge technologies and use this information to significantly influence future technology strategies.

Consistently demonstrate regular, dependable attendance & punctuality.


Qualifications:


Bachelor's degree in Computer Science and/or Engineering or an equivalent combination of education and experience; Master's degree preferred.

8+ years of experience in full life cycle development of J2EE systems.

5+ years of experience with monitoring tools such as DynaTrace, Splunk, or KeyNote is a strong plus.

5+ years of experience with application profiling (Java Core, thread dumps, etc.).

5+ years of Systems Engineering in 24x7 Production Services environments.

Experience with scripting/programming languages such as C, Java, Perl, Python, Go or Shell scripting is a strong plus.

Deep knowledge of SQL and NoSQL databases.

Experience working in Agile and DevOps environments.

Deep understanding of Cloud Architecture and Operations including: migration, resilience, maintainability, and cost efficiency.

Understanding of public cloud based distributed software systems.

Understanding of 12-factor architecture methodology and its benefits to cloud success.

Knowledge of Google Cloud Platform and Stackdriver.

Understanding of Linux Operating System and experience analyzing and diagnosing distributed systems and Linux systems including file systems, protocols and libraries.


Communication Skills:


Excellent written and verbal communication skills.

Ability to read, write, and interpret complex technical documents.


Mathematical Skills:


Basic math functions such as addition, subtraction, multiplication, and division, as well as analytical skills.


Reasoning Ability:


Very strong analysis and troubleshooting skills, along with strong partnering and relationship-building skills.

Ability to consider options and make business decisions (e.g. selection of tools/methodologies for projects). 


Physical Demands:


This position involves regular walking, standing, sitting for extended periods of time, hearing, and talking.

May occasionally involve stooping, kneeling, or crouching.

May involve close vision, color vision, depth perception, focus adjustment, and viewing a computer monitor for extended periods of time.

Involves manual dexterity for using a keyboard, mouse, and other office equipment.

May involve moving or lifting items under 10 pounds.


Other Skills:


Deep troubleshooting and scripting skills to improve the availability, performance, and security of services.

Excellent troubleshooting skills, encompassing software, systems, and network. 


Work Hours:


Ability to work a flexible schedule based on department and company needs.


Company Profile:


Macy’s, Inc. is one of the nation’s premier retailers. With fiscal 2016 sales of $25.778 billion and approximately 140,000 employees, the company operates more than 700 department stores under the nameplates Macy’s and Bloomingdale’s, and approximately 125 specialty stores that include Bloomingdale’s The Outlet, Bluemercury and Macy’s Backstage. Macy’s, Inc. operates stores in 45 states, the District of Columbia, Guam and Puerto Rico. Bloomingdale’s stores in Dubai and Kuwait are operated by Al Tayer Group LLC under license agreements. Macy’s, Inc. has corporate offices in Cincinnati, Ohio and New York, New York.

This job description is not all-inclusive. Macy’s, Inc. reserves the right to amend this job description at any time. Macy’s, Inc. is an Equal Opportunity Employer, committed to a diverse and inclusive work environment.