Zephyr Enterprise Services
Introduction
The Webhook module of the Zephyr Enterprise (ZE) suite handles data exchange and synchronization between ZE and JIRA for requirements and defects. Given the high-volume data growth seen with our large customers, and to make the solution more scalable, we have decided to decouple webhook processing from the ZE core and run it asynchronously.
Problem Statement
The main problem areas identified in the current implementation are as follows:
Extreme Resource Utilization
Currently, the JIRA webhook and Zephyr Enterprise applications are deployed on the same infrastructure, which becomes a bottleneck during peak loads. The shared infrastructure is over-utilized, resulting in poor application performance.
Scalability Issues
The existing webhook processing module is tightly integrated with core Zephyr Enterprise components, so it is difficult to scale webhook processing out independently.
Tight Coupling of Webhook with ZE core
Tight Coupling of event processor with ZE core
Legacy Queue – no cross visibility in queues
Data Inconsistencies
JIRA sync processing often produces data inconsistencies, such as duplicated or incomplete records.
Events are often not processed in the order in which they were received.
Duplicate requirements get created in ZE.
Longer processing time causes a huge backlog and lag.
Data is lost when processing fails.
Excessive JIRA Calls
The existing webhook processing calls JIRA APIs excessively, which delays overall execution, consumes additional resources, and places extra load on the JIRA environment.
Traceability
The existing webhook processing components do not maintain sufficient audit trails, so there is no way to trace events back when investigating failures or other issues.
Proposed Solution
The solutions identified to address the existing issues with webhook processing are as follows:
Decouple
The webhook system will be decoupled from the existing ZE and implemented as a separate runtime.
Decoupling will allow us to run the Webhook solution outside the ZE core infrastructure, allowing both systems to function independently.
Achieve asynchronous processing of JIRA events independent of ZE application resources.
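The decoupling described above can be illustrated with a minimal sketch. It is not the actual implementation: an in-process `queue.Queue` stands in for the broker, and the names `receive_webhook`, `consume_events`, and the payload fields are hypothetical. The point is only that the receiver accepts an event and returns immediately, while a separate worker does the processing.

```python
import queue
import threading

# Hypothetical in-process stand-in for the broker between receiver and consumer.
event_queue = queue.Queue()
processed = []


def receive_webhook(payload):
    """Accept a JIRA event, enqueue it, and return at once (HTTP 202 style)."""
    event_queue.put(payload)
    return 202


def consume_events():
    """Runs in its own thread, standing in for the separate webhook runtime."""
    while True:
        event = event_queue.get()
        if event is None:  # sentinel value: stop the worker
            break
        # Placeholder for the real sync work against ZE.
        processed.append(event["issue_key"])


def run_demo():
    """Send three events through the receiver and wait for the worker to finish."""
    worker = threading.Thread(target=consume_events)
    worker.start()
    for key in ("REQ-1", "REQ-2", "REQ-3"):
        receive_webhook({"issue_key": key, "event": "jira:issue_updated"})
    event_queue.put(None)
    worker.join()
    return processed
```

Because the receiver only enqueues, a slow or failing consumer does not hold up the caller, which is the property the decoupled design is after.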
Centralized Queue Management
We plan to transition from the legacy/file-based queueing mechanism to RabbitMQ, a well-known and widely used message broker system.
Centralized queueing provides flexibility and preserves the order in which events are processed.
Better visibility into event processing (e.g., events in the queue, consumption, and traffic trends) through the RabbitMQ user panels.
RabbitMQ also supports multi-node setups, which helps meet scalability needs as and when required.
Fault Tolerance
Implement persistent queues to prevent data loss if the RabbitMQ server or the event consumer components go down or reboot.
Follow a handshake (acknowledgement) mechanism so that an event is removed from RabbitMQ only after the consumer has successfully consumed and processed it.
Explore the feasibility of implementing a handshake between JIRA and the Webhook system and introducing a retry mechanism during service failures.
Explore feasibility and ways to make the Webhook system aware of JIRA rate limits and avoid blocking other JIRA users.
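The acknowledgement and retry ideas above can be sketched as follows. This is a toy in-memory model, not RabbitMQ client code: the `AckQueue` class, the `drain` helper, and the limit of three attempts are assumptions made for illustration. It shows a message leaving the queue only on acknowledgement, being redelivered after a failure, and being set aside after repeated failures.

```python
from collections import deque


class AckQueue:
    """Toy in-memory queue: a message is removed only when acknowledged."""

    def __init__(self):
        self._ready = deque()      # (message, attempts so far)
        self._unacked = {}         # delivery tag -> (message, attempts)
        self._next_tag = 0
        self.dead_letters = []     # messages that exhausted their attempts

    def publish(self, message):
        self._ready.append((message, 0))

    def get(self):
        """Deliver the next message as (tag, message), or None if none is ready."""
        if not self._ready:
            return None
        message, attempts = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = (message, attempts + 1)
        return self._next_tag, message

    def ack(self, tag):
        """Consumer succeeded: forget the message."""
        del self._unacked[tag]

    def nack(self, tag, max_attempts=3):
        """Consumer failed: redeliver, or dead-letter after max_attempts."""
        message, attempts = self._unacked.pop(tag)
        if attempts >= max_attempts:
            self.dead_letters.append(message)
        else:
            # Put it back at the front so later events do not overtake it.
            self._ready.appendleft((message, attempts))


def drain(q, handler, max_attempts=3):
    """Process every ready message, acknowledging only after handler succeeds."""
    done = []
    while (delivery := q.get()) is not None:
        tag, message = delivery
        try:
            handler(message)
        except Exception:
            q.nack(tag, max_attempts)
        else:
            q.ack(tag)
            done.append(message)
    return done
```

Redelivery means a handler may see the same event more than once, so in a design like this the consumer would need to tolerate repeats to avoid creating duplicates.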
Audit Logs
Bridge the gap in the current implementation and capture adequate logs and audit trails for webhook processing.
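As one possible shape for such an audit trail, the sketch below writes a JSON record per event and processing stage using Python's standard `logging` module. The logger name, the field names, and the stage and status values are all hypothetical; the source does not specify an audit format.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; handlers and destinations are left to configuration.
audit_logger = logging.getLogger("webhook.audit")


def audit_record(event_id, issue_key, stage, status, detail=""):
    """Build and log one JSON audit line for a webhook event at a given stage."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_id": event_id,
        "issue_key": issue_key,
        "stage": stage,     # e.g. received, queued, processed (assumed values)
        "status": status,   # e.g. ok, failed (assumed values)
        "detail": detail,
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line
```

Logging one record per stage with a shared `event_id` would let a failed event be traced from receipt through processing.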
Note
The services are compatible with deployment on both Windows and Linux.
This service is optional.
Java 17.0.10 is required to run this service, so we recommend running the services on a separate server.