This guide outlines how to build a data processing application in Go that consumes scan results from Google Cloud Pub/Sub and maintains one record per unique (IP, port, service) combination, with support for horizontal scaling and at-least-once processing semantics.
- Navigate to your project root.
- Run `go mod init <your_module_name>` to initialize a new Go module.
- Google Cloud Pub/Sub client: `go get cloud.google.com/go/pubsub`
- PostgreSQL driver: `go get -u github.com/lib/pq`
- Configuration management (optional, for more advanced config handling): `go get github.com/spf13/viper`
Use the following directory structure, where each directory holds the Go files for one part of the application:
data-processor/
│
├── cmd/
│ └── main.go # Entry point, sets up the application
│
├── config/
│ └── config.go # Configuration management
│
├── logger/
│ └── logger.go # Logging setup and configuration
│
├── models/
│ └── models.go # Data models for the application
│
├── repositories/
│ └── scan_result_processor.go # Repository layer: PostgreSQL reads and writes
│
├── services/
│ └── scan_result_processor.go # Service layer: decoding, normalization, processing logic
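As a hypothetical illustration of `models/models.go`, the types below describe one incoming message and the normalized record kept per (IP, port, service). Only `data_version`, `response_bytes_utf8`, and `response_str` come from this guide; the other JSON field names, the nesting under `data`, and the Unix-seconds timestamp are assumptions about the payload.

```go
package main

import (
	"encoding/json"
	"time"
)

// ScanMessage mirrors one Pub/Sub message body. Apart from data_version,
// response_bytes_utf8, and response_str, the JSON layout is assumed.
type ScanMessage struct {
	IP          string   `json:"ip"`
	Port        int      `json:"port"`
	Service     string   `json:"service"`
	Timestamp   int64    `json:"timestamp"`    // assumed to be Unix seconds
	DataVersion int      `json:"data_version"` // 1 or 2
	Data        ScanData `json:"data"`
}

// ScanData holds the version-specific response field.
type ScanData struct {
	ResponseBytesUTF8 string `json:"response_bytes_utf8,omitempty"` // base64 text, data_version 1
	ResponseStr       string `json:"response_str,omitempty"`        // plain text, data_version 2
}

// ServiceRecord is the normalized state kept per (IP, port, service).
type ServiceRecord struct {
	IP       string
	Port     int
	Service  string
	LastSeen time.Time
	Response string
}

// ParseScanMessage unmarshals a raw message body.
func ParseScanMessage(body []byte) (ScanMessage, error) {
	var m ScanMessage
	if err := json.Unmarshal(body, &m); err != nil {
		return ScanMessage{}, err
	}
	return m, nil
}
```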
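- Set up a subscriber on the `scan-sub` subscription to fetch messages. Pub/Sub delivers messages at least once, so acknowledge each message only after it has been fully processed and stored; any message that is not acknowledged is redelivered.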
- Decode messages based on `data_version`: for `data_version = 1`, base64-decode `response_bytes_utf8`; for `data_version = 2`, use `response_str` directly.
- Normalize the data so that all records share a consistent format.
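- Insert or update records in PostgreSQL so that each (IP, port, service) record holds the latest timestamp and service response. Overwrite a stored record only when the incoming timestamp is newer, because messages can arrive out of order or be redelivered.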
- Design the application to be stateless for horizontal scaling.
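- Use database transactions to preserve data integrity, for example when writing a batch of records.
- Run multiple consumers against the same subscription so that Pub/Sub distributes messages among them. Duplicates are still possible (for example, when an acknowledgement deadline expires), so keep writes idempotent rather than relying on the subscriber setup to prevent duplication.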
- Use containerization (e.g., Docker) for deployment simplicity and scaling.
- Consider a managed Kubernetes service for easier scaling and management.
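When the processor runs in a container on Kubernetes, the pod receives SIGTERM before it is stopped, followed by a grace period that defaults to 30 seconds. A small helper can turn that signal into context cancellation; `ShutdownContext` is a hypothetical name. Passing the returned context to the receive loop lets in-flight messages finish (in the v1 client, `Subscription.Receive` returns once its context is done and outstanding callbacks have returned), and any message left unacknowledged is redelivered to another replica.

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

// ShutdownContext returns a context that is cancelled on SIGTERM (sent by
// Kubernetes before stopping a pod) or SIGINT (Ctrl+C during local runs).
// Calling the returned stop function also cancels the context and
// unregisters the signal handling.
func ShutdownContext() (context.Context, context.CancelFunc) {
	return signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
}
```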
This roadmap provides a structured approach to developing an application that meets the requirements for efficient data processing, scalability, and reliability.