Every server depends on a few essential services, and keeping services like Nginx and MariaDB up around the clock is tedious to do by hand. In this article, we’ll build a solution that combines a bash script with systemd to monitor those services, restart them when they fail, and notify you by email, so they stay operational with minimal intervention.
Why Automate Service Monitoring?
Manual monitoring and restarting of services can be time-consuming and prone to human error. Automation not only ensures consistent uptime but also frees you up to focus on more critical tasks. By leveraging a bash script and systemd, you can create a reliable mechanism to monitor and restart services automatically.
Prerequisites
Before diving into the setup, ensure you have the following:
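- A server running Debian 12 or a similar systemd-based Linux distribution, with root access.
- Basic knowledge of bash scripting.
- systemd installed on your server.
- A SendGrid API key and an OpenAI API key, since the script calls both services.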
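Before installing anything, it can help to confirm which of the tools used later in this article are already present. The following is a minimal sketch, not part of the original scripts; the helper name missing_commands is hypothetical, and the command list is taken from the tools the monitoring script calls.

```shell
# Print the name of every command from the argument list that is not on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Tools the monitoring script in this article calls.
missing=$(missing_commands curl jq python3 systemctl)
if [ -n "$missing" ]; then
    # $missing is unquoted on purpose so the names print on one line.
    echo "Missing commands:" $missing
else
    echo "All required commands are available."
fi
```

Anything reported as missing should be covered by the setup script's package installation step, with the exception of systemctl, which comes with systemd itself.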
The Bash Script
Our script monitors specified services, restarts them if they fail, and sends email notifications using SendGrid. It also integrates with the OpenAI API to provide summarized log insights and suggested solutions.
Here’s the complete bash script:
    #!/bin/bash

    # Configuration
    SENDGRID_API_KEY="YOUR_SENDGRID_API_KEY"
    EMAIL_FROM="sender@example.com"    # placeholder: a sender address verified in SendGrid
    EMAIL_TO="recipient@example.com"   # placeholder: where alerts are delivered
    CHECK_INTERVAL=60 # in seconds
    LOG_FILE="/var/log/service_monitor.log"
    OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

    # Function to log messages
    log_message() {
        local message=$1
        echo "$(date '+%Y-%m-%d %H:%M:%S') - $message" >> "$LOG_FILE"
    }

    # Function to sanitize content: HTML-escape it so it is safe inside the email's <pre> blocks.
    # JSON escaping is not needed here because jq encodes the payloads below.
    sanitize_content() {
        local content=$1
        printf '%s' "$content" | python3 -c 'import html,sys; sys.stdout.write(html.escape(sys.stdin.read()))'
    }

    # Function to get a summary and suggested solution from OpenAI
    get_openai_summary() {
        local log_content=$1
        local payload response summary
        # $'\n\n' produces real newlines; "\n\n" inside double quotes would stay literal
        local prompt="Summarize the following error log and suggest a solution:"$'\n\n'"$log_content"
        payload=$(jq -n \
            --arg model "gpt-3.5-turbo" \
            --arg role1 "system" \
            --arg content1 "You are an assistant that summarizes error logs and provides suggested solutions." \
            --arg role2 "user" \
            --arg content2 "$prompt" \
            '{model: $model, messages: [{role: $role1, content: $content1}, {role: $role2, content: $content2}]}')
        log_message "OpenAI payload: $payload"
        response=$(curl --silent --request POST \
            --url https://api.openai.com/v1/chat/completions \
            --header "Content-Type: application/json" \
            --header "Authorization: Bearer $OPENAI_API_KEY" \
            --data "$payload")
        log_message "OpenAI response: $response"
        if [[ "$response" == *"choices"* ]]; then
            summary=$(echo "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])' 2>> "$LOG_FILE")
            echo "$summary"
        else
            echo "Error: Failed to get a valid response from OpenAI."
            log_message "Error: Failed to get a valid response from OpenAI. Response: $response"
        fi
    }

    # Function to send an email using SendGrid
    send_email() {
        local subject=$1
        local body=$2
        local payload response
        log_message "Sending email with subject: $subject"
        payload=$(jq -n \
            --arg email_to "$EMAIL_TO" \
            --arg email_from "$EMAIL_FROM" \
            --arg subject "$subject" \
            --arg body "$body" \
            '{personalizations: [{to: [{email: $email_to}]}], from: {email: $email_from}, subject: $subject, content: [{type: "text/html", value: $body}]}')
        log_message "Email payload: $payload"
        response=$(curl --silent --request POST \
            --url https://api.sendgrid.com/v3/mail/send \
            --header "Authorization: Bearer $SENDGRID_API_KEY" \
            --header 'Content-Type: application/json' \
            --data "$payload")
        log_message "SendGrid response: $response"
    }

    # Function to restart service and send emails based on the outcome
    check_and_restart_service() {
        local service=$1
        local attempt=0
        local max_attempts=3
        local wait_times=(30 60 80)
        # Declared local so values from one service never leak into the next check
        local log_content dmesg_content sanitized_log_content sanitized_dmesg_content
        local openai_summary email_body restored_body failed_body

        while [ "$attempt" -lt "$max_attempts" ]; do
            if ! systemctl is-active --quiet "$service"; then
                log_content=$(systemctl status "$service" --no-pager | head -c 1000)
                dmesg_content=$(dmesg | tail -n 10 | head -c 1000)
                sanitized_log_content=$(sanitize_content "$log_content")
                sanitized_dmesg_content=$(sanitize_content "$dmesg_content")
                log_message "$service is not active. Attempting to restart (Attempt $((attempt + 1)))..."
                # Send the raw log to OpenAI, then escape the answer for the HTML email
                openai_summary=$(sanitize_content "$(get_openai_summary "$log_content")")
                email_body="<html> <body>
    <h2>The $service service is not active</h2>
    <p><strong>Service Log:</strong></p> <pre>$sanitized_log_content</pre>
    <p><strong>dmesg Output:</strong></p> <pre>$sanitized_dmesg_content</pre>
    <p><strong>Summary and Suggested Solution:</strong></p> <pre>$openai_summary</pre>
    </body> </html>"
                send_email "$service Service Failed" "$email_body"
                systemctl restart "$service"
                sleep "${wait_times[$attempt]}"
                attempt=$((attempt + 1))
                if systemctl is-active --quiet "$service"; then
                    log_message "$service has been restarted successfully after attempt $attempt."
                    restored_body="<html> <body> <h2>The $service service has been restored</h2> </body> </html>"
                    send_email "$service Service Restored" "$restored_body"
                    # Return instead of break: a success on the last attempt must not
                    # fall through to the failure report below
                    return 0
                fi
            else
                log_message "$service is running normally."
                break
            fi
        done

        if [ "$attempt" -eq "$max_attempts" ]; then
            log_message "Failed to restart $service after $max_attempts attempts."
            failed_body="<html> <body>
    <h2>Failed to restart $service after $max_attempts attempts</h2>
    <p><strong>Service Log:</strong></p> <pre>$sanitized_log_content</pre>
    <p><strong>dmesg Output:</strong></p> <pre>$sanitized_dmesg_content</pre>
    </body> </html>"
            send_email "$service Failed to Restart" "$failed_body"
        fi
    }

    # Main monitoring loop
    monitor_services() {
        sleep 60 # Wait for 1 minute after reboot before starting checks
        while true; do
            check_and_restart_service "nginx"
            check_and_restart_service "mariadb"
            sleep "$CHECK_INTERVAL" # Wait before starting checks again
        done
    }

    # Test email function
    test_email() {
        send_email "Test Email" "<html><body><h2>This is a test email to verify SendGrid settings.</h2></body></html>"
        echo "Test email sent."
    }

    # Check command line argument
    case "$1" in
        start)
            monitor_services
            ;;
        test-email)
            test_email
            ;;
        *)
            echo "Usage: $0 {start|test-email}"
            exit 1
            ;;
    esac
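The retry logic is the part of the script most worth testing in isolation, because running it for real means stopping a live service. The sketch below separates that loop from systemctl so it can be exercised with stand-in functions. The names retry_restart, fake_check, and fake_restart are hypothetical and do not appear in the script above, and the zero-second waits are only there to keep the demonstration fast.

```shell
# retry_restart CHECK_CMD RESTART_CMD WAIT...
# Runs RESTART_CMD until CHECK_CMD succeeds, sleeping WAIT seconds after each
# restart. Returns 0 once the check passes, non-zero when the waits run out.
retry_restart() {
    local check_cmd=$1
    local restart_cmd=$2
    local attempt=0
    local wait_time
    shift 2
    for wait_time in "$@"; do
        if "$check_cmd"; then
            return 0
        fi
        attempt=$((attempt + 1))
        echo "attempt $attempt: restarting, then waiting ${wait_time}s"
        "$restart_cmd"
        sleep "$wait_time"
    done
    # Final check after the last wait decides the return status.
    "$check_cmd"
}

# Hypothetical stand-ins so the sketch runs without touching real services:
# the fake service becomes "active" after two restarts.
restarts=0
fake_check()   { [ "$restarts" -ge 2 ]; }
fake_restart() { restarts=$((restarts + 1)); }

retry_restart fake_check fake_restart 0 0 0 && echo "recovered after $restarts restarts"
```

On a real host, the check and restart arguments would be small wrapper functions around systemctl is-active --quiet and systemctl restart, and the waits would be the 30, 60, and 80 seconds used in the script.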
Setting Up the Script as a Service
To ensure our monitoring script runs continuously and restarts automatically if it fails, we’ll set it up as a systemd service. This way, it starts on boot and systemd restarts it whenever it exits unexpectedly. The setup script below also installs the packages the monitor needs.
Create the following setup script:
    #!/bin/bash

    # Configuration
    SERVICE_MONITOR_SCRIPT="/path/to/your/service_monitor.sh"
    SERVICE_MONITOR_DEST="/usr/local/bin/service_monitor.sh"
    SERVICE_NAME="service_monitor"
    SERVICE_FILE="/etc/systemd/system/${SERVICE_NAME}.service"

    # Function to install necessary packages
    install_packages() {
        echo "Installing necessary packages..."
        # Update the package list
        apt-get update
        # Install required packages. The monitor talks to the OpenAI API with curl,
        # so the openai Python library is not needed; a system-wide pip3 install
        # would also be refused on Debian 12 (externally managed environment).
        apt-get install -y curl python3 systemd jq
        echo "Packages installed successfully."
    }

    # Function to setup the service monitor
    setup_service_monitor() {
        echo "Setting up Service Monitor..."
        # Copy the service monitor script to the destination
        cp "$SERVICE_MONITOR_SCRIPT" "$SERVICE_MONITOR_DEST"
        # The script contains API keys, so keep it accessible to root only
        chmod 700 "$SERVICE_MONITOR_DEST"
        # Create the systemd service file
        cat <<EOL > "$SERVICE_FILE"
    [Unit]
    Description=Service Monitor for Nginx and MariaDB
    After=network.target

    [Service]
    ExecStart=$SERVICE_MONITOR_DEST start
    Restart=always
    User=root

    [Install]
    WantedBy=multi-user.target
    EOL
        # Reload systemd, enable and start the service
        systemctl daemon-reload
        systemctl enable "$SERVICE_NAME"
        systemctl start "$SERVICE_NAME"
        echo "Service Monitor has been set up and started."
    }

    # Function to remove the service monitor
    remove_service_monitor() {
        echo "Removing Service Monitor..."
        # Stop and disable the service
        systemctl stop "$SERVICE_NAME"
        systemctl disable "$SERVICE_NAME"
        # Remove the script and service file
        rm -f "$SERVICE_MONITOR_DEST"
        rm -f "$SERVICE_FILE"
        # Reload systemd
        systemctl daemon-reload
        echo "Service Monitor has been removed."
    }

    # Check command line argument
    if [ "$1" == "setup" ]; then
        install_packages
        setup_service_monitor
    elif [ "$1" == "remove" ]; then
        remove_service_monitor
    elif [ "$1" == "test-email" ]; then
        "$SERVICE_MONITOR_DEST" test-email
    else
        echo "Usage: $0 {setup|remove|test-email}"
        exit 1
    fi
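If you would like to inspect the unit file before anything is written under /etc, the heredoc can be rendered to standard output first. This is a sketch under assumptions: render_unit is a hypothetical helper, and the RestartSec=10 line is an addition that is not in the setup script, included only to show where a restart delay would go.

```shell
# Print the unit file the setup script would write, without touching /etc.
render_unit() {
    local script_path=$1
    cat <<EOF
[Unit]
Description=Service Monitor for Nginx and MariaDB
After=network.target

[Service]
ExecStart=$script_path start
Restart=always
# Assumption, not in the original unit: pause before systemd restarts the monitor.
RestartSec=10
User=root

[Install]
WantedBy=multi-user.target
EOF
}

render_unit /usr/local/bin/service_monitor.sh
```

Redirecting this output to a scratch file such as /tmp/service_monitor.service and running systemd-analyze verify on it should flag syntax problems before the unit is installed.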
Setting Up the Service
- Save the Scripts: Save the monitoring script as service_monitor.sh and the setup script as setup_service_monitor.sh, then set SERVICE_MONITOR_SCRIPT in the setup script to the path of the monitoring script.
- Make Scripts Executable: Run chmod +x service_monitor.sh setup_service_monitor.sh.
- Run the Setup Script: Execute ./setup_service_monitor.sh setup as root to set up the service.
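Once the setup script has finished, a quick look at the unit's state confirms that the monitor is enabled and running. The helper below is a sketch, and the name show_monitor_state is hypothetical. It assumes the unit name and log path from the scripts in this article and only uses standard systemctl and journalctl options.

```shell
# Show whether the monitor unit is enabled and running, plus its recent output.
show_monitor_state() {
    local unit=${1:-service_monitor}
    local log_file=${2:-/var/log/service_monitor.log}
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; is this a systemd host?" >&2
        return 1
    fi
    systemctl is-enabled "$unit"
    systemctl is-active "$unit"
    journalctl -u "$unit" -n 20 --no-pager
    # The monitor's own log only exists after its first check has run.
    [ -r "$log_file" ] && tail -n 20 "$log_file"
    return 0
}

# Usage (as root, after running the setup script):
#   show_monitor_state
```

Keep in mind that the monitor sleeps for 60 seconds before its first check, so the log file may still be empty right after setup.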
Monitoring and Testing
To ensure everything is working correctly, you can test the email functionality by running:
./setup_service_monitor.sh test-email
This should send a test email to verify that your SendGrid settings are configured correctly.
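Beyond the email test, you may want to confirm that a restart is actually detected and logged. One way, on a test machine, is to stop a monitored service with systemctl stop nginx, wait for the next check, and then filter the monitor's log for restart messages. The sketch below assumes the log path and message wording from the monitoring script; restart_events is a hypothetical helper, and the sample log lines are illustrative rather than captured from a real run.

```shell
# List the restart-related lines the monitor wrote to its log.
restart_events() {
    local log_file=${1:-/var/log/service_monitor.log}
    grep -E 'Attempting to restart|restarted successfully|Failed to restart' "$log_file"
}

# Demonstration on an illustrative sample in the script's log format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2024-01-01 12:00:00 - nginx is running normally.
2024-01-01 12:01:00 - nginx is not active. Attempting to restart (Attempt 1)...
2024-01-01 12:01:30 - nginx has been restarted successfully after attempt 1.
EOF
restart_events "$sample"
rm -f "$sample"
```

On the server itself, calling restart_events without arguments reads /var/log/service_monitor.log directly.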
Conclusion
Automating service monitoring and restarting with a bash script and systemd can improve the reliability and uptime of your critical services. The script logs every check to /var/log/service_monitor.log and emails you when a service fails, recovers, or cannot be restarted, which makes tracking and troubleshooting easier.
To monitor additional services, add further check_and_restart_service calls to monitor_services.
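Rather than adding one call per service, the list of services can be kept in a single variable and looped over. The following is a sketch only: check_all_services and print_check are hypothetical names, and redis-server is an example addition that the original script does not monitor.

```shell
# Services to watch; redis-server is a hypothetical addition for illustration.
SERVICES="nginx mariadb redis-server"

# Call CHECK_FN once for each service name that follows it.
check_all_services() {
    local check_fn=$1
    local service
    shift
    for service in "$@"; do
        "$check_fn" "$service"
    done
}

# Stand-in for check_and_restart_service so the sketch runs on its own.
print_check() { echo "checking $1"; }

# $SERVICES is left unquoted on purpose so it splits into separate names.
check_all_services print_check $SERVICES
```

Inside the monitoring script, the loop body of monitor_services would pass check_and_restart_service as the first argument instead of the stand-in.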