Automate Service Monitoring with a Bash Script and Systemd

Keeping essential services such as Nginx and MariaDB running is crucial for any server, and watching them by hand quickly becomes tedious. In this article, we’ll build a bash script, run under systemd, that monitors services, restarts them when they fail, and notifies you by email, so they stay operational with minimal intervention.

Why Automate Service Monitoring?

Manual monitoring and restarting of services can be time-consuming and prone to human error. Automation not only ensures consistent uptime but also frees you up to focus on more critical tasks. By leveraging a bash script and systemd, you can create a reliable mechanism to monitor and restart services automatically.

Prerequisites

Before diving into the setup, ensure you have the following:

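  1. A server running Debian 12 or a similar Linux distribution, with root access.
  2. Basic knowledge of bash scripting.
  3. Systemd installed on your server.
  4. A SendGrid API key for the email notifications and an OpenAI API key for the log summaries.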
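
Before going further, it can help to confirm that the command-line tools the scripts call are actually present. The sketch below is a hypothetical preflight helper, not part of the original scripts; the function name missing_commands is our own, and the list of tools is taken from the commands the monitoring script invokes.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): print each
# command name from the argument list that cannot be found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
        fi
    done
}

# Tools the monitoring script calls; adjust the list if you change the script.
missing=$(missing_commands systemctl curl jq python3 dmesg)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing commands:" $missing
else
    echo "All required commands are available."
fi
```

Anything reported as missing can be installed by hand, or left to the setup script described later in this article.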

The Bash Script

Our script monitors specified services, restarts them if they fail, and sends email notifications using SendGrid. It also integrates with the OpenAI API to provide summarized log insights and suggested solutions.

Here’s the complete bash script:

#!/bin/bash

# Configuration
SENDGRID_API_KEY="YOUR_SENDGRID_API_KEY"
EMAIL_FROM="your_email@example.com"
EMAIL_TO="recipient@example.com"
CHECK_INTERVAL=60  # in seconds
LOG_FILE="/var/log/service_monitor.log"
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

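# Function to log messages
log_message() {
    local message=$1
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $message" >> "$LOG_FILE"
}

# Function to sanitize content for embedding in the HTML email body.
# The text is HTML-escaped here; jq takes care of JSON escaping later,
# so escaping it as JSON at this point would double-escape it.
sanitize_content() {
    local content=$1
    printf '%s' "$content" | python3 -c 'import html,sys; print(html.escape(sys.stdin.read()))'
}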

# Function to get a summary and suggested solution from OpenAI
get_openai_summary() {
    local log_content=$1
    # $'\n\n' inserts real newlines; "\n" inside double quotes stays a literal backslash-n
    local prompt="Summarize the following error log and suggest a solution:"$'\n\n'"$log_content"

    local payload=$(jq -n \
        --arg model "gpt-3.5-turbo" \
        --arg role1 "system" \
        --arg content1 "You are an assistant that summarizes error logs and provides suggested solutions." \
        --arg role2 "user" \
        --arg content2 "$prompt" \
        '{model: $model, messages: [{role: $role1, content: $content1}, {role: $role2, content: $content2}]}'
    )

    log_message "OpenAI payload: $payload"

    local response=$(curl --silent --request POST \
        --url https://api.openai.com/v1/chat/completions \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer $OPENAI_API_KEY" \
        --data "$payload")

    log_message "OpenAI response: $response"

    if [[ "$response" == *"choices"* ]]; then
        local summary=$(echo "$response" | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])' 2>> "$LOG_FILE")
        echo "$summary"
    else
        echo "Error: Failed to get a valid response from OpenAI."
        log_message "Error: Failed to get a valid response from OpenAI. Response: $response"
    fi
}

# Function to send an email using SendGrid
send_email() {
    local subject=$1
    local body=$2

    log_message "Sending email with subject: $subject"

    local payload=$(jq -n \
        --arg email_to "$EMAIL_TO" \
        --arg email_from "$EMAIL_FROM" \
        --arg subject "$subject" \
        --arg body "$body" \
        '{personalizations: [{to: [{email: $email_to}]}], from: {email: $email_from}, subject: $subject, content: [{type: "text/html", value: $body}]}'    
    )

    log_message "Email payload: $payload"

    local response=$(curl --silent --request POST \
      --url https://api.sendgrid.com/v3/mail/send \
      --header "Authorization: Bearer $SENDGRID_API_KEY" \
      --header 'Content-Type: application/json' \
      --data "$payload")

    log_message "SendGrid response: $response"
}

# Function to restart service and send emails based on the outcome
check_and_restart_service() {
    local service=$1
    local attempt=0
    local max_attempts=3
    local wait_times=(30 60 80)
    # Keep the captured logs local to this function
    local log_content dmesg_content sanitized_log_content sanitized_dmesg_content

    while [ "$attempt" -lt "$max_attempts" ]; do
        if ! systemctl is-active --quiet "$service"; then
            log_content=$(systemctl status "$service" --no-pager | head -c 1000)
            dmesg_content=$(dmesg | tail -n 10 | head -c 1000)
            sanitized_log_content=$(sanitize_content "$log_content")
            sanitized_dmesg_content=$(sanitize_content "$dmesg_content")

            log_message "$service is not active. Attempting to restart (Attempt $((attempt + 1)))..."

            # Send the raw log to OpenAI; jq escapes it when building the JSON payload
            local openai_summary=$(get_openai_summary "$log_content")

            local email_body="<html>
            <body>
                <h2>The $service service is not active</h2>
                <p><strong>Service Log:</strong></p>
                <pre>$sanitized_log_content</pre>
                <p><strong>dmesg Output:</strong></p>
                <pre>$sanitized_dmesg_content</pre>
                <p><strong>Summary and Suggested Solution:</strong></p>
                <pre>$openai_summary</pre>
            </body>
            </html>"

            send_email "$service Service Failed" "$email_body"

            systemctl restart "$service"
            sleep "${wait_times[$attempt]}"
            attempt=$((attempt + 1))

            if systemctl is-active --quiet $service; then
                log_message "$service has been restarted successfully after attempt $attempt."

                local restored_body="<html>
                <body>
                    <h2>The $service service has been restored</h2>
                </body>
                </html>"

                send_email "$service Service Restored" "$restored_body"
                break
            fi
        else
            log_message "$service is running normally."
            break
        fi
    done

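    # Report failure only if the service is still down after the last attempt;
    # otherwise a recovery on the final attempt would also trigger this email.
    if [ "$attempt" -eq "$max_attempts" ] && ! systemctl is-active --quiet "$service"; then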
        log_message "Failed to restart $service after $max_attempts attempts."

        local failed_body="<html>
        <body>
            <h2>Failed to restart $service after $max_attempts attempts</h2>
            <p><strong>Service Log:</strong></p>
            <pre>$sanitized_log_content</pre>
            <p><strong>dmesg Output:</strong></p>
            <pre>$sanitized_dmesg_content</pre>
        </body>
        </html>"

        send_email "$service Failed to Restart" "$failed_body"
    fi
}

# Main monitoring loop
monitor_services() {
    sleep 60  # Wait for 1 minute after reboot before starting checks
    while true; do
        check_and_restart_service "nginx"
        check_and_restart_service "mariadb"
        sleep $CHECK_INTERVAL  # Wait before starting checks again
    done
}

# Test email function
test_email() {
    send_email "Test Email" "<html><body><h2>This is a test email to verify SendGrid settings.</h2></body></html>"
    echo "Test email sent."
}

# Check command line argument
case "$1" in
    start)
        monitor_services
        ;;
    test-email)
        test_email
        ;;
    *)
        echo "Usage: $0 {start|test-email}"
        ;;
esac
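
The retry logic in check_and_restart_service can be tried in isolation, without touching any real service. The following is a minimal sketch of the same pattern (check, attempt a fix, wait, check again); retry_with_waits, fake_check and fake_restart are hypothetical names used only for illustration, and the wait times are set to zero so the example finishes immediately.

```shell
#!/bin/bash
# Hypothetical helper illustrating the retry pattern; not part of the original script.
# Usage: retry_with_waits CHECK_CMD FIX_CMD WAIT [WAIT...]
# For each WAIT value: stop if CHECK_CMD succeeds, otherwise run FIX_CMD
# and sleep WAIT seconds. The exit status is that of the final check.
retry_with_waits() {
    local check_cmd=$1
    local fix_cmd=$2
    shift 2
    local wait_time
    for wait_time in "$@"; do
        if "$check_cmd"; then
            return 0
        fi
        "$fix_cmd"
        sleep "$wait_time"
    done
    # Final verdict after the last attempt.
    "$check_cmd"
}

# Stand-ins for "systemctl is-active" and "systemctl restart":
# the simulated service comes back after two restarts.
restarts=0
fake_check() { [ "$restarts" -ge 2 ]; }
fake_restart() { restarts=$((restarts + 1)); }

if retry_with_waits fake_check fake_restart 0 0 0; then
    echo "Recovered after $restarts restart(s)."
else
    echo "Still down after $restarts restart(s)."
fi
```

Because the final status comes from a fresh check rather than from the attempt counter, a service that recovers on the last attempt is reported as recovered.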

Setting Up the Script as a Service

To ensure our monitoring script runs continuously, we’ll set it up as a systemd service so that it starts on boot and is restarted automatically if it exits unexpectedly. The setup script below also installs the necessary packages.

Create the following setup script:

#!/bin/bash

# Configuration
SERVICE_MONITOR_SCRIPT="/path/to/your/service_monitor.sh"
SERVICE_MONITOR_DEST="/usr/local/bin/service_monitor.sh"
SERVICE_NAME="service_monitor"
SERVICE_FILE="/etc/systemd/system/${SERVICE_NAME}.service"

# Function to install necessary packages
install_packages() {
    echo "Installing necessary packages..."

    # Update the package list
    apt-get update

    # Install required packages. The monitor talks to the OpenAI API with curl,
    # so the openai Python library (and pip) is not needed.
    apt-get install -y curl python3 systemd jq

    echo "Packages installed successfully."
}

# Function to setup the service monitor
setup_service_monitor() {
    echo "Setting up Service Monitor..."

    # Copy the service monitor script to the destination
    cp "$SERVICE_MONITOR_SCRIPT" "$SERVICE_MONITOR_DEST"
    # The script contains API keys, so make it readable and executable by root only
    chmod 700 "$SERVICE_MONITOR_DEST"

    # Create the systemd service file
    cat <<EOL > $SERVICE_FILE
[Unit]
Description=Service Monitor for Nginx and MariaDB
After=network.target

[Service]
ExecStart=$SERVICE_MONITOR_DEST start
Restart=always
User=root

[Install]
WantedBy=multi-user.target
EOL

    # Reload systemd, enable and start the service
    systemctl daemon-reload
    systemctl enable $SERVICE_NAME
    systemctl start $SERVICE_NAME

    echo "Service Monitor has been set up and started."
}

# Function to remove the service monitor
remove_service_monitor() {
    echo "Removing Service Monitor..."

    # Stop and disable the service
    systemctl stop $SERVICE_NAME
    systemctl disable $SERVICE_NAME

    # Remove the script and service file
    rm -f $SERVICE_MONITOR_DEST
    rm -f $SERVICE_FILE

    # Reload systemd
    systemctl daemon-reload

    echo "Service Monitor has been removed."
}

# Check command line argument
if [ "$1" == "setup" ]; then
    install_packages
    setup_service_monitor
elif [ "$1" == "remove" ]; then
    remove_service_monitor
elif [ "$1" == "test-email" ]; then
    $SERVICE_MONITOR_DEST test-email
else
    echo "Usage: $0 {setup|remove|test-email}"
fi

Setting Up the Service

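  1. Save the Scripts: Save the monitoring script as service_monitor.sh and the setup script as setup_service_monitor.sh.
  2. Edit the Configuration: Fill in the API keys and email addresses in service_monitor.sh, and set SERVICE_MONITOR_SCRIPT in setup_service_monitor.sh to the path where you saved service_monitor.sh.
  3. Make Scripts Executable: Run chmod +x service_monitor.sh setup_service_monitor.sh.
  4. Run the Setup Script: As root, execute ./setup_service_monitor.sh setup to install the packages and set up the service.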
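
The setup script writes to /etc/systemd/system and calls apt-get, so it needs root privileges. A small guard like the one below could be placed near the top of the setup script to fail early with a clear message. This is a sketch under that assumption; is_root is a hypothetical helper that is not in the original script, and its optional argument exists only so the check can be exercised without being root.

```shell
#!/bin/bash
# Hypothetical guard (not in the original setup script).
# is_root [UID]: succeeds when UID (default: the current user's ID) is 0.
is_root() {
    local uid=${1:-$(id -u)}
    [ "$uid" -eq 0 ]
}

if is_root; then
    echo "Running as root; setup can proceed."
else
    echo "Not root; re-run the setup script from a root shell or with sudo." >&2
fi
```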

Monitoring and Testing

To ensure everything is working correctly, you can test the email functionality by running:

./setup_service_monitor.sh test-email

This runs the installed copy of the monitoring script, so run the setup step first. It sends a test email so you can verify that your SendGrid settings are configured correctly.
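
If the email does not arrive, the monitoring script’s own log is the first place to look, since send_email records what SendGrid returned. The helper below is a hypothetical convenience, not part of the original scripts; it assumes the default LOG_FILE path from the monitoring script and simply prints the most recent "SendGrid response" line.

```shell
#!/bin/bash
# Hypothetical helper: print the most recent "SendGrid response" line
# from the monitor's log (default path taken from LOG_FILE in the script).
last_sendgrid_response() {
    local log_file=${1:-/var/log/service_monitor.log}
    grep 'SendGrid response:' "$log_file" | tail -n 1
}

# Only query the default log if it exists and is readable.
if [ -r /var/log/service_monitor.log ]; then
    last_sendgrid_response
fi
```

As far as we know, SendGrid answers an accepted request with an empty body, so an empty response in the log is usually a good sign, while a JSON body mentioning errors points to a problem with the API key or sender address.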

Conclusion

You can extend the script to cover additional services by adding further check_and_restart_service calls to the monitor_services loop.
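
One way to make that extension easier is to drive the checks from a list of service names instead of one hard-coded call per service. The sketch below assumes this approach; check_all_services and print_service are hypothetical names, and redis-server is only an example of an extra service. In the real script you would pass check_and_restart_service as the first argument.

```shell
#!/bin/bash
# Hypothetical helper: call a check function once for each service name.
# Usage: check_all_services CHECK_FUNCTION SERVICE [SERVICE...]
check_all_services() {
    local check_fn=$1
    shift
    local service
    for service in "$@"; do
        "$check_fn" "$service"
    done
}

# Stand-in for check_and_restart_service so the example has no side effects.
print_service() { echo "would check: $1"; }

check_all_services print_service nginx mariadb redis-server
```

Inside monitor_services, the two existing calls could then be replaced by a single check_all_services check_and_restart_service line followed by the service names.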

Automating service monitoring and restarting using a bash script and systemd can greatly enhance the reliability and uptime of your critical services. This method ensures consistent service availability and offers detailed logs and email notifications for improved tracking and troubleshooting.

For more details on systemd and bash scripting, check out these resources:

Have any suggestions or feedback? We’d love to hear from you! Please share your thoughts in the comments below.
