DevOps & Skalierung

ELK-Stack für die CAPTCHA-Lösungsprotokollanalyse

Wenn Ihre CAPTCHA-Pipeline Tausende von Aufgaben verarbeitet, lässt sich grep nicht skalieren. Mit dem ELK-Stack (Elasticsearch, Logstash, Kibana) können Sie Lösungsprotokolle durchsuchen, aggregieren und visualisieren – Fehlermuster finden, Latenztrends verfolgen und Probleme in Sekundenschnelle diagnostizieren.

Architektur

[CAPTCHA Workers] → JSON logs → [Filebeat] → [Logstash] → [Elasticsearch]
                                                                ↓
                                                           [Kibana]

Strukturierte Protokollierung

Python – JSON-Protokollausgabe

import os
import json
import time
import logging
import sys
import requests

API_KEY = os.environ["CAPTCHAAI_API_KEY"]


class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Add extra fields
        if hasattr(record, "captcha_id"):
            log_entry["captcha_id"] = record.captcha_id
        if hasattr(record, "captcha_type"):
            log_entry["captcha_type"] = record.captcha_type
        if hasattr(record, "solve_time"):
            log_entry["solve_time"] = record.solve_time
        if hasattr(record, "error_code"):
            log_entry["error_code"] = record.error_code
        if hasattr(record, "target_url"):
            log_entry["target_url"] = record.target_url
        if hasattr(record, "poll_count"):
            log_entry["poll_count"] = record.poll_count
        return json.dumps(log_entry)


# Configure logger
logger = logging.getLogger("captchaai")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)

session = requests.Session()


def solve_captcha(sitekey, pageurl, captcha_type="recaptcha_v2"):
    extra = {"captcha_type": captcha_type, "target_url": pageurl}

    # Submit
    resp = session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": 1
    })
    data = resp.json()

    if data.get("status") != 1:
        logger.error("Submit failed", extra={
            **extra, "error_code": data.get("request")
        })
        return {"error": data.get("request")}

    captcha_id = data["request"]
    extra["captcha_id"] = captcha_id
    logger.info("Task submitted", extra=extra)

    # Poll
    start = time.time()
    poll_count = 0
    for _ in range(60):
        time.sleep(5)
        poll_count += 1
        result = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": captcha_id, "json": 1
        }).json()

        if result.get("status") == 1:
            elapsed = round(time.time() - start, 2)
            logger.info("Solve success", extra={
                **extra,
                "solve_time": elapsed,
                "poll_count": poll_count
            })
            return {"solution": result["request"]}

        if result.get("request") != "CAPCHA_NOT_READY":
            logger.error("Solve failed", extra={
                **extra,
                "error_code": result.get("request"),
                "poll_count": poll_count
            })
            return {"error": result.get("request")}

    logger.error("Solve timeout", extra={
        **extra,
        "error_code": "TIMEOUT",
        "poll_count": poll_count
    })
    return {"error": "TIMEOUT"}

JavaScript – Strukturierte Protokollierung

const axios = require("axios");

const API_KEY = process.env.CAPTCHAAI_API_KEY;

function log(level, message, fields = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    service: "captcha-worker",
    ...fields,
  };
  console.log(JSON.stringify(entry));
}

async function solveCaptcha(sitekey, pageurl, captchaType = "recaptcha_v2") {
  const fields = { captchaType, targetUrl: pageurl };

  const submitResp = await axios.post("https://ocr.captchaai.com/in.php", null, {
    params: {
      key: API_KEY, method: "userrecaptcha",
      googlekey: sitekey, pageurl, json: 1,
    },
  });

  if (submitResp.data.status !== 1) {
    log("error", "Submit failed", { ...fields, errorCode: submitResp.data.request });
    return { error: submitResp.data.request };
  }

  const captchaId = submitResp.data.request;
  fields.captchaId = captchaId;
  log("info", "Task submitted", fields);

  const startTime = Date.now();
  let pollCount = 0;

  for (let i = 0; i < 60; i++) {
    await new Promise((r) => setTimeout(r, 5000));
    pollCount++;

    const pollResp = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: captchaId, json: 1 },
    });

    if (pollResp.data.status === 1) {
      const solveTime = ((Date.now() - startTime) / 1000).toFixed(2);
      log("info", "Solve success", { ...fields, solveTime: parseFloat(solveTime), pollCount });
      return { solution: pollResp.data.request };
    }

    if (pollResp.data.request !== "CAPCHA_NOT_READY") {
      log("error", "Solve failed", { ...fields, errorCode: pollResp.data.request, pollCount });
      return { error: pollResp.data.request };
    }
  }

  log("error", "Solve timeout", { ...fields, errorCode: "TIMEOUT", pollCount });
  return { error: "TIMEOUT" };
}

module.exports = { solveCaptcha };

Filebeat-Konfiguration

# filebeat.yml
filebeat.inputs:

  - type: log
    paths:

      - /var/log/captcha-worker/*.log
    json:
      keys_under_root: true
      add_error_key: true
      message_key: message

output.logstash:
  hosts: ["logstash:5044"]

Logstash-Pipeline

# logstash-captcha.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse JSON logs
  json {
    source => "message"
    target => "captcha"
  }

  # Add computed fields
  if [captcha][solve_time] {
    mutate {
      add_field => {
        "solve_time_bucket" => "fast"
      }
    }
    if [captcha][solve_time] > 30 {
      mutate { update => { "solve_time_bucket" => "medium" } }
    }
    if [captcha][solve_time] > 90 {
      mutate { update => { "solve_time_bucket" => "slow" } }
    }
  }

  # Extract date
  date {
    match => ["[captcha][timestamp]", "ISO8601"]
    target => "@timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "captcha-logs-%{+YYYY.MM.dd}"
  }
}

Elasticsearch-Indexvorlage

{
  "index_patterns": ["captcha-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "mappings": {
      "properties": {
        "captcha_type": { "type": "keyword" },
        "captcha_id": { "type": "keyword" },
        "error_code": { "type": "keyword" },
        "solve_time": { "type": "float" },
        "poll_count": { "type": "integer" },
        "target_url": { "type": "keyword" },
        "level": { "type": "keyword" },
        "message": { "type": "text" }
      }
    }
  }
}

Kibana-Dashboard-Panels

Panel Visualisierung Abfrage
Erfolgsquote lösen Metrisch level:info AND message:"Solve success" / insgesamt
Fehleraufschlüsselung Kreisdiagramm level:error gruppiert nach error_code
Latenz im Laufe der Zeit Liniendiagramm Durchschnittlicher solve_time im Zeitverlauf
Fehler im Laufe der Zeit Balkendiagramm Zählen Sie level:error pro 5-Minuten-Bucket
Am langsamsten gelöst Datentabelle Top 10 nach solve_time absteigend
Warteschlangenaktivität Flächendiagramm Anzahl nach message („Aufgabe eingereicht“ vs. „Erfolg gelöst“)

Nützliche Abfragen

# All errors in the last hour
level:error AND @timestamp:[now-1h TO now]

# Timeout errors for reCAPTCHA
error_code:TIMEOUT AND captcha_type:recaptcha_v2

# Slow solves (> 60 seconds)
solve_time:>60

# Errors for a specific target URL
level:error AND target_url:"example.com"

# Specific CAPTCHA ID investigation
captcha_id:"73519847"

Fehlerbehebung

Problem Ursache Lösung
## Fehlerbehebung
Problem Ursache Lösung
Logs erscheinen nicht in Kibana Logstash-Pipeline falsch konfiguriert Logstash-Konfiguration prüfen und Input/Output-Pfade validieren
Hohe Latenz bei Log-Ingestion Elasticsearch überlastet Index-Shards optimieren und Bulk-Writes verwenden
Fehlende Felder in CAPTCHA-Logs Strukturiertes Logging nicht implementiert JSON-Logging mit Pflichtfeldern (solver, latenz, fehlercode) einführen

Verwandte Leitfäden

  • Strukturierte Protokollierung für CAPTCHA-Vorgänge
  • Grafana-Dashboard-Vorlagen für CaptchaAI
  • CAPTCHA-Lösungsraten mit Prometheus/Grafana überwachen
Kommentare sind für diesen Artikel deaktiviert.