Prompts matching the #observability tag
Build centralized logging with ELK stack (Elasticsearch, Logstash, Kibana). Pipeline: 1. Filebeat agents on application servers. 2. Logstash for log parsing and enrichment. 3. Elasticsearch cluster for storage and indexing. 4. Kibana for visualization and search. 5. Index lifecycle management for retention. 6. Alerting on error patterns. 7. Log correlation across services. Use structured logging (JSON). Include security (authentication, encryption) and performance tuning (sharding, replicas).
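The structured-logging step above can be sketched with the Python standard library alone: a formatter that emits one JSON object per line, which Filebeat can ship and Logstash can parse without grok patterns. The service name and `trace_id` correlation field here are hypothetical, not part of any ELK configuration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line,
    ready for Filebeat shipping and Logstash parsing."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical correlation field for log correlation across services.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


logger = logging.getLogger("checkout-service")  # hypothetical service name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order placed", extra={"trace_id": "abc123"})
```

Because every record is machine-parseable JSON, Kibana can filter and aggregate on `level`, `logger`, or `trace_id` directly instead of matching free-text patterns.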
Set up comprehensive monitoring with Prometheus and Grafana. Components: 1. Prometheus server with service discovery. 2. Node Exporter for system metrics. 3. Application instrumentation with custom metrics. 4. Alertmanager for notifications (PagerDuty, Slack). 5. Grafana dashboards for visualization (RED metrics, resource usage). 6. Recording rules for aggregations. 7. Alert rules for SLO violations. Use Docker Compose for local setup. Include retention policies and high-availability configuration.
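To illustrate what "application instrumentation with custom metrics" and "RED metrics" mean concretely, here is a minimal stdlib sketch that records request counts, errors, and durations and renders them in the Prometheus text exposition format. A real application would use the official prometheus_client library instead; the endpoint name is hypothetical.

```python
from collections import defaultdict


class RedMetrics:
    """Minimal sketch of RED metrics (Rate, Errors, Duration) rendered in the
    Prometheus text exposition format, which the Prometheus server scrapes."""

    def __init__(self):
        self.requests = defaultdict(int)       # counter per (endpoint, status)
        self.latency_sum = defaultdict(float)  # summary: total seconds per endpoint
        self.latency_count = defaultdict(int)  # summary: observation count per endpoint

    def observe(self, endpoint, status, seconds):
        self.requests[(endpoint, status)] += 1
        self.latency_sum[endpoint] += seconds
        self.latency_count[endpoint] += 1

    def render(self):
        lines = ["# TYPE http_requests_total counter"]
        for (endpoint, status), n in sorted(self.requests.items()):
            lines.append(f'http_requests_total{{endpoint="{endpoint}",status="{status}"}} {n}')
        lines.append("# TYPE http_request_duration_seconds summary")
        for endpoint in sorted(self.latency_count):
            lines.append(f'http_request_duration_seconds_sum{{endpoint="{endpoint}"}} {self.latency_sum[endpoint]}')
            lines.append(f'http_request_duration_seconds_count{{endpoint="{endpoint}"}} {self.latency_count[endpoint]}')
        return "\n".join(lines)


m = RedMetrics()
m.observe("/api/orders", 200, 0.05)  # hypothetical endpoint
m.observe("/api/orders", 500, 0.30)
print(m.render())
```

Once scraped, Grafana dashboards can derive the request rate with `rate(http_requests_total[5m])` and the error ratio by filtering on the `status` label.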
Build comprehensive monitoring and observability infrastructure for production systems. Monitoring stack architecture: 1. Metrics: Prometheus for collection, Grafana for visualization, 15-second scrape intervals. 2. Logging: ELK Stack (Elasticsearch, Logstash, Kibana) or EFK (Fluentd instead of Logstash). 3. Tracing: Jaeger for distributed tracing, OpenTelemetry for instrumentation. 4. Alerting: Alertmanager for routing, PagerDuty for escalation. Key metrics to monitor: 1. Infrastructure: CPU (>80% alert), memory (>85%), disk space (>90%), network I/O. 2. Application: response time (<200ms target), error rate (<0.1%), throughput (requests/second). 3. Business: user signups, conversion rates, revenue metrics, feature usage. Alerting best practices: 1. Alert fatigue prevention: meaningful alerts only, proper severity levels (critical/warning/info). 2. Runbook automation: automated remediation for common issues, escalation procedures. 3. On-call rotation: 7-day rotations, primary/secondary coverage, fair distribution. Dashboard design: 1. Golden signals: latency, traffic, errors, saturation for each service. 2. SLA monitoring: 99.9% uptime target, error budget tracking, service level indicators. Log management: structured logging (JSON), log retention policies (90 days), centralized aggregation with filtering.
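The error-budget tracking mentioned above is simple arithmetic: the budget is the fraction of the window the SLO allows the service to be down. A sketch, assuming a 30-day rolling window:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes over the window for a given availability SLO."""
    total_minutes = window_days * 24 * 60  # 43,200 minutes in a 30-day window
    return total_minutes * (1 - slo)


# A 99.9% uptime target over 30 days leaves roughly 43.2 minutes of error budget.
print(round(error_budget_minutes(0.999), 1))
```

This is why the difference between 99.9% and 99.99% matters operationally: the budget shrinks from about 43 minutes to about 4.3 minutes per month, which changes what kinds of manual remediation are even feasible.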
Debug LLM applications with LangSmith. Features: 1. Trace every LLM call. 2. View chain execution steps. 3. Latency and token analysis. 4. Error tracking and debugging. 5. Dataset creation from logs. 6. Evaluation and testing. 7. Feedback collection. 8. Cost monitoring. Essential for production LLM apps. Use to identify bottlenecks and optimize prompts.
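The "trace every LLM call" and "latency and token analysis" features can be sketched with a plain decorator. This is a hypothetical stdlib stand-in in the spirit of LangSmith's tracing, not the langsmith SDK's actual API; the `summarize` function and its token-count field are illustrative assumptions.

```python
import functools
import time


def trace_llm_call(traces):
    """Hypothetical tracing decorator: records name, latency, and token count
    for each wrapped LLM call into the given list. A real integration would
    use the langsmith SDK rather than this sketch."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            traces.append({
                "name": fn.__name__,
                "latency_s": time.perf_counter() - start,
                # Assumes the wrapped call returns a dict with a token count.
                "tokens": result.get("tokens"),
            })
            return result
        return wrapper
    return decorator


traces = []


@trace_llm_call(traces)
def summarize(text):
    # Stand-in for a real LLM call.
    return {"text": text[:10], "tokens": len(text.split())}


summarize("observability matters for production LLM apps")
```

Collected traces like these are what make the prompt's later steps possible: slow or token-heavy calls surface as bottlenecks, and logged inputs/outputs become candidate rows for evaluation datasets.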