Top-rated prompts for Data Science
Build an interactive real-time analytics dashboard. Tech stack: 1. Use Plotly Dash for the web framework. 2. Implement WebSocket connections for live data streaming. 3. Create responsive charts (time series, heatmaps, scatter plots). 4. Add filtering and date range selectors. 5. Implement data aggregation with Pandas for performance. 6. Use Redis for caching frequently accessed metrics. 7. Add export functionality (CSV, PDF reports). 8. Implement role-based access control. Include dark mode toggle and mobile responsiveness.
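A minimal sketch of the live-updating piece of such a dashboard, assuming Plotly Dash 2.x (`app.run_server` on older versions) and a placeholder `get_latest_metrics()` source standing in for the WebSocket feed and Redis cache; auth, export, and dark mode are omitted:

```python
# Minimal Dash sketch: one chart refreshed by polling every 2 seconds.
# get_latest_metrics() is a stand-in for the real streaming source.
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

def get_latest_metrics() -> pd.DataFrame:
    # Placeholder: in practice this would read from Redis or a stream.
    return pd.DataFrame({
        "ts": pd.date_range("2024-01-01", periods=100, freq="min"),
        "value": range(100),
    })

app = Dash(__name__)
app.layout = html.Div([
    html.H2("Real-Time Analytics"),
    dcc.Interval(id="poll", interval=2_000),   # milliseconds
    dcc.Graph(id="timeseries"),
])

@app.callback(Output("timeseries", "figure"), Input("poll", "n_intervals"))
def refresh(_):
    df = get_latest_metrics()
    return px.line(df, x="ts", y="value", title="Live metric")

if __name__ == "__main__":
    app.run(debug=True)  # app.run_server(debug=True) on Dash < 2.7
```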
Build a production churn prediction system. Pipeline: 1. Perform exploratory data analysis and visualization. 2. Engineer features (RFM, engagement scores, usage patterns). 3. Handle class imbalance with SMOTE or class weights. 4. Train multiple models (XGBoost, Random Forest, Neural Network). 5. Implement cross-validation and hyperparameter tuning. 6. Create SHAP values for model interpretability. 7. Build prediction API with FastAPI. 8. Set up monitoring for model drift. Include feature importance analysis and business impact metrics.
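A compressed sketch of the imbalance-handling and model-comparison steps (3-5), using synthetic data from scikit-learn in place of real customer features and SMOTE from the imbalanced-learn package:

```python
# Sketch: SMOTE inside a pipeline, compared across two model families
# with stratified cross-validated tuning. Data here is synthetic.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

candidates = {
    "xgboost": XGBClassifier(eval_metric="logloss"),
    "random_forest": RandomForestClassifier(),
}

for name, model in candidates.items():
    pipe = Pipeline([("smote", SMOTE(random_state=42)), ("model", model)])
    search = GridSearchCV(
        pipe,
        param_grid={"model__n_estimators": [200, 500]},
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    )
    search.fit(X, y)
    print(name, round(search.best_score_, 3), search.best_params_)
```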
Implement a GA4 tracking plan. Setup: 1. Configure enhanced measurement events. 2. Create custom dimensions and metrics. 3. Set up conversion events with value assignment. 4. Implement User-ID for cross-device tracking. 5. Create exploration reports (funnel, path analysis). 6. Link with Google Ads and BigQuery. 7. Configure data retention settings. 8. Debug with Tag Assistant. Include Consent Mode v2 implementation.
Optimize a Pandas data processing pipeline. Techniques: 1. Vectorize operations (avoid loops). 2. Use appropriate data types (int8, category). 3. Process large datasets with chunking. 4. Parallelize processing with Dask or Swifter. 5. Use efficient file formats (Parquet/Feather). 6. Profile memory usage. 7. Optimize indexes for merging. 8. Cache intermediate results. Include benchmark comparisons.
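A short sketch of the dtype, chunking, and Parquet techniques; the file name and columns are illustrative placeholders:

```python
# Sketch: downcast dtypes, stream a large CSV in chunks, vectorize the
# per-row math, and persist the result as Parquet.
import pandas as pd

dtypes = {"user_id": "int32", "country": "category", "clicks": "int16"}

parts = []
for chunk in pd.read_csv("events.csv", dtype=dtypes, chunksize=500_000):
    # Vectorized transformation instead of a row-wise loop or apply.
    chunk["ctr"] = chunk["clicks"] / chunk["impressions"].clip(lower=1)
    parts.append(chunk.groupby("country", observed=True)["ctr"].mean())

result = pd.concat(parts).groupby(level=0).mean()
result.to_frame("avg_ctr").to_parquet("ctr_by_country.parquet")
print(result.head())
```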
Reduce BigQuery costs for a high-volume analytics workload. Strategies: 1. Partition tables by date and cluster by high-cardinality columns. 2. Use materialized views for frequently-run aggregations. 3. Implement query result caching and avoid SELECT *. 4. Set up cost controls with custom quotas per project. 5. Analyze query execution plans to identify expensive operations. Provide before/after cost comparison and ROI calculation. Include monitoring dashboard for ongoing cost tracking.
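One way to sketch the partitioning/clustering and cost-control steps from Python with the google-cloud-bigquery client; the project, dataset, table, and column names are illustrative:

```python
# Sketch: build a partitioned + clustered copy of a table, cap bytes
# billed per query, and dry-run a typical query to estimate its cost.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # placeholder project

ddl = """
CREATE OR REPLACE TABLE analytics.events_optimized
PARTITION BY DATE(event_ts)
CLUSTER BY user_id, event_name AS
SELECT * FROM analytics.events_raw
"""

# Fail any query that would scan more than ~20 GB.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=20 * 1024**3)
client.query(ddl, job_config=job_config).result()

# Dry run to estimate scan size without spending anything.
dry = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT event_name, COUNT(*) FROM analytics.events_optimized "
    "WHERE DATE(event_ts) >= CURRENT_DATE() - 7 GROUP BY event_name",
    job_config=dry,
)
print(f"Estimated bytes processed: {job.total_bytes_processed:,}")
```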
Build an anomaly detection system for transaction fraud. Approach: 1. Use Isolation Forest for unsupervised outlier detection. 2. Engineer features (transaction amount, time of day, location distance). 3. Set contamination parameter based on historical fraud rate. 4. Generate anomaly scores and flag top 1% as suspicious. 5. Create alerting system with precision/recall monitoring. Visualize anomalies on a scatter plot with decision boundary. Discuss the trade-off between false positives and missed fraud.
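A sketch of the Isolation Forest scoring steps on synthetic transactions; the feature columns and the 1% threshold mirror the prompt but the data is illustrative:

```python
# Sketch: fit Isolation Forest, derive anomaly scores, flag the top 1%.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
transactions = pd.DataFrame({
    "amount": rng.lognormal(3, 1, 10_000),
    "hour_of_day": rng.integers(0, 24, 10_000),
    "km_from_home": rng.exponential(5, 10_000),
})

model = IsolationForest(contamination=0.01, random_state=42)  # ~1% expected outliers
model.fit(transactions)

# score_samples: higher = more normal, so negate to rank anomalies.
transactions["anomaly_score"] = -model.score_samples(transactions)
threshold = transactions["anomaly_score"].quantile(0.99)
transactions["suspicious"] = transactions["anomaly_score"] >= threshold
print(transactions["suspicious"].mean())
```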
Design a dbt (data build tool) project for analytics engineering. Structure: 1. Staging models (raw data cleaning). 2. Intermediate models (business logic transformations). 3. Mart models (final aggregated tables). 4. Tests for data quality (unique, not_null, relationships). 5. Documentation with schema.yml and descriptions. Implement incremental models for large tables and use Jinja macros for reusable logic. Include CI/CD integration.
Create a production-quality Jupyter notebook template. Structure: 1. Markdown header with title, author, date, and objective. 2. Table of contents with anchor links. 3. Environment setup cell (imports, configs, random seed). 4. Exploratory Data Analysis section with visualizations. 5. Modeling section with clear train/test split. 6. Results summary with key metrics and business recommendations. Use consistent styling, hide warnings, and include inline documentation.
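A sketch of what the environment-setup cell (step 3) could contain; the display options and seed value are just sensible defaults:

```python
# Notebook setup cell: imports, display options, plotting theme, and a
# fixed random seed so results are reproducible across runs.
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings("ignore")          # hide noisy library warnings
pd.set_option("display.max_columns", 50)
pd.set_option("display.float_format", "{:.3f}".format)
sns.set_theme(style="whitegrid")

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
```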
Analyze user retention using cohort analysis. Deliverables: 1. Cohort table showing retention % by signup month. 2. Heatmap visualization with color gradient (green=high, red=low). 3. Retention curves comparing different cohorts. 4. Calculate Day 1, Day 7, Day 30 retention rates. 5. Identify trends and anomalies with statistical annotations. Use Python (pandas, seaborn) or SQL + BI tool. Include actionable insights for product team.
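A sketch of the cohort table and heatmap steps, assuming an `events` DataFrame with `user_id` and a datetime `event_date` column:

```python
# Sketch: monthly signup cohorts, retention as % of the cohort's month 0,
# rendered as a red-to-green heatmap.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def cohort_table(events: pd.DataFrame) -> pd.DataFrame:
    events = events.copy()
    events["event_month"] = events["event_date"].dt.to_period("M")
    events["cohort"] = events.groupby("user_id")["event_month"].transform("min")
    # Months elapsed since the user's first activity.
    events["period"] = (events["event_month"] - events["cohort"]).apply(lambda d: d.n)
    counts = events.groupby(["cohort", "period"])["user_id"].nunique().unstack()
    return counts.div(counts[0], axis=0)    # retention relative to month 0

# retention = cohort_table(events)
# sns.heatmap(retention, annot=True, fmt=".0%", cmap="RdYlGn")
# plt.title("Monthly retention by signup cohort"); plt.show()
```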
Build an automated data quality monitoring system. Checks to implement: 1. Completeness (null percentage per column). 2. Uniqueness (duplicate detection). 3. Validity (regex patterns, range checks). 4. Timeliness (data freshness alerts). 5. Consistency (cross-table referential integrity). Create a dashboard showing quality scores over time with alerting for threshold breaches. Use Great Expectations or custom Python validators.
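A sketch of the "custom Python validators" route for the completeness, uniqueness, and validity checks; thresholds and the email rule are illustrative and would normally live in configuration:

```python
# Sketch: per-column completeness, row-level duplicates, and a simple
# regex validity check, summarized as a DataFrame report.
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    duplicate_rows = int(df.duplicated().sum())          # uniqueness (row level)
    for col in df.columns:
        rows.append({
            "column": col,
            "null_pct": df[col].isna().mean() * 100,     # completeness
            "duplicate_rows": duplicate_rows,
        })
    return pd.DataFrame(rows)

def email_validity(df: pd.DataFrame, col: str = "email") -> float:
    """Validity: share of rows matching a simple email pattern."""
    return df[col].astype(str).str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$").mean()

# report = quality_report(df)
# Alert when report["null_pct"].max() > 5 or email_validity(df) < 0.99.
```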
Create advanced features for a churn prediction model. Techniques: 1. Temporal features (days since last purchase, purchase frequency). 2. Aggregations (total spend, average order value). 3. Categorical encoding (one-hot, target encoding). 4. Interaction features (tenure × monthly charges). 5. Feature selection using mutual information and correlation analysis. Document feature importance and business rationale for each engineered feature.
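A sketch of the temporal and aggregation features (steps 1-2), assuming a transactions DataFrame with `customer_id`, `order_date`, and `amount`; the interaction feature here uses average order value as an illustrative stand-in for monthly charges:

```python
# Sketch: recency, frequency, spend aggregations, and one example
# interaction feature, all computed as of a snapshot date.
import pandas as pd

def engineer_features(tx: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    grouped = tx.groupby("customer_id")
    feats = pd.DataFrame({
        "days_since_last_purchase": (as_of - grouped["order_date"].max()).dt.days,
        "purchase_frequency": grouped.size(),
        "total_spend": grouped["amount"].sum(),
        "avg_order_value": grouped["amount"].mean(),
    })
    # Illustrative interaction feature: tenure (days) x average order value.
    tenure_days = (as_of - grouped["order_date"].min()).dt.days
    feats["tenure_x_aov"] = tenure_days * feats["avg_order_value"]
    return feats

# features = engineer_features(tx, as_of=pd.Timestamp("2024-06-30"))
```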
Architect a real-time data pipeline using Apache Kafka. Components: 1. Producer sending clickstream events (JSON). 2. Kafka topic with 3 partitions for scalability. 3. Consumer group processing events in parallel. 4. Stream processing with Kafka Streams for aggregations. 5. Sink connector to write to Elasticsearch. Include error handling, exactly-once semantics, and monitoring with Kafka lag metrics.
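A sketch of the producer and consumer-group pieces using the kafka-python package; the broker address, topic, and event shape are placeholders, and the Kafka Streams aggregation (a JVM library) is not shown here:

```python
# Sketch: JSON producer plus a consumer-group member with manual
# offset commits, against a local broker.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",                        # wait for all in-sync replicas
)
producer.send("clickstream", value={"user_id": 42, "page": "/pricing"})
producer.flush()

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    group_id="analytics-consumers",    # group members share the 3 partitions
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
    enable_auto_commit=False,          # commit only after successful processing
)
for message in consumer:
    print(message.partition, message.offset, message.value)
    consumer.commit()
```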
Perform RFM (Recency, Frequency, Monetary) customer segmentation. Process: 1. Calculate RFM scores from transaction data. 2. Normalize features using StandardScaler. 3. Determine optimal K using elbow method and silhouette score. 4. Apply K-means clustering (4-6 segments). 5. Profile each segment with descriptive statistics and business labels (Champions, At-Risk, Lost). Visualize clusters using PCA 2D projection.
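A sketch of the RFM table, scaling, K selection, and clustering steps, assuming a transactions DataFrame with `customer_id`, `order_date`, and `amount`:

```python
# Sketch: RFM aggregation, standardization, silhouette scan over K,
# then K-means segment labels.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def rfm_segments(tx: pd.DataFrame, as_of: pd.Timestamp, k: int = 5) -> pd.DataFrame:
    rfm = tx.groupby("customer_id").agg(
        recency=("order_date", lambda d: (as_of - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )
    scaled = StandardScaler().fit_transform(rfm)

    # Inspect silhouette scores for candidate K before committing.
    for candidate in range(3, 8):
        labels = KMeans(n_clusters=candidate, n_init=10,
                        random_state=42).fit_predict(scaled)
        print(candidate, round(silhouette_score(scaled, labels), 3))

    rfm["segment"] = KMeans(n_clusters=k, n_init=10,
                            random_state=42).fit_predict(scaled)
    return rfm
```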
Build a time series forecasting model using Facebook Prophet. Steps: 1. Prepare historical sales data with daily granularity. 2. Add custom seasonality for Black Friday and holiday peaks. 3. Include external regressors (marketing spend, weather). 4. Generate 90-day forecast with uncertainty intervals. 5. Validate model using cross-validation and MAPE metric. Visualize actual vs predicted with interactive Plotly charts.
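A sketch of the holiday, regressor, and forecast steps with Prophet; the synthetic series, Black Friday dates, and the flat future spend value are all illustrative:

```python
# Sketch: Prophet with a custom holiday window and an external regressor,
# forecasting 90 days ahead with uncertainty intervals.
import numpy as np
import pandas as pd
from prophet import Prophet

dates = pd.date_range("2022-01-01", "2024-06-30", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ds": dates,
    "y": rng.normal(1000, 50, len(dates)),              # placeholder sales
    "marketing_spend": rng.uniform(0, 1, len(dates)),   # placeholder regressor
})

black_friday = pd.DataFrame({
    "holiday": "black_friday",
    "ds": pd.to_datetime(["2022-11-25", "2023-11-24", "2024-11-29"]),
    "lower_window": 0,
    "upper_window": 3,        # let the effect linger a few days
})

m = Prophet(holidays=black_friday, weekly_seasonality=True, yearly_seasonality=True)
m.add_regressor("marketing_spend")
m.fit(df)

future = m.make_future_dataframe(periods=90)
future["marketing_spend"] = 0.5          # regressor must cover future dates too
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```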
Design a production-grade Airflow DAG for daily ETL. Workflow: 1. Extract data from PostgreSQL and REST API. 2. Transform using pandas (clean, join, aggregate). 3. Load to data warehouse (Snowflake/BigQuery). 4. Send Slack notification on success/failure. 5. Implement retry logic and SLA monitoring. Use TaskGroups for organization, XComs for data passing, and proper error handling with callbacks.
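A sketch of the DAG skeleton with retries and a failure callback, assuming Airflow 2.4+ (`schedule_interval` on older versions); the callables and the Slack notification body are placeholders:

```python
# Sketch: daily ETL DAG with retry defaults, a failure callback hook,
# and a linear extract -> transform -> load dependency chain.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context): ...
def transform(**context): ...
def load(**context): ...
def notify_failure(context): ...       # e.g. post to Slack from here

default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,
}

with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_transform >> t_load
```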
Create a statistical analysis tool for A/B test results. Features: 1. Calculate conversion rate for Control vs Variant. 2. Compute p-value using two-proportion z-test. 3. Determine statistical significance at 95% confidence level. 4. Calculate required sample size for desired power (80%). 5. Visualize confidence intervals with error bars. Include interpretation guidelines for non-technical stakeholders.
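A sketch of the significance test and sample-size steps with statsmodels; the conversion counts and the baseline/target rates are illustrative numbers, not real experiment data:

```python
# Sketch: two-proportion z-test plus required sample size at 80% power.
from statsmodels.stats.proportion import proportions_ztest, proportion_effectsize
from statsmodels.stats.power import NormalIndPower

conversions = [480, 530]          # control, variant (illustrative)
visitors = [10_000, 10_000]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"control={conversions[0] / visitors[0]:.2%}  "
      f"variant={conversions[1] / visitors[1]:.2%}")
print(f"z={z_stat:.2f}  p={p_value:.4f}  significant={p_value < 0.05}")

# Sample size per group to detect a 4.8% -> 5.5% lift at alpha=0.05, power=0.8.
effect = proportion_effectsize(0.048, 0.055)
n_required = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(f"required sample size per group: {n_required:,.0f}")
```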
Optimize a slow-running SQL query on a 50M+ row table. Techniques to apply: 1. Add appropriate indexes on WHERE and JOIN columns. 2. Replace subqueries with CTEs (Common Table Expressions). 3. Use EXPLAIN ANALYZE to identify bottlenecks. 4. Partition large tables by date for faster scans. 5. Rewrite correlated subqueries as window functions. Provide before/after execution times and explain each optimization decision.
Design a C-suite executive dashboard in Tableau. Layout: 1. KPI cards showing YoY growth for Revenue, Profit, Customer Count. 2. Geographic heat map of sales by region. 3. Trend line chart with forecast band for next quarter. 4. Top 10 products bar chart with drill-down capability. 5. Interactive filters for date range and business unit. Use a professional color palette (blues/grays) and ensure mobile responsiveness.
Build a robust data cleaning pipeline for a messy CSV dataset. Requirements: 1. Handle missing values using forward-fill, backward-fill, and mean imputation strategies. 2. Detect and remove outliers using IQR method. 3. Standardize date formats across multiple columns. 4. Remove duplicate rows based on composite keys. 5. Generate a data quality report showing before/after statistics. Use pandas best practices with method chaining for readability.
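A sketch of the chained cleaning pipeline; the file name, column names, and composite key are illustrative placeholders:

```python
# Sketch: method-chained cleaning - parse dates, fill gaps, impute,
# drop IQR outliers, and dedupe on a composite key.
import pandas as pd

def drop_iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]

clean = (
    pd.read_csv("messy.csv")
    .assign(order_date=lambda d: pd.to_datetime(d["order_date"], errors="coerce"))
    .sort_values("order_date")
    .assign(amount=lambda d: d["amount"].ffill().bfill())        # fill gaps
    .assign(discount=lambda d: d["discount"].fillna(d["discount"].mean()))
    .pipe(drop_iqr_outliers, col="amount")
    .drop_duplicates(subset=["customer_id", "order_date"])       # composite key
    .reset_index(drop=True)
)
print(clean.describe(include="all"))
```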