# nvrbot-scraper
**Status:** 🟢 Active
**Phase:** Daily Scraping + Cleanup Pipeline Implemented
**Last Activity:** 2026-03-11
---
## Linear Metadata
**Project ID:** `17c6d70c-e194-41f7-a7e9-644fd91a60b8`
**Team ID:** `96b685fe-2252-47c5-97ee-273d8c484942`
**Last Synced:** `2026-02-06T13:21:08.582Z`
## Overview
Web scraper for extracting class schedule data from competitor wellness studios (sauna, cold plunge, meditation). Used for NVRMND competitive analysis: tracking capacity, instructors, fill rates, class types.
**Source Code:** `/home/john/projects/superscaper`
---
## Scraper Status (2026-03-27)
**Last Update:** 2026-03-27 19:51 PST
**Status:** ✅ **ALL CURRENT - Daily Scraping Active**
**Mode:** Daily forward scraping (nightly at 11pm PT) with auto-merge to master.json
| Studio | Last Scraped | Status | Notes |
|--------|--------------|--------|-------|
| BD_CARLSBAD | 2026-03-27 | ✅ Current | Backfill complete Nov 2024 → Mar 2026 |
| OS_ADELAIDE | 2026-03-27 | ✅ Current | Backfill complete (split from OS_TORONTO) |
| OS_YORKVILLE | 2026-03-27 | ✅ Current | Backfill complete (split from OS_TORONTO) |
| OS_FLATIRON | 2026-03-27 | ✅ Current | Backfill complete Nov 2024 → Mar 2026 |
| BD_LIBERTY | 2026-03-27 | ✅ Current | Backfill complete Dec 2025 → Mar 2026 |
| OS_WILLIAMSBURG | 2026-03-27 | ✅ Current | Backfill complete Nov 2024 → Mar 2026 |
### Daily Scraping
**Last successful scrape:** 2026-03-27 19:51:00 PST
**Records collected:** 121 new records (100% valid, 0 errors)
**Output file:** `nvrbot_scrape_20260327.json`
**Errors:** None
**Breakdown by studio:**
- BD Carlsbad: 22 classes
- BD Liberty Station: 20 classes
- OtherShip Adelaide: 17 classes
- OtherShip Flatiron: 21 classes
- OtherShip Williamsburg: 23 classes
- OtherShip Yorkville: 18 classes
### Master Data
**Location:** `/home/john/projects/superscaper/processed/master.json`
**Total records:** ~99,729
**Date range:** 2021-07-05 → 2026-02-14
**File size:** ~78.9 MB
**Last merged:** 2026-02-14 23:02 PST
**Auto-merge:** ✅ Enabled (runs after each scrape)
**Removed:** MZ_MYRTLE (no classes found on platform), ST_YONGE, ST_FRONT (no availability data exposed)
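For reference, a minimal sketch of what the nightly auto-merge conceptually does (the real logic lives in `scripts/merge_to_master.py`; the composite key follows the dedup scheme documented below):

```python
import json

def composite_id(rec: dict) -> str:
    """Composite key per the documented scheme (company_location_date_time_class)."""
    return f"{rec['company']}_{rec['location']}_{rec['classDate']}_{rec['time']}_{rec['class']}"

def merge_to_master(master_path: str, scrape_path: str) -> None:
    """Merge a daily scrape into master.json; newest record wins per key."""
    with open(master_path) as f:
        merged = {composite_id(r): r for r in json.load(f)}
    with open(scrape_path) as f:
        for rec in json.load(f):
            merged[composite_id(rec)] = rec  # daily scrape overwrites older snapshot
    with open(master_path, "w") as f:
        json.dump(list(merged.values()), f)
```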
**Spot Check Results (Jan 20 - Feb 2, 2026):**
- 1,730 records scraped across 6 studios
- 100% validation pass rate (0 errors)
- Adelaide/Yorkville separation confirmed
- Output format verified
**Critical Bug Fixed (2026-02-03 18:20):**
- ⚠️ **Data loss bug:** Scraper was overwriting the same output file when run multiple times per day
## Data Locations (Complete Inventory)
### Master Data (Single Source of Truth)
| File | Records | Date Range | Size | Last Updated |
|------|---------|------------|------|--------------|
| **`processed/master.json`** | **99,488** | **2021-07-05 → 2026-02-13** | **78.8 MB** | **2026-02-14 09:52 PST** |
✅ **This is the canonical local dataset** – auto-updated nightly after each scrape.
### Daily Scrape Files (superscaper directory)
| File | Records | Date Range | Notes |
|------|---------|------------|-------|
| `nvrbot_scrape_20260213.json` | 392 | Feb 11-13, 2026 | Latest daily scrape |
| `nvrbot_scrape_20241106.json` | 13,838 | May 2 → Nov 4, 2024 | Historical backfill |
| `nvrbot_scrape_20260204.json` | 35,679 | Jan 1 → Feb 3, 2026 | 2025 full year backfill |
**Path:** `/home/john/projects/superscaper/`
### Legacy Files (Historical)
| File | Records | Date Range | Notes |
|------|---------|------------|-------|
| `BDfull.json` | 4,099 | May 2 → Nov 5, 2024 | BD legacy export |
| `OSfull.json` | 5,679 | May 2 → Nov 5, 2024 | OS legacy export |
**Note:** Legacy files superseded by `processed/master.json`
### Supabase (Production Database)
**Database:** `classes` table at `ootepdsivzlhqhaielor.supabase.co`
**Auto-sync:** ✅ Enabled (upsert after each nightly scrape)
**Deduplication:** Composite key (studio_id + class_date + time + class)
**Current records:** ~99,488 (auto-updated nightly)
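A hypothetical version of that nightly sync using the `supabase-py` client (not the actual `src/supabase_sync.py`; assumes records already carry the composite-key columns and the table has a matching unique constraint):

```python
import json
import os

from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

with open("nvrbot_scrape_20260327.json") as f:
    records = json.load(f)

# Upsert so re-scraped classes update in place instead of duplicating
supabase.table("classes").upsert(
    records, on_conflict="studio_id,class_date,time,class"
).execute()
```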
### Google Drive - NVRMND Central (Historical/Deprecated)
**Shared Drive:** `NVRMND Central` (ID: `0AEKvqTBXqb6XUk9PVA`)
**Location:** `Data Dump > JG > 04 Data & Exports > Active`
| File | Type | ID | Records | Notes |
|------|------|-----|---------|-------|
| **nvrbot** | Google Sheet | `1oE9MkZzYA4FqODqi50cBQ5GlqBWYRHDoGXVm9tJIEGI` | 47,832 | ⚠️ **Not auto-updated** (historical) |
| nvrbot_scrape_20241106.csv | CSV | `1hV5HUKt0M1DVyKH2qz2u3igJKLbFpKDi` | 13,838 | Raw backup |
| BDfull | Google Sheet | `1OzW03kQu6SHsVrXdJQmj4qQZzxO_5X-QUj6xdf3rU_s` | – | BD cleaned |
| OSfull.csv | CSV | `1BrFc6-1dBUe36OD-IWLffBpf7wQScaaE` | 5,679 | OS raw |
| MZfull.csv | CSV | `12y6Q3iIeCbyus-YN36amqW_Yvbx0oGID` | ~2,500 | MZ raw (Jun 2024) |
**Note:** Google Sheets are no longer auto-updated. Use Supabase or `processed/master.json` for current data.
## Existing Cleanup Schema (from nvrbot Google Sheet)
### Sheet Structure
| Tab | Rows | Purpose |
|-----|------|---------|
| **fullData** | 47,832 | Main aggregated cleaned data |
| **OS done** | 16,117 | OtherShip cleaned (filter: excludes Yorkville) |
| **BD done** | 25,825 | Breathe Degrees cleaned (filter: excludes Free Flow/Social) |
| **MZ done** | 3,076 | MindZero cleaned |
| **nvrbot inputs** | 1,007 | Raw input staging |
| **locationMap** | 7 | Location normalization lookup |
| **timeMap** | ~100 | Time → military time lookup |
| **UPDATE** | 11,044 | Recent update staging |
| **new data** | 1,903 | New data staging |
### Cleaned Schema (fullData)
| Column | Type | Example | Notes |
|--------|------|---------|-------|
| company | string | "Othership" | Normalized company name |
| classDate | string | "01-31-2022" | MM-DD-YYYY format |
| day | string | "Monday" | Day of week |
| time | string | "3:00 PM" | 12-hour format |
| duration | string | "75" | Minutes (as string) |
| class | string | "Free Flow" | Class name |
| location | string | "Adelaide" | Raw location name |
| type | string | "Free Flow" | Class type category |
| fill | string | "14/17 Open" | Combined availability string |
| classStatus | string | "" | Status field |
| open | string | "14" | Open spots (as string) |
| total | string | "17" | Total spots (as string) |
| filled | string | "3" | Filled spots (as string) |
| instructor | string | "Othership Guide" | Primary instructor |
| instructor2 | string | "" | Secondary instructor |
| instructor3 | string | "" | Third instructor |
| room | string | "Sauna" | Room name |
| url | string | "https://..." | Source URL |
| yoga | string | "" | Yoga class flag |
| saunadotcnt | string | "1" | Counter field |
### Lookup Tables
**locationMap:**
| locationID | location |
|------------|----------|
| Adelaide | OtherShip - Adelaide |
| Yorkville | OtherShip - Yorkville |
| Carlsbad Studio | Breathe Degrees - Carlsbad |
| Myrtle Beach | Mind Zero - Myrtle Beach |
| Mt. Pleasant | Mind Zero - Mt. Pleasant |
| Flatiron | OtherShip - Flatiron |
**timeMap:**
| time | milTime | hour |
|------|---------|------|
| 1:00 PM | 1300 | 13 |
| 10:00 AM | 1000 | 10 |
| ... | ... | ... |
---
## Second-Level Cleanup Spec (DRAFT)
### Goals
1. **Deduplicate** raw data (re-use scraper's deduplication logic)
2. **Normalize** raw scraper output to consistent schema
3. **Match** existing Google Sheet format for continuity
4. **Automate** what was previously done manually
5. **Add** new computed fields for analysis
**Note on Deduplication:** The cleanup pipeline should use the same `generate_record_id()` and `deduplicate_records()` functions from `src/main.py`. This ensures consistency between scraper and cleanup deduplication logic.
### Input Format (Raw Scraper Output)
```json
{
  "time": "8:00 AM",
  "duration": "75",
  "location": "Williamsburg",
  "class": "Guided Down: Senses",
  "instructor": "Becca Jacobs",
  "room": "Sauna",
  "classDate": "01-31-2026",
  "day": "Saturday",
  "open_spots": 58,
  "total_spots": 64,
  "filled_spots": 6,
  "status": "open",
  "type": "Class",
  "company": "OtherShip",
  "url": "https://..."
}
```
### Output Format (Cleaned)
**Option A: Match Existing Google Sheet Schema**
```json
{
  "company": "Othership",
  "classDate": "01-31-2026",
  "day": "Saturday",
  "time": "8:00 AM",
  "duration": "75",
  "class": "Guided Down: Senses",
  "location": "Williamsburg",
  "type": "Guided",
  "fill": "58/64 Open",
  "classStatus": "open",
  "open": "58",
  "total": "64",
  "filled": "6",
  "instructor": "Becca Jacobs",
  "instructor2": "",
  "instructor3": "",
  "room": "Sauna",
  "url": "https://...",
  "yoga": "",
  "saunadotcnt": ""
}
```
**Option B: Enhanced Schema (New)**
```json
{
  "id": "OS_WILLIAMSBURG_2026-01-31_0800",
  "studio_id": "OS_WILLIAMSBURG",
  "company": "OtherShip",
  "location_normalized": "OtherShip - Williamsburg",
  "date_iso": "2026-01-31",
  "date_display": "01-31-2026",
  "day_of_week": "Saturday",
  "time_12h": "8:00 AM",
  "time_24h": "08:00",
  "hour": 8,
  "duration_min": 75,
  "class_name": "Guided Down: Senses",
  "class_type": "Guided",
  "class_category": "Sauna",
  "instructor": "Becca Jacobs",
  "instructor_normalized": "becca_jacobs",
  "instructor2": null,
  "instructor3": null,
  "room": "Sauna",
  "open_spots": 58,
  "total_spots": 64,
  "filled_spots": 6,
  "fill_rate": 0.094,
  "fill_display": "58/64 Open",
  "status": "open",
  "is_waitlist": false,
  "is_full": false,
  "url": "https://...",
  "scraped_at": "2026-02-01T16:23:00Z"
}
```
### Transformations Required
**Pipeline order** (run in sequence):
| Step | Transformation | Complexity | Notes |
|------|----------------|------------|-------|
| **0. Deduplication** | Remove duplicate records | Low | **FIRST STEP** - Use scraper's `generate_record_id()` logic |
| **1. Company** | Normalize capitalization | Low | "OtherShip" → "Othership" |
| **2. Date** | Parse MM-DD-YYYY → ISO | Low | "01-31-2026" → "2026-01-31" |
| **3. Time** | Parse to 24h, extract hour | Low | "8:00 AM" → "08:00", hour: 8 |
| **4. Duration** | String → int | Low | "75" → 75 |
| **5. Location** | Lookup → normalized name | Medium | "Williamsburg" → "OtherShip - Williamsburg" |
| **6. Type** | Extract from class name | Medium | "Guided Down" → "Guided" |
| **7. Instructor** | Split multiple, normalize | Medium | "Arkaya \| Elly" → instructor, instructor2 |
| **8. Fill rate** | Calculate filled/total | Low | 6/64 → 0.094 |
| **9. Class category** | Infer from room/class name | Medium | room: "Sauna" → category: "Sauna" |
| **10. Yoga flag** | Pattern match class name | Low | "Yoga Flow" → yoga: "Y" |
**Critical:** Deduplication MUST run first to prevent duplicate data in cleaned output. Subsequent steps operate on unique records only.
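To make the low-complexity steps concrete, here is an illustrative sketch (not the shipped pipeline) covering steps 1–4 and 8, with field names taken from the raw and enhanced schemas above:

```python
from datetime import datetime

def transform(rec: dict) -> dict:
    """Apply the simple transformations from the table above (sketch only)."""
    out = dict(rec)
    out["company"] = rec["company"].capitalize()          # "OtherShip" -> "Othership"
    dt = datetime.strptime(rec["classDate"], "%m-%d-%Y")  # "01-31-2026"
    out["date_iso"] = dt.strftime("%Y-%m-%d")             # "2026-01-31"
    t = datetime.strptime(rec["time"], "%I:%M %p")        # "8:00 AM"
    out["time_24h"] = t.strftime("%H:%M")                 # "08:00"
    out["hour"] = t.hour                                  # 8
    out["duration_min"] = int(rec["duration"])            # "75" -> 75
    total = rec.get("total_spots") or 0                   # guard against 0/0/0 records
    out["fill_rate"] = round(rec["filled_spots"] / total, 3) if total else None
    return out
```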
### Class Type Taxonomy (Needs Validation)
Based on existing data patterns:
| Type | Pattern | Examples |
|------|---------|----------|
| **Free Flow** | "Free Flow", "Open" | Open sessions, self-guided |
| **Guided** | "Guided Down", "Guided Up", "Guided All Around" | Instructor-led sauna sessions |
| **Class** | Specific class names | Yoga, HIIT, Breathwork |
| **Social** | "Social" | Community events |
| **Private** | "Private" | Private bookings |
| **Online** | "Online", "Virtual" | Remote classes |
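As a starting point, a hypothetical classifier for this taxonomy (the pattern list would need validating against real class names before use):

```python
# Ordered patterns: first match wins; anything unmatched falls through to "Class"
TYPE_PATTERNS = [
    ("Free Flow", ("free flow", "open")),
    ("Guided", ("guided down", "guided up", "guided all around")),
    ("Social", ("social",)),
    ("Private", ("private",)),
    ("Online", ("online", "virtual")),
]

def classify_type(class_name: str) -> str:
    name = class_name.lower()
    for type_label, patterns in TYPE_PATTERNS:
        if any(p in name for p in patterns):
            return type_label
    return "Class"  # specific class names (Yoga, HIIT, Breathwork)
```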
### Instructor Normalization
**Challenges:**
- Multiple instructors in one field: "Arkaya | Elly Ball"
- Generic names: "Othership Guide", "Free Flow Guide"
- Inconsistent formatting: "BECCA JACOBS" vs "Becca Jacobs"
**Approach:**
1. Split on ` | ` delimiter
2. Title case normalization
3. Generate slug: "becca_jacobs"
4. Map generic guides to company defaults
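A sketch of that approach, assuming the ` | ` delimiter and the generic guide names listed above:

```python
import re

GENERIC_GUIDES = {"othership guide", "free flow guide"}

def normalize_instructors(raw: str, company: str) -> dict:
    """Split "Arkaya | Elly Ball" into up to three title-cased names plus a slug."""
    names = [n.strip().title() for n in raw.split("|") if n.strip()]
    # Map generic guides to the company default rather than a person
    names = [company if n.lower() in GENERIC_GUIDES else n for n in names]
    names += [""] * (3 - len(names))  # pad out instructor/instructor2/instructor3
    slug = re.sub(r"[^a-z0-9]+", "_", names[0].lower()).strip("_")
    return {
        "instructor": names[0],
        "instructor2": names[1],
        "instructor3": names[2],
        "instructor_normalized": slug,  # e.g. "becca_jacobs"
    }
```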
---
## Possible Cleanup Approaches
### Approach 1: Python Script (Recommended)
**Pros:**
- Full control over transformations
- Can run locally or in CI/CD
- Easy to version control
- Can output to multiple formats
**Cons:**
- Need to maintain code
- Separate from Google Sheets workflow
**Implementation:**
```
/projects/nvrbot-scraper/
└── cleanup/
    ├── transform.py    # Main transformation logic
    ├── lookups.py      # Location/time mappings
    ├── validators.py   # Data quality checks
    └── output.py       # JSON/CSV/Sheets export
```
### Approach 2: Google Sheets Formulas
**Pros:**
- Matches existing workflow
- Non-technical users can modify
- Real-time updates
**Cons:**
- Complex formulas hard to maintain
- Performance issues with large datasets
- Version control difficult
**Implementation:**
- Import raw CSV to "inputs" tab
- Use VLOOKUP for location mapping
- Use formulas to compute derived fields
- Copy-paste values to "done" tabs
### Approach 3: Hybrid (Script + Sheets)
**Pros:**
- Best of both worlds
- Script handles heavy lifting
- Sheets for final review/adjustments
**Cons:**
- Two systems to maintain
- Data sync complexity
**Implementation:**
1. Python script transforms raw ā cleaned JSON
2. Script uploads to staging sheet
3. Manual review in Sheets
4. Append to master sheet
### Approach 4: Database Pipeline
**Pros:**
- SQL for analysis
- Scales well
- Can power dashboards
**Cons:**
- More infrastructure
- Overkill for current volume
**Implementation:**
- SQLite or Postgres
- Raw ā staging ā clean tables
- Views for analysis
---
## Recommended Approach
**Hybrid (Approach 3)** with Python script + Google Sheets integration:
1. **Script** does:
- Load raw JSON/CSV
- **Deduplicate records first** (re-use scraper's logic from `src/main.py`)
- Apply all transformations
- Validate data quality
- Output cleaned JSON + CSV
- Optionally push to Google Sheets staging tab
2. **Google Sheets** for:
- Visual review
- Manual corrections
- Final append to master data
- Pivot tables and analysis
3. **Automation** via:
- Cron job for daily scrape
- Cron job for daily cleanup
- Alert on data quality issues
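A hypothetical driver for the script half, assuming a `transform_records()` entry point (placeholder name, not the real module API):

```python
import json
import sys

from cleanup.transform import transform_records  # assumed entry point

def run_cleanup(raw_path: str, out_path: str) -> None:
    with open(raw_path) as f:
        records = json.load(f)
    # Dedupe -> transform -> validate happens inside transform_records (assumed)
    cleaned, issues = transform_records(records)
    if issues:
        print(f"ALERT: {len(issues)} data quality issues", file=sys.stderr)
    with open(out_path, "w") as f:
        json.dump(cleaned, f, indent=2)
    # Optional: push `cleaned` to a Google Sheets staging tab for review

if __name__ == "__main__":
    run_cleanup(sys.argv[1], sys.argv[2])
```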
---
## Cleanup Pipeline Implementation ✅
**Status:** ✅ **IMPLEMENTED** (2026-02-07)
**Location:** `/home/john/projects/superscaper/cleanup/transform.py`
### Implementation Details
**Pipeline Steps:**
0. **Deduplication** (FIRST STEP - CRITICAL)
- Uses composite key: `company_location_date_time_class`
- Keeps LAST occurrence when duplicates found (most recent scrape)
- Same logic as scraper (`src/main.py` functions)
1. Company name normalization
2. Room field cleaning
3. Instructor classification (named/generic)
4. Waitlist detection
5. Integer type enforcement
6. Fill rate calculation
7. Normalized location field
8. Time parsing (24h format + hour extraction)
**Usage:**
```bash
# JSON output only
python cleanup/transform.py input.json output.json
# JSON + CSV output
python cleanup/transform.py input.json output.json --csv output.csv
# Example with actual files
cd /home/john/projects/superscaper
./venv/bin/python3 cleanup/transform.py nvrbot_scrape_20260206.json cleaned.json --csv cleaned.csv
```
**Testing Results:**
| Test | Input Records | Duplicates Found | Output Records | Errors | Status |
|------|---------------|------------------|----------------|---------|---------|
| Small dataset (Feb 6) | 269 | 0 | 269 | 0 | ✅ Pass |
| Synthetic duplicates | 279 | 10 | 269 | 0 | ✅ Pass |
| Large dataset (Feb 4) | 35,679 | 5 | 35,674 | 0 | ✅ Pass |
**Output Schema:**
The cleaned output includes all original fields plus:
- `instructor_type`: "named" or "generic"
- `instructor_normalized`: slug format (e.g., "becca_jacobs")
- `is_waitlist`: boolean flag
- `fill_rate`: decimal (filled/total)
- `location_normalized`: "Company - Location"
- `time_24h`: 24-hour format (e.g., "08:00")
- `hour`: integer 0-23
**Documentation:**
- `/home/john/projects/superscaper/cleanup/README.md` - Updated with deduplication details
- Pipeline follows spec in STATUS.md "Second-Level Cleanup Spec"
**Next Steps:**
1. ~~Implement cleanup pipeline~~ ✅ DONE
2. Test on production data → ✅ DONE
3. Integrate with daily scraper workflow (optional automation)
4. Add Google Sheets export capability (future enhancement)
5. Schedule daily cleanup cron job (future automation)
---
## Data Quality Issues to Handle
| Issue | Example | Solution |
|-------|---------|----------|
| **Duplicate records** | Same class scraped twice | **Dedupe on composite key** (company_location_date_time_class) - FIRST STEP in pipeline |
| Missing availability | S&T shows 0/0/0 | Flag as "no_data", exclude from fill rate analysis |
| Multiple instructors | "Arkaya \| Elly Ball" | Split to instructor1/2/3 |
| Date format variations | "01-31-2026" vs "2026-01-31" | Normalize to ISO internally |
| Location name changes | "Carlsbad Studio" vs "Carlsbad" | Lookup table normalization |
| Class name variations | "Free Flow" vs "FreeFlow" | Fuzzy matching + manual mapping |
| Timezone issues | EST vs PST studios | Store in local time with TZ indicator |
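For example, the missing-availability row above could be handled by a tiny validator (the `no_data` value follows the table; the function itself is hypothetical):

```python
def flag_no_data(rec: dict) -> dict:
    """Mark S&T-style 0/0/0 records so fill-rate analysis can exclude them."""
    counts = (rec.get("open_spots"), rec.get("total_spots"), rec.get("filled_spots"))
    if all(c in (0, None) for c in counts):
        rec["status"] = "no_data"
    return rec
```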
---
## Current Studios (8 total)
| Studio | Company | Location | Platform | Status |
|--------|---------|----------|----------|--------|
| BD_CARLSBAD | Breathe Degrees | Carlsbad, CA | Mariana Tek | ✅ Working |
| BD_LIBERTY | Breathe Degrees | Liberty Station, SD | Mariana Tek | ✅ Working |
| OS_TORONTO | OtherShip | Toronto (Adelaide + Yorkville) | Mariana Tek | ✅ Working |
| OS_FLATIRON | OtherShip | NYC (Flatiron) | Mariana Tek | ✅ Working |
| OS_WILLIAMSBURG | OtherShip | Brooklyn (Williamsburg) | Mariana Tek | ✅ Working |
| MZ_MYRTLE | MindZero | Myrtle Beach, SC | Mariana Tek | ✅ Working |
| ST_YONGE | Sweat and Tonic | Toronto (Yonge) | Mariana Tek | ⚠️ No avail data |
| ST_FRONT | Sweat and Tonic | Toronto (Front) | Mariana Tek | ⚠️ No avail data |
---
## Studios Not Yet Added
### Momence Platform (Requires Separate Scraper)
| Studio | Location | Platform | Notes |
|--------|----------|----------|-------|
| Soul Plunge | La Jolla, CA | Momence | host_id: 37373 |
| Conscious Body Recovery | San Diego | Momence | boardId: 85694 |
| Conscious Body Recovery | Temecula | Momence | boardId: 76949 |
**Momence Limitation:** Only exposes binary availability (open/full), not spot counts.
---
## Edge Case: Late-Day Additions
**Issue Identified:** 2026-02-04
**Status:** ⚠️ Requires Implementation
### The Problem
Current scraper starts from `SCRAPED_THROUGH + 1`, which can miss classes added after the scrape but before midnight.
**Example scenario:**
```
Feb 3, 11:00 PM: Scraper runs, captures Feb 3 classes
                 Sets SCRAPED_THROUGH = 2026-02-03
Feb 3, 11:30 PM: Studio adds new class for Feb 3 schedule
Feb 4, 11:00 PM: Scraper starts from Feb 4
                 ❌ Missed: the 11:30pm addition to Feb 3
```
### Analysis
**Scenarios:**
| Scenario | Risk Level | Impact |
|----------|-----------|--------|
| **Late additions** | ⚠️ HIGH | Studios add last-minute slots 11pm–midnight → LOST with current logic |
| **Spot count updates** | ⚠️ LOW | Minor – we have point-in-time snapshots, not tracking real-time changes |
| **Cancellations** | ✅ ACCEPTABLE | "Ghost" records actually valuable (shows schedule volatility) |
| **Reschedules** | ✅ ACCEPTABLE | Multiple time slots visible (tracks changes) |
### Proposed Solution: Re-scrape Last Date + Deduplicate
**Change:**
```python
# OLD: start_date = studio_config.scraped_through + timedelta(days=1)
# NEW: start_date = studio_config.scraped_through # Re-scrape last date
```
**Add deduplication:**
```python
def generate_record_id(record):
    """Create unique composite key."""
    return f"{record['company']}_{record['location']}_{record['classDate']}_{record['time']}_{record['class']}"

def deduplicate_records(records):
    """Keep most recent record per unique ID."""
    seen = {}
    for record in records:
        record_id = generate_record_id(record)
        seen[record_id] = record  # later record overwrites (handles spot updates)
    return list(seen.values())
```
**Unique key example:**
```
OtherShip_Williamsburg_2025-09-15_07:00 AM_Guided Down: Sound Immersion
```
### Impact Analysis
| Metric | Current | Proposed | Change |
|--------|---------|----------|--------|
| Data completeness | 95-98% | 99-100% | +2-5% |
| Scrape volume/day | ~600 classes | ~1,200 classes | +100% |
| Execution time | ~10 min | ~15 min | +50% |
| Disk usage growth | Minimal | +2-5% (dedup mitigates) | Minor |
| Late additions | ❌ Lost | ✅ Captured | ✅ Fixed |
| Spot count snapshots | One/day | Two/day | Bonus |
### Recommendation
**✅ IMPLEMENT** – Data quality justifies the overhead
**Rationale:**
1. **Real risk**: Studios DO add classes late in the day (observed behavior)
2. **Acceptable cost**: 50% more execution time, minimal storage impact
3. **Data quality > efficiency**: Complete data more important than speed
4. **Bonus benefit**: Captures spot count updates (nice for fill rate trends)
### Alternative Considered: 2-Day Overlap
**Rejected:**
- Re-scrape last 2 days (SCRAPED_THROUGH - 1)
- 200% overhead vs 100%
- Overkill – a 1-day overlap is sufficient for this use case
## Single Source of Truth (SSOT)
**File:** `data-state.json`
**Purpose:** Canonical state for all scraper data – replaces fragmented tracking across multiple files
**Auto-updated by:**
- Scraper (after each run): `./scripts/update-data-state.sh scraper`
- Cleanup pipeline (future): `./scripts/update-data-state.sh cleanup`
- Manual refresh: `./scripts/update-data-state.sh manual`
**Contains:**
- Per-studio: scrapedThrough dates, raw/processed record counts, date ranges
- File inventory: all JSON files with sizes and record counts
- Totals: aggregate stats
- Pipeline status: scraper/cleanup/merge process state
**Human-readable docs:** STATUS.md tables should be generated FROM data-state.json (not maintained separately)
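A sketch of that generation step, with assumed key names (`studios`, `scrapedThrough`, `notes`) since the exact `data-state.json` layout isn't shown here:

```python
import json

with open("data-state.json") as f:
    state = json.load(f)

# Emit the per-studio status table in STATUS.md format
print("| Studio | Last Scraped | Status | Notes |")
print("|--------|--------------|--------|-------|")
for studio_id, s in state["studios"].items():
    print(f"| {studio_id} | {s['scrapedThrough']} | ✅ Current | {s.get('notes', '')} |")
```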
---
## Key Files
### Scraper Code
- `/home/john/projects/superscaper/src/main.py` – Entry point (includes auto-merge)
- `/home/john/projects/superscaper/src/scraper.py` – Selenium scraper
- `/home/john/projects/superscaper/src/parser.py` – Data parser
- `/home/john/projects/superscaper/src/supabase_sync.py` – Supabase push logic
- `/home/john/projects/superscaper/scripts/merge_to_master.py` – Master.json merge script
### Configuration
- `/home/john/projects/superscaper/.env` – Studio configs & SCRAPED_THROUGH dates
- `/home/john/projects/superscaper/scraper.log` – Detailed run logs
### Data Files (Priority Order)
1. **`processed/master.json`** – ✅ Canonical local dataset (auto-updated)
2. **Supabase `classes` table** – Production database (auto-synced)
3. `nvrbot_scrape_YYYYMMDD.json` – Daily scrape outputs
### Project Docs
- `~/.openclaw/workspaces/main/projects/nvrbot-scraper/STATUS.md` – This file
- `~/.openclaw/workspaces/main/skills/nvrbot-scraper/SKILL.md` – Skill automation docs
- `/home/john/projects/superscaper/CLAUDE.md` – Technical scraper docs