# Modernize Scientific Computing Stack

Skill name: `modernize-scientific-stack`
This skill provides guidance for migrating legacy Python 2 scientific computing code to modern Python 3 with contemporary libraries and best practices.
## When to Use This Skill
Apply this skill when:
- Migrating Python 2 scientific scripts to Python 3
- Updating legacy data processing code using outdated patterns
- Modernizing scripts that use deprecated file handling, string encoding, or numerical libraries
- Converting scripts from csv module to pandas for data analysis
- Replacing os.path with pathlib for path manipulation
## Approach

### Phase 1: Complete Code Discovery
Before making any changes, ensure complete understanding of the existing codebase:
1. **Read all source files completely** - If a file read is truncated, request the full content before proceeding. Never assume file contents based on partial reads.

2. **Identify all dependencies** - Check for (a dependency-scanning sketch follows this list):
   - Import statements (standard library and third-party)
   - Configuration files (JSON, YAML, INI)
   - Data files (CSV, Excel, pickle)
   - Environment requirements

3. **Map the data flow** - Understand:
   - Input file formats and encodings
   - Data transformations applied
   - Output format requirements
   - Any intermediate files or caches
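For the dependency check, a minimal sketch like the one below can inventory imports across a legacy tree. It deliberately uses a regex rather than `ast.parse`, since the latter would reject Python 2-only syntax such as `print "text"`; the `legacy_project` directory name is an illustrative assumption.

```python
import re
from pathlib import Path

# Matches the first module name on "import x" and "from x import y" lines.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_][\w.]*)", re.MULTILINE)

def list_imports(source_dir: str) -> set:
    """Collect top-level module names imported anywhere under source_dir.

    Regex-based so it also works on Python 2 sources; it only grabs the
    first module per line, which is good enough for a first inventory.
    """
    modules = set()
    for path in Path(source_dir).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        modules.update(m.split(".")[0] for m in IMPORT_RE.findall(text))
    return modules

print(sorted(list_imports("legacy_project")))
```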
### Phase 2: Identify Migration Requirements
Common Python 2 to Python 3 migration patterns in scientific code:
| Legacy Pattern | Modern Replacement |
|---|---|
| `print "text"` | `print("text")` |
| `unicode()` / `str()` | `str()` with explicit encoding |
| `open(file)` | `open(file, encoding='utf-8')` |
| `os.path.join()` | `pathlib.Path()` |
| `csv` module | `pandas.read_csv()` |
| `for key in dict.keys()` | `for key in dict` |
| `dict.has_key(x)` | `x in dict` |
| Manual file iteration | Context managers (`with` statements) |
| `xrange()` | `range()` |
| Integer division `/` | Explicit `//` or float division |
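As a quick illustration, here is a before/after sketch applying several rows of this table; the file and column names are invented for the example, not part of any particular codebase.

```python
from pathlib import Path

import pandas as pd

# Legacy: data_file = os.path.join(base_dir, "readings.csv")
data_file = Path("data") / "readings.csv"      # pathlib replaces os.path.join

# Legacy: reader = csv.reader(open(data_file))
df = pd.read_csv(data_file, encoding="utf-8")  # pandas replaces the csv module

# Legacy: if row.has_key("temp"): ...
if "temp" in df.columns:                       # `in` replaces has_key
    # Legacy: mean = total / count  (silently floor-divided ints in Python 2)
    mean_temp = df["temp"].sum() / len(df)     # true division in Python 3
    print(f"Mean temperature: {mean_temp:.2f}")  # print is a function
```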
### Phase 3: Implementation Strategy

1. **Create the modernized script** with these priorities (a skeleton sketch follows this list):
   - UTF-8 encoding for all file operations
   - `pathlib.Path` for all file path manipulations
   - pandas for CSV/data processing
   - Type hints where beneficial
   - Context managers for resource handling

2. **Handle configuration files** - Check for file existence before reading:

   ```python
   import json
   from pathlib import Path

   config_path = Path("config.json")
   if config_path.exists():
       config = json.loads(config_path.read_text(encoding='utf-8'))
   ```

3. **Create requirements.txt** - Include all dependencies with version constraints.
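The skeleton below is one way to embody those priorities in a modernized script. It is a sketch only: every file name, column name, and function in it is an illustrative assumption.

```python
"""Modernized data-processing skeleton (illustrative names throughout)."""
from pathlib import Path

import pandas as pd

DATA_DIR = Path("data")            # pathlib for all path handling
OUTPUT_PATH = Path("summary.csv")

def load_readings(path: Path) -> pd.DataFrame:
    """Read a CSV with explicit encoding; pandas replaces the csv module."""
    return pd.read_csv(path, encoding="utf-8")

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per station; 'station' and 'value' are assumed column names."""
    return df.groupby("station")["value"].agg(["mean", "min", "max"]).reset_index()

def main() -> None:
    df = load_readings(DATA_DIR / "readings.csv")
    summary = summarize(df)
    summary.to_csv(OUTPUT_PATH, index=False, encoding="utf-8")

if __name__ == "__main__":
    main()
```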
### Phase 4: Verification Protocol

**Critical: Always verify file operations.**

After writing any file, read it back to confirm (a write-then-verify sketch follows this list):
- The complete content was written (not truncated)
- The syntax is valid
- All imports are present
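One way to encode this protocol is a small write-then-verify helper. This is a minimal sketch assuming the written file is Python source; the `py_compile` check only applies to `.py` files.

```python
import py_compile
from pathlib import Path

def write_and_verify(path: Path, content: str) -> None:
    """Write a file, then read it back and syntax-check it before trusting it."""
    path.write_text(content, encoding="utf-8")

    # Confirm the complete content landed on disk (no truncation).
    round_trip = path.read_text(encoding="utf-8")
    if round_trip != content:
        raise OSError(f"{path} was truncated or altered on write")

    # Confirm the file is valid Python; raises py_compile.PyCompileError otherwise.
    py_compile.compile(str(path), doraise=True)
```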
**Testing sequence:**

1. **Syntax validation** - Run a Python syntax check:

   ```bash
   python -m py_compile script.py
   ```

2. **Import verification** - Test that all imports resolve:

   ```bash
   python -c "from script import *"
   ```

3. **Functional test** - Run the script and compare its output to expected results (see the sketch after this list).

4. **Output validation** - Verify the output format matches requirements exactly.
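As a sketch of steps 3 and 4 combined, the snippet below runs the modernized script and asserts its captured output against a saved reference; `script.py` and `expected_output.txt` are placeholder names.

```python
import subprocess
import sys
from pathlib import Path

# Run the modernized script with the current interpreter and capture stdout.
result = subprocess.run(
    [sys.executable, "script.py"],
    capture_output=True,
    text=True,
    check=True,  # raise if the script exits with a non-zero status
)

expected = Path("expected_output.txt").read_text(encoding="utf-8")
assert result.stdout == expected, "Output differs from the expected results"
```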
## Common Pitfalls to Avoid

- **Truncated file content** - Never proceed with partial file reads. If a response shows `... [truncated]` or incomplete content, request the full file before continuing.

- **Unverified writes** - After using a write operation, always read the file back to confirm the complete content was written correctly.

- **Encoding issues** - Always specify `encoding='utf-8'` explicitly in file operations. Legacy scripts often have implicit ASCII assumptions.

- **Path string concatenation** - Replace all `os.path.join()` calls and string concatenation for paths with `pathlib.Path` operations.

- **Missing edge case handling** (see the defensive-loading sketch after this list):
  - Empty data files or datasets
  - Missing required files
  - Invalid data types in CSV columns
  - Stations/entities with no matching data

- **Environment setup repetition** - When setting up environments (venv, PATH), verify the setup persists rather than repeating it in each command.
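The following defensive-loading sketch covers those edge cases for a CSV input; the `value` column name is an illustrative assumption.

```python
from pathlib import Path

import pandas as pd

def load_station_data(path: Path) -> pd.DataFrame:
    """Defensively load a CSV, handling the edge cases listed above."""
    if not path.exists():                 # missing required file
        raise FileNotFoundError(f"Required input file not found: {path}")

    try:
        df = pd.read_csv(path, encoding="utf-8")
    except pd.errors.EmptyDataError:      # completely empty file
        return pd.DataFrame()

    if df.empty:                          # header only, no data rows
        return df

    # Coerce invalid entries to NaN instead of crashing, then drop them.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df.dropna(subset=["value"])
```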
## Verification Checklist
Before marking the task complete, confirm:
- All source files were read completely (no truncation)
- Written files were verified by reading back
- All Python 2 patterns have been converted
- File encodings are explicitly specified
- pathlib is used for all path operations
- pandas is used for data processing (where appropriate)
- requirements.txt includes all dependencies
- Script runs without errors
- Output matches expected format exactly
- Edge cases are handled (empty data, missing files)
## Output Validation
When the task specifies an expected output format, verify the output matches exactly:
- Run the modernized script
- Capture the output
- Compare against expected format character-by-character if needed (see the diff sketch below)
- Pay attention to:
- Decimal precision in numerical output
- Whitespace and formatting
- Order of output items
- Units and labels
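When a character-by-character comparison is needed, `difflib` from the standard library makes precision, whitespace, and ordering differences visible; the file names here are placeholders.

```python
import difflib
from pathlib import Path

actual = Path("actual_output.txt").read_text(encoding="utf-8")
expected = Path("expected_output.txt").read_text(encoding="utf-8")

# unified_diff shows exactly which lines differ and how.
diff = difflib.unified_diff(
    expected.splitlines(keepends=True),
    actual.splitlines(keepends=True),
    fromfile="expected",
    tofile="actual",
)
print("".join(diff) or "Outputs match exactly")
```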