Operations SIG 18 Nov 2024
Agenda
What: Meeting to Discuss Improving Node Operations as part of the HC1: OP SIG
When: 19 Nov 2024 @ 8 CET EST
Where: https://matrix.to/#/#sig-op:hc1.chat
Chair: 0x3639
Agenda:
- Discuss follow-up items from previous meeting
- Document action items
- Establish next meeting
If you want to attend, please respond (or DM) with your full Matrix username and I will invite you to the group. No FUD, anger or BS allowed.
Pre-meeting Notes
- Created a troubleshooting script that runs a series of checks to help troubleshoot go-zenon. It runs basic Linux commands to check the service, disk space, and UFW, then inspects the logs and queries some node endpoints.
- Created a bootstrap / restore script that stops go-zenon, backs up and compresses the necessary files, and then restarts go-zenon.
- I've been testing locally and need to submit a PR.
- Traveling this week so probably can't attend the meeting
- I have the znnd_exporter (Prometheus metrics) code ready. Working on the dashboard and getting it auto-installed.
- I want to make sure we start planning for the HyperQube Network Launch Ops support work
- Created a branch in which the database references are explicitly released. Commit message for context:
This commit introduces explicit releasing of database handles. The LevelDB package relies on the Go GC to cleanup unused snapshot references, but many other database packages require snapshots to be released explicitly. These changes serve as a starting point for assessing the usage of alternative databases.
- Releasing the DB references manually provides no apparent performance improvement and may even have a negative effect. More testing is needed to determine the actual effect.
- Overall, manually managing the references is very tedious (and complicated inside the account pool). As the number of changes in the branch shows, it is not a trivial change and affects a large portion of the codebase.
- Based on personal testing and anecdotal evidence from others, the recommended approach for syncing a node from scratch on a VPS with non-dedicated resources should be to first sync the node on a local machine and then transfer the node's database to the server.
- Syncing a node locally on my machine takes only around 13 hours, while on a VPS with shared resources it can take over a week. This suggests that LevelDB is not the main culprit for the slow sync, which calls into question how much time should be spent on investigating a LevelDB replacement right now.
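The backup-and-compress step of the bootstrap / restore script and the sync-locally-then-transfer approach noted above both amount to packing the node's data into a single archive while the node is stopped, then unpacking it on the target machine. The following is only a rough sketch of that idea in Python, not the actual script: the function names `pack_directory` and `unpack_archive` are hypothetical, the notes do not say which files are "necessary", and stopping and restarting the go-zenon service is deliberately left out.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into a gzip-compressed tar archive.

    Returns the number of files written. Assumes the node is already
    stopped so the database files are not changing while being read.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src so the archive can be
                # unpacked into any target directory.
                tar.add(path, arcname=str(path.relative_to(src)))
                count += 1
    return count


def unpack_archive(archive_path: str, dest_dir: str) -> None:
    """Unpack an archive made by pack_directory into dest_dir.

    Only use this on archives you created yourself; extracting an
    untrusted tar file is unsafe.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
```

In use, one would call `pack_directory` on the locally synced data directory, copy the resulting archive to the VPS (for example with scp or rsync), and call `unpack_archive` there before starting the node. Empty directories are not included in the archive in this sketch.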