Operations SIG 18 Nov 2024

From Zenon Wiki
Revision as of 18:04, 19 November 2024 by 0x3639 (correct URL)

Agenda

What: Meeting to Discuss Improving Node Operations as part of the HC1: OP SIG

When: 19 Nov 2024 @ 8 CET EST

Where: https://matrix.to/#/#sig-op:hc1.chat

Chair: 0x3639

Agenda:

  1. Discuss follow-up items from the previous meeting
  2. Document action items
  3. Schedule the next meeting

If you want to attend, please respond (or DM me) with your full Matrix username and I will invite you to the group. No FUD, anger, or BS allowed.

Pre-meeting Notes

0x3639

  • Created a troubleshooting script that runs a series of checks to help troubleshoot go-zenon. It runs basic Linux commands to check the service status, disk space, and UFW, then inspects the logs and queries some node endpoints.
  • Created a bootstrap / restore script that stops go-zenon, backs up and compresses the necessary files, and then restarts go-zenon.
  • I've been testing locally and still need to submit a PR.

George

  • Traveling this week, so I probably can't attend the meeting.
  • I have the znnd_exporter (Prometheus metrics) code ready. Now working on the dashboard and on getting it auto-installed.
  • I want to make sure we start planning the ops support work for the HyperQube Network launch.
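The znnd_exporter code is not reproduced here. The following standard-library-only Go sketch illustrates what a Prometheus exporter serves: a `/metrics` page in the Prometheus text exposition format. For each gauge it emits a `# HELP` line, a `# TYPE` line, and a `name value` sample. The metric name `znnd_momentum_height`, the `renderMetrics` / `metricsHandler` helpers, and the `collect` callback are all hypothetical. A real exporter would more likely use the official `client_golang` library and fetch its values from the node.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// renderMetrics formats gauge values in the Prometheus text exposition format.
// Metric names are sorted so the output is deterministic between scrapes.
func renderMetrics(gauges map[string]float64, help map[string]string) string {
	names := make([]string, 0, len(gauges))
	for name := range gauges {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# HELP %s %s\n", name, help[name])
		fmt.Fprintf(&b, "# TYPE %s gauge\n", name)
		fmt.Fprintf(&b, "%s %g\n", name, gauges[name])
	}
	return b.String()
}

// metricsHandler serves the current values on each scrape. collect is a
// placeholder for whatever actually queries the node.
func metricsHandler(collect func() map[string]float64, help map[string]string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprint(w, renderMetrics(collect(), help))
	}
}

func main() {
	// Hypothetical metric and sample value, for illustration only.
	help := map[string]string{
		"znnd_momentum_height": "Latest momentum height reported by the node (hypothetical metric).",
	}
	sample := map[string]float64{"znnd_momentum_height": 1234}

	// Print one scrape's worth of output. To serve it instead, register
	// metricsHandler on "/metrics" and call http.ListenAndServe.
	fmt.Print(renderMetrics(sample, help))
}
```

Prometheus scrapes the endpoint on its own schedule, so the handler recomputes values on every request instead of caching them.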

Coinselor

Vilkris

  • Created a branch in which the database references are explicitly released. Commit message for context:
    • This commit introduces explicit releasing of database handles. The LevelDB package relies on the Go GC to cleanup unused snapshot references, but many other database packages require snapshots to be released explicitly. These changes serve as a starting point for assessing the usage of alternative databases.
  • Manually releasing the DB references provides no apparent performance improvement and may even hurt performance. More testing is needed to determine the effect.
  • Overall, manually managing the references is very tedious (and complicated inside the account pool). As the number of changes in the branch shows, it is not a trivial change, and it affects a large portion of the codebase.
  • Based on personal testing and anecdotal evidence from others, the recommended approach for syncing a node from scratch on a VPS with non-dedicated resources is to first sync the node on a local machine and then transfer the node's database to the server.
  • Syncing a node locally on my machine takes only about 13 hours, while on a VPS with shared resources it can take over a week. This suggests that LevelDB is not the main culprit for the slow sync. It also calls into question how much time should be spent right now investigating a replacement for LevelDB.
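The branch's changes are not shown here. The following is a minimal Go sketch of the general "explicit release" pattern the commit message describes. Every snapshot that is acquired is paired with a `defer`-ed `Release()`, so it is freed on every return path, including errors. The `store` / `snapshot` types and `readBalance` are hypothetical stand-ins, not go-zenon code. The stand-in store counts live snapshots so that a leak shows up as a non-zero count. In a real LevelDB package such as goleveldb, snapshots come from `DB.GetSnapshot` and are freed with `Snapshot.Release`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for a database that hands out snapshots.
// It tracks how many snapshots are still live so leaks are easy to spot.
type store struct {
	mu   sync.Mutex
	live int
}

// snapshot must be released explicitly when the caller is done with it.
type snapshot struct {
	s    *store
	once sync.Once
}

func (s *store) GetSnapshot() *snapshot {
	s.mu.Lock()
	s.live++
	s.mu.Unlock()
	return &snapshot{s: s}
}

// Release is idempotent: repeated calls decrement the live count only once.
func (sn *snapshot) Release() {
	sn.once.Do(func() {
		sn.s.mu.Lock()
		sn.s.live--
		sn.s.mu.Unlock()
	})
}

// Live reports how many snapshots have been acquired but not yet released.
func (s *store) Live() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.live
}

// readBalance acquires a snapshot and defers its release immediately, so the
// snapshot is freed whether the function succeeds or fails.
func readBalance(s *store, fail bool) (int, error) {
	snap := s.GetSnapshot()
	defer snap.Release()
	if fail {
		return 0, errors.New("lookup failed")
	}
	// A real implementation would read from snap here; 42 is a placeholder.
	return 42, nil
}

func main() {
	s := &store{}
	if _, err := readBalance(s, false); err != nil {
		fmt.Println("error:", err)
	}
	if _, err := readBalance(s, true); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("live snapshots:", s.Live())
}
```

The pattern shows why the change touches so much code. Every function that takes a snapshot, and every caller that passes one along, has to agree on who owns the `Release()`. Relying on the garbage collector avoids that bookkeeping.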

Meeting Minutes Summary (ChatGPT)