HC1: Operations SIG 26 Aug 2024
Revision as of 02:11, 30 August 2024
Agenda
What: Meeting to Discuss Improving Node Operations as part of the HC1: Operations SIG
When: 26 Aug 2024 @ 6PM EST
Where: https://element.zenon.chat/#/room/#sig-operations:hc1.chat
Chair: 0x3639
Agenda:
- Discuss follow-up items from the previous meeting
- Document action items
- Establish next meeting
If you want to attend please respond (or DM) with your full matrix username and I will invite you to the group. No FUD, anger or BS allowed.
Pre-meeting Notes
- Added `grafana.sh` https://github.com/go-zenon/go/blob/main/grafana.sh. This automates the installation of grafana, node_exporter & prometheus. It creates a default prometheus datasource, scrapes the node_exporter endpoint, and installs a node_exporter dashboard. Tested on amd64. Need to add arm64 support.
- Started to investigate a custom dashboard for znnd. I previously created one for Docker that used the JSON API data source for Grafana. However, that plugin is now in maintenance mode and no new features will be added. Grafana recommends using the Infinity data source plugin instead.
- I started to investigate the Infinity data source plugin. It will be used to scrape the API endpoints to report `syncStatus` and other important metrics.
- Next we can consider installing Loki to manage log files. We can discuss at the meeting.
- Opened PR for arm64 support https://github.com/go-zenon/go/pull/9, needs testing
- Added issue https://github.com/go-zenon/go/issues/10
- Made ASCII Art more readable at lower resolutions.
- Added --help flag https://github.com/go-zenon/go/pull/8
- I can test arm64 support, will be spawning an arm VPS for the Supernova testnet.
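The arm64 work noted above mostly comes down to picking the right architecture label before downloading anything. A minimal sketch, assuming the installer keys off `uname -m`; the function name `map_arch` is hypothetical and not taken from `grafana.sh` or the `go-zenon` script:

```shell
#!/usr/bin/env bash
# map_arch MACHINE: translate a `uname -m` machine name into the
# architecture label used in these notes ("amd64" or "arm64").
# Hypothetical helper; the real scripts may structure this differently.
map_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}

# Typical use inside an installer: detect the current machine once.
# ARCH="$(map_arch "$(uname -m)")" || exit 1
```

Keeping the mapping in one function means both the node install and the monitoring install could share it instead of each hard-coding amd64.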
Minutes
'''Mon, Aug 26, 2024, 17:03:11 - deeznnutz:''' Thx everyone for contributing to the `go-zenon` bash script. We are making good progress.
'''Mon, Aug 26, 2024, 17:03:21 - deeznnutz:''' I merged in @coinselor's PR #8 to improve the ASCII art and add a `--help` flag.
'''Mon, Aug 26, 2024, 17:03:34 - deeznnutz:''' those changes were pretty straightforward
'''Mon, Aug 26, 2024, 17:03:42 - deeznnutz:''' George submitted the PR for arm64 support. I have not tested it yet. Once we test it we can pull in that change. It's pretty simple. He submitted an issue to make sure the script checks for `apt` and `systemd`. Should we clarify that as a requirement or have the script check for the proper operating system and systemd?
'''Mon, Aug 26, 2024, 17:04:05 - georgezgeorgez:''' I think it's fine just to document it for now.
'''Mon, Aug 26, 2024, 17:04:15 - georgezgeorgez:''' I think it's okay for us to do 1 deployment target really well first.
'''Mon, Aug 26, 2024, 17:04:45 - georgezgeorgez:''' The people who need the most support will probably be choosing ubuntu/deb as their recommended OS.
'''Mon, Aug 26, 2024, 17:04:51 - deeznnutz:''' ya, makes sense. Should we check in the script and halt it if apt and systemd are not present?
'''Mon, Aug 26, 2024, 17:05:25 - georgezgeorgez:''' We could do that, but not a priority.
'''Mon, Aug 26, 2024, 17:05:46 - deeznnutz:''' OK - I can add that as a todo and we can deal with it later.
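The "check and halt" idea deferred above could look roughly like the following. This is a sketch under assumptions, not code from the script: the function names are hypothetical, `apt-get` stands in for `apt` because it is the interface intended for scripts, and the systemd test relies on `/run/systemd/system` existing only when systemd is the running init system.

```shell
#!/usr/bin/env bash
# have_cmd NAME: succeed if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# is_systemd: succeed if the system was booted with systemd.
is_systemd() {
  [ -d /run/systemd/system ]
}

# check_requirements: report everything that is missing, then fail
# if at least one requirement was not met.
check_requirements() {
  local missing=0
  if ! have_cmd apt-get; then
    echo "error: apt-get not found (a Debian/Ubuntu system is expected)" >&2
    missing=1
  fi
  if ! is_systemd; then
    echo "error: systemd is not the running init system" >&2
    missing=1
  fi
  return "$missing"
}

# An installer would call this early and halt on failure:
# check_requirements || exit 1
```

Reporting every missing requirement before exiting saves the operator from fixing one problem only to hit the next on the following run.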
'''Mon, Aug 26, 2024, 17:05:47 - georgezgeorgez:''' We should try and get someone to use this script in the wild asap
And get information about their nodes via the monitoring
'''Mon, Aug 26, 2024, 17:06:03 - deeznnutz:''' I set up a standalone script that automates the installation of grafana, node_exporter & prometheus. It creates a default prometheus datasource, scrapes the node_exporter endpoint, and installs a default node_exporter dashboard. It currently only works on amd64.
'''Mon, Aug 26, 2024, 17:06:21 - deeznnutz:''' TODO
* We need to expand functionality to arm64
* Add a custom dashboard for `znnd`. This will require installing the Infinity data plugin and adding a new datasource (you can add an api endpoint as a datasource and it scrapes the api at `x` interval).
* Potential `znnd` metrics to show
* Sync status
* currentHeight
* targetHeight
* version
* commit
* numPeers
* stats.osInfo
* What else should we include?
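Before wiring an endpoint into a datasource, it can help to confirm by hand what the node returns. The sketch below builds a JSON-RPC 2.0 request and posts it with `curl`. Only `stats.osInfo` is named in these minutes; the endpoint URL, the port, and the `stats.syncInfo` method name are assumptions to verify against the node, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed local RPC endpoint; adjust to match the node's configuration.
ZNND_RPC_URL="${ZNND_RPC_URL:-http://127.0.0.1:35997}"

# rpc_payload METHOD: print a JSON-RPC 2.0 request body with no parameters.
rpc_payload() {
  printf '{"jsonrpc":"2.0","id":1,"method":"%s","params":[]}' "$1"
}

# znnd_rpc METHOD: POST the request to the node and print the raw JSON reply.
znnd_rpc() {
  curl --silent --fail --max-time 5 \
    -H 'Content-Type: application/json' \
    -d "$(rpc_payload "$1")" \
    "$ZNND_RPC_URL"
}

# Example calls (these need a running node):
# znnd_rpc stats.osInfo
# znnd_rpc stats.syncInfo   # assumed method name for currentHeight / targetHeight
```

Whatever request succeeds here is the same URL, method, and body that would be entered in the datasource query.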
'''Mon, Aug 26, 2024, 17:06:35 - georgezgeorgez:''' What is the infinity data plugin?
'''Mon, Aug 26, 2024, 17:06:45 - deeznnutz:''' it's a plugin that allows curl calls
'''Mon, Aug 26, 2024, 17:07:03 - deeznnutz:''' it basically runs them on a schedule and then you can display the data in a dashboard
'''Mon, Aug 26, 2024, 17:07:22 - georgezgeorgez:''' gotcha. That might be the fastest way
'''Mon, Aug 26, 2024, 17:07:34 - georgezgeorgez:''' There could be other relatively quick methods like parsing logs
'''Mon, Aug 26, 2024, 17:07:54 - deeznnutz:''' previously I used JSON API and it worked great. but that plugin is no longer under development
'''Mon, Aug 26, 2024, 17:08:04 - coinselor:''' I think syrius shows quite a few znnd metrics, we could use that as reference
'''Mon, Aug 26, 2024, 17:08:49 - georgezgeorgez:''' Long term, I think we should consider building metrics into the node
I think https://opentelemetry.io/ is worth considering
But not really the next step for us
'''Mon, Aug 26, 2024, 17:09:15 - deeznnutz:''' that would be awesome.
'''Mon, Aug 26, 2024, 17:09:34 - georgezgeorgez:''' In terms of other metrics, what would help us debug a production issue or a testnet failure?
'''Mon, Aug 26, 2024, 17:09:45 - georgezgeorgez:''' We might need different dashboards for prod and dev envs
'''Mon, Aug 26, 2024, 17:10:00 - deeznnutz:''' We can add Loki, the log processor
'''Mon, Aug 26, 2024, 17:10:15 - deeznnutz:''' I've tested that before. it can parse all the logs and you can display them any way you want
'''Mon, Aug 26, 2024, 17:10:48 - georgezgeorgez:''' Grafana has something called the LGTM stack
https://grafana.com/go/webinar/getting-started-with-grafana-lgtm-stack/
'''Mon, Aug 26, 2024, 17:10:56 - georgezgeorgez:''' I'm not familiar with Tempo or Mimir
'''Mon, Aug 26, 2024, 17:11:29 - deeznnutz:''' cool - I've never seen that before. I can check it out
'''Mon, Aug 26, 2024, 17:12:34 - georgezgeorgez:''' These days, tools are being developed so fast it seems
I think we just go with something, relatively modern, and then if there's a big reason to change, we change
A few years ago, ELK stack was pretty popular, but I think less now. And I think it's a bit overkill.
If there is a criterion, we should consider how lightweight the stack is
'''Mon, Aug 26, 2024, 17:12:52 - georgezgeorgez:''' Considering that any resources used for the monitoring stack are taken away from znnd
'''Mon, Aug 26, 2024, 17:12:57 - georgezgeorgez:''' in a single node deploy
'''Mon, Aug 26, 2024, 17:13:41 - deeznnutz:''' so the next steps are arm support, Infinity data plugin, create znnd dashboard
'''Mon, Aug 26, 2024, 17:13:44 - georgezgeorgez:''' I'm not 100% sure how useful log aggregation will be for single node
'''Mon, Aug 26, 2024, 17:14:06 - georgezgeorgez:''' Considering that all the logs will just be on the box itself
'''Mon, Aug 26, 2024, 17:14:23 - coinselor:''' Aren't we making the monitoring stack optional when using the script?
'''Mon, Aug 26, 2024, 17:14:24 - georgezgeorgez:''' But if it helps people isolate the logs around a certain timeframe/ metric spike
It could still be useful
'''Mon, Aug 26, 2024, 17:14:35 - deeznnutz:''' we could consider a `--send-logs` flag
'''Mon, Aug 26, 2024, 17:15:13 - georgezgeorgez:''' <@coinselor:zenon.chat "Aren't we making the monitoring ..."> Yes optional, but hopefully it's useful enough where most node operators want to run it
So lightweight is better imo
'''Mon, Aug 26, 2024, 17:15:24 - deeznnutz:''' <@coinselor:zenon.chat "Aren't we making the monitoring ..."> this was one of my questions. I assumed we would add a flag for `--grafana` to install it separately
'''Mon, Aug 26, 2024, 17:15:47 - coinselor:''' I can work on the interactivity of the script. I should be able to look at how the script is installing all the stuff deez is adding and make it interactive so that the user has to choose what to install.
Maybe we can make the monitoring stack the (Default) option
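The flag-based approach floated above (`--grafana`, `--send-logs`, alongside the existing `--help`) can be sketched as a small argument parser. The variable names and the function name `parse_args` are hypothetical, and the defaults shown (everything optional turned off) are an assumption; the discussion leaves open whether monitoring should be on by default.

```shell
#!/usr/bin/env bash
# Defaults: nothing optional is installed unless asked for (assumed).
INSTALL_GRAFANA=0
SEND_LOGS=0
SHOW_HELP=0

# parse_args ARGS...: set the option variables from the command line.
# Unknown options are rejected so typos do not pass silently.
parse_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --grafana)   INSTALL_GRAFANA=1 ;;
      --send-logs) SEND_LOGS=1 ;;
      --help)      SHOW_HELP=1 ;;
      *)
        echo "unknown option: $1" >&2
        return 1
        ;;
    esac
    shift
  done
}

# In the installer: parse_args "$@" || exit 1
```

An interactive menu could set the same variables, so the install steps would not need to know whether the choice came from a flag or a prompt.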
'''Mon, Aug 26, 2024, 17:16:19 - georgezgeorgez:''' deeznnutz: you are the chair. You run a pillar and nodes. What would actually be useful to you?
How can we get feedback about what is important for other operators?
'''Mon, Aug 26, 2024, 17:16:57 - georgezgeorgez:''' As chair, you should try and get feedback from users/stakeholders
'''Mon, Aug 26, 2024, 17:17:13 - georgezgeorgez:''' Maybe a survey to pillars?
'''Mon, Aug 26, 2024, 17:17:45 - deeznnutz:''' ya, makes sense. It would be super helpful to me when troubleshooting stuff if I could get logs and settings when helping someone
'''Mon, Aug 26, 2024, 17:17:59 - coinselor:''' I think the survey might be more useful after we have them use the script for the first time, then get their feedback.
'''Mon, Aug 26, 2024, 17:18:13 - deeznnutz:''' i always go through a series of questions that are super simple before getting into helping someone.
'''Mon, Aug 26, 2024, 17:18:35 - georgezgeorgez:''' nice, that is the basis of the "diagnostics" i talked about
'''Mon, Aug 26, 2024, 17:18:43 - deeznnutz:''' but regarding others, I can ask them what would be useful to them as a pillar / operator
'''Mon, Aug 26, 2024, 17:18:55 - georgezgeorgez:''' yeah we can do it informally to start
'''Mon, Aug 26, 2024, 17:19:18 - georgezgeorgez:''' i just want to make sure we're building stuff with guidance from the actual community
'''Mon, Aug 26, 2024, 17:19:37 - georgezgeorgez:''' i mean we're part of the community, but broader feedback
'''Mon, Aug 26, 2024, 17:19:49 - deeznnutz:''' what about setting up a producer address like the znn controller does.
'''Mon, Aug 26, 2024, 17:20:02 - deeznnutz:''' should we have a `--producer` flag that sets up a producer address?
'''Mon, Aug 26, 2024, 17:20:20 - georgezgeorgez:''' i think that is only necessary for pillars
'''Mon, Aug 26, 2024, 17:20:34 - georgezgeorgez:''' so if that is our initial target user then yeah we would need it
'''Mon, Aug 26, 2024, 17:20:42 - georgezgeorgez:''' but changing the producer also requires changing it on-chain
'''Mon, Aug 26, 2024, 17:20:53 - georgezgeorgez:''' some people might want to re-use an existing producer
'''Mon, Aug 26, 2024, 17:21:10 - georgezgeorgez:''' maybe that would be considered a bad practice
'''Mon, Aug 26, 2024, 17:21:29 - deeznnutz:''' can a producer address be created with the CLI
'''Mon, Aug 26, 2024, 17:21:45 - deeznnutz:''' I've never created one before without using the znn-controller-software
'''Mon, Aug 26, 2024, 17:22:19 - deeznnutz:''' <@coinselor:zenon.chat "I think the survey might be more..."> maybe we do it before and after
'''Mon, Aug 26, 2024, 17:22:43 - georgezgeorgez:''' the producer is just a key-pair
The node configuration has to specify the file to use
'''Mon, Aug 26, 2024, 17:22:51 - deeznnutz:''' for example I know shai wants better monitoring tools. Would be interesting to get his feedback
'''Mon, Aug 26, 2024, 17:23:39 - deeznnutz:''' right, in the `config.json`
'''Mon, Aug 26, 2024, 17:24:01 - coinselor:''' informally asking before sounds good to brainstorm ideas, but I won't be shocked if someone goes 'a tg bot that alerts me about node going down' and similar requests
'''Mon, Aug 26, 2024, 17:25:26 - georgezgeorgez:''' Sometimes a user doesn't exactly know what they want 😅
It's up to us to translate requests into underlying problems and solve those
The surface level suggestion sometimes will be and sometimes won't be the best path
'''Mon, Aug 26, 2024, 17:25:43 - georgezgeorgez:''' So another target user could be developers
'''Mon, Aug 26, 2024, 17:25:55 - georgezgeorgez:''' I created a "devnet" branch of znnd way back
'''Mon, Aug 26, 2024, 17:26:11 - georgezgeorgez:''' And it sets up the producer and config necessary for a single node testnet
'''Mon, Aug 26, 2024, 17:26:39 - georgezgeorgez:''' It's baked into znnd
And it means that in order to use it, developers have to rebase their changes on top of the branch
'''Mon, Aug 26, 2024, 17:26:45 - georgezgeorgez:''' It would be better if creating a devnet was a separate script
'''Mon, Aug 26, 2024, 17:26:56 - georgezgeorgez:''' Not tied to a specific branch of go-zenon
'''Mon, Aug 26, 2024, 17:27:35 - georgezgeorgez:''' But I think for Operations, we should focus on node operators first
'''Mon, Aug 26, 2024, 17:27:40 - deeznnutz:''' So maybe I can start creating issues in GH for this additional functionality.
'''Mon, Aug 26, 2024, 17:28:32 - georgezgeorgez:''' Yeah it's no problem to define more work
'''Mon, Aug 26, 2024, 17:28:56 - georgezgeorgez:''' We should have a selection of possible things to do
And then work with the users/stakeholders to pick what to do next
'''Mon, Aug 26, 2024, 17:29:02 - deeznnutz:''' We are talking about
- interactive installation menu
- producer flag
- testnet flag
'''Mon, Aug 26, 2024, 17:29:26 - georgezgeorgez:''' Do you have an idea of how a menu would work?
'''Mon, Aug 26, 2024, 17:29:33 - deeznnutz:''' in addition to the things mentioned above to integrate znnd monitoring
'''Mon, Aug 26, 2024, 17:29:42 - georgezgeorgez:''' I think doing it in bash wouldn't be so pretty
'''Mon, Aug 26, 2024, 17:29:49 - deeznnutz:''' <@georgezgeorgez:hc1.chat "Do you have an idea of how a men..."> I know how it won't work... lol
'''Mon, Aug 26, 2024, 17:30:08 - deeznnutz:''' I tried it and could not get one working with an install command with curl.
'''Mon, Aug 26, 2024, 17:30:28 - deeznnutz:''' maybe I just gave up too early
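One likely cause of a menu failing under a `curl ... | bash` install is that stdin is the pipe carrying the script itself, so `read` consumes script text instead of keystrokes. A common workaround is to read the answer from `/dev/tty`. The sketch below assumes that approach; the function name `choose_component` and the option labels are hypothetical, and the monitoring stack is marked as the default as suggested earlier in the meeting.

```shell
#!/usr/bin/env bash
# Hypothetical install menu that still works when the script is piped
# into bash, because the answer is read from the terminal, not stdin.

# Print a numbered menu on stderr and echo the chosen label on stdout.
# $1 is the file to read the answer from (defaults to /dev/tty).
choose_component() {
    local input="${1:-/dev/tty}"
    local options=("node only" "node + monitoring stack" "quit")
    local default=2 reply="" i

    for i in "${!options[@]}"; do
        printf '%d) %s\n' "$((i + 1))" "${options[$i]}" >&2
    done
    printf 'Select [default %d]: ' "$default" >&2

    # An empty line or EOF keeps the default choice.
    IFS= read -r reply < "$input" || true
    reply="${reply:-$default}"

    # Accept only a positive integer within the range of options.
    if [[ "$reply" =~ ^[1-9][0-9]*$ ]] && (( reply <= ${#options[@]} )); then
        printf '%s\n' "${options[$((reply - 1))]}"
    else
        printf 'Invalid choice: %s\n' "$reply" >&2
        return 1
    fi
}
```

A caller would use it as `choice="$(choose_component)"` and branch on the returned label. Taking the input path as a parameter also lets the menu be tested from a file without a terminal.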
'''Mon, Aug 26, 2024, 17:30:34 - georgezgeorgez:''' https://github.com/charmbracelet/bubbletea
If we do go in the direction of TUIs
'''Mon, Aug 26, 2024, 17:31:11 - deeznnutz:''' that would be awesome
'''Mon, Aug 26, 2024, 17:31:19 - georgezgeorgez:''' deeznnutz: but again, probably not the near term focus
What do you think we should try and have done before next meeting?
'''Mon, Aug 26, 2024, 17:31:59 - deeznnutz:''' my goal is to get the new datasource integrated and a custom znnd dashboard working
'''Mon, Aug 26, 2024, 17:32:06 - deeznnutz:''' that's what I can work on.
'''Mon, Aug 26, 2024, 17:32:21 - georgezgeorgez:''' Cool that's what I was thinking too
Maybe at the very least, wireframe json for the dashboard?
And some poc of the infinity plugin for at least one of the graphs
'''Mon, Aug 26, 2024, 17:32:25 - deeznnutz:''' and pull in your changes
'''Mon, Aug 26, 2024, 17:32:55 - deeznnutz:''' <@georgezgeorgez:hc1.chat "Cool that's what I was thinking ..."> yes that is doable
'''Mon, Aug 26, 2024, 17:33:52 - deeznnutz:''' The TUI framework would be super cool.
'''Mon, Aug 26, 2024, 17:33:53 - georgezgeorgez:''' <@deeznnutz:zenon.chat "and pull in your changes"> If we want to actually test it live, we would need to run an arm64 server
'''Mon, Aug 26, 2024, 17:34:12 - deeznnutz:''' <@georgezgeorgez:hc1.chat "If we want to actually test it l..."> DO does not have them. So I can test on another platform
'''Mon, Aug 26, 2024, 17:34:14 - georgezgeorgez:''' Long term, i think it would make sense for us to have some test scripts that interact with cloud provider apis to spin up nodes etc
'''Mon, Aug 26, 2024, 17:34:35 - georgezgeorgez:''' Run some tests, spit out data, and then tear it down
'''Mon, Aug 26, 2024, 17:34:52 - georgezgeorgez:''' https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
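The spin up, test, tear down flow described above can be sketched independently of any cloud SDK. This is only an outline in bash, assuming the provider-specific steps are supplied as commands; `run_ephemeral_test` and its three arguments are hypothetical names, and no real provider API is called here.

```shell
#!/usr/bin/env bash
# Hypothetical harness for throwaway test servers.
# $1: command that creates a server and prints its identifier
# $2: command that runs the tests, given the identifier
# $3: command that destroys the server, given the identifier

run_ephemeral_test() {
    local provision="$1" test_cmd="$2" teardown="$3"
    local server_id status=0

    # Abort early if the server could not be created.
    server_id="$("$provision")" || return 1

    # Remember the test result, but do not stop before cleanup.
    "$test_cmd" "$server_id" || status=$?

    # Always tear down, so failed runs do not leave servers running.
    "$teardown" "$server_id"

    return "$status"
}
```

Each of the three commands could wrap a provider CLI or a small script built on a library such as the one linked above. The harness only guarantees that teardown runs whether or not the tests pass.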
'''Mon, Aug 26, 2024, 17:35:04 - coinselor:''' I can test the arm64 changes
'''Mon, Aug 26, 2024, 17:35:39 - deeznnutz:''' I need to add arm support for the grafana install too
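Supporting arm64 in an install step usually starts with mapping the machine type to a package architecture name. A minimal sketch, assuming Debian-style architecture names; the function name `deb_arch` is hypothetical and nothing here is specific to the Grafana packages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate `uname -m` output into the architecture
# names used by Debian/Ubuntu packages.

deb_arch() {
    # Accept an explicit machine string for testing; default to this host.
    local machine="${1:-$(uname -m)}"
    case "$machine" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *)
            echo "unsupported architecture: $machine" >&2
            return 1
            ;;
    esac
}
```

An installer could then call `arch="$(deb_arch)" || exit 1` once and reuse `$arch` wherever a download or package name depends on the platform.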
'''Mon, Aug 26, 2024, 17:36:00 - deeznnutz:''' maybe we can all take a look at the TUI framework.
'''Mon, Aug 26, 2024, 17:36:22 - deeznnutz:''' between that and everything else going on I think this is doable in the next 2 weeks
'''Mon, Aug 26, 2024, 17:36:40 - georgezgeorgez:''' A lot of my devnet branch could be carved out.
But let's get a dashboard out first, before we make things pretty
'''Mon, Aug 26, 2024, 17:36:58 - deeznnutz:''' cool. sounds like a plan
'''Mon, Aug 26, 2024, 17:37:04 - georgezgeorgez:''' I can help with the Infinity plugin and grafana dashboard json
'''Mon, Aug 26, 2024, 17:37:09 - georgezgeorgez:''' and whatever else really
'''Mon, Aug 26, 2024, 17:37:35 - deeznnutz:''' maybe you can take on the dashboard after I get the plugin installed and setup
'''Mon, Aug 26, 2024, 17:37:36 - georgezgeorgez:''' And yeah maybe before the next meeting, we can talk with other pillar operators
'''Mon, Aug 26, 2024, 17:38:02 - deeznnutz:''' I have time this week. I'm traveling T-TH next week.
'''Mon, Aug 26, 2024, 17:38:31 - georgezgeorgez:''' When do you think we should meet next?
'''Mon, Aug 26, 2024, 17:38:59 - deeznnutz:''' Sept 9 @ 6PM EST? does that work?
'''Mon, Aug 26, 2024, 17:39:32 - georgezgeorgez:''' Should be good
'''Mon, Aug 26, 2024, 17:39:41 - georgezgeorgez:''' coinselor hbu?
'''Mon, Aug 26, 2024, 17:39:51 - coinselor:''' Ye that works
'''Mon, Aug 26, 2024, 17:40:08 - deeznnutz:''' cool. sounds like a plan. thx everyone!!
'''Mon, Aug 26, 2024, 17:40:21 - georgezgeorgez:''' Anything else you want to go over? Or call it for today?
'''Mon, Aug 26, 2024, 17:40:47 - deeznnutz:''' I'm good. did you see my post on dynamic fusing?
'''Mon, Aug 26, 2024, 17:40:54 - deeznnutz:''' am I retarded?
'''Mon, Aug 26, 2024, 17:41:11 - georgezgeorgez:''' Not sure if either question is within scope of the SIG
'''Mon, Aug 26, 2024, 17:41:16 - georgezgeorgez:''' haha
'''Mon, Aug 26, 2024, 17:41:19 - deeznnutz:''' lol
'''Mon, Aug 26, 2024, 17:41:26 - deeznnutz:''' ya, we can chat about that elsewhere
'''Mon, Aug 26, 2024, 17:41:44 - georgezgeorgez:''' Thank you everyone.