Chia Harvesters Best Practices¶
Introduction¶
Chia users need real-time monitoring of Harvesters across multiple hosts, with detailed analysis of the availability, status, and profitability of each Harvester, to continuously strengthen their control over the fleet. Dataflux combines active and passive monitoring with a global anomaly detection module to meet these needs, building a unified Chia operations platform that gives users an intuitive, global view of Harvester state, improves output efficiency, and makes it easier to troubleshoot Harvester failures or lower-than-expected profits. It also provides expert Chia performance services and on-site support to help users achieve higher returns. The key areas that need to be monitored are:
- Harvesters
- CPU
- Memory
- Disk
- Network
Use Cases¶
Built-in Views¶
Disk¶
Prerequisites¶
DataKit is installed (DataKit Installation Documentation)
Configuration¶
Log Collection¶
This article uses the Windows client as an example (Linux works similarly).
Create chia farm log collection script¶
Git is already installed (Git Installation Reference)
Enter the chia-blockchain daemon directory (C:\Users\{Your User}\AppData\Local\chia-blockchain\app-{Your Chia Version}\resources\app.asar.unpacked\daemon) and create a log collection script named farmer.sh that saves the Chia client farm information into farmer.log.
Please first create a data_collect folder in C:\Users\{Your User}\AppData\Local\chia-blockchain\app-{Your Chia Version}\
#!/bin/bash
# Collect the Chia farm summary every 2 seconds and append it to farmer.log,
# joining every 10 lines of output into a single comma-separated record.
while true; do
    sleep 2
    ./chia.exe farm summary | awk '{line=line "," $0} NR%10==0{print substr(line,1); line=""}' >> ../../../data_collect/farmer.log
done
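The script can then be run from a Git Bash window opened in the daemon directory; a minimal usage sketch (the user name and Chia version in the path are placeholders for your own values):

```shell
# Run from Git Bash; the angle-bracketed path segments are placeholders.
cd /c/Users/<Your User>/AppData/Local/chia-blockchain/app-<Your Chia Version>/resources/app.asar.unpacked/daemon
bash farmer.sh &
```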
DataKit Pipeline Configuration¶
Enter the pipeline directory under the DataKit installation directory and create farm_log.p and chia_debug_log.p to split the collected logs into fields. Example:
chia_debug_log.p¶
#chia_debug_log
#Name Description
#strptime($timestamp, "2006-01-02T15:04:05")
#total_plots Plot quantities
#eligible_plots Passed preliminary screening Plot quantities
#proofs_found Burst block quantities
#check_duration Query time
# Log Example:
#2021-04-24T11:01:53.390 harvester chia.harvester.harvester: INFO 1 plots were eligible for farming 940b588c2a... Found 0 proofs. Time: 0.98087 s. Total 19 plots
grok(_, "%{TIMESTAMP_ISO8601:strptime} harvester chia.harvester.harvester: INFO \\s+ %{NUMBER:eligible_plots} plots were eligible for farming \\w+\\.\\.\\. Found %{NUMBER:proofs_found} proofs\\. Time: %{NUMBER:check_duration} s\\. Total %{NUMBER:total_plots} plots.*")
# Convert numerical types
cast(eligible_plots, "float")
cast(proofs_found, "float")
cast(check_duration, "float")
cast(total_plots, "float")
# Drop original content
drop_origin_data()
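Before relying on this pipeline, it can help to confirm that your debug.log actually contains lines in the format the grok expression expects. A quick check from Git Bash (the path is the default Windows log location used later in this article; adjust it to your host):

```shell
# Show the most recent harvester lookup lines; they should match the log example above.
grep "plots were eligible for farming" /c/Users/<Your User>/.chia/mainnet/log/debug.log | tail -n 3
```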
farm_log.p¶
#farm_log
#Name Description
#farming_status farmer status
#xch_count XCH mined
#last_farm_height Farm height
#total_plot Cultivated quantity
#total_plots_size_GiB Cultivated size
#total_plots_size_TiB Cultivated size
#total_plots_size_PiB Cultivated size
#network_space_PiB Total network power (PiB)
#Log Example:
#,Farming status: Farming,Total chia farmed: 0.0,User transaction fees: 0.0,Block rewards: 0.0,Last height farmed: 0,Plot count: 137,Total size of plots: 13.561 TiB,Estimated network space: 12457.266 PiB,Expected time to win: 5 months and 4 weeks,Note: log into your key using 'chia wallet show' to see rewards for each key
grok(_, ",Farming status: %{WORD:farming_status},Total chia farmed: %{NUMBER:xch_count},User transaction fees: %{NUMBER},Block rewards: %{NUMBER},Last height farmed: %{NUMBER:last_farm_height},Plot count: %{NUMBER:total_plot},Total size of plots: %{NUMBER:total_plots_size_GiB} GiB,Estimated network space: %{NUMBER:network_space_PiB} PiB.*")
grok(_, ",Farming status: %{WORD:farming_status},Total chia farmed: %{NUMBER:xch_count},User transaction fees: %{NUMBER},Block rewards: %{NUMBER},Last height farmed: %{NUMBER:last_farm_height},Plot count: %{NUMBER:total_plot},Total size of plots: %{NUMBER:total_plots_size_TiB} TiB,Estimated network space: %{NUMBER:network_space_PiB} PiB.*")
grok(_, ",Farming status: %{WORD:farming_status},Total chia farmed: %{NUMBER:xch_count},User transaction fees: %{NUMBER},Block rewards: %{NUMBER},Last height farmed: %{NUMBER:last_farm_height},Plot count: %{NUMBER:total_plot},Total size of plots: %{NUMBER:total_plots_size_PiB} PiB,Estimated network space: %{NUMBER:network_space_PiB} PiB.*")
#Convert numerical types
cast(farming_status, "str")
cast(xch_count, "float")
cast(last_farm_height, "float")
cast(total_plot, "float")
cast(total_plots_size_GiB, "float")
cast(total_plots_size_TiB, "float")
cast(total_plots_size_PiB, "float")
cast(network_space_PiB, "float")
#Drop original content
drop_origin_data()
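Similarly, you can confirm that farmer.sh is producing the comma-joined records these grok patterns expect (path placeholders as before):

```shell
# Print the latest collected farm summary record; it should resemble the log example above.
tail -n 1 /c/Users/<Your User>/AppData/Local/chia-blockchain/app-<Your Chia Version>/data_collect/farmer.log
```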
DataKit Log Collection Configuration¶
Enter the conf.d/log directory under the DataKit installation directory, copy logging.conf.sample, and rename it to logging.conf. Example:
Note: change both logfiles paths to the location of your Chia log files.
[[inputs.logging]]
# required, glob logfiles
logfiles = ['''C:\Users\Administrator\.chia\mainnet\log\debug.*''']
# glob filter
ignore = [""]
# your logging source, if it's empty, use 'default'
source = "chia_harvester"
# add service tag, if it's empty, use $source.
service = ""
# grok pipeline script path
pipeline = "chia_debug_log.p"
# optional status:
# "emerg","alert","critical","error","warning","info","debug","OK"
ignore_status = []
# optional encodings:
# "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
character_encoding = ""
# The pattern should be a regexp. Note the use of '''this regexp'''
# regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
match = '''^\S'''
[inputs.logging.tags]
# tags1 = "value1"
[[inputs.logging]]
# required, glob logfiles
logfiles = ['''C:\Users\Administrator\AppData\Local\chia-blockchain\app-1.1.6\data_collect\farmer.log''']
# glob filter
ignore = [""]
# your logging source, if it's empty, use 'default'
source = "chia_farmer"
# add service tag, if it's empty, use $source.
service = ""
# grok pipeline script path
pipeline = "farm_log.p"
# optional status:
# "emerg","alert","critical","error","warning","info","debug","OK"
ignore_status = []
# optional encodings:
# "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
character_encoding = ""
# The pattern should be a regexp. Note the use of '''this regexp'''
# regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
match = '''^\S'''
[inputs.logging.tags]
# tags1 = "value1"
Restart DataKit to take effect¶
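How DataKit is restarted depends on how it was installed; on Windows it typically runs as a system service. A minimal sketch, assuming the service is named datakit (run from an elevated command prompt):

```shell
# Restart the DataKit Windows service so the new inputs and pipelines are loaded.
net stop datakit
net start datakit
```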
Monitoring Metrics Description¶
1 Harvesters¶
Monitor Harvesters across multiple hosts in real time, analyzing the availability, status, and profitability of each Harvester in detail, to continuously strengthen your control over them.
| Metric Description | Name | Measurement Standard |
| --- | --- | --- |
| Total Network Power | chia_farmer.network_space | None |
| Personal Power | chia_farmer.total_plots_size | None |
| Farm Height | chia_farmer.last_farm_height | None |
| Daily Expected Profit | chia_harvester.expected_xch | None |
| Challenges Searched | chia_harvester.count_eligible | None |
| Passed Preliminary Screening Quantity | chia_harvester.eligible_plots | None |
| Burst Block Quantity | chia_harvester.proofs_found | Performance Metric |
| Average Challenge Query Duration | chia_harvester.check_duration | Performance Metric |
| XCH Profit | chia_farmer.xch_count | None |
Daily Expected Profit¶
Keep your plot count growing steadily to stabilize your share of the total network power, so that your daily expected profit does not fluctuate drastically. A rapid change in daily expected profit may indicate that a Harvester node has gone offline, or that the total network power has grown so quickly that your share has dropped too low. It is recommended to monitor daily expected profit continuously and set alerts on it to keep income stable.
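As a rough illustration of how the share of network space drives this metric, daily expected profit is approximately your plot size divided by the total network space, multiplied by the daily XCH issuance. A sketch using the sample numbers from the farm summary above, assuming roughly 4,608 blocks per day and a 2 XCH block reward (pre-halving figures):

```shell
# Expected XCH per day ~= (your plots / network space) * blocks per day * reward per block
awk 'BEGIN { plots_tib = 13.561; net_pib = 12457.266; printf "%.4f XCH/day\n", (plots_tib / 1024) / net_pib * 4608 * 2 }'
```

With these numbers the result is on the order of 0.01 XCH per day, which is consistent with the "Expected time to win: 5 months and 4 weeks" shown in the sample log.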
Harvester Preliminary Screening Pass Rate¶
To keep income stable, monitor the number of plots that pass the preliminary screening. When the network and disks are healthy, the preliminary screening pass rate stays within a stable range without drastic fluctuations. If you find that the pass rate has changed sharply, check disk and network state promptly to keep income stable.
Average Challenge Query Duration¶
Please pay attention to the average challenge query duration of Harvesters and set alerts to ensure that each Harvester's query does not exceed 5 seconds. If you find that it exceeds 5 seconds, please check the network information or disk information promptly to ensure stable income.
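If you want to spot slow lookups directly in the log before setting up an alert, a quick filter over debug.log (Git Bash, path placeholder as before) can flag entries whose lookup time exceeds 5 seconds:

```shell
# Print harvester lookup lines whose "Time: X s." value is above 5 seconds.
grep "plots were eligible for farming" /c/Users/<Your User>/.chia/mainnet/log/debug.log \
  | awk -F'Time: ' '{ split($2, a, " "); if (a[1] + 0 > 5) print }'
```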
2 CPU Monitoring¶
CPU monitoring helps analyze CPU load peaks and identify excessive CPU usage. You can improve CPU capacity or reduce load through CPU monitoring metrics, find potential problems, and avoid unnecessary upgrade costs. CPU monitoring metrics can also help you identify unnecessary background processes running and determine the resource utilization of processes or applications and their impact on the entire network.
| Metric Description | Name | Measurement Standard |
| --- | --- | --- |
| CPU Load | system.load1, system.load5, system.load15 | Resource Utilization |
| CPU Usage | cpu.usage_idle, cpu.usage_user, cpu.usage_system | Resource Utilization |
CPU Usage¶
CPU usage can be divided into User Time (the percentage of time spent executing user processes), System Time (the percentage of time spent executing kernel processes and interrupts), and Idle Time (the percentage of time the CPU is idle). For good CPU performance, the run queue of each CPU should not exceed 3; when the CPU is fully loaded, User Time should be around 65%~70%, System Time around 30%~35%, and Idle Time around 0%~5%.
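On a Linux harvester host, a quick way to see these numbers together is vmstat: the r column is the run queue, and us, sy, and id are the user, system, and idle percentages discussed above.

```shell
# Sample CPU activity three times at one-second intervals.
vmstat 1 3
```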
3 Memory Monitoring¶
Memory is one of the main factors affecting Linux performance. The adequacy of memory resources directly affects the usage performance of application systems.
| Metric Description | Name | Measurement Standard |
| --- | --- | --- |
| Memory Usage Percentage | mem.used_percent | Resource Utilization |
| Memory Usage | mem.free, mem.used | Resource Utilization |
| Memory Cache | mem.cached | Resource Utilization |
| Memory Buffer | mem.buffered | Resource Utilization |
Memory Usage¶
It is important to closely monitor the usage of available memory because contention for RAM will inevitably lead to paging and performance degradation. To keep the machine running smoothly, ensure it has enough RAM to meet your workload. Persistent low memory availability can lead to segmentation faults and other serious issues. Remedies include increasing the physical memory capacity in the system, or enabling memory page merging if possible.
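On Linux, a quick snapshot of the values behind these metrics (free, used, buffers, and cache) is available with free:

```shell
# Show memory usage in human-readable units; buff/cache corresponds to mem.buffered and mem.cached.
free -h
```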
4 Disk Monitoring¶
| Metric Description | Name | Measurement Standard |
| --- | --- | --- |
| Disk Health Status | disk.health, disk.pre_fail | Availability |
| Disk Space | disk.free, disk.used | Resource Utilization |
| Disk Inode | disk.inodes_free, disk.inodes_used | Resource Utilization |
| Disk Read/Write | diskio.read_bytes, diskio.write_bytes | Resource Utilization |
| Disk Temperature | disk.temperature | Availability |
| Disk Model | disk.device_model | Basic Information |
| Disk Read/Write Time | diskio.read_time, diskio.write_time | Resource Utilization |
Disk Space¶
For any operating system, maintaining sufficient free disk space is necessary. Besides regular disk processes, core system processes store logs and other types of data on the disk. You can configure alerts to remind you when your available disk space drops below 15%, ensuring continuous business operations.
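On Linux, a one-off check for file systems approaching the 15% free-space threshold can be done with df; the 85% cutoff below mirrors that threshold:

```shell
# List mounted file systems whose usage exceeds 85% (i.e. less than 15% free).
df -h | awk 'NR > 1 && $5 + 0 > 85 { print }'
```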
Disk Read/Write Time¶
These metrics track the average time spent on disk read/write operations. A value greater than 50 milliseconds indicates relatively high latency (with less than 10 milliseconds being optimal), usually suggesting that business jobs should be transferred to faster disks to reduce latency. You can set different alert thresholds based on the server role, as acceptable thresholds vary by role.
Disk Read/Write¶
If your server hosts demanding applications, you will want to monitor disk I/O rates. The disk read/write metrics aggregate read (diskio.read_bytes) and write (diskio.write_bytes) activity on the disk. Persistently high disk activity can lead to service degradation and system instability, especially when combined with high RAM and page file usage. When you experience high disk activity, consider adding disks (especially when you see a large number of operations in the queue), using faster disks, increasing the RAM reserved for file system caching, or distributing the workload to more machines if possible.
Disk Temperature¶
If your business requires very high disk availability, you can set alerts to monitor the working temperature of disks continuously. Temperatures above 65°C (or 75°C for SSDs) deserve attention. If your hard drive overheating protection or temperature control mechanism fails, further increases in temperature could damage the hard drive and result in data loss.
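If smartmontools is installed on a Linux host, the drive temperature that feeds this metric can also be checked manually (the device path is only an example):

```shell
# Query SMART attributes and show the temperature line for /dev/sda.
sudo smartctl -A /dev/sda | grep -i temperature
```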
5 Network Monitoring¶
Your applications and infrastructure components depend on increasingly complex architectures, whether you run monolithic applications or microservices, on cloud infrastructure, in private data centers, or both. Virtualized infrastructure lets teams respond at any scale, but it also creates dynamic network patterns that traditional network monitoring tools handle poorly. Network performance monitoring designed for the cloud era is needed to provide visibility into every component in your environment and every connection between them.
| Metric Description | Name | Measurement Standard |
| --- | --- | --- |
| Network Traffic | net.bytes_recv, net.bytes_sent | Resource Utilization |
| Network Packets | net.packets_recv, net.packets_sent | Resource Utilization |
| Retransmissions | net.tcp_retranssegs | Availability |
Network Traffic¶
Together, these two metrics measure the total network throughput of a given network interface. Most consumer hardware ships with NICs rated at 1 gigabit per second or more, so except in extreme cases the network itself is unlikely to become a bottleneck. You can set an alert to fire when interface bandwidth utilization exceeds 80% in order to prevent network saturation (for a 1 Gbps link, this means reaching about 100 megabytes per second).
Retransmissions¶
TCP retransmissions happen frequently and are not errors in themselves, but their presence can be a sign of trouble. Retransmissions are typically the result of network congestion and often correlate with high bandwidth consumption. You should monitor this metric because excessive retransmissions can cause significant application delays: if the sender does not receive acknowledgment of the packets it has sent, it delays sending more packets (usually for about one second), adding latency on top of the congestion itself.
If not caused by network congestion, retransmissions may stem from faulty network hardware, or a high retransmission rate with few actual packet drops may point to excessive buffering. Regardless of the cause, you should track this metric to explain seemingly random fluctuations in the response times of networked applications.
Conclusion¶
In this article, we covered some of the most useful metrics you can monitor while farming Chia. If you run a farming operation, monitoring the metrics listed below will give you a good understanding of the farm's operational health and availability:
- Disk Read/Write Latency
- Disk Temperature
- Network Traffic
- Daily Expected Profit
- Harvester Preliminary Screening Pass Rate
- Processes
Ultimately, you will recognize other metrics particularly relevant to your own use case. Of course, you can also learn more through Guance.