
Chia Harvesters Best Practices


Introduction

Chia users need real-time monitoring of Harvesters across multiple hosts, with detailed analysis of the availability, status, and profits of each Harvester, so that they can steadily improve their control over the Harvesters. DataFlux combines active and passive monitoring with a global anomaly detection module to meet these needs, building a unified Chia operations platform that gives users an at-a-glance view of Harvester health, helps improve output efficiency, and makes it easier to troubleshoot Harvester failures or lower-than-expected profits. It also provides expert-level Chia performance services and on-site support to help Chia users achieve higher returns. The key areas that need to be monitored are:

  • Harvesters
  • CPUs
  • Memory
  • Disk
  • Network

Use Cases

image.png

Built-in Views

Disk

image.png

Prerequisites

DataKit is installed (DataKit Installation Documentation)

Configuration

Log Collection

This article uses the Windows client as an example (Linux works similarly).

Create chia farm log collection script

Git is already installed (Git Installation Reference); on Windows it provides the Git Bash shell used to run the collection script below.

Enter the chia-blockchain daemon directory (C:\Users\{Your User}\AppData\Local\chia-blockchain\app-{Your Chia Version}\resources\app.asar.unpacked\daemon) and create a log collection script, farmer.sh, which saves the Chia client's farm summary output into farmer.log.

First create a data_collect folder under C:\Users\{Your User}\AppData\Local\chia-blockchain\app-{Your Chia Version}\

#!/bin/bash
# Poll the Chia client every 2 seconds and append each "farm summary" output
# to farmer.log as a single comma-joined record (the summary is assumed to be 10 lines).
# ../../../ resolves to the app-{Your Chia Version} folder where data_collect was created.

while true; do
    sleep 2
    ./chia.exe farm summary | awk '{line=line "," $0} NR%10==0{print line; line=""}' >> ../../../data_collect/farmer.log
done
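The script can then be started from Git Bash and left running in the background. A minimal sketch, assuming nohup is available in your Git Bash installation:

# Run from Git Bash in the daemon directory; nohup keeps the collector running
# after the terminal window is closed.
nohup ./farmer.sh > /dev/null 2>&1 &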

DataKit Pipeline Configuration

Enter the pipeline directory under the DataKit installation directory and create farm_log.p and chia_debug_log.p to split the collected logs into fields. Example:

chia_debug_log.p

#chia_debug_log
#Name            Description
#strptime        Log timestamp (format: 2006-01-02T15:04:05)
#total_plots     Total number of plots
#eligible_plots  Number of plots passing the preliminary screening (plot filter)
#proofs_found    Number of proofs found (blocks won)
#check_duration  Challenge lookup time

# Log example:
#2021-04-24T11:01:53.390 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming 940b588c2a... Found 0 proofs. Time: 0.98087 s. Total 19 plots


grok(_, "%{TIMESTAMP_ISO8601:strptime} harvester chia.harvester.harvester: INFO \\s+ %{NUMBER:eligible_plots} plots were eligible for farming \\w+\\.\\.\\. Found %{NUMBER:proofs_found} proofs\\. Time: %{NUMBER:check_duration} s\\. Total %{NUMBER:total_plots} plots.*")

# Convert numerical types
cast(eligible_plots, "float")
cast(proofs_found, "float")
cast(check_duration, "float")
cast(total_plots, "float")

# Drop original content
drop_origin_data()
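Before enabling the pipeline, it is worth checking that your harvester is actually writing the lines the grok expression expects. A quick sanity check from Git Bash, assuming the default log location used later in logging.conf:

# Show the most recent harvester filter results; these are the lines
# that chia_debug_log.p parses.
grep "plots were eligible for farming" ~/.chia/mainnet/log/debug.log | tail -n 5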

farm_log.p

#farm_log
#Name                   Description
#farming_status         Farming status
#xch_count              XCH farmed
#last_farm_height       Last height farmed
#total_plot             Plot count
#total_plots_size_GiB   Total plot size (GiB)
#total_plots_size_TiB   Total plot size (TiB)
#total_plots_size_PiB   Total plot size (PiB)
#network_space_PiB      Estimated total network space (PiB)

#Log example:
#,Farming status: Farming,Total chia farmed: 0.0,User transaction fees: 0.0,Block rewards: 0.0,Last height farmed: 0,Plot count: 137,Total size of plots: 13.561 TiB,Estimated network space: 12457.266 PiB,Expected time to win: 5 months and 4 weeks,Note: log into your key using 'chia wallet show' to see rewards for each key

grok(_, ",Farming status: %{WORD:farming_status},Total chia farmed: %{NUMBER:xch_count},User transaction fees: %{NUMBER},Block rewards: %{NUMBER},Last height farmed: %{NUMBER:last_farm_height},Plot count: %{NUMBER:total_plot},Total size of plots: %{NUMBER:total_plots_size_GiB} GiB,Estimated network space: %{NUMBER:network_space_PiB} PiB.*")
grok(_, ",Farming status: %{WORD:farming_status},Total chia farmed: %{NUMBER:xch_count},User transaction fees: %{NUMBER},Block rewards: %{NUMBER},Last height farmed: %{NUMBER:last_farm_height},Plot count: %{NUMBER:total_plot},Total size of plots: %{NUMBER:total_plots_size_TiB} TiB,Estimated network space: %{NUMBER:network_space_PiB} PiB.*")
grok(_, ",Farming status: %{WORD:farming_status},Total chia farmed: %{NUMBER:xch_count},User transaction fees: %{NUMBER},Block rewards: %{NUMBER},Last height farmed: %{NUMBER:last_farm_height},Plot count: %{NUMBER:total_plot},Total size of plots: %{NUMBER:total_plots_size_PiB} PiB,Estimated network space: %{NUMBER:network_space_PiB} PiB.*")

#Convert numerical types
cast(farming_status, "str")
cast(xch_count, "float")
cast(last_farm_height, "float")
cast(total_plot, "float")
cast(total_plots_size_GiB, "float")
cast(total_plots_size_TiB, "float")
cast(total_plots_size_PiB, "float")
cast(network_space_PiB, "float")

#Drop original content
drop_origin_data()
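Likewise, you can confirm that farmer.sh is writing records in the comma-joined format these grok expressions expect (adjust the Chia version folder to match your installation):

# Each record should look like the sample above, e.g.
# ,Farming status: Farming,Total chia farmed: 0.0,...
tail -n 3 ~/AppData/Local/chia-blockchain/app-1.1.6/data_collect/farmer.log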

DataKit Log Collection Configuration

Enter the conf.d/log directory under the DataKit installation directory, copy logging.conf.sample, and rename the copy to logging.conf. Example:

Note: change both logfiles paths to the location of your Chia log files.

[[inputs.logging]]
    # required, glob logfiles
    logfiles = ['''C:\Users\Administrator\.chia\mainnet\log\debug.*''']

    # glob filter
    ignore = [""]

    # your logging source, if it's empty, use 'default'
    source = "chia_harvester"

    # add service tag, if it's empty, use $source.
    service = ""

    # grok pipeline script path
    pipeline = "chia_debug_log.p"

    # optional status:
    #   "emerg","alert","critical","error","warning","info","debug","OK"
    ignore_status = []

    # optional encodings:
    #    "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
    character_encoding = ""

    # The pattern should be a regexp. Note the use of '''this regexp'''
    # regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
    match = '''^\S'''

    [inputs.logging.tags]
    # tags1 = "value1"

[[inputs.logging]]
    # required, glob logfiles
    logfiles = ['''C:\Users\Administrator\AppData\Local\chia-blockchain\app-1.1.6\data_collect\farmer.log''']

    # glob filter
    ignore = [""]

    # your logging source, if it's empty, use 'default'
    source = "chia_farmer"

    # add service tag, if it's empty, use $source.
    service = ""

    # grok pipeline script path
    pipeline = "farm_log.p"

    # optional status:
    #   "emerg","alert","critical","error","warning","info","debug","OK"
    ignore_status = []

    # optional encodings:
    #    "utf-8", "utf-16le", "utf-16le", "gbk", "gb18030" or ""
    character_encoding = ""

    # The pattern should be a regexp. Note the use of '''this regexp'''
    # regexp link: https://golang.org/pkg/regexp/syntax/#hdr-Syntax
    match = '''^\S'''

    [inputs.logging.tags]
    # tags1 = "value1"

Restart DataKit for the configuration to take effect.
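How you restart depends on how DataKit was installed. A sketch, assuming DataKit is registered as a system service named datakit:

# Linux (systemd)
sudo systemctl restart datakit

# Windows: restart the DataKit service from an elevated PowerShell prompt
# Restart-Service datakit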

Monitoring Metrics Description

1 Harvesters

Monitor Harvesters across multiple hosts in real time, analyzing in detail the availability, status, and profits of each Harvester and continuously improving Chia users' control over them.

image.png

Metric Description | Name | Measurement Standard
Total Network Power | chia_farmer.network_space | None
Personal Power | chia_farmer.total_plots_size | None
Farm Height | chia_farmer.last_farm_height | None
Daily Expected Profit | chia_harvester.expected_xch | None
Challenges Searched | chia_harvester.count_eligible | None
Passed Preliminary Screening Quantity | chia_harvester.eligible_plots | None
Burst Block Quantity | chia_harvester.proofs_found | Performance Metric
Average Challenge Query Duration | chia_harvester.check_duration | Performance Metric
XCH Profit | chia_farmer.xch_count | None

Daily Expected Profit

Keep your plot count growing steadily so that your share of the total network space, and therefore your daily expected profit, does not fluctuate drastically. If you notice rapid changes in your daily expected profit, it may mean that a Harvester node has gone offline, or that the total network space has grown so quickly that your share has dropped. It is recommended to keep monitoring the daily expected profit and to set alerts on it to ensure stable income.
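As a rough sanity check, your expected daily profit is simply your share of the total network space multiplied by the daily block rewards. A back-of-the-envelope sketch using the numbers from the sample farm log above, assuming roughly 4,608 blocks per day and a 2 XCH farmer reward (parameters at the time of writing; adjust to current network values):

# share_of_network = plot_size_TiB / (network_space_PiB * 1024)
# expected XCH/day = share_of_network * blocks_per_day * reward_per_block
awk 'BEGIN {
    plots_tib = 13.561; network_pib = 12457.266
    share = plots_tib / (network_pib * 1024)
    printf "expected XCH/day: %.4f\n", share * 4608 * 2
}'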

Harvester Preliminary Screening Pass Rate

To ensure stable income, monitor the number of plots passing the preliminary screening (the plot filter). When the network and disks are healthy, the preliminary screening pass rate stays within a stable range without drastic fluctuations. If you find that the pass rate has changed drastically, check the disk and network state promptly to ensure stable income.
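Because Chia's plot filter lets roughly 1 in 512 plots through each challenge, the long-run average of eligible plots per challenge should stay close to total_plots / 512. A quick way to check this against the harvester log (default log path assumed):

# Average eligible plots per challenge over the last 1000 filter results;
# compare against total_plots / 512 (for 137 plots, about 0.27).
grep "plots were eligible for farming" ~/.chia/mainnet/log/debug.log | tail -n 1000 \
  | awk '{sum += $5; n++} END {if (n) printf "average eligible plots: %.2f over %d challenges\n", sum / n, n}'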

Average Challenge Query Duration

Pay attention to the average challenge query duration of each Harvester and set alerts to ensure that no lookup exceeds 5 seconds. If lookups do exceed 5 seconds, check the network and disk state promptly to ensure stable income.
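Slow lookups can also be spotted directly in the harvester log. A sketch that prints any challenge whose proof lookup took longer than 5 seconds (default log path assumed):

# "Time: X s." is the lookup duration parsed as check_duration in chia_debug_log.p;
# flag anything above the 5-second guideline.
grep "plots were eligible for farming" ~/.chia/mainnet/log/debug.log \
  | awk '{for (i = 1; i <= NF; i++) if ($i == "Time:" && $(i+1) + 0 > 5) print $1, "lookup took", $(i+1), "s"}'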

2 CPU Monitoring

CPU monitoring helps you analyze CPU load peaks and identify excessive CPU usage. Using these metrics you can decide whether to add CPU capacity or reduce load, find potential problems, and avoid unnecessary upgrade costs. CPU monitoring metrics also help you identify unnecessary background processes and determine the resource utilization of processes or applications and their impact on the system as a whole.

image.png

Metric Description | Name | Measurement Standard
CPU Load | system.load1, system.load5, system.load15 | Resource Utilization
CPU Usage | cpu.usage_idle, cpu.usage_user, cpu.usage_system | Resource Utilization

CPU Usage

CPU usage can be divided into User Time (the percentage of time spent executing user processes), System Time (the percentage of time spent executing kernel processes and interrupts), and Idle Time (the percentage of time the CPU is idle). As rules of thumb for CPU performance: the run queue per CPU should not exceed 3, and on a fully loaded CPU, User Time should be around 65%~70%, System Time around 30%~35%, and Idle Time around 0%~5%.
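On a Linux host you can spot-check these figures outside of DataFlux; a common rule of thumb is to compare the load averages against the number of CPU cores:

# Load averages (1, 5, 15 minutes) alongside the core count; sustained load
# well above the core count suggests CPU saturation.
echo "cores: $(nproc)"
uptime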

3 Memory Monitoring

Memory is one of the main factors affecting Linux performance. The adequacy of memory resources directly affects the usage performance of application systems.

image.png

Metric Description | Name | Measurement Standard
Memory Usage Percentage | mem.used_percent | Resource Utilization
Memory Usage | mem.free, mem.used | Resource Utilization
Memory Cache | mem.cached | Resource Utilization
Memory Buffer | mem.buffered | Resource Utilization

Memory Usage

It is important to closely monitor available memory, because contention for RAM inevitably leads to paging and performance degradation. To keep the machine running smoothly, ensure it has enough RAM for your workload; persistently low memory availability can lead to out-of-memory errors and other serious issues. Remedies include increasing the physical memory in the system or, where possible, enabling memory page merging.
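On Linux, the free command gives a quick cross-check of the mem.* metrics collected by DataKit:

# -m prints values in MiB; the "available" column is the best indicator of
# memory that can still be given to applications without swapping.
free -m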

4 Disk Monitoring

image.png

Metric Description | Name | Measurement Standard
Disk Health Status | disk.health, disk.pre_fail | Availability
Disk Space | disk.free, disk.used | Resource Utilization
Disk Inode | disk.inodes_free, disk.inodes_used | Resource Utilization
Disk Read/Write | diskio.read_bytes, diskio.write_bytes | Resource Utilization
Disk Temperature | disk.temperature | Availability
Disk Model | disk.device_model | Basic Information
Disk Read/Write Time | diskio.read_time, diskio.write_time | Resource Utilization

Disk Space

For any operating system, maintaining sufficient free disk space is essential. Besides regular application writes, core system processes store logs and other data on disk. You can configure alerts to warn you when available disk space drops below 15%, ensuring continuous operation.
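The 15% threshold can also be checked ad hoc from the shell. A small sketch that lists any mounted filesystem whose usage has crossed 85%:

# df -P produces stable, parseable output; column 5 is the Use% figure.
df -P | awk 'NR > 1 {gsub("%", "", $5); if ($5 + 0 > 85) print $6, "is", $5 "% full"}'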

Disk Read/Write Time

These metrics track the average time spent on disk read/write operations. A value greater than 50 milliseconds indicates relatively high latency (with less than 10 milliseconds being optimal), usually suggesting that business jobs should be transferred to faster disks to reduce latency. You can set different alert thresholds based on the server role, as acceptable thresholds vary by role.

Disk Read/Write

If your server hosts demanding applications, you will want to monitor disk I/O rates. The disk read/write metrics aggregate the read (diskio.read_bytes) and write (diskio.write_bytes) activity reported by each disk. Persistently high disk activity can lead to service degradation and system instability, especially when combined with high RAM and page file usage. When experiencing high disk activity, consider adding disks (especially when you see a large number of queued operations), using faster disks, increasing the RAM reserved for file system caching, or distributing the workload across more machines where possible.

Disk Temperature

If your business requires very high disk availability, you can set alerts to monitor the working temperature of disks continuously. Temperatures above 65°C (or 75°C for SSDs) deserve attention. If your hard drive overheating protection or temperature control mechanism fails, further increases in temperature could damage the hard drive and result in data loss.
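On hosts with smartmontools installed, most drives report their working temperature through SMART; the device path and attribute name vary by vendor, so treat this as a sketch:

# Print SMART attributes and pick out the temperature line, if the drive exposes one.
sudo smartctl -A /dev/sda | grep -i temperature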

5 Network Monitoring

Your applications and infrastructure components depend on increasingly complex architectures, whether you are running monolithic applications or microservices, deploying to cloud infrastructure, a private data center, or both. Virtualized infrastructure lets developers scale on demand, but it also creates dynamic network patterns that traditional network monitoring tools struggle to keep up with. To provide visibility into every component in your environment and all the connections between them, DataFlux provides network performance monitoring for the cloud era.

image.png

Metric Description | Name | Measurement Standard
Network Traffic | net.bytes_recv, net.bytes_sent | Resource Utilization
Network Packets | net.packets_recv, net.packets_sent | Resource Utilization
Retransmissions | net.tcp_retranssegs | Availability

Network Traffic

Together, these two metrics measure the total network throughput of a given network interface. Most consumer hardware ships with NICs rated at 1 Gbps or higher, so except in extreme cases the network is unlikely to become a bottleneck. You can set alerts to warn you when interface bandwidth utilization exceeds 80%, preventing network saturation (for a 1 Gbps link, roughly 100 megabytes per second).

Retransmissions

TCP retransmissions happen regularly and are not errors in themselves, but they can be a sign of trouble. Retransmissions are typically the result of network congestion and often correlate with high bandwidth consumption. You should monitor this metric because excessive retransmissions can cause significant application delays: if the sender does not receive acknowledgement of the packets it sent, it delays sending more (often for about a second), adding latency on top of the congestion itself.

If the retransmissions are not caused by network congestion, the source could be faulty network hardware; a low packet-drop count paired with a high retransmission rate can also point to excessive buffering. Regardless of the cause, tracking this metric helps you make sense of seemingly random fluctuations in network application response times.
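On Linux, the counters behind net.tcp_retranssegs come from the kernel's TCP statistics and can be inspected directly (assuming net-tools is installed):

# Cumulative TCP retransmission counters since boot; watch the rate of change
# rather than the absolute value.
netstat -s | grep -i retrans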

Conclusion

In this article, we covered some of the most useful metrics you can monitor while farming Chia. If you run a farming operation, keeping an eye on the metrics listed below will give you a good understanding of its operational health and availability:

  • Disk Read/Write Latency
  • Disk Temperature
  • Network Traffic
  • Daily Expected Profit
  • Harvester Preliminary Screening Pass Rate
  • Processes

Ultimately, you will recognize other metrics particularly relevant to your own use case. Of course, you can also learn more through Guance.
