How-To: Use of Analytics API for data collection - Solar Panel Inverter

I thought it would be nice to show a bit what I came across when showing my Solar Panel stats in Analytics. Also tried with moderate success the other routes Energy insight app? CSV to Graph and the HTML grabber (which was not succesful).

Intention:
My Solar Panels are connected with an inverter which has a web interface where it shows the current stats. From searching the web I found out best way to get this data was to go to the webserver and get the */js/status.js file. In that file the current stats are included in a string called webData. As a reference the web interface shows the status page as well.

This data must be transformed into a json constuct which will be the input for the fact table in Analytics in a certain dataset.

From the crontab every 10 minutes during the day the inverter is polled.

  • But the solar panels only deliver data when the power level on the panels are not equal to 0 W.

The data is fed into the fact table of Analytics which is a kind of datalake solution where all users feed their data per dataset in two dimensions where (only two out of three are currently available in the interface). The definition of the table in the database reveals some limitations.

MariaDB [nc]> desc oc_analytics_facts ;
+------------+---------------+------+-----+---------+----------------+
| Field      | Type          | Null | Key | Default | Extra          |
+------------+---------------+------+-----+---------+----------------+
| id         | int(11)       | NO   | PRI | NULL    | auto_increment |
| user_id    | varchar(64)   | NO   |     | NULL    |                |
| dataset    | int(11)       | NO   | MUL | NULL    |                |
| dimension1 | varchar(64)   | YES  |     | NULL    |                |
| dimension2 | varchar(64)   | YES  |     | NULL    |                |
| dimension3 | decimal(10,0) | YES  |     | NULL    |                |
| value      | decimal(15,2) | YES  |     | 0.00    |                |
| timestamp  | int(11)       | YES  |     | 0       |                |
+------------+---------------+------+-----+---------+----------------+
  • index not based on dimensions or values
  • dimension1 and dimension2 are varchar
  • timestamp is not a timestamp but an int used as a epoch timestamp. Analytics fills this field.

All logic will be written in a bash script which is run from the crontab of the web user (the same crontab is used by Nextcloud for maintenance).

Step 1: Gather data
I succeeded to put the relevant data in a script variable with roughly this command:

 value_array=$(curl -s --connect-timeout 20 --retry 3 --max-time 120 http://inverter/js/status.js | \
    tr ';\|{' '\n' | grep -i "^var webData.*\"" | tr ',' 'x' | \
    sed -e 's/^var webData=.* x1500x//;s/x"//')

This will give me a ‘x’ delimited line with [‘Power Level’x’Yield Today’x’All Yield’x’Error Code’x’Data Age’]

value_array=34x345x543210xF03x4

I used an “x” delimited line instead of comma separated because in the json file I need comma’s. Originally I used sed to convert the string to json by replacing the x with fixed text. Later I chose a different way as shown below.

The data however are all integers. When I compare the given data with the status page from the inverter, I see that the “Yield Today” is expressed in [kWh] but the number given has to be divided by 100. The “All Yield” must be divided by 10 to express it in [kWh]. Error Code and Data Age are irrelevant (F03x4). After gathering data I kind of figured out that the (maximum) interval of Data Age is approx 10 minutes, which I used to schedule the polling.

Sidenote, the string Error Code (with a value like F03 when the system sleeps) cannot be stored in Value since it is a string and not a float.

Step2: Transform data to json
For the fact table we need to convert the dataset into something like this:

{"data":[
{"dimension1":"Power Level [W]","dimension2":"%now%","value":"(A)"},
{"dimension1":"Yield Today [kWh]","dimension2":"%now%","value":"(B)"},
{"dimension1":"All Yield [kWh]","dimension2":"%now%","value":"(C)"}
]}

Now these 3 Aspects will be available in the dataset:

  • Value (A) is the first element from the string in our example 34, but in case it is 0 we do not need to make a json message at all (except once a day).
  • Value (B) is the second element from the string which is 345 (${part2}), but it has to be divided by 100 to express it in kWh like 3.45. So we call bc to divide it:

part_2 = $(bc <<< $(echo “scale=2;${part2}/100”))

  • Value (C) is the third element from the string 543210 (${part3}), but it has to be divided by 10 to express it in kWh as 54321.0. So we call bc to divide it:

part_3 = $(bc <<< $(echo “scale=1;${part3}/10”))

  • The rest of the json file is actually rather plain text and can be inserted by replacing “^” or “x” in the string (depending on position). Or just creating a fixed text string inserting the variable parts.

With this json construction we need to post the data to the API. Basically all kind of data can be fed to the fact table with this construction. By using %now% as date/time (dimension2) we can create timelines from the dataset aspects using a filter on the aspect.

Step 3: Setup DataSet and REST API endpoint
In Analytics choose “Dataset Maintenance” and press “+ New Dataset”, I named the dataset Solar Panel and left the column names as is. Note: now we can also use these names instead of the technical table field names in the json file - did not verify that.
After the Dataset is created, it is assigned an dataset ID (linked to the field dataset in the fact table).
Click on the icon next to the dataset to enter the “Advanced configuration”. Here we can review the Dataset characteristics (first tab), input data (second tab) manually, by importing some data from file or clipboard, but we are looking for the REST API. Clicking it will show a clipboard with a note:

Use this endpoint to submit data via an API:
/nextcloud/index.php/apps/analytics/api/3.0/data/ID/add

This is the URL where the json construct needs to be posted to.

Step 4: Data collection
As said now the datacollection must take place based on a crontab script (initiated from the cloudserver itself, but outside Nextcloud) and the data needs to flow into the fact table on regular basis (but only when relevant) into the API endpoint.

Now curl can do the job, but there is a caveat or two.
First: curl needs to post data using your account: leaving the credentials lying somewhere in a crontab or bash script.
Second: I used the –data argument to post the actual data as output from a Bash call. But because of the way curl handles this argument (as a memory built construction based on script variables, including quotes and other special characters to Bash), I kept getting a

{“success”:true,“message”:“No -data- parameter”}

even with a imho correct json. This is caused by curl transforming the json instead of using it literally. Some web-search pointed to this solution:

curl -u <USER>:<PASSWORD> -s  -H "Content-Type: application/json" -H "Accept: application/json" \
     --data "$(preparepostdata)" \
     https://mycloudserver/nextcloud/index.php/apps/analytics/api/3.0/data/<ID>/add

The –data part now is a quoted call to function preparepostdata in the script. This function contains:

preparepostdata()
{       # convert value_array into json message (with conversion from integer to float)
        # has to be done in a file-like fashion to avoid inability to parse data on commandline of cURL.
cat <<EOF
{"data":[
  {"dimension1":"Power Level [W]","dimension2":"%now%","value":"${part_1}"},
  {"dimension1":"Yield Today [kWh]","dimension2":"%now%","value":"${part_2}"},
  {"dimension1":"All Yield [kWh]","dimension2":"%now%","value":"${part_3}"}
]}
EOF
}

Initially I used a -f translated into a ${force} variable. So I chose only to send data when the power level ${part_1} had a non zero value or when the script is called with the “-f” parameter, allowing it to also send the day’s end from the crontab.
But when plotting the graph I observed:

  1. The power graph starts with a the first non zero value and ends with a non zero value. Between those two points a straight line is drawn which suggest there was power where there is not. To have a sane graph I would need one point with 0 Watt before the start of the new production and after the last new data input.
  2. The “Daily Yield [kWh]” is a daily accumulating value which is reset midnight. The steepness of this curve is related to the Power level, so after production it becomes level until midnight.
  3. The "Daily Yield [kWh] is added to the “All Yield [kWh]”
  4. The data scale of the “All Yield [kWh]” is different from the rest, making it hard to read in one graph.

I cannot know when the first datapoint will be recorded to add one “empty value” just before that nor what the last relevant measurement is to add an extra “empty value”. So I now choose to send all data points even when nothing changes. Now the crontab has:

# Call the Inverter Poll Script every 10 minutes
*/10 * * * * /usr/bin/bash -c /var/lib/wwwrun/json.post.solar

Now I am waiting for enough datapoints and some knowhow on how to build nice graphs based on the dataset using filters on dimension1 and conditions on value… But that would be another post.

1 Like