My understanding of publish/subscribe is very limited, but does it allow selectively copying only some of the data from a table, e.g. only custom_json ops with a certain ID?
It doesn't, but there's a simple answer for this: run a HAF app on the server that just generates tables with the desired data, then publish those tables.
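To make that a bit more concrete, here's a rough sketch of the publish/subscribe half, assuming it maps onto PostgreSQL logical replication (`CREATE PUBLICATION` on the HAF server, `CREATE SUBSCRIPTION` on the small database). The schema, table, publication, and subscription names and the connection string are all made up for illustration; the HAF app that actually fills the table is not shown.

```python
# Sketch only: builds the two SQL statements for PostgreSQL logical replication.
# All names below (polls_app, polls_ops, polls_pub, polls_sub, the host) are
# hypothetical placeholders, not part of HAF itself.

def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a PostgreSQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def publication_sql(pub_name: str, tables: list[tuple[str, str]]) -> str:
    """Statement to run on the publishing (HAF) server.

    `tables` is a list of (schema, table) pairs generated by the HAF app.
    """
    table_list = ", ".join(
        f"{quote_ident(schema)}.{quote_ident(table)}" for schema, table in tables
    )
    return f"CREATE PUBLICATION {quote_ident(pub_name)} FOR TABLE {table_list};"


def subscription_sql(sub_name: str, conninfo: str, pub_name: str) -> str:
    """Statement to run on the small subscribing database."""
    return (
        f"CREATE SUBSCRIPTION {quote_ident(sub_name)} "
        f"CONNECTION {quote_literal(conninfo)} "
        f"PUBLICATION {quote_ident(pub_name)};"
    )


if __name__ == "__main__":
    # On the HAF server: publish only the table the app generated.
    print(publication_sql("polls_pub", [("polls_app", "polls_ops")]))
    # On the subscriber: pull that table from the HAF server.
    print(
        subscription_sql(
            "polls_sub",
            "host=haf.example.com dbname=haf_block_log user=replicator",
            "polls_pub",
        )
    )
```

Two things to keep in mind with plain PostgreSQL logical replication: the publisher needs `wal_level = logical`, and the table has to already exist on the subscriber with a matching definition, since only the data is replicated, not the schema.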
Of much more interest to me is any feedback on the proposed overall approach: an API layer made up of a large number of small databases, which get their data from a much smaller number of full HAF servers. Do you have any thoughts on that as an approach?
I've always envisioned something along these lines, but I also suspect you're overestimating the cost of running a HAF server because you're only looking at "full" HAF servers that capture all blockchain data.
We're also developing quite lightweight HAF servers that will only require a few gigabytes of space. That's still not "tiny", of course, but it would be quite easy to use the publish/subscribe features to further reduce the amount of storage required on subscribing databases.
So what I ultimately see happening is people running lightweight HAF servers (i.e. ones that filter out the operations/transactions they are not interested in), with some of the tables in those databases replicated to subscribing databases. To put this into perspective, just filtering out splinterlands and hive engine transactions makes for a very comfortably sized database.
For sure. When we wanted to develop the polls protocol using a regular HAF app, the limiting factor wasn't HAF (which we could fill with only the data we need) but hived. We couldn't run a tiny HAF without hived, so we started looking for some other way. That is how the idea of the standalone HAF app came about.
Sounds like a viable way to do it. Just thinking it through for the polls protocol: if we used publish/subscribe, we'd need hived + HAF filled with polls custom_jsons and account data, and then we could subscribe to those particular tables. But we'd still end up needing hived. Is there any way you can see us avoiding hived in this kind of setup?