RE: Text analytics reveal thirty two percent of comments on hive are not unique and at least ten percent add no value to discussion

Good idea, as it would be a lot to process on top of the mass of transactions the chain already handles every day.

PowerBI is so useful. I haven't dealt with it much and have mostly used Excel, but PowerBI is on my list to learn.

Distributing it would be good so people can see the stats, as would hosting it on GitHub and displaying it on a web interface. Keen to see more posts on this whenever you feel the need to share progress with anyone else who is interested.

PBI is useful, but can be a royal pain to use. I think for this, I'll have a data pipeline something like...

Each week, grab the comments table for the most recent period. (sql)
Append it to the existing data. (python)
Complete my feature engineering and analysis (pbi)
Join it to any other relevant data (python)
Identify any new edge cases. (my brain)
Run the analytical compute. (probably python)
Update table, refresh dashboard, publish interesting bits on hive. (not sure what tech I'll use)
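
As a rough illustration of the first two steps plus a stand-in for the analysis step, here is a minimal sketch in Python. Everything in it is an assumption rather than the actual setup: it uses the standard library's `sqlite3` in place of whatever database the comments really live in, a hypothetical `comments` table with `id`, `author`, `body` and `created` columns, and a hypothetical `duplicate_share` measure (exact matches after trimming and lower-casing) that is far cruder than real text analytics.

```python
import sqlite3

# Hypothetical schema; the real comments table will have different columns.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS comments ("
    "id INTEGER PRIMARY KEY, author TEXT, body TEXT, created TEXT)"
)


def fetch_recent_comments(source, since):
    # Step 1 (sql): pull comments created on or after `since`.
    # Dates are assumed to be ISO 8601 text, so string comparison works.
    query = "SELECT id, author, body, created FROM comments WHERE created >= ?"
    return source.execute(query, (since,)).fetchall()


def append_comments(store, rows):
    # Step 2 (python): append to the existing data. INSERT OR IGNORE skips
    # ids that are already stored, so re-running a week adds nothing twice.
    store.execute(SCHEMA)
    before = store.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    store.executemany("INSERT OR IGNORE INTO comments VALUES (?, ?, ?, ?)", rows)
    store.commit()
    after = store.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    return after - before  # number of rows actually added


def duplicate_share(store):
    # Stand-in for the analytical step: fraction of stored comments whose
    # trimmed, lower-cased body also appears on at least one other comment.
    total = store.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    if total == 0:
        return 0.0
    dupes = store.execute(
        "SELECT COALESCE(SUM(n), 0) FROM ("
        "SELECT COUNT(*) AS n FROM comments "
        "GROUP BY lower(trim(body)) HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return dupes / total
```

A weekly run would then be something like `append_comments(store, fetch_recent_comments(source, "2024-01-08"))` followed by `duplicate_share(store)`, where `source` and `store` are open connections and the date is just a placeholder. Keying the append on the comment id is what makes it safe to re-run a week after a failure.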