This thesis describes the development of tools for integrating and analyzing data for complete protein superfamilies. These data and tools are applied to facilitate protein engineering as well as diagnostics of human protein mutations. A huge amount of protein-related data is publicly available online, but it remains difficult to make sense of these data because they are scattered across different resources and heterogeneous datasets are hard to relate to one another. This thesis describes the effort to extract and connect these data for whole protein families. The integrated data provide insight into these proteins and can help in developing more efficient enzymes. The same data can also be used to train machine learning models that help distinguish disease-causing from natural genetic variation in human proteins.