The Human Protein Atlas allows you to download very detailed data for
each protein in the form of an xml file, and hpaXmlGet and
hpaXml let you retrieve those files automatically from the
HPA server and parse them. However, due to a technical limitation, you
cannot save the resulting "xml_document"/"xml_node"
objects directly from R. The question is: how do you keep a copy of
these files to use when you are not connected to the internet, or for
reproducibility?
The “Downloadable data” page on the HPA website shows how these files
are downloaded. Basically, you append /[ensembl_id].xml to
http://www.proteinatlas.org to download an individual entry
(that’s what hpaXmlGet does behind the scenes), or you download
the whole set at once.
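As a rough sketch of that URL pattern in R: the helper names hpa_xml_url() and download_hpa_xml() below are hypothetical (they are not part of HPAanalyze), and ENSG00000131979 is used only as an example Ensembl ID. The download step needs an internet connection, so it is wrapped in a function rather than run immediately.

```r
# Hypothetical helper: build the URL of a single HPA entry by
# appending "/[ensembl_id].xml" to the site address.
hpa_xml_url <- function(ensembl_id, base_url = "http://www.proteinatlas.org") {
  paste0(base_url, "/", ensembl_id, ".xml")
}

# Hypothetical helper: save one entry to a local file for offline use.
# Requires an internet connection when it is called.
download_hpa_xml <- function(ensembl_id,
                             destfile = paste0(ensembl_id, ".xml")) {
  download.file(url = hpa_xml_url(ensembl_id), destfile = destfile, mode = "wb")
  invisible(destfile)
}

# Building the URL itself works offline.
entry_url <- hpa_xml_url("ENSG00000131979")
print(entry_url)
```

Calling download_hpa_xml("ENSG00000131979") while online would then leave a copy of that entry in the working directory.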
From there, you can import the downloaded file using
xml2::read_xml(). The output should be exactly the same as what
hpaXmlGet returns.
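A minimal sketch of the import step follows. So that it runs without a network connection, it first writes a tiny placeholder file; that placeholder is an assumption made purely for illustration and does not reflect the real HPA schema. With a real download, you would pass the path of the saved entry to read_xml() instead.

```r
library(xml2)

# Placeholder standing in for a downloaded HPA entry (NOT the real schema).
local_file <- tempfile(fileext = ".xml")
writeLines(
  c('<?xml version="1.0" encoding="UTF-8"?>',
    '<proteinAtlas><entry><name>placeholder</name></entry></proteinAtlas>'),
  local_file
)

# Import the local file; no internet connection is needed at this point.
imported_xml <- read_xml(local_file)
class(imported_xml)

# Individual nodes of the imported document can be queried as usual.
entry_name <- xml_text(xml_find_first(imported_xml, "//name"))
```

Keeping the xml file itself on disk, rather than the parsed R object, is what makes the workflow repeatable offline: the file can be re-imported in any later session.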
hpaXml functions
Since the umbrella function hpaXml takes either the
Ensembl ID or an imported xml_document object,
you can feed it what you just imported and get the expected
result.
You can use the other hpaXml functions in the same way.
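The sketch below strings these steps together. It assumes that HPAanalyze and xml2 are installed and that an entry was previously saved as GCH1.xml (a hypothetical file name); the wrapper parse_local_hpa_xml() is likewise hypothetical and only groups the calls.

```r
# Hypothetical wrapper: parse a locally saved HPA entry without
# contacting the HPA server.
parse_local_hpa_xml <- function(path) {
  if (!file.exists(path)) {
    stop("No local xml file found at: ", path)
  }
  imported_xml <- xml2::read_xml(path)
  list(
    # Umbrella function, called with its default settings.
    all = HPAanalyze::hpaXml(imported_xml),
    # Individual extractors accept the same imported object.
    prot_class = HPAanalyze::hpaXmlProtClass(imported_xml),
    antibody = HPAanalyze::hpaXmlAntibody(imported_xml)
  )
}

# Only attempt the parse when the (hypothetical) local copy exists.
if (file.exists("GCH1.xml")) {
  gch1 <- parse_local_hpa_xml("GCH1.xml")
  str(gch1, max.level = 1)
}
```

Checking for the file up front gives a clear error when the local copy is missing, instead of a less obvious failure inside the parser.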
Anh Tran, 2018-2025
Please cite: Tran, A.N., Dussaq, A.M., Kennell, T. et al. HPAanalyze: an R package that facilitates the retrieval and analysis of the Human Protein Atlas data. BMC Bioinformatics 20, 463 (2019) https://doi.org/10.1186/s12859-019-3059-z