Using the API
The API provides two primary endpoints for discovering, accessing, and analyzing data. They share the same query logic, so the same query object can be used to retrieve both results and statistics.
Building a Query
The API allows users to build custom queries using MongoEngine query syntax:
Fields:
Nested fields can be accessed using dot (.) notation:
metadata.assembly_info.assembly_level
--> assembly level field of the assemblies data model
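The same dotted path can be used on the client side to read a value out of a returned record. The snippet below is a minimal sketch: get_nested is a hypothetical helper, and the record shape is assumed from the field path shown above rather than taken from the API schema.

```python
from typing import Any


def get_nested(record: dict, dotted_path: str, default: Any = None) -> Any:
    """Follow a dot-separated path through nested dicts; return default if a key is missing."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical record, shaped after the field path used in this section.
assembly = {"metadata": {"assembly_info": {"assembly_level": "Chromosome"}}}
level = get_nested(assembly, "metadata.assembly_info.assembly_level")
```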
Queries:
__exists
: whether a field exists in an object of a data model
__gt, __lt, __gte, __lte
: retrieve the objects whose field value falls within a range; useful with dates and numbers:
/api/assemblies?metadata.assembly_info.release_date__lte=2024/12/01&metadata.assembly_info.release_date__gte=2022/12/01&metadata.assembly_stats.contig_N50__exists&metadata.assembly_info.assembly_level=Chromosome
This query retrieves all assemblies released between 2022/12/01 and 2024/12/01 that have a contig_N50 field in the nested assembly_stats field and whose assembly_level is Chromosome.
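A query like the one above can also be assembled programmatically from a dictionary of filters. This is a sketch under a few assumptions: the base URL is the one used in the Example 2 links of this page, build_query_url is a hypothetical helper, and the __exists filter is written with an explicit =true value as in Example 2.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the Example 2 links in this page.
BASE_URL = "https://genome.crg.es/ebp/api"


def build_query_url(data_model: str, filters: dict, base_url: str = BASE_URL) -> str:
    """Return the /<data_model> URL with the filters encoded as a query string."""
    # safe="/" keeps date separators such as 2022/12/01 unescaped in the URL.
    query = urlencode(filters, safe="/")
    url = f"{base_url}/{data_model}"
    return f"{url}?{query}" if query else url


filters = {
    "metadata.assembly_info.release_date__gte": "2022/12/01",
    "metadata.assembly_info.release_date__lte": "2024/12/01",
    "metadata.assembly_stats.contig_N50__exists": "true",
    "metadata.assembly_info.assembly_level": "Chromosome",
}
url = build_query_url("assemblies", filters)
```

Keeping the filters in a dictionary makes it easy to add or drop a condition without editing a long URL by hand.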
Endpoints
/<data_model>
This endpoint retrieves all data related to the specified data model in either JSON or TSV format:
/api/organisms
/api/biosamples
/api/assemblies
/api/local_samples
/api/annotations
/api/experiments
/stats/<data_model>/<field_to_query>
This endpoint returns value counts for a specified field within the given data model:
/api/stats/organisms/insdc_status
/api/stats/biosamples/metadata.lifestage
/api/stats/assemblies/metadata.assembly_info.assembly_level
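Because both endpoints share the same query logic, one filter dictionary can drive a results request and a statistics request. The sketch below only builds the two URLs; the base URL is assumed from the Example 2 links, and results_url and stats_url are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the Example 2 links in this page.
BASE_URL = "https://genome.crg.es/ebp/api"


def _with_query(url, filters):
    """Append the encoded filters to url, if there are any."""
    query = urlencode(filters or {})
    return f"{url}?{query}" if query else url


def results_url(data_model, filters=None):
    """URL for the /<data_model> endpoint."""
    return _with_query(f"{BASE_URL}/{data_model}", filters)


def stats_url(data_model, field, filters=None):
    """URL for the /stats/<data_model>/<field_to_query> endpoint."""
    return _with_query(f"{BASE_URL}/stats/{data_model}/{field}", filters)


# The same query object is reused for both requests.
filters = {"metadata.assembly_stats.contig_N50__exists": "true"}
assemblies = results_url("assemblies", filters)
level_counts = stats_url("assemblies", "metadata.assembly_info.assembly_level", filters)
```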
Query parameters
Valid for all data models
taxon_lineage
: filter data under an NCBI taxonomic identifier
format
: retrieve results in tsv or json format (used only in the /<data_model> endpoint)
fields[]
: list of selected columns for the tsv (used only in the /<data_model> endpoint)
For data-model-specific fields, check the API schema docs.
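The parameters above can be combined to request a TSV with only a few columns. The following is a sketch, not a confirmed client: it assumes that fields[] is passed as a repeated parameter (fields[]=a&fields[]=b), that the TSV body starts with a header row, and that 40674 (the NCBI taxon ID for Mammalia) is a suitable taxon_lineage value. tsv_url and parse_tsv are hypothetical helpers.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed base URL, taken from the Example 2 links in this page.
BASE_URL = "https://genome.crg.es/ebp/api"


def tsv_url(data_model, fields, filters=None):
    """Build a /<data_model> URL that asks for TSV output with selected columns."""
    params = dict(filters or {})
    params["format"] = "tsv"
    params["fields[]"] = list(fields)
    # doseq=True repeats the key once per list item; safe="[]" keeps the brackets readable.
    return f"{BASE_URL}/{data_model}?{urlencode(params, doseq=True, safe='[]')}"


def parse_tsv(text):
    """Parse a TSV body into a list of dicts (assumes the first row is a header)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


url = tsv_url(
    "assemblies",
    ["metadata.assembly_info.assembly_level", "metadata.assembly_info.release_date"],
    {"taxon_lineage": "40674"},
)
```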
Examples
Here are some examples of how to explore and analyse data.
Example 1
Filter all the biosamples under Mammalia with a collection_date field between 2020-12-31 and 2023-12-31 and with the sex field equal to male:
JSON:
TSV:
Now we want to retrieve statistics for the ENA-FIRST-PUBLIC metadata field using the query above:
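As a sketch of how the three requests in this example could be assembled, the snippet below builds the JSON, TSV, and statistics URLs from a single query. Several names are assumptions: the base URL comes from the Example 2 links, 40674 is the NCBI taxon ID for Mammalia, and the field paths metadata.collection_date, metadata.sex, and metadata.ENA-FIRST-PUBLIC follow the metadata.<field> pattern used elsewhere on this page rather than a confirmed schema.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the Example 2 links in this page.
BASE_URL = "https://genome.crg.es/ebp/api"

# Assumed field paths and taxon ID (40674 = Mammalia in the NCBI taxonomy).
query = {
    "taxon_lineage": "40674",
    "metadata.collection_date__gte": "2020-12-31",
    "metadata.collection_date__lte": "2023-12-31",
    "metadata.sex": "male",
}

# Same query, two output formats on the /<data_model> endpoint.
json_url = f"{BASE_URL}/biosamples?{urlencode({**query, 'format': 'json'})}"
tsv_url = f"{BASE_URL}/biosamples?{urlencode({**query, 'format': 'tsv'})}"

# Same query again on the stats endpoint; format is not used there.
stats_url = f"{BASE_URL}/stats/biosamples/metadata.ENA-FIRST-PUBLIC?{urlencode(query)}"
```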
Example 2
Working with metadata can be difficult, and metadata for the same data model often differs. For instance, a biosample submitted to NCBI will have a different set of metadata than one submitted to ENA.
The API allows users to check for these differences via the __exists query. For instance, suppose we want all the biosamples that have the metadata field ENA-CHECKLIST:
https://genome.crg.es/ebp/api/biosamples?metadata.ENA-CHECKLIST__exists=true
Then we take those that lack the ENA-CHECKLIST metadata field; we can assume that biosamples lacking this field were submitted to NCBI:
https://genome.crg.es/ebp/api/biosamples?metadata.ENA-CHECKLIST__exists=false
Now we can retrieve cleaner statistics. For instance, we may want to retrieve the geographical locations of the biosamples submitted to ENA:
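A request of this kind could be built by combining the stats endpoint with the __exists filter. This is only a sketch: the location field name below is a hypothetical placeholder (check the API schema docs for the real one), and biosample_stats_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://genome.crg.es/ebp/api"

# Hypothetical placeholder: replace with the actual location field from the schema.
GEO_FIELD = "metadata.geographic_location"


def biosample_stats_url(field, filters):
    """Build a /stats/biosamples/<field> URL restricted by the given filters."""
    return f"{BASE_URL}/stats/biosamples/{field}?{urlencode(filters)}"


# Biosamples with the ENA checklist field (assumed to be ENA submissions).
ena_locations = biosample_stats_url(GEO_FIELD, {"metadata.ENA-CHECKLIST__exists": "true"})

# Biosamples without it (assumed to be NCBI submissions), for comparison.
ncbi_locations = biosample_stats_url(GEO_FIELD, {"metadata.ENA-CHECKLIST__exists": "false"})
```

Splitting the statistics by submission source this way avoids mixing counts from records that describe location with different metadata fields.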