In Part 1 we set up a project to translate Notion pages using the Azure Cognitive Services Translator API, and started by translating the title and printing the translation. In this part, we'll look into translating the most common text blocks and creating a new Notion page with the translation.
The basic structure of the code from Part 1 is:
- NotionClient: class to handle interactions with the Notion API
- TranslatorClient: class to call the Translator API
- NotionTranslator: class to translate pieces of Notion pages
- main: function to set up the above classes and make the magic happen
In Part 2, we'll expand on NotionClient and NotionTranslator, and make some minor changes to main in order to call our new functionality instead of just printing out the translated title. The other bits will stay the same!
You can duplicate this simple page for use with this example code here, or make your own.
We'll start this tutorial by adding some functionality to NotionClient so that we can get the text of blocks in order to translate them. To keep this tutorial straightforward, we will focus on the most common block types, paragraph and heading_1 through heading_3, and just get the plain text out of them so we can translate it. We'll worry about formatting in a later tutorial.
To get the blocks, we need to make a Notion API call. A page could have many blocks, and the Notion API will return a maximum of 100 per request, so we also want to be able to handle making multiple requests to fetch all of the blocks. The blocks API, like all other paginated APIs in Notion, takes optional start_cursor and page_size parameters that you can use to fetch multiple pages.
class NotionClient():
    # ... Existing Code ...

    def get_notion_blocks(self, block_id, start_cursor=None, page_size=None):
        # Fetch one page of child blocks for the given block (or page) ID
        url = f'https://api.notion.com/v1/blocks/{block_id}/children'
        params = {}
        # Only send the pagination parameters that were actually provided
        if start_cursor is not None:
            params["start_cursor"] = start_cursor
        if page_size is not None:
            params["page_size"] = page_size
        response = self.session.get(url, params=params)
        return response.json()
This will return a fairly wordy object which ultimately has a list of the blocks (up to page_size of them, or 100 if it isn't set) and their content.
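To give a sense of what comes back, here is a trimmed-down sketch of the response shape. The top-level field names (object, results, has_more, next_cursor) follow the Notion API's paginated list format; the block IDs, contents, and cursor value are made up for illustration, and real blocks carry many more fields.

```python
# Trimmed, hand-written example of a paginated block list response.
# The IDs and the cursor are placeholders, not real values.
sample_response = {
    "object": "list",
    "results": [
        {"object": "block", "id": "block-1", "type": "heading_1",
         "heading_1": {"rich_text": [{"plain_text": "Hello"}]}},
        {"object": "block", "id": "block-2", "type": "paragraph",
         "paragraph": {"rich_text": [{"plain_text": "World"}]}},
    ],
    "has_more": True,
    "next_cursor": "placeholder-cursor",
}

block_types = [block["type"] for block in sample_response["results"]]
print(block_types)                  # ['heading_1', 'paragraph']
print(sample_response["has_more"])  # True, so another request is needed
```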
Once we've fetched the blocks from the API, we'll need to pull out just the text so that we can translate it. We will focus on the plain_text property of paragraph, heading_1, heading_2, and heading_3 blocks, which could be split into multiple segments if there is some formatting in the original.
Notion represents the content of various kinds of text blocks as lists of Rich Text Objects, each of which can have separate formatting. A rich text object looks like this:
"rich_text": [
  {
    "type": "text",
    "text": {
      "content": "Notion (productivity software)",
      "link": null
    },
    "annotations": {
      "bold": false,
      "italic": false,
      "strikethrough": false,
      "underline": false,
      "code": false,
      "color": "default"
    },
    "plain_text": "Notion (productivity software)",
    "href": null
  }
]
If one word in the middle of a paragraph were underlined, you'd get 3 items in the list: the text before the underlined word, the underlined word itself (with "underline" set to true), and the text after the underlined word.
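As a quick sketch of that three-segment case, the list below is a hand-written example trimmed to just the fields that matter here (real rich text objects carry the full set of fields shown earlier). Joining the plain_text values recovers the original sentence without its formatting.

```python
# Hand-written rich text list for "This is underlined text".
# Only the middle segment has underline set to True.
rich_text = [
    {"plain_text": "This is ", "annotations": {"underline": False}},
    {"plain_text": "underlined", "annotations": {"underline": True}},
    {"plain_text": " text", "annotations": {"underline": False}},
]

# Joining plain_text across segments gives back one unformatted string.
full_text = "".join(segment["plain_text"] for segment in rich_text)
print(len(rich_text))  # 3
print(full_text)       # This is underlined text
```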
get_text is a function to turn those pieces of rich text back into one string that we can send to the translation API.
class NotionClient():
    # ... Existing Code ...

    def get_text(self, rich_text_object):
        # Concatenates a rich text array into plain text
        text = ""
        for rt in rich_text_object["rich_text"]:
            text += rt["plain_text"]
        return text
We'll call that from a function that handles the logic for supported block types. There are other block types we could add to make this script more robust, some of which might need their text extracted in a different way, such as image captions.
class NotionClient():
    # ... Existing Code ...

    def get_block_text(self, block):
        # Returns the plain text of supported blocks, or None for anything else
        if block["type"] == "paragraph":
            return self.get_text(block["paragraph"])
        if block["type"] == "heading_1":
            return self.get_text(block["heading_1"])
        if block["type"] == "heading_2":
            return self.get_text(block["heading_2"])
        if block["type"] == "heading_3":
            return self.get_text(block["heading_3"])
        return None
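Since all four supported block types store their text the same way, one possible variation is to keep the supported types in a set and look up the content by type. This is a standalone sketch rather than a change to the code above; SUPPORTED_TEXT_BLOCKS and the free-standing function are names made up for illustration.

```python
# Hypothetical constant: block types whose text lives under "rich_text".
SUPPORTED_TEXT_BLOCKS = {"paragraph", "heading_1", "heading_2", "heading_3"}


def get_block_text(block):
    """Return the block's plain text, or None for unsupported block types."""
    block_type = block["type"]
    if block_type not in SUPPORTED_TEXT_BLOCKS:
        return None
    # The content object is stored under a key named after the block type.
    rich_text = block[block_type]["rich_text"]
    return "".join(segment["plain_text"] for segment in rich_text)


heading = {"type": "heading_2",
           "heading_2": {"rich_text": [{"plain_text": "Intro"}]}}
divider = {"type": "divider", "divider": {}}
print(get_block_text(heading))  # Intro
print(get_block_text(divider))  # None
```

With this layout, supporting another block type that stores its text the same way would be a one-line change to the set.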
At this point, NotionClient has a function to get the blocks and a function we can use to pull out the raw text.
The next step is to fetch all blocks from a page and send the text to the translation API. To do this, we'll add one more small function to NotionClient to handle updating the text of a block without affecting any of its other properties.
class NotionClient():
    # ... Existing Code ...

    def update_block_text(self, block, new_text):
        # Replaces the block's rich text with a single plain segment (in place)
        block[block["type"]]["rich_text"] = [
            {"type": "text", "text": {"content": new_text}}]
        return block
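As a quick illustration of what this does to a block, the sketch below runs the same logic as a standalone function on a hand-written heading. Note that the block is modified in place, and that any formatting annotations are dropped because the text is replaced with a single plain segment. The sample block is made up and trimmed.

```python
def update_block_text(block, new_text):
    # Same logic as the method above, written as a free function for the demo.
    block[block["type"]]["rich_text"] = [
        {"type": "text", "text": {"content": new_text}}]
    return block


block = {
    "type": "heading_1",
    "heading_1": {
        "rich_text": [
            {"type": "text", "text": {"content": "Hello"},
             "annotations": {"bold": True}, "plain_text": "Hello"},
        ],
        "color": "default",
    },
}

updated = update_block_text(block, "Bonjour")
print(updated is block)               # True: the original dict was modified
print(updated["heading_1"]["color"])  # default (other properties are kept)
print(updated["heading_1"]["rich_text"])
```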
The rest of this section updates NotionTranslator. While we work through translation, we'll be going back and forth between two APIs: we'll fetch some blocks from Notion, translate them if they have text, and then fetch more blocks from Notion.
class NotionTranslator():
    # ... Existing Code ...

    def translate_blocks(self, blocks):
        translated_blocks = []
        for block in blocks:
            source_text = self.notion_client.get_block_text(block)
            if source_text is not None:
                # Supported text block: translate it and swap in the new text
                translated = self.translate_client.translate(
                    source_text, self.source_language, self.target_language)
                translated_blocks.append(
                    self.notion_client.update_block_text(block, translated))
            elif block["type"] == "child_page":
                # Child pages need separate handling, so skip them for now
                pass
            elif block["type"] != "unsupported":
                # Supported by the API but not translated: copy over as-is
                translated_blocks.append(block)
        return translated_blocks
Notice that we check whether the block returned any text before making a call to the translation API. We are also skipping over any child_page blocks, because they need to be handled differently, and unsupported blocks, which we can't create via the API at this time. If a block is supported by the API but we haven't translated it, we just copy over the original block unchanged.
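To see the three branches in action without calling either API, here is a sketch that applies the same branching to a few hand-written blocks. fake_translate is a stand-in that upper-cases text instead of calling the Translator API, and extract_text is a simplified substitute for get_block_text; both are assumptions for illustration.

```python
TEXT_BLOCKS = {"paragraph", "heading_1", "heading_2", "heading_3"}


def extract_text(block):
    # Simplified stand-in for NotionClient.get_block_text.
    if block["type"] not in TEXT_BLOCKS:
        return None
    parts = block[block["type"]]["rich_text"]
    return "".join(part["plain_text"] for part in parts)


def fake_translate(text):
    # Stand-in for the Translator API call: just upper-case the text.
    return text.upper()


def translate_blocks(blocks):
    translated_blocks = []
    for block in blocks:
        source_text = extract_text(block)
        if source_text is not None:
            block[block["type"]]["rich_text"] = [
                {"type": "text",
                 "text": {"content": fake_translate(source_text)}}]
            translated_blocks.append(block)
        elif block["type"] == "child_page":
            pass  # skipped: child pages need separate handling
        elif block["type"] != "unsupported":
            translated_blocks.append(block)  # copied over untranslated
    return translated_blocks


blocks = [
    {"type": "paragraph",
     "paragraph": {"rich_text": [{"plain_text": "hello"}]}},
    {"type": "child_page", "child_page": {"title": "Sub page"}},
    {"type": "divider", "divider": {}},
    {"type": "unsupported", "unsupported": {}},
]
result = translate_blocks(blocks)
print([block["type"] for block in result])  # ['paragraph', 'divider']
print(result[0]["paragraph"]["rich_text"][0]["text"]["content"])  # HELLO
```

One detail this makes easy to check: a supported block with an empty rich_text list yields an empty string rather than None, so it would still be passed to the translation step.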
In order to run through all of the blocks in a page, translate_all_blocks calls translate_blocks in a loop, requesting more blocks each time. I cover paginated requests in more detail in Paginated Requests with the Notion API in Python, but the gist is that if there's more than one page of results, has_more will be true and we pass the value of next_cursor to the next request.
class NotionTranslator():
    # ... Existing Code ...

    def translate_all_blocks(self, source_page_id):
        # Translate the first page of results
        blocks_response = self.notion_client.get_notion_blocks(
            source_page_id)
        translated_blocks = self.translate_blocks(
            blocks_response.get("results"))
        # Keep fetching and translating until there are no more results
        while blocks_response.get("has_more"):
            blocks_response = self.notion_client.get_notion_blocks(
                source_page_id, blocks_response.get("next_cursor"))
            translated_blocks.extend(self.translate_blocks(
                blocks_response.get("results")))
        return translated_blocks
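The same cursor loop can also be written as a generator, which avoids repeating the request call before and inside the loop. The sketch below uses a fake fetch_page function in place of get_notion_blocks so it runs without network access; iter_all_blocks, fetch_page, and the fake data are all hypothetical.

```python
# Fake data source: three "pages" of results keyed by cursor.
FAKE_PAGES = {
    None: {"results": ["a", "b"], "has_more": True, "next_cursor": "c1"},
    "c1": {"results": ["c", "d"], "has_more": True, "next_cursor": "c2"},
    "c2": {"results": ["e"], "has_more": False, "next_cursor": None},
}


def fetch_page(block_id, start_cursor=None):
    # Stand-in for NotionClient.get_notion_blocks; ignores block_id.
    return FAKE_PAGES[start_cursor]


def iter_all_blocks(block_id):
    """Yield every result, following next_cursor until has_more is false."""
    cursor = None
    while True:
        response = fetch_page(block_id, cursor)
        yield from response.get("results", [])
        if not response.get("has_more"):
            break
        cursor = response.get("next_cursor")


all_blocks = list(iter_all_blocks("page-id-placeholder"))
print(all_blocks)  # ['a', 'b', 'c', 'd', 'e']
```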
Finally, we want to translate the title and the blocks, and create a new translated page.
We need to add one last method to NotionClient in order to create a page. Pages in Notion must have a parent, and for this example, we will use our original page as the parent of the translated page (but with some minor changes to the script you could change this!). The children parameter represents the blocks of the page, and title is naturally the title of the page.
class NotionClient():
    # ... Existing Code ...

    def create_page(self, parent_page_id, children, title):
        # Build the request body: parent page, title property, and blocks
        create_page_body = {
            "parent": {"page_id": parent_page_id},
            "properties": {
                "title": {
                    "title": [{"type": "text", "text": {"content": title}}]
                }
            },
            "children": children
        }
        create_response = self.session.post(
            "https://api.notion.com/v1/pages", json=create_page_body)
        return create_response
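One thing to watch for on longer pages: as far as I know, the Notion API caps the number of blocks you can send in a single children array (100 at the time of writing), so a page with more blocks than that would need to be created in batches. A minimal sketch of the batching step is below; chunk_blocks is a hypothetical helper, and sending the later batches to Notion is left out.

```python
def chunk_blocks(blocks, size=100):
    """Split a list of blocks into consecutive batches of at most `size`."""
    return [blocks[i:i + size] for i in range(0, len(blocks), size)]


# Stand-in blocks; real ones would be the translated block dictionaries.
blocks = [{"type": "paragraph", "n": n} for n in range(250)]
batches = chunk_blocks(blocks)
print([len(batch) for batch in batches])  # [100, 100, 50]
```

The first batch could go into create_page as children, with the remaining batches appended to the new page afterwards.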
Once we can create a page in NotionClient, we can tie things together in NotionTranslator and create a translated page by translating the title, then the blocks, and then creating the new page.
class NotionTranslator():
    # ... Existing Code ...

    def create_translated_page(self, source_page_id, target_page_id=None):
        # Create as a child of source page if a parent is not set
        if target_page_id is None:
            target_page_id = source_page_id
        translated_title = self.translate_title(source_page_id)
        translated_content = self.translate_all_blocks(source_page_id)
        response = self.notion_client.create_page(
            target_page_id, translated_content, translated_title)
        return response
Finally, we'll update the main function from the previous sample so that it creates a translated page and prints a status and the URL of the new page, rather than just translating the title.
def main(notion_page_id, source_language="en", target_language="fr"):
    notion_client = NotionClient(os.getenv('NOTION_KEY'))
    translate_client = TranslatorClient(
        os.getenv('COG_SERVICE_KEY'), os.getenv('COG_SERVICE_REGION'))
    translator = NotionTranslator(
        notion_client, translate_client, source_language, target_language)

    # Code below this comment is modified from Part 1
    response = translator.create_translated_page(notion_page_id)
    if response.ok:
        print("Page translated successfully!")
        print(response.json()["url"])
    else:
        print("Error translating page")
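The script is run with the page ID and target language passed on the command line. The wiring for that isn't shown in this part; a minimal sketch of one way to do it with argparse might look like the following. The argument names, defaults, and the parse_args helper are assumptions, not necessarily what the full script uses.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical command-line interface: page ID, then an optional target
    # language, with the source language as an optional flag.
    parser = argparse.ArgumentParser(description="Translate a Notion page")
    parser.add_argument("notion_page_id", help="ID of the page to translate")
    parser.add_argument("target_language", nargs="?", default="fr",
                        help="language code to translate into")
    parser.add_argument("--source-language", default="en",
                        help="language code of the original page")
    return parser.parse_args(argv)


args = parse_args(["f4be3aa4fe9c45989e44067effbbc7f9", "pt"])
print(args.notion_page_id, args.source_language, args.target_language)
# In the real script this would be followed by something like:
# main(args.notion_page_id, args.source_language, args.target_language)
```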
Let's translate our page into Portuguese:
>python translate-notion-page.py f4be3aa4fe9c45989e44067effbbc7f9 pt
Page translated successfully!
https://www.notion.so/Demonstra-o-de-Tradu-o-1c43a0800d3840b89e59c83a8e6f8dc3
And we should be able to go to the URL to see the newly created, translated page!
The full code is available on GitHub Gists.
This tutorial has covered making a script to translate a simple Notion page using Microsoft's Translator API.
Like any tutorial, there's lots more you'd need to do to make a robust Notion page translation system. On the Notion side, that means supporting things like child pages and more block types. Beyond that, you'd need to decide whether you want to generate a static automatic translation or have a workflow that allows editing the translated content to improve quality. But this covers the basic pieces to get started!
Have questions or feedback? Hit me up on Twitter at @lisa_gaud!