Translate a Notion page using Azure Cognitive Services Translator (Part 2)

In Part 2, we add translation of the page content and turn the translated title and blocks into a new page.

In Part 1 we set up a project to translate Notion pages using the Azure Cognitive Services Translator API, and started by translating the title and printing the translation. In this part, we'll look into translating the most common text blocks and creating a new Notion page with the translation.

The basic structure of the code from part 1 is:

  • NotionClient class to handle interactions with the Notion API
  • TranslatorClient class to call the Translator API
  • NotionTranslator class to translate pieces of Notion pages
  • main function to set up the above classes and make the magic happen
  • Code to parse arguments and run main when it's called as a script

In Part 2, we'll expand on NotionClient and NotionTranslator, and make some minor changes to main in order to call our new functionality instead of just printing out the translated title. The other bits will stay the same!

You can duplicate this simple page for use with this example code here, or make your own.

Getting Block Text

We'll start this tutorial by adding some functionality to NotionClient so that we can get the text of blocks in order to translate them. To keep this tutorial straightforward, we will focus on the most common block types (paragraph and heading_1 through heading_3) and extract only their plain text for translation. We'll worry about formatting in a later tutorial.

Get the blocks

To get the blocks, we need to make a Notion API call. A page could have many blocks, and the Notion API will return a maximum of 100 per request, so we also want to be able to handle making multiple requests to fetch all of the blocks. The blocks API, like all other paginated APIs in Notion, optionally takes a start_cursor and page_size parameter you can use to fetch multiple pages.

class NotionClient():
    # ... Existing Code ...
    def get_notion_blocks(self, block_id, start_cursor=None, page_size=None):
        url = f'https://api.notion.com/v1/blocks/{block_id}/children'
        # Only send the pagination parameters that were actually provided
        params = {}
        if start_cursor is not None:
            params["start_cursor"] = start_cursor
        if page_size is not None:
            params["page_size"] = page_size

        response = self.session.get(url, params=params)
        return response.json()

This returns a fairly verbose object whose results list holds the blocks (up to page_size of them, or 100 at most) along with their content.
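
As a rough illustration, an abbreviated response might look like the Python dict below. The text values are made up and most fields (IDs, timestamps, annotations, and so on) are omitted, so treat it as a sketch of the shape rather than a real payload.

```python
# Abbreviated, made-up example of a "list block children" response.
# Real responses carry many more fields per block.
sample_response = {
    "object": "list",
    "results": [
        {"object": "block", "type": "heading_1",
         "heading_1": {"rich_text": [{"plain_text": "Translation Demo"}]}},
        {"object": "block", "type": "paragraph",
         "paragraph": {"rich_text": [{"plain_text": "Hello, world!"}]}},
    ],
    "next_cursor": None,  # holds a cursor string when has_more is True
    "has_more": False,
}

# Each block names its own type, and its content lives under that key
block_types = [block["type"] for block in sample_response["results"]]
print(block_types)
```

The parts we care about in this tutorial are results, has_more, and next_cursor.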

Parse the block text

Once we've fetched the blocks from the API, we'll need to pull out just the text so that we can translate it. We will focus on paragraph, heading_1, heading_2, and heading_3 blocks and read the plain_text property of each rich text segment. A block's text is split into multiple segments when the original contains formatting.

Notion represents the content of various kinds of text blocks as lists of Rich Text Objects, each of which can have separate formatting. A rich text object looks like this:

"rich_text": [
  {
    "type": "text",
    "text": {
      "content": "Notion (productivity software)",
      "link": null
    },
    "annotations": {
      "bold": false,
      "italic": false,
      "strikethrough": false,
      "underline": false,
      "code": false,
      "color": "default"
    },
    "plain_text": "Notion (productivity software)",
    "href": null
  }
]

If one word in the middle of a paragraph was underlined, you'd get 3 items in the list: the text before the underlined word, the underlined word with "underline" set to true, and the text after the underlined word.
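
To make the underlined-word case concrete, here is a small sketch with a hypothetical sentence. The segments are trimmed down to the two keys that matter here; real rich text objects carry the full set of fields shown above.

```python
# Hypothetical rich text list for "This is very important." with "very" underlined.
# Annotations are trimmed to the one flag relevant to this example.
rich_text = [
    {"plain_text": "This is ", "annotations": {"underline": False}},
    {"plain_text": "very", "annotations": {"underline": True}},
    {"plain_text": " important.", "annotations": {"underline": False}},
]

# Joining the plain_text of every segment recovers the full sentence
plain = "".join(segment["plain_text"] for segment in rich_text)
underlined = [s["plain_text"] for s in rich_text if s["annotations"]["underline"]]
print(plain)
print(underlined)
```

Note that the whitespace around the underlined word belongs to the neighboring segments, so the pieces can be concatenated without adding separators.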

get_text is a function to turn those pieces of rich text back into one string that we can send to the translation API.

class NotionClient():
    # ... Existing Code ...
    def get_text(self, rich_text_object):
        # Concatenates a rich text array into plain text
        text = ""
        for rt in rich_text_object["rich_text"]:
            text += rt["plain_text"]
        return text

We'll call that from a function that handles the logic for supported block types. Other block types could be added to make this script more robust, though some, such as image captions, would need their text extracted in a different way.

class NotionClient():
    # ... Existing Code ...
    def get_block_text(self, block):
        if block["type"] == "paragraph":
            return self.get_text(block["paragraph"])
        if block["type"] == "heading_1":
            return self.get_text(block["heading_1"])
        if block["type"] == "heading_2":
            return self.get_text(block["heading_2"])
        if block["type"] == "heading_3":
            return self.get_text(block["heading_3"])

        # Any other block type has no text we know how to extract
        return None

At this point, we now have in NotionClient a function to get the blocks, and a function we can use to pull out the raw text.
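
Since all four supported types store their text under block[type]["rich_text"], the chain of if statements could also be driven by a list of type names. The sketch below is one possible variation written as a standalone function, not the code used in the rest of this tutorial; TEXT_BLOCK_TYPES is a name made up for this example.

```python
# Block types whose text lives under block[type]["rich_text"].
# TEXT_BLOCK_TYPES is an illustrative name, not part of the original script.
TEXT_BLOCK_TYPES = ("paragraph", "heading_1", "heading_2", "heading_3")


def get_block_text(block):
    """Return the plain text of a supported block, or None otherwise."""
    block_type = block["type"]
    if block_type not in TEXT_BLOCK_TYPES:
        return None
    return "".join(rt["plain_text"] for rt in block[block_type]["rich_text"])


heading = {"type": "heading_2",
           "heading_2": {"rich_text": [{"plain_text": "Intro"}]}}
divider = {"type": "divider", "divider": {}}
print(get_block_text(heading))
print(get_block_text(divider))
```

With this shape, supporting another block type that follows the same layout would only mean adding its name to the tuple.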

Translating Blocks

The next step is to fetch all blocks from a page and send the text to the translation API. To do this, we'll add one more small function to NotionClient to handle updating the text of a block without affecting any of its other properties.

class NotionClient():
    # ... Existing Code ...
    def update_block_text(self, block, new_text):
        # Replace all rich text segments with a single unformatted segment
        block[block["type"]]["rich_text"] = [
            {"type": "text", "text": {"content": new_text}}]

        return block
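
To see what this does to a formatted block, here is a standalone sketch that applies the same logic to a made-up paragraph. It suggests two things worth keeping in mind: the block is modified in place, and any per-segment formatting is dropped because the segments are replaced by a single plain one.

```python
def update_block_text(block, new_text):
    # Same logic as the method above, written as a plain function for the demo
    block[block["type"]]["rich_text"] = [
        {"type": "text", "text": {"content": new_text}}]
    return block


# Made-up paragraph with two segments, the second one bold
block = {
    "type": "paragraph",
    "paragraph": {
        "rich_text": [
            {"type": "text", "text": {"content": "Hello "}, "plain_text": "Hello "},
            {"type": "text", "text": {"content": "world"}, "plain_text": "world",
             "annotations": {"bold": True}},
        ],
        "color": "default",
    },
}

updated = update_block_text(block, "Bonjour le monde")
print(updated["paragraph"]["rich_text"])
print(updated["paragraph"]["color"])
print(updated is block)
```

Other properties of the block, such as color here, are left untouched.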

The rest of this section updates NotionTranslator.

While we work through the translation, we'll be going back and forth between two APIs: we'll fetch some blocks from Notion, translate them if they have text, and then fetch more blocks from Notion.

class NotionTranslator():
    # ... Existing Code ...
    def translate_blocks(self, blocks):
        translated_blocks = []
        for block in blocks:
            source_text = self.notion_client.get_block_text(block)
            if source_text is not None:
                translated = self.translate_client.translate(
                    source_text, self.source_language, self.target_language)
                translated_blocks.append(
                    self.notion_client.update_block_text(block, translated))
            elif block["type"] == "child_page":
                # Child pages need separate handling, so skip them for now
                pass
            elif block["type"] != "unsupported":
                translated_blocks.append(block)

        return translated_blocks

Notice that we check whether get_block_text actually returned text before making a call to the translation API. We also skip any child_page blocks because they need to be handled differently, as well as unsupported blocks, which we can't create via the API at this time. If a block is supported by the API but we haven't translated it, we just copy over the original block unchanged.
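
The branching is easier to follow with a small dry run. The sketch below reproduces the same decisions as standalone functions, with a stand-in translator (fake_translate, a name invented for this example) that simply upper-cases text so that no API call is needed.

```python
def get_block_text(block):
    # Only paragraph and heading blocks are treated as translatable here
    if block["type"] in ("paragraph", "heading_1", "heading_2", "heading_3"):
        return "".join(rt["plain_text"] for rt in block[block["type"]]["rich_text"])
    return None


def fake_translate(text):
    # Stand-in for the Translator API call: just upper-cases the text
    return text.upper()


def translate_blocks(blocks):
    translated_blocks = []
    for block in blocks:
        source_text = get_block_text(block)
        if source_text is not None:
            block[block["type"]]["rich_text"] = [
                {"type": "text", "text": {"content": fake_translate(source_text)}}]
            translated_blocks.append(block)
        elif block["type"] == "child_page":
            pass  # skipped: child pages need separate handling
        elif block["type"] != "unsupported":
            translated_blocks.append(block)  # copied over untranslated
    return translated_blocks


blocks = [
    {"type": "paragraph", "paragraph": {"rich_text": [{"plain_text": "hello"}]}},
    {"type": "child_page", "child_page": {"title": "Sub-page"}},
    {"type": "divider", "divider": {}},
    {"type": "unsupported", "unsupported": {}},
]
result = translate_blocks(blocks)
print([b["type"] for b in result])
```

Of the four input blocks, only the paragraph (translated) and the divider (copied as-is) make it into the result.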

In order to run through all of the blocks in a page, we call translate_blocks in a loop with requests to get more blocks in translate_all_blocks. I cover paginated requests in more detail in Paginated Requests with the Notion API in Python, but the gist is if there's more than one page, has_more will be true and we pass in the value of next_cursor to the next request.
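
The has_more and next_cursor pattern can also be sketched on its own, separate from translation. The generator below is a hypothetical helper (iter_all_results and fake_fetch are names invented for this example) that runs against a fake two-page API instead of Notion.

```python
def iter_all_results(fetch_page):
    """Yield every item from a paginated endpoint.

    fetch_page is any callable that takes a start cursor and returns a dict
    with "results", "has_more" and "next_cursor" keys.
    """
    cursor = None
    while True:
        response = fetch_page(cursor)
        yield from response.get("results", [])
        if not response.get("has_more"):
            break
        cursor = response.get("next_cursor")


# Fake two-page API used only for this demo
FAKE_PAGES = {
    None: {"results": ["block-1", "block-2"], "has_more": True, "next_cursor": "abc"},
    "abc": {"results": ["block-3"], "has_more": False, "next_cursor": None},
}


def fake_fetch(start_cursor=None):
    return FAKE_PAGES[start_cursor]


all_blocks = list(iter_all_results(fake_fetch))
print(all_blocks)
```

With the NotionClient from this tutorial, fetch_page could be something like lambda cursor: notion_client.get_notion_blocks(page_id, cursor).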

class NotionTranslator():
    # ... Existing Code ...
    def translate_all_blocks(self, source_page_id):
        blocks_response = self.notion_client.get_notion_blocks(
            source_page_id)
        translated_blocks = self.translate_blocks(
            blocks_response.get("results"))
        # Keep fetching and translating while Notion reports more pages
        while blocks_response.get("has_more"):
            blocks_response = self.notion_client.get_notion_blocks(
                source_page_id, blocks_response.get("next_cursor"))
            translated_blocks.extend(self.translate_blocks(
                blocks_response.get("results")))

        return translated_blocks

Creating a translated page

Finally, we want to translate the title and the blocks, and create a new translated page.

We need to add one last method to NotionClient in order to create a page. Pages in Notion must have a parent, and for this example, we will use our original page as the parent of the translated page (with some minor changes to the script you could change this!). The children parameter represents the blocks of the page, and title is the title of the page.

class NotionClient():
    # ... Existing Code ...
    def create_page(self, parent_page_id, children, title):
        create_page_body = {
            "parent": {"page_id": parent_page_id},
            "properties": {
                "title": {
                    "title": [{"type": "text", "text": {"content": title}}]
                }
            },
            "children": children
        }

        create_response = self.session.post(
            "https://api.notion.com/v1/pages", json=create_page_body)

        # Return the raw response so the caller can check status and URL
        return create_response
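
If you want to inspect the request body before sending anything to Notion, the dictionary can be built by a separate function and printed. This is a sketch only: build_create_page_body is a hypothetical helper, and the page ID and text are placeholders.

```python
import json


def build_create_page_body(parent_page_id, children, title):
    # Mirrors the request body built in create_page; nothing is sent anywhere
    return {
        "parent": {"page_id": parent_page_id},
        "properties": {
            "title": {"title": [{"type": "text", "text": {"content": title}}]}
        },
        "children": children,
    }


body = build_create_page_body(
    "00000000-0000-0000-0000-000000000000",  # placeholder page ID
    [{"type": "paragraph",
      "paragraph": {"rich_text": [{"type": "text", "text": {"content": "Bonjour"}}]}}],
    "Page traduite",
)
print(json.dumps(body, indent=2))
```

Printing the body this way can help when the API rejects a request and you need to see exactly what was sent.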

Once we can create a page in NotionClient, we can tie things together in NotionTranslator and create a translated page by translating the title, then the blocks, and creating the new page.

class NotionTranslator():
    # ... Existing Code ...
    def create_translated_page(self, source_page_id, target_page_id=None):
        # Create as a child of source page if a parent is not set
        if target_page_id is None:
            target_page_id = source_page_id

        translated_title = self.translate_title(source_page_id)
        translated_content = self.translate_all_blocks(source_page_id)

        response = self.notion_client.create_page(
            target_page_id, translated_content, translated_title)
        return response

Finishing Pieces and running the code

Finally, we'll update the main function from the previous sample so that, rather than just translating the title, it creates a translated page and prints a status message and the URL of the new page.

def main(notion_page_id, source_language="en", target_language="fr"):
    notion_client = NotionClient(os.getenv('NOTION_KEY'))

    translate_client = TranslatorClient(
        os.getenv('COG_SERVICE_KEY'), os.getenv('COG_SERVICE_REGION'))

    translator = NotionTranslator(
        notion_client, translate_client, source_language, target_language)

    # Code below this comment is modified from Part 1
    response = translator.create_translated_page(notion_page_id)

    if response.ok:
        print("Page translated successfully!")
        print(response.json()["url"])
    else:
        print("Error translating page")
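
When the request fails, it can help to print more than a generic message. The sketch below assumes the failed response carries a JSON body with code and message fields, which is how Notion error responses are generally shaped. describe_failure and FakeResponse are names invented for this example, and the fake response stands in for a real requests response.

```python
def describe_failure(response):
    """Build a short error message from a failed requests-style response."""
    try:
        body = response.json()
    except ValueError:
        # The body was not valid JSON, so fall back to defaults
        body = {}
    code = body.get("code", "unknown_error")
    message = body.get("message", "no details returned")
    return f"Error translating page ({response.status_code} {code}): {message}"


class FakeResponse:
    # Minimal stand-in for requests.Response, for demonstration only
    status_code = 400
    ok = False

    def json(self):
        return {"object": "error", "status": 400,
                "code": "validation_error", "message": "body failed validation"}


failure_text = describe_failure(FakeResponse())
print(failure_text)
```

In main, the else branch could print describe_failure(response) instead of the fixed string.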

Let's translate our page into Portuguese:

>python translate-notion-page.py f4be3aa4fe9c45989e44067effbbc7f9 pt
Page translated successfully!
https://www.notion.so/Demonstra-o-de-Tradu-o-1c43a0800d3840b89e59c83a8e6f8dc3

And we should be able to go to the URL to see the newly created, translated page!

Screenshot of Notion page translated to Portuguese

The full code is available on GitHub Gists.

Final Thoughts

This tutorial has covered making a script to translate a simple Notion page using Microsoft's Translator API.

Like any tutorial, there's lots more you'd need to do to make a robust Notion page translation system. On the Notion side, that means supporting child pages and more block types. Beyond that, you'd need to decide whether you want to generate a static automatic translation or have a workflow that allows editing the translated content to improve quality. Still, this covers the basic pieces to get started!

Have questions or feedback? Hit me up on Twitter at @lisa_gaud!