Replay HTTP requests against an air-gapped Confluence server #823
This topic was discussed in #627 last year, if you would like to read up on the opinions/issues at the time. Note that part of the hesitation outlined in those comments was due to a prospective implementation supporting Confluence's ADF format (in part to support the v2 editor style). That implementation would have required a multi-pass publishing design, otherwise Confluence would reject uploads. Since this extension was able to support the v2 editor while still using the storage format, publishing should remain as simple as it is in the current implementation. So, when considering the other issue's comments, it may now be somewhat easier to implement a separation between building and publishing.

While my first impression is that a replay functionality should not be something directly implemented in this extension, I do think it's a great idea for your use case. Isolating the request data being made should be easily translatable to PowerShell (or whatever) scripts. While the suggested approach of wrapping the requests library should work, I wonder if it would be simpler if this extension supported an advanced configuration or introduced extension hooks which allow a user to override either the …

The biggest thing to consider when looking at the separation between building and publishing is not so much the generated documentation set, but the dynamic values needed in some of the data requests. We could initially ignore some features like purging old pages or publishing only updated pages/attachments, and just focus on publishing an entire documentation/attachment set. One example to consider is that the …

As for the Confluence Builder extension, my current opinion on supporting the separation of building and publishing is as follows. I am in favor of adding support for such a feature. I would not say it is a priority feature to get out by the next release, but it could be the next big feature we aim to bring into this extension. The following overview is how I would imagine such a design would be done:

```mermaid
flowchart LR
P --> S
Tpp --> S
E --> A
A -.-> Tpp
A -.-> I
subgraph Sphinx/Confluence Builder
C[Initialization] --> B
C --> I
B[Builder] --> P
B --> E
E[Export]
I[Import] --> P
P[Publisher]
end
A[Archive]
subgraph Third Party
Tpp[Publisher]
end
subgraph "Confluence"
S[Space]
end
```
I'm open to other design considerations as well. Finally, I would imagine the timelines wanted for the OP's requirements may not align with the timelines of this extension providing full support for build/publish separation. For sure, it should be possible to tweak this extension to be flexible enough for a third-party replay functionality (if tweaks to this extension are needed), but a serious look at a full-fledged solution may not happen (by this maintainer) until possibly fall, if not next year.
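To give the Third Party "Publisher" box above a concrete shape: below is a minimal replay sketch. It assumes an exported layout like the boilerplate shared later in this thread (a `data.json` index plus `content/<id>.xml` files) and the Confluence Server REST content endpoint; the names, URL, and credentials are illustrative, not something the extension provides today.

```python
# Sketch: replay an exported documentation set against a Confluence
# Server REST API. The index layout mirrors the boilerplate below.
import json
from pathlib import Path

import requests

API = 'https://confluence.example.com/rest/api'  # assumed target instance
OUT = Path('confluence.out')                     # assumed export directory

index = json.loads((OUT / 'data.json').read_text())

session = requests.Session()
session.auth = ('publisher', 'api-token')

for content_id, meta in index['pages'].items():
    storage = (OUT / 'content' / f'{content_id}.xml').read_text()
    payload = {
        'type': 'page',
        'title': meta['title'],
        'space': {'key': 'TEST'},
        'body': {'storage': {'value': storage,
                             'representation': 'storage'}},
        # ancestor handling (meta['ancestor_id']) omitted for brevity
    }
    rsp = session.post(f'{API}/content', json=payload)
    rsp.raise_for_status()
```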
Marry me! 🥹 Thank you so much for such a quick and precise reply!
That's exactly what I had in mind. 😄 Quick-n-Nasty! Should this become a feature and be implemented similar to what @jdknight proposes, I'd volunteer to provide a reference implementation of a third-party publisher. Having a plan B is always nice, so I don't see a problem in dedicating time to this, probably Q4 '23.
UPDATE: I've upgraded the boilerplate since the interim solution will be in place longer than I would like. The boilerplate now suppresses any connection to a Confluence instance and dumps contents and attachments to separate files (I had issues with anti-virus when normalized). The schema is now redundant.

Attached you'll find my quick-n-dirty boilerplate implementation of the wrapper. Fixing up a custom REST client in PowerShell/Perl/whatever can be achieved in less than 50 lines, hence I don't see the point in sharing that part. Maybe this will be useful for someone else. @jdknight, if I can support the project with any feature implementations, let me know. I feel comfortable with the code now.

In addition, I've appended a schema for the output. By default the dump will go to `confluence.out` under the build output directory. One can use the schema in conjunction with e.g. the … Attachment data is decoded as ISO-8859-1.

```python
#!/usr/bin/env python3
"""Publishment delay wrapper sphinxcontrib.confluencebuilder
This is a lightweight pass-through wrapper for
``sphinxcontrib.confluencebuilder``, which intercepts all ``store_*`` calls on
a ``ConfluencePublisher`` instance, dumps all data into interchange.
The index and dumps can be used in conjunction with the PowerShell
helper to delay/replay the publishment of pages and attachments for a different
Confluence instance, than what the programmatic target is.
The builder name is ``x_confluence``
``Publisher``, ``Builder``, as well as ``Rest`` instances are mocked and are
supressing any HTTP connectivity should the ``confluence_publish_dry_run`` be
set to ``True``.
.. warning::
``confluence_publish_dry_run`` MUST be set to ``True``
Content (pages) and attachments are dumped into separate files and indexed.
The output directory can be set through ``x_confluence_outdir``.
The use-case for this implementation is as follows:
I am currently facing the situation where i need to publish to an
air-gapped Confluence server inside a virtualised and privatised
environment (over Windows VDI). In addition, company policy forbids me from
using, or installing Python
on the VDI. I can freeload a perl executable that came bundled with Git for
Windows, but there are no other scripting means besides Windows PowerShell.
"""
__author__ = 'tiara.rodney@adesso.de'
__copyright__ = 'adesso SE'
__license__ = 'DL-DE-BY-2.0'
from dataclasses import dataclass, asdict
import json
from mimetypes import guess_extension
from pathlib import Path
from typing import Any, Optional, Dict, Tuple, List
from unittest.mock import patch
from uuid import uuid4

from sphinx.application import Sphinx
from sphinx.util import logging

from sphinxcontrib.confluencebuilder import setup as _setup
from sphinxcontrib.confluencebuilder.builder import (
    ConfluenceBuilder as _ConfluenceBuilder,
)
from sphinxcontrib.confluencebuilder.publisher import (
    ConfluencePublisher as _ConfluencePublisher,
)
from sphinxcontrib.confluencebuilder.rest import Rest as _Rest
logger = logging.getLogger(__name__)
@dataclass
class ConfluenceContentMeta:
    """metadata for a single dumped page

    see `content-createContent
    <https://docs.atlassian.com/ConfluenceServer/rest/8.4.0/#api/content-createContent>`_
    for more information
    """

    #: title of the page
    title: str

    #: identifier of the ancestor (parent) page
    ancestor_id: str


@dataclass
class ConfluenceChildAttachmentMeta:
    """metadata for a single dumped attachment"""

    #: identifier of the page holding this attachment
    container_id: str

    #: name of the attachment
    name: str

    #: MIME type of the attachment
    mimetype: str


@dataclass
class ConfluencePublisherDump:
    """index of all pages and attachments dumped during a build"""

    #: dumped pages, keyed by their generated content identifier
    pages: Dict[str, ConfluenceContentMeta]

    #: dumped attachments, keyed by their generated attachment identifier
    attachments: Dict[str, ConfluenceChildAttachmentMeta]
class Rest(_Rest):
    """REST client stand-in which never performs real HTTP requests"""

    def get(self, key, params=None):
        """return a fabricated response for any GET request

        The publisher probes the target space on ``connect()``; a minimal
        single-space result is enough for it to assume everything is fine.
        """
        return {'results': [
            {
                'id': 776536065,
                'key': self.config.confluence_space_key,
                'name': 'Testitest',
                'type': 'personal',
            },
        ], 'size': 1, 'limit': 1, 'start': 0}
class ConfluencePublisher(_ConfluencePublisher):
    """publisher which dumps content to disk instead of publishing it"""

    def __init__(self):
        super().__init__()

        # index of everything dumped during this run
        self.dump = ConfluencePublisherDump(
            pages={},
            attachments={},
        )

    def connect(self):
        """initialize a REST client and probe the target Confluence instance

        .. note::

            Actually, I don't want the extension to initialize a connection,
            but there is too much entanglement, so we're mocking the absolute
            minimum for the publisher object to assume everything is fine.
        """
        with patch('sphinxcontrib.confluencebuilder.publisher.Rest', Rest):
            return super().connect()
    def get_page_by_id(self, page_id, expand='version') -> Tuple[None, List]:
        """get page information for the provided page identifier

        Always reports that no page exists, so all content is treated as new.

        :param page_id: the page identifier
        :param expand: data to expand on

        :returns: page id and page object
        """
        return (None, [])
    def store_attachment(
        self,
        page_id: str,
        name: str,
        data: Any,
        mimetype: Any,
        hash_: str,
        force: bool = False,
    ) -> str:
        """request to store an attachment on a provided page

        :returns: the attachment identifier
        """
        logger.info('pass-through intercept: store_attachment')

        # str() keeps the generated identifier JSON- and path-safe
        attachment_id = str(uuid4())
        mime_extension = guess_extension(mimetype, False)
        if mime_extension:
            attachment_id = f'{attachment_id}{mime_extension}'

        file = (Path(getattr(self.config, 'x_confluence_outdir')) /
                'attachments' / attachment_id)
        file.parent.mkdir(parents=True, exist_ok=True)
        file.write_bytes(data)

        self.dump.attachments[attachment_id] = ConfluenceChildAttachmentMeta(
            container_id=page_id,
            name=name,
            mimetype=mimetype,
        )

        return attachment_id
    def store_page(
        self,
        page_name: str,
        data: Any,
        parent_id: Optional[str] = None,
    ) -> str:
        """request to store page information to a confluence instance

        :param page_name: the page title to use on the updated page
        :param data: the page data to apply
        :param parent_id: the id of the ancestor to use

        :returns: id of uploaded page
        """
        logger.info('pass-through intercept: store_page')

        content_id = str(uuid4())
        file = (Path(getattr(self.config, 'x_confluence_outdir')) /
                'content' / f'{content_id}.xml')
        file.parent.mkdir(parents=True, exist_ok=True)
        file.write_bytes(data['content'].encode('utf-8'))

        self.dump.pages[content_id] = ConfluenceContentMeta(
            title=page_name,
            ancestor_id=parent_id,
        )

        return content_id
    def store_page_by_id(
        self,
        page_name: str,
        page_id: str,
        data: Any,
    ) -> str:
        """request to store page information on the page with a matching id

        :param page_name: the page title to use on the updated page
        :param page_id: the id of the page to update
        :param data: the page data to apply

        :returns: id of uploaded page
        """
        logger.info('pass-through intercept: store_page_by_id')
        return 'NULL'
    def disconnect(self):
        """terminate the REST client

        .. note::

            Freeloading this method to dump the index.
        """
        file = Path(getattr(self.config, 'x_confluence_outdir')) / 'data.json'
        file.parent.mkdir(parents=True, exist_ok=True)

        raw = json.dumps(asdict(self.dump), indent=4)
        file.write_text(raw)

        logger.info(f'content dump count: {len(self.dump.pages)}')
        logger.info(f'attachments dump count: {len(self.dump.attachments)}')
        logger.info(f'dump index: {file}')
class ConfluenceBuilder(_ConfluenceBuilder):
    """builder which injects the dumping publisher"""

    name = 'x_confluence'

    def __init__(self, app: Sphinx, env=None):
        # swap in the dumping publisher while the base builder
        # constructs its publisher instance
        patch_target = ('sphinxcontrib.confluencebuilder'
                        '.builder.ConfluencePublisher')
        with patch(patch_target, ConfluencePublisher):
            super().__init__(app, env)
def setup(app: Sphinx):
    """register the wrapped builder in place of the original one"""
    patch_target = 'sphinxcontrib.confluencebuilder.ConfluenceBuilder'

    app.add_config_value(
        name='x_confluence_outdir',
        default=str(Path(app.outdir) / 'confluence.out'),
        rebuild=True,
    )

    with patch(patch_target, ConfluenceBuilder):
        logger.info(f'patching: {patch_target}')
        return _setup(app)
```

```json
{
"$id": "https://github.com/tiara-adessi/confluencebuilder/schema/top",
"x-authors": [
"tiara.rodney@adesso.de"
],
"type": "object",
"properties": {
"pages": {
"type": "array",
"items": {
"$ref": "#/definitions/page"
}
},
"attachments": {
"type": "array",
"items": {
"$ref": "#/definitions/attachment"
}
}
},
"required": [
"pages",
"attachments"
],
"definitions": {
"page": {
"type": "object",
"properties": {
"page_name": {
"type": "string"
},
"page_id": {
"type": "string"
},
"parent_id": {
"type": "string"
},
"data": {
"type": "object",
"properties": {
"content": {
"type": "string"
},
"labels": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"content",
"labels"
]
}
},
"required": [
"page_name",
"page_id",
"parent_id",
"data"
]
},
"attachment": {
"type": "object",
"properties": {
"page_id": {
"type": "string"
},
"name": {
"type": "string"
},
"data": {
"type": "string"
},
"mimetype": {
"type": "string"
},
"attachment_id": {
"type": "string"
}
},
"required": [
"page_id",
"name",
"data",
"mimetype",
"attachment_id"
]
}
}
} |
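For anyone wanting to try this, the wiring the wrapper implies would look roughly like the following conf.py fragment (a sketch; it assumes the module above is saved as `x_confluence.py` on the Python path, and the option values are illustrative):

```python
# conf.py (sketch)
extensions = [
    'x_confluence',  # the wrapper module shown above
]

# MUST be enabled; the wrapper relies on the extension's dry-run mode
confluence_publish_dry_run = True

# standard Confluence Builder options still apply
confluence_publish = True
confluence_space_key = 'TEST'
confluence_server_url = 'https://intranet-wiki.example.com/'

# where the wrapper writes content/, attachments/ and data.json
x_confluence_outdir = '_build/confluence.out'
```

A build would then be driven through the wrapper's builder name, e.g. `sphinx-build -b x_confluence . _build`.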
Thanks for the example; it provides some good insight into what is desired. I'm curious about the schema definition, since I cannot say I've created a schema definition file for JSON before. Noticed there is the use of …
That was a very nice subtle hint that I was missing the … Yes, these are common practices as defined in the JSON Schema specification(s). There are some pretty good validators out there (e.g., I use this one in Python, this one in Node.js, and this one in Perl). Yes, the schema was created manually, and I tend to use older versions of the specification to author them so that they stay compatible across multiple validators. Besides the …

Btw, I refactored the code once more, as it turns out that my temporary solution probably has to stay long-term since the promised CI/CD environment won't suffice. I'll share the repos some day next week.
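As an illustration, a validation pass against the dump could look like this (a sketch assuming the `jsonschema` package in Python; file names taken from the boilerplate above):

```python
import json
from pathlib import Path

from jsonschema import validate  # pip install jsonschema

schema = json.loads(Path('schema.json').read_text())
index = json.loads(Path('confluence.out/data.json').read_text())

# raises jsonschema.exceptions.ValidationError on a mismatch
validate(instance=index, schema=schema)
```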
Sorry for the delayed (promised) notice. We now have published two programs and made this an open-source effort: …

We're hoping the reference implementations suffice for you to give it a test drive. The manifest schema (currently only part of the …) is as follows:

```json
{
"$id": "https://spec.victory-k.it/psconfluencepublisher.json",
"x-authors": [
"theodor.rodweil@victory-k.it"
],
"type": "object",
"properties": {
"Pages": {
"type": "array",
"item": {
"$ref": "#/definitions/page"
}
},
"Attachments": {
"type": "array",
"item": {
"$ref": "#/definitions/attachment"
}
}
},
"required": [
"Pages",
"Attachments"
],
"definitions": {
"page": {
"type": "object",
"description": "Local Confluence page/container attachment metadata",
"properties": {
"Title": {
"type": "string",
"description": "Title of page"
},
"Id": {
"type": "string",
"description": "Id of attachment defined by Confluence instance. The id is generated after the publishing of a page."
},
"Version": {
"type": "string"
},
"Hash": {
"type": "string",
"description": "SHA512 hexadecimal content hash value"
},
"Ref": {
"type": "string",
"description": "Local filesystem reference/path"
},
"AncestorTitle": {
"type": "string",
"description": "Title of Confluence page this page is a child of. The title must be a property key of the pages object."
}
},
"required": [
"Title",
"Ref"
]
},
"attachment": {
"type": "object",
"description": "Local Confluence page/container attachment metadata",
"properties": {
"Name": {
"type": "string",
"description": "name of attachment, which must be unique within the container page"
},
"Id": {
"type": "string",
"description": "Id of attachment defined by Confluence instance. The id is generated after the publishing of an attachment."
},
"Hash": {
"type": "string",
"description": "SHA512 hexadecimal attachment content hash value"
},
"MimeType": {
"type": "string",
"description": "MIME type of attachment",
"default": "binary/octet-stream"
},
"ContainerPageTitle": {
"type": "string",
"description": "Title of Confluence page this attachment is contained in. The title must be a property key of the pages object."
},
"Ref": {
"type": "string",
"description": "Local filesystem reference/path"
}
},
"required": [
"Name",
"Hash",
"MimeType",
"ContainerPageTitle",
"Ref"
]
}
}
} |
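For illustration, a hypothetical manifest instance conforming to this schema might look like the following (all values are made up):

```json
{
    "Pages": [
        {
            "Title": "Index",
            "Version": "1",
            "Hash": "<sha512-hex>",
            "Ref": "content/index.xml",
            "AncestorTitle": "Documentation"
        }
    ],
    "Attachments": [
        {
            "Name": "diagram.png",
            "Hash": "<sha512-hex>",
            "MimeType": "image/png",
            "ContainerPageTitle": "Index",
            "Ref": "attachments/diagram.png"
        }
    ]
}
```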
Hi all,
I am currently facing the situation where I need to publish to an air-gapped Confluence server inside a virtualised and privatised environment (over Windows VDI). In addition, company policy forbids me from using or installing Python on the VDI. I can freeload a Perl executable that came bundled with Git for Windows, but there are no other scripting means besides Windows PowerShell.
So... I'm either thinking about hacking something in PowerShell or Perl.
Quick-n-Nasty Approach
Wrap the `requests` package so outgoing HTTP requests are recorded instead of sent (I'd use `unittest.mock` or something similar); a rough sketch of the idea follows below.
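For what it's worth, a rough sketch of this Quick-n-Nasty idea (the patch target and the captured fields are assumptions; a real wrapper would need a more faithful fake response):

```python
# Record outgoing HTTP calls during a sphinx-build instead of sending them.
import json
from pathlib import Path
from unittest.mock import patch

captured = []

def record(self, method, url, **kwargs):
    # keep enough metadata to replay the request later
    captured.append({
        'method': method,
        'url': url,
        'json': kwargs.get('json'),
    })

    class FakeResponse:  # minimal stand-in so the caller carries on
        status_code = 200
        headers = {}

        def json(self):
            return {'results': [], 'size': 0}

    return FakeResponse()

with patch('requests.Session.request', record):
    pass  # invoke the Sphinx build here

# the captured list is effectively the "index of HTTP request metadata"
Path('requests-index.json').write_text(json.dumps(captured, indent=2))
```

Quick-n-Dirty Approach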
Basically just like the Quick-n-Nasty solution, except that I would rewrite the `ConfluencePublisher` class.

I just require this as an interim workaround, since I'll be getting access to proper CI/CD facilities soon.
My question
Was a scenario, as described by me, ever considered during the development? I'd be interested in the thoughts.
It's probably not worth adding my exact scenario as a feature, but publishing replays could be one if the publishing task were split into building and publishing, where the build task could output an index of HTTP request metadata and request bodies, and the publisher would execute upon that index.