Friday, September 27, 2024

DAG Hamilton Graph Presented as SVG in Blogger

Through the kindness of the DAG Hamilton project team, I was able to secure an official svg version of the DAG Hamilton logo. It looks significantly better than the one I had generated with an online image-to-svg converter, and it is much smaller and easier to work with (4 kilobytes versus 200 kilobytes). The DAG Hamilton graphviz graph now shows up in Blogger, although it is unlikely to show up on the Planet Python feed. Blogger does not like the code and svg I have included (it complains of malformed html). In the interest of preserving the rendering of the graph(s), I am constraining the text here to a few paragraphs.

The code for the first graph is provided below. The graph itself is from a previous post.

The second graph represents the DAG Hamilton workflow for the production of the first graph. This is in keeping with the "Eat your own dogfood" mantra. I happen to like the DAG Hamilton dogfood as I've mentioned in previous posts. It allows me to visualize my workflows and track complexity and areas for improvement in the code.
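To make the workflow idea concrete, here is a minimal sketch of the naming convention the graphs rely on: each function is a node, and each parameter name points at the node (or external input) that feeds it. This is not DAG Hamilton's implementation. It uses only the standard library's inspect module, and the three toy functions and the helper edges_from_functions are made up for illustration.

```python
import inspect


def raw_lines(svg_text: str) -> list:
    """Split svg text into lines (toy stand-in for reading a file)."""
    return svg_text.splitlines()


def line_count(raw_lines: list) -> int:
    """Depends on raw_lines because the parameter carries that name."""
    return len(raw_lines)


def report(line_count: int, svg_text: str) -> str:
    """Depends on line_count and on the external input svg_text."""
    return '{0:d} lines, {1:d} characters'.format(line_count, len(svg_text))


def edges_from_functions(funcs):
    """Return sorted (upstream, downstream) pairs read from signatures."""
    edges = []
    for func in funcs:
        for param in inspect.signature(func).parameters:
            edges.append((param, func.__name__))
    return sorted(edges)


edges = edges_from_functions([raw_lines, line_count, report])
for upstream, downstream in edges:
    print('{0:s} -> {1:s}'.format(upstream, downstream))
```

Anything that appears only on the left of an arrow (svg_text here) is an input that has to be supplied at run time, which is the role the inputs dictionary plays in run.py.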

The third one I did with a scaled-down version of the code presented (no logos). I hand-pasted the official DAG Hamilton logo into the third one. It is not subtle (the logo is huge), but it provides an idea of what one can do creatively with the logo or any svg element. Also, it shows the DAG Hamilton workflow for a graph.

All the code is a work in progress. Ideally I would like to keep reducing this to the simplest svg implementation possible that still shows up or "works." Realistically, I'm afraid to sneeze for fear Blogger will protest. For now, I'm leaving good enough alone. Links and thoughts on svg will have to wait for another post. There is at least one Python library out there (orsinium-labs/svg.py) that is way more elegant in its treatment of the medium than my rough regular expressions and text processing.
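As a small illustration of the regular expression approach mentioned above, the sketch below scales and then translates every x,y pair in a polygon points attribute. The sample line, the scale of 0.5, and the helper name rescale_points are assumptions for the example. The script itself works on captured groups rather than re.sub.

```python
import re

# One x,y pair such as "8,-364.5" (same number style as graphviz output).
PAIR_PAT = re.compile(r'([-]?[0-9]+[.]?[0-9]*)[,]([-]?[0-9]+[.]?[0-9]*)')


def rescale_points(line, scale, x_translation, y_translation):
    """Scale each coordinate pair, then shift it by the given translation."""
    def fix_pair(match):
        x = float(match.group(1)) * scale + x_translation
        y = float(match.group(2)) * scale + y_translation
        return '{0:.3f},{1:.3f}'.format(x, y)
    # Only touch the text inside points="...".
    start = line.index('points="') + len('points="')
    end = line.index('"', start)
    return line[:start] + PAIR_PAT.sub(fix_pair, line[start:end]) + line[end:]


sample = ('<polygon fill="white" stroke="black" '
          'points="0,-100 0,0 200,0 200,-100 0,-100"/>')
# A y translation of 50 moves the scaled top edge (y = -50) to y = 0.
result = rescale_points(sample, 0.5, 0.0, 50.0)
print(result)
```

The translation is applied after scaling, so it has to be expressed in scaled units, which is why the script keeps separate x_translation_scaled and y_translation_scaled values.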

Thanks for stopping by.

[Graph 1 (svg): "Toy Web Scraping Script Run Diagram", Web Scraping Functions Highlighted, with legend (input, function)]

run.py code


"""
Hamilton wrapper.
"""

# run.py

import sys

import pprint

from hamilton import driver

import dag_hamilton_to_blogger as dhtb

dr = driver.Builder().with_modules(dhtb).build()

dr.display_all_functions('dhtb.svg',
                         deduplicate_inputs=True,
                         keep_dot=True,
                         orient='BR')

results = dr.execute(['defluffed_lines',
                      'scale_and_translation',
                      'logo_positions',
                      'captured_values',
                      'scaled_elements',
                      'translated_elements',
                      'hamilton_logo_data',
                      'scale_and_translation_hamilton_logo',
                      'fauxcompany_logo_data',
                      'scale_and_translation_fauxcompany_logo',
                      'svg_ready_doc',
                      'written_svg'],
                      inputs={'svg_file':'web_scraping_functions_highlighted.svg',
                              'outputfile':'test_output.svg',
                              'hamiltonlogofile':'hamilton_official_stripped.svg',
                              'hamiltonlogo_coords':{'min_x':-0.001,
                                                     'max_x':4353.846,
                                                     'min_y':-0.0006,
                                                     'max_y':4177.257},
                              'fauxcompanylogofile':'fauxcompanylogo_stripped_down.svg',
                              'fauxcompanylogo_coords':{'min_x':11.542786063261742,
                                                        'max_x':705.10684,
                                                        'min_y':4.9643821,
                                                        'max_y':74.47416391682819}})

Main DAG Hamilton functions (dag_hamilton_to_blogger.py)


# python 3.12

"""
Make DAG Hamilton graph show up in Blogger.
"""

import re
import sys
import pprint
import math
import copy

import reusedfunctions as rf

VIEWBOX_PAT = (r'[ ]viewBox[=]["][-]?[0-9]+[.]?[0-9]*[ ][-]?[0-9]+[.]?[0-9]*[ ]'
               r'([0-9]+[.]?[0-9]*)[ ]([0-9]+[.]?[0-9]*)')

# 5 coordinates.
POLYGON_PAT = (r'[<]polygon'
               r'.*([ ]points[=]["])([-]?[0-9]+[.]?[0-9]*)[,]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                  r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                  r'([-]?[0-9]+[.]?[0-9]*)["]')

# 4 coordinates instead of 5.
POLYGON_PAT_4 = (r'[<]polygon'
                 r'.*([ ]points[=]["])([-]?[0-9]+[.]?[0-9]*)[,]'
                                    r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                    r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                    r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                    r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                    r'([-]?[0-9]+[.]?[0-9]*)[ ]'
                                    r'([-]?[0-9]+[.]?[0-9]*)[,]'
                                    r'([-]?[0-9]+[.]?[0-9]*)["]')

# x, y
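# The body of TEXTPAT and the definitions of TEXTPAT_FONTSIZE, PATHPAT,
# NEW_FIRST_LINE and NEW_SECOND_LINE (all used further down) are
# missing from this listing.
TEXTPAT = (r'')

# Assumed value; the original string is missing from this listing.
# graphviz svg output embeds images with an image tag.
IMAGE_FLAG = '<image'

# Assumed from the "about 600 wide" comment in scale_and_translation.
X_SIZE = 600

# 5 coords (rectangle). Reconstructed to mirror POLYGON_STR_4 below.
POLYGON_STR = (r' points="{0:.3f},{1:.3f} {2:.3f},{3:.3f} '
                       r'{4:.3f},{5:.3f} {6:.3f},{7:.3f} '
                       r'{8:.3f},{9:.3f}"/>')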
# 4 coords (arrow head).
POLYGON_STR_4 = (r' points="{0:.3f},{1:.3f} {2:.3f},{3:.3f} '
                         r'{4:.3f},{5:.3f} {6:.3f},{7:.3f}"/>')
PATH_START_STR = r' d="M{0:.3f},{1:.3f}C'
PATH_STR_SEGMENT = ' {0:.3f},{1:.3f}'
PATH_STR = r' {0:s}"/>'
TEXT_STR = r' x="{0:.3f}" y="{1:.3f}"'
TEXT_STR_FONT = r' font-size="{0:.3f}"'

HAMILTON_LOGO_DIMENSIONS_PAT = (r'.*width[=]["]([0-9]+[.]?[0-9]*)px["][ ]'
                                  r'height[=]["]([0-9]+[.]?[0-9]*)px["][>]')

FAUXCOMPANY_LOGO_DIMENSIONS_PAT = (r'[ ]width[=]["]([0-9]+[.]?[0-9]*)["][ ]'
                                      r'height[=]["]([0-9]+[.]?[0-9]*)["][ ][>]')

# The official Hamilton logo splits the path into multiple
# lines with the last one having the absolute location
# ("C") of a bezier curve.
HAMILTON_CHANGE_LINE_PAT = r'.*C[-]?[0-9]+[.]?[0-9]*'

HAMILTON_TRANSFORM_FMT = (' transform="scale({scale:f}) '
                           'translate({translate_x:f},{translate_y:f})" />')

# One line of paths in Inkscape generated file.
FAUXCOMPANY_CHANGE_LINE_PAT = r'.*d[=]["]m[ ]'

# Inkscape put the closing tag /> on the following line.
FAUXCOMPANY_TRANSFORM_FMT = (' transform="scale({scale:f}) '
                              'translate({translate_x:f},{translate_y:f})"')

# * - get rid of first 6 lines.

# * - get rid of any line starting with:
#
#   "

def defluffed_lines(svg_file:str) -> list:
    """
    Purge svg files of unnecessary lines (for 
    further operations).

    Returns list of file string lines.
    """
    print('Defluffing graphviz svg output . . .')
    with open(svg_file, 'r') as f:
        lines = [linex for linex in f]
    lines = lines[6:]
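    # The purge prefixes are missing from this listing; bifflist is left
    # empty here and the filter below is a reconstruction.
    bifflist = []
    retval = [linex for linex in lines
              if not any([linex.find(n) > -1
                          for n in bifflist])]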
    return retval

def scale_and_translation(defluffed_lines:list) -> dict:
    """
    Get relevant values for scaling and translation.

    defluffed_lines is a list of line strings.

    Returns dictionary.
    """
    retval = {}
    print('Getting scale and translation . . .')
    pat = re.compile(VIEWBOX_PAT)
    # Second line has everything.
    match = pat.match(defluffed_lines[1])
    retval['viewBox_x'], retval['viewBox_y'] = [float(x) for x in match.groups()]
    pat = re.compile(POLYGON_PAT)
    # Third line has this.
    match = pat.match(defluffed_lines[2])
    polycoords = [float(x) for x in match.groups()[1:]]
    x_coords = polycoords[0::2]
    y_coords = polycoords[1::2]
    retval['x_translation'] = -1.0 * min(x_coords)
    retval['y_translation'] = -1.0 * min(y_coords) 
    # Try to make it about 600 wide.
    scale = X_SIZE / max(x_coords)
    retval['x_translation_scaled'] = retval['x_translation'] * scale
    retval['y_translation_scaled'] = retval['y_translation'] * scale
    retval['scale'] = scale
    retval['width'] = math.ceil(scale * retval['viewBox_x'])
    retval['height'] = math.ceil(retval['y_translation_scaled'])
    return retval

def logo_positions(defluffed_lines:list) -> dict:
    """
    Get logo positions, size, etc.

    defluffed_lines is a list of svg file lines.

    Returns dictionary.
    """
    retval = {}
    target_indices = [x for x in range(len(defluffed_lines))
                      if defluffed_lines[x].find(IMAGE_FLAG) > -1]
    pat = re.compile(POLYGON_PAT)
    # Hamilton logo.
    match = pat.match(defluffed_lines[target_indices[0] - 1])
    polycoords = [float(x) for x in match.groups()[1:]]
    retval['hamilton_posit'] = polycoords
    # Company logo.
    match = pat.match(defluffed_lines[target_indices[1] - 1])
    polycoords = [float(x) for x in match.groups()[1:]]
    retval['company_posit'] = polycoords
    retval['target_indices'] = target_indices
    return retval

def captured_values(defluffed_lines:list,
                    logo_positions:dict) -> list:
    """
    Make list of dictionaries for each line
    in stripped down svg file.
    """
    # Idea is to get values to be scaled and 
    # translated and stop indices within string
    # for later processing.
    polygonpat = re.compile(POLYGON_PAT)
    # For polygon with only 4 coordinates instead of 5.
    polygonpat_4 = re.compile(POLYGON_PAT_4) 
    textpat = re.compile(TEXTPAT)
    # For font size.
    textpat_fontsize = re.compile(TEXTPAT_FONTSIZE)
    pathpat = re.compile(PATHPAT)
    retval = []
    for idx, linex in enumerate(defluffed_lines):
        if idx in logo_positions['target_indices']:
            retval.append({'type':'logo position'})
        else:
            if match := polygonpat.match(linex):
                newdict = {'type':'polygon',
                           'start':match.start(1),
                           'groups':[float(x) for x in match.groups()[1:]]}
                retval.append(newdict)
            elif match := polygonpat_4.match(linex):
                newdict = {'type':'polygon',
                           'start':match.start(1),
                           'groups':[float(x) for x in match.groups()[1:]]}
                retval.append(newdict)
            elif match := textpat.match(linex):
                newdict = {'type':'text',
                           'span':match.span(),
                           'start':match.start(1),
                           'groups':[float(x) for x in match.groups()[1:]]}
                match_fontsize = textpat_fontsize.match(linex)
                newdict['fontsize_start'] = match_fontsize.start(1)
                newdict['font-size'] = float(match_fontsize.groups()[1])
                newdict['fontsize_end'] = match_fontsize.span()[1]
                retval.append(newdict)
            elif match := pathpat.match(linex):
                span = match.span()
                newdict = {'type':'path',
                           'span':span,
                           'start':match.start(1),
                           'groups':[float(x) for x in match.groups()[1:]],
                           'tail':rf.parse_path(linex, span[1])}
                retval.append(newdict)
            else:
                retval.append({'type':'missed'})
    return retval

def scaled_elements(captured_values:list, scale_and_translation:dict) -> list:
    """
    Takes list of svg file line
    dictionaries and scales all
    the coordinates by a factor.

    Returns new list.
    """
    scale = scale_and_translation['scale']
    retval = []
    for linex in captured_values:
        if linex['type'] == 'missed':
            retval.append(linex)
        elif linex['type'] == 'logo position':
            retval.append(linex)
        elif 'groups' in linex:
            el = copy.deepcopy(linex)
            el['groups'] = [scale * x for x in el['groups']]
            # path
            if 'tail' in linex:
                el['tail'] = [[x * scale for x in n]
                              for n in el['tail']]
            if el['type'] == 'text':
                el['font-size'] = scale * el['font-size']
            retval.append(el)
    return retval

def translated_elements(scaled_elements:list, scale_and_translation:dict) -> list:
    """
    Takes list of svg file line
    dictionaries and translates all
    the coordinates by a distance
    
    scale_and_translation is a dictionary.

    Returns new list.
    """
    x_translation = scale_and_translation['x_translation_scaled']
    y_translation = scale_and_translation['y_translation_scaled']
    retval = []
    for linex in scaled_elements:
        el = copy.deepcopy(linex)
        if linex['type'] == 'missed':
            retval.append(el)
        elif linex['type'] == 'logo position':
            retval.append(linex)
        elif 'groups' in linex:
            el['groups'] = []
            for idx, num in enumerate(linex['groups']):
                if idx % 2 == 0:
                    el['groups'].append(num + x_translation)
                else:
                    el['groups'].append(num + y_translation)
            # path
            if 'tail' in linex:
                el['tail'] = []
                for coordx in linex['tail']:
                    el['tail'].append([coordx[0] + x_translation,
                                       coordx[1] + y_translation])
            retval.append(el)
    return retval

def hamilton_logo_data(hamiltonlogofile:str,
                       hamiltonlogo_coords:dict) -> dict:
    """
    Attempt to get relevant data for Hamilton logo svg file.

    Returns dictionary of information on svg.
    """
    retval = {}
    retval.update(hamiltonlogo_coords)
    with open(hamiltonlogofile, 'r') as f:
        originallines = [linex for linex in f]
    retval['originallines'] = originallines
    dimensionspat = re.compile(HAMILTON_LOGO_DIMENSIONS_PAT)
    # Second line.
    match = dimensionspat.match(originallines[1])
    # x, y
    retval['dimensions'] = [float(x) for x in match.groups()]
    return retval

def scale_and_translation_hamilton_logo(hamilton_logo_data:dict,
                                        logo_positions:dict,
                                        defluffed_lines:list,
                                        scale_and_translation:dict) -> dict:
    """
    Returns dictionary with scale and x and y
    translations for Hamilton logo.
    """
    retval = {}
    # index just before image position has polygon coords (rectangle).
    logo_poly_idx = logo_positions['target_indices'][0] - 1
    target_line = defluffed_lines[logo_poly_idx]
    pat = re.compile(POLYGON_PAT)
    match = pat.match(target_line)
    polycoords = [float(x) for x in match.groups()[1:]]
    x_coords = polycoords[0::2]
    y_coords = polycoords[1::2]
    # Need to scale and translate these.
    y_size = max(y_coords) - min(y_coords)
    y_size *= scale_and_translation['scale']
    scale = y_size / hamilton_logo_data['dimensions'][1]
    print('hamilton logo scale = {0:f}'.format(scale))
    retval['scale'] = scale
    retval['x_posit'] = (min(x_coords) *
                         scale_and_translation['scale'] +
                         scale_and_translation['x_translation_scaled'])
    retval['y_posit'] = (min(y_coords) *
                         scale_and_translation['scale'] +
                         scale_and_translation['y_translation_scaled'])
    # Get translation from upper left corner of logo
    retval['translate_x'] = retval['x_posit'] - hamilton_logo_data['min_x']
    retval['translate_y'] = retval['y_posit'] - hamilton_logo_data['min_y']
    # Translation of the path element remains in the original coordinate
    # space - not impacted by the scale transformation on the same svg line.
    retval['translate_x'] /= scale
    retval['translate_y'] /= scale
    return retval

def fauxcompany_logo_data(fauxcompanylogofile:str,
                          fauxcompanylogo_coords:dict) -> dict:
    """
    Attempt to get relevant data for the contrived
    company logo svg file.

    Returns dictionary of information on svg.
    """
    retval = {}
    retval.update(fauxcompanylogo_coords)
    with open(fauxcompanylogofile, 'r') as f:
        originallines = [linex for linex in f]
    retval['originallines'] = originallines
    dimensionspat = re.compile(FAUXCOMPANY_LOGO_DIMENSIONS_PAT)
    # Second line.
    match = dimensionspat.match(originallines[1])
    # x, y
    retval['dimensions'] = [float(x) for x in match.groups()]
    return retval

def scale_and_translation_fauxcompany_logo(fauxcompany_logo_data:dict,
                                           logo_positions:dict,
                                           defluffed_lines:list,
                                           scale_and_translation:dict) -> dict:
    """
    Returns dictionary with scale and x and y
    translations for contrived company logo.
    """
    retval = {}
    # index just before image position has polygon coords (rectangle).
    logo_poly_idx = logo_positions['target_indices'][1] - 1
    target_line = defluffed_lines[logo_poly_idx]
    pat = re.compile(POLYGON_PAT)
    match = pat.match(target_line)
    polycoords = [float(x) for x in match.groups()[1:]]
    x_coords = polycoords[0::2]
    y_coords = polycoords[1::2]
    # Need to scale and translate these.
    y_size = max(y_coords) - min(y_coords)
    y_size *= scale_and_translation['scale']
    scale = y_size / fauxcompany_logo_data['dimensions'][1]
    print('fauxcompany logo scale = {0:f}'.format(scale))
    retval['scale'] = scale
    # hack
    retval['scale'] /= 1.08
    retval['x_posit'] = (min(x_coords) *
                         scale_and_translation['scale'] +
                         scale_and_translation['x_translation_scaled'])
    retval['y_posit'] = (min(y_coords) *
                         scale_and_translation['scale'] +
                         scale_and_translation['y_translation_scaled'])
    # Get translation from upper left corner of logo
    retval['translate_x'] = retval['x_posit'] - fauxcompany_logo_data['min_x']
    retval['translate_y'] = retval['y_posit'] - fauxcompany_logo_data['min_y']
    # hack
    retval['translate_x'] += 24
    retval['translate_y'] += 17
    # Translation of the path element remains in the original coordinate
    # space - not impacted by the scale transformation on the same svg line.
    retval['translate_x'] /= scale
    retval['translate_y'] /= scale
    return retval

def svg_ready_hamilton_logo(scale_and_translation_hamilton_logo:dict,
                            hamilton_logo_data:dict) -> list:
    """
    Get list of strings for Hamilton logo svg
    to insert into final svg.
    """
    pat = re.compile(HAMILTON_CHANGE_LINE_PAT)
    retval = []
    # Adobe generated file has initial svg tag split
    # into two lines.
    for linex in hamilton_logo_data['originallines'][2:-1]:
        if pat.match(linex):
            # do thing.
            retval.append(linex[:-3] + 
                          HAMILTON_TRANSFORM_FMT.format(**scale_and_translation_hamilton_logo))
        else:
            retval.append(linex)
    return retval

def svg_ready_fauxcompany_logo(scale_and_translation_fauxcompany_logo:dict,
                               fauxcompany_logo_data:dict) -> list:
    """
    Get list of strings for fauxcompany logo svg
    to insert into final svg.
    """
    pat = re.compile(FAUXCOMPANY_CHANGE_LINE_PAT)
    retval = []
    # Inkscape generated file has initial svg tag split
    # into two lines.
    for linex in fauxcompany_logo_data['originallines'][2:-1]:
        if pat.match(linex):
            # do thing.
            retval.append(linex[:] + 
              FAUXCOMPANY_TRANSFORM_FMT.format(**scale_and_translation_fauxcompany_logo))
        else:
            retval.append(linex)
    return retval

def svg_ready_doc(translated_elements:list,
                  scale_and_translation:dict,
                  logo_positions:dict,
                  defluffed_lines:list,
                  svg_ready_hamilton_logo:list,
                  svg_ready_fauxcompany_logo:list) -> list:
    """
    Returns list of string lines of svg file
    ready to write.

    Inputs: list of dictionaries, dictionary, dictionary, list, list.
    """
    retval = []
    retval.append(NEW_FIRST_LINE.format(scale_and_translation['width'],
                                        # Add padding at bottom. CBT 2024-09-20
                                        scale_and_translation['height'] + 10))
    retval.append(NEW_SECOND_LINE.format(scale_and_translation['viewBox_x'] *
                                         scale_and_translation['scale'],
                                         scale_and_translation['y_translation_scaled']))
    for idx, text_values in enumerate(zip(defluffed_lines, translated_elements)):
        # First two lines already dealt with.
        if idx in (0, 1):
            continue
        # Skip images.
        if idx in logo_positions['target_indices']:
            # Dummy empty string. CBT 2024-09-24
            retval.append('')
        text = text_values[0]
        values = text_values[1]
        linestr = ''
        if values['type'] == 'polygon':
            linestr += text[:values['start']]
            # rectangle
            if len(values['groups']) == 10:
                linestr += POLYGON_STR.format(*values['groups'])
            # 4 coordinates (triangle).
            elif len(values['groups']) == 8:
                linestr += POLYGON_STR_4.format(*values['groups'])
            retval.append(linestr)
        elif values['type'] == 'text': 
            retval.append(text[:values['start']] +
                          TEXT_STR.format(*values['groups']) +
                          text[values['span'][1]:values['fontsize_start']] +
                          TEXT_STR_FONT.format(values['font-size']) +
                          text[values['fontsize_end']:])
        elif values['type'] == 'path': 
            linestr = text[:values['start']]
            linestr += PATH_START_STR.format(*values['groups'])
            pathstr = ''
            for coordx in values['tail']:
                pathstr += PATH_STR_SEGMENT.format(*coordx)
            linestr += PATH_STR.format(pathstr[:-1])
            retval.append(linestr)
    # Closing tag (assumed; the string is empty in this listing).
    retval.append('</svg>')
    # Deal with logos here.
    # Company logo first.
    # "erase" box.
    target_index = logo_positions['target_indices'][1] - 1
    retval[target_index] = retval[target_index].replace('stroke="black"','stroke="none"')
    retval = (retval[:logo_positions['target_indices'][1]] +
              svg_ready_fauxcompany_logo +
              retval[logo_positions['target_indices'][1] + 1:])
    # Hamilton logo.
    # "erase" box.
    target_index = logo_positions['target_indices'][0] - 1
    retval[target_index] = retval[target_index].replace('stroke="black"','stroke="none"')
    retval = (retval[:logo_positions['target_indices'][0]] +
              svg_ready_hamilton_logo +
              retval[logo_positions['target_indices'][0] + 1:])
    return retval

def written_svg(svg_ready_doc:list,
                outputfile:str) -> str:
    """
    Write out doc.
    """
    with open(outputfile, 'w') as f:
        for linex in svg_ready_doc:
            f.write(linex)
    return outputfile
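The comment about the translation staying in the original coordinate space can be checked with a little arithmetic. Under the SVG rule that a transform list such as scale(s) translate(tx,ty) behaves like nested transforms, a point is translated first and scaled second. The sketch below applies that rule to a single point. The function names and sample numbers are assumptions for illustration, not part of the script.

```python
def apply_scale_then_translate_attr(point, scale, translate):
    """
    Mimic transform="scale(s) translate(tx,ty)" on one (x, y) point.

    The rightmost transform is applied to the point first, so the
    result is scale * (point + translate).
    """
    x, y = point
    tx, ty = translate
    return (scale * (x + tx), scale * (y + ty))


def translation_for_target(corner, target, scale):
    """Solve scale * (corner + t) == target for t, one axis at a time."""
    return (target[0] / scale - corner[0],
            target[1] / scale - corner[1])


corner = (10.0, 4.0)      # upper left corner of a logo in its own units
target = (120.0, 30.0)    # where that corner should land in the final svg
scale = 0.5

t = translation_for_target(corner, target, scale)
landed = apply_scale_then_translate_attr(corner, scale, t)
print(t, landed)
```

When the corner sits at or very near (0, 0), as with the official logo's min_x and min_y, this reduces to target / scale, which agrees with dividing the translation by the scale in the listing. For a corner away from the origin the two expressions no longer agree, which may be part of why the faux company logo needed hand-tuned offsets.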

reusedfunctions.py


# python 3.12

"""
Auxiliary module to Hamilton svg script.
"""

# reusedfunctions.py

import re

import pprint

# rest of points
COORDPATHPAT = (r'([-]?[0-9]+[.]?[0-9]*)[,]'
                r'([-]?[0-9]+[.]?[0-9]*)[ ]')

COORDPATH_ENDPAT = (r'([-]?[0-9]+[.]?[0-9]*)[,]'
                    r'([-]?[0-9]+[.]?[0-9]*)["][/][>]')

# For follow on points.
coordpathpat = re.compile(COORDPATHPAT)
# For final follow on point.
coordpath_endpath = re.compile(COORDPATH_ENDPAT)

def parse_path(pathstr, startposit):
    """
    Parse remainder of svg path element.

    pathstr string input format is

        261.26,-362.22 245.25,-355.88 229.64,-349.7 etc

    startposit is an integer.

    Returns list with path coordinate lists (floats).
    """
    retval = []
    while match := coordpathpat.match(pathstr, startposit):
        span = match.span()
        retval.append([float(x) for x in match.groups()])
        startposit = span[1]
    match = coordpath_endpath.match(pathstr[startposit:])
    retval.append([float(x) for x in match.groups()])
    return retval
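A quick way to see what parse_path returns is to run it on a shortened path line. The block repeats the two patterns and the function so that it runs on its own. The sample path string is made up for the example (graphviz paths are longer), and starting right after the "C" is an assumption about where the script's path pattern stops.

```python
import re

COORDPATHPAT = (r'([-]?[0-9]+[.]?[0-9]*)[,]'
                r'([-]?[0-9]+[.]?[0-9]*)[ ]')
COORDPATH_ENDPAT = (r'([-]?[0-9]+[.]?[0-9]*)[,]'
                    r'([-]?[0-9]+[.]?[0-9]*)["][/][>]')

coordpathpat = re.compile(COORDPATHPAT)
coordpath_endpath = re.compile(COORDPATH_ENDPAT)


def parse_path(pathstr, startposit):
    """Return [x, y] float pairs found from startposit to the end of the tag."""
    retval = []
    # Pairs followed by a space are the intermediate points.
    while match := coordpathpat.match(pathstr, startposit):
        retval.append([float(x) for x in match.groups()])
        startposit = match.span()[1]
    # The last pair is followed by the closing quote and "/>".
    match = coordpath_endpath.match(pathstr, startposit)
    retval.append([float(x) for x in match.groups()])
    return retval


sample = ('<path fill="none" stroke="black" '
          'd="M261.26,-362.22C245.25,-355.88 229.64,-349.7 214.1,-343.5"/>')
# Parsing starts just after the "C" that follows the first point.
start = sample.index('C') + 1
tail = parse_path(sample, start)
print(tail)
```

Each inner list is one x,y pair, which is the shape scaled_elements and translated_elements expect for the 'tail' entry.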

[Graph 2 (svg): DAG Hamilton workflow for dag_hamilton_to_blogger.py, logo functions included, with legend (input, function)]

[Graph 3 (svg): DAG Hamilton workflow for the scaled-down version without logo functions, with legend (input, function)]