Decompressing feather files


seriangelo agriesti

Apr 14, 2025, 12:14:44 PM
to AequilibraE
Hello Pedro,
We are using AequilibraE for our research work and so far everything is going smoothly!
I need to check one thing with the community, though: we need to store the new shortest path calculated at each iteration of the bfw assignment. I see that this information is stored in the .feather files, and I am trying to convert it into an interpretable .csv file. To do so, I have adapted part of the code shared by Kai Tang (in another thread), but that thread mentions there is a mistake somewhere in his code, without pinpointing exactly where. I think the mistake is in the assignment part, not in the .feather part, but I need to be sure.

Could you check whether the following lines of code work as intended, or whether I am introducing a mistake without realizing it:

import os
import shutil

import pandas as pd

paths_ = r'C:\Users\Aequilibrae\Sioux\path_files'
shutil.move(r'C:\Users\Aequilibrae\Sioux\path_files\f61bc3183e4b4cab82b7f5e6d5513b3d\correspondence_cc_c.feather', r'C:\Users\Aequilibrae\Sioux\path_files\zzcorrespondence_cc_c.feather')
shutil.move(r'C:\Users\Aequilibrae\Sioux\path_files\f61bc3183e4b4cab82b7f5e6d5513b3d\nodes_to_indices_cc_c.feather', r'C:\Users\Aequilibrae\Sioux\path_files\zznodes_to_indices_cc_c.feather')

def parse_path_file(path_fldr=None, mode_name=None, cen_array=None):
    processor_id = os.listdir(path_fldr)[0]  # id of the folder inside path_files
    iter_len = len(os.listdir(os.path.join(path_fldr, processor_id)))  # number of iterations
    map_node_path = r'C:\Users\Aequilibrae\Sioux\path_files\zznodes_to_indices_cc_c.feather'
    map_link_path = r'C:\Users\Aequilibrae\Sioux\path_files\zzcorrespondence_cc_c.feather'
    map_link_df = pd.read_feather(map_link_path)
    map_node_df = pd.read_feather(map_node_path)
    node_new_origin_dict = {n: o for n, o in zip(map_node_df['node_index'], map_node_df.index)}
    map_link_df['__compressed_id__'] = map_link_df['__compressed_id__'].astype(int)
    compressed_link_ft_dict = {int(l): (node_new_origin_dict[int(f)], node_new_origin_dict[int(t)])
                               for l, f, t in zip(map_link_df['__compressed_id__'],
                                                  map_link_df['a_node'],
                                                  map_link_df['b_node'])}
    path_dfs = []

    for t in range(iter_len):
        curr_iter = os.path.join(path_fldr, processor_id, f'iter{t + 1}')
        for i in range(len(cen_array)):
            path_file_path = os.path.join(curr_iter, 'path_cc_c', f'o{i}.feather')
            index_file_path = os.path.join(curr_iter, 'path_cc_c', f'o{i}_indexdata.feather')

            feather_path_df = pd.read_feather(path_file_path)
            feather_index_df = pd.read_feather(index_file_path)

            path_df = generate_path_from_feather(feather_path_df=feather_path_df,
                                                 feather_index_df=feather_index_df,
                                                 compressed_link_ft_dict=compressed_link_ft_dict)
            path_df['origins'] = i
            path_df['class_type'] = mode_name
            path_df['origins'] = path_df['origins'].apply(lambda x: node_new_origin_dict[x])
            path_df['destinations'] = path_df['destinations'].apply(lambda x: node_new_origin_dict[x])

            # DataFrame._append is a private pandas method; collect frames and concat once instead
            path_dfs.append(path_df)

    return pd.concat(path_dfs, ignore_index=True)[['origins', 'destinations', 'link_seq']]


def generate_path_from_feather(feather_path_df=None, feather_index_df=None, compressed_link_ft_dict=None):
    feather_path_df['ft'] = feather_path_df['data'].apply(lambda x: compressed_link_ft_dict[x])

    feather_index_df = feather_index_df.reset_index(drop=False).rename(columns={'index': 'destinations', 'data': 'to'})
    feather_index_df['from'] = feather_index_df['to'].shift(1).fillna(0).astype(int)
    # use label-based access (x['from'], x['to']); positional x[0]/x[1] on a labelled
    # Series is deprecated in recent pandas versions
    feather_index_df['link_seq'] = feather_index_df[['from', 'to']].apply(
        lambda x: feather_path_df.loc[x['from']: x['to'] - 1, 'ft'].to_list()[::-1] if x['to'] != x['from'] else [],
        axis=1)
    return feather_index_df[['destinations', 'link_seq']]


path_df = parse_path_file(path_fldr=paths_,
                          mode_name='tc_car', cen_array=list(range(24)))

path_df.to_csv("paths.csv")

Would this work, in your opinion? I am not asking for full debugging, just feedback on the approach (it is my first time working with .feather files). Also, if Kai Tang ever reads this thread, I thank him for sharing the code :)

And thank you and the team Pedro, for the amazing job.
Best,
Serio

Pedro Camargo

Apr 14, 2025, 4:36:39 PM
to AequilibraE
Hi Serio,

The process would definitely look something like that (I have not looked closely at the code), but I would recommend you continue working with efficient binary formats such as feather or even HDF5, as these outputs would be huge on disk.

Cheers,
Pedro





seriangelo agriesti

Apr 15, 2025, 5:18:06 AM
to AequilibraE
Thank you for the feedback and the suggestion!
Currently we are doing some research on the FW algorithm itself, and for that we need an interpretable version of the paths at each iteration. Unfortunately, I haven't found a comprehensive guide on how to interpret .feather files, which is why I am turning to .csv for the time being. But I'll keep trying :)
Thank you again.

Jamie Cook

May 8, 2025, 10:44:14 PM
to seriangelo agriesti, AequilibraE
Serio, 

I think the code you shared is a very good example of "interpreting feather files" - Pedro is suggesting that you build the functionality for reading them into your process rather than converting them to csv. If your process works with csv files, it will be significantly slower to read the data in compared to just leaving it in feather format.

Good luck whichever way you decide to go.



seriangelo agriesti

May 12, 2025, 9:35:00 AM
to AequilibraE
Thank you Jamie, I understand and it makes lots of sense :)

seriangelo agriesti

May 12, 2025, 9:46:42 AM
to AequilibraE
Sorry for the double message, but I actually need to verify one key thing. I have adapted the code used to calculate turn volumes (turn_volumes_results.py https://github.com/AequilibraE/aequilibrae/pull/358), to maintain the OD relation in the output. This way, I can get the "step" that shifts volumes to the new shortest path at each iteration and the path-based volumes.

As the code shows, I am using the alphas and betas from the report (https://www.aequilibrae.com/docs/python/V.1.1.4/useful_links/_generated/aequilibrae.paths.TrafficAssignment.html#aequilibrae.paths.TrafficAssignment.report), as I am assuming that the "wideness" of the step in Frank-Wolfe is the same across the network, and that only the values of the blended volumes from previous iterations change for each link. Is this correct? Or does AequilibraE aggregate more detailed results into the final betas and alpha in the report?

Here is the function; df is attached to the email. I am not asking for debugging and am attaching the code only to clarify what I am asking :)

Best, Serio

def calculate_volume(self, df: pd.DataFrame, ta_report: pd.DataFrame) -> pd.Series:
    iterations = df["iteration"].max()
    grouping_cols = ["origin", "destination", "a", "b", "c"]
    blended_list = []
    for _, group_df in df.groupby(grouping_cols):
        aon_volume = group_df.set_index("iteration")[["origin", "destination", "demand"]].sort_index()
        blended_volumes = pd.Series(data=0.0, index=aon_volume.index)
        blended_volumes.loc[1] = aon_volume.loc[1, "demand"]  # iteration 1 is pure all-or-nothing
        for it in range(2, iterations + 1):
            betas_for_it = pd.Series(ta_report.loc[it, ["beta0", "beta1", "beta2"]]).sort_index(ascending=True)
            alpha_for_it = ta_report.at[it, "alpha"]
            if (betas_for_it != -1).any():
                # BFW: the direction is a beta-weighted blend of the last few AoN solutions
                min_idx = max(0, it - betas_for_it.size) + 1
                max_idx = min_idx + min(it, betas_for_it.size)
                window = range(min_idx, max_idx)
                it_volume = (aon_volume.loc[window]["demand"] * betas_for_it[0: min(it, betas_for_it.size)].values).sum()
            else:
                it_volume = aon_volume.loc[it]
            group_df.loc[group_df["iteration"] == it, "step"] = (it_volume * alpha_for_it) + (blended_volumes.loc[it - 1] * (1 - alpha_for_it))
            blended_volumes.loc[it] = (it_volume * alpha_for_it) + (blended_volumes.loc[it - 1] * (1 - alpha_for_it))
        blended_list.append(blended_volumes)  # collect per-group results instead of returning only the last group
    return pd.concat(blended_list)

calculate_vol_input_m.csv

Pedro Camargo

May 14, 2025, 6:21:23 PM
to AequilibraE
Hi Serio,

I am not sure I understand your question, but an all-or-nothing iteration in FW is the linear approximation of the optimization problem at that point and, as such, it would make no sense to blend the current traffic with the AoN iteration in any funky way: i.e. yes, it is a single factor for the whole network.
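For what it's worth, a tiny numeric sketch of what that single network-wide factor means in plain FW (my own illustration, not from the thread; `alpha` stands for the line-search step reported per iteration, and the link volumes are made up):

```python
import numpy as np

prev_volumes = np.array([100.0, 50.0, 0.0])  # blended link volumes after iteration k-1
aon_volumes = np.array([0.0, 80.0, 120.0])   # all-or-nothing link volumes at iteration k
alpha = 0.25                                 # one scalar step for the whole network

# every link is blended with the same alpha; only the volumes differ per link
blended = alpha * aon_volumes + (1 - alpha) * prev_volumes
# → [75.0, 57.5, 30.0]
```

The same scalar shifts every link (and hence every path) toward the new shortest paths by the same fraction, which is exactly the "single factor for the whole network" Pedro describes.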

Pedro




seriangelo agriesti

May 19, 2025, 10:47:04 AM
to AequilibraE
Hello Pedro,
Thank you for the reply! That is exactly what I expected (a single factor for the whole network; in the end, I need to quantify the step size for each path separately), but I really needed to be sure.
Best,
Serio
