Agribalyse is a French LCI database of agricultural products. It builds on top of ecoinvent 2.2. It was exported from SimaPro, so the names of ecoinvent processes are mangled and need to be mapped back to the standard ecoinvent names.
This notebook uses Agribalyse 1.2, released in March 2015.
from brightway2 import *
Create a new project for this notebook
projects.set_current("Agribalyse")
Biosphere flow names follow the standard in ecoinvent 3.3. We will need to match these names to those in Agribalyse.
bw2setup()
path = "/Users/cmutel/Documents/LCA Documents/Ecoinvent/2.2/processes"
importer = SingleOutputEcospold1Importer(path, "ecoinvent 2.2")
importer.apply_strategies()
importer.statistics()
importer.write_database()
This notebook uses the ecospold 1 version of Agribalyse. The SimaPro CSV version should be quite similar; it would just use a different Importer class.
We only need to give the directory; the Importer will find the XML file.
path = "/Users/cmutel/Documents/LCA Documents/Agribalyse"
ag = SingleOutputEcospold1Importer(path, "Agribalyse 1.2")
ag.apply_strategies()
ag.statistics()
This is quite a lot of linking problems. Let's export the unlinked exchanges to a spreadsheet so we can browse them.
ag.write_excel(True)
One obvious problem is that the names of biosphere flows changed from ecoinvent 2 to ecoinvent 3, and SimaPro uses yet another set of biosphere names and categories.
Let's fix the SimaPro-specific problems first.
from bw2io.strategies.simapro import normalize_simapro_biosphere_categories, normalize_simapro_biosphere_names
ag.apply_strategy(normalize_simapro_biosphere_categories)
ag.apply_strategy(normalize_simapro_biosphere_names)
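We have modified the source data, but still need to try to link to the biosphere database.
The linking function takes extra arguments, so we bind them with functools.partial. Read more about partial application (often loosely called currying) if this is new to you.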
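As a minimal sketch of why the extra arguments are bound in advance: a strategy is called with the data as its only argument, so any other arguments have to be fixed beforehand. The names `link_by_fields`, `apply_strategy`, and the toy flows here are hypothetical stand-ins, not the bw2io functions.

```python
import functools


def link_by_fields(data, other, kind):
    """Hypothetical stand-in for a linking strategy that needs extra arguments."""
    known = {flow["name"] for flow in other}
    for ds in data:
        for exc in ds["exchanges"]:
            # Only mark exchanges of the requested kind whose name is known
            if exc["type"] == kind and exc["name"] in known:
                exc["linked"] = True
    return data


def apply_strategy(data, strategy):
    """Hypothetical stand-in for an importer: it calls strategy(data) and nothing else."""
    return strategy(data)


# Toy data, invented for illustration only
toy_biosphere = [{"name": "Carbon dioxide, fossil"}]
toy_data = [{"exchanges": [
    {"name": "Carbon dioxide, fossil", "type": "biosphere"},
    {"name": "Unknown flow", "type": "biosphere"},
]}]

# partial() fixes `other` and `kind`, leaving a function of `data` alone
strategy = functools.partial(link_by_fields, other=toy_biosphere, kind="biosphere")
result = apply_strategy(toy_data, strategy)
linked = [exc["name"] for exc in result[0]["exchanges"] if exc.get("linked")]
print(linked)
```

The real call follows the same pattern, with the biosphere database passed as `other`.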
from bw2io.strategies import link_iterable_by_fields
import functools
ag.apply_strategy(functools.partial(link_iterable_by_fields, other=Database("biosphere3"), kind="biosphere"))
ag.statistics()
That solved 70% of the biosphere flows, but there are still many unmatched flows. Again, we export the full list of unmatched exchanges.
ag.write_excel(True)
The remaining unlinked biosphere flows can't be linked, because they don't exist in our biosphere database. This isn't the end of the world - we can add these new flows - but it does mean that they won't be assessed by our current LCIA methods.
You can use the search function to see what is in the current biosphere database:
Database("biosphere3").search("nitrogen")
We add these missing biosphere flows. We could add them to the default biosphere3 database, but it is cleaner to create a new database with just the new flows added for Agribalyse.
Database("Agribalyse new biosphere").register()
ag.add_unlinked_flows_to_biosphere_database("Agribalyse new biosphere")
We should now have no unlinked biosphere flows:
ag.statistics()
Next, the unlinked production exchanges. This is a weird one - production exchanges represent the flow produced by an activity, and should have exactly the same name as that activity (this is the standard in ecospold 1; in ecospold 2 there is a difference between activity and product names).
Let's look at the data for an unlinked production exchange and its activity. We are trying to figure out which field is different. We pick the first exchange in our spreadsheet.
def get_unlinked(data):
    # Return the first dataset and production exchange with the given name
    for ds in data:
        for exc in ds['exchanges']:
            if exc['type'] == 'production' and exc['name'] == 'Alfalfa, conventional, for animal feeding, at farm gate':
                return ds, exc
ds, exc = get_unlinked(ag.data)
for field in ('name', 'unit', 'location', 'categories'):
    print(field)
    print("\tActivity:", field, ds.get(field))
    print("\tProduct:", field, exc.get(field))
In this case, for whatever reason, the categories are different. The solution is to link without using the categories field. This strategy is smart - if excluding categories led to multiple possible links, it would raise an error instead of linking the (possibly) incorrect activity.
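The idea can be sketched with toy data. This is an illustration of the approach under stated assumptions, not the bw2io implementation: the function `link_without_categories`, the database name, the codes, units, and locations are all hypothetical.

```python
def link_without_categories(exchanges, activities):
    """Link on (name, unit, location) only; refuse to guess when ambiguous."""
    index = {}
    for act in activities:
        key = (act["name"], act["unit"], act["location"])
        index.setdefault(key, []).append(act)
    for exc in exchanges:
        key = (exc["name"], exc["unit"], exc["location"])
        candidates = index.get(key, [])
        if len(candidates) > 1:
            # Several activities match once categories is ignored
            raise ValueError("Ambiguous link for {}".format(key))
        if len(candidates) == 1:
            exc["input"] = (candidates[0]["database"], candidates[0]["code"])
    return exchanges


name = "Alfalfa, conventional, for animal feeding, at farm gate"
# Hypothetical unit, location, categories, and code values
activities = [{"database": "toy db", "code": "a1", "name": name,
               "unit": "kg", "location": "FR", "categories": ("activity",)}]
exchanges = [{"name": name, "unit": "kg", "location": "FR",
              "categories": ("product",), "type": "production"}]

link_without_categories(exchanges, activities)
linked_input = exchanges[0]["input"]

# A second activity with the same name, unit, and location makes the link ambiguous
duplicate = dict(activities[0], code="a2", categories=("another",))
try:
    link_without_categories([dict(exchanges[0])], activities + [duplicate])
    ambiguous_error = None
except ValueError as err:
    ambiguous_error = str(err)
print(linked_input, ambiguous_error)
```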
from bw2io.strategies import link_technosphere_based_on_name_unit_location
ag.apply_strategy(link_technosphere_based_on_name_unit_location)
ag.statistics()
Still a few problems. Let's look at one of them:
def get_unlinked(data):
    # Return the first dataset and production exchange that is still unlinked
    for ds in data:
        for exc in ds['exchanges']:
            if exc['type'] == 'production' and not exc.get('input'):
                return ds, exc
ds, exc = get_unlinked(ag.data)
for field in ('name', 'unit', 'location', 'categories'):
    print(field)
    print("\tActivity:", field, ds.get(field))
    print("\tProduct:", field, exc.get(field))
All the remaining outputs are disposal or recycling processes.
for exc in ag.unlinked:
    if exc['type'] == "production":
        print(exc['name'])
The disposal processes are in ecoinvent, but the recycling processes aren't.
Database("ecoinvent 2.2").search("Disposal, plastics, mixture")
Database("ecoinvent 2.2").search("recycling mixed plastics")
We have to be a little careful here. SimaPro considers these exchanges outputs, but ecoinvent models disposal as an input (you consume the disposal service). The easiest way to handle this is to simply change these outputs into inputs, which will fix the sign.
Note that we can't use ag.unlinked, as this only gives each unlinked exchange once, not every time it appears in the original data.
for ds in ag.data:
    for exc in ds['exchanges']:
        if exc['type'] == 'production' and not exc.get('input'):
            print("Fixing:", exc['name'])
            exc['type'] = 'technosphere'
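A toy example shows why the loop has to walk the full data. Here a set of names stands in for a deduplicated view of the unlinked exchanges; this is an assumption made for illustration, and the dataset and exchange names are invented.

```python
# Two hypothetical datasets that each contain the same unlinked disposal exchange
toy_data = [
    {"name": "Toy crop A", "exchanges": [
        {"name": "Disposal, toy waste", "type": "production"}]},
    {"name": "Toy crop B", "exchanges": [
        {"name": "Disposal, toy waste", "type": "production"}]},
]

# A deduplicated view reports the exchange once...
unique_unlinked = {
    exc["name"]
    for ds in toy_data
    for exc in ds["exchanges"]
    if not exc.get("input")
}

# ...but the fix has to be applied at every occurrence in the data
fixed = 0
for ds in toy_data:
    for exc in ds["exchanges"]:
        if exc["type"] == "production" and not exc.get("input"):
            exc["type"] = "technosphere"
            fixed += 1

print(len(unique_unlinked), fixed)
```

Changing only the deduplicated copy would leave the second occurrence untouched.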
We will leave the recycling processes alone for now; first, we will fix all the ecoinvent links, including the disposal ones, and then we will get back to recycling.
Looking at the spreadsheet, you notice that there is no categories field for any of the inputs. By default, categories is used when linking, so if ecoinvent 2.2 has the categories field (it does), then no suitable link will be found.
This is a common problem with SimaPro, and we already have a strategy to handle it. We will try to fix both the internal links and the links to ecoinvent 2.2.
ag.apply_strategy(link_technosphere_based_on_name_unit_location)
ag.apply_strategy(functools.partial(link_technosphere_based_on_name_unit_location, external_db_name="ecoinvent 2.2"))
ag.statistics()
So, that was relatively simple.
The recycling processes don't exist, and don't have any impact, so the easiest way to handle these exchanges is to create new activities that produce the recycling flows. Luckily we have a method that does that for us. Note that the new recycling activities will be created in the Agribalyse database.
ag.add_unlinked_activities()
ag.statistics()
We are finished with the importing process.
ag.write_database()
OK, that is not good. The unique identifying codes for the activities come from the source data, which wouldn't be so foolish as to give non-unique identifiers to activities in the same export file, would it? Let's look at the codes.
print(len({ds['code'] for ds in ag.data}), len(ag.data))
print({ds['code'] for ds in ag.data})
That is not good. 826 activities, and only 265 unique codes. Let's look at the source data:
<dataset number="28" timestamp="2015-02-22T17:27:17" generator="SimaPro 8.0.3.14">
<referenceFunction name="Bovine feed,MAT18, at farm gate">
<dataset number="28" timestamp="2014-12-21T14:10:26" generator="SimaPro 8.0.3.14">
<referenceFunction name="Greenhouse, glass walls and roof, plastic tubes">
<dataset number="28" timestamp="2013-09-18T16:53:22" generator="CDT V1.2">
<referenceFunction name="Harrowing, with rotary harrow (standard equipment)">
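A quick way to see which codes collide is to count them. This sketch uses toy records built from the three names in the excerpt, and assumes the code is taken from the dataset number; the fourth record is hypothetical.

```python
from collections import Counter

toy_data = [
    {"code": "28", "name": "Bovine feed,MAT18, at farm gate"},
    {"code": "28", "name": "Greenhouse, glass walls and roof, plastic tubes"},
    {"code": "28", "name": "Harrowing, with rotary harrow (standard equipment)"},
    {"code": "29", "name": "Hypothetical activity with a unique code"},
]

# Count how often each code occurs and keep only the repeated ones
counts = Counter(ds["code"] for ds in toy_data)
duplicated = {code: n for code, n in counts.items() if n > 1}
print(duplicated)
```

On the real importer, the same counting could be run over `ag.data`.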
We need to add unique codes. We have a strategy for this, set_code_by_activity_hash, but it won't overwrite codes already present. We can fix that :)
for ds in ag.data:
    del ds['code']
from bw2io.strategies import set_code_by_activity_hash
ag.apply_strategy(set_code_by_activity_hash)
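The behaviour described above, setting a code only where none exists, can be sketched as follows. This is not the bw2io implementation: `toy_activity_hash`, `set_code_if_missing`, the choice of hashed fields, and the unit and location values are all assumptions made for illustration.

```python
import hashlib


def toy_activity_hash(ds, fields=("name", "unit", "location")):
    """Hypothetical hash over a few identifying fields."""
    joined = "|".join(str(ds.get(field, "")).lower() for field in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def set_code_if_missing(data):
    """Set a hash-based code, but never overwrite an existing one."""
    for ds in data:
        if "code" not in ds:
            ds["code"] = toy_activity_hash(ds)
    return data


# Names from the excerpt; units and locations are invented
toy_data = [
    {"code": "28", "name": "Bovine feed,MAT18, at farm gate", "unit": "kg", "location": "FR"},
    {"code": "28", "name": "Greenhouse, glass walls and roof, plastic tubes", "unit": "m2", "location": "FR"},
    {"code": "28", "name": "Harrowing, with rotary harrow (standard equipment)", "unit": "ha", "location": "FR"},
]

# With codes present, nothing changes
set_code_if_missing(toy_data)
codes_before = [ds["code"] for ds in toy_data]

# After deleting the codes, each dataset gets its own hash
for ds in toy_data:
    del ds["code"]
set_code_if_missing(toy_data)
codes_after = [ds["code"] for ds in toy_data]
print(codes_before, len(set(codes_after)))
```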
Only the internal links will need to be redone - the links to ecoinvent 2.2 and the biosphere database are fine.
We can't use link_technosphere_based_on_name_unit_location, because we need to pass the parameter relink.
ag.apply_strategy(functools.partial(
    link_iterable_by_fields,
    other=ag.data,
    fields=('name', 'location', 'unit'),
    relink=True
))
ag.statistics()
We are now ready to try again.
ag.write_database()
We need to do some basic validation to make sure we have meaningful results. Here I just do some basic testing, but you should validate against known scores if you are frequently using this database. The following code is rather simple and is not a real validation check.
gwp = [x for x in methods if "IPCC 2013" in str(x)][0]
gwp
db = Database("Agribalyse 1.2")
lca = LCA({db.random(): 1}, gwp)
lca.lci(factorize=True)
lca.lcia()
lca.score
Let's calculate the LCIA scores of all activities in Agribalyse:
import pyprind
scores = []
for act in pyprind.prog_bar(db):
    lca.redo_lcia({act: 1})
    scores.append(lca.score)
import numpy as np
scores = np.array(scores)
mask = scores == 0
print(mask.sum(), len(db))
scores = scores[~mask]
%matplotlib notebook
import seaborn as sns
sns.distplot(scores)
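We have imported the Agribalyse database. In the process of importing, we found and resolved several problems:
- Biosphere flow names and categories were SimaPro-specific, and had to be normalized before linking to the biosphere database.
- Some biosphere flows don't exist in biosphere3, so we added them to a new biosphere database.
- Production exchanges had to be linked without categories, as this field is not given consistently in SimaPro exports.
- Disposal processes were exported as outputs, and had to be changed to inputs.
- Inputs from ecoinvent 2.2 had to be linked without categories, because SimaPro does not export this field for inputs.
- Recycling processes didn't exist in ecoinvent 2.2, so we created new activities for them.
- Activity codes were not unique, so we replaced them with hash-based codes and redid the internal links.
This was a bit of a pain, but compared to other database exports, was actually not all that difficult. This is the sad truth of LCA data compatibility - it currently isn't all that great.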